For comprehensive and current results, perform a real-time search at Science.gov.

1

NASA Astrophysics Data System (ADS)

Dynamic Single Photon Emission Computed Tomography (SPECT) has the potential to quantitatively estimate physiological parameters by fitting compartment models to the tracer kinetics. The generalized linear least squares (GLLS) method is an efficient method for estimating unbiased kinetic parameters and parametric images. However, because of the low sensitivity of SPECT, noisy data can cause voxel-wise parameter estimation by GLLS to fail. Fuzzy C-Means (FCM) clustering and a modified FCM, which also uses information from the immediately neighboring voxels, are proposed to improve the voxel-wise parameter estimation of GLLS. Monte Carlo simulations were performed to generate dynamic SPECT data with different noise levels, which were then processed by general and modified FCM clustering. Parametric images were estimated by Logan and Yokoi graphical analysis and by GLLS. The influx rate (KI) and volume of distribution (Vd) were estimated for the cerebellum, thalamus and frontal cortex. Our results show that (1) FCM reduces the bias and improves the reliability of parameter estimates for noisy data, (2) GLLS provides estimates of the micro parameters (K1 to k4) as well as macro parameters such as the volume of distribution (Vd) and binding potential (BPI and BPII), and (3) FCM clustering that incorporates neighboring-voxel information does not improve the parameter estimates, but it reduces noise in the parametric images. These findings indicate that pre-segmentation with traditional FCM clustering is desirable for generating voxel-wise parametric images with GLLS from dynamic SPECT data.
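For readers unfamiliar with FCM, the alternating centre and membership updates can be sketched as below. This is a minimal textbook fuzzy c-means in pure Python, not the authors' implementation or their neighbour-aware modification; the fuzzifier m = 2, the tolerance and the random initialization are assumptions.

```python
import random


def fcm(points, c, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Textbook fuzzy c-means on a list of equal-length numeric tuples.

    Returns (centers, u), where u[i][k] is the membership of point i in cluster k.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships; each row is normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(max_iter):
        # Centre update: v_k = sum_i u_ik^m x_i / sum_i u_ik^m
        centers = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            wsum = sum(w)
            centers.append(tuple(sum(w[i] * points[i][d] for i in range(n)) / wsum
                                 for d in range(dim)))
        # Membership update: u_ik = 1 / sum_j (d_ik / d_ij)^(2 / (m - 1))
        new_u = []
        for i in range(n):
            dist = [max(sum((points[i][d] - centers[k][d]) ** 2 for d in range(dim)) ** 0.5, 1e-12)
                    for k in range(c)]
            new_u.append([1.0 / sum((dist[k] / dist[j]) ** (2.0 / (m - 1.0)) for j in range(c))
                          for k in range(c)])
        change = max(abs(new_u[i][k] - u[i][k]) for i in range(n) for k in range(c))
        u = new_u
        if change < tol:
            break
    return centers, u
```

In a dynamic SPECT setting, each point would presumably be one voxel's time-activity curve (a tuple of frame values); how cluster results are fed back into GLLS is not described in the abstract and is not shown here.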

Choi, Hon-Chit; Wen, Lingfeng; Eberl, Stefan; Feng, Dagan

2006-03-01

2

This paper presents a novel two-step approach that combines fuzzy c-means (FCM) clustering and the gradient vector flow (GVF) snake algorithm for lesion contour segmentation on breast magnetic resonance imaging (BMRI). Manual delineation of the lesions by expert MR radiologists was taken as the reference standard for evaluating the computerized segmentation approach. The proposed algorithm was also compared with an FCM clustering-based method. On a database of 60 mass-like lesions (22 benign and 38 malignant cases), the proposed method demonstrated good segmentation performance. Morphological and texture features were extracted from the computerized segmentation contours and from the radiologists' delineations, respectively, and used to classify the lesions as benign or malignant. Features extracted by the computerized characterization method differentiated the lesions with an area under the receiver operating characteristic curve (AUC) of 0.968, compared with an AUC of 0.914 for features extracted from the radiologists' delineations. The proposed method can assist radiologists in delineating and characterizing BMRI lesions, for example by quantifying morphological and texture features, and may improve the objectivity and efficiency of BMRI interpretation. PMID:22952558
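The AUC values above summarize how well classifier scores separate malignant from benign lesions. A minimal sketch of the nonparametric (Mann-Whitney) AUC estimate follows; the abstract does not state which AUC estimator the authors used (a fitted binormal ROC curve is also common), and the scores in the example are hypothetical.

```python
def empirical_auc(scores_pos, scores_neg):
    """Empirical ROC AUC: the probability that a randomly chosen positive
    (e.g. malignant) case scores higher than a randomly chosen negative
    (e.g. benign) case, counting ties as 1/2 (Mann-Whitney U / (n1 * n0))."""
    wins = 0.0
    for p in scores_pos:
        for q in scores_neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))
```

For example, `empirical_auc([0.8, 0.3], [0.5, 0.1])` returns 0.75, because three of the four positive/negative pairs are correctly ordered.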

Pang, Yachun; Li, Li; Hu, Wenyong; Peng, Yanxia; Liu, Lizhi; Shao, Yuanzhi

2012-01-01

3

NASA Astrophysics Data System (ADS)

Soil moisture is a key variable of the hydrological cycle. For example, it controls the partitioning of rainfall into runoff and infiltration and modulates physical, chemical and biological processes within the soil. For a better understanding of these processes, knowledge about the spatio-temporal distribution of soil moisture is indispensable. At the field to small-catchment scale, with survey areas of up to a few square kilometres, numerous new and innovative ground-based and remote sensing technologies are available that have great potential to provide temporal information about soil moisture patterns. The aim of this work is to design an optimal soil moisture monitoring program for a low-mountain catchment in central Germany. In a first step, the fuzzy c-means clustering technique (Paasche et al., 2006) was used to identify structure-relevant patterns in a set of different terrain attributes derived from a DEM. Based on these patterns, optimal measurement locations were identified for in-situ soil moisture measurements. To capture different wetting and drying states in the catchment, several TDR measurement campaigns were conducted from April to October 2013. The TDR measurements have been integrated with the structure-relevant patterns obtained by the fuzzy cluster analysis to predict soil moisture regionally. In this study, we outline the conceptual framework of this integrative approach and present first results from field measurements. The results of the project are expected to improve the monitoring and understanding of small catchment-scale hydrological processes and to contribute to a better representation of soil moisture dynamics in physically based hydrological models operating at the field to small-catchment scale.
Reference: Paasche, H., J. Tronicke, K. Holliger, A.G. Green, and H. Maurer (2006): Integration of diverse physical-property models: Subsurface zonation and petrophysical parameter estimation based on fuzzy c-means cluster analyses. Geophysics 71(3), H33-H44, doi:10.1190/1.2192927.
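The abstract does not say how measurement locations were derived from the fuzzy patterns. One plausible rule, shown here purely as an assumption, is to sample the grid cells that are most typical of each cluster, i.e. those with the highest membership. The function name and the `per_cluster` parameter are hypothetical.

```python
def pick_sampling_locations(memberships, per_cluster=1):
    """Hypothetical selection rule: for each fuzzy cluster, return the indices
    of the grid cells (rows of an n x c membership matrix) with the highest
    membership in that cluster, i.e. the most 'typical' cells of each pattern."""
    n_clusters = len(memberships[0])
    picks = []
    for k in range(n_clusters):
        ranked = sorted(range(len(memberships)),
                        key=lambda i: memberships[i][k], reverse=True)
        picks.append(ranked[:per_cluster])
    return picks
```

For instance, with memberships `[[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.05, 0.95]]` the rule picks cell 0 for the first pattern and cell 3 for the second.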

Schröter, Ingmar; Paasche, Hendrik; Dietrich, Peter; Wollschläger, Ute

2014-05-01

4

GeCiM: A Novel Generalized Approach to C-Means Clustering

All three conventional c-means clustering algorithms have their advantages and disadvantages. This paper presents a novel generalized approach to c-means clustering: the objective function is considered to be a mixture of the FCM, PCM, and HCM objective functions. The optimal solution is obtained via evolutionary computation. Our main goal is to reveal the properties of such mixtures and to formulate ...

László Szilágyi; David Iclanzan; Sándor M. Szilágyi; Dan Dumitrescu

2008-01-01

5

Integration and generalization of LVQ and c-means clustering (Invited Paper)

NASA Astrophysics Data System (ADS)

This paper discusses the relationship between the sequential hard c-means (SHCM), learning vector quantization (LVQ), and fuzzy c-means (FCM) clustering algorithms. LVQ and SHCM suffer from several major problems. For example, they depend heavily on initialization: if the initial cluster centers lie outside the convex hull of the input data, such algorithms, even if they terminate, may not produce meaningful prototypes for cluster representation. This is due in part to the fact that they update only the winning prototype for each input vector. We also discuss the impact and interaction of these two families with Kohonen's self-organizing feature mapping (SOFM), which is not a clustering method but often lends itself to clustering algorithms. We then present two generalizations of LVQ that are explicitly designed as clustering algorithms, which we refer to as generalized LVQ (GLVQ) and fuzzy LVQ (FLVQ). Learning rules are derived to optimize an objective function whose goal is to produce 'good clusters'. GLVQ/FLVQ (may) update every node in the clustering net for each input vector. We use Anderson's Iris data to compare the performance of GLVQ/FLVQ with a standard version of LVQ. Experiments show that the final centroids produced by GLVQ are independent of node initialization and learning coefficients. Neither GLVQ nor FLVQ depends on a choice of update neighborhood or learning rate distribution; these are handled automatically.
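The initialization problem described above follows from the winner-take-all update. The sketch below shows one SHCM/unsupervised-LVQ step in which only the nearest prototype moves. It does not reproduce the GLVQ or FLVQ learning rules, which update every node and are defined in the paper; the learning rate `lr` is an assumed constant.

```python
def winner_take_all_step(prototypes, x, lr):
    """One sequential hard c-means / unsupervised LVQ step: only the prototype
    nearest to x moves toward x; all other prototypes are left unchanged."""
    def sqdist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    winner = min(range(len(prototypes)), key=lambda k: sqdist(prototypes[k], x))
    updated = [list(p) for p in prototypes]
    updated[winner] = [p + lr * (xi - p) for p, xi in zip(prototypes[winner], x)]
    return updated, winner
```

If a prototype starts far outside the convex hull of the data (say at (100, 100) while all inputs lie near the origin), it never wins, so it is never updated. This is the failure mode the abstract attributes to updating only the winner.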

Bezdek, James C.

1992-11-01

6

Performance research of Gaussian function weighted fuzzy C-means algorithm

NASA Astrophysics Data System (ADS)

The Fuzzy C-Means (FCM) algorithm is a fuzzy pattern recognition method. Its clustering precision suffers from a tendency toward equal partitions when the numbers of samples in the classes of a data set differ greatly; in this case, the optimal clustering result of the algorithm may not be a correct partition. To overcome this disadvantage, a Gaussian function Weighted Fuzzy C-Means (WFCM) algorithm is proposed, in which the weighting function is produced by a Gaussian function that calculates the dot density of each sample. To a certain extent, the WFCM algorithm not only overcomes the equal-partition tendency of the fuzzy c-means algorithm but also shows favorable convergence and stability. Both the calculation of the weighting function and the choice of the sample dot-density range restriction value are objective. When partially supervised information obtained from a few labeled samples is introduced into the WFCM algorithm, its classification performance is further enhanced and the convergence of the objective function is further accelerated.
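A minimal sketch of Gaussian dot-density sample weights follows. Its assumptions are that the density of a sample is the sum of Gaussian kernels centred on all other samples, that the bandwidth `sigma` stands in for the paper's density range restriction value, and that the weights are normalized to average 1. The exact WFCM weighting formula is not given in the abstract.

```python
import math


def gaussian_density_weights(points, sigma):
    """Assumed dot-density weights: sum of Gaussian kernels centred on every
    other sample, normalized so the weights average to 1 over the data set.
    Samples in dense regions receive larger weights."""
    weights = []
    for i, xi in enumerate(points):
        dens = 0.0
        for j, xj in enumerate(points):
            if i == j:
                continue
            d2 = sum((a - b) ** 2 for a, b in zip(xi, xj))
            dens += math.exp(-d2 / (2.0 * sigma ** 2))
        weights.append(dens)
    total = sum(weights)
    if total == 0.0:
        # All samples are isolated at this bandwidth: fall back to uniform weights.
        return [1.0] * len(points)
    return [w * len(points) / total for w in weights]
```

In a weighted FCM, such weights would typically multiply the u^m terms in the centre update, v_k = sum_i w_i u_ik^m x_i / sum_i w_i u_ik^m. Whether WFCM uses exactly this form is an assumption.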

Liu, Xiaofang; Li, Xiaowen; Yang, Chun; He, Binbin; Zhang, Ying

2007-11-01

7

Mandarin Digital Speech Recognition Based on a Chaotic Neural Network and Fuzzy C-means Clustering

The Mandarin digital speech recognition model based on a chaotic neural network and fuzzy c-means clustering can perform digital speech recognition efficiently, and fuzzy c-means clustering performs better than hard k-means clustering. ... Digital speech recognition can be widely used ...

Freeman, Walter J.

8

Cluster algorithms for classical and quantum spin systems are discussed. In particular, the cluster algorithm is applied to classical O(N) lattice actions containing interactions of more than two spins. The performance of the multi-cluster and single-cluster methods, and of the standard and improved estimators, is compared. (Lecture given at the summer school 'Advances in Computer Simulations', Budapest, July 1996.)
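To make "single-cluster method" concrete, here is a sketch of the simplest case: a Wolff single-cluster update for the 2D nearest-neighbour Ising model, which is the O(1) member of the O(N) family. It does not cover the multi-spin interactions or improved estimators discussed in the lecture. The coupling J = 1 and periodic boundaries are assumptions.

```python
import math
import random


def wolff_update(spins, beta, rng):
    """One Wolff single-cluster update for the 2D Ising model (J = 1,
    periodic boundaries). `spins` is an L x L list of +1/-1 values (L >= 3)
    and is modified in place. Returns the number of flipped spins."""
    L = len(spins)
    p_add = 1.0 - math.exp(-2.0 * beta)  # probability of activating an aligned bond
    i0, j0 = rng.randrange(L), rng.randrange(L)
    seed_spin = spins[i0][j0]
    cluster = {(i0, j0)}
    stack = [(i0, j0)]
    while stack:
        i, j = stack.pop()
        neighbours = (((i + 1) % L, j), ((i - 1) % L, j), (i, (j + 1) % L), (i, (j - 1) % L))
        for site in neighbours:
            ni, nj = site
            # Each bond from a cluster site to an aligned outside site is tested once.
            if site not in cluster and spins[ni][nj] == seed_spin and rng.random() < p_add:
                cluster.add(site)
                stack.append(site)
    # Flip the whole cluster at once.
    for i, j in cluster:
        spins[i][j] = -seed_spin
    return len(cluster)
```

A typical run repeatedly calls `wolff_update(lattice, beta, random.Random(42))` on an L x L lattice. Near the critical coupling (beta around 0.44 for this model), the clusters are large, which is what lets the method beat local updates.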

Ferenc Niedermayer

1997-04-21

9

In high energy physics experiments, calorimetric data reconstruction requires a suitable clustering technique to obtain accurate information about shower characteristics such as the shower position and energy deposition. Fuzzy clustering techniques have high potential in this regard because they can assign data points to more than one cluster, thereby acting as a tool to distinguish between overlapping clusters. Fuzzy c-means (FCM) is one such clustering technique that can be applied to calorimetric data reconstruction. However, it has a drawback: it cannot easily identify and distinguish clusters that are not uniformly spread. A version of the FCM algorithm called dynamic fuzzy c-means (dFCM) allows clusters to be generated and eliminated as required, with the ability to resolve non-uniformly distributed clusters. Both the FCM and dFCM algorithms have been studied and successfully applied to simulated data from a sampling tungsten-silicon calorimeter. The FCM technique works reasonably well, and the dFCM technique further improves the performance.

Radha Pyari Sandhir; Sanjib Muhuri; Tapan Nayak

2012-04-16

10

A Wavelet Relational Fuzzy C-Means Algorithm for 2D Gel Image Segmentation

One of the best-known algorithms in the area of image segmentation is the Fuzzy C-Means (FCM) algorithm. This algorithm has been used in many applications such as data analysis, pattern recognition, and image segmentation, and it produces high-quality segmentation compared with other available algorithms. Many modifications have been made to the algorithm to improve its segmentation quality. The segmentation algorithm proposed in this paper extends the Fuzzy C-Means algorithm with the relational fuzzy notion and the wavelet transform to enhance its performance, especially on 2D gel images. Both modifications aim to minimize the oversegmentation error incurred by previous algorithms. Experimental comparisons of the Fuzzy C-Means (FCM) and Wavelet Fuzzy C-Means (WFCM) algorithms with the proposed algorithm on real 2D gel images acquired from human leukemias, HL-60 cell lines, and fetal alcohol syndrome (FAS) demonstrate the improvement achieved by the proposed algorithm in overcoming the segmentation error. In addition, we investigate the effect of denoising on the three algorithms. This investigation shows that denoising the 2D gel image before segmentation can improve the quality of the segmentation in most cases. PMID:24174990

Rashwan, Shaheera; Faheem, Mohamed Talaat; Sarhan, Amany; Youssef, Bayumy A. B.

2013-01-01

11

A Modified Fuzzy C-Means Algorithm for Collaborative Filtering

... to the Netflix Prize data set and achieve accuracy comparable to that of matrix factorization (MF). Keywords: Collaborative Filtering, Clustering, Matrix Factorization, Fuzzy C-means, Netflix Prize.

Li, Tiejun

12

Automatic online spike sorting with singular value decomposition and fuzzy C-mean clustering

Background Understanding how neurons contribute to perception, motor functions and cognition requires the reliable detection of the spiking activity of individual neurons under a number of different experimental conditions. An important problem in computational neuroscience is thus to develop algorithms that automatically detect and sort the spiking activity of individual neurons from extracellular recordings. While many algorithms for spike sorting exist, accurate and fast online sorting remains a challenging problem. Results Here we present a novel software tool, called FSPS (Fuzzy SPike Sorting), which is designed to optimize (i) fast and accurate detection, (ii) offline sorting and (iii) online classification of neuronal spikes with very limited or no human intervention. The method combines Singular Value Decomposition for fast and highly accurate pre-processing of spike shapes, unsupervised Fuzzy C-mean clustering, high-resolution alignment of extracted spike waveforms, optimal selection of the number of features to retain, automatic identification of the number of clusters, and quantitative quality assessment of the resulting clusters independent of their size. After being trained on a short testing data stream, the method can reliably perform supervised online classification and monitoring of single-neuron activity. The generalized procedure has been implemented in our FSPS spike sorting software (available free for non-commercial academic applications at the address: http://www.spikesorting.com) using LabVIEW (National Instruments, USA). We evaluated the performance of our algorithm both on benchmark simulated datasets with different levels of background noise and on real extracellular recordings from the premotor cortex of macaque monkeys. These tests showed excellent accuracy in discriminating low-amplitude and overlapping spikes under strong background noise.
The performance of our method is competitive with that of other robust spike sorting algorithms. Conclusions This new software provides neuroscience laboratories with a new tool for fast and robust online classification of single-neuron activity. This feature could become crucial in situations where online spike detection from multiple electrodes is paramount, such as in human clinical recordings or in brain-computer interfaces. PMID:22871125

2012-01-01

13

A Parzen-window-based semi-supervised fuzzy c-means (PSFCM) clustering algorithm is presented. The initial clustering centers of fuzzy c-means (FCM) were determined from training samples. The membership degrees of the testing samples relative to each state were calculated using a Parzen window, and the FCM membership iteration was redefined accordingly. Two typical gearbox faults were simulated on a gearbox test bed to acquire lubricant samples. The concentrations of the representative elements Fe, Si and B were used as three-dimensional feature vectors and analyzed with the FCM and PSFCM clustering methods. The correct classification rate was 48.9% for FCM and 97.4% for PSFCM, owing to the integration of supervised information. The experimental results also indicate that introducing PSFCM into oil atomic spectrometric analysis reduces the dependence on expert experience and on large amounts of fault data, which is of great help in improving the diagnosis rate for wear faults. PMID:20939333
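The Parzen-window step can be sketched as follows: estimate the density of a test sample under each labeled state with a Gaussian kernel, then normalize across states to obtain memberships. The Gaussian kernel, the bandwidth `h` and the example element concentrations are assumptions, and the way PSFCM feeds these memberships into the FCM iteration is not reproduced.

```python
import math


def parzen_memberships(x, labeled, h):
    """Membership of sample x in each state, from Gaussian Parzen-window
    density estimates built on labeled training samples.
    `labeled` maps a state name to a list of feature tuples."""
    dim = len(x)
    dens = {}
    for state, samples in labeled.items():
        total = 0.0
        for s in samples:
            d2 = sum((a - b) ** 2 for a, b in zip(x, s))
            total += math.exp(-d2 / (2.0 * h * h))
        # Average kernel value, including the Gaussian normalizing constant.
        dens[state] = total / (len(samples) * (math.sqrt(2.0 * math.pi) * h) ** dim)
    z = sum(dens.values())
    if z == 0.0:
        return {state: 1.0 / len(dens) for state in dens}
    return {state: d / z for state, d in dens.items()}
```

With three-dimensional (Fe, Si, B) feature tuples, a test sample close to the "normal" training samples receives most of its membership in that state.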

Xu, Chao; Zhang, Pei-lin; Ren, Guo-quan; Wu, Ding-hai

2010-08-01

14

A Gaussian kernel-based fuzzy c-means algorithm with a spatial bias correction

The bias-corrected fuzzy c-means (BCFCM) algorithm with spatial information is especially effective in image segmentation. Because it is computationally time-consuming and lacks robustness to noise and outliers, kernel versions of FCM with spatial constraints, such as KFCM_S1 and KFCM_S2, were proposed to address these drawbacks of BCFCM. However, KFCM_S1 and KFCM_S2 are heavily affected by their parameters. ...
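The core idea shared by Gaussian-kernel FCM variants is to replace the Euclidean distance with a distance in the kernel-induced feature space. The sketch below shows that distance and the resulting FCM-style memberships. It omits the spatial terms of KFCM_S1/KFCM_S2 and the paper's bias correction, and the bandwidth `sigma` is an assumed parameter.

```python
import math


def gaussian_kernel(x, v, sigma):
    d2 = sum((a - b) ** 2 for a, b in zip(x, v))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def kernel_distance_sq(x, v, sigma):
    """Squared feature-space distance for a Gaussian kernel:
    ||phi(x) - phi(v)||^2 = K(x,x) + K(v,v) - 2K(x,v) = 2(1 - K(x,v))."""
    return 2.0 * (1.0 - gaussian_kernel(x, v, sigma))


def kfcm_memberships(x, centers, sigma, m=2.0):
    """FCM membership formula with the kernel distance in place of the
    squared Euclidean distance: u_k = 1 / sum_j (D_k / D_j)^(1 / (m - 1))."""
    d = [max(kernel_distance_sq(x, v, sigma), 1e-12) for v in centers]
    p = 1.0 / (m - 1.0)
    return [1.0 / sum((d[k] / d[j]) ** p for j in range(len(d))) for k in range(len(d))]
```

Because the kernel distance is bounded above by 2, a far-away outlier cannot dominate the objective the way it can with Euclidean distance. The price is a strong dependence on `sigma`, which matches the parameter sensitivity the abstract points out.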

Miin-shen Yang; Hsu-shen Tsai

2008-01-01

15

T1- and T2-weighted spatially constrained fuzzy c-means clustering for brain MRI segmentation

NASA Astrophysics Data System (ADS)

The segmentation of brain tissue in magnetic resonance imaging (MRI) plays an important role in clinical analysis and is useful for many applications, including studying brain diseases, surgical planning and computer-assisted diagnosis. In general, accurate tissue segmentation is a difficult task, not only because of the complicated structure of the brain and the anatomical variability between subjects, but also because of noise and low tissue contrast in MRI images, especially in neonatal brain images. Fuzzy clustering techniques have been widely used in automated image segmentation. However, because the standard fuzzy c-means (FCM) clustering algorithm does not consider any spatial information, it is highly sensitive to noise. In this paper, we present an extension of the FCM algorithm that overcomes this drawback by combining information from both T1-weighted (T1-w) and T2-weighted (T2-w) MRI scans and by incorporating spatial information. This new spatially constrained FCM (SCFCM) clustering algorithm preserves the homogeneity of regions better than existing FCM techniques, which often have difficulties when tissues have overlapping intensity profiles. The performance of the proposed algorithm is tested on simulated and real adult MR brain images with different noise levels, as well as on neonatal MR brain images at a gestational age of 39 weeks. Quantitative and qualitative segmentation results show that the proposed method is effective and more robust to noise than other FCM-based methods. SCFCM also appears to be a very promising tool for the segmentation of complex and noisy neonatal brain images.
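One common way to add spatial information to FCM is to re-weight each pixel's memberships by those of its neighbours. The sketch below uses the well-known spatial function of Chuang et al. (2006) on a single-channel 2D membership map. It is a swapped-in illustration of the general idea, not the SCFCM formulation of this paper, which also combines T1-w and T2-w channels. The exponents `p` and `q` are assumed parameters.

```python
def spatial_smooth_memberships(u, p=1.0, q=1.0):
    """Spatial FCM membership re-weighting (spatial function of Chuang et al.):
    h_k = sum of cluster-k memberships over the 3x3 neighbourhood (pixel
    included), then u'_k is proportional to u_k^p * h_k^q, renormalized per pixel.
    u[r][c] is the list of cluster memberships of the pixel at row r, column c."""
    rows, cols, n_clusters = len(u), len(u[0]), len(u[0][0])
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            h = [0.0] * n_clusters
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        for k in range(n_clusters):
                            h[k] += u[rr][cc][k]
            w = [(u[r][c][k] ** p) * (h[k] ** q) for k in range(n_clusters)]
            z = sum(w)
            out_row.append([wk / z for wk in w])
        out.append(out_row)
    return out
```

A noisy pixel whose raw memberships disagree with all of its neighbours is pulled toward the neighbourhood's majority cluster. This is how spatial constraints reduce FCM's sensitivity to noise.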

Despotović, Ivana; Goossens, Bart; Vansteenkiste, Ewout; Philips, Wilfried

2010-03-01

16

Carotid artery image segmentation using modified spatial fuzzy c-means and ensemble clustering.

Disease diagnosis based on ultrasound imaging is popular because of its non-invasive nature. However, ultrasound imaging systems produce low-quality images because of speckle noise and wave interference. Because of this shortcoming, experts need considerable effort to diagnose a disease from carotid artery ultrasound images. Image segmentation is one technique that can help diagnose disease efficiently from carotid artery ultrasound images. Most of the pixels in an image are highly correlated, so considering the spatial information of surrounding pixels during segmentation may further improve the results. When data are highly correlated, one pixel may belong to more than one cluster with different degrees of membership. In this paper, we present an image segmentation technique, namely improved spatial fuzzy c-means, and an ensemble clustering approach for carotid artery ultrasound images to identify the presence of plaque. Spatial, wavelet and gray level co-occurrence matrix (GLCM) features are extracted from the carotid artery ultrasound images. Redundant and less important features are removed from the feature set using a genetic search process. Finally, segmentation is performed on the optimal, reduced feature set. Ensemble clustering with the reduced feature set outperforms in terms of both segmentation time and clustering accuracy. Intima-media thickness (IMT) is measured from the images segmented by the proposed approach. Based on the measured IMT values, a Multi-Layer Back-Propagation Neural Network (MLBPNN) is used to classify the images as normal or abnormal. Experimental results show the learning capability of the MLBPNN classifier and validate the effectiveness of the proposed technique. The proposed approach to segmenting and classifying carotid artery ultrasound images appears to be very useful for detecting plaque in the carotid artery. PMID:22981822

Hassan, Mehdi; Chaudhry, Asmatullah; Khan, Asifullah; Kim, Jin Young

2012-12-01

17

Text categorization using the semi-supervised fuzzy c-means algorithm

Text categorization (TC) is the automated assignment of text documents to predefined categories based on document contents. TC has become very important in the information retrieval area, where information needs have tremendously increased with the rapid growth of textual information sources such as the Internet. We compare, for text categorization, two partially supervised (or semi-supervised) clustering algorithms: the Semi-Supervised Agglomerative ...

Mohammed Benkhalifa; Amine Bensaid; Abdelhak Mouradi

1999-01-01

18

Fuzzy C-Means Clustering of Signal Functional Principal Components for Post-Processing Dynamic ...

... along with data for digital systems that are compatible with the Probabilistic Risk Assessment ... of accident scenarios generated in a dynamic safety and reliability analysis of a Nuclear Power Plant (NPP) ...

Paris-Sud XI, Université de

19

Survey of clustering algorithms

Data analysis plays an indispensable role for understanding various phenomena. Cluster analysis, primitive exploration with little or no prior knowledge, consists of research developed across a wide variety of communities. The diversity, on one hand, equips us with many tools. On the other hand, the profusion of options causes confusion. We survey clustering algorithms for data sets appearing in statistics,

Rui Xu; Donald Wunsch II

2005-01-01

20

Low-complexity fuzzy relational clustering algorithms for Web mining

This paper presents new algorithms, fuzzy c-medoids (FCMdd) and robust fuzzy c-medoids (RFCMdd), for fuzzy clustering of relational data. The objective functions are based on selecting c representative objects (medoids) from the data set in such a way that the total fuzzy dissimilarity within each cluster is minimized. A comparison of FCMdd with the well-known relational fuzzy c-means algorithm (RFCM) shows that ...
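One FCMdd iteration on a dissimilarity matrix can be sketched as follows: compute FCM-style memberships from the dissimilarities to the current medoids, then pick, for each cluster, the object that minimizes the total fuzzy dissimilarity. This is a sketch of the objective described above, with m = 2 assumed; the robust RFCMdd variant and the paper's low-complexity candidate search are not shown.

```python
def fcmdd_step(R, medoids, m=2.0):
    """One fuzzy c-medoids iteration on a dissimilarity matrix R
    (R[i][j] >= 0, R[i][i] == 0). Returns (memberships, new_medoids)."""
    n = len(R)
    p = 1.0 / (m - 1.0)
    u = []
    for j in range(n):
        d = [R[j][med] for med in medoids]
        if min(d) == 0.0:
            # Object coincides with a medoid: crisp membership in that cluster.
            k0 = d.index(0.0)
            u.append([1.0 if k == k0 else 0.0 for k in range(len(medoids))])
        else:
            inv = [(1.0 / dk) ** p for dk in d]
            s = sum(inv)
            u.append([v / s for v in inv])
    new_medoids = []
    for k in range(len(medoids)):
        # New medoid: the object minimizing sum_j u_jk^m * R[q][j].
        best = min(range(n), key=lambda q: sum((u[j][k] ** m) * R[q][j] for j in range(n)))
        new_medoids.append(best)
    return u, new_medoids
```

For objects at positions 0, 1, 2, 10, 11 and 12 on a line (dissimilarity = absolute difference) and initial medoids 0 and 3, one step moves the medoids to objects 1 and 4, the centres of the two groups.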

Raghu Krishnapuram; Anupam Joshi; Olfa Nasraoui; Liyu Yi

2001-01-01

21

Clustering Based on Genetic Algorithms

Clustering is an important abstraction process, and it plays a vital role in both pattern recognition and data mining. Partitional algorithms are frequently used for clustering large data sets. The K-means algorithm is the most popular partitional clustering algorithm; its fuzzy, rough, probabilistic and neural network variants are also popular. However, a major problem with the K-means algorithm and its variants is ...

M. Narasimha Murty; Rashmin Babaria; Chiranjib Bhattacharyya

2008-01-01

22

A cluster algorithm for graphs

A cluster algorithm for graphs called the Markov Cluster algorithm (MCL algorithm) is introduced. The algorithm basically provides an interface to an algebraic process defined on stochastic matrices, called the MCL process. The graphs may be both weighted (with nonnegative weights) and directed. Let G be such a graph. The MCL algorithm simulates flow in G by first identifying G in a canonical way with ...
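The MCL process alternates two operations on a column-stochastic matrix: expansion (taking a matrix power) and inflation (an entrywise power followed by column re-normalization). A minimal dense-matrix sketch follows. Adding self-loops, the parameter values and the rule of reading clusters from the nonzero rows of the limit matrix are common practice but are assumptions here, not details taken from the abstract.

```python
def _matmul(A, B):
    n, k, m = len(A), len(B), len(B[0])
    return [[sum(A[i][t] * B[t][j] for t in range(k)) for j in range(m)] for i in range(n)]


def _normalize_columns(M):
    n, m = len(M), len(M[0])
    sums = [sum(M[i][j] for i in range(n)) for j in range(m)]
    return [[M[i][j] / sums[j] if sums[j] else 0.0 for j in range(m)] for i in range(n)]


def mcl(adjacency, expansion=2, inflation=2.0, max_iter=100, tol=1e-9):
    """Markov Cluster process on a nonnegative weighted adjacency matrix.
    M[i][j] is the probability of flowing from node j to node i."""
    n = len(adjacency)
    # Add self-loops and make the matrix column-stochastic.
    M = [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    M = _normalize_columns(M)
    for _ in range(max_iter):
        E = M
        for _ in range(expansion - 1):  # expansion: M^e
            E = _matmul(E, M)
        # Inflation: entrywise power, then column re-normalization.
        inflated = _normalize_columns([[v ** inflation for v in row] for row in E])
        delta = max(abs(inflated[i][j] - M[i][j]) for i in range(n) for j in range(n))
        M = inflated
        if delta < tol:
            break
    # Each nonzero row (an attractor) lists the members of one cluster.
    clusters = []
    for i in range(n):
        members = frozenset(j for j in range(n) if M[i][j] > 1e-6)
        if members and members not in clusters:
            clusters.append(members)
    return clusters
```

Raising `inflation` strengthens strong flows relative to weak ones and so tends to produce finer clusters. Expansion lets flow spread along longer paths.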

S. Van Dongen

2000-01-01

23

Background Using DNA microarrays, we have developed two novel models for tumor classification and target gene prediction. First, gene expression profiles are summarized by optimally selected Self-Organizing Maps (SOMs), followed by tumor sample classification by Fuzzy C-means clustering. Then, the prediction of marker genes is accomplished by either manual feature selection (visualizing the weighted/mean SOM component plane) or automatic feature selection (by pair-wise Fisher's linear discriminant). Results The proposed models were tested on four published datasets: (1) Leukemia (2) Colon cancer (3) Brain tumors and (4) NCI cancer cell lines. The models gave class prediction with markedly reduced error rates compared to other class prediction approaches, and the importance of feature selection on microarray data analysis was also emphasized. Conclusions Our models identify marker genes with predictive potential, often better than other available methods in the literature. The models are potentially useful for medical diagnostics and may reveal some insights into cancer classification. Additionally, we illustrated two limitations in tumor classification from microarray data related to the biology underlying the data, in terms of (1) the class size of data, and (2) the internal structure of classes. These limitations are not specific to the classification models used. PMID:14651757

Wang, Junbai; Bø, Trond Hellem; Jonassen, Inge; Myklebost, Ola; Hovig, Eivind

2003-01-01

24

Purpose: The amount of fibroglandular tissue content in the breast as estimated mammographically, commonly referred to as breast percent density (PD%), is one of the most significant risk factors for developing breast cancer. Approaches to quantify breast density commonly focus on either semiautomated methods or visual assessment, both of which are highly subjective. Furthermore, most studies published to date investigating computer-aided assessment of breast PD% have been performed using digitized screen-film mammograms, while digital mammography is increasingly replacing screen-film mammography in breast cancer screening protocols. Digital mammography imaging generates two types of images for analysis, raw (i.e., 'FOR PROCESSING') and vendor postprocessed (i.e., 'FOR PRESENTATION'), of which postprocessed images are commonly used in clinical practice. Development of an algorithm which effectively estimates breast PD% in both raw and postprocessed digital mammography images would be beneficial in terms of direct clinical application and retrospective analysis. Methods: This work proposes a new algorithm for fully automated quantification of breast PD% based on adaptive multiclass fuzzy c-means (FCM) clustering and support vector machine (SVM) classification, optimized for the imaging characteristics of both raw and processed digital mammography images as well as for individual patient and image characteristics. Our algorithm first delineates the breast region within the mammogram via an automated thresholding scheme to identify background air followed by a straight line Hough transform to extract the pectoral muscle region. The algorithm then applies adaptive FCM clustering based on an optimal number of clusters derived from image properties of the specific mammogram to subdivide the breast into regions of similar gray-level intensity. 
Finally, an SVM classifier is trained to identify which clusters within the breast tissue are likely to be fibroglandular; these are then aggregated into a final dense tissue segmentation that is used to compute breast PD%. Our method is validated on a group of 81 women for whom bilateral, mediolateral oblique, raw and processed screening digital mammograms were available, and agreement is assessed with both continuous and categorical density estimates made by a trained breast-imaging radiologist. Results: Strong association between algorithm-estimated and radiologist-provided breast PD% was detected for both raw (r = 0.82, p < 0.001) and processed (r = 0.85, p < 0.001) digital mammograms on a per-breast basis. Stronger agreement was found when overall breast density was assessed on a per-woman basis for both raw (r = 0.85, p < 0.001) and processed (r = 0.89, p < 0.001) mammograms. Strong agreement between categorical density estimates was also seen (weighted Cohen's κ ≥ 0.79). Repeated measures analysis of variance demonstrated no statistically significant differences between the PD% estimates (p > 0.1) due to either presentation of the image (raw vs processed) or method of PD% assessment (radiologist vs algorithm). Conclusions: The proposed fully automated algorithm was successful in estimating breast percent density from both raw and processed digital mammographic images. Accurate assessment of a woman's breast density is critical for the estimate to be incorporated into risk assessment models. These results show promise for the clinical application of the algorithm in quantifying breast density in a repeatable manner, both at the time of imaging and in retrospective studies.

Keller, Brad M.; Nathan, Diane L.; Wang, Yan; Zheng, Yuanjie; Gee, James C.; Conant, Emily F.; Kontos, Despina [Department of Radiology, University of Pennsylvania, Philadelphia, Pennsylvania 19104 (United States); Applied Mathematics and Computational Science, University of Pennsylvania, Philadelphia, Pennsylvania 19104 (United States)]

2012-08-15

25

NASA Astrophysics Data System (ADS)

Information on three-dimensional geometry, as well as the identification of active fault segments, is critical to our assessment of seismic risk. Numerical modeling of aftershock locations, times and magnitudes is also crucial to characterize a fault zone. In this study, a pattern recognition technique based on the fuzzy c-means clustering algorithm (Bezdek, 1981) is proposed that allows each earthquake to be associated with different fault segments. The spatial covariance tensor of each cluster and its associated earthquakes are used to find optimal anisotropic clusters and designate them as faults, similar to the OADC method (Ouillon et al., 2008). The location, size and orientation of the reconstructed fault segments are characterized using a fuzzy covariance matrix (Gustafson and Kessel, 1978). The output consists of a set of distinct fault segments along with the associated earthquakes at different fuzzy membership grades (Zadeh, 1965). The resulting matrix contains the fuzzy membership grades of the earthquakes for the corresponding fault segments, specifying their degree of association with values from zero to one. The spatial distribution of earthquakes of different magnitudes and membership grades for a fault segment is incorporated into an anisotropic spatial kernel that characterizes the aftershock density at a distance vector in the ETAS model (Kagan and Knopoff, 1987; Ogata, 1988). An optimal spatio-temporal distribution of aftershocks is obtained for each fault segment without assuming a priori distributions such as Gaussian or power-law (Helmstetter et al., 2006; Helmstetter and Sornette, 2002). The model is tested on the aftershock sequence of the 2002 Denali earthquake in Alaska, and the fault reconstruction results are compared with the known faults in the area.
In this work, a new method is thus formulated to incorporate the anisotropic nature of aftershock diffusion along with the reconstruction of fault networks from seismicity catalogs.
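The fuzzy covariance matrix used to characterize each fault segment weights every event's deviation from the cluster centre by its membership raised to the power m. The sketch below computes it and, for the 2D case, the orientation of the major axis. The choice m = 2 is an assumption, and the full Gustafson-Kessel iteration, which builds a cluster-specific distance norm from this matrix, is not shown.

```python
import math


def fuzzy_covariance(points, memberships, center, m=2.0):
    """Fuzzy covariance matrix of one cluster:
    F = sum_i u_i^m (x_i - v)(x_i - v)^T / sum_i u_i^m."""
    dim = len(center)
    w = [u ** m for u in memberships]
    total = sum(w)
    F = [[0.0] * dim for _ in range(dim)]
    for wi, x in zip(w, points):
        diff = [x[d] - center[d] for d in range(dim)]
        for a in range(dim):
            for b in range(dim):
                F[a][b] += wi * diff[a] * diff[b]
    return [[F[a][b] / total for b in range(dim)] for a in range(dim)]


def principal_axis_angle_2d(F):
    """Orientation (radians, from the x axis) of the major axis of a 2x2 covariance matrix."""
    return 0.5 * math.atan2(2.0 * F[0][1], F[0][0] - F[1][1])
```

For epicentres lying along the line y = x with full membership, the major axis comes out at 45 degrees (pi/4). In 3D, the same matrix's eigenvectors would give the strike and dip directions of a planar segment.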

Moulik, P.; Tiampo, K. F.

2009-05-01

26

... Analysis (HCA) and Fuzzy C-Means (FCM) clustering, are used to classify sets of oral cancer cell data without a pre-processing procedure. The performance of these two techniques ... for Cancer Diagnosis. Xiao Ying Wang, Jon Garibaldi, Turhan Ozen, Department of Computer Science

Garibaldi, Jon

27

DAU StatRefresher: Clustering Algorithms

NSDL National Science Digital Library

This interactive module helps students understand the definition and uses of clustering algorithms. Students will learn to categorize the types of clustering algorithms, to use the minimum spanning tree and k-means clustering algorithms, and to solve exercise problems using clustering algorithms. Each component has a detailed explanation along with quiz questions. A series of questions at the end tests the students' understanding of the whole lesson.

2009-01-22
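The minimum-spanning-tree approach to clustering can be sketched as follows. Edges are added shortest-first (Kruskal's algorithm), and the process stops when k connected components remain. This is equivalent to building the MST and cutting its k - 1 longest edges, that is, single-link clustering. This is an illustrative sketch; the function name `mst_clusters` is hypothetical and not taken from the module.

```python
import math
from itertools import combinations


def mst_clusters(points, k):
    """Single-link clustering: run Kruskal's algorithm until k components remain."""
    parent = list(range(len(points)))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # All pairwise edges, shortest first.
    edges = sorted(combinations(range(len(points)), 2),
                   key=lambda e: math.dist(points[e[0]], points[e[1]]))
    components = len(points)
    for a, b in edges:
        if components == k:
            break
        ra, rb = find(a), find(b)
        if ra != rb:  # edge joins two different trees: keep it
            parent[ra] = rb
            components -= 1
    # Relabel component roots as 0..k-1 in order of first appearance.
    seen, labels = {}, []
    for i in range(len(points)):
        labels.append(seen.setdefault(find(i), len(seen)))
    return labels
```

For example, three points near the origin, two near (10, 10) and one isolated point at (20, 0) form three clusters when `k = 3`.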

28

On Spectral Clustering: Analysis and an algorithm

Despite many empirical successes of spectral clustering methods (algorithms that cluster points using eigenvectors of matrices derived from the distances between the points), there are several unresolved issues. First, there is a wide variety of algorithms that use the eigenvectors in slightly different ways. Second, many of these algorithms have no proof that they will actually compute a reasonable clustering. In this paper, we present a simple

Andrew Y. Ng; Michael I. Jordan; Yair Weiss

2001-01-01
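The simple algorithm generally associated with this paper (the Ng-Jordan-Weiss procedure) can be sketched with NumPy in five steps:

1. Build a Gaussian affinity matrix.
2. Normalise it symmetrically as D^(-1/2) A D^(-1/2).
3. Take the k largest eigenvectors.
4. Renormalise each row to unit length.
5. Run k-means on the rows.

The kernel width `sigma`, the farthest-point initialisation and the plain k-means loop are illustrative assumptions. The sketch also assumes that no point is isolated, so every row of the affinity matrix has a non-zero sum.

```python
import numpy as np


def njw_spectral_clustering(X, k, sigma=1.0, iters=100, seed=0):
    """Sketch of Ng-Jordan-Weiss spectral clustering; returns an array of labels."""
    X = np.asarray(X, dtype=float)
    # 1. Gaussian affinities with a zero diagonal.
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    A = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)
    # 2. Normalised affinity L = D^-1/2 A D^-1/2, i.e. A_ij / sqrt(d_i d_j).
    d = A.sum(axis=1)
    L = A / np.sqrt(np.outer(d, d))
    # 3. eigh returns eigenvalues in ascending order, so keep the last k columns.
    _, vecs = np.linalg.eigh(L)
    Y = vecs[:, -k:]
    # 4. Renormalise each row to unit length.
    Y = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    # 5. k-means on the rows of Y, seeded by farthest-point selection.
    rng = np.random.default_rng(seed)
    centers = [Y[rng.integers(len(Y))]]
    for _ in range(1, k):
        dist = np.min([np.sum((Y - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(iters):
        labels = np.argmin(((Y[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
        new = np.array([Y[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels
```

The row renormalisation in step 4 makes the result insensitive to how the eigensolver rotates eigenvectors that share (nearly) equal eigenvalues. As a result, points from the same well-separated group land close together on the unit sphere.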

29

HARP: A Practical Projected Clustering Algorithm

In high-dimensional data, clusters can exist in subspaces that hide themselves from traditional clustering methods. A number of algorithms have been proposed to identify such projected clusters, but most of them rely on some user parameters to guide the clustering process. The clustering accuracy can be seriously degraded if incorrect values are used. Unfortunately, in real situations, it is rarely

Kevin Y. Yip; David W. Cheung; Michael K. Ng

2004-01-01

30

Cluster Detection with the PYRAMID Algorithm

As databases continue to grow in size, efficient and effective clustering algorithms play a paramount role in data mining applications. Practical clustering faces several challenges including: identifying clusters of arbitrary shapes, sensitivity to the order of input, dynamic determination of the number of clusters, outlier handling, processing speed of massive data sets, handling higher dimensions, and dependence on user-supplied parameters.

Samir Tout; William Sverdlik; Junping Sun

2007-01-01

31

Semi-supervised kernel-based fuzzy C-means with pairwise constraints

Clustering with constraints is an active area in machine learning and data mining. In this paper, a semi-supervised kernel-based fuzzy C-means algorithm called PCKFCM is proposed which incorporates both semi-supervised learning technique and the kernel method into traditional fuzzy clustering algorithm. The clustering is achieved by minimizing a carefully designed objective function. A kernel-based fuzzy term defined by the violation

Na Wang; Xia Li; Xuehui Luo

2008-01-01

32

Clustering using firefly algorithm: Performance study

The Firefly Algorithm (FA) is a recent nature-inspired optimization algorithm that simulates the flash pattern and characteristics of fireflies. Clustering is a popular data analysis technique to identify homogeneous groups of objects based on the values of their attributes. In this paper, the FA is used for clustering on benchmark problems and the performance of the FA is compared

J. Senthilnath; S. N. Omkar; V. Mani

2011-01-01

33

Colliery roof collapse is one type of mine disaster. Its influencing factors are varied, non-linear and uncertain, which forces traditional neural-network prediction to process a large amount of convoluted data. This paper obtains cluster centers and membership degrees by combining the ant colony algorithm with the fuzzy C-means clustering algorithm, and establishes the dynamic system distinguish

Xiaoyue Liu; Jiping Sun; Sumin Feng

2006-01-01

34

Sparse Subspace Clustering: Algorithm, Theory, and Applications

Ehsan Elhamifar, Student Member … of high-dimensional data, such as images, videos, text, and web documents, DNA microarray data, and more … sparse subspace clustering, to cluster data points that lie in a union of low-dimensional subspaces

Vidal, René

35

Genetic Algorithm-Based Text Clustering Technique

A modified variable string length genetic algorithm, called MVGA, is proposed for text clustering in this paper. Our algorithm has been exploited for automatically evolving the optimal number of clusters as well as providing proper data set clustering. The chromosome is encoded by special indices to indicate the location of each gene. More effective versions of the evolutionary steps can automatically

Wei Song; Soon Cheol Park

2006-01-01

36

This paper proposes two edge detection methods for medical images by integrating the advantages of Gabor wavelet transform (GWT) and unsupervised clustering algorithms. The GWT is used to enhance the edge information in an image while suppressing noise. Following this, the k-means and Fuzzy c-means (FCM) clustering algorithms are used to convert a gray-level image into a binary image. The proposed methods are tested using medical images obtained through Computed Tomography (CT) and Magnetic Resonance Imaging (MRI) devices, and a phantom image. The results show that the proposed methods are successful for edge detection, even in noisy cases. PMID:24790590

Ergen, Burhan

2014-01-01
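The clustering step that turns a gray-level image into a binary image can be sketched as two-cluster k-means on pixel intensities. This is a minimal sketch under our own assumptions:

- The image is a list of rows of numbers.
- The Gabor wavelet enhancement is omitted.
- The two centres start at the darkest and brightest values.
- `kmeans_binarize` is a hypothetical name.

A further step (not shown) would extract edges from the resulting binary image.

```python
def kmeans_binarize(image, iters=50):
    """Binarise a grayscale image (list of rows) with 1-D two-cluster k-means."""
    pixels = [p for row in image for p in row]
    lo, hi = float(min(pixels)), float(max(pixels))  # initial dark/bright centres
    for _ in range(iters):
        # Assignment step: each pixel joins the nearer centre (ties go to dark).
        dark = [p for p in pixels if abs(p - lo) <= abs(p - hi)]
        bright = [p for p in pixels if abs(p - lo) > abs(p - hi)]
        # Update step: move each centre to the mean of its pixels.
        new_lo = sum(dark) / len(dark) if dark else lo
        new_hi = sum(bright) / len(bright) if bright else hi
        if (new_lo, new_hi) == (lo, hi):
            break
        lo, hi = new_lo, new_hi
    # 1 for pixels in the bright cluster, 0 for the dark cluster.
    return [[1 if abs(p - lo) > abs(p - hi) else 0 for p in row] for row in image]
```

Unlike a fixed threshold, the split point (midway between the two centres) adapts to the intensity distribution of each image.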

37

Multisolutional clustering and quantization algorithm (MCQ).

We have developed a novel clustering and quantization algorithm that allows the user to create multiple one-to-one correspondences between the actual data and its transformed (clustered and quantized) values, based on the user's hypothesis regarding the nature of the classification task. The types of problems for which the algorithm can be beneficial are discussed. We report experiments employing simulated and real data that suggest the proposed algorithm may be useful in neural network analysis of various phenomena in medicine and biology. PMID:8889341

Dvorchik, I; Marsh, W; Gurari, V; Subotin, M; Doyle, H R

1996-09-01

38

An algorithm for spatial hierarchy clustering

NASA Technical Reports Server (NTRS)

A method for utilizing both spectral and spatial redundancy in compacting and preclassifying images is presented. In multispectral satellite images, a high correlation exists between neighboring image points which tend to occupy dense and restricted regions of the feature space. The image is divided into windows of the same size where the clustering is made. The classes obtained in several neighboring windows are clustered, and then again successively clustered until only one region corresponding to the whole image is obtained. By employing this algorithm only a few points are considered in each clustering, thus reducing computational effort. The method is illustrated as applied to LANDSAT images.

Dejesusparada, N. (principal investigator); Velasco, F. R. D.

1981-01-01

39

Performance Comparison Of Evolutionary Algorithms For Image Clustering

NASA Astrophysics Data System (ADS)

Evolutionary computation tools are able to process real-valued numerical sets in order to extract suboptimal solutions of a designed problem. Data clustering algorithms have been intensively used for image segmentation in remote sensing applications. Despite the wide usage of evolutionary algorithms for data clustering, their clustering performance has scarcely been studied using clustering validation indexes. In this paper, recently proposed evolutionary algorithms (i.e., Artificial Bee Colony Algorithm (ABC), Gravitational Search Algorithm (GSA), Cuckoo Search Algorithm (CS), Adaptive Differential Evolution Algorithm (JADE), Differential Search Algorithm (DSA) and Backtracking Search Optimization Algorithm (BSA)) and some classical image clustering techniques (i.e., k-means, FCM, SOM networks) have been used to cluster images, and their performances have been compared using four clustering validation indexes. Experimental results show that evolutionary algorithms give more reliable cluster centers than classical clustering techniques, but their convergence time is quite long.

Civicioglu, P.; Atasever, U. H.; Ozkan, C.; Besdok, E.; Karkinli, A. E.; Kesikoglu, A.

2014-09-01

40

Cluster states, algorithms and graphs

The present paper is concerned with the concept of the one-way quantum computer, beyond binary systems, and its relation to the concept of stabilizer quantum codes. This relation is exploited to analyze a particular class of quantum algorithms, called graph algorithms, which correspond in the binary case to the Clifford group part of a network and which can efficiently be implemented on a one-way quantum computer. These algorithms can be "completely solved" in the sense that the manipulation of quantum states in each step can be computed explicitly. Graph algorithms are precisely those which implement encoding schemes for graph codes. Starting from a given initial graph, which represents the underlying resource of multipartite entanglement, each step of the algorithm is related to an explicit transformation on the graph.

Dirk Schlingemann

2003-05-28

41

CURE: An Efficient Clustering Algorithm for Large Databases

Clustering, in data mining, is useful for discovering groups and identifying interesting distributions in the underlying data. Traditional clustering algorithms either favor clusters with spherical shapes and similar sizes, or are very fragile in the presence of outliers. We propose a new clustering algorithm called CURE that is more robust to outliers, and identifies clusters having non-spherical shapes

Sudipto Guha; Rajeev Rastogi; Kyuseok Shim

1998-01-01
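CURE's central idea is to represent each cluster by a few well-scattered points shrunk toward the centroid, and it can be sketched as below. Shrinking damps the pull of outliers on the cluster boundary, while keeping several representatives lets a cluster follow a non-spherical shape.

Several details are illustrative assumptions:

- the farthest-point selection of representatives;
- the shrink factor `alpha`;
- the names `cure_representatives` and `cure_distance`.

The random sampling, partitioning and heap-based merging of the full algorithm are omitted.

```python
import math


def cure_representatives(cluster, c=4, alpha=0.3):
    """Pick up to c well-scattered points of a cluster and shrink them toward its centroid."""
    dim = len(cluster[0])
    centroid = tuple(sum(p[d] for p in cluster) / len(cluster) for d in range(dim))
    reps = []
    for _ in range(min(c, len(cluster))):
        # First pick: farthest from the centroid; later picks: farthest from chosen reps.
        ref = reps if reps else [centroid]
        reps.append(max(cluster, key=lambda p: min(math.dist(p, q) for q in ref)))
    # Move each representative a fraction alpha of the way to the centroid.
    return [tuple(p[d] + alpha * (centroid[d] - p[d]) for d in range(dim)) for p in reps]


def cure_distance(reps_a, reps_b):
    """Inter-cluster distance: the closest pair of representatives (merge the smallest)."""
    return min(math.dist(p, q) for p in reps_a for q in reps_b)
```

With `alpha = 0` the scheme behaves like single-link on the representatives, while `alpha = 1` collapses every cluster to its centroid. Intermediate values trade off between the two.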

42

Classification of posture maintenance data with fuzzy clustering algorithms

NASA Technical Reports Server (NTRS)

Sensory inputs from the visual, vestibular, and proprioceptive systems are integrated by the central nervous system to maintain postural equilibrium. Sustained exposure to microgravity causes neurosensory adaptation during spaceflight, which results in decreased postural stability until readaptation occurs upon return to the terrestrial environment. Data which simulate sensory inputs under various conditions were collected in conjunction with JSC postural control studies using a Tilt-Translation Device (TTD). The University of West Florida proposed applying the Fuzzy C-Means Clustering (FCM) Algorithms to this data with a view towards identifying various states and stages. Data supplied by NASA/JSC were submitted to the FCM algorithms in an attempt to identify and characterize cluster substructure in a mixed ensemble of pre- and post-adaptational TTD data. Following several unsuccessful trials with FCM using the full 11-dimensional data set, a set of two channels (features) was found to enable FCM to separate pre- from post-adaptational TTD data. The main conclusions are that: (1) FCM seems able to separate pre- from post-TTD data for subject no. 2 on the one trial that was used, but only in certain subintervals of time; and (2) channels 2 (right rear transducer force) and 8 (hip sway bar) contain better discrimination information than the other supersets and combinations of the data tried so far.

Bezdek, James C.

1991-01-01

43

The Georgi Algorithms of Jet Clustering

We reveal the direct link between the jet clustering algorithms recently proposed by Howard Georgi and parton shower kinematics, providing sound support from the theoretical side. The kinematics of this class of elegant algorithms is explored systematically and the jet function is generalized to $J^{(n)}_\beta$ with a jet function index $n$. Based on three basic requirements (that the result of jet clustering is process-independent, that the inclusion cone is larger for softer subjets, and that the cone size cannot be too large in order to avoid mixing different jets), we derive constraints on the jet function index $n$ and the jet function parameter $\beta$, which are closely related to phase space boundaries. Finally, we demonstrate that the jet algorithm is boost invariant.

Shao-Feng Ge

2014-08-17

44

Beyond Affinity Propagation: Message Passing Algorithms for Clustering

Beyond Affinity Propagation: Message Passing Algorithms for Clustering, by Inmar-Ella Givoni. Doctor of Philosophy, Graduate Department of Computer Science, University of Toronto, 2012. Affinity propagation

Frey, Brendan J.

45

A practical clustering algorithm for static and dynamic information organization

We present and analyze the off-line star algorithm for clustering static information systems and the on-line star algorithm for clustering dynamic information systems. These algorithms organize a document collection into a number of clusters that is naturally induced by the collection via a computationally efficient cover by dense subgraphs. We further show a lower bound on the quality of the

Javed A. Aslam; Katya Pelekhov; Daniela Rus

1999-01-01
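The off-line star algorithm can be sketched as a greedy cover of the thresholded similarity graph G_σ by star-shaped subgraphs. The algorithm repeatedly takes the highest-degree vertex not yet covered as a star centre, with its neighbours as satellites. This sketch follows that description under assumptions of our own:

- similarities are given as a dense symmetric matrix;
- degree ties are broken by index;
- the name `star_clusters` is hypothetical.

The on-line variant for dynamic collections is not shown.

```python
def star_clusters(sim, sigma):
    """Off-line star clustering sketch; returns a list of (centre, sorted members)."""
    n = len(sim)
    # Threshold graph G_sigma: an edge wherever similarity is at least sigma.
    adj = [{j for j in range(n) if j != i and sim[i][j] >= sigma} for i in range(n)]
    marked = [False] * n
    stars = []
    # Visit vertices from highest to lowest degree (ties broken by index).
    for v in sorted(range(n), key=lambda i: (-len(adj[i]), i)):
        if marked[v]:
            continue
        # v becomes a star centre; all of its neighbours are its satellites.
        star = {v} | adj[v]
        for u in star:
            marked[u] = True
        stars.append((v, sorted(star)))
    return stars
```

A satellite can belong to more than one star, so the resulting clusters may overlap. In the example below, document 2 ends up in both stars.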

46

Sparse subspace clustering: algorithm, theory, and applications.

Many real-world problems deal with collections of high-dimensional data, such as images, videos, text, and web documents, DNA microarray data, and more. Often, such high-dimensional data lie close to low-dimensional structures corresponding to several classes or categories to which the data belong. In this paper, we propose and study an algorithm, called sparse subspace clustering, to cluster data points that lie in a union of low-dimensional subspaces. The key idea is that, among the infinitely many possible representations of a data point in terms of other points, a sparse representation corresponds to selecting a few points from the same subspace. This motivates solving a sparse optimization program whose solution is used in a spectral clustering framework to infer the clustering of the data into subspaces. Since solving the sparse optimization program is in general NP-hard, we consider a convex relaxation and show that, under appropriate conditions on the arrangement of the subspaces and the distribution of the data, the proposed minimization program succeeds in recovering the desired sparse representations. The proposed algorithm is efficient and can handle data points near the intersections of subspaces. Another key advantage of the proposed algorithm with respect to the state of the art is that it can deal directly with data nuisances, such as noise, sparse outlying entries, and missing entries, by incorporating the model of the data into the sparse optimization program. We demonstrate the effectiveness of the proposed algorithm through experiments on synthetic data as well as the two real-world problems of motion segmentation and face clustering. PMID:24051734

Elhamifar, Ehsan; Vidal, René

2013-11-01

47

A Cross Unequal Clustering Routing Algorithm for Sensor Network

NASA Astrophysics Data System (ADS)

In clustering routing algorithms for wireless sensor networks, the cluster size is generally fixed, which can easily lead to the "hot spot" problem. Furthermore, the majority of routing algorithms barely consider the problem of long-distance communication between adjacent cluster heads, which brings high energy consumption. Therefore, this paper proposes a new cross unequal clustering routing algorithm based on the EEUC algorithm. To address the shortcomings of EEUC, this algorithm takes both the node's position and its remaining energy into account when calculating the competition radius, making the load of cluster heads more balanced. At the same time, cluster-adjacent nodes are used to transport data, reducing the energy loss of cluster heads. Simulation experiments show that, compared with LEACH and EEUC, the proposed algorithm can effectively reduce the energy loss of cluster heads, balance energy consumption among all nodes in the network, and improve the network lifetime.

Tong, Wang; Jiyi, Wu; He, Xu; Jinghua, Zhu; Munyabugingo, Charles

2013-08-01

48

CLASSY: An adaptive maximum likelihood clustering algorithm

NASA Technical Reports Server (NTRS)

The CLASSY clustering method alternates maximum likelihood iterative techniques for estimating the parameters of a mixture distribution with an adaptive procedure for splitting, combining, and eliminating the resultant components of the mixture. The adaptive procedure is based on maximizing the fit of a mixture of multivariate normal distributions to the observed data using its first through fourth central moments. It generates estimates of the number of multivariate normal components in the mixture as well as the proportion, mean vector, and covariance matrix for each component. The basic mathematical model for CLASSY and the actual operation of the algorithm as currently implemented are described. Results of applying CLASSY to real and simulated LANDSAT data are presented and compared with those generated by the iterative self-organizing clustering system algorithm on the same data sets.

Lennington, R. K.; Rassbach, M. E. (principal investigators)

1979-01-01
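CLASSY's inner loop is maximum-likelihood estimation of a normal mixture. As a minimal one-dimensional illustration of such an iteration, the sketch below uses the EM algorithm. We assume EM as a representative maximum-likelihood technique, since the abstract does not name one. CLASSY's adaptive splitting, combining and elimination of components, based on the first four central moments, are not shown. `em_gaussian_mixture_1d` is a hypothetical name, and the sketch assumes that every component keeps some responsibility mass throughout.

```python
import math


def em_gaussian_mixture_1d(xs, means, iters=100):
    """EM for a 1-D normal mixture; returns (weights, means, variances)."""
    k = len(means)
    means = list(means)
    weights = [1.0 / k] * k
    mu = sum(xs) / len(xs)
    # Start every component with the overall sample variance.
    variances = [sum((x - mu) ** 2 for x in xs) / len(xs)] * k
    for _ in range(iters):
        # E-step: posterior probability (responsibility) of each component for each x.
        resp = []
        for x in xs:
            p = [w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
                 for w, m, v in zip(weights, means, variances)]
            s = sum(p)
            resp.append([pj / s for pj in p])
        # M-step: re-estimate weights, means and variances from the responsibilities.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(xs)
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var, 1e-6)  # floor guards against a collapsing component
    return weights, means, variances
```

Each EM iteration does not decrease the mixture likelihood. CLASSY's extra adaptive step addresses what EM alone cannot: choosing how many components the mixture should have.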

49

The Georgi Algorithms of Jet Clustering

We reveal the direct link between the jet clustering algorithms recently proposed by Howard Georgi and parton shower kinematics, providing sound support from the theoretical side. The kinematics of this class of elegant algorithms is explored systematically and the jet function is generalized to $J^{(n)}_\beta$ with a jet function index $n$. Based on three basic requirements (that the result of jet clustering is process-independent, that the inclusion cone is larger for softer subjets, and that the cone size cannot be too large in order to avoid mixing different jets), we derive constraints on the jet function index $n$ and the jet function parameter $\beta$, which are closely related to phase space boundaries.

Ge, Shao-Feng

2014-01-01

50

A Hash-based Hierarchical Algorithm for Massive Text Clustering

Text clustering is the process of segmenting a collection of texts into subgroups of texts with similar content. The purpose of text clustering is to meet human interests in information searching and understanding. This study proposes a new, fast hierarchical text clustering algorithm, HBSH (Hash-based Structure Hierarchical Clustering), which is suitable for massive text clustering. The algorithm uses a hash table instead of numerical vectors as its input data. Compared with other clustering algorithms, HBSH performs text clustering without requiring the number of cluster centers to be set in advance and has lower space complexity, which yields better performance. The experimental results show that the average running time of HBSH is shorter than that of traditional text clustering algorithms. Index terms: hierarchical, text clustering, hash table.

Yin Luo; Yan Fu

51

MSGKA: an efficient clustering algorithm for large databases

This investigation presents an efficient clustering algorithm for large databases. We present a novel multiple-searching genetic algorithm (MSGA) that finds a globally optimal partition of given data into a specified number of clusters. We hybridize MSGA with a clustering approach, namely the K-means algorithm, hence the name multiple-searching genetic K-means algorithm (MSGKA). Our simulation results reveal that

Cheng-Fa Tsai; Zhi-Cheng Chen; Chun-Wei Tsai

2002-01-01

52

A hybrid monkey search algorithm for clustering analysis.

Clustering is a popular data analysis and data mining technique. The k-means clustering algorithm is one of the most commonly used methods. However, it depends heavily on the initial solution and easily falls into a local optimum. In view of these disadvantages of the k-means method, this paper proposes a hybrid monkey algorithm, based on the search operator of the artificial bee colony algorithm, for clustering analysis. Experiments on synthetic and real-life datasets show that it performs better than the basic monkey algorithm for clustering analysis. PMID:24772039

Chen, Xin; Zhou, Yongquan; Luo, Qifang

2014-01-01

54

Parallelization of Edge Detection Algorithm using MPI on Beowulf Cluster

NASA Astrophysics Data System (ADS)

In this paper, we present the design of a parallel Sobel edge detection algorithm using Foster's methodology. The parallel algorithm is implemented using the MPI message-passing library and a master/slave scheme. Every processor performs the same sequential algorithm, but on a different part of the image. Experimental results obtained on a Beowulf cluster are presented to demonstrate the performance of the parallel algorithm.

Haron, Nazleeni; Amir, Ruzaini; Aziz, Izzatdin A.; Jung, Low Tan; Shukri, Siti Rohkmah
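The "same sequential algorithm on a different part of the image" decomposition can be sketched as follows. The image rows are split into strips, each padded with one halo row above and below (the neighbouring data a 3x3 Sobel mask needs). Each strip is handed to a worker, and the master stitches the results back together.

This sketch runs under our own assumptions:

- A thread pool stands in for MPI processes; no MPI library is used.
- Border pixels are set to 0.
- The image is at least 3x3.
- `sobel_strip` and `parallel_sobel` are hypothetical names.

```python
import math
from concurrent.futures import ThreadPoolExecutor

GX = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))   # horizontal-gradient Sobel mask
GY = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))   # vertical-gradient Sobel mask


def sobel_strip(strip):
    """Worker: Sobel magnitude for the interior rows of a strip with one halo row each side."""
    w = len(strip[0])
    out = []
    for r in range(1, len(strip) - 1):
        row = [0.0] * w  # left and right image borders stay 0
        for c in range(1, w - 1):
            gx = sum(GX[i][j] * strip[r - 1 + i][c - 1 + j] for i in range(3) for j in range(3))
            gy = sum(GY[i][j] * strip[r - 1 + i][c - 1 + j] for i in range(3) for j in range(3))
            row[c] = math.hypot(gx, gy)
        out.append(row)
    return out


def parallel_sobel(img, workers=4):
    """Master: split interior rows into strips (plus halo rows), farm them out, stitch results."""
    h, w = len(img), len(img[0])
    size = max(1, -(-(h - 2) // workers))  # ceiling division of the interior rows
    bounds = [(a, min(a + size, h - 1)) for a in range(1, h - 1, size)]
    strips = [img[a - 1:b + 1] for a, b in bounds]  # rows a..b-1 plus one halo row each side
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(sobel_strip, strips))  # results come back in strip order
    return [[0.0] * w] + [row for part in parts for row in part] + [[0.0] * w]
```

Because each worker receives its halo rows up front, the stitched result does not depend on the number of workers. In an MPI version, exchanging those halo rows would be the only communication between neighbouring processes.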

55

Page Clustering Using a Distance-Based Algorithm

This paper presents an application of a clustering algorithm based on gravitational forces to the problem of Web page clustering in a dynamic environment. The proposed algorithm uses a modification of the gravitational algorithm proposed by Gomez et al. but using only the distance measures (a notion of space is not required). This approach is useful when similarities (and/or then

Jairo Andrés Mojica; Diego Alexander Rojas; Jonatan Gómez; Fabio A. González

2005-01-01

56

A New Scan-Line Algorithm Using Clustering Approach

Correct recognition of lines is essential for understanding technical drawings. Automating this is quite difficult due to the limitations of machine vision algorithms. To promote the development of better technology, this paper presents a new fast, high-quality line clustering algorithm based on the fast, high-quality particle swarm optimization (PSO) clustering algorithm, consisting of one

Xiaoguang Tian; Yuke Ma; Xiaorong Hou

2009-01-01

57

Crowding clustering genetic algorithm for multimodal function optimization

Interest in multimodal function optimization is expanding rapidly, since real-world optimization problems often require the location of multiple optima in a search space. In this paper, we propose a novel genetic algorithm that combines crowding and clustering for multimodal function optimization, and analyze the convergence properties of the algorithm. The crowding clustering genetic algorithm employs a standard crowding strategy to form multiple niches

Qing Ling; Gang Wu; Zaiyue Yang; Qiuping Wang

2008-01-01

58

Unsupervised Clustering by Means of Hierarchical Differential Evolution Algorithm

In solving the hard clustering problem, the number of clusters in general is unknown for most real-world applications. Therefore, clustering becomes a trial-and-error task and the clustering result is often not very promising especially when the number of clusters is difficult to guess. In this paper, we propose an unsupervised clustering approach which utilizes a hierarchical differential evolution algorithm. The

Chih-chin Lai; Pei-fen Lee; Pei-yun Hsieh

2008-01-01

59

Parallelizing the Fuzzy ARTMAP Algorithm on a Beowulf Cluster

Jimmy Secretan, José Castro … with the match-tracking mechanism. Results run on a Beowulf cluster with a well-known large database (Forrest … of parallelization. This paper focuses on parallelization strategies for FAM on a Beowulf cluster. A Beowulf cluster

60

A novel speaker clustering algorithm via supervised affinity propagation

This paper addresses the problem of speaker clustering in telephone conversations. Recently, a new clustering algorithm named affinity propagation (AP) has been proposed. It exhibits fast execution speed and finds clusters with low error. However, AP is an unsupervised approach, so the resulting number of clusters may differ from the actual one. This deteriorates speaker purity dramatically. This paper

Xiang Zhang; Jie Gao; Ping Lu; Yonghong Yan

2008-01-01
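Why AP's cluster count can differ from the true number of speakers is easiest to see from the algorithm itself, sketched below in its plain, unsupervised form (Frey and Dueck, 2007). The supervised variant proposed in this paper is not shown. Points exchange "responsibility" and "availability" messages. The number of exemplars is not fixed in advance; it emerges from the preference values on the diagonal of the similarity matrix.

The following choices are assumptions:

- the damping factor;
- the fixed iteration count in place of a convergence test;
- assigning each point to its most similar exemplar;
- the name `affinity_propagation`.

```python
def affinity_propagation(S, damping=0.5, iters=200):
    """Plain affinity propagation on a similarity matrix S (list of lists, n >= 2).

    S[k][k] is the preference of point k for being an exemplar.
    Returns, for each point, the index of its exemplar.
    """
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iters):
        # r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')], damped.
        for i in range(n):
            vals = [A[i][k] + S[i][k] for k in range(n)]
            for k in range(n):
                best_other = max(vals[j] for j in range(n) if j != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - best_other)
        # a(k,k) = sum_{i' != k} max(0, r(i',k));
        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k))), damped.
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]
            for i in range(n):
                new = total if i == k else min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    exemplars = [k for k in range(n) if A[k][k] + R[k][k] > 0]
    if not exemplars:  # fall back to the single strongest candidate
        exemplars = [max(range(n), key=lambda k: A[k][k] + R[k][k])]
    # Exemplars label themselves; other points join their most similar exemplar.
    return [i if i in exemplars else max(exemplars, key=lambda k: S[i][k]) for i in range(n)]
```

With negative squared distances as similarities, raising the preferences toward zero tends to produce more exemplars, and lowering them produces fewer. Tuning this value is the knob that a known speaker count would otherwise have to replace.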

61

In this paper, we propose a modified variable string length genetic algorithm (MVGA) for text clustering. Our algorithm has been exploited for automatically evolving the optimal number of clusters as well as providing proper data set clustering. The chromosome is encoded by a string of real numbers with special indices to indicate the location of each gene. More effective versions

Wei Song; Soon Cheol Park

2006-01-01

62

Color sorting algorithm based on K-means clustering algorithm

NASA Astrophysics Data System (ADS)

In raisin production, there are a variety of color impurities, which need to be removed effectively. A new, efficient raisin color-sorting algorithm is presented here. First, threshold-based image processing was applied for image pre-processing, and the gray-scale distribution characteristics of the raisin image were found. To obtain the chromatic aberration image and reduce disturbance, we subtracted the background image data from the target image data. Second, a Haar wavelet filter was used to obtain a smoothed image of the raisins. Based on differences in color, mildew, spots and other external features, image characteristics were calculated so as to fully reflect the quality differences between raisins of different types. After this processing, the images were analyzed by K-means clustering, which achieves adaptive extraction of statistical features. According to these features, the image data were divided into different categories, so that the categories of abnormal colors became distinct. Using this algorithm, raisins with abnormal colors and those with mottles were eliminated. The sorting rate was up to 98.6%, and the ratio of normal raisins to sorted grains was less than one eighth.

Zhang, BaoFeng; Huang, Qian

2009-11-01

63

Genetic algorithm for text clustering based on latent semantic indexing

In this paper, we develop a genetic algorithm method based on a latent semantic model (GAL) for text clustering. The main difficulty in applying genetic algorithms (GAs) to document clustering is the thousands or even tens of thousands of dimensions in the feature space, which is typical of textual data. Because the most straightforward and popular approach represents texts with

Wei Song; Soon Cheol Park

2009-01-01

64

Approximation Algorithms for Clustering to Minimize the Sum of Diameters

We consider the problem of partitioning the nodes of a complete edge-weighted graph into k clusters so as to minimize the sum of the diameters of the clusters. Since the problem is NP-complete, our focus is on the development of good approximation algorithms. When edge weights satisfy the triangle inequality, we present the first approximation algorithm for the problem. The approximation algorithm yields a

Srinivas R. Doddi; Madhav V. Marathe; S. S. Ravi; David Scot Taylor; Peter Widmayer

2000-01-01

65

Parallel hash-based EST clustering algorithm for gene sequencing.

EST clustering is a simple yet effective method to discover all the genes present in a variety of species. Although using ESTs is a cost-effective approach to gene discovery, the amount of data, and hence the computational resources required, make it a very challenging problem. Time and storage requirements for EST clustering problems are prohibitively expensive. Existing tools have quadratic time complexity resulting from all-against-all sequence comparisons. With the rapid growth of EST data, we need better and faster clustering tools. In this paper, we present HECT (Hash based EST Clustering Tool), a novel time- and memory-efficient algorithm for EST clustering. We report that HECT can cluster a dataset of 10,000 human ESTs (which is also used in benchmarking d2_cluster) in 207 minutes on a 1 GHz Pentium III processor, which is 36 times faster than the original d2_cluster algorithm. A parallel version of HECT (PECT) was also developed and used to cluster 269,035 soybean EST sequences on an IA-32 Linux cluster at the National Center for Supercomputing Applications at UIUC. The parallel algorithm exhibited excellent speedup over its sequential counterpart, and its memory requirements are almost negligible, making it suitable to run on virtually any data size. The performance of the proposed clustering algorithms is compared against other known clustering techniques, and the results are reported in the paper. PMID:15585119

Mudhireddy, R; Ercal, F; Frank, R

2004-10-01
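Hash-based EST clustering avoids all-against-all comparison by indexing subsequences in a hash table. The sketch below illustrates that general idea under a simplifying assumption of our own: two sequences are linked if they share any exact k-mer, and linked sequences are merged transitively with a union-find structure. The abstract does not describe HECT's actual hashing scheme or similarity criterion, and `kmer_clusters` is a hypothetical name. Each k-mer is looked up in a dictionary rather than compared against every other sequence, so the work grows roughly with total sequence length instead of quadratically with the number of sequences.

```python
def kmer_clusters(seqs, k=8):
    """Group sequences sharing at least one exact k-mer (transitively); returns index lists."""
    parent = list(range(len(seqs)))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_owner = {}  # hash table: k-mer -> index of the first sequence containing it
    for idx, s in enumerate(seqs):
        for p in range(len(s) - k + 1):
            kmer = s[p:p + k]
            if kmer in first_owner:
                ra, rb = find(idx), find(first_owner[kmer])
                if ra != rb:  # shared k-mer: merge the two groups
                    parent[ra] = rb
            else:
                first_owner[kmer] = idx
    groups = {}
    for i in range(len(seqs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Real EST tools must also handle reverse complements and sequencing errors, which an exact k-mer match on one strand does not.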

66

The effect of using evolutionary algorithms on Ant Clustering Techniques.

Claus Aranha and Hitoshi Iba. Ant-based clustering is a biologically inspired data clustering technique. In this technique, … hunting and foraging food [5, 2]. The coordination of an ant colony is of local nature, composed mainly

Fernandez, Thomas

67

Derivation and analytic evaluation of an equivalence relation clustering algorithm.

Clustering algorithms have recently been used in multitarget multisensor tracking (MMT) problems in order to reduce the size of the data association problem. This paper derives an equivalence relation (ER) clustering algorithm used in an MMT problem and briefly compares it to other clustering schemes such as the nearest neighbor method. The main contribution of this work is the analytical evaluation of ER clustering performance, in the context of multitarget multisensor tracking, as a function of the distance between targets, the measurement probability density function, and the cluster parameter. PMID:18252369

Nabaa, N; Bishop, R H

1999-01-01
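One way to read "equivalence relation clustering" is sketched below. Two measurements are treated as related when their distance is within a gate (our stand-in for the "cluster parameter"), and the relation is closed transitively, so the clusters are the connected components of the gating graph. This is our assumption, not the paper's derivation; `equivalence_clusters` and `gate` are hypothetical names.

```python
import math
from collections import deque


def equivalence_clusters(points, gate):
    """Equivalence classes of the transitive closure of 'distance <= gate'."""
    n = len(points)
    label = [-1] * n
    current = 0
    for start in range(n):
        if label[start] != -1:
            continue
        # Breadth-first search collects everything reachable through gated links.
        label[start] = current
        queue = deque([start])
        while queue:
            i = queue.popleft()
            for j in range(n):
                if label[j] == -1 and math.dist(points[i], points[j]) <= gate:
                    label[j] = current
                    queue.append(j)
        current += 1
    return label
```

Transitivity is what distinguishes this from a nearest-neighbor rule. Two measurements farther apart than the gate still share a cluster if a chain of gated links connects them.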

68

A Systematic Comparison of Genome Scale Clustering Algorithms - (Extended Abstract)

A wealth of clustering algorithms has been applied to gene co-expression experiments. These algorithms cover a broad array of approaches, from conventional techniques such as k-means and hierarchical clustering, to graphical approaches such as k-clique communities, weighted gene co-expression networks (WGCNA) and paraclique. Comparison of these methods to evaluate their relative effectiveness provides guidance to algorithm selection, development and implementation.

Jeremy J. Jay; John D. Eblen; Yun Zhang; Mikael Benson; Andy D. Perkins; Arnold M. Saxton; Brynn H. Voy; Elissa J. Chesler; Michael A. Langston

2011-01-01

69

A systematic comparison of genome-scale clustering algorithms

BACKGROUND: A wealth of clustering algorithms has been applied to gene co-expression experiments. These algorithms cover a broad range of approaches, from conventional techniques such as k-means and hierarchical clustering, to graphical approaches such as k-clique communities, weighted gene co-expression networks (WGCNA) and paraclique. Comparison of these methods to evaluate their relative effectiveness provides guidance to algorithm selection, development and

Jeremy J Jay; John D Eblen; Yun Zhang; Mikael Benson; Andy D Perkins; Arnold M Saxton; Brynn H Voy; Elissa J Chesler; Michael A Langston

2012-01-01

70

APPROXIMATION ALGORITHMS FOR CLUSTERING TO MINIMIZE THE SUM OF DIAMETERS

We consider the problem of partitioning the nodes of a complete edge-weighted graph into κ clusters so as to minimize the sum of the diameters of the clusters. Since the problem is NP-complete, our focus is on the development of good approximation algorithms. When edge weights satisfy the triangle inequality, we present the first approximation algorithm for the problem. The approximation algorithm yields a solution that has no more than 10κ clusters such that the total diameter of these clusters is within a factor O(log(n/κ)) of the optimal value for κ clusters, where n is the number of nodes in the complete graph. For any fixed κ, we present an approximation algorithm that produces κ clusters whose total diameter is at most twice the optimal value. When the distances are not required to satisfy the triangle inequality, we show that, unless P = NP, for any ρ ≥ 1, there is no polynomial-time approximation algorithm that can provide a performance guarantee of ρ, even when the number of clusters is fixed at 3. Other results include a polynomial-time algorithm for the problem when the underlying graph is a tree with edge weights.

Kopp, S.; Mortveit, H.S.; Reidys, S.M.

2000-02-01
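The objective in the abstract above can be made concrete with the following minimal sketch. It evaluates the sum of cluster diameters for a given partition. For very small graphs, it also finds the exact optimum with exactly κ clusters by exhaustive search over all κ^n labelings. This is not the paper's approximation algorithm. The function names are hypothetical, and the search is exponential, so it is only meant for checking intuition on a handful of nodes.

```python
from itertools import product


def total_diameter(weights, labels):
    """Sum over clusters of the largest intra-cluster edge weight.

    weights: symmetric n x n matrix (list of lists) of edge weights.
    labels:  labels[i] is the cluster index of node i.
    A singleton cluster contributes a diameter of 0.
    """
    clusters = {}
    for node, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(node)
    total = 0
    for members in clusters.values():
        total += max(weights[a][b] for a in members for b in members)
    return total


def brute_force_min_sum_diameters(weights, k):
    """Exact optimum with exactly k non-empty clusters (tiny n only: k**n labelings)."""
    n = len(weights)
    best_cost, best_labels = None, None
    for labels in product(range(k), repeat=n):
        if len(set(labels)) != k:
            continue  # skip labelings that leave a cluster empty
        cost = total_diameter(weights, labels)
        if best_cost is None or cost < best_cost:
            best_cost, best_labels = cost, list(labels)
    return best_cost, best_labels
```

For four points on a line at 0, 1, 10 and 11, with |x_i - x_j| as the edge weight, the best 2-clustering pairs the two close points on each side, for a total diameter of 2.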

71

The Ordered Clustered Travelling Salesman Problem: A Hybrid Genetic Algorithm

The ordered clustered travelling salesman problem is a variation of the usual travelling salesman problem in which the set of vertices (except the starting vertex) of the network is divided into prespecified clusters. The objective is to find the least-cost Hamiltonian tour in which the vertices of each cluster are visited contiguously and the clusters are visited in the prespecified order. The problem is NP-hard, and it arises in practical transportation and sequencing problems. This paper develops a hybrid genetic algorithm using sequential constructive crossover, 2-opt search, and a local search for obtaining heuristic solutions to the problem. The efficiency of the algorithm has been examined against two existing algorithms on some asymmetric and symmetric TSPLIB instances of various sizes. The computational results show that the proposed algorithm is very effective in terms of solution quality and computational time. Finally, we present solutions to some more symmetric TSPLIB instances. PMID:24701148

Ahmed, Zakir Hussain

2014-01-01
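The ordering constraint above can be illustrated with a small sketch. A 2-opt move stays feasible if it reverses only a sub-segment inside a single cluster's block of the tour. The hybrid genetic algorithm and its sequential constructive crossover are not reproduced here. Restricting 2-opt to cluster blocks is an assumption about one way to keep local search inside the feasible region, and all function names are hypothetical. Tour length is recomputed in full, so asymmetric distance matrices are handled correctly, at the cost of speed.

```python
def tour_length(dist, tour):
    """Length of the closed tour (the last vertex returns to tour[0])."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))


def respects_order(tour, start, clusters):
    """True if the tour starts at `start` and then visits each cluster contiguously, in order."""
    if tour[0] != start:
        return False
    pos = 1
    for cluster in clusters:
        if set(tour[pos:pos + len(cluster)]) != set(cluster):
            return False
        pos += len(cluster)
    return pos == len(tour)


def two_opt_within_clusters(dist, tour, clusters):
    """Improve a feasible tour with 2-opt reversals confined to each cluster's block."""
    tour = list(tour)
    improved = True
    while improved:
        improved = False
        pos = 1
        for cluster in clusters:
            lo, hi = pos, pos + len(cluster)  # this cluster occupies tour[lo:hi]
            for i in range(lo, hi - 1):
                for j in range(i + 1, hi):
                    candidate = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]
                    if tour_length(dist, candidate) < tour_length(dist, tour) - 1e-12:
                        tour, improved = candidate, True
            pos = hi
    return tour
```

Because every move reverses vertices only inside one block, a feasible starting tour stays feasible after each accepted move.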

72

Parallel Performance Studies for a Clustering Algorithm

Affinity propagation is a clustering algorithm … to use all 128 processor cores currently available. … Affinity propagation is a relatively …

Blasberg, Robin V.; Gobbert, Matthias K.

73

DECA: A Discrete-Valued Data Clustering Algorithm

This paper presents a new clustering algorithm for analyzing unordered discrete-valued data. The algorithm consists of a cluster initiation phase and a sample regrouping phase. The first phase is based on a data-directed valley detection process that uses the optimal second-order product approximation of a high-order discrete probability distribution, together with a distance measure for discrete-valued data. As for the second phase,

Andrew K. C. Wong; David C. C. Wang

1979-01-01

74

A Delay-Based Clustering Algorithm for Wireless Sensor Networks

In order to prolong network lifetime, energy-efficient routing protocols should be designed to adapt to the characteristics of wireless sensor networks. Clustering is a key technique for reducing energy consumption and thereby extending the lifespan of a sensor network. A clustering algorithm based on a timer mechanism (CATM) is presented. By means of the timer mechanism, it

Fengjun Shang; Donghai Ren

75

Genetic Algorithm Based Optimization of Clustering in Ad Hoc Networks

In this paper, we concentrate on implementing the Weighted Clustering Algorithm with the help of a Genetic Algorithm (GA). We have developed a new GA-based approach built on the Weighted Clustering Algorithm (WCA) (4). Cluster-head selection is an important step in clustering ad hoc networks, so we present an optimization technique that minimizes the number of cluster heads (CHs) based on parameters such as degree difference, battery power (Pv), degree of mobility, and the sum of the distances of a node in the ad hoc network. We also compare the performance of the deterministic approach and the GA-based approach. In this comparison, the GA does not always give better results than the deterministic WCA algorithm. We have also seen that connectivity (measured by the probability that a node is reachable from any other node) is better th...

Nandi, Bhaskar; Paul, Soumen

2010-01-01
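The parameters listed above (degree difference, sum of distances, mobility, and battery/power usage Pv) can be combined in a weighted-clustering-style election, as in the following sketch. Each node gets the combined weight W_v = w1·Δv + w2·Dv + w3·Mv + w4·Pv. The node with the smallest weight becomes a cluster head, its unassigned neighbours join it, and the process repeats. The weight values (0.7, 0.2, 0.05, 0.05), the ideal degree, the transmission range and all function names are illustrative assumptions. The sketch does not reproduce the GA-based optimization.

```python
import math


def wca_weights(positions, speeds, power_used, tx_range, ideal_degree,
                w=(0.7, 0.2, 0.05, 0.05)):
    """Combined weight per node (smaller is better); weights `w` are assumptions."""
    n = len(positions)
    weights, neighbours = [], []
    for v in range(n):
        nbrs = [u for u in range(n)
                if u != v and math.dist(positions[u], positions[v]) <= tx_range]
        neighbours.append(nbrs)
        degree_diff = abs(len(nbrs) - ideal_degree)                       # Δv
        dist_sum = sum(math.dist(positions[u], positions[v]) for u in nbrs)  # Dv
        weights.append(w[0] * degree_diff + w[1] * dist_sum
                       + w[2] * speeds[v] + w[3] * power_used[v])
    return weights, neighbours


def elect_cluster_heads(weights, neighbours):
    """Greedy election: the lowest-weight unassigned node claims its unassigned neighbours."""
    unassigned = set(range(len(weights)))
    heads = {}
    while unassigned:
        head = min(unassigned, key=lambda v: (weights[v], v))  # ties broken by node id
        members = [head] + [u for u in neighbours[head] if u in unassigned]
        unassigned.difference_update(members)
        heads[head] = members
    return heads
```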

76

Efficient Clustering Algorithms for Self-Organizing Wireless Sensor Networks

… in these networks. In this paper, we make contributions towards improving the efficiency of self-organization in wireless sensor networks. We first present a novel approach for message-efficient clustering, in which

Krishnan, Rajesh; Starobinski, David

77

WCA: A Weighted Clustering Algorithm for Mobile Ad Hoc Networks

In this paper, we propose an on-demand distributed clustering algorithm for multi-hop packet radio networks. These types of networks, also known as ad hoc networks, are dynamic in nature due to the mobility of nodes. The association and dissociation of nodes to and from clusters perturb the stability of the network topology, and hence a reconfiguration of the system is often

Mainak Chatterjee; Sajal K. Das; Damla Turgut

2002-01-01

78

A Kernel Density Window Clustering Algorithm for Radar Pulses

As radar signal environments become denser and more complex, ES (electronic warfare support) systems need high-speed and accurate signal analysis to identify individual radar signals in real time. In this paper, we propose a novel clustering algorithm for radar pulses that alleviates the load on the signal analysis process and supports reliable analysis. The proposed algorithm uses

Dong-Weon Lee; Jin-Woo Han; Kyu-Ha Song; Won Don Lee

2008-01-01

79

Clustering of Hadronic Showers with a Structural Algorithm

The internal structure of hadronic showers can be resolved in a high-granularity calorimeter. This structure is described in terms of simple components and an algorithm for reconstruction of hadronic clusters using these components is presented. Results from applying this algorithm to simulated hadronic Z-pole events in the SiD concept are discussed.

Charles, M.J. (SLAC)

2005-12-13

80

NASA Astrophysics Data System (ADS)

Traditional hierarchical clustering algorithms require the calculation of a dissimilarity matrix, which is mapped to a binary tree or 'dendrogram' based upon some predetermined criterion. Although 'optimally efficient' algorithms requiring O(N^2) time and O(N) storage are known for several clustering methods, with few exceptions these algorithms are relatively inefficient in practice, because many pairwise distances are computed that are not needed to generate the binary tree. We describe here a novel 'almost single link' algorithm which is efficient both theoretically and in practice, and which can be extended to provide fast algorithms for centroid, median and single link clustering of large data sets. Generalization to other related clustering methods is expected to be straightforward. Our algorithm also suggests a fairly efficient method for generating minimal spanning trees. In performing the segmentation, we employ a particular representation of the binary tree which simplifies manual investigation of the hierarchy. A customized graphical user interface makes the clustering procedure highly interactive. It includes a 2D scatter plot, a visual display of the dendrogram, and a false-color image with overlaid clusters. By suggesting, for each of the clustering methods, possible criteria which might be useful for extracting relevant clusters from the tree information, we are able to fully automate the cluster selection procedure and thereby further reduce the effort required to segment an image. The algorithms described have been transcribed into C code and combined into a single package, the 'hierarchical agglomerative clusterer' (HAC). The package has been applied to the analysis of hyperspectral image data of various forest and desert scenes acquired by the HYDICE sensor. The analyses were performed on a 266 MHz Pentium PC platform running Windows NT 4.0.
Typical segmentation times for the fastest algorithm ranged from 17 seconds for a 15232-pixel image to 2833 seconds for a 209840-pixel image, each pixel representing a 210-band spectrum. These initial studies suggest that the HAC package will provide a sound framework for making detailed comparisons of the effects of different clustering algorithms or dissimilarity measures. Its overall speed makes it a promising tool not only for hyperspectral image processing applications but for multivariate data analysis as a whole.

Rahman, Sabbir A.

1998-10-01
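The abstract above links single-link clustering to minimal spanning trees. The connection can be sketched as follows: single-link clustering into k groups is equivalent to building a minimum spanning tree with Kruskal's algorithm and stopping once k components remain, which amounts to cutting the k - 1 longest MST edges. This simple sketch computes all O(N^2) pairwise distances and sorts them. It is therefore exactly the kind of brute-force baseline that the paper's 'almost single link' algorithm is designed to avoid, not a reproduction of it. The function name is hypothetical.

```python
import math


def single_link_clusters(points, k):
    """Single-link clustering via Kruskal's MST, stopped at k components."""
    n = len(points)
    edges = sorted((math.dist(points[i], points[j]), i, j)
                   for i in range(n) for j in range(i + 1, n))
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    components = n
    for _, i, j in edges:
        if components == k:
            break
        ri, rj = find(i), find(j)
        if ri != rj:          # this edge joins two clusters, so it belongs to the MST
            parent[ri] = rj
            components -= 1
    # Relabel union-find roots as 0..k-1 in order of first appearance.
    remap = {}
    return [remap.setdefault(find(i), len(remap)) for i in range(n)]
```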

81

A Novel Complex Networks Clustering Algorithm Based on the Core Influence of Nodes

In complex networks, cluster structure, identified by the heterogeneity of nodes, has become a common and important topological property. Network clustering methods are thus significant for the study of complex networks. Currently, many typical clustering algorithms have weaknesses such as inaccuracy and slow convergence. In this paper, we propose a clustering algorithm based on calculating the core influence of nodes. The clustering process simulates the process of cluster formation in sociology. The algorithm detects the nodes with core influence through their betweenness centrality, and builds each cluster's core structure using discriminant functions. Next, the algorithm obtains the final cluster structure by clustering the remaining nodes in the network with an optimization method. Experiments on different datasets show that the clustering accuracy of this algorithm is superior to that of the classical Fast-Newman algorithm. It also clusters faster and helps reveal the real cluster structure of complex networks precisely. PMID:24741359

Dai, Bin; Xie, Zhongyu

2014-01-01
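The core-detection step described above relies on betweenness centrality. The sketch below computes it with Brandes' algorithm for unweighted, undirected graphs. It then uses a deliberately simplified stand-in for the paper's discriminant functions and optimization step: the c highest-betweenness nodes become cores, and every other node joins the core it reaches first in a multi-source breadth-first search. The seeding rule, the parameter c and the function names are assumptions. Naively taking the top-c nodes can place two cores in one community on other graphs.

```python
from collections import deque


def betweenness(adj):
    """Brandes' algorithm; adj maps node -> list of neighbours (undirected, unweighted)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack, pred = [], {v: [] for v in adj}
        sigma = {v: 0 for v in adj}      # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        delta = {v: 0.0 for v in adj}    # dependency of s on each node
        while stack:
            w = stack.pop()
            for v in pred[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2 for v, b in bc.items()}  # each unordered pair was counted twice


def assign_to_cores(adj, c):
    """Hypothetical simplification: top-c betweenness nodes seed a multi-source BFS."""
    bc = betweenness(adj)
    cores = sorted(adj, key=lambda v: (-bc[v], v))[:c]
    label = {core: i for i, core in enumerate(cores)}
    queue = deque(cores)
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in label:
                label[w] = label[v]
                queue.append(w)
    return label
```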

82

A local search approximation algorithm for k-means clustering

In k-means clustering we are given a set of n data points in d-dimensional space R^d and an integer k, and the problem is to determine a set of k points in R^d, called centers, to minimize the mean squared distance from each data point to its nearest center. No exact polynomial-time algorithms are known for this problem. Although asymptotically efficient approximation algorithms exist, these algorithms are not practical due to the very high constant factors involved.

Tapas Kanungo; David M. Mount; Nathan S. Netanyahu; Christine D. Piatko; Ruth Silverman; Angela Y. Wu

2004-01-01
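A single-swap local search for the objective defined above is sketched below. Candidate centers are restricted to the data points. A swap replaces one current center with a candidate and is kept whenever it lowers the total squared distance, and the search stops when no swap helps. This is the basic swap heuristic only. It does not include the multi-swap variants, approximation-ratio analysis or hybrid schemes of the paper. The naive initialization (the first k points) and the function names are assumptions.

```python
import math


def kmeans_cost(points, centers):
    """Sum of squared distances from each point to its nearest center."""
    return sum(min(math.dist(p, c) ** 2 for c in centers) for p in points)


def single_swap_local_search(points, k, max_rounds=100):
    centers = list(points[:k])  # naive initial centers (assumption)
    cost = kmeans_cost(points, centers)
    for _ in range(max_rounds):
        improved = False
        for i in range(k):
            for candidate in points:
                if candidate in centers:
                    continue
                trial = centers[:i] + [candidate] + centers[i + 1:]
                trial_cost = kmeans_cost(points, trial)
                if trial_cost < cost:  # accept any improving swap
                    centers, cost = trial, trial_cost
                    improved = True
        if not improved:
            break  # local optimum with respect to single swaps
    return centers, cost
```

Dividing the returned cost by n gives the mean squared distance used in the abstract.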

83

An Empirical Comparison of NML Clustering Algorithms

Clustering can be defined as a data assignment problem where the goal is to partition the data into non-hierarchical groups of items. In our previous work, we suggested an information-theoretic criterion, based on the minimum description length (MDL) principle, for defining the goodness of a clustering of data. The basic idea behind this framework is to optimize the

Petri Kontkanen; Petri Myllymäki

2008-01-01

84

Measuring Constraint-Set Utility for Partitional Clustering Algorithms

NASA Technical Reports Server (NTRS)

Clustering with constraints is an active area of machine learning and data mining research. Previous empirical work has convincingly shown that adding constraints to clustering improves the performance of a variety of algorithms. However, in most of these experiments, results are averaged over different randomly chosen constraint sets from a given set of labels, thereby masking interesting properties of individual sets. We demonstrate that constraint sets vary significantly in how useful they are for constrained clustering; some constraint sets can actually decrease algorithm performance. We create two quantitative measures, informativeness and coherence, that can be used to identify useful constraint sets. We show that these measures can also help explain differences in performance for four particular constrained clustering algorithms.

Davidson, Ian; Wagstaff, Kiri L.; Basu, Sugato

2006-01-01
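One way to compute the informativeness measure described above is sketched here. It follows our reading that informativeness is the fraction of constraints that the unconstrained algorithm's output violates. That reading is an assumption, so the paper should be checked for the exact definition. The coherence measure is not sketched. Constraints are written as hypothetical (kind, i, j) triples with kind equal to "must" or "cannot".

```python
def violated(constraint, labels):
    """True if a must-link or cannot-link constraint is broken by `labels`."""
    kind, i, j = constraint
    same = labels[i] == labels[j]
    return (kind == "must" and not same) or (kind == "cannot" and same)


def informativeness(constraints, unconstrained_labels):
    """Fraction of constraints the unconstrained clustering fails to satisfy (assumed definition)."""
    if not constraints:
        return 0.0
    return sum(violated(c, unconstrained_labels) for c in constraints) / len(constraints)
```

Under this reading, a constraint set with informativeness 0 tells the algorithm nothing it would not have found anyway.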

85

K-Distributions: A New Algorithm for Clustering Categorical Data

NASA Astrophysics Data System (ADS)

Clustering is one of the most important tasks in data mining. The K-means algorithm is the most popular one for this task because of its efficiency. However, it works only on numeric values, although data sets in data mining often contain categorical values. In response, the K-modes algorithm was proposed to extend K-means to categorical domains. Unfortunately, it suffers from problems in computing the dissimilarity between each pair of objects and the mode of each cluster. To address these problems with K-modes, we present a new algorithm called K-distributions in this paper. We experimentally tested K-distributions on the well-known collection of 36 UCI data sets selected by Weka and compared it to K-modes. The experimental results show that K-distributions significantly outperforms K-modes in terms of clustering accuracy and log likelihood.

Cai, Zhihua; Wang, Dianhong; Jiang, Liangxiao
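For reference, here is a sketch of the K-modes baseline that K-distributions is compared against. It follows the common K-modes formulation: simple-matching dissimilarity, which counts the attributes on which two records differ, and a cluster mode taken attribute by attribute as the most frequent category. K-distributions itself is not reproduced. The caller-supplied initial modes and the function names are assumptions. Ties in the mode are broken by first occurrence.

```python
from collections import Counter


def matching_dissimilarity(a, b):
    """Number of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def cluster_mode(rows):
    """Most frequent category of each attribute (ties: first encountered)."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))


def k_modes(rows, initial_modes, max_iter=20):
    modes = [tuple(m) for m in initial_modes]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(len(modes)),
                          key=lambda c: matching_dissimilarity(r, modes[c]))
                      for r in rows]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for c in range(len(modes)):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                modes[c] = cluster_mode(members)
    return labels, modes
```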

86

[Multispectral image compression algorithm based on clustering and wavelet transform].

To address the high time-space complexity and inadequate use of spectral characteristics in existing multispectral image compression algorithms, an inter-spectrum sparse equivalent representation of multispectral images and ways of realizing it by clustering were studied. Meanwhile, a new multispectral image compression algorithm based on spectral adaptive clustering and wavelet transform was designed. Affinity propagation clustering was used to generate an inter-spectrum sparse equivalent representation that removes inter-spectrum redundancy at low complexity. A two-dimensional wavelet transform was used to remove spatial redundancy, and set partitioning in hierarchical trees (SPIHT) was used for encoding. An error compensation mechanism improved the quality of the reconstructed images. Experimental results show that the proposed approach achieves good time-space complexity. Its peak signal-to-noise ratio (PSNR) is significantly higher than that of similar compression algorithms at the same compression ratio, and it is a generic and effective algorithm. PMID:24409728

Liang, Wei; Zeng, Ping; Zhang, Hua; Luo, Xue-Mei

2013-10-01
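The quality figure reported above, PSNR, is 10·log10(MAX² / MSE), where MSE is the mean squared error between the original and reconstructed samples. The sketch below assumes 8-bit samples (MAX = 255) and treats all bands of an image as one flat list of samples. Both choices are assumptions, since the abstract does not state the bit depth or how bands are pooled.

```python
import math


def psnr(original, reconstructed, max_value=255):
    """Peak signal-to-noise ratio in dB over flat sequences of samples."""
    if len(original) != len(reconstructed):
        raise ValueError("images must have the same number of samples")
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value ** 2 / mse)
```

An error of one grey level on every sample gives MSE = 1, so the PSNR is 20·log10(255), about 48.1 dB.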

87

Sampling Within k-Means Algorithm to Cluster Large Datasets

Due to current data collection technology, our ability to gather data has surpassed our ability to analyze it. In particular, k-means, one of the simplest and fastest clustering algorithms, is ill-equipped to handle extremely large datasets on even the most powerful machines. Our new algorithm uses a sample from a dataset to decrease runtime by reducing the amount of data analyzed. We perform a simulation study to compare our sampling-based k-means to the standard k-means algorithm by analyzing both the speed and accuracy of the two methods. Results show that our algorithm is significantly more efficient than the existing algorithm with comparable accuracy. Further work on this project might include a more comprehensive study on more varied test datasets as well as on real weather datasets. This is especially important because this preliminary study was performed on rather tame datasets. Such a study should also analyze the performance of the algorithm for varied values of k. Lastly, this paper showed that the algorithm was accurate for relatively low sample sizes. We would like to analyze this further to see how accurate the algorithm is for even lower sample sizes. By manipulating width and confidence level, we could find the lowest sample sizes for which the algorithm would be acceptably accurate. To be a success, our algorithm needs to meet two benchmarks: match the accuracy of the standard k-means algorithm and significantly reduce runtime. Both goals are accomplished for all six datasets analyzed. However, on datasets of three and four dimensions, as the data becomes more difficult to cluster, both algorithms fail to obtain the correct classifications on some trials. Nevertheless, our algorithm consistently matches the performance of the standard algorithm while running in considerably less time. Therefore, we conclude that analysts can use our algorithm and expect accurate results in considerably less time.

Bejarano, Jeremy (Brigham Young University); Bose, Koushiki (Brown University); Brannan, Tyler (North Carolina State University); Thomas, Anita (Illinois Institute of Technology); Adragni, Kofi (University of Maryland); Neerchal, Nagaraj (University of Maryland); Ostrouchov, George (ORNL)

2011-08-01
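The sampling idea above can be sketched in a few lines. Lloyd's k-means iterations run only on a random sample, and every point in the full dataset is then assigned to the nearest resulting center. The sample size, the seeding from random sample points and the function names are assumptions. The paper's method for choosing the sample size from a width and confidence level is not reproduced.

```python
import math
import random


def lloyd(points, centers, iters=50):
    """Standard Lloyd iterations; an empty cluster keeps its previous center."""
    centers = [tuple(c) for c in centers]
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            groups[j].append(p)
        new = [tuple(sum(xs) / len(g) for xs in zip(*g)) if g else centers[j]
               for j, g in enumerate(groups)]
        if new == centers:
            break  # converged
        centers = new
    return centers


def sampled_kmeans(points, k, sample_size, seed=0):
    """Fit centers on a random sample, then label the full dataset."""
    rng = random.Random(seed)
    sample = rng.sample(points, sample_size)
    centers = lloyd(sample, rng.sample(sample, k))
    labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
    return centers, labels
```

Only the final labeling pass touches every point, so the cost of the iterative phase scales with the sample size rather than the dataset size.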

88

Personalized PageRank Clustering: A graph clustering algorithm based on random walks

NASA Astrophysics Data System (ADS)

Graph clustering is an essential part of many methods, so its accuracy has a significant effect on many applications. In addition, the exponential growth of real-world graphs such as social networks, biological networks and electrical circuits demands clustering algorithms with nearly-linear time and space complexity. In this paper we propose Personalized PageRank Clustering (PPC), which uses the inherent cluster-exploring property of random walks to reveal the clusters of a given graph. We combine random walks and modularity to reveal the clusters of a graph precisely and efficiently. PPC is a top-down algorithm, so it can reveal the inherent clusters of a graph more accurately than other nearly-linear approaches, which are mainly bottom-up. It also gives a hierarchy of clusters that is useful in many applications. PPC has linear time and space complexity and has outperformed most of the available clustering algorithms on many datasets. Furthermore, its top-down approach makes it a flexible solution for clustering problems with different requirements.

A. Tabrizi, Shayan; Shakery, Azadeh; Asadpour, Masoud; Abbasi, Maziar; Tavallaie, Mohammad Ali

2013-11-01
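The random-walk idea behind PPC can be illustrated with two pieces. The first is personalized PageRank computed by power iteration with restart probability alpha to a seed node. The second is a conductance sweep cut over the degree-normalized scores. The sweep cut is a standard local-clustering technique swapped in here for illustration. It is not PPC's modularity-based top-down procedure. The value alpha = 0.15 and the function names are assumptions, and the graph is assumed to have no isolated nodes.

```python
def personalized_pagerank(adj, seed, alpha=0.15, iters=100):
    """Power iteration: with prob. alpha jump back to `seed`, else move to a random neighbour."""
    p = {v: 0.0 for v in adj}
    p[seed] = 1.0
    for _ in range(iters):
        nxt = {v: 0.0 for v in adj}
        nxt[seed] += alpha
        for v, nbrs in adj.items():
            share = (1 - alpha) * p[v] / len(nbrs)
            for w in nbrs:
                nxt[w] += share
        p = nxt
    return p


def sweep_cut(adj, scores):
    """Best-conductance prefix of nodes sorted by score / degree."""
    order = sorted(adj, key=lambda v: scores[v] / len(adj[v]), reverse=True)
    total_volume = sum(len(nbrs) for nbrs in adj.values())
    in_set, volume, cut = set(), 0, 0
    best_set, best_phi = None, float("inf")
    for v in order[:-1]:  # proper, non-empty prefixes only
        in_set.add(v)
        volume += len(adj[v])
        for w in adj[v]:
            cut += -1 if w in in_set else 1  # edges into the set stop being cut edges
        phi = cut / min(volume, total_volume - volume)
        if phi < best_phi:
            best_set, best_phi = set(in_set), phi
    return best_set, best_phi
```

On two triangles joined by one edge, seeding the walk in one triangle recovers that triangle, with conductance 1/7.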

89

ICAIS: A Novel Incremental Clustering Algorithm Based on Artificial Immune Systems

Although many kinds of clustering algorithms have been proposed, there has been much less work on incremental clustering. Inspired by artificial immune systems, the authors apply them to incremental clustering and propose a novel incremental clustering algorithm called ICAIS. The algorithm mainly uses the immune-response mechanism of the adaptive immune system. The primary immune response corresponds to

Xianghua Li; Tianyang Lu; Zhengxuan Wang; Chao Gao

2008-01-01

90

A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise

Clustering algorithms are attractive for the task of class identification in spatial databases. However, applying them to large spatial databases raises the following requirements for clustering algorithms: minimal domain knowledge to determine the input parameters, discovery of clusters with arbitrary shape, and good efficiency on large databases. The well-known clustering algorithms offer no solution to

Martin Ester; Hans-peter Kriegel; Jörg Sander; Xiaowei Xu

1996-01-01
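The density-based approach named in the title above works as follows. A point is a core point if its ε-neighbourhood, including the point itself, holds at least MinPts points. Clusters grow by chaining core points, border points are attached to a neighbouring cluster, and everything else is labeled as noise. The sketch below uses brute-force region queries, which cost O(n^2) overall. On large databases a spatial index would be needed to meet the efficiency requirement. The names `eps`, `min_pts` and `NOISE` are illustrative.

```python
import math

NOISE = -1


def dbscan(points, eps, min_pts):
    n = len(points)

    def region(i):
        # Brute-force epsilon-neighbourhood (includes the point itself).
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    labels = [None] * n  # None = not yet visited
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        neighbours = region(i)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        seeds, k = list(neighbours), 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: join, but do not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region(j)
            if len(j_neighbours) >= min_pts:  # j is a core point: keep expanding
                seeds.extend(j_neighbours)
    return labels
```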

91

Genetic algorithms for determining the topological structure of metallic clusters

Genetic algorithms (GA) are applied to the optimization of the structure of metallic clusters by calculating the ground-state energies from a tight-binding (Hückel) Hamiltonian. The optimum topology or graph is searched for using the adjacency matrix A_ij as a natural coding. The initial populations for N-atom clusters are generated from a representative group of fit

R. Poteau; G. M. Pastor

1999-01-01

92

NCUBE - A clustering algorithm based on a discretized data space

NASA Technical Reports Server (NTRS)

Cluster analysis involves the unsupervised grouping of data. The process provides an automatic procedure for generating known training samples for pattern classification. NCUBE, the clustering algorithm presented, is based upon the concept of imposing a gridwork on the data space. The NCUBE computer implementation of this concept provides an easily derived form of piecewise linear discrimination. This piecewise linear discrimination permits the separation of some types of data groups that are not linearly separable.

Eigen, D. J.; Northouse, R. A.

1974-01-01
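The gridwork idea described above can be illustrated with a generic grid-based clustering sketch. NCUBE's exact rules are not given in the abstract, so the details here are assumptions: points are binned into hypercube cells of side `cell_size`, cells holding at least `min_count` points are treated as dense, and dense cells that share a face are joined into one cluster. Points in sparse cells get the label -1. Because cluster boundaries follow cell faces, they are piecewise linear, which is consistent with the abstract's point about separating groups that are not linearly separable.

```python
import math
from collections import defaultdict, deque


def grid_clusters(points, cell_size, min_count):
    """Hypothetical grid clustering: connect face-adjacent dense cells."""
    cells = defaultdict(list)
    for idx, p in enumerate(points):
        cells[tuple(math.floor(x / cell_size) for x in p)].append(idx)
    dense = {c for c, members in cells.items() if len(members) >= min_count}
    labels = [-1] * len(points)
    seen, cluster = set(), 0
    for start in sorted(dense):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        while queue:  # flood-fill over face-adjacent dense cells
            cell = queue.popleft()
            for idx in cells[cell]:
                labels[idx] = cluster
            for axis in range(len(cell)):
                for step in (-1, 1):
                    nb = cell[:axis] + (cell[axis] + step,) + cell[axis + 1:]
                    if nb in dense and nb not in seen:
                        seen.add(nb)
                        queue.append(nb)
        cluster += 1
    return labels
```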

93

The walking cluster algorithm Internal Report

… of the interaction between these genes. Based on the hypothesis that coexpressed genes are more likely to be coregulated, … it is essential to make use of a clustering alg…

94

NASA Technical Reports Server (NTRS)

An efficient procedure which clusters data using a completely unsupervised clustering algorithm and then uses labeled pixels to label the resulting clusters or perform a stratified estimate using the clusters as strata is developed. Three clustering algorithms, CLASSY, AMOEBA, and ISOCLS, are compared for efficiency. Three stratified estimation schemes and three labeling schemes are also considered and compared.

Lennington, R. K.; Johnson, J. K.

1979-01-01
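The procedure above uses clusters as strata for estimation. A generic stratified estimator of a class proportion under that design is p̂ = Σ_h (N_h / N) · p̂_h. Here N_h is the number of pixels in cluster h, N is the total pixel count, and p̂_h is the fraction of labeled sample pixels in cluster h that belong to the class. Which of the paper's three stratified schemes this matches is not stated, so treat the sketch as an assumption. The clusters are taken as given, whether they come from CLASSY, AMOEBA or ISOCLS, and the function name is hypothetical.

```python
def stratified_proportion(stratum_sizes, labelled):
    """Stratified estimate of a class proportion.

    stratum_sizes: dict stratum -> number of pixels N_h in that cluster.
    labelled:      dict stratum -> list of 0/1 labels for the sampled pixels in it.
    """
    total = sum(stratum_sizes.values())
    estimate = 0.0
    for h, size in stratum_sizes.items():
        sample = labelled[h]
        estimate += (size / total) * (sum(sample) / len(sample))
    return estimate
```

For example, with 600 pixels in cluster A (2 of 4 labeled pixels in the class) and 400 in cluster B (1 of 5), the estimate is 0.6 · 0.5 + 0.4 · 0.2 = 0.38.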

95

Vetoed jet clustering: The mass-jump algorithm

A new class of jet clustering algorithms is introduced. A criterion inspired by successful mass-drop taggers is applied which prevents the recombination of two hard prongs if they experience a substantial jump in jet mass. This veto effectively results in jets with variable radius in dense environments. Differences to existing methods are investigated and it is shown for boosted top quarks that the new algorithm has beneficial properties which can lead to improved tagging purity.

Stoll, Martin

2014-01-01

96

Vetoed jet clustering: The mass-jump algorithm

A new class of jet clustering algorithms is introduced. A criterion inspired by successful mass-drop taggers is applied which prevents the recombination of two hard prongs if they experience a substantial jump in jet mass. This veto effectively results in jets with variable radius in dense environments. Differences to existing methods are investigated and it is shown for boosted top quarks that the new algorithm has beneficial properties which can lead to improved tagging purity.

Martin Stoll

2014-10-17

97

A gauge invariant cluster algorithm for the Ising spin glass

The frustrated Ising model in two dimensions is revisited. The frustration is quantified in terms of the number of non-trivial plaquettes, which is invariant under the Nishimori gauge symmetry. The exact ground-state energy is calculated using Edmonds' algorithm. A novel cluster algorithm is designed which treats gauge-equivalent spin glasses on an equal footing and allows for efficient simulations near criticality. As a first application, the specific heat near criticality is investigated.

K. Langfeld; M. Quandt; W. Lutz; H. Reinhardt

2006-06-14
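The frustration measure above counts non-trivial plaquettes, and it can be checked numerically on a small lattice. A plaquette is non-trivial (frustrated) when the product of its four ±1 couplings is -1. A local gauge transformation flips the sign of the four bonds attached to one site, and each plaquette touching that site contains exactly two of those bonds, so the count is unchanged. The periodic L×L lattice and the bond layout are assumptions: h[i][j] couples (i, j) to (i, j+1), and v[i][j] couples (i, j) to (i+1, j). The function names are hypothetical.

```python
import random


def random_couplings(L, rng):
    """Random ±1 horizontal (h) and vertical (v) bonds on a periodic L x L lattice."""
    h = [[rng.choice((-1, 1)) for _ in range(L)] for _ in range(L)]
    v = [[rng.choice((-1, 1)) for _ in range(L)] for _ in range(L)]
    return h, v


def frustrated_plaquettes(h, v):
    """Count plaquettes whose bond product is -1 (non-trivial plaquettes)."""
    L = len(h)
    count = 0
    for i in range(L):
        for j in range(L):
            product = h[i][j] * v[i][(j + 1) % L] * h[(i + 1) % L][j] * v[i][j]
            if product == -1:
                count += 1
    return count


def gauge_flip(h, v, a, b):
    """Gauge transformation at site (a, b): flip the four bonds attached to it."""
    L = len(h)
    h[a][b] *= -1
    h[a][(b - 1) % L] *= -1
    v[a][b] *= -1
    v[(a - 1) % L][b] *= -1
```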

98

Software Fault Feature Clustering Algorithm Based on Sequence Pattern

NASA Astrophysics Data System (ADS)

Software fault feature analysis has been an important part of software security property analysis and modeling. In this paper, a software fault feature clustering algorithm based on sequence patterns (SFFCSP) is proposed. In SFFCSP, a fault feature matrix is defined to store the relation between fault features and existing sequence patterns. The optimal number of clusters is determined by computing an improved silhouette of the fault feature matrix row vectors, each of which corresponds to a software fault feature. In the agglomerative hierarchical clustering phase, entropy is used as the similarity metric. To reduce the time complexity of software fault feature analysis, the fault features of the software to be analyzed are matched to each centroid of the clustering results. Experimental results show that SFFCSP has better clustering accuracy and lower time complexity than SEQOPTICS.

Ren, Jiadong; Hu, Changzhen; Wang, Kunsheng; Zhang, Dongmei
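The number of clusters above is chosen with a silhouette criterion. The sketch below computes the standard (not the paper's "improved") mean silhouette. For point i, a is its mean distance to the other members of its own cluster, and b is the smallest mean distance to any other cluster. The point's score is (b - a) / max(a, b), and singletons score 0 by convention. Euclidean distance is an assumption here, since SFFCSP uses entropy during its hierarchical phase. To pick the number of clusters, one would compute this score for each candidate clustering and keep the highest.

```python
import math


def mean_silhouette(points, labels):
    """Average standard silhouette coefficient (Euclidean distance assumed)."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            scores.append(0.0)  # singleton convention
            continue
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        b = min(sum(math.dist(points[i], points[j]) for j in members) / len(members)
                for other, members in clusters.items() if other != lab)
        scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return sum(scores) / len(scores)
```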

99

Data clustering algorithms based on Swarm Intelligence

For a decade, swarm intelligence, an artificial intelligence discipline, has been concerned with the design of intelligent multi-agent systems that take inspiration from the collective behaviors of social insects and other animal societies. Swarm intelligence is a successful paradigm for algorithms that deal with complex problems. This paper focuses on the procedures of the most successful optimization techniques inspired by Swarm

Pankaj K. Bharne; V. S. Gulhane; Shweta K. Yewale

2011-01-01

100

Clustering Algorithm for Mutually Constraining Heterogeneous Features

California Institute of Technology, Pasadena, CA 91109-8099 (wolfgang.fink@jpl.nasa.gov; rebecca.castano@jpl…; ashley.davies@jpl.nasa.gov; mjolsness@jpl.nasa.gov). … We introduce a general type of optimization algorithm which infers data

101

The C4 clustering algorithm: Clusters of galaxies in the Sloan Digital Sky Survey

We present the "C4 Cluster Catalog", a new sample of 748 clusters of galaxies identified in the spectroscopic sample of the Second Data Release (DR2) of the Sloan Digital Sky Survey (SDSS). The C4 cluster-finding algorithm identifies clusters as overdensities in a seven-dimensional position and color space, thus minimizing projection effects that have plagued previous optical cluster selection. The present C4 catalog covers ~2600 square degrees of sky and ranges in redshift from z = 0.02 to z = 0.17. The mean cluster membership is 36 galaxies (with redshifts) brighter than r = 17.7, but the catalog includes a range of systems, from groups containing 10 members to massive clusters with over 200 cluster members with redshifts. The catalog provides a large number of measured cluster properties including sky location, mean redshift, galaxy membership, summed r-band optical luminosity (L_r), velocity dispersion, as well as quantitative measures of substructure and the surrounding large-scale environment. We use new, multi-color mock SDSS galaxy catalogs, empirically constructed from the ΛCDM Hubble Volume (HV) Sky Survey output, to investigate the sensitivity of the C4 catalog to the various algorithm parameters (detection threshold, choice of passbands and search aperture), as well as to quantify the purity and completeness of the C4 cluster catalog. These mock catalogs indicate that the C4 catalog is ≈90% complete and 95% pure above M_200 = 1 × 10^14 h^-1 M_⊙ and within 0.03 ≤ z ≤ 0.12. Using the SDSS DR2 data, we show that the C4 algorithm finds 98% of X-ray identified clusters and 90% of Abell clusters within 0.03 ≤ z ≤ 0.12. Using the mock galaxy catalogs and the full HV dark matter simulations, we show that the L_r of a cluster is a more robust estimator of the halo mass (M_200) than the galaxy line-of-sight velocity dispersion or the richness of the cluster.
However, if we exclude clusters embedded in complex large-scale environments, we find that the velocity dispersion of the remaining clusters is as good an estimator of M_200 as L_r. The final C4 catalog will contain ≈2500 clusters using the full SDSS data set and will represent one of the largest and most homogeneous samples of local clusters.

Miller, Christopher J.; Nichol, Robert; Reichart, Dan; Wechsler, Risa H.; Evrard, August; Annis, James; McKay, Timothy; Bahcall, Neta; Bernardi, Mariangela; Boehringer,; Connolly, Andrew; Goto, Tomo; Kniazev, Alexie; Lamb, Donald; Postman, Marc; Schneider, Donald; Sheth, Ravi; Voges, Wolfgang; /Cerro-Tololo InterAmerican Obs. /Portsmouth U.,

2005-03-01

102

Efficient Algorithms for Sampling and Clustering of Large Nonuniform Networks

We propose efficient algorithms for two key tasks in the analysis of large nonuniform networks: uniform node sampling and cluster detection. Our sampling technique is based on augmenting a simple, but slowly mixing uniform MCMC sampler with a regular random walk in order to speed up its convergence; however the combined MCMC chain is then only sampled when it is

Pekka Orponen; Satu Elisa Schaeffer

2004-01-01
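A simple uniform MCMC node sampler of the kind the abstract starts from is the Metropolis-Hastings random walk, sketched here. From node u, propose a uniformly random neighbour v and accept the move with probability min(1, deg(u)/deg(v)); otherwise stay at u. The transition probability is then min(1/deg(u), 1/deg(v)), which is symmetric, so the stationary distribution is uniform over nodes. A plain random walk instead favours high-degree nodes. This is only the base sampler. The paper's augmentation with a regular random walk is not reproduced, and the function name is hypothetical.

```python
import random


def mh_uniform_walk(adj, start, steps, rng=None):
    """Metropolis-Hastings random walk whose stationary distribution is uniform over nodes."""
    rng = rng or random.Random()
    u = start
    visits = {v: 0 for v in adj}
    for _ in range(steps):
        v = rng.choice(adj[u])
        if rng.random() < min(1.0, len(adj[u]) / len(adj[v])):
            u = v  # accept the proposed move
        visits[u] += 1
    return visits
```

On a star graph, a plain random walk spends half its time at the hub. This walk spends roughly one fifth of its time at each of the five nodes.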

103

A New Algorithm for Text Clustering Based on Projection Pursuit

The Vector Space Model (VSM) is commonly used to represent text features in text mining, but it produces a huge number of dimensions, does not clearly reveal the structure of a text set, and is computationally expensive. A new text clustering algorithm based on projection pursuit is proposed. By minimizing (or maximizing) a projection index, projection pursuit searches for an optimal projection

Mao-Ting Gao; Zheng-Ou Wang

2007-01-01

104

A Clustering Genetic Algorithm for Actuator Optimization in Flow Control

Active flow control can provide a leap in the performance of engineering configurations. Although a number of sensor and actuator configurations have been proposed, identifying optimal parameters for control devices still relies on engineering intuition, usually gathered from uncontrolled flow experiments. Here we propose a clustering genetic algorithm that adaptively identifies critical points in the

Michele Milano; Petros Koumoutsakos

2000-01-01

105

Clustered Self Organising Migrating Algorithm for the Quadratic Assignment Problem

NASA Astrophysics Data System (ADS)

An approach of population dynamics and clustering for permutative problems is presented in this paper. Diversity indicators are created from solution ordering and its mapping is shown as an advantage for population control in metaheuristics. Self Organising Migrating Algorithm (SOMA) is modified using this approach and vetted with the Quadratic Assignment Problem (QAP). Extensive experimentation is conducted on benchmark problems in this area.

Davendra, Donald; Zelinka, Ivan; Senkerik, Roman

2009-08-01
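QAP is called a permutative problem because a solution is a permutation. The sketch below shows the standard Koopmans-Beckmann form of the objective. A candidate solution `perm` assigns facility i to location perm[i], and its cost is Σ_i Σ_j flow[i][j] · dist[perm[i]][perm[j]]. Brute force over all permutations is included only for tiny instances, as a reference for checking a metaheuristic such as SOMA, which is not reproduced here. The function names are hypothetical.

```python
from itertools import permutations


def qap_cost(flow, dist, perm):
    """Koopmans-Beckmann QAP objective for the assignment facility i -> location perm[i]."""
    n = len(perm)
    return sum(flow[i][j] * dist[perm[i]][perm[j]] for i in range(n) for j in range(n))


def qap_brute_force(flow, dist):
    """Exact optimum by enumerating all n! permutations (tiny n only)."""
    n = len(flow)
    return min((qap_cost(flow, dist, p), p) for p in permutations(range(n)))
```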

106

State Information-based Ant Colony Clustering Algorithm

A state information-based ant colony clustering algorithm is proposed in this paper. Each data object is represented as an ant with behaviors such as moving or sleeping, and particular attention is paid to the influence of state information on the ants' behaviors. The reference value of the ants' information is increased in the static state and decreased in the active state, respectively. State information is taken

Jie Shen; Kun He; Liu-hua Wei; Lei Bi; Rong-shuang Sun; Fa-yan Xu

2008-01-01

107

A Decentralized Fuzzy C-Means-Based Energy-Efficient Routing Protocol for Wireless Sensor Networks

Energy conservation in wireless sensor networks (WSNs) is a vital consideration when designing wireless networking protocols. In this paper, we propose a Decentralized Fuzzy Clustering Protocol, named DCFP, which minimizes total network energy dissipation to promote maximum network lifetime. The process of constructing the infrastructure for a given WSN is performed only once at the beginning of the protocol at a base station, which remains unchanged throughout the network's lifetime. In this initial construction step, a fuzzy C-means algorithm is adopted to allocate sensor nodes into their most appropriate clusters. Subsequently, the protocol runs its rounds where each round is divided into a CH-Election phase and a Data Transmission phase. In the CH-Election phase, the election of new cluster heads is done locally in each cluster where a new multicriteria objective function is proposed to enhance the quality of elected cluster heads. In the Data Transmission phase, the sensing and data transmission from each sensor node to their respective cluster head is performed and cluster heads in turn aggregate and send the sensed data to the base station. Simulation results demonstrate that the proposed protocol improves network lifetime, data delivery, and energy consumption compared to other well-known energy-efficient protocols. PMID:25162060

2014-01-01
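The fuzzy C-means step in the abstract above, which allocates sensor nodes to clusters once at the base station, can be sketched as standard FCM. It alternates two updates. Memberships are set to u_ik = 1 / Σ_j (d_ik / d_jk)^(2/(m-1)). Each center becomes the u^m-weighted mean of the node positions. The fuzzifier m = 2, the caller-supplied initial centers, the stopping tolerance and the function name are assumptions. The protocol's CH-election objective is not sketched.

```python
import math


def fuzzy_c_means(points, centers, m=2.0, iters=100, tol=1e-9):
    """Standard FCM; returns final centers and the last membership matrix u[k][i]."""
    centers = [tuple(c) for c in centers]
    u = []
    for _ in range(iters):
        # Membership update: closer centers get larger membership.
        u = []
        for x in points:
            d = [math.dist(x, c) for c in centers]
            if 0.0 in d:  # point sits on a center: crisp membership
                row = [1.0 if di == 0.0 else 0.0 for di in d]
                s = sum(row)
                row = [r / s for r in row]
            else:
                row = [1.0 / sum((di / dj) ** (2 / (m - 1)) for dj in d) for di in d]
            u.append(row)
        # Center update: u^m-weighted mean of the points.
        new_centers = []
        for i in range(len(centers)):
            w = [u[k][i] ** m for k in range(len(points))]
            total = sum(w)
            new_centers.append(tuple(
                sum(wk * x[dim] for wk, x in zip(w, points)) / total
                for dim in range(len(points[0]))))
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, u
```

The final hard assignment of each node would be the cluster with its largest membership.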

108

ORCA: The Overdense Red-sequence Cluster Algorithm

We present a new cluster detection algorithm designed for the Panoramic Survey Telescope and Rapid Response System (Pan-STARRS) survey but with generic application to any multiband data. The method makes no prior assumptions about the properties of clusters other than (a) the similarity in colour of cluster galaxies (the "red sequence") and (b) an enhanced projected surface density. The detector has three main steps: (i) it identifies cluster members by photometrically filtering the input catalogue to isolate galaxies in colour-magnitude space, (ii) a Voronoi diagram identifies regions of high surface density, (iii) galaxies are grouped into clusters with a Friends-of-Friends technique. Where multiple colours are available, we require systems to exhibit sequences in two colours. In this paper we present the algorithm and demonstrate it on two datasets. The first is a 7 square degree sample of the deep Sloan Digital Sky Survey equatorial stripe (Stripe 82), from which we detect 97 clusters with z10^13 solar ma...

Murphy, D N A; Bower, R G

2011-01-01

109

Over the last several years, various clustering algorithms for wireless sensor networks have been proposed to prolong network lifetime. Most clustering algorithms provide an equal cluster size using node ID, degree, etc. However, many of these algorithms determine the cluster size heuristically, even though the cluster size significantly affects the energy consumption of the entire network. In this paper,

Sungryoul Lee; Han Choe; Yukyoung Song; Chong-kwon Kim

2011-01-01

110

Improved Gravitation Field Algorithm and Its Application in Hierarchical Clustering

Background Gravitation field algorithm (GFA) is a new optimization algorithm based on the imitation of natural phenomena. GFA performs well both in searching for a global minimum and in searching for multiple minima in computational biology. However, GFA needs to be improved to increase its efficiency, and modified to apply to some discrete data problems in systems biology. Method An improved GFA, called IGFA, is proposed in this paper. Two parts are improved in IGFA. The first is the rule of random division, a reasonable strategy that shortens running time. The second is the rotation factor, which improves the accuracy of IGFA. To apply IGFA to hierarchical clustering, the initial part and the movement operator were modified. Results Two kinds of experiments were used to test IGFA, and IGFA was applied to hierarchical clustering. The global minimum experiment was run with IGFA, GFA, GA (genetic algorithm) and SA (simulated annealing). The multi-minima experiment was run with IGFA and GFA. Comparing the results of the two experiments demonstrated the efficiency of IGFA: IGFA is better than GFA in both accuracy and running time. For hierarchical clustering, IGFA is used to optimize the smallest distance between gene pairs, and the results were compared with GA, SA, single-linkage clustering, and UPGMA. The efficiency of IGFA is proved. PMID:23173043

Zheng, Ming; Sun, Ying; Liu, Gui-xia; Zhou, You; Zhou, Chun-guang

2012-01-01

111

In this study, an unsupervised method for detecting fire pixels in video frames is described. A hybrid clustering algorithm based on color samples in video frames is proposed. A modified k-means clustering algorithm is used, in which hierarchical and partitional clustering are combined to build the hybrid. The results are analyzed against a color-based threshold method by considering

Ishita Chakraborty; Tanoy Kr. Paul

2010-01-01

112

A local search approximation algorithm for k-means clustering

In k-means clustering we are given a set of n data points in d-dimensional space R^d and an integer k, and the problem is to determine a set of k points in R^d, called centers, to minimize the mean squared distance from each data point to its nearest center. No exact polynomial-time algorithms are known for this problem. Although asymptotically
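The local search idea can be illustrated with a single-swap heuristic: start from k candidate centers and repeatedly swap one center for another candidate whenever the swap lowers the total squared distance (equivalent to lowering the mean squared distance, since n is fixed). This sketch assumes the data points themselves serve as the candidate set and accepts the first improving swap; both are simplifications and not necessarily the authors' exact procedure.

```python
import math


def kmeans_cost(points, centers):
    """Total squared distance from each point to its nearest center."""
    return sum(min(math.dist(p, c) ** 2 for c in centers) for p in points)


def single_swap_local_search(points, k, candidates=None):
    """Swap-based local search for k-means (first-improvement variant, sketch).

    Candidate centers default to the data points themselves (an assumption).
    """
    if candidates is None:
        candidates = list(points)
    centers = list(candidates[:k])
    best = kmeans_cost(points, centers)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for cand in candidates:
                if cand in centers:
                    continue
                trial = centers[:i] + [cand] + centers[i + 1:]
                cost = kmeans_cost(points, trial)
                if cost < best - 1e-12:  # accept only strict improvements
                    centers, best = trial, cost
                    improved = True
                    break
            if improved:
                break
    return centers, best


pts = [(0, 0), (0, 1), (10, 10), (10, 11)]
centers, cost = single_swap_local_search(pts, k=2)
print(centers, cost)  # [(10, 10), (0, 1)] 2.0
```

Starting from two centers in the same group (cost about 381), one swap moves a center into the far group, after which no single swap helps; the search stops at a local optimum with one center per group.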

Tapas Kanungo; David M. Mount; Nathan S. Netanyahu; Christine D. Piatko; Ruth Silverman; Angela Y. Wu

2002-01-01

113

An Improved Distance Matrix Computation Algorithm for Multicore Clusters

Distance matrices have diverse uses in different research areas. Their computation is typically an essential task in most bioinformatics applications, especially multiple sequence alignment. The explosive growth of biological sequence databases creates an urgent need to accelerate these computations. The DistVect algorithm was introduced by Al-Neama et al. (in press) as a recent approach to vectorizing distance matrix computation. It showed efficient performance in both sequential and parallel computing. However, the multicore cluster systems available now, with their scalability and performance/cost ratio, meet the need for more powerful and efficient performance. This paper proposes DistVect1, a highly efficient parallel vectorized algorithm for computing distance matrices on multicore clusters. It reformulates the DistVect1 vectorized algorithm in terms of cluster primitives, and it derives an efficient approach to partitioning and scheduling computations suited to this type of architecture. Implementations use both the MPI and OpenMP libraries. Experimental results show that the proposed method achieves a speedup of around 3-fold over SSE2. It also achieves speedups of more than 9 orders of magnitude compared to the publicly available parallel implementation used in ClustalW-MPI. PMID:25013779

Al-Neama, Mohammed W.; Reda, Naglaa M.; Ghaleb, Fayed F. M.

2014-01-01

114

To combine steady-state genetic algorithm and ensemble learning for data clustering

This paper proposes a data clustering algorithm that combines the steady-state genetic algorithm and the ensemble learning method, termed as genetic-guided clustering algorithm with ensemble learning operator (GCEL). GCEL adopts the steady-state genetic algorithm to perform the search task, but replaces its traditional recombination operator with an ensemble learning operator. Therefore, GCEL can avoid the problems of clustering invalidity and

Yi Hong; Sam Kwong

2008-01-01

115

A new algorithm of selecting the radial basis function networks center

The selection of the radial basis function center is a key factor that influences network performance. In this paper we first briefly introduce the fuzzy c-means algorithm and the k-nearest-neighbor algorithm for selecting the radial basis function center, and then we present a ?-nearest-neighbor cluster algorithm that combines the k-nearest-neighbor algorithm with the fuzzy c-means algorithm. Finally,

Hong-Rui Wang; Hong-Bin Wang; Li-Xin Wei; Ying Li

2002-01-01

116

Moving to Smaller Libraries via Clustering and Genetic Algorithms

... the memory requirements of executables. The approach is organized in two steps. The first step defines ... genetic algorithms. In particular, a novel genetic algorithm approach, considering the initial clusters

Antoniol, G.; Di Penta, Massimiliano

117

Classification of posture maintenance data with fuzzy clustering algorithms

NASA Technical Reports Server (NTRS)

Sensory inputs from the visual, vestibular, and proprioceptive systems are integrated by the central nervous system to maintain postural equilibrium. Sustained exposure to microgravity causes neurosensory adaptation during spaceflight, which results in decreased postural stability until readaptation occurs upon return to the terrestrial environment. Data simulating sensory inputs under various sensory organization test (SOT) conditions were collected in conjunction with Johnson Space Center postural control studies using a tilt-translation device (TTD). The University of West Florida applied the fuzzy c-means (FCM) clustering algorithms to these data with a view towards identifying various states and stages of subjects experiencing such changes. Feature analysis, time step analysis, pooling data, response of the subjects, and the algorithms used are discussed.

Bezdek, James C.

1992-01-01

118

PFClust: an optimised implementation of a parameter-free clustering algorithm

Background A well-known problem in cluster analysis is finding an optimal number of clusters reflecting the inherent structure of the data. PFClust is a partitioning-based clustering algorithm capable, unlike many widely-used clustering algorithms, of automatically proposing an optimal number of clusters for the data. Results The results of tests on various types of data showed that PFClust can discover clusters of arbitrary shapes, sizes and densities. The previous implementation of the algorithm had already been successfully used to cluster large macromolecular structures and small druglike compounds. We have greatly improved the algorithm by a more efficient implementation, which enables PFClust to process large data sets acceptably fast. Conclusions In this paper we present a new optimized implementation of the PFClust algorithm that runs considerably faster than the original. PMID:24490618

2014-01-01

119

MVAPICH2 vs. OpenMPI for a Clustering Algorithm

Affinity propagation is a clustering algorithm ... processes than OpenMPI for this code. ... Affinity propagation is a relatively new clustering

Blasberg, Robin V.; Gobbert, Matthias K.

120

jClustering, an Open Framework for the Development of 4D Clustering Algorithms

We present jClustering, an open framework for the design of clustering algorithms in dynamic medical imaging. We developed this tool because of the difficulty involved in manually segmenting dynamic PET images and the lack of availability of source code for published segmentation algorithms. Providing an easily extensible open tool encourages publication of source code to facilitate the process of comparing algorithms and provide interested third parties with the opportunity to review code. The internal structure of the framework allows an external developer to implement new algorithms easily and quickly, focusing only on the particulars of the method being implemented and not on image data handling and preprocessing. This tool has been coded in Java and is presented as an ImageJ plugin in order to take advantage of all the functionalities offered by this imaging analysis platform. Both binary packages and source code have been published, the latter under a free software license (GNU General Public License) to allow modification if necessary. PMID:23990913

Mateos-Pérez, José María; García-Villalba, Carmen; Pascau, Javier; Desco, Manuel; Vaquero, Juan J.

2013-01-01

122

A Study on Text Clustering Algorithms Based on Frequent Term Sets

In this paper, a new text-clustering algorithm named Frequent Term Set-based Clustering (FTSC) is introduced. It uses frequent term sets to cluster texts. First, it extracts useful information from documents and inserts it into databases. Then, it uses the Apriori algorithm, based on association rule mining, to efficiently discover the frequent itemsets. Finally, it clusters the documents according to the
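FTSC's second step mines frequent term sets with Apriori. The sketch below shows the level-wise Apriori search over documents treated as sets of terms, pruning any candidate that has an infrequent subset. The absolute `min_support` threshold, the function name, and the toy documents are assumptions; the final document-clustering step is cut off in the abstract and is not reproduced.

```python
from itertools import combinations


def frequent_term_sets(docs, min_support):
    """Level-wise Apriori search for term sets contained in >= min_support docs."""
    transactions = [frozenset(d) for d in docs]

    def support(term_set):
        return sum(1 for t in transactions if term_set <= t)

    terms = sorted({term for t in transactions for term in t})
    level = [frozenset([t]) for t in terms if support(frozenset([t])) >= min_support]
    frequent = {}
    size = 1
    while level:
        for s in level:
            frequent[s] = support(s)
        size += 1
        candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            # Apriori pruning: every (size - 1)-subset must already be frequent.
            if len(union) == size and all(
                    frozenset(sub) in frequent for sub in combinations(union, size - 1)):
                candidates.add(union)
        level = [c for c in candidates if support(c) >= min_support]
    return frequent


docs = [
    ["cluster", "fuzzy", "means"],
    ["cluster", "fuzzy"],
    ["cluster", "graph"],
    ["fuzzy", "means"],
]
result = frequent_term_sets(docs, min_support=2)
for term_set, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(term_set), count)
```

With a support threshold of 2, the pair {cluster, means} occurs in only one document, so the triple {cluster, fuzzy, means} is pruned without its support ever being counted.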

Xiangwei Liu; Pilian He

2005-01-01

123

Differential Evolution Based Fuzzy Clustering

NASA Astrophysics Data System (ADS)

In this work, two new fuzzy clustering (FC) algorithms based on Differential Evolution (DE) are proposed. Five well-known data sets viz. Iris, Wine, Glass, E. Coli and Olive Oil are used to demonstrate the effectiveness of DEFC-1 and DEFC-2. They are compared with Fuzzy C-Means (FCM) algorithm and Threshold Accepting Based Fuzzy Clustering algorithms proposed by Ravi et al., [1]. Xie-Beni index is used to arrive at the 'optimal' number of clusters. Based on the numerical experiments, we infer that, in terms of least objective function value, these variants can be used as viable alternatives to FCM algorithm.
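The Xie-Beni index used here to choose the number of clusters is the ratio of fuzzy within-cluster compactness to the minimum squared separation between cluster centers, XB = sum_i sum_j u_ij^m ||x_i - v_j||^2 / (n * min_{j != k} ||v_j - v_k||^2); smaller values indicate compact, well-separated clusters. A small sketch, assuming the memberships `u` and the centers have already been computed (for example by FCM, DEFC-1 or DEFC-2) and taking m = 2 by default.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two coordinate tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def xie_beni(points, centers, u, m=2.0):
    """Xie-Beni validity index: lower means compact, well-separated clusters."""
    n, c = len(points), len(centers)
    compactness = sum(u[i][j] ** m * sq_dist(points[i], centers[j])
                      for i in range(n) for j in range(c))
    separation = min(sq_dist(centers[j], centers[k])
                     for j in range(c) for k in range(c) if j != k)
    return compactness / (n * separation)


pts = [(0, 0), (1, 0), (10, 0), (11, 0)]
centers = [(0.5, 0), (10.5, 0)]
crisp = [[1, 0], [1, 0], [0, 1], [0, 1]]
blurred = [[0.5, 0.5]] * 4
print(xie_beni(pts, centers, crisp))    # 0.0025
print(xie_beni(pts, centers, blurred))  # 0.25125
```

To pick the number of clusters, one would run the clustering for several values of c and keep the partition with the smallest index.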

Ravi, V.; Aggarwal, Nupur; Chauhan, Nikunj

124

Fuzzy Possibility C-Mean Based on Complete Mahalanobis Distance and Separable Criterion

The well-known fuzzy partition clustering algorithms are mostly based on the Euclidean distance function, which can only be used to detect spherical structural clusters. The Gustafson-Kessel (GK) and Gath-Geva (GG) clustering algorithms were developed to detect non-spherical structural clusters, but both of

Hsiang-chuan Liu; Der-bang Wu; Jeng-ming Yih; Shin-wu Liu

2008-01-01

125

Discerning Linkage-Based Algorithms Among Hierarchical Clustering Methods

... into the hierarchical setting. The class of linkage-based algorithms is perhaps the most popular class of hierarchical algorithms. We identify two properties of hierarchical algorithms, and prove that linkage

Ackerman, Margareta

126

Textural defect detect using a revised ant colony clustering algorithm

NASA Astrophysics Data System (ADS)

We propose a novel method based on a revised ant colony clustering algorithm (ACCA) for textural defect detection. Our efforts focus mainly on the definition of a local irregularity measurement and the implementation of the revised ACCA. The local irregularity measurement evaluates the textural inconsistency of each pixel relative to its mini-environment. In the revised ACCA, the behavior of each ant is divided into two steps: releasing pheromone and acting. The quantity of pheromone released is proportional to the irregularity measurement; each ant chooses its next action independently of the others, stochastically, according to evaluated heuristic knowledge. The independence of the ants implies that the algorithm has an inherently parallel computational architecture. We apply the proposed method to some typical textural images with defects. The series of pheromone distribution maps (PDMs) clearly shows the pheromone distribution gradually approaching the textural defects. After some post-processing, the final pheromone distribution demonstrates the shape and area of the defects well.

Zou, Chao; Xiao, Li; Wang, Bingwen

2007-11-01

127

The Algorithm of Connected Dominating Set Based Clustering in Sensor Networks

In this paper, we present a novel connected dominating set based clustering algorithm for sensor networks. Considering the characteristics and location information of nodes in sensor networks, we propose a modified directed transfer model of sensor networks and a novel clustering algorithm based on area. Theoretical analyses and simulation results show that the above new methods

Zhijun Xie; Jin Guang

2009-01-01

128

Evaluation and Comparison of Clustering Algorithms in Analyzing ES Cell Gene Expression Data

Many clustering algorithms have been used to analyze microarray gene expression data. Given embryonic stem cell gene expression data, we applied several indices to evaluate the performance

129

Contributions to "k"-Means Clustering and Regression via Classification Algorithms

ERIC Educational Resources Information Center

The dissertation deals with clustering algorithms and transforming regression problems into classification problems. The main contributions of the dissertation are twofold; first, to improve (speed up) the clustering algorithms and second, to develop a strict learning environment for solving regression problems as classification tasks by using…

Salman, Raied

2012-01-01

130

Wireless sensor networks (WSNs) are composed of a large number of sensor nodes with limited computation capability, power and memory. Balance of energy consumption between nodes can reduce the number of dead nodes and prolong the network lifetime. In this paper, based on the novel clustering algorithm Affinity Propagation (AP), a power efficient cluster head selection algorithm (PECBA) is proposed
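PECBA builds on Affinity Propagation (AP), which exchanges "responsibility" and "availability" messages between points until a set of exemplars emerges. The sketch below follows the standard damped AP updates of Frey and Dueck; the negative squared Euclidean similarity, the median preference, damping 0.5, and the fixed iteration count are common defaults assumed here, and PECBA's energy-aware cluster head criteria are not modeled.

```python
import statistics


def affinity_propagation(points, preference=None, damping=0.5, iters=200):
    """Affinity propagation with damped message updates (illustrative sketch).

    Returns (exemplars, labels): labels[i] is the index of point i's exemplar.
    """
    n = len(points)
    # Similarity: negative squared Euclidean distance.
    s = [[-sum((a - b) ** 2 for a, b in zip(points[i], points[k])) for k in range(n)]
         for i in range(n)]
    if preference is None:
        # Common default: median of the off-diagonal similarities.
        preference = statistics.median(
            s[i][k] for i in range(n) for k in range(n) if i != k)
    for k in range(n):
        s[k][k] = preference

    r = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    a = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iters):
        # r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        for i in range(n):
            vals = [a[i][k] + s[i][k] for k in range(n)]
            first = max(range(n), key=vals.__getitem__)
            second = max(vals[k] for k in range(n) if k != first)
            for k in range(n):
                competitor = second if k == first else vals[first]
                r[i][k] = damping * r[i][k] + (1 - damping) * (s[i][k] - competitor)
        # a(k,k) = sum_{i' != k} max(0, r(i',k))
        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        for k in range(n):
            pos = sum(max(0.0, r[i][k]) for i in range(n) if i != k)
            for i in range(n):
                if i == k:
                    new = pos
                else:
                    new = min(0.0, r[k][k] + pos - max(0.0, r[i][k]))
                a[i][k] = damping * a[i][k] + (1 - damping) * new

    exemplars = [k for k in range(n) if r[k][k] + a[k][k] > 0]
    if not exemplars:
        return [], []
    labels = []
    for i in range(n):
        if i in exemplars:
            labels.append(i)
        else:
            labels.append(max(exemplars, key=lambda e: s[i][e]))
    return exemplars, labels


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
exemplars, labels = affinity_propagation(pts)
print(exemplars, labels)
```

Unlike k-means, AP does not take the number of clusters as input; it is controlled indirectly by the preference values on the diagonal of the similarity matrix.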

Ru Gao; Hongyan Cui; Jian Li; Chuanhui Li; Jianya Chen

2010-01-01

131

Security clustering algorithm based on reputation in hierarchical peer-to-peer network

NASA Astrophysics Data System (ADS)

For the security problems of the hierarchical P2P network (HPN), the paper presents a security clustering algorithm based on reputation (CABR). In the algorithm, we take the reputation mechanism for ensuring the security of transaction and use cluster for managing the reputation mechanism. In order to improve security, reduce cost of network brought by management of reputation and enhance stability of cluster, we select reputation, the historical average online time, and the network bandwidth as the basic factors of the comprehensive performance of node. Simulation results showed that the proposed algorithm improved the security, reduced the network overhead, and enhanced stability of cluster.

Chen, Mei; Luo, Xin; Wu, Guowen; Tan, Yang; Kita, Kenji

2013-03-01

132

A nonparametric clustering algorithm with a quantile-based likelihood estimator.

Clustering is a representative of unsupervised learning and one of the important approaches in exploratory data analysis. By its very nature, clustering without strong assumption on data distribution is desirable. Information-theoretic clustering is a class of clustering methods that optimize information-theoretic quantities such as entropy and mutual information. These quantities can be estimated in a nonparametric manner, and information-theoretic clustering algorithms are capable of capturing various intrinsic data structures. It is also possible to estimate information-theoretic quantities using a data set with sampling weight for each datum. Assuming the data set is sampled from a certain cluster and assigning different sampling weights depending on the clusters, the cluster-conditional information-theoretic quantities are estimated. In this letter, a simple iterative clustering algorithm is proposed based on a nonparametric estimator of the log likelihood for weighted data sets. The clustering algorithm is also derived from the principle of conditional entropy minimization with maximum entropy regularization. The proposed algorithm does not contain a tuning parameter. The algorithm is experimentally shown to be comparable to or outperform conventional nonparametric clustering methods. PMID:24922504

Hino, Hideitsu; Murata, Noboru

2014-09-01

133

Comparison of two optical cluster finding algorithms for the new generation of deep galaxy surveys

We present a comparison between two optical cluster finding methods: a matched filter algorithm using galaxy angular coordinates and magnitudes, and a percolation algorithm using also redshift information. We test the algorithms on two mock catalogues. The first mock catalogue is built by adding clusters to a Poissonian background, while the other is derived from N-body simulations. Choosing the physically most sensible parameters for each method, we carry out a detailed comparison and investigate advantages and limits of each algorithm, showing the possible biases on final results. We show that, combining the two methods, we are able to detect a large part of the structures, thus pointing out the need to search for clusters in different ways in order to build complete and unbiased samples of clusters, to be used for statistical and cosmological studies. In addition, our results show the importance of testing cluster finding algorithms on different kinds of mock catalogues to have a complete assessment of their behaviour.

D. Rizzo; C. Adami; S. Bardelli; A. Cappi; E. Zucca; B. Guiderdoni; G. Chincarini; A. Mazure

2003-10-03

134

TRICLUSTER: an effective algorithm for mining coherent clusters in 3D microarray data

In this paper we introduce a novel algorithm called TRICLUSTER, for mining coherent clusters in three-dimensional (3D) gene expression datasets. TRICLUSTER can mine arbitrarily positioned and overlapping clusters, and depending on different parameter values, it can mine different types of clusters, including those with constant or similar values along each dimension, as well as scaling and shifting expression patterns. TRICLUSTER

Lizhuang Zhao; Mohammed Javeed Zaki

2005-01-01

135

The watershed-clustering algorithm was adapted for use in multi-dimensional spectral space and was used to define clusters in Hyperspectral Digital Imagery Collection Experiment (HYDICE) data. This algorithm identifies clusters as peaks in a B-dimensional topographic relief, where B is the number of wavelength bands. Image pixel spectra are represented as points in this multi-dimensional space. Analysis is done at increasing

Gerard P. Jellison; Terrence H. Hemmer; Darryl G. Wilson

2002-01-01

136

NASA Astrophysics Data System (ADS)

In this paper we construct an efficient adaptive Mahalanobis k-means algorithm. In addition, we propose a new efficient algorithm to search for a globally optimal partition obtained by using the adaptive Mahalanobis distance-like function. The algorithm is a generalization of the previously proposed incremental algorithm (Scitovski and Scitovski, 2013). It successively finds optimal partitions with k=2,3,… clusters. Therefore, it can also be used to estimate the most appropriate number of clusters in a partition by using various validity indexes. The algorithm has been applied to the seismic catalogues of Croatia and the Iberian Peninsula. Both regions are characterized by moderate seismic activity. One of the main advantages of the algorithm is its ability to discover not only circular but also elliptical shapes, whose geometry better fits the faults. Three seismogenic zonings are proposed for Croatia and two for the Iberian Peninsula and adjacent areas, according to the clusters discovered by the algorithm.
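The ability to discover elliptical rather than only circular clusters comes from measuring distance with each cluster's covariance matrix S: d^2(x, c) = (x - c)^T S^-1 (x - c). A minimal 2-D sketch, assuming each cluster is summarized by a mean and a covariance matrix; the paper's "Mahalanobis distance-like function" may include extra normalization (Gustafson-Kessel-style methods, for example, scale S by its determinant), which is omitted here.

```python
def mahalanobis_sq_2d(x, mean, cov):
    """Squared Mahalanobis distance (x - mean)^T cov^-1 (x - mean) in 2-D."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0 or a <= 0:
        raise ValueError("covariance must be positive definite")
    # Closed-form inverse of a 2x2 matrix.
    i00, i01, i10, i11 = d / det, -b / det, -c / det, a / det
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    return dx * (i00 * dx + i01 * dy) + dy * (i10 * dx + i11 * dy)


def assign(x, clusters):
    """Index of the (mean, cov) cluster with the smallest Mahalanobis distance."""
    return min(range(len(clusters)),
               key=lambda j: mahalanobis_sq_2d(x, clusters[j][0], clusters[j][1]))


# Hypothetical clusters: one elongated along x, one round.
clusters = [((0.0, 0.0), ((4.0, 0.0), (0.0, 1.0))),
            ((5.0, 0.0), ((1.0, 0.0), (0.0, 1.0)))]
print(mahalanobis_sq_2d((2.0, 0.0), *clusters[0]))  # 1.0
print(assign((2.8, 0.0), clusters))  # 0, although the Euclidean nearest mean is cluster 1
```

The point (2.8, 0) is closer to the second mean in Euclidean terms, but it lies along the long axis of the first, elongated cluster, so the Mahalanobis assignment places it there.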

Morales-Esteban, Antonio; Martínez-Álvarez, Francisco; Scitovski, Sanja; Scitovski, Rudolf

2014-12-01

137

Interactive Query Expansion With the Use of Clustering-by-Directions Algorithm

This paper concerns the clustering-by-directions algorithm. The algorithm introduces a novel approach to interactive query expansion. It is designed to support users of search engines in forming Web search queries. When a user executes a query, the algorithm shows potential directions in which the search can be continued. This paper describes the algorithm and presents an enhancement which reduces

Adam L. Kaczmarek

2011-01-01

138

Ranking and selecting clustering algorithms using a meta-learning approach

We present a novel framework that applies a meta-learning approach to clustering algorithms. Given a dataset, our meta-learning approach provides a ranking for the candidate algorithms that could be used with that dataset. This ranking could, among other things, support non-expert users in the algorithm selection task. In order to evaluate the framework proposed, we implement a prototype that

Marcílio Carlos Pereira De Souto; Ricardo Bastos Cavalcante Prudêncio; Rodrigo G. F. Soares; Daniel S. A. De Araujo; Ivan G. Costa; Teresa Bernarda Ludermir; Alexander Schliep

2008-01-01

139

A fast kernel-based multilevel algorithm for graph clustering

Graph clustering (also called graph partitioning) --- clustering the nodes of a graph --- is an important problem in diverse data mining applications. Traditional approaches involve optimization of graph clustering objectives such as normalized cut or ratio association; spectral methods are widely used for these objectives, but they require eigenvector computation which can be slow. Recently, graph clustering with a

Inderjit S. Dhillon; Yuqiang Guan; Brian Kulis

2005-01-01

140

An important component of a spatial clustering algorithm is the distance measure between sample points in object space. In this paper, the traditional Euclidean distance measure is replaced with innovative obstacle distance measure for spatial clustering under obstacle constraints. Firstly, we present a path searching algorithm to approximate the obstacle distance between two points for dealing with obstacles and facilitators. Taking obstacle distance as similarity metric, we subsequently propose the artificial immune clustering with obstacle entity (AICOE) algorithm for clustering spatial point data in the presence of obstacles and facilitators. Finally, the paper presents a comparative analysis of AICOE algorithm and the classical clustering algorithms. Our clustering model based on artificial immune system is also applied to the case of public facility location problem in order to establish the practical applicability of our approach. By using the clone selection principle and updating the cluster centers based on the elite antibodies, the AICOE algorithm is able to achieve the global optimum and better clustering effect.

Sun, Liping; Luo, Yonglong; Ding, Xintao; Zhang, Ji

2014-01-01

141

K-means-type clustering aims at partitioning a data set into clusters such that the objects in a cluster are compact and the objects in different clusters are well separated. However, most k-means-type clustering algorithms rely only on intracluster compactness while overlooking intercluster separation. In this paper, a series of new clustering algorithms is proposed that extend the existing k-means-type algorithms by integrating both intracluster compactness and intercluster separation. First, a set of new objective functions for clustering is developed. Based on these objective functions, the corresponding updating rules for the algorithms are then derived analytically. The properties and performances of these algorithms are investigated on several synthetic and real-life data sets. Experimental studies demonstrate that our proposed algorithms outperform the state-of-the-art k-means-type clustering algorithms with respect to four metrics: accuracy, Rand index, F-score, and normalized mutual information. PMID:25050942

Huang, Xiaohui; Ye, Yunming; Zhang, Haijun

2014-08-01

142

A highly efficient multi-core algorithm for clustering extremely large datasets

Background In recent years, the demand for computational power in computational biology has increased due to rapidly growing data sets from microarray and other high-throughput technologies. This demand is likely to increase. Standard algorithms for analyzing data, such as cluster algorithms, need to be parallelized for fast processing. Unfortunately, most approaches for parallelizing algorithms rely largely on network communication protocols that connect, and require, multiple computers. One answer to this problem is to use the intrinsic capabilities of current multi-core hardware to distribute the tasks among the different cores of one computer. Results We introduce a multi-core parallelization of the k-means and k-modes cluster algorithms based on the design principles of transactional memory for clustering gene expression microarray type data and categorical SNP data. Our new shared-memory parallel algorithms are shown to be highly efficient. We demonstrate their computational power and show their utility in cluster stability and sensitivity analysis employing repeated runs with slightly changed parameters. The computation speed of our Java-based algorithm was increased by a factor of 10 for large data sets while preserving computational accuracy, compared to single-core implementations and a recently published network-based parallelization. Conclusions Most desktop computers and even notebooks provide at least dual-core processors. Our multi-core algorithms show that, using modern algorithmic concepts, parallelization makes it possible to perform even such laborious tasks as cluster sensitivity and cluster number estimation on the laboratory computer. PMID:20370922
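For categorical data such as SNP genotypes, k-modes replaces the Euclidean distance of k-means with a simple matching dissimilarity (the number of attributes on which two records differ) and replaces means with per-attribute modes. This is a single-threaded sketch of that iteration, assuming user-supplied initial modes and hypothetical genotype records; the transactional-memory multi-core parallelization that is the paper's contribution is not shown.

```python
from collections import Counter


def matching_dissimilarity(x, y):
    """Number of attributes on which two categorical records differ."""
    return sum(1 for p, q in zip(x, y) if p != q)


def k_modes(records, initial_modes, iters=20):
    """Basic k-modes iteration (sketch); stops when the modes no longer change."""
    modes = [tuple(m) for m in initial_modes]
    labels = []
    for _ in range(iters):
        # Assignment step: nearest mode under simple matching dissimilarity.
        labels = [min(range(len(modes)), key=lambda j: matching_dissimilarity(r, modes[j]))
                  for r in records]
        # Update step: most frequent category per attribute within each cluster.
        new_modes = []
        for j, mode in enumerate(modes):
            members = [r for r, lab in zip(records, labels) if lab == j]
            if not members:
                new_modes.append(mode)  # keep an empty cluster's previous mode
                continue
            new_modes.append(tuple(Counter(col).most_common(1)[0][0]
                                   for col in zip(*members)))
        if new_modes == modes:
            break
        modes = new_modes
    return modes, labels


# Hypothetical genotype records at three SNP positions.
records = [("AA", "CT", "GG"), ("AA", "CT", "GT"), ("TT", "CC", "TT"), ("TT", "CC", "GT")]
modes, labels = k_modes(records, [records[0], records[2]])
print(labels)  # [0, 0, 1, 1]
```

Ties in the mode update are broken by first occurrence, which is how `Counter.most_common` orders equal counts.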

2010-01-01

143

Traditional K-means clustering algorithms have the drawback of getting stuck at local optima that depend on the random values of initial centroids. Optimization algorithms have their advantages in guiding iterative computation to search for global optima while avoiding local optima. The algorithms help speed up the clustering process by converging into a global optimum early with multiple search agents in action. Inspired by nature, some contemporary optimization algorithms which include Ant, Bat, Cuckoo, Firefly, and Wolf search algorithms mimic the swarming behavior allowing them to cooperatively steer towards an optimal objective within a reasonable time. It is known that these so-called nature-inspired optimization algorithms have their own characteristics as well as pros and cons in different applications. When these algorithms are combined with K-means clustering mechanism for the sake of enhancing its clustering quality by avoiding local optima and finding global optima, the new hybrids are anticipated to produce unprecedented performance. In this paper, we report the results of our evaluation experiments on the integration of nature-inspired optimization methods into K-means algorithms. In addition to the standard evaluation metrics in evaluating clustering quality, the extended K-means algorithms that are empowered by nature-inspired optimization methods are applied on image segmentation as a case study of application scenario. PMID:25202730

Fong, Simon; Deb, Suash; Yang, Xin-She; Zhuang, Yan

2014-01-01

144

NASA Technical Reports Server (NTRS)

We describe the clustering algorithm used by the Lightning Imaging Sensor (LIS) and the Optical Transient Detector (OTD) for combining the lightning pulse data into events, groups, flashes, and areas. Events are single pixels that exceed the LIS/OTD background level during a single frame (2 ms). Groups are clusters of events that occur within the same frame and in adjacent pixels. Flashes are clusters of groups that occur within 330 ms and either 5.5 km (for LIS) or 16.5 km (for OTD) of each other. Areas are clusters of flashes that occur within 16.5 km of each other. Many investigators use the LIS/OTD flash data; therefore, we test how variations in the algorithms for the event-group and group-flash clustering affect the flash count for a subset of the LIS data. We divided the subset into areas with low (1-3), medium (4-15), high (16-63), and very high (64+) flashes to see how changes in the clustering parameters affect the flash rates in these different sizes of areas. We found that as long as the cluster parameters are within about a factor of two of the current values, the flash counts do not change by more than about 20%. Therefore, the flash clustering algorithm used by the LIS and OTD sensors produces flash rates that are relatively insensitive to reasonable variations in the clustering algorithms.
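The group-to-flash step can be pictured as linking any two groups that fall within the quoted temporal and spatial thresholds (330 ms, and 5.5 km for LIS or 16.5 km for OTD) and taking the connected components as flashes. This is a simplified sketch assuming each group is reduced to a time in milliseconds and planar x/y coordinates in kilometres, with transitive (friends-of-friends) linkage; the operational LIS/OTD code may apply the thresholds differently.

```python
import math


def groups_to_flashes(groups, max_dt_ms=330.0, max_dist_km=5.5):
    """Cluster (time_ms, x_km, y_km) groups into flashes by transitive linkage."""
    parent = list(range(len(groups)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, (ti, xi, yi) in enumerate(groups):
        for j in range(i + 1, len(groups)):
            tj, xj, yj = groups[j]
            if abs(ti - tj) <= max_dt_ms and math.hypot(xi - xj, yi - yj) <= max_dist_km:
                parent[find(j)] = find(i)  # merge the two flashes

    flashes = {}
    for i in range(len(groups)):
        flashes.setdefault(find(i), []).append(i)
    return sorted(flashes.values())


# Hypothetical groups: (time in ms, x in km, y in km).
groups = [(0, 0.0, 0.0), (100, 3.0, 0.0), (400, 6.0, 0.0), (2000, 0.0, 0.0), (2100, 10.0, 0.0)]
print(groups_to_flashes(groups))                    # [[0, 1, 2], [3], [4]]  (LIS distance)
print(groups_to_flashes(groups, max_dist_km=16.5))  # [[0, 1, 2], [3, 4]]    (OTD distance)
```

Groups 0 and 2 are 400 ms apart, yet they join the same flash through group 1; with the larger OTD distance threshold, groups 3 and 4 also merge. This kind of parameter sensitivity is what the study quantifies.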

Mach, Douglas M.; Christian, Hugh J.; Blakeslee, Richard; Boccippio, Dennis J.; Goodman, Steve J.; Boeck, William

2006-01-01

145

This paper presents a novel maximum margin clustering method with immune evolution (IEMMC) for automatic diagnosis of electrocardiogram (ECG) arrhythmias. The diagnostic system consists of signal processing, feature extraction, and the IEMMC algorithm for clustering ECG arrhythmias. First, the raw ECG signal is processed by an adaptive ECG filter based on wavelet transforms, and the ECG waveform is detected; then, features are extracted from the ECG signal and the IEMMC algorithm clusters the different types of arrhythmias. Three performance indicators, namely sensitivity, specificity, and accuracy, are used to assess the IEMMC method for ECG arrhythmias. Compared with the K-means and iterSVR algorithms, the IEMMC algorithm shows better performance not only in clustering results but also in global search ability and convergence, which demonstrates its effectiveness for detecting ECG arrhythmias. PMID:23690875

Zhu, Bohui; Ding, Yongsheng; Hao, Kuangrong

2013-01-01

146

An incremental affinity propagation algorithm and its applications for text clustering

Affinity propagation is an impressive clustering algorithm that was published in Science in 2007. However, the original algorithm cannot directly handle partially known data. To address this issue, a semi-supervised scheme called incremental affinity propagation clustering is proposed in this paper. In the scheme, the pre-known information is represented by adjusting the similarity matrix. Moreover, an incremental study is applied to
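For reference, plain affinity propagation exchanges "responsibility" and "availability" messages over a similarity matrix until a set of exemplars emerges. The sketch below is a generic NumPy version of that base algorithm, not the authors' incremental scheme; their pre-known information would enter by modifying `S` before the call, in a way the abstract does not specify. The damping factor, iteration count and median preference are conventional choices assumed here.

```python
import numpy as np

def affinity_propagation(S: np.ndarray, damping: float = 0.5, iters: int = 200) -> np.ndarray:
    """Plain affinity propagation on an n x n similarity matrix S.

    The diagonal of S holds the preferences (larger values favour more exemplars).
    Returns one cluster label per point, or all -1 if no exemplar emerges.
    """
    n = S.shape[0]
    R = np.zeros((n, n))  # responsibilities r(i, k)
    A = np.zeros((n, n))  # availabilities a(i, k)
    rows = np.arange(n)
    for _ in range(iters):
        # r(i,k) = s(i,k) - max over k' != k of [a(i,k') + s(i,k')]
        AS = A + S
        best = AS.argmax(axis=1)
        first = AS[rows, best]
        AS[rows, best] = -np.inf
        second = AS.max(axis=1)
        R_new = S - first[:, None]
        R_new[rows, best] = S[rows, best] - second
        R = damping * R + (1 - damping) * R_new

        # a(i,k) = min(0, r(k,k) + sum over i' not in {i,k} of max(0, r(i',k)))
        # a(k,k) = sum over i' != k of max(0, r(i',k))
        Rp = np.maximum(R, 0)
        np.fill_diagonal(Rp, np.diag(R))
        A_new = Rp.sum(axis=0)[None, :] - Rp
        self_avail = np.diag(A_new).copy()
        A_new = np.minimum(A_new, 0)
        np.fill_diagonal(A_new, self_avail)
        A = damping * A + (1 - damping) * A_new

    exemplars = np.flatnonzero(np.diag(A) + np.diag(R) > 0)
    if exemplars.size == 0:
        return np.full(n, -1)
    labels = S[:, exemplars].argmax(axis=1)
    labels[exemplars] = np.arange(exemplars.size)
    return labels

X = np.array([[0.0, 0.0], [0.3, 0.1], [0.1, 0.4], [10.0, 10.0], [10.2, 9.9], [9.8, 10.3]])
S = -((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)  # negative squared distance
np.fill_diagonal(S, np.median(S[~np.eye(len(X), dtype=bool)]))  # median preference
labels = affinity_propagation(S)
print(labels)
```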

Xiaohu Shi; Renchu Guan; Liupu Wang; Zhili Pei; Yanchun Liang

2009-01-01

147

Visual group identification method of technical competitors using LinLog graph clustering algorithm

Visualization techniques are powerful tools used by science and technology intelligence analysts to identify groups of technical competitors. Common visualization methods tend to create graphs that meet aesthetic criteria rather than reveal better clusters, so their analysis results may be misleading. A process model for identifying technical groups was presented using the LinLog graph clustering algorithm to find

Hong-qi Han; Xiaomi An; Donghua Zhu; Xuefeng Wang

2011-01-01

148

PMW: a Robust Clustering Algorithm for Mobile Ad-hoc Networks

... by three parameters: Power, Mobility and Workload (PMW), where: Power is the remaining battery power ... a large time interval in the history will most likely have less workload in the future. ... Cluster formation. ... 413-418. [Sheu, 2006] P. Sheu and C. Wang, "A Stable Clustering Algorithm Based on Battery Power ...

Xing, Zhaowen; Gruenwald, Le

149

CLICKS: an effective algorithm for mining subspace clusters in categorical datasets

We present a novel algorithm called CLICKS that finds clusters in categorical datasets based on a search for k-partite maximal cliques. Unlike previous methods, CLICKS mines subspace clusters. It uses a selective vertical method to guarantee complete search. CLICKS outperforms previous approaches by over an order of magnitude and scales better than any of the existing methods for high-dimensional datasets.

Mohammed Javeed Zaki; Markus Peters; Ira Assent; Thomas Seidl

2005-01-01

150

Motivation: Cluster analysis (of gene-expression data) is a useful tool for identifying biologically relevant groups of genes that show similar expression patterns under multiple experimental conditions. Various methods have been proposed for clustering gene-expression data. However, most of these algorithms have several shortcomings for gene-expression data clustering. In the present article, we focus on several shortcomings of conventional

Anindya Bhattacharya; Rajat K. De

2008-01-01

151

A New Method For Galaxy Cluster Detection. I. The Algorithm

Numerous methods for finding clusters at moderate to high redshifts have been proposed in recent years, at wavelengths ranging from radio to X-rays. In this paper we describe a new method for detecting clusters in two-band optical/near-IR imaging data. The method relies upon the observation that all rich clusters, at all redshifts observed so far, appear to have a red

Michael D. Gladders; H. K. C. Yee

2000-01-01

153

The analysis of a simple k-means clustering algorithm

K-means clustering is a very popular clustering technique which is used in numerous applications. Given a set of n data points in R^d and an integer k, the problem is to determine a set of k points in R^d, called centers, so as to minimize the mean squared distance from each data point to its nearest center. A popular heuristic for k-means clustering is Lloyd's
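Lloyd's heuristic, named at the end of the abstract, alternates an assignment step and a centroid-update step until the centers stop moving. A minimal NumPy sketch follows; the random choice of initial centers and the rule of keeping an empty cluster's center in place are assumptions, and the paper's own, faster variant is not reproduced here.

```python
import numpy as np

def lloyd_kmeans(X: np.ndarray, k: int, iters: int = 100, seed: int = 0):
    """Lloyd's heuristic: alternate nearest-center assignment and centroid update.

    Returns (centers, labels, cost), where cost is the mean squared distance
    from each point to its nearest center.
    """
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # Assignment step: nearest center for every point.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: each center moves to the mean of its points (kept if empty).
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                                for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return centers, d2.argmin(axis=1), float(d2.min(axis=1).mean())

X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
centers, labels, cost = lloyd_kmeans(X, k=2)
print(labels, round(cost, 4))
```

Lloyd's method only guarantees a local optimum of the cost, which is why the choice of initial centers matters in practice.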

Tapas Kanungo; David M. Mount; Nathan S. Netanyahu; Christine D. Piatko; Ruth Silverman; Angela Y. Wu

2000-01-01

154

Local feature analysis based clustering algorithm with application to polymer model reduction

We are interested in model reduction for the dynamics of large scale systems that contain a collection of spatially oriented points, in particular, a polymer system. Local feature analysis (LFA) introduces a specific state reduction algorithm that offers a topographic representation. In this paper, we propose a new LFA based clustering algorithm for system model reduction with application to the

Yuzhen Xue; Pete J. Ludovice; Martha A. Grover

2010-01-01

155

The Research of ISOETRP Clustering Algorithm on Optical Mathematical Symbols Recognition

A mathematical symbol recognition approach based on the ISOETRP Clustering Algorithm is proposed in this paper. Mathematical symbols are a special kind of image symbol that differ from ordinary text characters in several respects. Therefore, a dedicated feature-extraction algorithm is first proposed; then two kinds of classifier are used: a Minimum Distance Classifier and a Tree Classifier based

Lihua Li; Rong Wang; Jintao Li; Ge Wang

2009-01-01

156

Performance impact of dynamic parallelism on different clustering algorithms

In this paper, we aim to quantify the performance gains of dynamic parallelism. The newest version ... Clustering refers to taking a large set of objects or elements and organizing them into collections based ... elements in space. In our work, we focus on two types of clustering: K-means and divisive hierarchical

Taufer, Michela

157

A novel word clustering algorithm based on latent semantic analysis

A new approach is proposed for the clustering of words in a given vocabulary. The method is based on a paradigm first formulated in the context of information retrieval, called latent semantic analysis. This paradigm leads to a parsimonious vector representation of each word in a suitable vector space, where familiar clustering techniques can be applied. The distance measure selected

Jerome R. Bellegarda; John W. Butzberger; Yen-Lu Chow; Noah B. Coccaro; Devang Naik

1996-01-01

158

Software Clustering Techniques and the Use of Combined Algorithm

As software systems age, they tend to deviate from their actual design and architecture, and it becomes more and more difficult to manage and maintain them. We explore the idea of software clustering for reverse engineering and re-modularization. Clustering together software artifacts provides an automatic technique for discovering high-level abstract entities within a system. Previous work

M. Saeed; Onaiza Maqbool; Haroon A. Babri; Syed Zahoor Hassan; S. Mansoor Sarwar

2003-01-01

159

Although there are many good collaborative recommendation methods, it is still a challenge to increase the accuracy and diversity of these methods to fulfill users' preferences. In this paper, we propose a novel collaborative filtering recommendation approach based on the K-means clustering algorithm. In the clustering process, we use the artificial bee colony (ABC) algorithm to overcome the local-optimum problem of K-means. We then adopt a modified cosine similarity to compute the similarity between users in the same cluster. Finally, we generate recommendation results for the corresponding target users. Detailed numerical analysis on a benchmark dataset (MovieLens) and a real-world dataset indicates that our new collaborative filtering approach based on user clustering outperforms many other recommendation methods. PMID:24381525
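The abstract does not define its "modified cosine similarity". A common collaborative-filtering variant centres each user's ratings on that user's mean before taking the cosine over co-rated items; the sketch below assumes that reading and assumes that a rating of 0 means "not rated". As in the abstract, similarities are computed only between users in the same cluster; the cluster labels here are hypothetical stand-ins for the ABC-optimized K-means output.

```python
import numpy as np

def modified_cosine(ratings: np.ndarray, u: int, v: int) -> float:
    """Mean-centred cosine similarity between users u and v over co-rated items.

    Assumption: 0 marks an unrated item, and each user's mean is taken over
    the items that user actually rated.
    """
    ru, rv = ratings[u], ratings[v]
    both = (ru > 0) & (rv > 0)
    if not both.any():
        return 0.0
    du = ru[both] - ru[ru > 0].mean()
    dv = rv[both] - rv[rv > 0].mean()
    denom = np.linalg.norm(du) * np.linalg.norm(dv)
    return float(du @ dv / denom) if denom > 0 else 0.0

R = np.array([[5, 4, 0, 1],
              [4, 5, 1, 0],
              [1, 0, 5, 4]])
cluster_of = np.array([0, 0, 1])  # hypothetical user-cluster labels
peers = [v for v in range(len(R)) if v != 0 and cluster_of[v] == cluster_of[0]]
print({v: round(modified_cosine(R, 0, v), 3) for v in peers})  # {1: 0.69}
```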

Ju, Chunhua; Xu, Chonghuan

2013-01-01

160

A hybrid algorithm for clustering of time series data based on affinity search technique.

Time series clustering is an important solution to various problems in numerous fields of research, including business, medical science, and finance. However, conventional clustering algorithms are not practical for time series data because they are essentially designed for static data. This impracticality results in poor clustering accuracy in several systems. In this paper, a new hybrid clustering algorithm is proposed based on the similarity in shape of time series data. Time series data are first grouped as subclusters based on similarity in time. The subclusters are then merged using the k-Medoids algorithm based on similarity in shape. This model has two contributions: (1) it is more accurate than other conventional and hybrid approaches and (2) it determines the similarity in shape among time series data with a low complexity. To evaluate the accuracy of the proposed model, the model is tested extensively using synthetic and real-world time series datasets. PMID:24982966
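The shape-based merge step can be sketched as k-medoids over a precomputed distance matrix. Dynamic time warping (DTW) is used as the shape measure here, which is an assumption: the abstract does not name its distance, and the sketch omits the first, time-based subclustering step. Seeding the medoids with the first k items is a deliberately naive choice.

```python
import numpy as np

def dtw(a: np.ndarray, b: np.ndarray) -> float:
    """Dynamic time warping distance with an absolute-difference local cost."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])

def k_medoids(dist: np.ndarray, k: int, iters: int = 50) -> np.ndarray:
    """Alternating k-medoids on a precomputed distance matrix; returns labels."""
    medoids = np.arange(k)  # naive seeding: the first k items
    for _ in range(iters):
        labels = dist[:, medoids].argmin(axis=1)
        new = medoids.copy()
        for j in range(k):
            members = np.flatnonzero(labels == j)
            if members.size == 0:
                continue
            # New medoid: the member with the smallest total distance to the others.
            new[j] = members[dist[np.ix_(members, members)].sum(axis=1).argmin()]
        if np.array_equal(new, medoids):
            break
        medoids = new
    return dist[:, medoids].argmin(axis=1)

t = np.linspace(0, 2 * np.pi, 40)
ramp = np.linspace(-1, 1, 40)
series = [np.sin(t), ramp, np.sin(t + 0.3), ramp + 0.05, np.sin(t - 0.3)]
dist = np.array([[dtw(a, b) for b in series] for a in series])
labels = k_medoids(dist, k=2)
print(labels)
```

Medoids, unlike means, are always actual members of the data, which is what lets k-medoids work with any pairwise distance, including DTW.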

Aghabozorgi, Saeed; Ying Wah, Teh; Herawan, Tutut; Jalab, Hamid A; Shaygan, Mohammad Amin; Jalali, Alireza

2014-01-01

162

An algorithm for detection and identification of image clusters or "blobs" based on color information for an autonomous mobile robot is developed. The input image data are first processed using a crisp color fuzzifier, a binary smoothing filter, and a median filter. The processed image data are then input to the image-cluster detection and identification program. The program employs the concept of an "elastic rectangle" that stretches so that the whole blob is finally enclosed in a rectangle. A C program was developed to test the algorithm. The algorithm was tested only on 8×8 image data containing different numbers of blobs. The algorithm works very well in detecting and identifying image clusters.
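The "elastic rectangle" behaves like a bounding box that grows until it encloses every pixel of a blob. A minimal sketch of that idea on an already-binarized image follows; it substitutes a breadth-first connected-component search for the original C program, assumes 4-connectivity, and skips the fuzzifier and filtering stages.

```python
from collections import deque
from typing import List, Tuple

def blob_rectangles(img: List[List[int]]) -> List[Tuple[int, int, int, int]]:
    """Return (top, left, bottom, right) for every 4-connected blob of 1-pixels."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for r in range(h):
        for c in range(w):
            if img[r][c] and not seen[r][c]:
                top, left, bottom, right = r, c, r, c
                queue = deque([(r, c)])
                seen[r][c] = True
                while queue:
                    y, x = queue.popleft()
                    # Stretch the rectangle so it still encloses every visited pixel.
                    top, bottom = min(top, y), max(bottom, y)
                    left, right = min(left, x), max(right, x)
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                boxes.append((top, left, bottom, right))
    return boxes

img = [[1, 1, 0, 0, 0, 0, 0, 0],
       [1, 1, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 1, 1, 0],
       [0, 0, 0, 0, 0, 1, 1, 1],
       [0, 0, 0, 0, 0, 0, 1, 0],
       [0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 1, 0, 0, 0, 0, 0],
       [0, 0, 1, 0, 0, 0, 0, 0]]
print(blob_rectangles(img))  # [(0, 0, 1, 1), (2, 5, 4, 7), (6, 2, 7, 2)]
```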

Uy, D.L.

1996-02-01

163

A fast density-based clustering algorithm for real-time Internet of Things stream.

Data streams are continuously generated over time by Internet of Things (IoT) devices. The faster all of this data is analyzed, its hidden trends and patterns discovered, and new strategies created, the faster action can be taken, creating greater value for organizations. Density-based methods are a prominent class of data-stream clustering methods. They can detect arbitrarily shaped clusters, handle outliers, and do not need the number of clusters in advance. Therefore, density-based clustering is a suitable choice for clustering IoT streams. Recently, several density-based algorithms have been proposed for clustering data streams. However, density-based clustering under tight time limits is still challenging. In this paper, we propose a density-based clustering algorithm for IoT streams. The method has a fast processing time, making it suitable for real-time IoT applications. Experimental results show that the proposed approach obtains high-quality results with low computation time on real and synthetic datasets. PMID:25110753
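As an illustration of the density-based properties listed above (arbitrarily shaped clusters, explicit outliers, no preset cluster count), here is classic batch DBSCAN. It is not the authors' streaming algorithm; `eps` and `min_pts` are user-chosen assumptions, and the O(n^2) distance matrix is suitable only for small examples.

```python
import numpy as np

def dbscan(X: np.ndarray, eps: float, min_pts: int) -> np.ndarray:
    """Classic batch DBSCAN; returns one label per point, with -1 for noise."""
    n = len(X)
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    neighbors = [np.flatnonzero(d[i] <= eps) for i in range(n)]  # includes i itself
    core = np.array([len(nb) >= min_pts for nb in neighbors])
    labels = np.full(n, -1)
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or not core[i]:
            continue
        # Grow a new cluster outward from the unassigned core point i.
        labels[i] = cluster
        stack = [i]
        while stack:
            p = stack.pop()
            if not core[p]:
                continue  # border points join a cluster but do not expand it
            for q in neighbors[p]:
                if labels[q] == -1:
                    labels[q] = cluster
                    stack.append(q)
        cluster += 1
    return labels

X = np.array([[0, 0], [0, 0.5], [0.5, 0], [10, 10], [10, 10.5], [10.5, 10], [50, 50]])
print(dbscan(X, eps=1.0, min_pts=3))
```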

Amini, Amineh; Saboohi, Hadi; Wah, Teh Ying; Herawan, Tutut

2014-01-01

164

NASA Astrophysics Data System (ADS)

The watershed-clustering algorithm was adapted for use in multi-dimensional spectral space and was used to define clusters in Hyperspectral Digital Imagery Collection Experiment (HYDICE) data. This algorithm identifies clusters as peaks in a B-dimensional topographic relief, where B is the number of wavelength bands. Image pixel spectra are represented as points in this multi-dimensional space. Analysis is done at increasing values of radiometric resolution, defined by the number of segments into which each wavelength axis is divided. Segmentation of the axes divides the multi-dimensional space into bins, and the number of pixels in each bin is determined. The histogram of the bin populations defines the topography for the watershed analysis. Spectral clusters correspond to mountains or islands on this multi-dimensional surface. The algorithm is analogous to submerging this topography under water and revealing clusters by determining when mountain peaks appear as the water surface is lowered. Testing of this algorithm reveals some surprising features. Although increasing the radiometric resolution (bins per axis) generally causes large clusters to break up into greater numbers of small clusters, this is not always the case. Under some circumstances, the separate clusters can recombine into one large cluster when radiometric resolution is increased. This behavior is caused by the existence of single-pixel voxels, which smooth out the topography, and by the fact that the voxels retain a surprising degree of connectivity, even at high radiometric resolutions. These characteristics of high-dimensional spectral data provide the basis for further development of the watershed algorithm.
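The binning and "lowering water" ideas can be sketched as follows: spectra are quantized into a B-dimensional histogram, and at a given water level the bins still above water are grouped into connected islands. This simplified reading is an assumption, not the authors' code: each band is min-max scaled before binning, and bins count as connected when they touch in any of the 3^B - 1 neighbouring directions.

```python
from collections import Counter, deque
from itertools import product
import numpy as np

def spectral_bins(pixels: np.ndarray, bins_per_axis: int) -> Counter:
    """Quantize each B-band spectrum into a B-dimensional bin and count pixels per bin."""
    lo, hi = pixels.min(axis=0), pixels.max(axis=0)
    scaled = (pixels - lo) / np.where(hi > lo, hi - lo, 1.0)
    idx = np.minimum((scaled * bins_per_axis).astype(int), bins_per_axis - 1)
    return Counter(tuple(int(v) for v in row) for row in idx)

def clusters_at_level(hist: Counter, level: int) -> int:
    """Count connected islands of bins whose population exceeds the water level."""
    above = {b for b, count in hist.items() if count > level}
    if not above:
        return 0
    dims = len(next(iter(above)))
    offsets = [o for o in product((-1, 0, 1), repeat=dims) if any(o)]
    seen, islands = set(), 0
    for start in above:
        if start in seen:
            continue
        islands += 1  # a new peak has emerged above the water surface
        seen.add(start)
        queue = deque([start])
        while queue:
            b = queue.popleft()
            for o in offsets:
                nb = tuple(x + dx for x, dx in zip(b, o))
                if nb in above and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
    return islands

# Two spectral groups plus one intermediate pixel (2 bands, hypothetical values).
pixels = np.array([[0.0, 0.0]] * 20 + [[1.0, 1.0]] * 20 + [[0.5, 0.5]])
print(clusters_at_level(spectral_bins(pixels, 4), 0), clusters_at_level(spectral_bins(pixels, 2), 0))
```

With 4 bins per axis the two groups stay separate, while with 2 bins per axis the coarse binning merges them, a small instance of the resolution dependence the abstract discusses.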

Jellison, Gerard P.; Hemmer, Terrence H.; Wilson, Darryl G.

2002-08-01

165

A Novel Coverage-Preserving Clustering Algorithm for Wireless Sensor Networks

NASA Astrophysics Data System (ADS)

Sensing coverage is one of the crucial characteristics of wireless sensor networks, and it has to be considered in the design of routing protocols. LEACH (Low-Energy Adaptive Clustering Hierarchy) is a significant and representative routing protocol that organizes the sensing nodes by clustering. For LEACH, residual energy should also be considered in order to overcome unequal energy dissipation rates among nodes. Considering both factors, we have proposed a coverage-preserving energy-based clustering algorithm (CEC), which is an improved LEACH. By improving the threshold for cluster-head selection, CEC achieved more effective results than the other baseline protocols.
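LEACH elects cluster heads with a per-round threshold T(n) = p / (1 - p (r mod 1/p)) for nodes that have not yet served as heads in the current epoch of 1/p rounds. The sketch below implements that rule, plus a hypothetical residual-energy scaling to show where an energy term could enter; the abstract does not give CEC's actual threshold.

```python
import random

def leach_threshold(p: float, r: int, eligible: bool) -> float:
    """LEACH cluster-head threshold T(n) = p / (1 - p * (r mod 1/p)) for eligible nodes."""
    if not eligible:  # node was already a cluster head in the current epoch
        return 0.0
    return p / (1 - p * (r % int(round(1 / p))))

def energy_weighted_threshold(p, r, eligible, e_residual, e_initial):
    """Hypothetical energy-aware tweak: scale T(n) by the residual-energy fraction.

    Illustrative only; this is not the CEC threshold, which the abstract does not give.
    """
    return leach_threshold(p, r, eligible) * (e_residual / e_initial)

def elect_heads(nodes, p, r, rng):
    """A node becomes a cluster head when its uniform draw falls below its threshold."""
    return [n["id"] for n in nodes
            if rng.random() < energy_weighted_threshold(p, r, n["eligible"], n["energy"], n["e0"])]

nodes = [{"id": i, "eligible": True, "energy": 1.0 - 0.05 * i, "e0": 1.0} for i in range(20)]
print(elect_heads(nodes, p=0.1, r=0, rng=random.Random(42)))
```

In the last round of an epoch the threshold reaches 1, so every node that has not yet served as a head is elected, which is how LEACH rotates the head role across all nodes.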

Di, Xin

166

Classification of Text Data by Fuzzy Self-Constructing Feature Clustering Algorithm

Feature clustering is a powerful method to reduce the dimensionality of feature vectors for text classification. In this paper, we propose a fuzzy similarity-based self-constructing algorithm for feature clustering. The words in the feature vector of a document set are grouped into clusters based on a similarity test: words that are similar to each other are grouped into the same cluster. Each cluster is characterized by a membership function with a statistical mean and deviation. When all the words have been fed in, a desired number of clusters is formed automatically. We then have one extracted feature for each cluster. The extracted feature corresponding to a cluster is a weighted combination of the words contained in the cluster. With this algorithm, the derived membership functions closely match and properly describe the real distribution of the training data. Besides, the user need not specify the number of extracted features in advance, so trial-and-error for determining the appropriate number of extracted features can be avoided. Experimental results show that our method runs faster and obtains better extracted features than other methods.
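The self-constructing procedure can be sketched as follows: each incoming word pattern either joins the cluster in which its membership is highest, if that membership passes a threshold, or starts a new cluster, so the number of clusters emerges from the data. The Gaussian-product membership, the threshold `rho`, and the additive deviation floor `sigma0` are assumptions for this sketch, and each word is assumed to be represented by a small numeric pattern vector (for example, its distribution over classes).

```python
import numpy as np

class SelfConstructingClusters:
    """Incremental fuzzy clustering of word patterns (a sketch, not the paper's code).

    Assumptions: Gaussian-product membership, a fixed join threshold rho,
    and deviation = standard deviation + sigma0 so a new cluster is never degenerate.
    """

    def __init__(self, rho: float = 0.5, sigma0: float = 0.25):
        self.rho, self.sigma0 = rho, sigma0
        self.members = []  # one list of pattern vectors per cluster

    def _stats(self, pts):
        arr = np.array(pts)
        return arr.mean(axis=0), arr.std(axis=0) + self.sigma0

    def membership(self, x: np.ndarray, k: int) -> float:
        mean, dev = self._stats(self.members[k])
        return float(np.prod(np.exp(-((x - mean) / dev) ** 2)))

    def add(self, x) -> int:
        x = np.asarray(x, dtype=float)
        scores = [self.membership(x, k) for k in range(len(self.members))]
        if scores and max(scores) >= self.rho:
            k = int(np.argmax(scores))
            self.members[k].append(x)  # join the most similar cluster
        else:
            k = len(self.members)
            self.members.append([x])   # nothing similar enough: start a new cluster
        return k

sc = SelfConstructingClusters()
labels = [sc.add(x) for x in ([0.9, 0.1], [0.85, 0.15], [0.1, 0.9], [0.15, 0.85])]
print(labels, len(sc.members))
```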

Y. Ratna Kumari; R. Siva Ranjani

167

An Efficient k-Means Clustering Algorithm: Analysis and Implementation

In k-means clustering, we are given a set of n data points in d-dimensional space R^d and an integer k, and the problem is to determine a set of k points in R^d, called centers, so as to minimize the mean squared distance from each data point to its nearest center. A popular heuristic for k-means clustering is Lloyd's (1982)

2002-01-01

168

Maximum Class Separability for Rough-Fuzzy C-Means Based Brain MR Image Segmentation

... of brain MR images. The RFCM algorithm comprises a judicious integration of ... rough sets, fuzzy sets ... with vagueness and incompleteness in class definition of brain MR images, the membership function of fuzzy sets ...

Maji, Pradipta; Pal, Sankar Kumar
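RFCM builds on standard fuzzy c-means (FCM), which alternates membership-weighted center updates with membership updates. The sketch below is plain FCM only: the rough-set lower and upper approximations that distinguish RFCM are not modelled, and the fuzzifier m = 2 and the tolerance are conventional assumptions.

```python
import numpy as np

def fuzzy_c_means(X: np.ndarray, c: int, m: float = 2.0, iters: int = 100,
                  tol: float = 1e-6, seed: int = 0):
    """Standard fuzzy c-means; returns (centers, U), U[i, k] = membership of point i in cluster k."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)  # each point's memberships sum to 1
    for _ in range(iters):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]  # membership-weighted means
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)  # guard against a point sitting exactly on a center
        # u_ik = 1 / sum_j (d_ik / d_ij) ** (2 / (m - 1))
        U_new = 1.0 / ((d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))).sum(axis=2)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U

# 1-D grey levels standing in for MR intensities (hypothetical data).
X = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]])
centers, U = fuzzy_c_means(X, c=2)
hard_labels = U.argmax(axis=1)  # defuzzify by highest membership
print(np.round(centers.ravel(), 3), hard_labels)
```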

169

Text Clustering using a WordNet-based Knowledge-Base and the Lesk Algorithm

In this paper we propose a text clustering method based on a well-known Word Sense Disambiguation (WSD) algorithm, the Lesk algorithm, to classify textual data through highly accurate WSD. The clustering of text data is thus based primarily on the context or meaning of the words used for clustering. The Lesk algorithm returns the sense identifiers for the words used to classify the text files by looking up the senses of a word in a Knowledge-Base similar to the English WordNet (enriched with more informative columns or fields for each synset [synonym set] of the English WordNet database), which greatly increases the chances of contextual overlap and thereby yields highly accurate identification of the proper sense or context of the words. The proposed scheme has been tested on a number of heterogeneous text document datasets. The clustering results and accuracies obtained using the proposed scheme have been compared with the results obtained using the K-means clustering algorithm on the Vector Space Models generated for all the heterogeneous textual datasets. Experimental results show that our algorithm performs much better than the Vector Space Model (VSM) and K-means based approach. The technique should thus help users search for meaningful contextual information in a highly diversified collection of textual information, a key task in addressing the information overload problem.
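The core of the (simplified) Lesk algorithm is a word-overlap count between a word's context and each candidate sense's gloss. A minimal sketch follows; the two-sense dictionary and its sense identifiers are hypothetical stand-ins for the enriched WordNet-like knowledge base, and no stop-word removal or stemming is applied.

```python
def simplified_lesk(word: str, context: str, glosses: dict) -> str:
    """Pick the sense whose gloss shares the most words with the context.

    `glosses` maps sense identifiers to gloss strings.
    """
    ctx = set(context.lower().split()) - {word}
    best, best_overlap = None, -1
    for sense, gloss in glosses.items():
        overlap = len(ctx & set(gloss.lower().split()))
        if overlap > best_overlap:
            best, best_overlap = sense, overlap
    return best

# Hypothetical mini knowledge base with two senses of "bank".
bank_senses = {
    "bank#river": "sloping land beside a body of water such as a river",
    "bank#finance": "a financial institution that accepts deposits and lends money",
}
print(simplified_lesk("bank", "she deposits money at the bank every week", bank_senses))
# bank#finance
```

Documents could then be clustered on the resulting sense identifiers instead of raw word forms, which is the substitution the abstract describes.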

Jyotirmayee Choudhury; Deepesh Kumar Kimtani; Alok Chakrabarty

170

Fire Detection with Video Using Fuzzy c-Means and Back-Propagation Neural Network

In this paper, we propose an effective method that detects fire automatically. The proposed algorithm is composed of four stages. In the first stage, an approximate median method is used to detect moving regions. In the second stage, a fuzzy c-means (FCM) algorithm based on the color of fire is used to select candidate fire regions from these moving regions.
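The approximate median method of the first stage keeps a background image that steps one grey level toward each new frame, so it drifts toward a running median of the pixel values. A minimal sketch, assuming greyscale `uint8` frames and a hypothetical difference threshold of 30 grey levels:

```python
import numpy as np

def approximate_median_step(background: np.ndarray, frame: np.ndarray, threshold: int = 30):
    """One update of an approximate-median background model for greyscale frames.

    Returns (new_background, moving_mask). Pixels that differ from the current
    background by more than `threshold` grey levels are flagged as moving; the
    background then steps one grey level toward the frame at every pixel.
    """
    bg = background.astype(np.int16)
    fr = frame.astype(np.int16)
    moving = np.abs(fr - bg) > threshold
    bg += np.sign(fr - bg).astype(np.int16)
    return bg.astype(np.uint8), moving

bg0 = np.zeros((4, 4), dtype=np.uint8)
frame = bg0.copy()
frame[1:3, 1:3] = 200  # a bright moving patch
bg1, mask = approximate_median_step(bg0, frame)
print(mask.astype(int))
```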

Tung Xuan Truong; Jong-Myon Kim

2011-01-01

171

Fuzzy Variant of Affinity Propagation in Comparison to Median Fuzzy c-Means

In this paper we extend the crisp Affinity Propagation (AP) cluster algorithm to a fuzzy variant. AP is a message-passing algorithm based on max-sum optimization over factor graphs. It is therefore also applicable to data sets for which only dissimilarities are known, which may be asymmetric. The proposed Fuzzy Affinity Propagation algorithm (FAP) returns fuzzy assignments to the cluster prototypes

Tina Geweniger; D. Zühlke; Barbara Hammer; Thomas Villmann

2009-01-01

172

Minimum Energy Structures of Ni, Au and NiAl Clusters: A Genetic Algorithm

The lowest-energy structures of (Ni)_n, (Au)_n and (NiAl)_2n (with n up to 100) clusters were obtained through a genetic algorithm using the semi-empirical many-body Gupta potential to model the interatomic interaction. A variety of structure types are observed in all three classes of clusters, with icosahedral structural motifs appearing repeatedly. Global minima are generally more difficult to find
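In a genetic-algorithm search like this one, the fitness of each candidate structure is its potential energy. One common form of the Gupta (second-moment tight-binding) potential is sketched below for a single element: each atom contributes a pairwise repulsive sum minus the square root of a band-energy sum. The parameter values in the check are illustrative placeholders, not fitted Ni or Au parameters, and alloy clusters would need pair-dependent parameters.

```python
import numpy as np

def gupta_energy(pos: np.ndarray, A: float, xi: float, p: float, q: float, r0: float) -> float:
    """Total Gupta energy of a single-element cluster with atomic positions `pos` (n x 3).

    E = sum_i [ sum_{j != i} A exp(-p (r_ij/r0 - 1))
                - sqrt( sum_{j != i} xi^2 exp(-2 q (r_ij/r0 - 1)) ) ]
    """
    r = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=2)
    off = ~np.eye(len(pos), dtype=bool)  # exclude the i == j terms
    x = np.where(off, r / r0 - 1.0, 0.0)
    repulsive = np.where(off, A * np.exp(-p * x), 0.0).sum(axis=1)
    band = np.where(off, xi ** 2 * np.exp(-2.0 * q * x), 0.0).sum(axis=1)
    return float((repulsive - np.sqrt(band)).sum())

# Dimer at the reference distance r0, with placeholder parameters.
dimer = np.array([[0.0, 0.0, 0.0], [2.5, 0.0, 0.0]])
print(gupta_energy(dimer, A=0.1, xi=1.0, p=10.0, q=3.0, r0=2.5))
```

For the dimer at r = r0 each atom contributes A - xi, so the total is 2(A - xi); a GA would minimize this energy over atomic coordinates.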

Alvaro Posada-Amarillas; Roy L. Johnston; Lesley Lloyd; Thomas Mortimer-Jones; Oliver Paz-Borbón

2003-01-01

173

This paper proposes a self-organized genetic algorithm for text clustering based on ontology method. The common problem in the fields of text clustering is that the document is represented as a bag of words, while the conceptual similarity is ignored. We take advantage of thesaurus-based and corpus-based ontology to overcome this problem. However, the traditional corpus-based method is rather difficult

Wei Song; Cheng Hua Li; Soon Cheol Park

2009-01-01

174

Clicks: An effective algorithm for mining subspace clusters in categorical datasets

We present a novel algorithm called Clicks that finds clusters in categorical datasets based on a search for k-partite maximal cliques. Unlike previous methods, Clicks mines subspace clusters. It uses a selective vertical method to guarantee complete search. Clicks outperforms previous approaches by over an order of magnitude and scales better than any of the existing methods for high-dimensional datasets.

Mohammed J. Zaki; Markus Peters; Ira Assent; Thomas Seidl

2007-01-01

175

A reliable cluster detection technique using photometric redshifts: introducing the 2TecX algorithm

NASA Astrophysics Data System (ADS)

We present a new cluster detection algorithm designed for finding high-redshift clusters using optical/infrared imaging data. The algorithm has two main characteristics. First, it utilizes each galaxy's full redshift probability function, instead of an estimate of the photometric redshift based on the peak of the probability function and an associated Gaussian error. Second, it identifies cluster candidates through cross-checking the results of two substantially different selection techniques (the name 2TecX representing the cross-check of the two techniques). These are adaptations of the Voronoi Tessellation and Friends-Of-Friends methods. Monte Carlo simulations of mock catalogues show that cross-checking the cluster candidates found by the two techniques significantly reduces the detection of spurious sources. Furthermore, we examine the selection effects and relative strengths and weaknesses of either method. The simulations also allow us to fine-tune the algorithm's parameters, and define completeness and mass limit as a function of redshift. We demonstrate that the algorithm isolates high-redshift clusters at a high level of efficiency and low contamination.
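Friends-of-friends, one of the two techniques 2TecX adapts, links any two objects closer than a linking length and treats the resulting connected groups as candidates, so membership is transitive. A generic union-find sketch follows; the linking length and plain Euclidean distance are assumptions, and the redshift-probability weighting that 2TecX adds is not modelled.

```python
import numpy as np

def friends_of_friends(pos: np.ndarray, linking_length: float) -> np.ndarray:
    """Group points so that any two within the linking length share a group (transitively)."""
    n = len(pos)
    parent = list(range(n))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i in range(n):
        d = np.linalg.norm(pos[i + 1:] - pos[i], axis=1)
        for j in np.flatnonzero(d <= linking_length) + i + 1:
            parent[find(i)] = find(int(j))  # union the two friends' groups
    roots = [find(i) for i in range(n)]
    _, labels = np.unique(roots, return_inverse=True)
    return labels

pos = np.array([[0, 0], [1, 0], [2, 0], [10, 0], [10.5, 0]], dtype=float)
print(friends_of_friends(pos, linking_length=1.2))
```

Points 0 and 2 are 2 units apart, beyond the linking length, yet share a group through point 1; this chaining is characteristic of FoF.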

van Breukelen, Caroline; Clewley, Lee

2009-06-01

176

A Fast Clustering Algorithm with Application to Cosmology

... are the connected components of a level set S_c = {f > c}, where f is the probability density function. We use kernel ... {f > c}, where f is a nonparametric density estimator. For example, kernel density estimators ... the subset of data belonging to the level set, and then find clusters by agglomerating the data points ...
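The level-set idea can be sketched by estimating the density at each data point with a Gaussian kernel, keeping the points whose estimate exceeds c, and linking kept points that lie within the bandwidth h of each other into connected components. The linking rule, and the use of h as both bandwidth and linking scale, are assumptions; the truncated source does not specify them.

```python
import numpy as np

def level_set_clusters(X: np.ndarray, h: float, c: float) -> np.ndarray:
    """Cluster the points whose Gaussian-KDE density exceeds c.

    Kept points closer than h are linked into connected components;
    points outside the level set get label -1.
    """
    n, dim = X.shape
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    # Gaussian kernel density estimate evaluated at every data point.
    f_hat = np.exp(-d2 / (2 * h * h)).sum(axis=1) / (n * (2 * np.pi * h * h) ** (dim / 2))
    keep = np.flatnonzero(f_hat > c)
    labels = np.full(n, -1)
    next_label = 0
    for s in keep:
        if labels[s] != -1:
            continue
        labels[s] = next_label
        stack = [s]
        while stack:  # agglomerate everything reachable from s inside the level set
            i = stack.pop()
            for j in keep:
                if labels[j] == -1 and d2[i, j] <= h * h:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels

blob = np.array([[0, 0], [0.1, 0], [0, 0.1], [0.1, 0.1], [0.05, 0.05]])
X = np.vstack([blob, blob + 5, [[20, 20]]])
print(level_set_clusters(X, h=0.5, c=0.1))
```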

177

Subquadratic Approximation Algorithms for Clustering Problems in High Dimensional Spaces

One of the central problems in information retrieval, data mining, computational biology, statistical analysis, computer vision, geographic analysis, pattern recognition, distributed protocols is the question of classification of data according to some clustering rule. Often the data is noisy and even approximate classification is of extreme importance. The difficulty of such classification stems from the fact that usually the data

Allan Borodin; Rafail Ostrovsky; Yuval Rabani

2004-01-01

178

Hypergraph Models and Algorithms for Data-Pattern-Based Clustering

In traditional approaches for clustering market basket type data, relations among transactions are modeled according to the items occurring in these transactions. However, an individual item might induce different relations in different contexts. Since such contexts might be captured by interesting patterns in the overall data, we represent each transaction as a set of patterns through modifying the conventional pattern

Muhammet Mustafa Ozdal; Cevdet Aykanat

2004-01-01

179

Point Cloud Simplification Based on an Affinity Propagation Clustering Algorithm

Point cloud simplification is an important step in reverse engineering and computer vision. Nowadays many researchers work directly on point sets rather than polygonal meshes, but several problems remain, such as time cost, memory cost and accuracy. This paper proposes a novel method for point cloud simplification by integrating both re-sampling and Affinity Propagation Clustering. The advantage

Lanlan Li; S. Y. Chen; Qiu Guan; Xiaoyan Du; Z. Z. Hu

2009-01-01

180

Clustering is an important mechanism that efficiently provides information for mobile nodes and improves the processing capacity of routing, bandwidth allocation, and resource management and sharing. Clustering algorithms can be based on such criteria as the battery power of nodes, mobility, network size, distance, speed and direction. Above all, in order to achieve good clustering performance, overhead should be minimized, allowing mobile nodes to join and leave without perturbing the membership of the cluster while preserving the current cluster structure as much as possible. This paper proposes a Fuzzy Relevance-based Cluster head selection Algorithm (FRCA) to address problems in existing wireless mobile ad hoc sensor networks, such as node distributions that change dynamically due to mobility, flat structures, and disturbance of cluster formation. The proposed mechanism uses fuzzy relevance to select the cluster head for clustering in wireless mobile ad hoc sensor networks. In the simulation implemented on the NS-2 simulator, the proposed FRCA is compared with algorithms such as the Cluster-based Routing Protocol (CBRP), the Weighted-based Adaptive Clustering Algorithm (WACA), and the Scenario-based Clustering Algorithm for Mobile ad hoc networks (SCAM). The simulation results showed that the proposed FRCA achieves better performance than the other existing mechanisms. PMID:22163905

Lee, Chongdeuk; Jeong, Taegwon

2011-01-01

181

An Auto-Recognizing System for Dice Games Using a Modified Unsupervised Grey Clustering Algorithm

In this paper, a novel identification method based on a machine vision system is proposed to recognize the score of dice. The system employs image processing techniques and the modified unsupervised grey clustering algorithm (MUGCA) to estimate the location of each die and identify the spot number accurately and effectively. The proposed algorithms replace manual recognition. The experimental results show that the system performs well, offering flexibility, high speed, and high accuracy.

Huang, Kuo-Yi

2008-01-01

182

A Fuzzy Adaptive Request Distribution Algorithm for Cluster-based Web Systems

This paper presents a novel algorithm for distribution of user requests sent to a Web-server cluster driven by a Web switch. Our algorithm called FARD (fuzzy adaptive request distribution) is a client-and-server-aware, dynamic and adaptive dispatching policy. It assigns each incoming request to the server with the least expected response time, estimated for that individual request. To estimate the expected

Leszek Borzemski; Krzysztof Zatwarnicki

2003-01-01

183

A Clustering Based Niching Method for Evolutionary Algorithms

... the biological concept of species in separate ecological niches to EA to preserve diversity. To model species we ... -valued two-dimensional test functions [3]. The performance is measured by the number of optima each algorithm ... evaluation than the ES-based methods. It also shows that the MS-HC performs well on these simple test ...

Zell, Andreas

184

A Hierarchical Clustering Algorithm Based on the Hungarian Method

... the pairwise distance information, d_{i,j}, into an affinity matrix W whose entries are given by W_{i,j} = e^{-d ...} ... in [12] is based on the loopy Belief Propagation algorithm that is applied on the affini...

Tassa, Tamir

185

A Distributed Algorithm for Content-Aware Web Server Clusters

While content-aware distribution policies are becoming more popular in cluster-based web systems, they make the dispatching node a bottleneck. To address the scalability and fault-tolerance problems, this paper discusses the design of distributed dispatching policies. For policies aimed at improving the cache hit rate, a distributed dispatching policy named DWARD (distributed workload-aware request distribution) that takes into account both the

Du, Zeng-Kai; Zheng, Ming-Yang; Ju, Jiu-Bin

186

Subquadratic approximation algorithms for clustering problems in high dimensional spaces

One of the central problems in information retrieval, data mining, computational biology, statistical analysis, computer vision, geographic analysis, pattern recognition, distributed protocols is the question of classification of data according to some clustering rule. Often the data is noisy and even approximate classification is of extreme importance. The difficulty of such classification stems from the fact that usually the data has many incomparable attributes, and often results in the...

Allan Borodin; Rafail Ostrovsky; Yuval Rabani

1999-01-01

187

Lowest Weight: Reactive Clustering Algorithm for Adhoc Networks

In this paper, we address clustering in ad hoc networks. Ad hoc networks are a wireless networking paradigm in which mobile hosts rely on each other to keep the network connected without the help of any pre-existing infrastructure or central administrator. Thus, additional features pertinent to this type of network appeared. In fact, centralized solutions are generally unsuitable due to

Mohamed Elhoucine Elhdhili; Lamia Ben Azzouz; Farouk Kamoun

2006-01-01

188

Ontology Employment in Text Document Clustering combined with Grouping Algorithm

Incorporating semantic knowledge from ontologies into text document clustering is an important but challenging problem. Moreover, many computer-science and medical papers and journals are available on the Internet. The purpose of this system is to cluster these documents using a statistical method and, from the semantic web point of view, to advance the field of scientific endeavor. This system is an advanced and extended version of our previously published work. As the amount of test data grew larger and larger, we found that our previous methods needed to be improved on a more mathematical basis. Finally, the paper reports on experiments performed to test the system's weighting scheme, which is used to encode the importance of concepts inside documents. For the experiments, the system uses an ontology that enables us to describe and organize documents from heterogeneous sources and to cluster them. The experiments indicate that even as the number of test documents increases, the system can still produce useful results.

Hmway Hmway Tar; Ayetharyar Taunggyi; Pye Phyo Oo

189

A cluster finding algorithm based on the multiband identification of red sequence galaxies

NASA Astrophysics Data System (ADS)

We present a new algorithm, CAMIRA, to identify clusters of galaxies in wide-field imaging survey data. We base our algorithm on the stellar population synthesis model to predict colours of red sequence galaxies at a given redshift for an arbitrary set of bandpass filters, with additional calibration using a sample of spectroscopic galaxies to improve the accuracy of the model prediction. We run the algorithm on ~11,960 deg² of imaging data from the Sloan Digital Sky Survey (SDSS) Data Release 8 to construct a catalogue of 71,743 clusters in the redshift range 0.1 < z < 0.6 with richness greater than 20 after correcting for the incompleteness of the richness estimate. We cross-match the cluster catalogue with external cluster catalogues to find that our photometric cluster redshift estimates are accurate with low bias and scatter, and that the corrected richness correlates well with X-ray luminosities and temperatures. We use the publicly available Canada-France-Hawaii Telescope Lensing Survey shear catalogue to calibrate the mass-richness relation from stacked weak lensing analysis. Stacked weak lensing signals are detected with high significance for eight subsamples of the SDSS clusters divided by redshift and richness bins, which are then compared with model predictions including miscentring effects to constrain mean halo masses of individual bins. We find the richness correlates well with the halo mass, such that the corrected richness limit of 20 corresponds to the cluster virial mass limit of about 1 × 10^14 h^-1 M☉ for the SDSS DR8 cluster sample.

Oguri, Masamune

2014-10-01

190

Clustering Algorithm Based on Self-Organizing Queue for Data Mining, Social Networks and …

… applications, social networks, such as Facebook and Twitter, machine learning and artificial intelligence … characteristics, facilitating the analysis and seamless functioning of social networks (Facebook, Twitter, Linked…) … that solves clustering problems. Researchers at the University of Florida developed the algorithm after making …

Wu, Dapeng Oliver

191

The Application of a Simulated Annealing Fuzzy Clustering Algorithm for Cancer Diagnosis

… of seven sets of FTIR spectra data which have been taken from three oral cancer patients. With no prior … Spectroscopy (FTIR) is becoming a powerful tool for use in the study of biomedical conditions, including cancer …

Xiao Ying …; Aickelin, Uwe

192

… The difference between these implementations stems from how the Hadoop and Granules runtimes (1) support … recognition, identification of abnormal cell clusters for cancer detection, and bioinformatics, among others … that an algorithm will get stuck in local optima, never finding the optimal solution. Attempting to converge …

Pallickara, Shrideep

193

Conceptual Clustering Using Lingo Algorithm: Evaluation on Open Directory Project Data

Search results clustering problem is defined as an automatic, on-line grouping of similar documents in a search hits list, returned from a search engine. In this paper we present the results of an experimental evaluation of a new algorithm named Lingo. We use Open Directory Project as a source of high-quality narrow-topic document references and mix them into several

Stanislaw Osinski; Dawid Weiss

2004-01-01

194

Algorithms for Compiler-Assisted Design Space Exploration of Clustered VLIW ASIP Datapaths

Clustered Very Long Instruction Word Application-Specific Instruction Set Processors (VLIW ASIPs) combined with effective compilation techniques enable aggressive exploitation of the instruction level parallelism inherent in many embedded media applications, while unlocking a variety of possible performance/cost tradeoffs. In this dissertation we propose and validate an algorithm to support early design space exploration (DSE) over classes of datapaths, in the

Viktor Lapinskii

2001-01-01

195

Consistency Clustering: A Robust Algorithm for Group-wise Registration, Segmentation and Automatic …

… white-matter atlas from a set of multi-subject diffusion weighted MR images. We formulate the atlas creation … Keywords: Segmentation; Tractography; Diffusion imaging; White matter atlas. … The human brain …

196

We show that fundamental versions of the Deutsch-Jozsa and Bernstein-Vazirani quantum algorithms can be performed using a small entangled cluster state resource of only six qubits. We then investigate the minimal resource states needed to demonstrate general n-qubit versions and a scalable method to produce them. For this purpose, we propose a versatile photonic on-chip setup.

Tame, M. S.; Kim, M. S. [QOLS, Blackett Laboratory, Imperial College London, Prince Consort Road, SW7 2BW (United Kingdom); Institute for Mathematical Sciences, Imperial College London, SW7 2PG (United Kingdom)]

2010-09-15

197

In observational studies, unbalanced observed covariates between treatment groups often cause biased inferences on the estimation of treatment effects. Recently, generalized propensity score (GPS) has been proposed to overcome this problem; however, a practical technique to apply the GPS is lacking. This study demonstrates how clustering algorithms can be used to group similar subjects based on transformed GPS. We compare

Chunhao Tu; Shuo Jiao; Woon Yuen Koh

2012-01-01

198

Clustering online social network communities using genetic algorithms

… effectively, and socialize across borders and in general, maintain a second avatar in cyberspace. The way blog … edge indicating individual node participating in more than one discussion group or activity …

Hajeer, Mustafa H.; Singh, Alka; Sanyal, Sugata

199

Impact of Mobility Prediction on the Temporal Stability of MANET Clustering Algorithms*

… Scalability issues for routing in mobile ad hoc networks (MANETs) have been typically addressed … schemes have been proposed to dynamically identify and maintain hierarchy in MANETs. To achieve …

Gautam, Natarajan

200

Improved semidefinite branch-and-bound algorithm for k-cluster

… to be effective for Max-Cut may be conceptually easy, …

Krislock, Nathan; Malick, Jérôme; Fr… [INRIA Grenoble Rhône-Alpes; CNRS; Université de Paris-Sud XI]

201

A cluster finding algorithm based on the multi-band identification of red-sequence galaxies

We present a new algorithm, CAMIRA, to identify clusters of galaxies in wide-field imaging survey data. We base our algorithm on the stellar population synthesis model to predict colours of red-sequence galaxies at a given redshift for an arbitrary set of bandpass filters, with additional calibration using a sample of spectroscopic galaxies to improve the accuracy of the model prediction. We run the algorithm on ~11 960 deg² of imaging data from the Sloan Digital Sky Survey (SDSS) Data Release 8 to construct a catalogue of 71 743 clusters in the redshift range 0.1 …

Oguri, Masamune

2014-01-01

202

NEW MDS AND CLUSTERING BASED ALGORITHMS FOR PROTEIN MODEL QUALITY ASSESSMENT AND SELECTION

In protein tertiary structure prediction, assessing the quality of predicted models is an essential task. Over the past years, many methods have been proposed for the protein model quality assessment (QA) and selection problem. Despite significant advances, the discerning power of current methods is still unsatisfactory. In this paper, we propose two new algorithms, CC-Select and MDS-QA, based on multidimensional scaling and k-means clustering. For the model selection problem, CC-Select combines consensus with clustering techniques to select the best models from a given pool. Given a set of predicted models, CC-Select first calculates a consensus score for each structure based on its average pairwise structural similarity to other models. Then, similar structures are grouped into clusters using multidimensional scaling and clustering algorithms. In each cluster, the one with the highest consensus score is selected as a candidate model. For the QA problem, MDS-QA combines single-model scoring functions with consensus to determine a more accurate assessment score for every model in a given pool. Using extensive benchmark sets of a large collection of predicted models, we compare the two algorithms with existing state-of-the-art quality assessment methods and show significant improvement. PMID:24808625

Wang, Qingguo; Shang, Charles; Xu, Dong

2014-01-01
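
The selection step of CC-Select described above can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes a precomputed symmetric pairwise structural-similarity matrix `sim`. It also assumes the cluster labels were already produced by the multidimensional scaling and k-means step, which is omitted here. The helper names `consensus_scores` and `select_per_cluster` are hypothetical.

```python
from typing import Dict, List, Sequence


def consensus_scores(sim: Sequence[Sequence[float]]) -> List[float]:
    """Average pairwise similarity of each model to all other models (needs >= 2 models)."""
    n = len(sim)
    return [
        sum(sim[i][j] for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]


def select_per_cluster(scores: Sequence[float], labels: Sequence[int]) -> Dict[int, int]:
    """For each cluster label, keep the index of the model with the highest consensus score."""
    best: Dict[int, int] = {}
    for idx, (score, label) in enumerate(zip(scores, labels)):
        if label not in best or score > scores[best[label]]:
            best[label] = idx
    return best


# Hypothetical similarity matrix for four predicted models.
sim = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
]
candidates = select_per_cluster(consensus_scores(sim), labels=[0, 0, 1, 1])
```

In this example, models 0 and 1 form one cluster and models 2 and 3 another. The sketch keeps model 1 and model 2, the members with the highest average similarity to the whole pool in each cluster.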

203

A new AntTree-based algorithm for clustering short-text corpora

… Research work on "short-text clustering" is a very … Keywords: Short-text clustering, Bio-inspired algorithms, AntTree, Internal Validity Measures, Silhouette

Errecalde, Marcelo Luis; Diego …; Rosso, Paolo [… Valencia, Spain]

204

MixSim: An R Package for Simulating Data to Study Performance of Clustering Algorithms

The R package MixSim is a new tool that allows simulating mixtures of Gaussian distributions with different levels of overlap between mixture components. Pairwise overlap, defined as a sum of two misclassification probabilities, measures the degree of interaction between components and can be readily employed to control the clustering complexity of datasets simulated from mixtures. These datasets can then be used for systematic performance investigation of clustering and finite mixture modeling algorithms. Other capabilities of MixSim include computing the exact overlap for Gaussian mixtures, simulating Gaussian and non-Gaussian data, simulating outliers and noise variables, calculating various measures of agreement between two partitionings, and constructing parallel distribution plots for the graphical display of finite mixture models. All features of the package are illustrated in great detail. The utility of the package is highlighted through a small comparison study of several popular clustering algorithms.

Melnykov, Volodymyr [University of Alabama, Tuscaloosa]; Chen, Wei-Chen [ORNL]; Maitra, Ranjan [Iowa State University]

2012-01-01
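
The sketch below computes pairwise overlap exactly in the simplest case, to make the notion concrete: two univariate Gaussian components with a common standard deviation. Overlap is the sum of the two misclassification probabilities ω(2|1) + ω(1|2). Here ω(2|1) is the probability that a draw from component 1 has a larger weighted density τ₂φ₂ than τ₁φ₁. With equal variances, the Bayes boundary is a single point, so both terms reduce to normal CDFs. This is an illustration only. MixSim is an R package that handles the general multivariate case, and `pairwise_overlap_1d` is a hypothetical helper, not part of its API.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def pairwise_overlap_1d(mu1: float, mu2: float, sigma: float,
                        tau1: float = 0.5, tau2: float = 0.5) -> float:
    """Exact overlap omega(1|2) + omega(2|1) for two 1-D Gaussian components
    sharing the standard deviation sigma, with mixing weights tau1 and tau2."""
    if mu1 == mu2:
        raise ValueError("components must have distinct means")
    if mu1 > mu2:  # order the components so that component 2 lies to the right
        mu1, mu2, tau1, tau2 = mu2, mu1, tau2, tau1
    # Decision boundary where tau1 * phi1(x) == tau2 * phi2(x).
    x_star = 0.5 * (mu1 + mu2) + sigma ** 2 * math.log(tau2 / tau1) / (mu1 - mu2)
    omega_2_given_1 = 1.0 - normal_cdf((x_star - mu1) / sigma)  # comp-1 draws assigned to 2
    omega_1_given_2 = normal_cdf((x_star - mu2) / sigma)        # comp-2 draws assigned to 1
    return omega_1_given_2 + omega_2_given_1
```

For means 0 and 2, unit variance and equal weights, the boundary sits at 1 and the overlap is 2(1 - Φ(1)) ≈ 0.317. Moving the means further apart lowers it.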

205

BMI optimization by using parallel UNDX real-coded genetic algorithm with Beowulf cluster

NASA Astrophysics Data System (ADS)

This paper deals with a global optimization algorithm for Bilinear Matrix Inequalities (BMIs) based on the Unimodal Normal Distribution Crossover (UNDX) GA. First, by analyzing the structure of the BMIs, we confirm the existence of typical difficult structures. Then, to improve the performance of the algorithm, based on the results of the problem-structure analysis and on the characteristic properties of BMIs, we propose an algorithm that uses a primary search direction together with a relaxed Linear Matrix Inequality (LMI) convex estimation. Moreover, in these algorithms, we propose two types of evaluation methods for GA individuals based on LMI calculations that take the characteristic properties of BMIs further into account. In addition, to reduce computational time, we propose parallelizing the real-coded GA (RCGA) using a Master-Worker paradigm with cluster computing.

Handa, Masaya; Kawanishi, Michihiro; Kanki, Hiroshi

2007-12-01

206

In this paper, we propose a self-adaptive migration rule for the macro-micro evolutionary algorithm, which was proposed to find several local optima for multimodal optimization problems. The algorithm consists of two evolutionary algorithms which control global species and local individuals, respectively. To keep the diversity explicitly, we incorporate a clustering method to divide individuals into several species. Clustering method based on

Sang-Keon Oh; Min-Soeng Kim; Ju-Jang Lee

2001-01-01

207

Distance-based clustering algorithms can group genes that show similar expression values under multiple experimental conditions. However, they are unable to identify a group of genes that have a similar pattern of variation in their expression values. Previously we developed an algorithm called divisive correlation clustering algorithm (DCCA) to tackle this situation, which is based on the concept of correlation clustering. But

Anindya Bhattacharya; Rajat K. De

2010-01-01

208

A modified ant-based text clustering algorithm with semantic similarity measure

Ant-based text clustering is a promising technique that has attracted great research attention. This paper attempts to improve the standard ant-based text-clustering algorithm in two dimensions. On one hand, the ontology-based semantic similarity measure is used in conjunction with the traditional vector-space-model-based measure to provide more accurate assessment of the similarity between documents. On the other, the ant behavior model

Haoxiang Xia; Shuguang Wang; Taketoshi Yoshida

2006-01-01

209

A fast hierarchical clustering algorithm for large-scale protein sequence data sets.

TRIBE-MCL is a Markov clustering algorithm that operates on a graph built from pairwise similarity information of the input data. Edge weights stored in the stochastic similarity matrix are alternately fed to the two main operations, inflation and expansion, and are normalized in each main loop to maintain the probabilistic constraint. In this paper we propose an efficient implementation of the TRIBE-MCL clustering algorithm, suitable for fast and accurate grouping of protein sequences. A modified sparse matrix structure is introduced that can efficiently handle most operations of the main loop. Taking advantage of the symmetry of the similarity matrix, a fast matrix squaring formula is also introduced to facilitate the time-consuming expansion. The proposed algorithm was tested on protein sequence databases like SCOP95. In terms of efficiency, the proposed solution improves execution speed by two orders of magnitude, compared to recently published efficient solutions, reducing the total runtime well below 1 min in the case of the 11,944 proteins of SCOP95. This improvement in computation time is reached without losing anything from the partition quality. Convergence is generally reached in approximately 50 iterations. The efficient execution enabled us to perform a thorough evaluation of classification results and to formulate recommendations regarding the choice of the algorithm's parameter values. PMID:24657908

Szilágyi, Sándor M; Szilágyi, László

2014-05-01
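
The expansion and inflation loop that TRIBE-MCL accelerates can be sketched in plain Python. This is a dense, unoptimized Markov clustering sketch under our own assumptions. It uses a column-stochastic matrix with added self-loops, a hypothetical inflation exponent `r = 2.0` and a convergence tolerance of our choosing. It does not reproduce the paper's sparse matrix structure or its symmetric squaring formula.

```python
from typing import List, Set

Matrix = List[List[float]]


def normalize_columns(m: Matrix) -> Matrix:
    """Rescale each column to sum to 1 (the probabilistic constraint)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)] for i in range(n)]


def expand(m: Matrix) -> Matrix:
    """Expansion: square the matrix (dense O(n^3) product)."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def inflate(m: Matrix, r: float) -> Matrix:
    """Inflation: raise every entry to the power r, then renormalize columns."""
    return normalize_columns([[x ** r for x in row] for row in m])


def mcl(adj: Matrix, r: float = 2.0, max_iter: int = 100, tol: float = 1e-9) -> List[Set[int]]:
    n = len(adj)
    # Add self-loops, then make the matrix column-stochastic.
    m = normalize_columns([[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
                           for i in range(n)])
    for _ in range(max_iter):
        nxt = inflate(expand(m), r)
        delta = max(abs(nxt[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = nxt
        if delta < tol:
            break
    # Rows that keep probability mass act as attractors; their nonzero columns form a cluster.
    clusters: List[Set[int]] = []
    for row in m:
        members = {j for j, x in enumerate(row) if x > 1e-6}
        if members and members not in clusters:
            clusters.append(members)
    return clusters


# Two disconnected triangles as a toy similarity graph.
adj = [[0, 1, 1, 0, 0, 0], [1, 0, 1, 0, 0, 0], [1, 1, 0, 0, 0, 0],
       [0, 0, 0, 0, 1, 1], [0, 0, 0, 1, 0, 1], [0, 0, 0, 1, 1, 0]]
clusters = mcl(adj)
```

On two disconnected triangles the matrix is already a fixed point, and the sketch returns the two triangles as clusters. The dense O(n³) squaring is the cost that the paper's sparse, symmetry-aware expansion is designed to cut down.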

210

Modified Fuzzy-CMAC Networks with Clustering-based Structure

This work proposes a modified structure for the fuzzy-CMAC network to solve the curse of dimensionality problem, observed when systems with a high number of involved variables are being modeled with the aid of computational intelligence techniques. The approach is based on the fuzzy C-means clustering algorithm, which is used here to initialize the CMAC fuzzy input partitions. Also, the

Geraldo Souza Reis Jr.; Paulo E. M. Almeida

2006-01-01

211

Dynamic connectivity algorithms for Monte Carlo simulations of the random-cluster model

NASA Astrophysics Data System (ADS)

We review Sweeny's algorithm for Monte Carlo simulations of the random cluster model. Straightforward implementations suffer from the problem of computational critical slowing down, where the computational effort per edge operation scales with a power of the system size. By using a tailored dynamic connectivity algorithm we are able to perform all operations with a poly-logarithmic computational effort. This approach is shown to be efficient in keeping online connectivity information and is of use for a number of applications also beyond cluster-update simulations, for instance in monitoring droplet shape transitions. As the handling of the relevant data structures is non-trivial, we provide a Python module with a full implementation for future reference.

Metin Elçi, Eren; Weigel, Martin

2014-05-01
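
For reference, a single-edge heat-bath update in the spirit of Sweeny's algorithm can be sketched for the random-cluster weight p^|A| (1 - p)^(|E| - |A|) q^k(A). Here A is the set of active edges and k(A) is the number of connected components. The occupation probability depends on whether the edge's endpoints stay connected without it. If they do, the probability is p. If removing the edge would split a cluster, it is p / (p + q(1 - p)). This sketch answers the connectivity query with a fresh breadth-first search. That is the straightforward approach the abstract says suffers from computational critical slowing down. It is not the authors' dynamic-connectivity implementation or their Python module, and the function names are hypothetical.

```python
import random
from collections import defaultdict, deque
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int]


def connected_without(active: Set[Edge], edge: Edge) -> bool:
    """BFS over active edges, ignoring `edge`; True if its endpoints remain connected."""
    adj: Dict[int, List[int]] = defaultdict(list)
    for (a, b) in active:
        if (a, b) != edge:
            adj[a].append(b)
            adj[b].append(a)
    src, dst = edge
    seen = {src}
    queue = deque([src])
    while queue:
        v = queue.popleft()
        if v == dst:
            return True
        for w in adj[v]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return False


def occupation_probability(endpoints_connected: bool, p: float, q: float) -> float:
    """Heat-bath probability that the edge is occupied in the random-cluster model."""
    return p if endpoints_connected else p / (p + q * (1.0 - p))


def sweeny_sweep(edges: List[Edge], active: Set[Edge], p: float, q: float,
                 rng: random.Random) -> None:
    """One sweep of single-edge heat-bath updates; `active` is modified in place."""
    for e in edges:
        prob = occupation_probability(connected_without(active, e), p, q)
        if rng.random() < prob:
            active.add(e)
        else:
            active.discard(e)


# Toy example: a 4-cycle, q = 2 (Ising-like), starting from no active edges.
ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
state: Set[Edge] = set()
sweeny_sweep(ring, state, p=0.5, q=2.0, rng=random.Random(1))
```

Replacing `connected_without` with a poly-logarithmic dynamic connectivity structure, as the authors do, changes only the cost of the query, not the acceptance probabilities.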

212

A New Waveform Signal Processing Method Based on Adaptive Clustering-Genetic Algorithms

We present a fast digital signal processing method for numerical analysis of individual pulses from CdZnTe compound semiconductor detectors, using the Maxi-Mini Distance Algorithm and a Genetic-Algorithm-based discrimination technique. A parametric approach has been used to classify the discriminated waveforms into a set of clusters, each having a similar signal shape and a corresponding pulse height spectrum. A corrected total pulse height spectrum was obtained by applying a normalization factor for the full energy peak of each cluster, yielding much-improved energy spectrum characteristics. The method was applied successfully to both simulated and real measured data, and it can be applied to any detector that suffers from signal shape variation. (authors)

Noha Shaaban; Fukuzo Masuda; Hidetsugu Morota [Computer Software Development Company, Ltd. (Japan)]

2006-07-01

213

The stability of clusters is a serious issue in mobile ad hoc networks. Low stability of clusters may lead to rapid failure of clusters, high energy consumption for reclustering, and decrease in the overall network stability in mobile ad hoc network. In order to improve the stability of clusters, weight-based clustering algorithms are utilized. However, these algorithms only use limited features of the nodes. Thus, they decrease the weight accuracy in determining node's competency and lead to incorrect selection of cluster heads. A new weight-based algorithm presented in this paper not only determines node's weight using its own features, but also considers the direct effect of feature of adjacent nodes. It determines the weight of virtual links between nodes and the effect of the weights on determining node's final weight. By using this strategy, the highest weight is assigned to the best choices for being the cluster heads and the accuracy of nodes selection increases. The performance of new algorithm is analyzed by using computer simulation. The results show that produced clusters have longer lifetime and higher stability. Mathematical simulation shows that this algorithm has high availability in case of failure. PMID:25114965

Karimi, Abbas; Afsharfarnia, Abbas; Zarafshan, Faraneh; Al-Haddad, S. A. R.

2014-01-01

214

A Clustering Algorithm Based on the Ants Self-Assembly Behavior

We have presented in this paper an ant-based clustering algorithm which is inspired by the self-assembling behavior observed in real ants. These ants progressively become connected to an initial point called the support and then successively to other connected ants. The artificial ants that we have defined similarly build a tree where each ant represents a node/data. Ants use

Hanene Azzag; Nicolas Monmarché; Mohamed Slimane; Christiane Guinot; Gilles Venturini

2003-01-01

215

Crowded Cluster Cores: Algorithms for Deblending in Dark Energy Survey Images

Deep optical images are often crowded with overlapping objects. This is especially true in the cores of galaxy clusters, where images of dozens of galaxies may lie atop one another. Accurate measurements of cluster properties require deblending algorithms designed to automatically extract a list of individual objects and decide what fraction of the light in each pixel comes from each object. We present new software called the Gradient And INterpolation based deblender (GAIN) as a secondary deblender to improve the deblending of cluster-core images. This software relies on the image intensity gradient and on an image interpolation technique usually used to correct flawed terrestrial digital images. We test this software on Dark Energy Survey coadd images. GAIN helps extract unbiased photometry measurements for blended sources. It also helps improve detection completeness while introducing only a modest number of spurious detections. For example, when applied to deep images simulated with high level o...

Zhang, Yuanyuan; Bertin, Emmanuel; Jeltema, Tesla; Miller, Christopher J; Rykoff, Eli; Song, Jeeseon

2014-01-01

216

NASA Astrophysics Data System (ADS)

In many applications of remotely-sensed imagery, one of the first steps is partitioning the image into a tractable number of regions. In spectral remote sensing, the goal is often to find regions that are spectrally similar within the region but spectrally distinct from other regions. There is often no requirement that these regions be spatially connected. Two goals of this study are to partition a hyperspectral image into groups of spectrally distinct materials, and to partition without human intervention. To this end, this study investigates the use of multi-resolution, multi-dimensional variants of the watershed-clustering algorithm on Hyperspectral Digital Imagery Collection Experiment (HYDICE) data. The watershed algorithm looks for clusters in a histogram: a B-dimensional surface where B is the number of bands used (up to 210 for HYDICE). The algorithm is applied to HYDICE data of the Purdue Agronomy Farm, for which ground truth is available. Watershed results are compared to those obtained by using the commonly-available Iterative Self-Organizing Data Analysis Technique (ISODATA) algorithm.

Hemmer, Terrence H.; Jellison, Gerard P.; Wilson, Darryl G.

2002-08-01

217

Parallel OSEM Reconstruction Algorithm for Fully 3-D SPECT on a Beowulf Cluster.

In order to improve the computation speed of the ordered subset expectation maximization (OSEM) algorithm for fully 3-D single photon emission computed tomography (SPECT) reconstruction, an experimental Beowulf-type cluster was built and several parallel reconstruction schemes were described. We implemented a single-program-multiple-data (SPMD) parallel 3-D OSEM reconstruction algorithm based on the message passing interface (MPI) and tested it with combinations of different numbers of computing processors and different voxel-grid sizes in reconstruction (64×64×64 and 128×128×128). Performance of parallelization was evaluated in terms of the speedup factor and parallel efficiency. This parallel implementation methodology is expected to be helpful to make fully 3-D OSEM algorithms more feasible in clinical SPECT studies. PMID:17282575

Rong, Zhou; Tianyu, Ma; Yongjie, Jin

2005-01-01
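
The ordered-subsets update at the heart of the reconstruction can be sketched serially as follows. It uses the standard OSEM multiplicative update. Within a subset S, x_j ← x_j / Σ_{i∈S} a_ij · Σ_{i∈S} a_ij y_i / (Σ_l a_il x_l). In words, each voxel is rescaled by the back-projected ratio of measured to estimated counts over that subset's rows, divided by the subset sensitivity. The tiny dense system matrix `A`, the data `y` and the subset split are illustrative assumptions. The paper's SPMD/MPI distribution of projection work across Beowulf nodes is not shown.

```python
from typing import List, Sequence


def osem(A: Sequence[Sequence[float]], y: Sequence[float],
         subsets: Sequence[Sequence[int]], n_iter: int,
         x0: Sequence[float]) -> List[float]:
    """Serial OSEM with a dense system matrix A (rows = projection bins, cols = voxels)."""
    x = list(x0)
    n_vox = len(x)
    for _ in range(n_iter):
        for rows in subsets:  # one sub-iteration per ordered subset
            # Forward projection of the current estimate, restricted to this subset.
            proj = {i: sum(A[i][j] * x[j] for j in range(n_vox)) for i in rows}
            for j in range(n_vox):
                sens = sum(A[i][j] for i in rows)  # subset sensitivity of voxel j
                if sens == 0.0:
                    continue  # voxel not seen by this subset; leave it unchanged
                back = sum(A[i][j] * y[i] / proj[i] for i in rows if proj[i] > 0.0)
                x[j] *= back / sens  # multiplicative update keeps x non-negative
    return x


# Toy consistent problem: y = A @ [1, 2, 3].
A = [[1.0, 1.0, 0.0], [0.0, 1.0, 1.0], [1.0, 0.0, 1.0]]
y = [3.0, 5.0, 4.0]
x_hat = osem(A, y, subsets=[[0, 2], [1]], n_iter=50, x0=[1.0, 1.0, 1.0])
```

In a parallel SPMD version, each process would typically compute forward and back projections for its share of the data. It would then combine partial sums through MPI reductions before the multiplicative update. That split is our assumption, not a description of the paper's schemes.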

218

The existing RPCCL (rival penalization controlled competitive learning) algorithm has provided an attractive way to perform data clustering. However, its performance is sensitive to the selection of the initial cluster centers. In this paper, we further investigate RPCCL and present an improved approach to seed point selection, which chooses non-neighboring data points of the greatest local density as seed

Xuefeng Liu; Guangrong Ji; Wencang Zhao; Junna Cheng

2007-01-01
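
A density-based seed selection of the kind described can be sketched as below. Both key definitions are our assumptions, since the abstract is truncated before the paper's exact criteria. Local density is taken to be the number of neighbours within a hypothetical `radius`. A "non-neighbor" point is read as one farther than `radius` from every seed already chosen.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def select_seeds(points: Sequence[Point], k: int, radius: float) -> List[int]:
    """Greedy seed choice: highest local density first, skipping neighbours of chosen seeds.
    `k` and `radius` are hypothetical parameters, not values from the paper."""
    def near(a: Point, b: Point) -> bool:
        return math.dist(a, b) <= radius

    # Local density = number of other points within `radius`.
    density = [sum(1 for q in points if near(p, q)) - 1 for p in points]
    # Stable sort: ties keep their input order.
    order = sorted(range(len(points)), key=lambda i: density[i], reverse=True)
    seeds: List[int] = []
    for i in order:
        if all(not near(points[i], points[s]) for s in seeds):
            seeds.append(i)
            if len(seeds) == k:
                break
    return seeds


data = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (5, 5.1), (10, 0)]
seed_idx = select_seeds(data, k=2, radius=0.5)
```

The chosen indices could then initialize the cluster centres of RPCCL or another competitive-learning method. Here they pick one point from each dense blob and ignore the isolated point at (10, 0).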

219

NASA Astrophysics Data System (ADS)

Web cluster systems consist of a load balancer, which distributes web requests and loads to several servers, and real servers, which process the web requests. Previous load distribution scheduling algorithms that web cluster systems use to distribute web requests to real servers are Round-Robin, Weighted Round-Robin, Least-Connection and Weighted Least-Connection (WLC). The WLC scheduling algorithm, in which a throughput weight is assigned to each real server and the least-connected real server is selected to process web requests, is generally used for web cluster systems. When a new real server is added to a web cluster system with many simultaneous users, the previous WLC scheduling algorithm assigns web requests only to the new real server, causing load imbalance among the real servers. In this paper, we propose an improved WLC scheduling algorithm that maintains load balance among real servers by avoiding web requests being assigned only to a new real server. When web requests are continuously assigned to only the new real server more than the maximum continuous allocation number (L), the proposed algorithm excludes the new real server from the activated real-server scheduling list and deactivates it. After L-1 allocation rounds, the new real server is reactivated and included in the real-server scheduling list again. When a new real server is added to a web cluster system, the proposed algorithm thus maintains load balance among real servers by avoiding overloading the new real server.

Choi, Dongjun; Chung, Kwang Sik; Shon, Jingon
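
The scheduling rule described above can be sketched as follows. Standard WLC picks the active server with the smallest ratio of active connections to weight. The sketch adds a consecutive-allocation counter for the newly added server. Two points are our own reading of "more than L": the new server is suspended once it has received `L` consecutive requests, and it is then excluded for the next `L - 1` allocations. The class and field names are hypothetical, and connection completions are not modelled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Server:
    name: str
    weight: float
    connections: int = 0


class ImprovedWLC:
    def __init__(self, servers: List[Server], max_consecutive: int):
        self.servers = servers
        self.L = max_consecutive
        self.new_server: Optional[Server] = None
        self.streak = 0         # consecutive allocations to the new server
        self.suspended_for = 0  # remaining rounds the new server is deactivated

    def add_server(self, server: Server) -> None:
        self.servers.append(server)
        self.new_server = server
        self.streak = 0
        self.suspended_for = 0

    def pick(self) -> Server:
        # Scheduling list: all servers, minus the new one while it is deactivated.
        candidates = [s for s in self.servers
                      if not (s is self.new_server and self.suspended_for > 0)]
        if self.suspended_for > 0:
            self.suspended_for -= 1
        # Plain WLC choice: fewest connections per unit of weight (first one wins ties).
        chosen = min(candidates, key=lambda s: s.connections / s.weight)
        chosen.connections += 1
        if chosen is self.new_server:
            self.streak += 1
            if self.streak >= self.L:
                self.suspended_for = self.L - 1  # deactivate for L-1 rounds
                self.streak = 0
        else:
            self.streak = 0
        return chosen


sched = ImprovedWLC([Server("a", 1.0, 10), Server("b", 1.0, 10)], max_consecutive=3)
sched.add_server(Server("new", 1.0))
order = [sched.pick().name for _ in range(6)]
```

With two loaded servers (10 connections each) and a fresh server, plain WLC would send the next ten requests to the new server. With `L = 3` the sketch yields new, new, new, a, b, new.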

220

Possibilistic clustering for shape recognition

NASA Technical Reports Server (NTRS)

Clustering methods have been used extensively in computer vision and pattern recognition. Fuzzy clustering has been shown to be advantageous over crisp (or traditional) clustering in that total commitment of a vector to a given class is not required at each iteration. Recently fuzzy clustering methods have shown spectacular ability to detect not only hypervolume clusters, but also clusters which are actually 'thin shells', i.e., curves and surfaces. Most analytic fuzzy clustering approaches are derived from Bezdek's Fuzzy C-Means (FCM) algorithm. The FCM uses the probabilistic constraint that the memberships of a data point across classes sum to one. This constraint was used to generate the membership update equations for an iterative algorithm. Unfortunately, the memberships resulting from FCM and its derivatives do not correspond to the intuitive concept of degree of belonging, and moreover, the algorithms have considerable trouble in noisy environments. Recently, the clustering problem was cast into the framework of possibility theory. Our approach was radically different from the existing clustering methods in that the resulting partition of the data can be interpreted as a possibilistic partition, and the membership values may be interpreted as degrees of possibility of the points belonging to the classes. An appropriate objective function whose minimum will characterize a good possibilistic partition of the data was constructed, and the membership and prototype update equations from necessary conditions for minimization of our criterion function were derived. The ability of this approach to detect linear and quartic curves in the presence of considerable noise is shown.

Keller, James M.; Krishnapuram, Raghu

1993-01-01
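
The difference between the probabilistic (FCM) and possibilistic membership updates shows up clearly when both are computed for fixed prototypes. FCM memberships of a point sum to one across clusters. The possibilistic update u_ij = 1 / (1 + (d_ij² / η_i)^(1/(m-1))) depends only on the point's distance to cluster i and a per-cluster scale η_i. This sketch assumes point prototypes with squared Euclidean distance, fuzzifier m = 2 and hand-picked η values. Prototype updates and the shell-shaped prototypes used for curve detection are omitted.

```python
from typing import List, Sequence


def sq_dist(x: Sequence[float], c: Sequence[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(x, c))


def fcm_memberships(x: Sequence[float], centers: Sequence[Sequence[float]],
                    m: float = 2.0) -> List[float]:
    """FCM memberships of one point; they sum to 1 across clusters."""
    d2 = [sq_dist(x, c) for c in centers]
    zero = [i for i, d in enumerate(d2) if d == 0.0]
    if zero:  # the point coincides with a prototype: give it full membership there
        return [1.0 if i == zero[0] else 0.0 for i in range(len(d2))]
    p = 1.0 / (m - 1.0)
    return [1.0 / sum((d2[i] / d2[k]) ** p for k in range(len(d2))) for i in range(len(d2))]


def pcm_memberships(x: Sequence[float], centers: Sequence[Sequence[float]],
                    etas: Sequence[float], m: float = 2.0) -> List[float]:
    """Possibilistic memberships: each cluster is judged independently via its scale eta."""
    p = 1.0 / (m - 1.0)
    return [1.0 / (1.0 + (sq_dist(x, c) / eta) ** p) for c, eta in zip(centers, etas)]


centers = [(0.0, 0.0), (4.0, 0.0)]
outlier = (2.0, 100.0)
u_fcm = fcm_memberships(outlier, centers)
u_pcm = pcm_memberships(outlier, centers, etas=[1.0, 1.0])
```

FCM still assigns the outlier, which lies far from both prototypes, a membership of 0.5 in each cluster. The possibilistic memberships are both close to zero. This is the behaviour in noisy data that the abstract contrasts with FCM.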

221

NASA Astrophysics Data System (ADS)

The present work proposes the application of a genetic algorithm (GA) for determining global minima to be used as seeds for a higher-level ab initio analysis such as density functional theory (DFT). Water clusters ((H2O)n, 2 ≤ n ≤ 13) are used as a test case, and four empirical potentials (TIP3P, TIP4P, TIP5P and ST2) were considered for the initial guesses in the GA calculations. Two types of analysis were performed, namely with rigid (DFT_RM) and non-rigid (DFT_NRM) molecules, for the corresponding structures and energies. For the DFT analysis, the PBE exchange-correlation functional and the large basis set A-PVTZ have been used. All structures and their respective energies calculated through the GA method, DFT_RM and DFT_NRM are compared and discussed. The proposed methodology proved very efficient for obtaining quasi-accurate global minima at the level of ab initio calculations, and the data are discussed in the light of previously published results with particular attention to (H2O)n (2 ≤ n ≤ 13) clusters. The results suggest that the stabilization energy error for the empirical potentials used is additive with respect to the cluster size, roughly 0.5 kcal mol⁻¹ per water molecule after ZPE correction. Finally, using GA/empirical-potential structures as starting points for ab initio optimization proved to be a computationally manageable strategy for exploring the potential energy surface of large systems at the quantum level. In conclusion, this work proposes an alternative approach to study the properties of larger systems accurately and efficiently.

de Abreu e Silva, Elcio Sabato; Duarte, Hélio Anderson; Belchior, Jadson Cláudio

2006-04-01

222

Clustering of tethered satellite system simulation data by an adaptive neuro-fuzzy algorithm

NASA Technical Reports Server (NTRS)

Recent developments in neuro-fuzzy systems indicate that the concepts of adaptive pattern recognition, when used to identify appropriate control actions corresponding to clusters of patterns representing system states in dynamic nonlinear control systems, may result in innovative designs. A modular, unsupervised neural network architecture, in which fuzzy learning rules have been embedded, is used for on-line identification of similar states. The architecture and control rules involved in Adaptive Fuzzy Leader Clustering (AFLC) allow this system to be incorporated in control systems for identification of system states corresponding to specific control actions. We have used this algorithm to cluster the simulation data of the Tethered Satellite System (TSS) to estimate the range of delta voltages necessary to maintain the desired length rate of the tether. The AFLC algorithm is capable of on-line estimation of the appropriate control voltages from the corresponding length error and length rate error without a priori knowledge of their membership functions and familiarity with the behavior of the Tethered Satellite System.

Mitra, Sunanda; Pemmaraju, Surya

1992-01-01

223

Using clustering and a modified classification algorithm for automatic text summarization

NASA Astrophysics Data System (ADS)

In this paper we describe a modified classification method intended for extractive summarization. The classification in this method does not need a learning corpus; it uses the input text instead. First, we cluster the document sentences to exploit the diversity of topics; then we apply a learning algorithm (here, Naive Bayes) to each cluster, treating it as a class. After obtaining the classification model, we calculate the score of a sentence in each class using a scoring model derived from the classification algorithm. These scores are then used to reorder the sentences, and the first ones are extracted as the output summary. We conducted experiments using a corpus of scientific papers and compared our results to another summarization system called UNIS. We also examine the impact of tuning the clustering threshold on the resulting summary, as well as the impact of adding more features to the classifier. We found that this method is interesting and performs well, and that adding new features (which is simple with this method) can improve the summary's accuracy.

Aries, Abdelkrime; Oufaida, Houda; Nouali, Omar

2013-01-01

224

A Multi-Core FPGA-Based 2D-Clustering Algorithm for High-Throughput Data Intensive Applications

NASA Astrophysics Data System (ADS)

A multi-core FPGA-based clustering algorithm for high-throughput data intensive applications is presented. The algorithm is optimized for data with two dimensional organization (e.g. image processing, pixel detectors for high energy physics experiments etc.). It uses a moving window of generic size to adjust to the application's processing requirements (the cluster sizes and shapes that appear in the input data sets). One or more windows (cores) can be used to identify clusters in parallel, allowing for versatility to increase performance or reduce the amount of used resources. In addition to the inherent parallelism the algorithm is executed in a pipeline, thus allowing for readout to be performed in parallel with the cluster identification.

Sotiropoulou, Calliope-Louisa; Nikolaidis, Spyridon; Annovi, Alberto; Beretta, Matteo; Volpi, Guido; Giannetti, Paola; Luciano, Pierluigi

2014-06-01
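
As a software reference for what the FPGA cores compute, grouping hits in two-dimensionally organized data can be sketched as an 8-connected flood fill over a sparse list of hit pixels. This is a plain sequential sketch under our own assumptions: 8-connectivity and an unbounded grid. It does not model the paper's generic-size moving window, multi-core parallelism or pipelined readout.

```python
from collections import deque
from typing import Iterable, List, Set, Tuple

Pixel = Tuple[int, int]


def find_clusters(hits: Iterable[Pixel]) -> List[Set[Pixel]]:
    """Group hit pixels into 8-connected clusters (sequential reference, not the FPGA design)."""
    remaining: Set[Pixel] = set(hits)
    clusters: List[Set[Pixel]] = []
    while remaining:
        seed = min(remaining)  # deterministic starting pixel
        remaining.remove(seed)
        cluster = {seed}
        queue = deque([seed])
        while queue:
            r, c = queue.popleft()
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    nb = (r + dr, c + dc)
                    if nb in remaining:  # unvisited hit touching the current pixel
                        remaining.remove(nb)
                        cluster.add(nb)
                        queue.append(nb)
        clusters.append(cluster)
    return clusters


hits = [(0, 0), (0, 1), (1, 2), (5, 5), (6, 6), (9, 0)]
found = find_clusters(hits)
```

A hardware implementation would instead stream pixels through a fixed-size window. A reference like this is mainly useful for checking cluster outputs on test data.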

225

MSClust: A Multi-Seeds based Clustering algorithm for microbiome profiling using 16S rRNA sequence.

Recent developments of next generation sequencing technologies have led to rapid accumulation of 16S rRNA sequences for microbiome profiling. One key step in data processing is to cluster short sequences into operational taxonomic units (OTUs). Although many methods have been proposed for OTU inference, a major challenge is the balance between inference accuracy and computational efficiency, where inference accuracy is often sacrificed to accommodate the need to analyze large numbers of sequences. Inspired by the hierarchical clustering method and a modified greedy network clustering algorithm, we propose a novel multi-seeds based heuristic clustering method, named MSClust, for OTU inference. MSClust first adaptively selects multiple seeds instead of one seed for each candidate cluster, and the reads are then processed using a greedy clustering strategy. Through many numerical examples, we demonstrate that MSClust uses less memory and achieves better biological accuracy than existing heuristic clustering methods while preserving efficiency and scalability. PMID:23899776

Chen, Wei; Cheng, Yongmei; Zhang, Clarence; Zhang, Shaowu; Zhao, Hongyu

2013-09-01

226

Applying Social Networking and Clustering Algorithms to Galaxy Groups in ALFALFA

NASA Astrophysics Data System (ADS)

Because most galaxies live in groups, and the environment in which a galaxy resides affects its evolution, it is crucial to develop tools to understand how galaxies are distributed within groups. At the same time we must understand how groups are distributed and connected in the larger scale structure of the Universe. I have applied a variety of networking techniques to assess the substructure of galaxy groups, including distance matrices, agglomerative hierarchical clustering algorithms and dendrograms. We use distance matrices to locate groupings spatially in 3-D. Dendrograms created from agglomerative hierarchical clustering results allow us to quantify connections between galaxies and galaxy groups. The shape of the dendrogram reveals whether the group is spatially homogeneous or clumpy. These techniques are giving us new insight into the structure and dynamical state of galaxy groups and large scale structure. We specifically apply these techniques to the ALFALFA survey of the Coma-Abell 1367 supercluster and its resident galaxy groups.

Bramson, Ali; Wilcots, E. M.

2012-01-01
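A dendrogram records the order and height at which an agglomerative algorithm merges clusters. The Python sketch below, using only the standard library, shows naive single-linkage agglomeration on 3-D positions and a horizontal cut of the resulting merge history. Single linkage and the cut height are illustrative choices; the abstract does not say which linkage was used, and the toy coordinates are hypothetical.

```python
import math
from itertools import combinations


def single_linkage(points):
    """Naive single-linkage agglomeration.

    Returns the merge history as (kept_label, merged_label, height) tuples,
    which is the information a dendrogram draws.
    """
    clusters = {i: [i] for i in range(len(points))}
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(sorted(clusters), 2):
            # Single linkage: distance between the closest pair of members.
            d = min(math.dist(points[i], points[j])
                    for i in clusters[a] for j in clusters[b])
            if best is None or d < best[2]:
                best = (a, b, d)
        a, b, d = best
        clusters[a] += clusters.pop(b)
        merges.append((a, b, d))
    return merges


def cut(n_points, merges, height):
    """Replay merges up to `height`, i.e. cut the dendrogram horizontally."""
    groups = {i: {i} for i in range(n_points)}
    for a, b, d in merges:
        if d > height:
            break  # single-linkage merge heights never decrease
        groups[a] |= groups.pop(b)
    return sorted(sorted(g) for g in groups.values())


galaxies = [(0, 0, 0), (0, 0, 1), (10, 0, 0), (10, 0, 1.5)]
history = single_linkage(galaxies)
print([h for _, _, h in history])        # [1.0, 1.5, 10.0]
print(cut(len(galaxies), history, 2.0))  # [[0, 1], [2, 3]]
```

A large jump in merge height, as between 1.5 and 10.0 here, is the kind of dendrogram shape that signals a clumpy rather than homogeneous group.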

227

Development of a Genetic Algorithm to Automate Clustering of a Dependency Structure Matrix

NASA Technical Reports Server (NTRS)

Much technology assessment and organization design data exists in Microsoft Excel spreadsheets. Tools are needed to put this data into a form that can be used by design managers to make design decisions. One need is to cluster data that is highly coupled. Tools such as the Dependency Structure Matrix (DSM) and a Genetic Algorithm (GA) can be of great benefit. However, no tool currently combines the DSM and a GA to solve the clustering problem. This paper describes a new software tool that interfaces a GA written as an Excel macro with a DSM in spreadsheet format. The results of several test cases are included to demonstrate how well this new tool works.

Rogers, James L.; Korte, John J.; Bilardo, Vincent J.

2006-01-01

228

Single-Parent Evolution Algorithm and the Optimization of Si Clusters

NASA Astrophysics Data System (ADS)

We describe a novel method for the structural optimization of molecular systems. Similar to genetic algorithms (GAs), our approach involves an evolving population in which new members are formed by cutting and pasting operations on existing members. Unlike previous GAs, however, the population in each generation has only a single parent. This scheme has been used to optimize Si clusters with 13-23 atoms. We have found a number of new isomers that are lower in energy than any previously reported and have properties in much better agreement with experimental data.

Rata, Ionel; Shvartsburg, Alexandre A.; Horoi, Mihai; Frauenheim, Thomas; Siu, K. W. Michael; Jackson, Koblar A.

2000-07-01

229

Two Algorithms for Orthogonal Nonnegative Matrix Factorization with Application to Clustering

Approximate matrix factorization techniques with both nonnegativity and orthogonality constraints, referred to as orthogonal nonnegative matrix factorization (ONMF), have recently been introduced and shown to work remarkably well for clustering tasks such as document classification. In this paper, we introduce two new methods to solve ONMF. First, we show mathematical equivalence between ONMF and a weighted variant of spherical $k$-means, from which we derive our first method, a simple EM-like algorithm. Our second method is based on an augmented Lagrangian approach. Standard ONMF algorithms typically enforce nonnegativity for their iterates while trying to achieve orthogonality at the limit (e.g., using a proper penalization term or a suitably chosen search direction). Our method works the opposite way: orthogonality is strictly imposed at each step while nonnegativity is asymptotically obtained, using a quadratic penalty. Finally, we show that the two proposed approaches compare favorably with standard ONMF...

Pompili, Filippo; Absil, P -A; Glineur, François

2012-01-01
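The link between ONMF and spherical $k$-means can be illustrated with plain spherical $k$-means: when each nonnegative data vector is assigned to exactly one cluster, the resulting membership factor has at most one nonzero per row, so its columns are mutually orthogonal and all its entries are nonnegative. This Python sketch omits the weighting used in the authors' variant, and the function names and toy data are hypothetical.

```python
import math


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def spherical_kmeans(X, centers, iters=20):
    """Plain (unweighted) spherical k-means on nonnegative vectors."""
    X = [normalize(x) for x in X]
    centers = [normalize(c) for c in centers]
    labels = []
    for _ in range(iters):
        # Assign each vector to the center with the highest cosine similarity.
        labels = [max(range(len(centers)), key=lambda k: dot(x, centers[k]))
                  for x in X]
        # New center = normalized sum of its members (unchanged if empty).
        new_centers = []
        for k, c in enumerate(centers):
            members = [x for x, lab in zip(X, labels) if lab == k]
            new_centers.append(normalize([sum(col) for col in zip(*members)])
                               if members else c)
        centers = new_centers
    return X, labels, centers


def membership_factor(X, labels, centers):
    """H[i][k] = <x_i, c_k> if x_i is in cluster k, else 0.

    Each row has a single nonzero, so distinct columns of H are orthogonal.
    """
    H = [[0.0] * len(centers) for _ in X]
    for i, (x, k) in enumerate(zip(X, labels)):
        H[i][k] = dot(x, centers[k])
    return H


docs = [[1, 0.1], [2, 0.1], [0.1, 1], [0.2, 3]]
Xn, labels, centers = spherical_kmeans(docs, [[1, 0], [0, 1]])
H = membership_factor(Xn, labels, centers)
print(labels)                                        # [0, 0, 1, 1]
print(sum(H[i][0] * H[i][1] for i in range(len(H))))  # 0.0
```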

230

Enhancing PC Cluster-Based Parallel Branch-and-Bound Algorithms for the Graph Coloring Problem

NASA Astrophysics Data System (ADS)

A branch-and-bound algorithm (BB for short) is one of the most general techniques for solving combinatorial optimization problems. Even so, its computation time tends to grow exponentially, so we consider parallelizing it to reduce that time. It has been reported that the computation time of a parallel BB depends heavily on node-variable selection strategies. A parallel BB must also keep communication time from growing, so it is important to decide how many nodes, and which kinds of nodes, are transferred (the sending-node selection strategy). In this paper, for the graph coloring problem, we propose several sending-node selection strategies for a parallel BB algorithm that uses MPI for parallelization, and we experimentally evaluate how these strategies affect the computation time of a parallel BB on a PC cluster network.

Taoka, Satoshi; Takafuji, Daisuke; Watanabe, Toshimasa
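For readers unfamiliar with branch and bound on this problem, here is a minimal sequential Python sketch that computes the chromatic number of a small graph. It does not reproduce the paper's MPI parallelization or its sending-node selection strategies; the largest-degree-first branching order is an assumed node-variable selection rule.

```python
def chromatic_number(n, edges):
    """Sequential branch and bound for the minimum number of colors of a graph."""
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    # Branching order: largest degree first (an assumed selection rule).
    order = sorted(range(n), key=lambda v: -len(adj[v]))
    colors = [-1] * n
    best = n  # incumbent: n colors always suffice

    def branch(i, used):
        nonlocal best
        if used >= best:
            return  # bound: this partial coloring cannot beat the incumbent
        if i == n:
            best = used  # complete coloring with fewer colors than before
            return
        v = order[i]
        forbidden = {colors[u] for u in adj[v]}
        # Reuse an existing color, or open exactly one new color (color `used`).
        for c in range(used + 1):
            if c not in forbidden:
                colors[v] = c
                branch(i + 1, max(used, c + 1))
                colors[v] = -1

    branch(0, 0)
    return best


triangle = [(0, 1), (1, 2), (0, 2)]
five_cycle = [(i, (i + 1) % 5) for i in range(5)]
print(chromatic_number(3, triangle), chromatic_number(5, five_cycle))  # 3 3
```

Opening at most one new color per branch removes symmetric colorings that differ only by color names; in a parallel BB, it is the open subproblems of this search tree that would be shipped between cluster nodes.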

231

An improved method for detecting clouds that combines K-means clustering with a multi-spectral threshold approach is described. On the basis of spectral analysis of surface features, MODIS data are first divided into two major classes by the K-means method. The first class includes clouds, smoke, and snow; the second includes vegetation, water, and land. A multi-spectral threshold test is then applied to the first class to eliminate interference from smoke and snow. The method was tested with MODIS data acquired at different times over different underlying surface conditions. Visual inspection of the results showed that the algorithm can effectively detect small cloud areas and exclude interference from the underlying surface, which provides a good foundation for subsequent fire detection. PMID:21714260

Wang, Wei; Song, Wei-Guo; Liu, Shi-Xing; Zhang, Yong-Ming; Zheng, Hong-Yang; Tian, Wei

2011-04-01
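The two-stage scheme can be sketched in Python as two-class K-means on a single brightness feature, followed by a threshold test that removes bright pixels that look like snow. This is a hypothetical illustration, not the authors' code: the per-pixel `(reflectance, snow_index)` inputs and the `snow_index_max` value are assumptions, since the abstract does not list the MODIS bands or thresholds, and a smoke test would be a further threshold of the same kind.

```python
def kmeans_1d(values, c0, c1, iters=50):
    """Two-class K-means on a single feature; returns (dark_center, bright_center)."""
    for _ in range(iters):
        g0 = [v for v in values if abs(v - c0) <= abs(v - c1)]
        g1 = [v for v in values if abs(v - c0) > abs(v - c1)]
        new0 = sum(g0) / len(g0) if g0 else c0
        new1 = sum(g1) / len(g1) if g1 else c1
        if (new0, new1) == (c0, c1):
            break  # converged
        c0, c1 = new0, new1
    return c0, c1


def detect_cloud(pixels, snow_index_max=0.4):
    """pixels: list of (reflectance, snow_index) pairs; returns a cloud mask.

    Stage 1 splits pixels into a bright class (cloud, smoke, snow) and a dark
    class (vegetation, water, land). Stage 2 drops bright pixels whose
    hypothetical snow index is too high.
    """
    refl = [r for r, _ in pixels]
    c_dark, c_bright = kmeans_1d(refl, min(refl), max(refl))
    mask = []
    for r, snow_index in pixels:
        bright = abs(r - c_bright) < abs(r - c_dark)
        mask.append(bright and snow_index < snow_index_max)
    return mask


pixels = [(0.05, 0.0), (0.08, 0.1), (0.6, 0.1), (0.7, 0.2), (0.65, 0.8)]
print(detect_cloud(pixels))  # [False, False, True, True, False]
```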

232

Meanie3D - a mean-shift based, multivariate, multi-scale clustering and tracking algorithm

NASA Astrophysics Data System (ADS)

Project OASE is one of five work groups at the HErZ (Hans Ertel Centre for Weather Research), an ongoing effort by the German weather service (DWD) to further university research on weather prediction. The goal of project OASE is to gain an object-based perspective on convective events by identifying them early, at the onset of convective initiation, and following them through their entire lifecycle. Following objects in this fashion requires new ways of defining and tracking objects that incorporate all available data sets of interest, such as satellite imagery, weather radar, or lightning counts. The Meanie3D algorithm provides the necessary tool for this purpose. Core features of this new approach to clustering (object identification) and tracking are the ability to identify objects by applying the mean-shift algorithm to many variables at once (multivariate), and the ability to detect objects at various scales (multi-scale) using elements of scale-space theory. The algorithm works in 2D as well as 3D without modification. It is an extension of a method well known in computer vision and image processing, tailored to the needs of the meteorological community. Although the application demonstrated here is specific (convective initiation), the algorithm is easily adapted to provide clustering and tracking for a wide class of data sets and problems. In this talk, the demonstration is carried out on two of the OASE group's own composite sets: a 2D nationwide composite of Germany including C-band radar (2D) and satellite information, and a 3D local composite of the Bonn/Jülich area containing a high-resolution 3D X-band radar composite.

Simon, Jürgen-Lorenz; Malte, Diederich; Silke, Troemel

2014-05-01
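The core of mean-shift clustering is simple: each point repeatedly moves to the kernel-weighted mean of the data around it until it settles on a density mode, and points that share a mode form one object. The Python sketch below assumes a Gaussian kernel with one fixed bandwidth; it is not Meanie3D and omits its scale-space (multi-scale) machinery. Each coordinate can hold a different variable, which is how the multivariate case enters, although variables with different units would first need rescaling.

```python
import math


def mean_shift(points, bandwidth=1.0, iters=100, tol=1e-6):
    """Move each point to the Gaussian-weighted mean until it settles on a mode."""
    modes = []
    for p in points:
        x = list(p)
        for _ in range(iters):
            w = [math.exp(-math.dist(x, q) ** 2 / (2 * bandwidth ** 2))
                 for q in points]
            total = sum(w)
            new = [sum(wi * q[d] for wi, q in zip(w, points)) / total
                   for d in range(len(x))]
            shift = math.dist(new, x)
            x = new
            if shift < tol:
                break
        modes.append(x)
    return modes


def group_by_mode(modes, radius=0.5):
    """Give points whose modes lie within `radius` of each other the same label."""
    labels, centers = [], []
    for m in modes:
        for k, c in enumerate(centers):
            if math.dist(m, c) < radius:
                labels.append(k)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return labels


pts = [(0, 0), (0.2, 0.1), (0.1, 0.3), (5, 5), (5.2, 5.1), (4.9, 5.3)]
print(group_by_mode(mean_shift(pts, bandwidth=1.0)))  # [0, 0, 0, 1, 1, 1]
```

Unlike K-means, the number of objects is not fixed in advance; it follows from the bandwidth, which is why a scale-space treatment over several bandwidths is a natural extension.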

233

Voice biometrics has a long history in biosecurity applications, such as verification and identification based on characteristics of the human voice. Another application, voice classification, which plays an important role in grouping unlabelled voice samples, has not been widely studied. Voice classification has lately been found useful in phone monitoring and in classifying speakers' gender, ethnicity, emotional states, and so forth. In this paper, a collection of computational algorithms is proposed to support voice classification; the algorithms combine hierarchical clustering, dynamic time warping, the discrete wavelet transform, and decision trees. The proposed algorithms are relatively more transparent and interpretable than existing ones, whereas many techniques applied to voice verification and voice identification, such as Artificial Neural Networks, Support Vector Machines, and Hidden Markov Models, inherently function like a black box. Two datasets, one generated synthetically and the other collected empirically from a past voice recognition experiment, are used to verify and demonstrate the effectiveness of the proposed voice classification algorithm. PMID:22619492

Fong, Simon

2012-01-01
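One component named in the voice-classification abstract is dynamic time warping (DTW), which aligns two sequences that vary in speed before comparing them. A textbook DTW distance in Python follows; how the authors combine it with wavelet features, hierarchical clustering, and decision trees is not described, so only the distance itself is shown.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two 1-D sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # D[i][j] = cost of the best alignment of a[:i] with b[:j].
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend from a match, an insertion, or a deletion.
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # 0.0
print(dtw_distance([0, 0], [1, 1]))           # 2.0
```

The first pair differs only in timing (one value is held longer), so DTW reports zero distance where a point-by-point comparison would not even be defined for unequal lengths.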

235

Cloud classification from satellite data using a fuzzy sets algorithm: A polar example

NASA Technical Reports Server (NTRS)

Where spatial boundaries between phenomena are diffuse, classification methods which construct mutually exclusive clusters seem inappropriate. The Fuzzy c-means (FCM) algorithm assigns each observation to all clusters, with membership values as a function of distance to the cluster center. The FCM algorithm is applied to AVHRR data for the purpose of classifying polar clouds and surfaces. Careful analysis of the fuzzy sets can provide information on which spectral channels are best suited to the classification of particular features, and can help determine likely areas of misclassification. General agreement in the resulting classes and cloud fraction was found between the FCM algorithm, a manual classification, and an unsupervised maximum likelihood classifier.

Key, J. R.; Maslanik, J. A.; Barry, R. G.

1988-01-01
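The fuzzy c-means iteration described in the abstract alternates two updates: memberships computed from relative distances to the cluster centers, and centers recomputed as membership-weighted means. A minimal Python sketch using Bezdek's standard update equations with fuzzifier m = 2 follows; the initial centers are supplied by the caller, and the small distance floor is an assumption to avoid division by zero.

```python
import math


def fuzzy_c_means(X, centers, m=2.0, iters=100, tol=1e-6):
    """Bezdek's fuzzy c-means on a list of points.

    Returns (U, centers); U[i][k] is the membership of point i in cluster k,
    computed from the centers before the final update.
    """
    centers = [list(v) for v in centers]
    c, dim = len(centers), len(X[0])
    U = []
    for _ in range(iters):
        # Membership update: u_ik = 1 / sum_j (d_ik / d_ij) ** (2 / (m - 1)).
        U = []
        for x in X:
            d = [max(math.dist(x, v), 1e-12) for v in centers]
            U.append([1.0 / sum((d[k] / d[j]) ** (2 / (m - 1)) for j in range(c))
                      for k in range(c)])
        # Center update: mean of the points weighted by u_ik ** m.
        new_centers = []
        for k in range(c):
            w = [u[k] ** m for u in U]
            new_centers.append([sum(wi * x[t] for wi, x in zip(w, X)) / sum(w)
                                for t in range(dim)])
        shift = max(math.dist(a, b) for a, b in zip(new_centers, centers))
        centers = new_centers
        if shift < tol:
            break
    return U, centers


points = [(0, 0), (0, 1), (10, 10), (10, 11)]
U, centers = fuzzy_c_means(points, [(0, 0), (10, 10)])
print([[round(u, 3) for u in row] for row in U])
```

Every row of `U` sums to one, which is exactly the probabilistic constraint that the possibilistic approach in entry 248 relaxes.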

236

Since cellular functionality is typically envisioned as having a hierarchical structure, in this paper we propose a framework to identify modules (or clusters) within protein-protein interaction (PPI) networks. Based on the within-module and between-module edges of subgraphs and on the degree distribution, we present a formal module definition for PPI networks. Using this definition, an effective quantitative measure is introduced to evaluate partitions of PPI networks. Because functional modules are hierarchical in nature, a hierarchical agglomerative clustering algorithm is developed based on the new measure to detect complexes within PPI networks. We use gold-standard sets of protein complexes to validate the biological significance of the predicted complexes. A comprehensive comparison is performed between our method and four other representative methods. The results show that our algorithm finds more protein complexes of high biological significance, a significant improvement. Furthermore, the complexes predicted by our method, whether dense or sparse, match known biological characteristics well. PMID:22000801

Yu, Liang; Gao, Lin; Li, Kui; Zhao, Yi; Chiu, David K Y

2011-10-12

237

Spatio-Temporal Image Segmentation Using Optical Flow and Clustering Algorithm

Image segmentation is an important and challenging problem in image analysis. Segmentation of moving objects in image sequences is even more difficult and computationally expensive. In this work we propose a technique for spatio-temporal segmentation of medical image sequences based on clustering in the feature vector space. The motivation for the spatio-temporal approach is that motion is a useful cue for object segmentation. A two-dimensional feature vector is used for clustering in the feature space. The first feature is image brightness, which reveals the structure of interest in the image. The second feature is the Euclidean norm of the optical flow vector, where the optical flow field is computed using the Horn-Schunck algorithm. By clustering in this feature space, it is possible to detect a moving object in the image. Experiments have been conducted using a sequence of ECG-gated magnetic resonance (MR) images of a beating heart. The method is also tested on images with moving ...

Sasa Galic; Sven Loncaric; Ericsson Nikola Tesla

238

`Inter-Arrival Time' Inspired Algorithm and its Application in Clustering and Molecular Phylogeny

NASA Astrophysics Data System (ADS)

Bioinformatics, being a multidisciplinary field, applies methods from allied areas of science to data mining using computational approaches. Clustering and molecular phylogeny are key areas of bioinformatics that help in studying the classification and evolution of organisms. Molecular phylogeny algorithms can be divided into distance-based and character-based methods. However, most of these methods depend on pre-alignment of sequences and become computationally intensive as data size increases, so efficient alternative approaches are needed. The inter-arrival time distribution (IATD) is a popular concept in the theory of stochastic system modeling, but its potential in molecular data analysis has not been fully explored. The present study reports an application of IATD in bioinformatics for clustering and molecular phylogeny. The proposed method computes IATDs of nucleotides in genomic sequences. A distance function based on statistical parameters of the IATDs is proposed, and the resulting distance matrix is used for clustering and molecular phylogeny. The method is applied to a dataset of 3' non-coding region (NCR) sequences of Dengue virus type 3 (DENV-3), subtype III, reported in 2008. The resulting phylogram revealed the geographical distribution of DENV-3 isolates. Sri Lankan DENV-3 isolates were further observed to cluster into two sub-clades corresponding to groups from before and after the emergence of Dengue hemorrhagic fever. These results are consistent with those reported earlier using pre-aligned sequence data as input. These findings encourage applications of the IATD-based method in molecular phylogenetic analysis in particular and data mining in general.

Kolekar, Pandurang S.; Kale, Mohan M.; Kulkarni-Kale, Urmila

2010-10-01
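A minimal Python sketch of the IATD idea: for each nucleotide, record the gaps between its successive occurrences, summarize each gap distribution, and compare sequences by the distance between those summaries, with no alignment required. The abstract does not name the statistical parameters or the distance function, so the mean, the population standard deviation, and the Euclidean distance used here are assumptions.

```python
import math
import statistics


def inter_arrival_times(seq, base):
    """Gaps between successive positions of `base` in `seq`."""
    pos = [i for i, ch in enumerate(seq) if ch == base]
    return [b - a for a, b in zip(pos, pos[1:])]


def iatd_features(seq):
    """Mean and population standard deviation of each nucleotide's IATD."""
    feats = []
    for base in "ACGT":
        gaps = inter_arrival_times(seq.upper(), base)
        if gaps:
            feats += [statistics.fmean(gaps), statistics.pstdev(gaps)]
        else:
            feats += [0.0, 0.0]  # base occurs at most once: no gaps to summarize
    return feats


def iatd_distance(s1, s2):
    """Euclidean distance between IATD feature vectors (an assumed choice)."""
    return math.dist(iatd_features(s1), iatd_features(s2))


print(inter_arrival_times("ACGTACGTA", "A"))  # [4, 4]
```

Filling a matrix with `iatd_distance` for every pair of sequences gives the kind of distance matrix the abstract feeds to clustering and tree building.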

239

One of the most exciting prospects for the Laser Interferometer Space Antenna (LISA) is the detection of gravitational waves from the inspirals of stellar-mass compact objects into supermassive black holes. Detection of these sources is an extremely challenging computational problem due to the large parameter space and low amplitude of the signals. However, recent work has suggested that the nearest extreme mass ratio inspiral (EMRI) events will be sufficiently loud that they might be detected using computationally cheap, template-free techniques, such as a time-frequency analysis. In this paper, we examine a particular time-frequency algorithm, the Hierarchical Algorithm for Clusters and Ridges (HACR). This algorithm searches for clusters in a power map and uses the properties of those clusters to identify signals in the data. We find that HACR applied to the raw spectrogram performs poorly, but when the data are binned during the construction of the spectrogram, the algorithm can detect typical EMRI events at distances of up to $\sim 2.6$ Gpc. This is slightly further than the reach of the simple Excess Power method that has been considered previously. We discuss the HACR algorithm, including tuning for single and multiple sources, and illustrate its performance for detection of typical EMRI events and other likely LISA sources, such as white dwarf binaries and supermassive black hole mergers. We also discuss how HACR cluster properties could be used for parameter extraction.

Jonathan R Gair; Gareth Jones

2006-10-10
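As we understand HACR, it thresholds a time-frequency power map twice: pixels above an upper threshold seed clusters, each cluster grows through neighboring pixels above a lower threshold, and only clusters with enough pixels are kept. The Python sketch below implements that idea on a small grid with 4-connected neighbors; the parameter names `eta_up`, `eta_low`, and `min_pixels` are ours, and the spectrogram binning highlighted in the abstract is not shown.

```python
from collections import deque


def hacr_clusters(power, eta_up, eta_low, min_pixels):
    """Two-threshold cluster search on a 2-D power map (rows x columns)."""
    rows, cols = len(power), len(power[0])
    seen = [[False] * cols for _ in range(rows)]
    clusters = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or power[r][c] < eta_up:
                continue
            # A pixel above the upper threshold seeds a new cluster.
            seen[r][c] = True
            queue, pixels = deque([(r, c)]), []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    # Grow through 4-connected neighbors above the lower threshold.
                    if inside and not seen[ny][nx] and power[ny][nx] >= eta_low:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Keep only clusters with enough pixels.
            if len(pixels) >= min_pixels:
                clusters.append(pixels)
    return clusters


grid = [[0, 0, 0, 0, 0],
        [0, 5, 3, 3, 0],
        [0, 0, 0, 0, 0],
        [0, 0, 0, 0, 6]]
print([len(p) for p in hacr_clusters(grid, 4, 2, 2)])  # [3]
```

The isolated bright pixel in the corner exceeds the upper threshold but is discarded as too small, which is how the size cut suppresses noise spikes while the lower threshold lets a genuine track extend.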

240

Using the data mining methods of association rules and clustering, 188 cough prescriptions written by Yan Zhenghua were collected and analyzed to obtain the frequency of drug usage and the relationships between drugs, from which Yan Zhenghua's experience in treating cough was summarized. The analysis identified 20 core combinations, such as Bambusae Caulis in Taenia-Almond-Sactmarsh Aster, and 10 new prescriptions, such as Sactmarsh Aster-Scutellariae Radix-Album Viscum-Bambusae Caulis in Taenia-Eriobotryae Folium. The analysis showed that Yan Zhenghua was skilled at treating cough with traditional Chinese medicines that dispel wind and heat from the body and clear heat from the lung to relieve cough. PMID:25204134

Wu, Jia-Rui; Guo, Wei-Xian; Zhang, Xiao-Meng; Yang, Bing; Zhang, Bing

2014-02-01
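Association-rule mining of the kind used in the prescription study rests on two counts: support (how often items occur together) and confidence (how often one item appears given another). The Python sketch below computes both for herb pairs; the letters stand in for herb names, the `min_support` value is arbitrary, and restricting the search to pairs simplifies full Apriori-style mining.

```python
from collections import Counter
from itertools import combinations


def pair_rules(prescriptions, min_support=0.5):
    """Support and confidence for every herb pair meeting `min_support`.

    Returns sorted (a, b, support, confidence of a -> b) tuples.
    """
    n = len(prescriptions)
    item_count = Counter(h for p in prescriptions for h in set(p))
    pair_count = Counter(pair for p in prescriptions
                         for pair in combinations(sorted(set(p)), 2))
    rules = []
    for (a, b), count in pair_count.items():
        support = count / n  # fraction of prescriptions containing both herbs
        if support >= min_support:
            rules.append((a, b, support, count / item_count[a]))
    return sorted(rules)


prescriptions = [["A", "B", "C"], ["A", "B"], ["A", "C"], ["B", "D"]]
for a, b, s, conf in pair_rules(prescriptions):
    print(a, b, s, round(conf, 3))
# A B 0.5 0.667
# A C 0.5 0.667
```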

241

Cancer subtype discovery and biomarker identification via a new robust network clustering algorithm.

In cancer biology, it is very important to understand the phenotypic changes of patients and to discover new cancer subtypes. Recently, microarray-based technologies have shed light on this problem through gene expression profiles, which may contain outliers for either chemical or electrical reasons. These undiscovered subtypes may be heterogeneous with respect to underlying networks or pathways and may be associated with only a few interdependent biomarkers. This motivates the need for robust gene expression-based methods capable of discovering such subtypes, elucidating the corresponding network structures, and identifying cancer-related biomarkers. This study proposes a penalized model-based Student's t clustering with unconstrained covariance (PMT-UC) to discover cancer subtypes with cluster-specific networks, taking gene dependencies into account while remaining robust to outliers. Biomarker identification and network reconstruction are achieved by imposing an adaptive [Formula: see text] penalty on the means and the inverse scale matrices. The model is fitted via the expectation maximization algorithm using the graphical lasso. A network-based gene selection criterion is applied that identifies biomarkers not as individual genes but as subnetworks. This allows us to implicate weakly discriminative biomarkers that play a central role in a subnetwork by interconnecting many differentially expressed genes, or that have cluster-specific underlying network structures. Experimental results on simulated datasets and one available cancer dataset attest to the effectiveness and robustness of PMT-UC in discovering cancer subtypes. Moreover, PMT-UC can select cancer-related biomarkers that have been verified in biochemical or biomedical research and can learn biologically significant correlations among genes. PMID:23799085

Wu, Meng-Yun; Dai, Dao-Qing; Zhang, Xiao-Fei; Zhu, Yuan

2013-01-01

242

Decomposition of structural domains is an essential task in classifying protein structures, predicting protein function, and many other proteomics problems. As the number of known protein structures in the PDB grows exponentially, accurate automatic domain decomposition methods become ever more necessary. In this article, we introduce a bottom-up algorithm for assigning protein domains using a graph-theoretical approach. The algorithm is based on center-based clustering. To construct the initial clusters, the members of an independent dominating set of the protein's graph representation are taken as the centers. A distance matrix is then defined for these clusters. To obtain the final domains, these clusters are merged using the compactness principle of domains and a method similar to the neighbor-joining algorithm, subject to several thresholds. The thresholds are computed using a training set of 50 protein chains. The algorithm is implemented in C++ and is named ProDomAs. To assess its performance, ProDomAs is compared with seven automatic methods on five publicly available benchmarks. The results show that ProDomAs outperforms the other methods on these benchmarks. ProDomAs is also evaluated against 6342 chains obtained from ASTRAL SCOP 1.71. ProDomAs is freely available at http://www.bioinf.cs.ipm.ir/software/prodomas. PMID:24596179

Ansari, Elnaz Saberi; Eslahchi, Changiz; Pezeshk, Hamid; Sadeghi, Mehdi

2014-09-01
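The first step of this algorithm, choosing an independent dominating set of the protein graph as cluster centers, can be sketched with a greedy maximal independent set, since every maximal independent set is also dominating. This is an illustrative Python sketch; the adjacency-dict input, the highest-degree-first visiting order, the toy chain graph, and the function name are assumptions rather than ProDomAs details.

```python
def independent_dominating_set(adj):
    """Greedy maximal independent set of an undirected graph.

    `adj` maps each vertex to the set of its neighbors. A maximal independent
    set is also dominating: every unchosen vertex is adjacent to a chosen one.
    """
    chosen, covered = [], set()
    # Visit high-degree vertices first so each center covers many vertices.
    for v in sorted(adj, key=lambda v: (-len(adj[v]), v)):
        if v not in covered:
            chosen.append(v)
            covered.add(v)
            covered.update(adj[v])
    return chosen


# Hypothetical residue contact graph of a 5-residue chain: 0-1-2-3-4.
chain = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(independent_dominating_set(chain))  # [1, 3]
```

A vertex is chosen only if none of its neighbors has been chosen, which keeps the set independent; every vertex ends up chosen or next to a chosen one, which makes it dominating.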

243

, social networks, such as Facebook and Twitter, machine learning and artificial intelligence. Computer characteristics, facilitating the analysis and seamless functioning of social networks (Facebook, Twitter, Linked that solves clustering problems. Researchers at the University of Florida developed the algorithm after making

Wu, Dapeng Oliver

244

An Optimized Approach on Applying Genetic Algorithm to Adaptive Cluster Validity Index

The partitioning or clustering method is an important research branch of data mining; it partitions a dataset into an arbitrary number k of clusters according to the correlation attributes of all elements of the dataset. Most datasets have an inherent number of clusters, which is estimated with a cluster validity index. But most current cluster validity index methods give the

Lei Sun; Tzu-Chieh Lin; Hsiang-Cheh Huang; Bin-Yih Liao; Jeng-Shyang Pan

2007-01-01

245

Magnetic resonance (MR) brain section images are segmented and then synthetically colored to give visual representations of the original data with three approaches: the literal and approximate fuzzy c-means unsupervised clustering algorithms, and a supervised computational neural network. Initial clinical results are presented on normal volunteers and selected patients with brain tumors surrounded by edema. Supervised and unsupervised segmentation techniques

L. O. Hall; A. M. Bensaid; L. P. Clarke; R. P. Velthuizen; M. S. Silbiger; J. C. Bezdek

1992-01-01

246

Research of Web Transactions Clustering Analysis Based on Ant-Colony Algorithm

This paper discusses two important phases of Web transaction clustering analysis: data preprocessing and clustering analysis. To obtain an easily interpreted clustering result, we introduce the ...

Kejun Zhang; Rong Qian; Xiaokun Zhang; Zhixiang Zhu; Geng Zhao

2009-01-01

247

NASA Astrophysics Data System (ADS)

Accurate measurements of human body fat distribution are desirable because excessive body fat is associated with impaired insulin sensitivity, type 2 diabetes mellitus (T2DM) and cardiovascular disease. In this study, we hypothesized that the performance of water suppressed (WS) MRI is superior to non-water suppressed (NWS) MRI for volumetric assessment of abdominal subcutaneous (SAT), intramuscular (IMAT), visceral (VAT), and total (TAT) adipose tissues. We acquired T1-weighted images on a 3T MRI system (TIM Trio, Siemens), which were analyzed using semi-automated segmentation software that employs a fuzzy c-means (FCM) clustering algorithm. Sixteen contiguous axial slices, centered at the L4-L5 level of the abdomen, were acquired in eight T2DM subjects with water suppression (WS) and without (NWS). Histograms from WS images show improved separation of non-fatty tissue pixels from fatty tissue pixels, compared to NWS images. Paired t-tests of WS versus NWS showed a statistically significant lower volume of lipid in the WS images for VAT (145.3 cc less, p=0.006) and IMAT (305 cc less, p<0.001), but not SAT (14.1 cc more, NS). WS measurements of TAT also resulted in lower fat volumes (436.1 cc less, p=0.002). There is strong correlation between WS and NWS quantification methods for SAT measurements (r=0.999), but poorer correlation for VAT studies (r=0.845). These results suggest that NWS pulse sequences may overestimate adipose tissue volumes and that WS pulse sequences are more desirable due to the higher contrast generated between fatty and non-fatty tissues.

Valaparla, Sunil K.; Peng, Qi; Gao, Feng; Clarke, Geoffrey D.

2014-03-01

248

Possibilistic clustering for shape recognition

NASA Technical Reports Server (NTRS)

Clustering methods have been used extensively in computer vision and pattern recognition. Fuzzy clustering has been shown to be advantageous over crisp (or traditional) clustering in that total commitment of a vector to a given class is not required at each iteration. Recently, fuzzy clustering methods have shown a remarkable ability to detect not only hypervolume clusters but also clusters that are actually 'thin shells', i.e., curves and surfaces. Most analytic fuzzy clustering approaches are derived from Bezdek's Fuzzy C-Means (FCM) algorithm. The FCM uses the probabilistic constraint that the memberships of a data point across classes sum to one. This constraint was used to generate the membership update equations for an iterative algorithm. Unfortunately, the memberships resulting from FCM and its derivatives do not correspond to the intuitive concept of degree of belonging, and moreover, the algorithms have considerable trouble in noisy environments. Recently, we cast the clustering problem into the framework of possibility theory. Our approach was radically different from the existing clustering methods in that the resulting partition of the data can be interpreted as a possibilistic partition, and the membership values may be interpreted as degrees of possibility of the points belonging to the classes. We constructed an appropriate objective function whose minimum will characterize a good possibilistic partition of the data, and we derived the membership and prototype update equations from necessary conditions for minimization of our criterion function. In this paper, we show the ability of this approach to detect linear and quartic curves in the presence of considerable noise.

Keller, James M.; Krishnapuram, Raghu

1992-01-01
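The difference between the probabilistic FCM constraint and a possibilistic partition shows up clearly for a noise point far from every cluster. The Python sketch below compares the standard FCM membership with the possibilistic membership u_k = 1 / (1 + (d_k^2 / eta_k)^(1/(m-1))) of possibilistic c-means; the scale parameters `etas` are set by hand here, whereas in practice they are estimated from the data, and nonzero distances are assumed.

```python
def fcm_memberships(dists, m=2.0):
    """FCM memberships of one point: forced to sum to 1 across clusters.

    Assumes all distances are nonzero.
    """
    return [1.0 / sum((dk / dj) ** (2 / (m - 1)) for dj in dists) for dk in dists]


def pcm_memberships(dists, etas, m=2.0):
    """Possibilistic memberships: each cluster is scored independently,
    u_k = 1 / (1 + (d_k**2 / eta_k) ** (1 / (m - 1)))."""
    return [1.0 / (1.0 + (d * d / eta) ** (1 / (m - 1)))
            for d, eta in zip(dists, etas)]


# A noise point 10 units from both cluster centers (eta = 1 for both clusters).
print(fcm_memberships([10.0, 10.0]))              # [0.5, 0.5]
print(pcm_memberships([10.0, 10.0], [1.0, 1.0]))  # both 1/101, about 0.0099
```

Under FCM the outlier looks like a half-member of each cluster and pulls both prototypes toward it; under the possibilistic rule it is nearly a non-member of both, which is what gives the method its robustness to noise.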

249

Distance-based clustering algorithms can group genes that show similar expression values under multiple experimental conditions, but they cannot identify a group of genes that share a similar pattern of variation in their expression values. We previously developed an algorithm called the divisive correlation clustering algorithm (DCCA) to tackle this situation, based on the concept of correlation clustering. However, this algorithm may also fail in certain cases. To overcome these situations, we propose a new clustering algorithm, called the average correlation clustering algorithm (ACCA), which produces better clustering solutions than several other methods. ACCA finds groups of genes that have more common transcription factors and similar patterns of variation in their expression values. Moreover, ACCA is more efficient than DCCA in execution time. Like DCCA, ACCA uses the concept of correlation clustering introduced by Bansal et al. ACCA uses the correlation matrix in such a way that every gene in a cluster has the highest average correlation with the genes in that cluster. We have applied ACCA and several well-known conventional methods, including DCCA, to two artificial and nine gene expression datasets and compared the performance of the algorithms. The clustering results of ACCA are found to be more significantly relevant to the biological annotations than those of the other methods. Analysis of the results shows the superiority of ACCA over several other methods in determining groups of genes that have more common transcription factors and similar patterns of variation in their expression profiles. Availability of the software: The software has been developed in C and Visual Basic and runs on Microsoft Windows platforms. It may be downloaded as a zip file from http://www.isical.ac.in/~rajat and then installed. Two Word files included in the zip file should be consulted before installation and execution of the software. PMID:20144735

Bhattacharya, Anindya; De, Rajat K

2010-08-01
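The property ACCA relies on, grouping genes by the correlation of their expression patterns rather than by distance, can be shown with a simple reassignment loop in which each gene moves to the cluster with which it has the highest average Pearson correlation. This Python sketch is not the authors' ACCA: the number of clusters, the initial labels, scoring an empty cluster as 0, and treating a constant profile as uncorrelated are all assumptions.

```python
def pearson(x, y):
    """Pearson correlation; 0.0 for a constant profile (an assumed convention)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5 if sxx and syy else 0.0


def avg_corr(i, cluster, genes, labels):
    """Average correlation of gene i with the other members of `cluster`."""
    others = [j for j, lab in enumerate(labels) if lab == cluster and j != i]
    if not others:
        return 0.0  # empty cluster scored as neutral (assumption)
    return sum(pearson(genes[i], genes[j]) for j in others) / len(others)


def average_correlation_clustering(genes, labels, k, iters=20):
    """Reassign genes to the cluster with highest average correlation until stable."""
    labels = list(labels)
    for _ in range(iters):
        changed = False
        for i in range(len(genes)):
            best = max(range(k), key=lambda c: avg_corr(i, c, genes, labels))
            if best != labels[i]:
                labels[i], changed = best, True
        if not changed:
            break
    return labels


genes = [[1, 2, 3, 4], [10, 20, 30, 40], [4, 3, 2, 1], [40, 30, 20, 10]]
print(average_correlation_clustering(genes, [0, 1, 0, 1], k=2))  # [1, 1, 0, 0]
```

In Euclidean terms the first gene is much closer to the third than to the second, yet correlation pairs it with its scaled copy, which is the pattern-of-variation grouping the abstract describes.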

250

An algorithm for identifying clusters of functionally related genes in genomes

An increasing body of literature shows that genomes of eukaryotes can contain clusters of functionally related genes. Most approaches to identify gene clusters utilize microarray data or metabolic pathway databases to find groups of genes on chromo...

Yi, Gang Man

2009-05-15

251

Applications of a new subspace clustering algorithm (COSA) in medical systems biology

A novel clustering approach named Clustering Objects on Subsets of Attributes (COSA) has been proposed (Friedman and Meulman (2004). Clustering objects on subsets of attributes. J. R. Statist. Soc. B 66, 1–25) for unsupervised analysis of complex data sets. We demonstrate its usefulness in medical systems biology studies. Examples of metabolomics analyses are described as well as the unsupervised clustering

Doris Damian; Matej Orešič; Elwin Verheij; Jacqueline Meulman; Jerome Friedman; Aram Adourian; Nicole Morel; Age Smilde; Jan van der Greef

2007-01-01

252

BiCluE - Exact and heuristic algorithms for weighted bi-cluster editing of biomedical data

Background: The explosion of biological data has dramatically reshaped biology research. The biggest challenge for biologists and bioinformaticians is integrating and analyzing large quantities of data to provide meaningful insights. One major problem is the combined analysis of data of different types. Bi-cluster editing, a special case of clustering that partitions two different types of data simultaneously, might be used in several biomedical scenarios. However, the underlying algorithmic problem is NP-hard. Results: Here we contribute BiCluE, a software package designed to solve the weighted bi-cluster editing problem. It implements (1) an exact algorithm based on fixed-parameter tractability and (2) a polynomial-time greedy heuristic based on solving the hardest part, edge deletions, first. We evaluated its performance on artificial graphs. We then applied our implementation, as an example, to real-world biomedical data, in this case GWAS data. BiCluE generally works on any kind of data that can be modeled as (weighted or unweighted) bipartite graphs. Conclusions: To our knowledge, this is the first software package solving the weighted bi-cluster editing problem. BiCluE and the supplementary results are available online at http://biclue.mpi-inf.mpg.de. PMID:24565035

2013-01-01

253

Understanding the compressive strength of concrete is important for activities such as construction planning, prestressing operations, proportioning new mixtures, and quality assurance. Regression techniques are the most widely used tools for prediction tasks in which the relationship between the independent variables and the dependent (predicted) variable is identified. The accuracy of regression techniques for prediction can be improved if clustering is used along with regression, because clustering combined with regression gives a more accurate curve fit between the dependent and independent variables. In this work, the cluster regression technique is applied to estimate the compressive strength of concrete, and a novel approach is proposed for predicting it. The objective of this work is to demonstrate that clustering combined with regression yields lower prediction errors when estimating concrete compressive strength. The proposed technique consists of two major stages: in the first stage, clustering is used to group concrete data with similar characteristics; in the second stage, regression techniques are applied to these clusters (groups) to predict the compressive strength from the individual clusters. Experiments show that clustering combined with regression gives the smallest errors in predicting the compressive strength of concrete, and that the fuzzy C-means clustering algorithm performs better than the K-means algorithm. PMID:25374939

Nagwani, Naresh Kumar; Deo, Shirish V.

2014-01-01
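The two-stage idea (cluster first, then regress within each cluster) can be sketched in NumPy. This is a minimal illustration, assuming k-means with farthest-point initialization for the clustering stage and ordinary least squares with an intercept for each cluster; the abstract does not fix the regression model, and it reports that fuzzy c-means clustered better than k-means. New samples are routed to the model of their nearest centroid. All function names and the toy data are hypothetical.

```python
import numpy as np

def assign(X, centroids):
    """Index of the nearest centroid for every row of X."""
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return np.argmin(d, axis=1)

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's k-means with farthest-point initialisation."""
    rng = np.random.default_rng(seed)
    centroids = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        # next starting centroid: the point farthest from all centroids chosen so far
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(X[np.argmax(d)])
    centroids = np.array(centroids, dtype=float)
    for _ in range(n_iter):
        labels = assign(X, centroids)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids

def fit_cluster_regression(X, y, k):
    """Stage 1: cluster the inputs. Stage 2: fit one linear model per cluster."""
    centroids = kmeans(X, k)
    labels = assign(X, centroids)
    models = []
    for j in range(k):
        mask = labels == j
        A = np.hstack([X[mask], np.ones((mask.sum(), 1))])   # add an intercept column
        coef, *_ = np.linalg.lstsq(A, y[mask], rcond=None)
        models.append(coef)
    return centroids, models

def predict(X, centroids, models):
    """Route each sample to its nearest cluster and apply that cluster's model."""
    labels = assign(X, centroids)
    A = np.hstack([X, np.ones((len(X), 1))])
    return np.array([A[i] @ models[labels[i]] for i in range(len(X))])
```

On data drawn from two regimes (for instance y = 2x for x in [0, 1] and y = -3x + 50 for x in [10, 11]), a single global line fits poorly, whereas the per-cluster models reproduce each regime exactly.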

254

Hybrid fuzzy cluster ensemble framework for tumor clustering from biomolecular data.

Cancer class discovery using biomolecular data is one of the most important tasks for cancer diagnosis and treatment. Tumor clustering from gene expression data provides a new way to perform cancer class discovery. Most existing research adopts single clustering algorithms to perform tumor clustering from biomolecular data, and these lack robustness, stability, and accuracy. To further improve the performance of tumor clustering from biomolecular data, we introduce fuzzy theory into the cluster ensemble framework and propose four hybrid fuzzy cluster ensemble frameworks (HFCEF), named HFCEF-I, HFCEF-II, HFCEF-III, and HFCEF-IV, to identify samples that belong to different types of cancer. HFCEF-I and HFCEF-II differ in the ensemble generator used to produce the set of fuzzy matrices in the ensemble. Specifically, HFCEF-I applies the affinity propagation algorithm (AP) to cluster the sample dimension and generates a set of fuzzy matrices based on the fuzzy membership function and the base samples selected by AP. HFCEF-II applies AP to cluster the attribute dimension, generates a set of subspaces, and obtains a set of fuzzy matrices by performing fuzzy c-means on the subspaces. HFCEF-III and HFCEF-IV combine the characteristics of HFCEF-I and HFCEF-II: HFCEF-III combines them serially, while HFCEF-IV integrates them concurrently. The HFCEFs adopt suitable consensus functions, such as the fuzzy c-means algorithm or the normalized cut algorithm (Ncut), to summarize the generated fuzzy matrices and obtain the final results.
The experiments on real data sets from UCI machine learning repository and cancer gene expression profiles illustrate that 1) the proposed hybrid fuzzy cluster ensemble frameworks work well on real data sets, especially biomolecular data, and 2) the proposed approaches are able to provide more robust, stable, and accurate results when compared with the state-of-the-art single clustering algorithms and traditional cluster ensemble approaches. PMID:24091399

Yu, Zhiwen; Chen, Hantao; You, Jane; Han, Guoqiang; Li, Le

2013-01-01
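Fuzzy c-means, used here as an ensemble generator and as a consensus function, alternates two updates: cluster centres are membership-weighted means, and memberships are u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1)), where d_ij is the distance from sample i to centre j. The sketch below is generic FCM in NumPy, assuming random initial memberships and the common default fuzzifier m = 2; it is not the HFCEF ensemble itself.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, n_iter=300, tol=1e-6, seed=0):
    """Standard fuzzy c-means: returns cluster centres and the n-by-c membership matrix."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)                  # memberships of each sample sum to 1
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]   # membership-weighted means
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)                    # avoid 0/0 when a sample sits on a centre
        # ratio[i, j, k] = (d_ij / d_ik) ** (2 / (m - 1)); summing over k gives 1 / u_ij
        ratio = (dist[:, :, None] / dist[:, None, :]) ** (2.0 / (m - 1.0))
        U_new = 1.0 / ratio.sum(axis=2)
        converged = np.max(np.abs(U_new - U)) < tol
        U = U_new
        if converged:
            break
    return centers, U
```

Unlike k-means, every sample keeps a graded membership in every cluster, which is what lets an ensemble combine several fuzzy matrices; taking the argmax of a row recovers a hard label when one is needed.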

255

Hybridization of evolutionary algorithms and local search by means of a clustering method

This paper presents a hybrid evolutionary algorithm (EA) to solve nonlinear-regression problems. Although EAs have proven their ability to explore large search spaces, they are comparatively inefficient at fine-tuning the solution. This drawback is usually avoided by applying local optimization algorithms to the individuals of the population. The algorithms that use local optimization procedures are

Alfonso C. Martínez-Estudillo; César Hervás-Martínez; Francisco J. Martínez-Estudillo; Nicolás García-Pedrajas

2006-01-01

256

We propose a model-based approach for clustering time series regression data in an unsupervised machine learning framework to identify groups, under the assumption that each mixture component follows a Gaussian autoregressive regression model of order p. Given the number of groups, the traditional maximum likelihood approach of estimating the parameters with the expectation-maximization (EM) algorithm can be employed, although it is computationally demanding. The Alternating Expectation Conditional Maximization (AECM) algorithm, a somewhat faster variant of EM, can alleviate the problem to some extent. In this article, we develop an alternative partial expectation conditional maximization algorithm (APECM) that uses an additional data augmentation storage step to efficiently implement AECM for finite mixture models. Results of our simulation experiments show improved performance in terms of both the number of iterations and the computation time. The methodology is applied to the problem of clustering mutual funds data on the basis of their average annual percent returns and in the presence of economic indicators.

Chen, Wei-Chen [ORNL]; Maitra, Ranjan [Iowa State University]

2011-01-01
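The model being fitted can be illustrated with a plain-EM sketch for a g-component mixture of Gaussian AR(p) models, assuming one component label per whole series and the likelihood conditional on each series' first p values. It omits the exogenous regressors (economic indicators) and does not implement the AECM or APECM refinements described above; it only shows the E-step/M-step structure those algorithms accelerate. Function names are hypothetical.

```python
import numpy as np

def lagged_design(y, p):
    """Regressors [1, y[t-1], ..., y[t-p]] and targets y[t] for t = p..T-1."""
    T = len(y)
    cols = [np.ones(T - p)] + [y[p - k:T - k] for k in range(1, p + 1)]
    return np.column_stack(cols), y[p:]

def em_ar_mixture(series, g, p, n_iter=100, seed=0):
    """Plain EM for a g-component mixture of Gaussian AR(p) models (one label per series)."""
    rng = np.random.default_rng(seed)
    data = [lagged_design(np.asarray(s, dtype=float), p) for s in series]
    n = len(data)
    R = rng.dirichlet(np.ones(g), size=n)            # random initial responsibilities (n x g)
    for _ in range(n_iter):
        # M-step: mixing weights, then weighted least squares per component
        pi = R.mean(axis=0)
        phis, sig2 = [], []
        for j in range(g):
            XtX = sum(R[i, j] * X.T @ X for i, (X, _) in enumerate(data))
            Xty = sum(R[i, j] * X.T @ yv for i, (X, yv) in enumerate(data))
            phi = np.linalg.solve(XtX, Xty)
            rss = sum(R[i, j] * np.sum((yv - X @ phi) ** 2) for i, (X, yv) in enumerate(data))
            nobs = sum(R[i, j] * len(yv) for i, (_, yv) in enumerate(data))
            phis.append(phi)
            sig2.append(rss / nobs)
        # E-step: log(pi_j) plus the conditional Gaussian log-likelihood of each series
        logL = np.empty((n, g))
        for i, (X, yv) in enumerate(data):
            for j in range(g):
                r = yv - X @ phis[j]
                logL[i, j] = (np.log(pi[j]) - 0.5 * len(yv) * np.log(2 * np.pi * sig2[j])
                              - 0.5 * (r @ r) / sig2[j])
        logL -= logL.max(axis=1, keepdims=True)      # stabilise before exponentiating
        R = np.exp(logL)
        R /= R.sum(axis=1, keepdims=True)
    return pi, phis, sig2, R
```

Each iteration touches every series for every component, which is the cost that the faster AECM-style variants aim to reduce.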

257

An on-demand weighted clustering algorithm (WCA) for ad hoc networks

We consider a multi-cluster, multi-hop packet radio network architecture for wireless systems that can dynamically adapt itself to changing network configurations. Due to the dynamic nature of the mobile nodes, their association with and dissociation from clusters perturb the stability of the system, and hence a reconfiguration of the system is unavoidable. At the same time it is

Mainak Chatterjee; S. K. Das; Damla Turgut

2000-01-01

258

A neural network clustering algorithm for the ATLAS silicon pixel detector

NASA Astrophysics Data System (ADS)

A novel technique to identify and split clusters created by multiple charged particles in the ATLAS pixel detector using a set of artificial neural networks is presented. Such merged clusters are a common feature of tracks originating from highly energetic objects, such as jets. Neural networks are trained using Monte Carlo samples produced with a detailed detector simulation. This technique replaces the former clustering approach based on a connected component analysis and charge interpolation. The performance of the neural network splitting technique is quantified using data from proton-proton collisions at the LHC collected by the ATLAS detector in 2011 and from Monte Carlo simulations. This technique reduces the number of clusters shared between tracks in highly energetic jets by up to a factor of three. It also provides more precise cluster position and error estimates, improving both the transverse and longitudinal impact parameter resolution.

The ATLAS collaboration

2014-09-01

259

A neural network clustering algorithm for the ATLAS silicon pixel detector

A novel technique to identify and split clusters created by multiple charged particles in the ATLAS pixel detector using a set of artificial neural networks is presented. Such merged clusters are a common feature of tracks originating from highly energetic objects, such as jets. Neural networks are trained using Monte Carlo samples produced with a detailed detector simulation. This technique replaces the former clustering approach based on a connected component analysis and charge interpolation. The performance of the neural network splitting technique is quantified using data from proton-proton collisions at the LHC collected by the ATLAS detector in 2011 and from Monte Carlo simulations. This technique reduces the number of clusters shared between tracks in highly energetic jets by up to a factor of three. It also provides more precise cluster position and error estimates, improving both the transverse and longitudinal impact parameter resolution.

Aad, Georges; Abdallah, Jalal; Abdel Khalek, Samah; Abdinov, Ovsat; Aben, Rosemarie; Abi, Babak; Abolins, Maris; AbouZeid, Ossama; Abramowicz, Halina; Abreu, Henso; Abreu, Ricardo; Abulaiti, Yiming; Acharya, Bobby Samir; Adamczyk, Leszek; Adams, David; Adelman, Jahred; Adomeit, Stefanie; Adye, Tim; Agatonovic-Jovin, Tatjana; Aguilar-Saavedra, Juan Antonio; Agustoni, Marco; Ahlen, Steven; Ahmadov, Faig; Aielli, Giulio; Akerstedt, Henrik; Åkesson, Torsten Paul Ake; Akimoto, Ginga; Akimov, Andrei; Alberghi, Gian Luigi; Albert, Justin; Albrand, Solveig; Alconada Verzini, Maria Josefina; Aleksa, Martin; Aleksandrov, Igor; Alexa, Calin; Alexander, Gideon; Alexandre, Gauthier; Alexopoulos, Theodoros; Alhroob, Muhammad; Alimonti, Gianluca; Alio, Lion; Alison, John; Allbrooke, Benedict; Allison, Lee John; Allport, Phillip; Almond, John; Aloisio, Alberto; Alonso, Alejandro; Alonso, Francisco; Alpigiani, Cristiano; Altheimer, Andrew David; Alvarez Gonzalez, Barbara; Alviggi, Mariagrazia; Amako, Katsuya; Amaral Coutinho, Yara; Amelung, Christoph; Amidei, Dante; Amor Dos Santos, Susana Patricia; Amorim, Antonio; Amoroso, Simone; Amram, Nir; Amundsen, Glenn; Anastopoulos, Christos; Ancu, Lucian Stefan; Andari, Nansi; Andeen, Timothy; Anders, Christoph Falk; Anders, Gabriel; Anderson, Kelby; Andreazza, Attilio; Andrei, George Victor; Anduaga, Xabier; Angelidakis, Stylianos; Angelozzi, Ivan; Anger, Philipp; Angerami, Aaron; Anghinolfi, Francis; Anisenkov, Alexey; Anjos, Nuno; Annovi, Alberto; Antonaki, Ariadni; Antonelli, Mario; Antonov, Alexey; Antos, Jaroslav; Anulli, Fabio; Aoki, Masato; Aperio Bella, Ludovica; Apolle, Rudi; Arabidze, Giorgi; Aracena, Ignacio; Arai, Yasuo; Araque, Juan Pedro; Arce, Ayana; Arguin, Jean-Francois; Argyropoulos, Spyridon; Arik, Metin; Armbruster, Aaron James; Arnaez, Olivier; Arnal, Vanessa; Arnold, Hannah; Arratia, Miguel; Arslan, Ozan; Artamonov, Andrei; Artoni, Giacomo; Asai, Shoji; Asbah, Nedaa; Ashkenazi, Adi; Åsman, Barbro; Asquith, Lily; 
Assamagan, Ketevi; Astalos, Robert; Atkinson, Markus; Atlay, Naim Bora; Auerbach, Benjamin; Augsten, Kamil; Aurousseau, Mathieu; Avolio, Giuseppe; Azuelos, Georges; Azuma, Yuya; Baak, Max; Baas, Alessandra; Bacci, Cesare; Bachacou, Henri; Bachas, Konstantinos; Backes, Moritz; Backhaus, Malte; Backus Mayes, John; Badescu, Elisabeta; Bagiacchi, Paolo; Bagnaia, Paolo; Bai, Yu; Bain, Travis; Baines, John; Baker, Oliver Keith; Balek, Petr; Balli, Fabrice; Banas, Elzbieta; Banerjee, Swagato; Bannoura, Arwa A E; Bansal, Vikas; Bansil, Hardeep Singh; Barak, Liron; Baranov, Sergei; Barberio, Elisabetta Luigia; Barberis, Dario; Barbero, Marlon; Barillari, Teresa; Barisonzi, Marcello; Barklow, Timothy; Barlow, Nick; Barnett, Bruce; Barnett, Michael; Barnovska, Zuzana; Baroncelli, Antonio; Barone, Gaetano; Barr, Alan; Barreiro, Fernando; Barreiro Guimarães da Costa, João; Bartoldus, Rainer; Barton, Adam Edward; Bartos, Pavol; Bartsch, Valeria; Bassalat, Ahmed; Basye, Austin; Bates, Richard; Batkova, Lucia; Batley, Richard; Battaglia, Marco; Battistin, Michele; Bauer, Florian; Bawa, Harinder Singh; Beau, Tristan; Beauchemin, Pierre-Hugues; Beccherle, Roberto; Bechtle, Philip; Beck, Hans Peter; Becker, Anne Kathrin; Becker, Sebastian; Beckingham, Matthew; Becot, Cyril; Beddall, Andrew; Beddall, Ayda; Bedikian, Sourpouhi; Bednyakov, Vadim; Bee, Christopher; Beemster, Lars; Beermann, Thomas; Begel, Michael; Behr, Katharina; Belanger-Champagne, Camille; Bell, Paul; Bell, William; Bella, Gideon; Bellagamba, Lorenzo; Bellerive, Alain; Bellomo, Massimiliano; Belotskiy, Konstantin; Beltramello, Olga; Benary, Odette; Benchekroun, Driss; Bendtz, Katarina; Benekos, Nektarios; Benhammou, Yan; Benhar Noccioli, Eleonora; Benitez Garcia, Jorge-Armando; Benjamin, Douglas; Bensinger, James; Benslama, Kamal; Bentvelsen, Stan; Berge, David; Bergeaas Kuutmann, Elin; Berger, Nicolas; Berghaus, Frank; Beringer, Jürg; Bernard, Clare; Bernat, Pauline; Bernius, Catrin; Bernlochner, Florian Urs; Berry, Tracey; Berta, Peter; Bertella, Claudia; Bertoli, Gabriele; Bertolucci, Federico; Bertsche, David; Besana, Maria Ilaria; Besjes, Geert-Jan; Bessidskaia, Olga; Bessner, Martin Florian; Besson, Nathalie; Betancourt, Christopher; Bethke, Siegfried; Bhimji, Wahid; Bianchi, Riccardo-Maria; Bianchini, Louis; Bianco, Michele; Biebel, Otmar; Bieniek, Stephen Paul; Bierwagen, Katharina; Biesiada, Jed; Biglietti, Michela; Bilbao De Mendizabal, Javier; Bilokon, Halina; Bindi, Marcello; Binet, Sebastien; Bingul, Ahmet; Bini, Cesare; Black, Curtis; Black, James; Black, Kevin; Blackburn, Daniel

2014-01-01

260

A neural network clustering algorithm for the ATLAS silicon pixel detector

A novel technique to identify and split clusters created by multiple charged particles in the ATLAS pixel detector using a set of artificial neural networks is presented. Such merged clusters are a common feature of tracks originating from highly energetic objects, such as jets. Neural networks are trained using Monte Carlo samples produced with a detailed detector simulation. This technique replaces the former clustering approach based on a connected component analysis and charge interpolation. The performance of the neural network splitting technique is quantified using data from proton-proton collisions at the LHC collected by the ATLAS detector in 2011 and from Monte Carlo simulations. This technique reduces the number of clusters shared between tracks in highly energetic jets by up to a factor of three. It also provides more precise cluster position and error estimates, improving both the transverse and longitudinal impact parameter resolution.

ATLAS collaboration

2014-06-30

261

Wireless sensor networks (WSNs) have emerged as a promising solution for various applications due to their low cost and easy deployment. Because they are typically battery powered, WSNs face the challenge of extending network lifetime. Many hierarchical protocols in the literature have shown better energy efficiency. In addition, data reduction based on the correlation of sensed readings can efficiently reduce the number of required transmissions. Therefore, we use a sub-clustering procedure based on spatial data correlation to further partition the hierarchical (clustered) architecture of a WSN. The proposed algorithm (2TC-cor) consists of two procedures: prediction model construction and sub-clustering. Energy is conserved through reduced transmissions, whose number depends on the prediction model, and is further conserved through the representative mechanism of sub-clustering. Simulation results show that 2TC-cor can effectively conserve energy while monitoring the environment with acceptable accuracy. PMID:25412220

Tsai, Ming-Hui; Huang, Yueh-Min

2014-01-01
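The energy saving from a prediction model comes from transmissions that never happen: if the sink can predict a member's reading from its representative's reading closely enough, the member stays silent. The abstract does not describe 2TC-cor's prediction model, so this sketch assumes a hypothetical linear model between one representative and one member, fitted by least squares on past readings, with a tolerance `eps` on the prediction error. All names and values are illustrative.

```python
import numpy as np

def fit_spatial_model(rep_history, member_history):
    """Least-squares line  member ~ a * representative + b  from past readings."""
    A = np.column_stack([rep_history, np.ones(len(rep_history))])
    (a, b), *_ = np.linalg.lstsq(A, np.asarray(member_history, dtype=float), rcond=None)
    return a, b

def readings_to_send(rep_readings, member_readings, a, b, eps):
    """Indices where the member must transmit because the sink's prediction misses by > eps."""
    predicted = a * np.asarray(rep_readings, dtype=float) + b
    error = np.abs(np.asarray(member_readings, dtype=float) - predicted)
    return np.flatnonzero(error > eps).tolist()
```

With history pairs (1, 3), (2, 5), (3, 7), (4, 9) the model is member = 2 * representative + 1; of three new readings (5, 11), (6, 13), (7, 20), only the third deviates from its prediction and needs to be sent.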

262

NASA Astrophysics Data System (ADS)

We have explored recent developments in machine learning algorithms, such as diffusion maps (Richards et al. 2009), which allow us to identify physically similar clusters without prior knowledge. We have successfully used this method to separate different classes of X-ray binaries, as well as different spectral states within a given system. Beyond the immediate astronomical application, a strength of our approach is that it offers new and useful insight into the vast and rapidly growing multidimensional data collections in essentially all fields of investigation, not only the astrophysical ones that form our testbed and the immediate focus of our scientific interest.

Vrtilek, Saeqa Dil; Boroson, Bram S.; Richards, Joseph

2014-06-01
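A diffusion map builds a Gaussian affinity matrix K, row-normalizes it into a Markov matrix P = D^-1 K, and embeds each point with the leading non-trivial right eigenvectors of P scaled by lambda^t. The minimal NumPy sketch below follows that textbook construction; the kernel width `epsilon` and diffusion time `t` are free parameters chosen here for illustration, not values from the study.

```python
import numpy as np

def diffusion_map(X, epsilon, n_components=2, t=1):
    """Embed rows of X with the leading non-trivial eigenvectors of a diffusion operator."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    K = np.exp(-sq / epsilon)                  # Gaussian affinities between all pairs
    d = K.sum(axis=1)                          # degree of each point
    # S = D^-1/2 K D^-1/2 is symmetric and has the same eigenvalues as P = D^-1 K
    S = K / np.sqrt(np.outer(d, d))
    vals, vecs = np.linalg.eigh(S)
    order = np.argsort(vals)[::-1]             # largest eigenvalue (= 1) first
    vals, vecs = vals[order], vecs[:, order]
    psi = vecs / np.sqrt(d)[:, None]           # right eigenvectors of P
    # drop the trivial constant eigenvector and scale by lambda^t (diffusion time t)
    return psi[:, 1:n_components + 1] * vals[1:n_components + 1] ** t
```

For two well-separated groups of points, the first diffusion coordinate takes one sign on one group and the opposite sign on the other, so clusters appear without any labels being supplied.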

263

A novel hierarchical image segmentation approach has been developed for the extraction of tongue carcinoma from magnetic resonance (MR) images. First, a genetic algorithm (GA)-induced fuzzy clustering is used for initial segmentation of MR images of the head and neck. Then these segmented masses are refined to reduce false positives using an artificial neural network (ANN)-based symmetry detection and image analysis procedure. The proposed technique is applied to clinical MR images of tongue carcinoma and quantitative evaluations are performed. Experimental results suggest that the proposed approach provides an effective method for tongue carcinoma extraction with high accuracy and minimal user dependency. PMID:17272055

Zhou, J; Krishnan, S; Chong, V; Huang, J

2004-01-01

264

NASA Astrophysics Data System (ADS)

This paper deals with two-level 0-1 programming problems in which there are two decision makers (DMs): the decision maker at the upper level and the decision maker at the lower level. The authors correct the problematic aspects of a computation method for the Stackelberg solution that they previously presented, and thus propose an improved computation method. Specifically, they propose a genetic algorithm (GA) intended to boost the accuracy of solutions while maintaining population diversity, which adopts a clustering method instead of calculating distances during fitness sharing. In addition, to eliminate unnecessary computation, an additional algorithm is included that avoids computing the rational reaction of the lower-level DM to the upper-level DM's decisions when it is not needed. To verify the effectiveness of the proposed method, it is compared with existing methods through numerical experiments on both solution accuracy and computation time.

Niwa, Keiichi; Nishizaki, Ichiro; Sakawa, Masatoshi

2009-01-01
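Fitness sharing normally divides each individual's fitness by a niche count computed from distances to every other member of the population, which costs O(N^2) distance evaluations. A clustering-based variant assigns individuals to clusters and divides by the cluster size instead. The abstract does not say which clustering method the authors use; this sketch assumes a simple one-pass leader clustering on Hamming distance between 0-1 vectors, with a hypothetical `radius` parameter.

```python
from collections import Counter

def hamming(a, b):
    """Number of positions at which two 0-1 vectors differ."""
    return sum(x != y for x, y in zip(a, b))

def leader_clusters(population, radius):
    """One pass: join the first leader within `radius`, otherwise become a new leader."""
    leaders, labels = [], []
    for ind in population:
        for j, lead in enumerate(leaders):
            if hamming(ind, lead) <= radius:
                labels.append(j)
                break
        else:
            leaders.append(ind)
            labels.append(len(leaders) - 1)
    return labels

def shared_fitness(population, fitness, radius):
    """Divide each raw fitness by the size of its cluster (the niche count)."""
    labels = leader_clusters(population, radius)
    niche = Counter(labels)
    return [f / niche[lab] for f, lab in zip(fitness, labels)]
```

Each individual is compared only with the current leaders rather than with the whole population, and crowded regions of the search space are penalized, which helps preserve diversity.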

265

Solution of facility location problem in Turkey by using fuzzy C-means method

NASA Astrophysics Data System (ADS)

The facility location problem is one of the most frequently encountered problems when deciding where to place facilities such as factories and warehouses. Various techniques have been developed to solve facility location problems, and the fuzzy c-means method is one of the most usable among them. In this study, the optimum warehouse location for natural stone mines is found using the fuzzy c-means method.

Kocakaya, Mustafa Nabi; Türkakın, Osman Hürol

2013-10-01

266

A Comparative Study of Two Density-Based Spatial Clustering Algorithms for Very Large Datasets

is a database system for the management of spatial data. Rapid growth is occurring in the number and the size-based, and grid-based. Hierarchical clustering methods can be either agglomerative or divisive. An agglomerative

267

This thesis examines two methods for speeding up MCNP KCODE calculations. The first approach is the assembly of a low-cost Beowulf cluster for parallel computation. The first half describes the MIT Nuclear Engineering Department's ...

Carstens, Nathan, 1978-

2004-01-01

268

Efficient Active Algorithms for Hierarchical Clustering

on performance, measurement complexity and runtime complexity. We instantiate this framework with a simple spectral clustering algorithm and provide concrete results on its performance, showing that

Krishnamurthy, Akshay; Singh, Aarti

269

Randomized Algorithms and NLP: Using Locality Sensitive Hash Function for High Speed Noun Clustering

In this paper, we explore the power of randomized algorithms to address the challenge of working with very large amounts of data. We apply these algorithms to generate noun similarity lists from 70 million pages. We reduce the running time from quadratic to practically linear in the number of elements to be computed.

Deepak Ravichandran; Patrick Pantel; Eduard Hovy

270

In this paper, we explore the power of randomized algorithms to address the challenge of working with very large amounts of data. We apply these algorithms to generate noun similarity lists from 70 million pages. We reduce the running time from quadratic to practically linear in the number of elements to be computed.

Deepak Ravichandran; Patrick Pantel; Eduard H. Hovy

2005-01-01
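A standard locality sensitive hash for cosine similarity uses random hyperplanes: each bit of a signature records which side of a random hyperplane a vector falls on, and two vectors disagree on a bit with probability theta / pi, where theta is the angle between them. The sketch below shows only this signature-and-estimate step in NumPy, with hypothetical function names and bit counts; it does not include the search strategy that avoids comparing all pairs, which is where the near-linear running time comes from.

```python
import numpy as np

def lsh_signatures(vectors, n_bits, seed=0):
    """One bit per random hyperplane: True if the vector lies on its positive side."""
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((vectors.shape[1], n_bits))
    return vectors @ planes >= 0

def approx_cosine(sig_a, sig_b):
    """Estimate cosine similarity from the fraction of differing bits."""
    hamming = np.count_nonzero(sig_a != sig_b)
    theta = np.pi * hamming / sig_a.size   # P(bit differs) = angle / pi
    return float(np.cos(theta))
```

Comparing two compact bit signatures is much cheaper than computing a dot product over a high-dimensional feature vector, and the estimate tightens as `n_bits` grows.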

271

The concept can be used to estimate future resource requirements and to make call admission decisions in wireless networks. Shadow clusters can be used to decide whether a new call can be admitted to a wireless network based on its quality-of-service (QoS) requirements and local traffic conditions. The shadow cluster concept can be especially useful in future wireless networks with

David A. Levine; Ian F. Akyildiz; Mahmoud Naghshineh

1997-01-01

272

The characterization and prediction of the structures of metal silicon clusters is important for nanotechnology research because these clusters can be used as building blocks for nanodevices, integrated circuits and solar cells. Several authors have postulated that there is a transition from exohedral to endohedral absorption of Cu in Si(n) clusters and have shown that for n larger than 9 it is possible to find endohedral clusters. Unfortunately, no global searches have confirmed this observation, which is based on local optimizations of plausible structures. Here we use parallel genetic algorithms (GA), as implemented in our MGAC software, directly coupled with DFT energy calculations to show that the global search of CuSi(n) cluster structures does not find endohedral clusters for n < 8 but finds them for n ≥ 10. PMID:21785526

Oña, Ofelia B; Ferraro, Marta B; Facelli, Julio C

2011-01-01

273

NASA Astrophysics Data System (ADS)

The problems with binary watermarking schemes are that they have only a small amount of embeddable space and are not robust enough. We develop a slice-based large-cluster algorithm (SBLCA) to construct a robust watermarking scheme for binary images. In SBLCA, a small-amount cluster selection (SACS) strategy is used to search for a feasible slice in a large-cluster flappable-pixel decision (LCFPD) method, which is used to search for the best location for concealing a secret bit from a selected slice. This method has four major advantages over other schemes: (a) SBLCA has a simple and effective decision function to select appropriate concealment locations, (b) SBLCA is a blind watermarking scheme that does not need the original image in the watermark extraction process, (c) SBLCA uses slice-based shuffling to transform the regular image into a hash state without remembering the state before shuffling, and finally, (d) SBLCA has enough embeddable space that every 64 pixels can accommodate one secret bit of the binary image. Furthermore, empirical results on test images show that our approach is a robust watermarking scheme for binary images.

Chen, Wen-Yuan; Liu, Chen-Chung

2006-01-01

274

The Fisher discriminant ratio has been used as a class separability criterion and implemented in a k-means clustering algorithm to perform simultaneous feature selection and data set trimming on a set of 221 HIV-1 protease inhibitors. A total of 43 molecular descriptors are computed for each inhibitor, and they are scaled to lie between 0 and 1 before being subjected to the feature selection process. Since the purpose is to select the most class-sensitive descriptors, several feature evaluation indices, such as the Shannon entropy, the linear regression of selected descriptors on the pKi of selected inhibitors, and a stepwise variable selection program, are used to filter them. While the Shannon entropy provides the information content of each computed descriptor, more class-sensitive descriptors are sought by both the linear regression and the stepwise variable selection procedures. The inhibitors are divided into several different numbers of classes and finally into five classes, because that division gives the best feature selection result. Most of the good features selected are topological descriptors, and they correlate well with the pKi values. The outliers, that is, inhibitors with less class-sensitive values of each selected descriptor, are identified and gathered by the k-means clustering algorithm. These are the trimmed inhibitors, while the remaining ones are retained or selected. We find that 44% (98) of the inhibitors can be retained when three good descriptors are selected for clustering. The descriptor values of these selected inhibitors are far more class sensitive than the original ones, as evidenced by a substantial increase in statistical significance when they are subjected to both the SYBYL CoMFA PLS and Cerius2 PLS regression analyses. PMID:14741013

Lin, Thy-Hou; Li, Huang-Te; Tsai, Keng-Chang

2004-01-01
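The two numerical ingredients named above, scaling descriptors to [0, 1] and scoring each descriptor by a Fisher discriminant ratio over cluster labels, can be sketched in NumPy. Several definitions of the multi-class Fisher ratio exist; this sketch assumes one common form, the between-class sum of squares divided by the within-class sum of squares for a single descriptor. The labels would come from k-means; here they are simply supplied.

```python
import numpy as np

def min_max_scale(X):
    """Scale each column (descriptor) to the range [0, 1]."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)   # guard against constant columns
    return (X - lo) / span

def fisher_ratio(x, labels):
    """Between-class over within-class sum of squares for one descriptor."""
    x, labels = np.asarray(x, dtype=float), np.asarray(labels)
    mu = x.mean()
    between = within = 0.0
    for c in np.unique(labels):
        xc = x[labels == c]
        between += len(xc) * (xc.mean() - mu) ** 2
        within += np.sum((xc - xc.mean()) ** 2)
    return between / within if within > 0 else np.inf
```

Ranking the scaled columns by `fisher_ratio` against the current cluster labels gives an ordering of descriptors from most to least class sensitive.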

275

A biologically-inspired clustering algorithm dependent on spatial data in sensor networks

Sensor networks in environmental monitoring applications aim to provide scientists with a useful spatio-temporal representation of the observed phenomena. This helps to deepen their understanding of the environmental signals that cover large geographic areas. In this paper, the spatial aspect of this data handling requirement is met by creating clusters in a sensor network based on the rate of

Ibiso Wokoma; Lan Ling Shum; Lionel Sacks; Ian Marshall

2005-01-01

276

An image segmentation approach based on maximum variance Intra-cluster method and Firefly algorithm

Segmentation is a low-level operation that divides an image into discrete, homogeneous regions. Otsu's method for image segmentation selects an optimal threshold by maximizing the between-cluster variance of a gray-level image. However, as the number of classes increases, the total runtime increases exponentially. Because a large number of iterations are required

Tahereh Hassanzadeh; Hakimeh Vojodi; Amir Masoud Eftekhari Moghadam

2011-01-01
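For a single threshold, Otsu's criterion can be evaluated for every gray level in one pass using cumulative sums: with class weight omega(t), cumulative mean mu(t) and total mean mu_T, the between-class variance is (mu_T * omega(t) - mu(t))^2 / (omega(t) * (1 - omega(t))). The sketch below implements that single-threshold case in NumPy; searching k thresholds exhaustively over L levels instead requires on the order of L^k evaluations, which is the cost a metaheuristic search is meant to reduce.

```python
import numpy as np

def otsu_threshold(image, levels=256):
    """Return t maximising the between-class variance; class 1 is pixels <= t."""
    hist = np.bincount(np.asarray(image).ravel(), minlength=levels).astype(float)
    p = hist / hist.sum()
    omega = np.cumsum(p)                       # weight of class 1 for each candidate t
    mu = np.cumsum(p * np.arange(len(p)))      # cumulative first moment
    mu_total = mu[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b2 = (mu_total * omega - mu) ** 2 / (omega * (1.0 - omega))
    sigma_b2 = np.nan_to_num(sigma_b2, nan=0.0, posinf=0.0)   # empty classes score 0
    return int(np.argmax(sigma_b2))
```

For an image containing only the gray levels 10 and 200, any threshold between them separates the two classes perfectly and is returned.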

277

A Unified Hierarchical Algorithm for Global Illumination with Scattering Volumes and Object Clusters

This paper presents a new radiosity algorithm that allows the simultaneous computation of energy exchanges between surface elements, scattering volume distributions, and groups of surfaces, or object clusters. The new technique is based on a hierarchical formulation of the zonal method, and efficiently integrates volumes and surfaces. In particular, no initial linking stage is needed, even for inhomogeneous volumes, thanks to the construction of a global spatial hierarchy.

François X. Sillion

1995-01-01

278

. Reference [1] E. Bonabeau and C. Meyer, `Swarm intelligence', Harvard Business Review, vol. 79, no. 5, pp (sociology, psychology, archaeology, education) and economics (marketing segmentation, business strategy) [1 Manufacturing Systems Seminar, 2007. [3] A. Jain, M. Murty and P. Flynn, `Data clustering: a review', ACM

Kent, University of

279

A Scalable Clustering Algorithm Based on Affinity Propagation and Normalized Cut

In this paper, a new scalable clustering method named “APANC” (Affinity Propagation And Normalized Cut) is proposed. During the APANC process, we first use “Affinity Propagation” (AP) to preliminarily group the original data in order to reduce the data scale, and then we further group the result of AP using “Normalized Cut” (NC) to get the final result. Through such

Lei Huang; Jiabin Wang; Xing He

2010-01-01

280

A fuzzy set covering-clustering algorithm for facility location problem

Mathematical models and solution algorithms that address the problem of locating facilities and allocating customers vary widely in terms of basic assumptions, mathematical complexity and computational performance. In this paper, we are concerned with the problem of locating a number of facilities among a finite number of sites such that all existing sites (customers) are covered by at least one

Rashed Sahraeian; Mohammad Sadeq Kazemi

2011-01-01

281

A Neural Network Clustering Based Algorithm for Privacy Preserving Data Mining

The increasing use of fast and efficient data mining algorithms in huge collections of personal data, facilitated through the exponential growth of technology, in particular in the field of electronic data storage media and processing power, has raised serious ethical, philosophical and legal issues related to privacy protection. To cope with these concerns, several privacy preserving methodologies have been proposed,

Stergios G. Tsiafoulis; Vasilis C. Zorkadis

2010-01-01

282

IDSX: A Cluster Based Collaborative Intrusion Detection Algorithm for Mobile Ad-Hoc Network

Mobile ad hoc networks (MANETs) vary greatly depending on their application area and the environment in which they operate. We present a brief review of the state-of-the-art security scenario for MANETs and pinpoint some of the loopholes of existing intrusion detection systems (IDS) before proposing a new collaborative algorithm called IDSX. The proposed IDSX

Rituparna Chaki; Nabendu Chaki

2007-01-01

283

Risk Mapping of Cutaneous Leishmaniasis via a Fuzzy C Means-based Neuro-Fuzzy Inference System

NASA Astrophysics Data System (ADS)

Finding pathogenic factors and how they spread in the environment has recently become a global need. Cutaneous Leishmaniasis (CL), caused by Leishmania, is a parasitic disease transmitted to humans by vector-borne Phlebotomus sand flies. Studies show that economic situation, cultural issues, and environmental and ecological conditions can affect the prevalence of this disease. In this study, data mining is used to predict the CL prevalence rate and obtain a risk map. The analysis is based on environmental parameters that affect CL and uses a neuro-fuzzy system. Neuro-fuzzy systems are efficient because they combine the learning capacity of neural networks with the reasoning power of fuzzy systems. To predict the CL prevalence rate, an adaptive neuro-fuzzy inference system was applied, with fuzzy C-means clustering used to determine the initial membership functions of its fuzzy inference structure. Given the high incidence of CL in Ilam province, the counties of Ilam, Mehran, and Dehloran were examined and evaluated. The CL prevalence rate for 2012 was predicted from maps of effective environmental and topographic properties, including temperature, moisture, annual rainfall, vegetation and elevation. Results indicate that the model with the fuzzy C-means clustering structure achieves acceptable RMSE values on both training and checking data, which supports our analyses. Using the proposed data mining technique, the spatial distribution pattern of the disease and vulnerable areas become identifiable, and the map can serve public health experts and decision makers as a useful tool for management and optimal decision-making.

Akhavan, P.; Karimi, M.; Pahlavani, P.

2014-10-01

284

Ma Peizhi's prescriptions for phlegm retention syndrome were analyzed with association rules and a clustering algorithm to obtain the frequency of drug usage and the relationships between drugs, from which the experience of Ma Peizhi of the Menghe Medical Genre in treating phlegm retention syndrome could be summarized. The analysis identified 18 core combinations, such as Citri Exocarpium Rubrum-Eriobotryae Folium-Citri Reticulatae Pericarpium, and found 9 new prescriptions, such as Aurantii Fructus-Citri Exocarpium Rubrum-Eriobotryae Folium-Citri Reticulatae Pericarpium. The results showed that Ma Peizhi of the Menghe Medical Genre was good at treating phlegm retention syndrome with mild and light traditional Chinese medicines, such as those for mild tonification and for clearing damp and promoting diuresis. PMID:25204136

Wu, Jia-Rui; Guo, Wei-Xian; Zhang, Xiao-Meng; Huang, Xiu-Qin; Yang, Bing

2014-02-01
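Association rules of the kind described above are usually filtered by support (how often items occur together across all prescriptions) and confidence (how often the right-hand item appears given the left-hand one). The sketch below mines only pairwise rules and uses toy prescriptions with placeholder herbs A, B and C; the thresholds are hypothetical and the study's actual data and rule-mining settings are not reproduced.

```python
from collections import Counter
from itertools import combinations

def pair_rules(transactions, min_support, min_confidence):
    """Return (lhs, rhs, support, confidence) for item pairs passing both thresholds."""
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in set(t))
    pair_counts = Counter(pair for t in transactions
                          for pair in combinations(sorted(set(t)), 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n                    # fraction of prescriptions containing both
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]   # P(rhs | lhs)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules
```

With the toy prescriptions {A, B}, {A, B, C}, {A, C} and {B}, a minimum support of 0.5 and a minimum confidence of 0.9 leave the single rule C → A (support 0.5, confidence 1.0).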

285

NASA Astrophysics Data System (ADS)

We implement a multiorbital cluster dynamical mean-field theory (DMFT) by improving a sample update algorithm in the continuous-time quantum Monte Carlo method based on the interaction expansion. The proposed sampling scheme for the spin-flip and pair-hopping interactions in the two-orbital systems mitigates the sign problem, giving an efficient way to deal with these interactions. In particular, in the single-site DMFT, we see that the negative signs vanish. We apply the method to the two-dimensional two-orbital Hubbard model at half-filling, where we take into account the short-range spatial correlation effects within a four-site cluster. We show that, compared to the single-site DMFT results, the critical interaction value for the metal-insulator transition decreases and that the effects of the spin-flip and pair-hopping terms are less significant in the parameter region we have studied. The present method provides a firm starting point for the study of intersite correlations in multiorbital systems. It is also widely applicable to realistic calculations in conjunction with density functional theory.

Nomura, Yusuke; Sakai, Shiro; Arita, Ryotaro

2014-05-01

286

The interacting multiple model (IMM) algorithm has proved to be useful in tracking maneuvering targets. Tracking accuracy can be further improved using data fusion. Tracking multiple targets with multiple sensors and fusing the results at a central site using a centralized architecture involves communicating large volumes of measurements to a common site. This results in heavy processing requirements at the

V. Vaidehi; K. Kalavidya; S. Indira Gandhi

2004-01-01

287

K. Daqrouq, Emad Khalaf, O. Daoud, and A. Al-Qawasmi, "K-means Clustering Algorithm Identification System Using Wavelet Transform," Advanced Signal MIC-CSC2009, 16-19 Dec. 2009. Abstract: Speaker identification is the process of determining which registered user provides a given utterance. In this paper

288

Molecular candidates possessing unconventional chemical bonding paradigms (e.g., boron wheels, molecular stars, and multicenter bonding) have attracted a great deal of attention from the computational community. The viability of such systems is necessarily assessed by identifying the lowest-lying energy forms of a given chemical composition on the potential energy surface (PES). Although dozens of search algorithms have been developed, only a few are general and simple enough to become standard everyday procedures for this purpose. The simple random search and the genetic algorithm (GA) are among these, but how do these approaches perform on typical isomeric searches? The performance of three specific variants for the ab initio exploration of the PES of the prototype planar tetracoordinated and hypercoordinated carbon-containing systems C(2)Al(4) and CB(6)(2-) is compared. The advantages of preoptimizing with a low-cost semiempirical method (e.g., PM6) together with the most cost-efficient GA-based variant are discussed, and the trends are verified by the isomer search of the larger Si(5)Li(7)(+) clusters. PMID:22162002

Avaltroni, Fabrice; Corminboeuf, Clemence

2012-02-15

289

A Simple and Effective Clustering Algorithm for Multispectral Images Using Space-Filling Curves

NASA Astrophysics Data System (ADS)

With the wide usage of multispectral images, a fast and efficient multidimensional clustering method has become not only meaningful but also necessary. In general, to speed up the analysis of multidimensional images, a multidimensional feature vector should be transformed into a lower-dimensional space. The Hilbert curve defines a one-to-one mapping from a discrete N-dimensional grid to a one-dimensional order and preserves neighborhoods as much as possible. However, because the Hilbert curve is generated by a recursive division process, ‘boundary effects’ occur: data that are close in N-dimensional space may not be close in the one-dimensional Hilbert order. In this paper, a new efficient approach based on space-filling curves is proposed for classifying multispectral satellite images. To remove the boundary effects of the Hilbert curve, multiple Hilbert curves, z curves, and the Pseudo-Hilbert curve are used jointly. The proposed method extracts category clusters from one-dimensional data without computing any distance in N-dimensional space. Furthermore, multispectral images can be analyzed hierarchically, from coarse to fine data distributions, according to the application. Experiments on LANDSAT data demonstrate that the proposed method handles multispectral images efficiently and can be applied easily.
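The sketch below illustrates the one-dimensional Hilbert order that the abstract relies on. It computes the Hilbert index of a cell in an n x n grid, where n is a power of two, using the widely published iterative "xy-to-d" conversion. It covers the 2-D case only and does not reproduce the paper's joint use of multiple Hilbert curves, z curves, and the Pseudo-Hilbert curve. The helper name `hilbert_index` is hypothetical.

```python
def hilbert_index(n: int, x: int, y: int) -> int:
    """Map cell (x, y) of an n x n grid (n a power of two) to its Hilbert order d."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        # Each quadrant at scale s holds s * s consecutive indices.
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip so the sub-quadrant is visited in canonical orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


if __name__ == "__main__":
    n = 4
    order = sorted((hilbert_index(n, x, y), (x, y)) for x in range(n) for y in range(n))
    print([cell for _, cell in order])
    # Grid neighbours that end up far apart in the 1-D order.
    print(hilbert_index(n, 1, 1), hilbert_index(n, 2, 1))  # 2 13
```

In the 4 x 4 grid, cells (1, 1) and (2, 1) are direct neighbours, yet they receive indices 2 and 13. This is a small instance of the boundary effect that motivates combining several curves.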

Zhang, Jian; Kamata, Sei-Ichiro

290

Al(4)(2-) was the first σ + π aromatic all-metal cluster to be discovered. In the present work we analyze the molecular structure, relative stability, and aromaticity of the lowest-lying isomers of the related M(2)N(2)(2-) (M, N = B, Al, and Ga) clusters, with special emphasis on the cis (C(2v)) and trans (D(2h)) isomers of the M(2)N(2)(2-) clusters. To this end, we first search for the global minimum of each cluster with the Gradient Embedded Genetic Algorithm (GEGA). Energy decomposition analyses and the calculated magnetic- and electronic-based aromaticity criteria of the lowest-lying isomers help explain the nature of the bonding and the origin of the stability of the global minima. Such a methodology should help guide future molecular design strategies. PMID:22990879

Islas, Rafael; Poater, Jordi; Matito, Eduard; Solà, Miquel

2012-11-21

291

Deployment of wireless sensor networks (WSNs) has drawn much attention in recent years. Given the limited energy of sensor nodes, it is critical to design WSNs for energy efficiency. Sensing coverage, on the other hand, may degrade gradually over time after a WSN is activated. For mission-critical applications, therefore, energy-efficient coverage control should be taken into consideration to support the quality of service (QoS) of WSNs. Coverage-control strategies usually present two challenging problems: (1) resolving conflicts when determining which nodes should be turned off to conserve energy; and (2) designing an optimal wake-up scheme that avoids awakening more nodes than necessary. In this paper, we implement energy-efficient coverage control in cluster-based WSNs using a Memetic Algorithm (MA)-based approach, entitled CoCMA, to resolve these problems. CoCMA contains two optimization strategies, an MA-based schedule for sensor nodes and a wake-up scheme, which together prolong the network lifetime while preserving coverage. The MA-based schedule is applied to a given WSN to avoid unnecessary energy consumption caused by redundant nodes. During network operation, the wake-up scheme awakens sleeping sensor nodes to recover coverage holes caused by dead nodes. The performance of CoCMA was evaluated on a cluster-based WSN (CWSN) under either random or uniform deployment of sensor nodes. Simulation results show that the combination of MA and the wake-up scheme performs better than some existing approaches. Furthermore, CoCMA is able to activate fewer sensor nodes to monitor the required sensing area. PMID:22408561

Jiang, Joe-Air; Chen, Chia-Pang; Chuang, Cheng-Long; Lin, Tzu-Shiang; Tseng, Chwan-Lu; Yang, En-Cheng; Wang, Yung-Chung

2009-01-01

292

A Neural-Network Clustering-Based Algorithm for Privacy Preserving Data Mining

NASA Astrophysics Data System (ADS)

The increasing use of fast and efficient data mining algorithms on huge collections of personal data has been facilitated by the exponential growth of technology, in particular of electronic data storage media and processing power. This trend has raised serious ethical, philosophical and legal issues related to privacy protection. To address these concerns, several privacy-preserving methodologies have been proposed. They fall into two categories: methodologies that aim at protecting the sensitive data and those that aim at protecting the mining results. In our work, we focus on sensitive data protection and compare existing techniques according to the degree of anonymity achieved, the information loss suffered and their performance characteristics. The ℓ-diversity principle is combined with k-anonymity concepts, so that background information cannot be exploited to successfully attack the privacy of the data subjects to whom the data refer. Our method proceeds in three steps, based on Kohonen Self-Organizing Feature Maps (SOMs):
1. We organize data sets into subspaces according to their information-theoretic distance from each other.
2. We create the most relevant classes, paying special attention to rare sensitive attribute values.
3. We generalize attribute values only to the minimum extent required, so that both the data disclosure probability and the information loss are kept as low as possible.
Furthermore, we propose information-theoretic measures for assessing the degree of anonymity achieved, together with empirical tests that demonstrate it.
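The sketch below shows how the two privacy criteria named in the abstract can be checked on an already generalized table. It assumes that records are dicts, that the quasi-identifier columns are given as a list, and that "ℓ-diversity" means the simplest variant, distinct ℓ-diversity. It does not implement the paper's SOM-based generalization, and all function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def equivalence_classes(rows: List[Dict[str, str]], quasi_ids: Sequence[str]):
    """Group rows that share the same (generalized) quasi-identifier values."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[q] for q in quasi_ids)].append(row)
    return groups


def is_k_anonymous(rows, quasi_ids, k: int) -> bool:
    # k-anonymity: every equivalence class must contain at least k records.
    return all(len(g) >= k for g in equivalence_classes(rows, quasi_ids).values())


def is_l_diverse(rows, quasi_ids, sensitive: str, ell: int) -> bool:
    # Distinct l-diversity: each class needs at least ell different sensitive values.
    return all(len({r[sensitive] for r in g}) >= ell
               for g in equivalence_classes(rows, quasi_ids).values())


if __name__ == "__main__":
    table = [
        {"age": "30-39", "zip": "130**", "disease": "flu"},
        {"age": "30-39", "zip": "130**", "disease": "cancer"},
        {"age": "40-49", "zip": "148**", "disease": "flu"},
        {"age": "40-49", "zip": "148**", "disease": "flu"},
    ]
    qi = ["age", "zip"]
    print(is_k_anonymous(table, qi, 2))           # True
    print(is_l_diverse(table, qi, "disease", 2))  # False: second class only has "flu"
```

The example shows why the abstract combines the two principles. The toy table is 2-anonymous, but an attacker who knows someone is in the 40-49 group still learns the diagnosis.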

Tsiafoulis, S.; Zorkadis, V. C.; Karras, D. A.

293

Biological information generated by high-throughput technology has made a systems approach feasible for many biological problems. With this approach, metabolic pathway optimization has been applied successfully to amino acid production. However, in this technique, gene modifications of the metabolic control architecture and enzyme expression levels are coupled, resulting in a mixed-integer nonlinear programming problem. Furthermore, the stoichiometric complexity of metabolic pathways, together with the strongly nonlinear behaviour of regulatory kinetic models, produces a highly rugged contour over the whole optimization problem. There may exist local optima that reach the same level of production as the global optimum through different flux distributions. The purpose of this work is to develop a novel stochastic optimization approach, the information guided genetic algorithm (IGA), to discover local optima with different levels of modification of the regulatory loop and different production rates. The novelties of this work include the use of information theory, local search, and clustering analysis to discover physically meaningful local optima among the qualified solutions. PMID:17669537

Zheng, Ying; Yeh, Chen-Wei; Yang, Chi-Da; Jang, Shi-Shang; Chu, I-Ming

2007-08-31

294

A tutorial on spectral clustering

In recent years, spectral clustering has become one of the most popular modern clustering algorithms. It is simple to implement, can be solved efficiently by standard linear algebra software, and very often outperforms traditional clustering algorithms such as the k-means algorithm. At first glance spectral clustering appears slightly mysterious, and it is not obvious why it works
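The sketch below shows the basic mechanics of the method: a similarity graph, a graph Laplacian, and then its eigenvectors. It is the two-cluster special case of unnormalized spectral clustering, which splits the points by the sign of the eigenvector of the second-smallest Laplacian eigenvalue (the Fiedler vector). The general algorithm computes k eigenvectors and runs k-means on their rows. The fully connected Gaussian similarity graph and its width `sigma` are assumptions chosen for illustration.

```python
import numpy as np


def spectral_bipartition(X: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Two-way unnormalized spectral clustering of the rows of X."""
    # Fully connected similarity graph with Gaussian weights.
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)             # no self-loops
    L = np.diag(W.sum(axis=1)) - W       # unnormalized graph Laplacian L = D - W
    _, eigvecs = np.linalg.eigh(L)       # eigenvalues come back in ascending order
    fiedler = eigvecs[:, 1]              # eigenvector of the second-smallest eigenvalue
    return (fiedler > 0).astype(int)     # split the graph by sign


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0.0, 0.3, (10, 2)), rng.normal(4.0, 0.3, (10, 2))])
    print(spectral_bipartition(X))
```

Because the sign of an eigenvector is arbitrary, which group is labelled 0 and which is labelled 1 may differ between runs or library versions. Only the partition itself is meaningful.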

Ulrike Von Luxburg

2007-01-01

295

Adaptive fuzzy leader clustering of complex data sets in pattern recognition

NASA Technical Reports Server (NTRS)

A modular, unsupervised neural network architecture for clustering and classification of complex data sets is presented. The adaptive fuzzy leader clustering (AFLC) architecture is a hybrid neural-fuzzy system that learns on-line in a stable and efficient manner. The initial classification is performed in two stages: a simple competitive stage and a distance metric comparison stage. The cluster prototypes are then incrementally updated by relocating the centroid positions from fuzzy C-means system equations for the centroids and the membership values. The AFLC algorithm is applied to the Anderson Iris data and laser-luminescent fingerprint image data. It is concluded that the AFLC algorithm successfully classifies features extracted from real data, discrete or continuous.
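AFLC reuses the standard fuzzy C-means equations for the centroids and the membership values. The sketch below shows those two updates in plain batch FCM:
- Centroid update: v_i = Σ_j u_ij^m x_j / Σ_j u_ij^m.
- Membership update: u_ij = 1 / Σ_k (d_ij / d_kj)^(2/(m-1)).

It is not the AFLC architecture itself, which adds a competitive leader stage and updates prototypes on-line. The fuzzifier m = 2, the random initialization and the stopping tolerance are assumptions.

```python
import numpy as np


def fuzzy_c_means(X: np.ndarray, c: int, m: float = 2.0, n_iter: int = 300,
                  tol: float = 1e-6, seed: int = 0):
    """Batch fuzzy C-means; returns (centers, memberships U of shape (c, n))."""
    rng = np.random.default_rng(seed)
    U = rng.random((c, X.shape[0]))
    U /= U.sum(axis=0, keepdims=True)        # memberships of each point sum to 1
    for _ in range(n_iter):
        Um = U ** m
        # Centroid equation: membership-weighted mean of the data.
        centers = (Um @ X) / Um.sum(axis=1, keepdims=True)
        dist = np.linalg.norm(X[None, :, :] - centers[:, None, :], axis=2)
        dist = np.fmax(dist, 1e-12)          # avoid division by zero at a centroid
        # Membership equation: u_ij proportional to d_ij ** (-2 / (m - 1)).
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=0, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0.0, 0.2, (15, 2)), rng.normal(5.0, 0.2, (15, 2))])
    centers, U = fuzzy_c_means(X, c=2)
    print(np.round(centers, 2))
    print(U.argmax(axis=0))                  # hard labels from the fuzzy memberships
```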

Newton, Scott C.; Pemmaraju, Surya; Mitra, Sunanda

1992-01-01

296

Hybrid Fuzzy Cluster Ensemble Framework for Tumor Clustering from Bio-molecular Data.

To further improve the performance of tumor clustering from bio-molecular data, we introduce fuzzy theory into the cluster ensemble framework. We propose four hybrid fuzzy cluster ensemble frameworks, named HFCEF-I, HFCEF-II, HFCEF-III and HFCEF-IV, to identify samples that belong to different types of cancer. HFCEF-I and HFCEF-II differ in the ensemble generator used to produce the set of fuzzy matrices in the ensemble. Specifically, HFCEF-I applies the affinity propagation algorithm (AP) to perform clustering on the sample dimension. It then generates a set of fuzzy matrices in the ensemble based on the fuzzy membership function and the base samples selected by AP. HFCEF-II adopts AP to perform clustering on the attribute dimension, generates a set of subspaces, and obtains a set of fuzzy matrices in the ensemble by performing fuzzy c-means on the subspaces. HFCEF-III and HFCEF-IV combine the characteristics of HFCEF-I and HFCEF-II: HFCEF-III combines them in a serial way, while HFCEF-IV integrates them in a concurrent way. PMID:23689925

Yu, Zhiwen; Chen, Hantao; You, Jane; Han, Guoqiang; Li, Le

2013-05-16

297

Purpose: Breast magnetic resonance imaging (MRI) plays an important role in the clinical management of breast cancer. Studies suggest that the relative amount of fibroglandular (i.e., dense) tissue in the breast, as quantified in MR images, can be predictive of the risk of developing breast cancer, especially for high-risk women. Automated segmentation of the fibroglandular tissue and volumetric density estimation in breast MRI could therefore be useful for breast cancer risk assessment. Methods: In this work the authors develop and validate a fully automated segmentation algorithm, namely an atlas-aided fuzzy C-means (FCM-Atlas) method, to estimate the volumetric amount of fibroglandular tissue in breast MRI. The FCM-Atlas is a 2D segmentation method working on a slice-by-slice basis. FCM clustering is first applied to the intensity space of each 2D MR slice to produce an initial voxelwise likelihood map of fibroglandular tissue. Then a previously learned fibroglandular tissue likelihood atlas is incorporated to refine the initial FCM likelihood map and achieve enhanced segmentation. From this segmentation, the absolute volume of the fibroglandular tissue (|FGT|) and the relative amount (i.e., percentage) of |FGT| with respect to the whole breast volume (FGT%) are computed. The authors' method is evaluated on a representative dataset of 60 3D bilateral breast MRI scans (120 breasts) that span the full breast density range of the American College of Radiology Breast Imaging Reporting and Data System. The automated segmentation is compared to manual segmentation obtained by two experienced breast imaging radiologists. Segmentation performance is assessed by linear regression, Pearson's correlation coefficients, Student's paired t-test, and Dice's similarity coefficients (DSC). Results: The inter-reader correlation is 0.97 for FGT% and 0.95 for |FGT|. 
When compared to the average of the two readers' manual segmentations, the proposed FCM-Atlas method achieves a correlation of r = 0.92 for FGT% and r = 0.93 for |FGT|, and the automated segmentation is not statistically significantly different (p = 0.46 for FGT% and p = 0.55 for |FGT|). The bilateral correlation between left and right breasts for FGT% is 0.94, 0.92, and 0.95 for reader 1, reader 2, and the FCM-Atlas, respectively; likewise, for |FGT|, it is 0.92, 0.92, and 0.93, respectively. For spatial segmentation agreement, the automated algorithm achieves a DSC of 0.69 ± 0.1 when compared to reader 1 and 0.61 ± 0.1 when compared to reader 2, while the DSC between the two readers' manual segmentations is 0.67 ± 0.15. Additional robustness analysis shows that the segmentation performance of the authors' method is stable both when different cases are selected and when the number of cases used to construct the prior probability atlas is varied. The authors' results also show that the proposed FCM-Atlas method outperforms the commonly used two-cluster FCM-alone method. The authors' method takes ~5 min per 3D bilateral MR scan (56 slices) to compute FGT% and |FGT|, compared to the ~55 min needed for manual segmentation for the same purpose. Conclusions: The authors' method achieves robust segmentation and can serve as an efficient tool for processing large clinical datasets to quantify the fibroglandular tissue content in breast MRI. It holds great potential to support future clinical applications, including breast cancer risk assessment.
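The spatial-agreement figures above are Dice similarity coefficients, DSC = 2|A ∩ B| / (|A| + |B|). The sketch below computes DSC for two binary segmentation masks, such as an automated and a manual fibroglandular-tissue mask. Returning 1.0 when both masks are empty is an assumed convention, not something stated in the abstract.

```python
import numpy as np


def dice_similarity(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Dice similarity coefficient DSC = 2|A & B| / (|A| + |B|) for binary masks."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    total = a.sum() + b.sum()
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return float(2.0 * np.logical_and(a, b).sum() / total)


if __name__ == "__main__":
    auto = np.array([[1, 1, 0], [0, 1, 0]])
    manual = np.array([[1, 0, 0], [0, 1, 1]])
    print(dice_similarity(auto, manual))  # 2 * 2 / (3 + 3) = 0.666...
```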

Wu, Shandong; Weinstein, Susan P.; Conant, Emily F.; Kontos, Despina, E-mail: despina.kontos@uphs.upenn.edu [Department of Radiology, University of Pennsylvania, Philadelphia, Pennsylvania 19104 (United States)]

2013-12-15

298

An Improved FCM Medical Image Segmentation Algorithm Based on MMTD

Image segmentation plays an important role in medical image processing. Fuzzy c-means (FCM) is one of the most popular clustering algorithms for medical image segmentation. However, FCM is highly sensitive to noise because it does not take spatial information into account. This paper introduces the medium mathematics system, which is used to process fuzzy information for image segmentation. It establishes a medium similarity measure based on the measure of medium truth degree (MMTD) and uses the correlation between a pixel and its neighbors to define the medium membership function. An improved FCM medical image segmentation algorithm based on MMTD, which takes some spatial features into account, is proposed. The experimental results show that the proposed algorithm is more robust to noise than standard FCM, with more certainty and less fuzziness. This suggests that it can be applied practically and effectively to medical image segmentation. PMID:24648852

Zhou, Ningning; Yang, Tingting; Zhang, Shaobai

2014-01-01

299

The novel surface mode of the Birmingham Cluster Genetic Algorithm (S-BCGA) is employed for the global optimisation of noble metal tetramers on an MgO(100) substrate at the GGA-DFT level of theory. The effect of element identity and alloying in surface-bound neutral subnanometre clusters is determined by energetic comparison between all compositions of PdnAg(4-n) and PdnPt(4-n). While the binding strength to the surface increases in the order Pt > Pd > Ag, the excess energy profiles suggest a preference for mixed clusters in both cases. The binding of CO is also modelled, showing that the adsorption site can be predicted solely by electrophilicity. Comparison with CO binding on a single metal atom shows a reversal of the 5σ-d activation process for clusters, which weakens the cluster-surface interaction on CO adsorption. Charge localisation determines homotop, CO binding and surface site preferences. The electronic behaviour, which is intermediate between that of molecular and metallic particles, allows for tunable features in the subnanometre size range. PMID:25158024

Heard, Christopher J; Heiles, Sven; Vajda, Stefan; Johnston, Roy L

2014-10-21

300

NASA Astrophysics Data System (ADS)

The novel surface mode of the Birmingham Cluster Genetic Algorithm (S-BCGA) is employed for the global optimisation of noble metal tetramers on an MgO(100) substrate at the GGA-DFT level of theory. The effect of element identity and alloying in surface-bound neutral subnanometre clusters is determined by energetic comparison between all compositions of PdnAg(4-n) and PdnPt(4-n). While the binding strength to the surface increases in the order Pt > Pd > Ag, the excess energy profiles suggest a preference for mixed clusters in both cases. The binding of CO is also modelled, showing that the adsorption site can be predicted solely by electrophilicity. Comparison with CO binding on a single metal atom shows a reversal of the 5σ-d activation process for clusters, which weakens the cluster-surface interaction on CO adsorption. Charge localisation determines homotop, CO binding and surface site preferences. The electronic behaviour, which is intermediate between that of molecular and metallic particles, allows for tunable features in the subnanometre size range.

Heard, Christopher J.; Heiles, Sven; Vajda, Stefan; Johnston, Roy L.

2014-09-01

301

IGroup: web image search results clustering

In this paper, we propose IGroup, an efficient and effective algorithm that organizes Web image search results into clusters. IGroup differs from all existing Web image search result clustering algorithms, which cluster only the top few images using visual or textual features. Our proposed algorithm first identifies several query-related semantic clusters based on a key phrase extraction algorithm originally

Feng Jing; Changhu Wang; Yuhuan Yao; Kefeng Deng; Lei Zhang; Wei-ying Ma

2006-01-01

302

FLAME, a novel fuzzy clustering method for the analysis of DNA microarray data

Background Data clustering analysis has been extensively applied to extract information from gene expression profiles obtained with DNA microarrays. To this end, existing clustering approaches, mainly developed in computer science, have been adapted to microarray data analysis. However, previous studies revealed that microarray datasets have very diverse structures, some of which may not be correctly captured by current clustering methods. We therefore approached the problem from a new starting point and developed a clustering algorithm designed to capture dataset-specific structures at the beginning of the process. Results The clustering algorithm is named Fuzzy clustering by Local Approximation of MEmbership (FLAME). Distinctive elements of FLAME are: (i) definition of the neighborhood of each object (gene or sample) and identification of objects with "archetypal" features, named Cluster Supporting Objects, around which the clusters are constructed; (ii) assignment to each object of a fuzzy membership vector approximated from the memberships of its neighboring objects, by an iterative converging process in which membership spreads from the Cluster Supporting Objects through their neighbors. Comparative analysis with K-means, hierarchical, fuzzy C-means and fuzzy self-organizing maps (SOM) showed that the data partitions generated by FLAME are not superimposable on those of the other methods. Although different types of datasets are better partitioned by different algorithms, FLAME displays the best overall performance. FLAME is implemented, together with all the above-mentioned algorithms, in C++ software with a graphical interface for Linux and Windows, named Gene Expression Data Analysis Studio (GEDAS). GEDAS can handle very large datasets and is freely available under the GNU General Public License. 
Conclusion The FLAME algorithm has intrinsic advantages, such as the ability to capture non-linear relationships and non-globular clusters, the automated definition of the number of clusters, and the identification of cluster outliers, i.e., genes that are not assigned to any cluster. As a result, clusters are more internally homogeneous and more distinct from each other, and they provide better partitioning of biological functions. The clustering algorithm can easily be extended to applications other than gene expression analysis. PMID:17204155

Fu, Limin; Medico, Enzo

2007-01-01

303

Semantic video clustering across sources using bipartite spectral clustering

Data clustering is an important technique for visual data management. Most previous work focuses on clustering video data within single sources. In this paper, we address the problem of clustering across sources and propose novel spectral clustering algorithms for multi-source clustering problems. Spectral clustering is a new discriminative method that clusters data by partitioning data graphs. We represent multi-source

Dong-Qing Zhang; Ching-yung Lin; Shih-fu Chang; John R. Smith

2004-01-01

304

[Cluster analysis in biomedical researches].

Cluster analysis is one of the most popular methods for analysing multi-parameter data. It reveals the internal structure of the data by grouping individual observations according to their degree of similarity. The review defines the basic concepts of cluster analysis and discusses the most popular clustering algorithms: k-means, hierarchical algorithms, and Kohonen network algorithms. Examples of the use of these algorithms in biomedical research are given. PMID:24640781
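The review lists k-means first among the popular algorithms. The sketch below is Lloyd's k-means, which alternates two steps: each observation is assigned to its nearest centroid, and each centroid is then moved to the mean of its members. Euclidean distance, initialization from k randomly chosen observations, and the stopping rule are assumptions of this sketch. Production implementations usually add smarter seeding such as k-means++ and several restarts.

```python
import numpy as np


def k_means(X: np.ndarray, k: int, n_iter: int = 100, seed: int = 0):
    """Lloyd's k-means: alternate nearest-centroid assignment and centroid update."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Assignment step: each observation joins its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    X = np.vstack([rng.normal(0.0, 0.3, (20, 2)), rng.normal(5.0, 0.3, (20, 2))])
    centers, labels = k_means(X, 2)
    print(np.round(centers, 2), labels)
```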

Akopov, A S; Moskovtsev, A A; Dolenko, S A; Savina, G D

2013-01-01

305

Xi Jiuyi, born in 1923, is a famous expert on peripheral vascular diseases. He has been engaged in scientific research and clinical teaching for over 60 years and has cured countless patients. He established the five clinical classifications of diabetic foot (DF) and achieved outstanding curative effects. Following the medical practice of Professor Xi and using engineering technology, we

Cao Yemin; Zhang Haowei; Xu HongTao; Xi Jiuyi; Zhu Xunsheng; Gu Zheng

2011-01-01

306

Participatory Learning in Fuzzy Clustering

This work suggests an unsupervised fuzzy clustering algorithm based on the concept of participatory learning introduced by Yager in the nineties. The performance of the algorithm is verified with synthetic data sets and with the well-known Iris data. In both circumstances the participatory learning algorithm determines the expected number of clusters and the corresponding cluster centers successfully. Comparisons with Gustafson-Kessel

L. Silva; F. Gomide; R. Yager

2005-01-01

307

Efficient streaming text clustering

Clustering data streams is a new research topic that has recently emerged from many real data mining applications and has attracted a lot of research attention. However, there is little work on clustering high-dimensional streaming text data. This paper combines an efficient online spherical k-means (OSKM) algorithm with an existing scalable clustering strategy to achieve fast and adaptive clustering of
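The sketch below illustrates the online spherical k-means step named above, making one pass over unit-normalized vectors. It assumes the winner-take-all update w ← (w + ηx) / ‖w + ηx‖, where the winner is the centroid with the highest cosine similarity to x. The learning rate η decays exponentially from `eta0` to `eta_final`; this schedule, the initialization, and the function name are illustrative assumptions. The scalable clustering strategy the paper combines OSKM with is not shown.

```python
import numpy as np


def online_spherical_kmeans(X: np.ndarray, k: int, eta0: float = 1.0,
                            eta_final: float = 0.01, seed: int = 0):
    """Single-pass online spherical k-means over the rows of X."""
    rng = np.random.default_rng(seed)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)   # project vectors onto the unit sphere
    W = X[rng.choice(len(X), size=k, replace=False)]   # initial unit-norm centroids (a copy)
    n = len(X)
    for t, x in enumerate(X):
        eta = eta0 * (eta_final / eta0) ** (t / n)     # exponentially decaying learning rate
        winner = int(np.argmax(W @ x))                 # highest cosine similarity wins
        w = W[winner] + eta * x
        W[winner] = w / np.linalg.norm(w)              # keep the centroid on the sphere
    labels = (X @ W.T).argmax(axis=1)
    return W, labels


if __name__ == "__main__":
    rng = np.random.default_rng(4)
    X = np.empty((20, 3))
    X[0::2] = np.array([1.0, 0.0, 0.0]) + rng.normal(0, 0.05, (10, 3))
    X[1::2] = np.array([0.0, 1.0, 0.0]) + rng.normal(0, 0.05, (10, 3))
    W, labels = online_spherical_kmeans(X, 2)
    print(np.round(W, 2), labels)
```

Working on the unit sphere makes the dot product equal to cosine similarity, which is the usual choice for sparse, high-dimensional text vectors.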

Shi Zhong

2005-01-01

308

We consider the problem of clustering two-dimensional association rules in large databases. We present a geometric-based algorithm, BitOp, for performing the clustering, embedded within an association rule clustering system, ARCS. Association rule clustering is useful when the user desires to segment the data. We measure the quality of the segmentation generated by ARCS using the Minimum

Brian Lent; Arun N. Swami; Jennifer Widom

1997-01-01

309

We study the online clustering problem, in which data items arrive in an online fashion. The algorithm maintains a clustering of data items into similarity classes. Upon the arrival of v, the relation between v and the previously arrived items is revealed, so that for each u we are told whether v is similar to u. The algorithm can create a new cluster for v and merge existing clusters. When the objective is to minimize disagreements between the clustering and the input, we prove that a natural greedy algorithm is O(n)-competitive and that this is optimal. When the objective is to maximize agreements between the clustering and the input, we prove three results. The greedy algorithm is 0.5-competitive. No online algorithm can be better than 0.834-competitive. It is possible to do better than 1/2, as we show by exhibiting a randomized algorithm with competitive ratio 0.5 + c for a small fixed positive constant c.
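The abstract does not spell out its "natural greedy algorithm". The sketch below therefore assumes one plausible rule for the disagreement objective. Each arriving item opens a singleton cluster, and then any pair of clusters is merged whenever the merge strictly reduces disagreements, that is, when the cross pairs between them contain more "similar" than "dissimilar" relations. This rule is an assumption rather than the paper's definition, and the function names are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Set


def disagreements(clusters: List[Set[int]], similar: Set[FrozenSet[int]]) -> int:
    """Count similar pairs that are split plus dissimilar pairs placed together."""
    label = {x: i for i, c in enumerate(clusters) for x in c}
    cost = 0
    for u, v in combinations(label, 2):
        together = label[u] == label[v]
        if together != (frozenset((u, v)) in similar):
            cost += 1
    return cost


def greedy_online(arrivals: Iterable[int], similar: Set[FrozenSet[int]]) -> List[Set[int]]:
    """Assumed greedy rule: singleton per arrival, then merge while a merge helps."""
    clusters: List[Set[int]] = []
    for v in arrivals:
        clusters.append({v})
        merged = True
        while merged:
            merged = False
            for i, j in combinations(range(len(clusters)), 2):
                pairs = [frozenset((a, b)) for a in clusters[i] for b in clusters[j]]
                n_sim = sum(p in similar for p in pairs)
                # Merging repairs the similar cross pairs and breaks the dissimilar ones.
                if n_sim > len(pairs) - n_sim:
                    clusters[i] |= clusters[j]
                    del clusters[j]
                    merged = True
                    break
    return clusters


if __name__ == "__main__":
    sim = {frozenset((1, 2)), frozenset((3, 4))}
    result = greedy_online([1, 2, 3, 4], sim)
    print(result, disagreements(result, sim))  # [{1, 2}, {3, 4}] 0
```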

Mathieu, Claire; Schudy, Warren

2010-01-01

310

Bias Field Estimation and Adaptive Segmentation of MRI Data Using a Modified Fuzzy C-Means Algorithm

., Louisville, KY 40292 E-mail: fmohamed,faragg@cairo.spd.louisville.edu +Department of Neurological Surgery. Experimental results on both synthetic images and MR data are given to demonstrate the effectiveness and e. Such inhomogeneities have rendered conventional intensity-based classification of MR images very difficult, even

Farag, Aly A.

311

We present a model-based clustering method, SWIFT (Scalable Weighted Iterative Flow-clustering Technique), for digesting high-dimensional large-sized datasets obtained via modern flow cytometry into more compact representations that are well-suited for further automated or manual analysis. Key attributes of the method include the following: (a) the analysis is conducted in the multidimensional space retaining the semantics of the data, (b) an iterative weighted sampling procedure is utilized to maintain modest computational complexity and to retain discrimination of extremely small subpopulations (hundreds of cells from datasets containing tens of millions), and (c) a splitting and merging procedure is incorporated in the algorithm to preserve distinguishability between biologically distinct populations, while still providing a significant compaction relative to the original data. This article presents a detailed algorithmic description of SWIFT, outlining the application-driven motivations for the different design choices, a discussion of computational complexity of the different steps, and results obtained with SWIFT for synthetic data and relatively simple experimental data that allow validation of the desirable attributes. A companion paper (Part 2) highlights the use of SWIFT, in combination with additional computational tools, for more challenging biological problems. © 2014 The Authors. Published by Wiley Periodicals Inc. PMID:24677621

Naim, Iftekhar; Datta, Suprakash; Rebhahn, Jonathan; Cavenaugh, James S; Mosmann, Tim R; Sharma, Gaurav

2014-01-01

312

NASA Astrophysics Data System (ADS)

The structures of 4-methylphenol (p-cresol) and its binary water cluster have been elucidated by rotationally resolved laser-induced fluorescence spectroscopy. The electronic origins of the monomer and the cluster are split into four sub-bands. In the monomer the splitting comes from the internal rotation of the methyl and hydroxy groups, and in the cluster from the internal rotation of the methyl group and the water moiety. From the rotational constants of the monomer, the structure in the S1 state was determined to be quinoidally distorted. The structure of the p-cresol-water cluster is determined to be trans-linear, with an O-O hydrogen bond length of 290 pm in the electronic ground state and 285 pm in the electronically excited state. The S1-state lifetimes of p-cresol, p-cresol-d1, and the binary water cluster were determined to be 1.6, 9.7, and 3.8 ns, respectively.

Myszkiewicz, Grzegorz; Meerts, W. Leo; Ratzer, Christian; Schmitt, Michael

2005-07-01

313

Genetic algorithms are tools for searching complex spaces, and they have been used successfully for system identification, which is an inverse problem. Chromatography models are represented by systems of partial differential equations with non-linear parameters, which are, in general, difficult to estimate. In this work a genetic algorithm is used to solve the inverse problem

Mirtha Irizar Mesa; Orestes Llanes-Santiago; Francisco Herrera Fernández; David Curbelo Rodríguez; Antônio José da Silva Neto; Leôncio Diógenes T. Câmara

2011-01-01

314

Fuzzy technique for microcalcifications clustering in digital mammograms

Background Mammography has established itself as the most efficient technique for identifying pathological breast lesions. Among the various types of lesions, microcalcifications are the most difficult to identify, since they are quite small (0.1-1.0 mm) and often poorly contrasted against the image background. In this context, Computer Aided Detection (CAD) systems could turn out to be very useful in breast cancer control. Methods In this paper we present a potentially powerful microcalcification cluster enhancement method applicable to digital mammograms. The segmentation phase employs a form filter, obtained from the LoG filter, to overcome the dependence on target dimensions and to optimize the recognition efficiency. A clustering method based on Fuzzy C-means (FCM) has been developed. The described method, Fuzzy C-means with Features (FCM-WF), was tested on simulated clusters of microcalcifications, for which the location of the cluster within the breast and the exact number of microcalcifications are known. The proposed method was also tested on a set of images from the publicly available mini-Mammographic database provided by the Mammographic Image Analysis Society (MIAS). Results The comparison between FCM-WF and standard FCM, applied to both databases, shows that the former produces better microcalcification associations for clustering than the latter. On the private and the public database, respectively, FCM-WF improved the Merit Figure by 10% and 5%. It also reduced the false positives potentially identified in the images by 22% and 10%. The method was also evaluated in terms of Sensitivity (93% and 82%), Accuracy (95% and 94%), FP/image (4% for both databases) and Precision (62% and 65%). 
Conclusions The private database contains information about every single microcalcification, which allowed us to test the developed clustering method with great accuracy. In particular, we verified that 70% of the injected clusters in the private database remained unaffected when the reconstruction was performed with FCM-WF. Testing the method on the MIAS database also allowed us to verify the segmentation properties of the algorithm, showing that 80% of the pathological clusters remained unaffected. PMID:24961885

2014-01-01
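
FCM-WF builds on standard Fuzzy C-means. As a point of reference, a minimal NumPy sketch of plain FCM (alternating center and membership updates) follows. It is not FCM-WF, whose feature terms are not described in this record; the fuzzifier m = 2, the tolerance, the random initialization and the name `fuzzy_c_means` are assumptions.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means. X: (n, d) data, c: number of clusters, m > 1: fuzzifier.
    Returns (centers (c, d), memberships U (n, c)); each row of U sums to 1."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)           # random fuzzy partition to start
    for _ in range(max_iter):
        Um = U ** m
        # Center update: membership-weighted mean of the data.
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        # Squared Euclidean distance from every point to every center, shape (n, c).
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        d2 = np.fmax(d2, 1e-12)                 # guard against a point sitting on a center
        # Membership update u_ik = 1 / sum_j (d_ik / d_ij)^(2/(m-1)),
        # written here with squared distances, hence the exponent 1/(m-1).
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U
```

In a CAD setting, each row of X would be the feature vector of one segmented candidate; hard labels, if needed, are `U.argmax(axis=1)`.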

315

Genetic-Algorithm-Based Image Compression

In this paper we analyze the image compression problem using genetic clustering algorithms based on the pixels of the image. The main problem to solve is to find an algorithm that performs this clustering efficiently. Nowadays the possibility of solving clustering problems with genetic algorithms is being studied. In this paper we make use of genetic algorithms to obtain an

G. Merlo; P. Britos

316

Segmentation of Spin-Echo MRI brain images: a comparison study of Crisp and Fuzzy algorithms

This thesis presents a scheme for segmenting Spin-Echo MRI brain images based on Fuzzy C-Means (FCM) clustering techniques. This scheme consists of feature extraction, feature conditioning or evaluation, and thresholded FCM clustering. Feature...

Chung, Maranatha

2012-06-07

317

We present sample CUDA programs for the GPU computing of the Swendsen-Wang multi-cluster spin flip algorithm. We deal with the classical spin models: the Ising model, the $q$-state Potts model, and the classical XY model. As for the lattice, both the 2D (square) lattice and the 3D (simple cubic) lattice are treated. We have already reported the idea of the GPU implementation for 2D models [Comput. Phys. Commun. 183 (2012) 1155-1161]. We here explain the details of the sample programs, and discuss the performance of the present GPU implementation for the 3D Ising and XY models. We also show the calculated results of the moment ratio for these models, and discuss phase transitions.

Komura, Yukihiro

2014-01-01

318

Delineation of river bed-surface patches by clustering high-resolution spatial grain size data

NASA Astrophysics Data System (ADS)

The beds of gravel-bed rivers commonly display distinct sorting patterns, which at length scales of ~0.1-1 channel widths appear to form an organization of patches or facies. This paper explores alternatives to traditional visual facies mapping by investigating methods of patch delineation in which clustering analysis is applied to a high-resolution grid of spatial grain-size distributions (GSDs) collected during a flume experiment. Specifically, we examine four clustering techniques: 1) partitional clustering of grain-size distributions with the k-means algorithm (assigning each GSD to a type of patch based solely on its distribution characteristics), 2) spatially-constrained agglomerative clustering ("growing" patches by merging adjacent GSDs, thus generating a hierarchical structure of patchiness), 3) spectral clustering using Normalized Cuts (using the spatial distance between GSDs and the distribution characteristics to generate a matrix describing the similarity between all GSDs, and using the eigenvalues of this matrix to divide the bed into patches), and 4) fuzzy clustering with the fuzzy c-means algorithm (assigning each GSD a membership probability to every patch type). For each clustering method, we calculate metrics describing how well-separated cluster-average GSDs are and how patches are arranged in space. We use these metrics to compute optimal clustering parameters, to compare the clustering methods against each other, and to compare clustering results with patches mapped visually during the flume experiment. All clustering methods produced better-separated patch GSDs than the visually-delineated patches. Although they do not produce crisp cluster assignments, fuzzy algorithms provide useful information that can characterize the uncertainty of a location on the bed belonging to any particular type of patch, and they can be used to characterize zones of transition from one patch to another.
The extent to which spatial information influences clustering leads to a trade-off between the quality of GSD separation between patch types and the spatial coherence of patches. Methods incorporating spatial information during the clustering process tended to produce a finite number of types of patches. As methods improve for collecting high-resolution grain size data, the approaches described here can be scaled up to field studies to better characterize the grain size heterogeneity of river beds.

Nelson, Peter A.; Bellugi, Dino; Dietrich, William E.

2014-01-01
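
The first of the four techniques, partitional k-means on grain-size distributions, can be sketched as below, assuming each grid cell's GSD is stored as a vector of fractions per grain-size class. The farthest-first initialization and the name `kmeans_gsd` are illustrative choices, not the authors' procedure; the spatially constrained, spectral and fuzzy variants are not shown.

```python
import numpy as np

def kmeans_gsd(X, k, n_iter=100, seed=0):
    """Lloyd's k-means on grain-size distributions. X: (n_cells, n_size_classes),
    each row the fraction of sediment in each grain-size class.
    Returns (patch label per cell, cluster-average GSDs)."""
    rng = np.random.default_rng(seed)
    # Farthest-first initialization: a random cell, then repeatedly the cell
    # farthest from every center chosen so far.
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d2.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        # Assign each cell to the nearest cluster-average GSD.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Recompute each cluster-average GSD (keep the old one if a cluster empties).
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers
```

Because cluster centers are averages of distributions, each center is itself a valid GSD whose class fractions sum to 1.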

319

Adaptive Clustering of Hypermedia Documents.

ERIC Educational Resources Information Center

Discussion of hypermedia systems focuses on a comparison of two types of adaptive algorithm (genetic algorithm and neural network) in clustering hypermedia documents. These clusters allow the user to index into the nodes to find needed information more quickly, since clustering is "personalized" based on the user's paths rather than representing…

Johnson, Andrew; Fotouhi, Farshad

1996-01-01

320

HYBRID: From Atom-Clusters to Molecule-Clusters

This paper presents a clustering algorithm named HYBRID. HYBRID has two phases: in the first phase, a set of spherical atom-clusters of the same size is generated, and in the second phase these atom-clusters are merged into a set of molecule-clusters. In the first phase, an incremental clustering method is applied to generate atom-clusters according to the available memory resources. In the second

Zhou Bing; Jun-yi Shen; Qin-ke Peng

2005-01-01

321

Incremental Hierarchical Clustering of Text Documents

Incremental Hierarchical Clustering of Text Documents, by Nachiketa Sahoo (Adviser: Jamie Callan), May 5, 2006. Abstract: Incremental hierarchical text document clustering algorithms are important, but this is a relatively unexplored area in the text document clustering literature. Popular incremental hierarchical

Gordon, Geoffrey J.

322

Fuzzy clustering of large satellite images using high performance computing

NASA Astrophysics Data System (ADS)

Fuzzy clustering is one of the most frequently used methods for identifying homogeneous regions in remote sensing images. In the case of large images, the computational costs of fuzzy clustering can be prohibitive unless high performance computing is used. Therefore, efficient parallel implementations are highly desirable. This paper presents results on the efficiency of a parallelization strategy for the Fuzzy c-Means (FCM) algorithm. In addition, the parallelization strategy has been extended to two FCM variants that incorporate spatial information (Spatial FCM and Gaussian Kernel-based FCM with spatial bias correction). The high-level requirements that guided the formulation of the proposed parallel implementations are: (i) find an appropriate partitioning of large images in order to ensure a balanced load across processors; (ii) use collective computations as much as possible; (iii) reduce the cost of communication between processors. The parallel implementations were tested through several test cases, including multispectral images and images with a large number of pixels. The experiments were conducted on both a computational cluster and a BlueGene/P supercomputer with up to 1024 processors. Generally, good scalability was obtained with respect to both the number of clusters and the number of spectral bands.

Petcu, Dana; Zaharie, Daniela; Panica, Silviu; Hussein, Ashraf S.; Sayed, Ahmed; El-Shishiny, Hisham

2011-11-01
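
FCM parallelizes well over image blocks because its center update, v_k = Σ_i u_ik^m x_i / Σ_i u_ik^m, is a ratio of two sums over pixels. The sketch below, assuming m = 2 and simulating processors with a loop, lets each block compute its partial sums and then adds them as a collective reduction would. It is an illustration of the principle, not the authors' implementation, and it ignores the spatial FCM variants; all function names are hypothetical.

```python
import numpy as np

def memberships(X, centers, m=2.0):
    """FCM membership matrix (n_pixels, c) for fixed centers."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    inv = np.fmax(d2, 1e-12) ** (-1.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

def local_sums(block, centers, m=2.0):
    """Work done by one processor on its own block of pixels. Memberships stay local;
    only the numerator (c, bands) and denominator (c,) of the center update are shared."""
    Um = memberships(block, centers, m) ** m
    return Um.T @ block, Um.sum(axis=0)

def center_update(blocks, centers, m=2.0):
    """One FCM center update. The loop over blocks stands in for a collective
    sum-reduction (for example an MPI all-reduce) across processors."""
    num = np.zeros(centers.shape)
    den = np.zeros(centers.shape[0])
    for block in blocks:
        n_b, d_b = local_sums(block, centers, m)
        num += n_b
        den += d_b
    return num / den[:, None]
```

Splitting the pixels with `np.array_split(X, p)` gives p near-equal blocks, a simple way to balance the load; the result matches the single-block update up to floating-point rounding.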

323

Deriving quantitative models for correlation clusters

Correlation clustering aims at grouping the data set into correlation clusters such that the objects in the same cluster exhibit a certain density and are all associated to a common arbitrarily oriented hyperplane of arbitrary dimensionality. Several algorithms for this task have been proposed recently. However, all algorithms only compute the partitioning of the data into clusters. This is only

Elke Achtert; Christian Böhm; Hans-peter Kriegel; Peer Kröger; Arthur Zimek

2006-01-01

324

Robust Speaker Clustering Using Affinity Propagation

NASA Astrophysics Data System (ADS)

In this letter, a recently proposed clustering algorithm named affinity propagation is introduced for the task of speaker clustering. This novel algorithm exhibits fast execution speed and finds clusters with low error. However, experiments show that the speaker purity of affinity propagation is not satisfactory. Thus, we propose a hybrid approach that combines affinity propagation with agglomerative hierarchical clustering to improve the clustering performance. Experiments show that, compared with traditional agglomerative hierarchical clustering, the hybrid method achieves better performance on the test corpora.

Zhang, Xiang; Lu, Ping; Suo, Hongbin; Zhao, Qingwei; Yan, Yonghong
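
Plain affinity propagation, the first ingredient of the hybrid approach, exchanges responsibility and availability messages until exemplars emerge. The NumPy sketch below follows the standard update rules with damping 0.5 and a fixed iteration count (both assumptions). It does not include the agglomerative merging stage of the proposed hybrid or any speaker features; the similarity matrix in the usage test is simply the negative squared distance with the median as preference.

```python
import numpy as np

def affinity_propagation(S, damping=0.5, max_iter=200):
    """Affinity propagation message passing.
    S: (n, n) similarity matrix whose diagonal holds the preferences.
    Returns, for every point, the index of its exemplar (-1 if none emerged)."""
    n = S.shape[0]
    rows = np.arange(n)
    R = np.zeros((n, n))  # responsibilities r(i, k)
    A = np.zeros((n, n))  # availabilities a(i, k)
    for _ in range(max_iter):
        # r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        AS = A + S
        best = AS.argmax(axis=1)
        first = AS[rows, best]
        AS[rows, best] = -np.inf
        second = AS.max(axis=1)
        R_new = S - first[:, None]
        R_new[rows, best] = S[rows, best] - second
        R = damping * R + (1 - damping) * R_new
        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        # a(k,k) = sum_{i' != k} max(0, r(i',k))
        Rp = np.maximum(R, 0)
        Rp[rows, rows] = R[rows, rows]
        A_new = Rp.sum(axis=0)[None, :] - Rp
        self_avail = A_new[rows, rows].copy()
        A_new = np.minimum(A_new, 0)
        A_new[rows, rows] = self_avail
        A = damping * A + (1 - damping) * A_new
    exemplars = np.flatnonzero(np.diag(A + R) > 0)
    if exemplars.size == 0:
        return np.full(n, -1)  # no exemplar emerged; preferences may be too low
    labels = exemplars[S[:, exemplars].argmax(axis=1)]
    labels[exemplars] = exemplars  # each exemplar represents itself
    return labels
```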

325

NASA Astrophysics Data System (ADS)

We present sample CUDA programs for the GPU computing of the Swendsen-Wang multi-cluster spin flip algorithm. We deal with the classical spin models: the Ising model, the q-state Potts model, and the classical XY model. As for the lattice, both the 2D (square) lattice and the 3D (simple cubic) lattice are treated. We have already reported the idea of the GPU implementation for 2D models (Komura and Okabe, 2012). We here explain the details of the sample programs, and discuss the performance of the present GPU implementation for the 3D Ising and XY models. We also show the calculated results of the moment ratio for these models, and discuss phase transitions.
Catalogue identifier: AERM_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AERM_v1_0.html
Program obtainable from: CPC Program Library, Queen’s University, Belfast, N. Ireland
Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html
No. of lines in distributed program, including test data, etc.: 5632
No. of bytes in distributed program, including test data, etc.: 14688
Distribution format: tar.gz
Programming language: C, CUDA.
Computer: System with an NVIDIA CUDA enabled GPU.
Operating system: System with an NVIDIA CUDA enabled GPU.
Classification: 23.
External routines: NVIDIA CUDA Toolkit 3.0 or newer
Nature of problem: Monte Carlo simulation of classical spin systems. The Ising model, the q-state Potts model, and the classical XY model are treated for both two-dimensional and three-dimensional lattices.
Solution method: GPU-based Swendsen-Wang multi-cluster spin flip Monte Carlo method. The CUDA implementation of the cluster labeling is based on the work by Hawick et al. [1] and that by Kalentev et al. [2].
Restrictions: The system size is limited by the memory of the GPU.
Running time: For the parameters used in the sample programs, each program takes about a minute. The running time depends on the system size, the number of Monte Carlo steps, etc.
References:
[1] K.A. Hawick, A. Leist, and D.P. Playne, Parallel Computing 36 (2010) 655-678.
[2] O. Kalentev, A. Rai, S. Kemnitz, and R. Schneider, J. Parallel Distrib. Comput. 71 (2011) 615-620.

Komura, Yukihiro; Okabe, Yutaka

2014-03-01
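
For readers without a GPU, the core of one Swendsen-Wang update for the 2D Ising model can be sketched serially: bonds between aligned nearest neighbours are activated with probability 1 - exp(-2βJ), the resulting clusters are labeled, and each cluster is flipped with probability 1/2. This is a minimal CPU sketch, assuming periodic boundaries; it labels clusters with a simple union-find rather than the GPU labeling schemes of Hawick et al. or Kalentev et al., and the function names are hypothetical.

```python
import numpy as np

def _find(parent, i):
    # Union-find root lookup with path halving.
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i

def swendsen_wang_step(spins, beta, J=1.0, rng=None):
    """One Swendsen-Wang update of a 2D Ising model on an L x L periodic lattice.
    spins: (L, L) integer array of +1/-1 values. Returns the updated array."""
    rng = np.random.default_rng() if rng is None else rng
    L = spins.shape[0]
    parent = list(range(L * L))
    p_bond = 1.0 - np.exp(-2.0 * beta * J)  # activation probability for aligned neighbours
    for x in range(L):
        for y in range(L):
            # The right and lower neighbours visit every lattice bond exactly once.
            for nx, ny in ((x, (y + 1) % L), ((x + 1) % L, y)):
                if spins[x, y] == spins[nx, ny] and rng.random() < p_bond:
                    a, b = _find(parent, x * L + y), _find(parent, nx * L + ny)
                    if a != b:
                        parent[a] = b
    roots = np.array([_find(parent, i) for i in range(L * L)])
    flip = rng.random(L * L) < 0.5  # one coin per cluster, looked up by root index
    new = spins.reshape(-1).copy()
    new[flip[roots]] *= -1
    return new.reshape(L, L)
```

Well below the critical coupling (β_c ≈ 0.4407 for J = 1), a few dozen such updates from a random start are typically enough to reach a nearly fully magnetized state.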

326

This research proposes a new strategy in which documents are encoded into string vectors and a modified version of the k-means algorithm is adapted to string vectors for text clustering. Traditionally, when the k-means algorithm is used for pattern classification, raw data must be encoded into numerical vectors. This encoding may be difficult, depending on the application area. For example, in text clustering, encoding full texts given as raw data into numerical vectors leads to two main problems: huge dimensionality and sparse distribution. In this research, we encode full texts into string vectors and modify the k-means algorithm so that it can be applied to string vectors for text clustering.

Taeho Jo

327

Molecular Image Segmentation Based on Improved Fuzzy Clustering

Segmentation of molecular images is a difficult task due to the low signal-to-noise ratio of the images. A novel two-dimensional fuzzy C-means (2DFCM) algorithm is proposed for molecular image segmentation. The 2DFCM algorithm is composed of three stages. The first stage is noise suppression using a method that combines a Gaussian noise filter and anisotropic diffusion techniques. The second stage is texture energy characterization using a Gabor wavelet method. The third stage introduces spatial constraints, provided by the denoised data and the textural information, into the two-dimensional fuzzy clustering. The incorporation of intensity and textural information allows the 2DFCM algorithm to produce satisfactory segmentation results for images corrupted by noise (outliers) and intensity variations. The 2DFCM can achieve 0.96 ± 0.03 segmentation accuracy for synthetic images under different imaging conditions. Experimental results on a real molecular image also show the effectiveness of the proposed algorithm. PMID:18368139

Yu, Jinhua; Wang, Yuanyuan

2007-01-01

328

Change detection in synthetic aperture radar images based on image fusion and fuzzy clustering.

This paper presents an unsupervised distribution-free change detection approach for synthetic aperture radar (SAR) images based on an image fusion strategy and a novel fuzzy clustering algorithm. The image fusion technique is introduced to generate a difference image by using complementary information from a mean-ratio image and a log-ratio image. In order to restrain the background information and enhance the information of changed regions in the fused difference image, wavelet fusion rules based on an average operator and minimum local area energy are chosen to fuse the wavelet coefficients for a low-frequency band and a high-frequency band, respectively. A reformulated fuzzy local-information C-means clustering algorithm is proposed for classifying changed and unchanged regions in the fused difference image. It incorporates information about spatial context in a novel fuzzy way, in order to enhance the changed information and reduce the effect of speckle noise. Experiments on real SAR images show that the image fusion strategy integrates the advantages of the log-ratio operator and the mean-ratio operator and achieves better performance. The change detection results obtained by the improved fuzzy clustering algorithm exhibit lower error than those of its predecessors. PMID:21984509

Gong, Maoguo; Zhou, Zhiqiang; Ma, Jingjing

2012-04-01
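
The two inputs to the fusion step are common SAR change indicators: the log-ratio image |log((X2 + 1)/(X1 + 1))| and the mean-ratio image 1 - min(μ1/μ2, μ2/μ1), where μ is a local mean. A minimal NumPy sketch of both follows; the +1 offset, the 3 x 3 window and the edge-replicated borders are assumptions, and the wavelet fusion and the reformulated FLICM classifier are not shown.

```python
import numpy as np

def local_mean(img, r=1):
    """Mean over a (2r+1) x (2r+1) window, replicating edge pixels at the borders."""
    p = np.pad(img.astype(float), r, mode="edge")
    h, w = img.shape
    acc = np.zeros((h, w))
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            acc += p[dy:dy + h, dx:dx + w]
    return acc / (2 * r + 1) ** 2

def log_ratio(x1, x2):
    # The +1 offset avoids log(0) on dark pixels; the log compresses speckle-driven ratios.
    return np.abs(np.log((x2.astype(float) + 1.0) / (x1.astype(float) + 1.0)))

def mean_ratio(x1, x2, r=1, eps=1e-12):
    # 0 where local means agree, approaching 1 for strong changes in either direction.
    m1, m2 = local_mean(x1, r) + eps, local_mean(x2, r) + eps
    return 1.0 - np.minimum(m1 / m2, m2 / m1)
```

Both maps are near zero on unchanged areas; the log-ratio is pixel-wise, while the mean-ratio smooths speckle through the local window.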

329

Text document clustering based on neighbors

Clustering is a very powerful data mining technique for topic discovery from text documents. Partitional clustering algorithms, such as the family of k-means, have been reported to perform well on document clustering. They treat the clustering problem as an optimization process of grouping documents into k clusters so that a particular criterion function is minimized or maximized. Usually, the cosine function

Congnan Luo; Yanjun Li; Soon M. Chung

2009-01-01

330

A Linear Algebra Measure of Cluster Quality.

ERIC Educational Resources Information Center

Discussion of models for information retrieval focuses on an application of linear algebra to text clustering, namely, a metric for measuring cluster quality based on the theory that cluster quality is proportional to the number of terms that are disjoint across the clusters. Explains term-document matrices and clustering algorithms. (Author/LRW)

Mather, Laura A.

2000-01-01

331

The Evaluation Measure of Text Clustering for the Variable Number of Clusters

This study proposes an innovative measure for evaluating the performance of text clustering. When the K-means algorithm and Kohonen Networks are used for text clustering, the number of clusters is fixed in advance as a parameter, while with the single pass algorithm the number of clusters is not predictable. Using labeled documents, the result of text clustering using

Taeho Jo; Malrey Lee

2007-01-01

332

NASA Astrophysics Data System (ADS)

Terrestrial Laser Scanners (TLS) are frequently used in three-dimensional documentation studies and present an alternative method for three-dimensional modeling without any deformation of scale. In this study, point cloud segmentation is used to produce photogrammetric image data from laser scanner data. Segmentation studies suggest several methods for automating curved-surface determination in digital terrain modeling. In this study, a fuzzy logic approach has been used for the automatic segmentation of regular curved surfaces that differ in their depth relative to the instrument. Shapes of this type are usually observed on dome surfaces in close-range architectural documentation. The C-means-integrated fuzzy logic model was developed with MATLAB 7.0. The Gauss2mf membership function algorithm was tested with the original data set. These results were used in the photogrammetric 3D modeling process. The results obtained on the point cloud data set, with all their advantages and disadvantages, are discussed and interpreted in Section 5.

Ergun, Bahadir; Sahin, Cumhur; Ustuntas, Taner

2014-01-01

333

NASA Astrophysics Data System (ADS)

Galaxy clusters are one of the four key cosmic acceleration probes used by the Dark Energy Survey (DES) to measure cosmological parameters with unprecedented precision. DES has recently completed commissioning of its instrument and accomplished a successful science verification data-taking phase. The survey proper started on Aug 31, 2013. In this talk, I review the motivation for using clusters of galaxies in cosmology, discuss the expected DES performance and present the prospects for improving our understanding of dark energy by constraining cosmological models using galaxy clusters found by the Voronoi Tessellation galaxy cluster finding algorithm. We show results of our galaxy cluster analysis based on the early DES data sets.

Soares-Santos, Marcelle; DES Collaboration

2014-01-01

334

Clustering by evidence accumulation on affinity propagation

Affinity propagation (AP) is a clustering algorithm that performs much better than traditional clustering approaches such as the k-means algorithm. In this paper, we present an algorithm called voting partition affinity propagation (voting-PAP), a method for clustering using evidence accumulation based on AP. Clusters produced by voting-PAP are not constrained to be hyper-spherically shaped. Voting-PAP consists of three

Xuqing Zhang; Fei Wu; Yueting Zhuang

2008-01-01
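
The three stages of voting-PAP are cut off in this record, so the sketch below shows only the generic evidence-accumulation step such methods build on: a co-association matrix counting how often two points share a cluster across several base partitions. The base partitions are given as input rather than produced by affinity propagation, and the 0.5 majority threshold with connected-component extraction is an assumption; the function names are hypothetical.

```python
import numpy as np

def co_association(partitions):
    """partitions: (n_runs, n_points) integer labels from repeated base clusterings.
    Entry (i, j) is the fraction of runs that put points i and j in the same cluster."""
    P = np.asarray(partitions)
    return (P[:, :, None] == P[:, None, :]).mean(axis=0)

def consensus_labels(C, threshold=0.5):
    """Final clusters: connected components of the graph that links pairs whose
    co-association exceeds the threshold (a majority vote when threshold = 0.5)."""
    n = len(C)
    labels = np.full(n, -1)
    current = 0
    for seed in range(n):
        if labels[seed] >= 0:
            continue
        labels[seed] = current
        stack = [seed]
        while stack:
            i = stack.pop()
            for j in np.flatnonzero((C[i] > threshold) & (labels < 0)):
                labels[j] = current
                stack.append(j)
        current += 1
    return labels
```

Because links are chained, the consensus clusters can take arbitrary shapes, which matches the record's remark that the result is not restricted to hyper-spherical clusters.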

335

In order to deal with the sequential decision problems with large or continuous state spaces, feature representation and function approximation have been a major research topic in reinforcement learning (RL). In this paper, a clustering-based graph Laplacian framework is presented for feature representation and value function approximation (VFA) in RL. By making use of clustering-based techniques, that is, K-means clustering or fuzzy C-means clustering, a graph Laplacian is constructed by subsampling in Markov decision processes (MDPs) with continuous state spaces. The basis functions for VFA can be automatically generated from spectral analysis of the graph Laplacian. The clustering-based graph Laplacian is integrated with a class of approximation policy iteration algorithms called representation policy iteration (RPI) for RL in MDPs with continuous state spaces. Simulation and experimental results show that, compared with previous RPI methods, the proposed approach needs fewer sample points to compute an efficient set of basis functions and the learning control performance can be improved for a variety of parameter settings. PMID:24802018

Xu, Xin; Huang, Zhenhua; Graves, Daniel; Pedrycz, Witold

2014-12-01
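
The construction described above can be sketched in three steps: cluster sampled states with k-means, connect the cluster centers in a weighted nearest-neighbour graph, and take the eigenvectors of the normalized graph Laplacian with the smallest eigenvalues as basis functions. This is a minimal sketch under assumed choices (Gaussian edge weights, 5 neighbours, symmetric normalized Laplacian); the RPI learning loop is not shown, and extending the basis to new states by nearest-center lookup is a hypothetical simplification.

```python
import numpy as np

def kmeans_centers(S, k, n_iter=50, seed=0):
    """Plain k-means over sampled states S (n, d); the centers become graph nodes."""
    rng = np.random.default_rng(seed)
    C = S[rng.choice(len(S), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = ((S[:, None, :] - C[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        C = np.array([S[labels == j].mean(axis=0) if np.any(labels == j) else C[j]
                      for j in range(k)])
    return C

def laplacian_basis(C, n_basis, n_neighbors=5, sigma=1.0):
    """Smoothest eigenvectors of the normalized graph Laplacian over the centers C."""
    d2 = ((C[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    W = np.zeros_like(d2)
    for i in range(len(C)):
        order = np.argsort(d2[i])
        nbrs = order[order != i][:n_neighbors]
        W[i, nbrs] = np.exp(-d2[i, nbrs] / sigma ** 2)  # Gaussian edge weights
    W = np.maximum(W, W.T)                              # make the kNN graph undirected
    dinv = 1.0 / np.sqrt(W.sum(axis=1))
    Lap = np.eye(len(C)) - dinv[:, None] * W * dinv[None, :]
    vals, vecs = np.linalg.eigh(Lap)                    # eigenvalues in ascending order
    return vals[:n_basis], vecs[:, :n_basis]

def state_features(s, C, vecs):
    """Crude extension to a new state: use the basis values of its nearest center."""
    return vecs[((C - s) ** 2).sum(axis=1).argmin()]
```

A linear value function would then be approximated as `state_features(s, C, vecs) @ w` for weights w learned by the policy iteration scheme.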

336

Relation chain based clustering analysis

NASA Astrophysics Data System (ADS)

Clustering analysis is currently one of the well-developed branches of data mining technology; it aims to find the hidden structures in a multidimensional space called the feature or pattern space. A datum in this space usually takes a vector form, and the elements of the vector represent several specifically selected features. These features are usually chosen to be effective for the problem at hand. Generally, clustering analysis falls into two divisions: one is based on the agglomerative clustering method, and the other on the divisive clustering method. The former is a bottom-up process that regards each datum as a singleton cluster, while the latter is a top-down process that regards the entire data set as one cluster. The literature shows that divisive clustering currently dominates both application and research. Although some well-known divisive clustering methods have been designed and well developed, clustering problems are still far from solved. The k-means algorithm is the original divisive clustering method; it requires some important index values to be assigned in advance, such as the number of clusters and the initial positions of the cluster prototypes, which may be unreasonable in certain situations. Beyond the initialization problem, the k-means algorithm may also fall into a local optimum, assigns clusters rigidly, and is not suited to non-Gaussian distributions. Seeking a good or natural clustering result in fact depends on one's understanding of the concept of clustering; thus, confusion about or misunderstanding of the definition of clustering often leads to unsatisfactory clustering results, and the definition deserves careful consideration. This paper demonstrates the nature of clustering, presents a way of understanding clustering, discusses the methodology of designing a clustering algorithm, and proposes a new clustering method based on relation chains among 2D patterns.
In this paper, a new method called relation chain based clustering is presented. The method demonstrates that arbitrary distribution shape and density are not the essential factors for clustering research; in other words, clusters described by particular expressions should be considered under a uniform mathematical description, which this paper calls a "relation chain". The relation chain indicates the relation between each pair of spatial points and evaluates the connection between them. The relation chain based clustering algorithm first assigns the neighborhood evaluation radius of the points, then assesses the clustering result based on the inner-cluster variance of each cluster while increasing the radius, adjusts the radius accordingly, and finally gives the clustering result. Experiments conducted with the proposed method show that the hidden data structure is well explored.

Zhang, Cheng-ning; Zhao, Ming-yang; Luo, Hai-bo

2011-08-01
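
The abstract describes relations between point pairs within a neighborhood radius and a radius sweep judged by inner-cluster variance. One plausible reading, sketched below as a hypothetical illustration rather than the authors' algorithm, treats a relation chain as a path of pairs closer than r, so that clusters are the connected components of that graph. How the authors adjust the radius from the variances is not specified here, so the sweep only reports the cluster count and mean inner-cluster variance per radius.

```python
import numpy as np

def radius_components(P, r):
    """Cluster labels from 'relation chains': points closer than r are related, and
    points linked by a chain of such relations share a cluster (connected components)."""
    n = len(P)
    related = ((P[:, None, :] - P[None, :, :]) ** 2).sum(axis=2) <= r * r
    labels = np.full(n, -1)
    current = 0
    for seed in range(n):
        if labels[seed] >= 0:
            continue
        labels[seed] = current
        stack = [seed]
        while stack:
            i = stack.pop()
            for j in np.flatnonzero(related[i] & (labels < 0)):
                labels[j] = current
                stack.append(j)
        current += 1
    return labels

def radius_sweep(P, radii):
    """For each radius: (radius, number of clusters, mean inner-cluster variance)."""
    results = []
    for r in radii:
        labels = radius_components(P, r)
        variances = [P[labels == c].var(axis=0).sum() for c in np.unique(labels)]
        results.append((r, len(variances), float(np.mean(variances))))
    return results
```

Because membership propagates along chains, elongated or curved point sets form a single cluster regardless of their shape, which is in the spirit of the record's claim that distribution shape is not essential.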

337

Automatic Clustering with Single Optimal Solution

Determining the optimal number of clusters in a dataset is a challenging task. Though some methods are available, there is no algorithm that produces a unique clustering solution. The paper proposes Automatic Merging for Single Optimal Solution (AMSOS), which aims to generate unique and nearly optimal clusters for a given dataset automatically. AMSOS iteratively merges the closest clusters, validating each step with a cluster validity measure, to find a single, nearly optimal clustering for the given data set. Experiments on both synthetic and real data have shown that the proposed algorithm finds a single, nearly optimal clustering structure in terms of number of clusters, compactness and separation.

Pavan, K Karteeka; Rao, A V Dattatreya

2012-01-01

338

A multiple-feature and multiple-kernel scene segmentation algorithm for humanoid robot.

This technical correspondence presents a multiple-feature and multiple-kernel support vector machine (MFMK-SVM) methodology to achieve more reliable and robust segmentation performance for a humanoid robot. The pixel-wise intensity, gradient, and C1 SMF features are extracted via the local homogeneity model and Gabor filter, and are used as inputs to the MFMK-SVM model. This provides multiple features of the samples for easier implementation and efficient computation of the MFMK-SVM model. A new clustering method, called the feature validity-interval type-2 fuzzy C-means (FV-IT2FCM) clustering algorithm, is proposed by integrating a type-2 fuzzy criterion into the clustering optimization process to improve the robustness and reliability of clustering results through iterative optimization. Furthermore, the clustering validity is employed to select the training samples for learning the MFMK-SVM model. The MFMK-SVM scene segmentation method is able to take full advantage of the multiple features of a scene image and the ability of multiple kernels. Experiments on the BSDS dataset and real natural scene images demonstrate the superior performance of our proposed method. PMID:25248211

Liu, Zhi; Xu, Shuqiong; Zhang, Yun; Chen, Chun Lung Philip

2014-11-01

339

Ensemble-approaches for clustering health status of oil sand pumps, F. Di Maio

and compare two unsupervised clustering ensemble methods, i.e., fuzzy C-means and hierarchical trees classifier. Keywords: Degradation, Fault detection, Fuzzy C-means, Hierarchical Tree, Ensembles expected to operate with high levels of reliability, availability, and safety although they run in adverse

Paris-Sud XI, Université de

340

Metamodel-based global optimization using fuzzy clustering for design space reduction

NASA Astrophysics Data System (ADS)

High-fidelity analyses are utilized in modern engineering design optimization problems that involve expensive black-box models. For computation-intensive engineering design problems, efficient global optimization methods must be developed to relieve the computational burden. A new metamodel-based global optimization method using fuzzy clustering for design space reduction (MGO-FCR) is presented. Uniformly distributed initial sample points are generated by Latin hypercube design to construct the radial basis function metamodel, whose accuracy is gradually improved by increasing the number of sample points. The fuzzy c-means method and the Gath-Geva clustering method are applied to divide the design space into several small cluster spaces of interest, for low- and high-dimensional problems respectively. Modeling efficiency and accuracy are directly related to the design space, so spaces of no interest are eliminated by the proposed reduction principle and two pseudo reduction algorithms. The reduction principle determines whether the current design space should be reduced and which space is eliminated. The first pseudo reduction algorithm improves the speed of clustering, while the second ensures that the design space is reduced. Through several numerical benchmark functions, comparative studies with the adaptive response surface method, the approximated unimodal region elimination method and mode-pursuing sampling are carried out. The optimization results reveal that this method captures the real global optimum for all the numerical benchmark functions, and the number of function evaluations shows that its efficiency is favorable, especially for high-dimensional problems. Based on this global design optimization method, a design optimization of a lifting surface in high-speed flow is carried out, and the method saves about 10 h compared with genetic algorithms.
This method possesses favorable performance on efficiency, robustness and capability of global convergence and gives a new optimization strategy for engineering design optimization problems involving expensive black box models.

Li, Yulin; Liu, Li; Long, Teng; Dong, Weili

2013-09-01
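
MGO-FCR starts from a Latin hypercube design and a radial basis function metamodel. A minimal sketch of those two ingredients follows; the Gaussian kernel, the shape parameter eps = 3 and the quadratic function standing in for the expensive black-box model are assumptions, and the fuzzy-clustering space reduction is not shown.

```python
import numpy as np

def latin_hypercube(n, d, rng=None):
    """n points in [0, 1)^d. Each axis is cut into n equal strata and every stratum
    receives exactly one point; random permutations pair strata across axes."""
    rng = np.random.default_rng() if rng is None else rng
    strata = np.column_stack([rng.permutation(n) for _ in range(d)])
    return (strata + rng.random((n, d))) / n  # uniform jitter inside each stratum

def fit_gaussian_rbf(X, y, eps=3.0):
    """Interpolating Gaussian RBF metamodel through the samples (X, y).
    eps is an assumed shape parameter; larger values improve conditioning."""
    def kernel(P, Q):
        return np.exp(-eps ** 2 * ((P[:, None, :] - Q[None, :, :]) ** 2).sum(axis=2))
    w = np.linalg.solve(kernel(X, X), y)  # weights so the model matches y at every sample
    return lambda Z: kernel(np.atleast_2d(Z), X) @ w
```

Design points in [0, 1)^d would be scaled to the actual variable bounds before evaluating the expensive model; new samples can be added and the metamodel refit as the design space shrinks.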

341

Improving the Accuracy of Ontology Alignment through Ensemble Fuzzy Clustering

Improving the Accuracy of Ontology Alignment through Ensemble Fuzzy Clustering Nafisa Afrin not be helpful enough for indicating the degree of reliability. While using a random alignment tool we noticed that aggregates multiple alignment tools with the help of Fuzzy C Means clustering and Type 2 Fuzzy Membership

Dou, Dejing

342

Hierarchical bayesian clustering for automatic text classification

Text classification, the grouping of texts into several clusters, has been used as a means of improving both the efficiency and the effectiveness of text retrieval/categorization. In this paper we propose a hierarchical clustering algorithm that constructs a set of clusters having the maximum Bayesian posterior probability, the probability that the given texts are classified into clusters. We call the algorithm Hierarchical Bayesian Clustering (HBC). The advantages of HBC are experimentally verified from several viewpoints: (1) HBC can reconstruct the original clusters more accurately than do other non-probabilistic algorithms; (2) When

Makoto Iwayama

1995-01-01

343

A Modified Clustering Method with Fuzzy Ants

NASA Astrophysics Data System (ADS)

Owing to its flexibility, stigmergy and self-organization, ant-based clustering has been applied in a variety of areas, from problems arising in commerce to circuit design and text mining. A modified clustering method with fuzzy ants is presented in this paper. First, fuzzy ants and their behavior are defined; second, the new clustering algorithm is constructed based on fuzzy ants. In this algorithm, we consider multiple ants based on Schockaert's algorithm. The algorithm can be accelerated by the use of parallel ants, global memory banks and a density-based 'look ahead' method. Experimental results show that this algorithm is more efficient than other ant clustering methods.

Chen, Jianbin; Fang, Deying; Xue, Yun

344

NASA Astrophysics Data System (ADS)

Energy surfaces of metal clusters usually show a large variety of local minima. For homo-metallic species the energetically lowest can be found reliably with genetic algorithms, in combination with density functional theory without system-specific parameters. For mixed-metallic clusters this is much more difficult, as for a given arrangement of nuclei one additionally has to find the best of many possibilities of assigning different metal types to the individual positions. In the framework of electronic structure methods this second issue is treatable at comparatively low cost, at least for elements with similar atomic number, by means of first-order perturbation theory, as shown previously [F. Weigend, C. Schrodt, and R. Ahlrichs, J. Chem. Phys. 121, 10380 (2004)]. In the present contribution the extension of a genetic algorithm with the re-assignment of atom types to atom sites is proposed and tested for the search of the global minima of PtHf12 and [LaPb7Bi7]4-. For both cases the (putative) global minimum is reliably found with the extended technique, which is not the case for the "pure" genetic algorithm.

Weigend, Florian

2014-10-01

345

Energy surfaces of metal clusters usually show a large variety of local minima. For homo-metallic species the energetically lowest can be found reliably with genetic algorithms, in combination with density functional theory without system-specific parameters. For mixed-metallic clusters this is much more difficult, as for a given arrangement of nuclei one additionally has to find the best of many possibilities of assigning different metal types to the individual positions. In the framework of electronic structure methods this second issue is treatable at comparatively low cost, at least for elements with similar atomic number, by means of first-order perturbation theory, as shown previously [F. Weigend, C. Schrodt, and R. Ahlrichs, J. Chem. Phys. 121, 10380 (2004)]. In the present contribution the extension of a genetic algorithm with the re-assignment of atom types to atom sites is proposed and tested for the search of the global minima of PtHf12 and [LaPb7Bi7](4-). For both cases the (putative) global minimum is reliably found with the extended technique, which is not the case for the "pure" genetic algorithm. PMID:25296780

Weigend, Florian

2014-10-01

346

Dynamic Decentralized AnyTime Hierarchical Clustering

Hierarchical clustering is used widely to organize data and search for patterns. Previous algorithms assume that the body of data being clustered is fixed while the algorithm runs, and use centralized data representations that make it difficult to scale the process by distributing it across multiple processors. Self-Organizing Data and Search (SODAS), inspired by the decentralized algorithms that ants use

H. Van Dyke Parunak; Richard Rohwer; Theodore C. Belding; Sven Brueckner

2006-01-01

347

Spam Detection Using Text Clustering

We propose a new spam detection technique using text clustering based on the vector space model. Our method automatically computes disjoint clusters of all spam/non-spam mails using a spherical k-means algorithm, and obtains the centroid vector of each cluster to extract a cluster description. For each centroid vector, the label ('spam' or 'non-spam') is assigned by calculating the number of spam

Minoru Sasaki; Hiroyuki Shinnou

2005-01-01
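As a rough illustration of the spam-clustering approach in the entry above, the sketch below runs spherical k-means (cosine similarity on unit-length term-frequency vectors) and then labels each centroid. The farthest-first seeding, the majority-vote labeling rule, and the helper names (spherical_kmeans, label_clusters, classify) are assumptions for illustration only, since the abstract is truncated before it states the exact labeling rule.

```python
import math

def normalize(v):
    """Scale a vector to unit length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def spherical_kmeans(vectors, k, iters=20):
    """Spherical k-means: assign by cosine similarity, centroids are normalized means."""
    data = [normalize(v) for v in vectors]
    # Assumption: deterministic farthest-first seeding (not taken from the paper).
    centroids = [data[0]]
    while len(centroids) < k:
        far = min(data, key=lambda x: max(dot(x, c) for c in centroids))
        centroids.append(far)
    labels = [0] * len(data)
    for _ in range(iters):
        # Assign each mail to the centroid with the highest cosine similarity.
        labels = [max(range(k), key=lambda j: dot(x, centroids[j])) for x in data]
        # Recompute each centroid as the normalized sum of its members.
        for j in range(k):
            members = [x for x, lab in zip(data, labels) if lab == j]
            if members:
                centroids[j] = normalize([sum(col) for col in zip(*members)])
    return labels, centroids

def label_clusters(labels, is_spam, k):
    """Hypothetical rule: a cluster is 'spam' if most of its training mails are spam."""
    names = []
    for j in range(k):
        spam = sum(1 for lab, s in zip(labels, is_spam) if lab == j and s)
        ham = sum(1 for lab, s in zip(labels, is_spam) if lab == j and not s)
        names.append("spam" if spam > ham else "non-spam")
    return names

def classify(vector, centroids, names):
    """Give a new mail the label of its most cosine-similar centroid."""
    x = normalize(vector)
    best = max(range(len(centroids)), key=lambda j: dot(x, centroids[j]))
    return names[best]
```

For example, with a hypothetical four-term vocabulary (free, money, meeting, report), training mails such as [3, 2, 0, 0] end up in a cluster labeled 'spam', and a new mail [5, 1, 0, 0] receives that label.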

348

Bayesian Decision Theoretical Framework for Clustering

ERIC Educational Resources Information Center

In this thesis, we establish a novel probabilistic framework for the data clustering problem from the perspective of Bayesian decision theory. The Bayesian decision theory view justifies the important questions: what is a cluster and what a clustering algorithm should optimize. We prove that the spectral clustering (to be specific, the…

Chen, Mo

2011-01-01

349

Similarity Measures for Text Document Clustering

Clustering is a useful technique that organizes a large quantity of unordered text documents into a small number of meaningful and coherent clusters, thereby providing a basis for intuitive and informative navigation and browsing mechanisms. Partitional clustering algorithms have been recognized as more suitable than hierarchical clustering schemes for processing large datasets. A

Anna Huang

2008-01-01
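The truncated abstract above does not name the measures it compares. As an assumption, the sketch below shows three measures commonly applied to bag-of-words document vectors: cosine similarity, the set-based Jaccard coefficient, and Euclidean distance. Whitespace tokenization in term_counts is a deliberate simplification, not the study's preprocessing.

```python
import math
from collections import Counter

def term_counts(text):
    """Bag-of-words term frequencies (naive lowercase whitespace tokenization)."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine of the angle between two term-frequency vectors."""
    num = sum(a[t] * b[t] for t in set(a) & set(b))
    den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def jaccard_similarity(a, b):
    """Set-based Jaccard coefficient: shared terms over all distinct terms."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

def euclidean_distance(a, b):
    """Euclidean distance between term-frequency vectors (a dissimilarity, not a similarity)."""
    # Counter returns 0 for missing terms, so the union of keys covers every dimension.
    return math.sqrt(sum((a[t] - b[t]) ** 2 for t in set(a) | set(b)))
```

Cosine and Jaccard ignore document length, whereas Euclidean distance grows with it, which is one reason length-normalized measures are popular for text.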

350

Text clustering with extended user feedback

Text clustering is most commonly treated as a fully automated task without user feedback. However, a variety of researchers have explored mixed-initiative clustering methods which allow a user to interact with and advise the clustering algorithm. This mixed-initiative approach is especially attractive for text clustering tasks where the user is trying to organize a corpus of documents

Yifen Huang; Tom M. Mitchell

2006-01-01

351

Distributed Clustering Using Collective Principal Component Analysis

This paper considers distributed clustering of high dimensional heterogeneous data using a distributed Principal Component Analysis (PCA) technique called the Collective PCA. It presents the Collective PCA technique which can be used independent of the clustering application. It shows a way to integrate the Collective PCA with a given off-the-shelf clustering algorithm in order to develop a distributed clustering technique.

Hillol Kargupta; Weiyun Huang; Krishnamoorthy Sivakumar; Erik L. Johnson

2001-01-01

352

Text document clustering based on frequent word meaning sequences

Most existing text clustering algorithms use the vector space model, which treats documents as bags of words. Thus, word sequences in the documents are ignored, even though the meaning of natural language strongly depends on them. In this paper, we propose two new text clustering algorithms, named Clustering based on Frequent Word Sequences (CFWS) and Clustering based on Frequent Word

Yanjun Li; Soon M. Chung; John D. Holt

2008-01-01

353

Mobility-Based Clustering in VANETs Using Affinity Propagation

Recent research in cluster-based MAC and routing schemes for Vehicle Ad Hoc Networks (VANETs) motivates the need for a stable VANET clustering algorithm. Due to the highly mobile nature of VANETs, mobility must play an integral role in cluster formation. We present a novel, mobility-based clustering scheme for Vehicle Ad hoc Networks, which utilizes the Affinity Propagation algorithm

Christine Shea; Behnam Hassanabadi; Shahrokh Valaee

2009-01-01
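Affinity Propagation, which the VANET scheme above builds on, exchanges "responsibility" and "availability" messages between data points until a set of exemplars (here, cluster heads) emerges. The sketch below is a plain-Python version of the standard Frey and Dueck update rules with damping. The one-dimensional vehicle positions and the negative-squared-distance similarity are hypothetical stand-ins for the paper's mobility-based similarity, which the abstract does not define.

```python
def affinity_propagation(S, damping=0.5, iters=200):
    """Affinity propagation on a similarity matrix S (preferences on the diagonal, n >= 2)."""
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iters):
        for i in range(n):
            vals = [A[i][k] + S[i][k] for k in range(n)]
            first = max(range(n), key=lambda k: vals[k])
            second = max(vals[k] for k in range(n) if k != first)
            for k in range(n):
                # r(i,k) = s(i,k) - max over k' != k of [a(i,k') + s(i,k')]
                new = S[i][k] - (second if k == first else vals[first])
                R[i][k] = damping * R[i][k] + (1 - damping) * new
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]  # positive responsibilities from points other than k
            for i in range(n):
                if i == k:
                    new = total  # self-availability a(k,k)
                else:
                    new = min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    exemplars = [k for k in range(n) if A[k][k] + R[k][k] > 0]
    if not exemplars:
        return [], []
    # Each non-exemplar joins its most similar exemplar.
    labels = [i if i in exemplars else max(exemplars, key=lambda k: S[i][k]) for i in range(n)]
    return exemplars, labels

def similarity_matrix(xs, preference):
    """Hypothetical similarity: negative squared distance, preference on the diagonal."""
    n = len(xs)
    return [[preference if i == k else -(xs[i] - xs[k]) ** 2 for k in range(n)] for i in range(n)]
```

The diagonal preference controls how many exemplars emerge: lower values yield fewer clusters. A common heuristic is the median of the off-diagonal similarities, which is -81 for positions [0, 1, 2, 10, 11, 12]; with that value the sketch separates the two groups of vehicles.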

354

Clustering Unstructured Text Documents Using Fading Function

Clustering unstructured text documents is an important issue in the data mining community and has a number of applications, such as document archive filtering, document organization, topic detection and subject tracing. In the real world, some already clustered documents may lose importance while new, more significant documents emerge. Most work done so far on clustering unstructured text documents overlooks this aspect. This paper addresses the issue by using a Fading Function. The unstructured text documents are clustered, and for each cluster a statistics structure called the Cluster Profile (CP) is maintained. The cluster profile incorporates the Fading Function, which keeps account of the time-dependent importance of the cluster. The work proposes a novel algorithm, the Clustering n-ary Merge Algorithm (CnMA), for unstructured text documents that uses the Cluster Profile and the Fading Function. Experimental results illustrating the effectiveness of the proposed technique are also included.

Pallav Roxy; Durga Toshniwal
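The abstract above does not give the form of the Fading Function. A minimal sketch, assuming an exponential decay of 2^(-decay * age) per document (a choice common in data-stream clustering), shows how a cluster profile could track time-dependent importance. ClusterProfile, decay, and prune are hypothetical names, and the CnMA merge step itself is not reproduced.

```python
from dataclasses import dataclass, field

@dataclass
class ClusterProfile:
    """Hypothetical cluster profile: document arrival times plus an exponential fading weight."""
    decay: float = 0.1  # assumed decay rate per time unit
    arrivals: list = field(default_factory=list)

    def add_document(self, t):
        self.arrivals.append(t)

    def importance(self, now):
        """Sum of per-document weights 2^(-decay * age); older documents count less."""
        return sum(2.0 ** (-self.decay * (now - t)) for t in self.arrivals)

def prune(profiles, now, threshold):
    """Keep only clusters whose faded importance is still at or above the threshold."""
    return [p for p in profiles if p.importance(now) >= threshold]
```

With decay = 1.0, a cluster of three documents that arrived at time 0 has importance 0.375 at time 3, while a single fresh document arriving at time 3 has importance 1.0, so the newer cluster outranks the older, larger one.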

355

In an age of increasingly large data sets, investigators in many different disciplines have turned to clustering as a tool for data analysis and exploration. Existing clustering methods, however, typically depend on several nontrivial assumptions about the structure of data. Here, we reformulate the clustering problem from an information theoretic perspective that avoids many of these assumptions. In particular, our formulation obviates the need for defining a cluster “prototype,” does not require an a priori similarity metric, is invariant to changes in the representation of the data, and naturally captures nonlinear relations. We apply this approach to different domains and find that it consistently produces clusters that are more coherent than those extracted by existing algorithms. Finally, our approach provides a way of clustering based on collective notions of similarity rather than the traditional pairwise measures. PMID:16352721

Slonim, Noam; Atwal, Gurinder Singh; Tkacik, Gasper; Bialek, William

2005-01-01

356

Time series clustering analysis of health-promoting behavior

NASA Astrophysics Data System (ADS)

Health promotion must be emphasized to achieve the World Health Organization goal of health for all. Since the global population is aging rapidly, the ComCare elder health-promoting service was developed by the Taiwan Institute for Information Industry in 2011. Based on the Pender health promotion model, the ComCare service offers five categories of health-promoting functions to address the everyday needs of seniors: nutrition management, social support, exercise management, health responsibility and stress management. To assess the overall ComCare service and to improve understanding of the health-promoting behavior of elders, this study analyzed health-promoting behavioral data automatically collected by the ComCare monitoring system. In the 30638 session records collected for 249 elders from January 2012 to March 2013, behavior patterns were identified by a fuzzy c-means time series clustering algorithm combined with autocorrelation-based representation schemes. The analysis showed that the time series data for elder health-promoting behavior can be classified into four different clusters. Each type reveals different health-promoting needs, frequencies, function numbers and behaviors. The results can assist policymakers, health-care providers, and experts in medicine, public health, nursing and psychology, and have been provided to the Taiwan National Health Insurance Administration to assess elder health-promoting behavior.

Yang, Chi-Ta; Hung, Yu-Shiang; Deng, Guang-Feng

2013-10-01
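An autocorrelation-based representation, as used in the study above, turns each elder's usage series into a fixed-length feature vector that a clustering algorithm such as fuzzy c-means can consume, regardless of how long the series is. The sketch assumes one numeric value per time step (for example, daily session counts) and uses the standard sample autocorrelation; the study's exact representation scheme and lag range are not stated in the abstract, so max_lag and the helper names are illustrative.

```python
def autocorrelation(series, max_lag):
    """Sample autocorrelation r_1 .. r_max_lag of one behavior time series."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    if denom == 0:
        return [0.0] * max_lag  # constant series: no autocorrelation structure
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
            for k in range(1, max_lag + 1)]

def acf_features(all_series, max_lag):
    """One fixed-length feature vector per elder, ready for a clustering step."""
    return [autocorrelation(s, max_lag) for s in all_series]
```

An alternating on/off usage pattern such as [1, 0, 1, 0, 1, 0] gives a negative lag-1 and a positive lag-2 coefficient, so elders with similar periodic habits end up close together in feature space.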

357

Co-Clustering with Generative Models

In this paper, we present a generative model for co-clustering and develop algorithms based on the mean field approximation for the corresponding modeling problem. These algorithms can be viewed as generalizations of the ...

Golland, Polina

2009-11-03

358

Static and dynamic information organization with star clusters

In this paper we present a system for static and dynamic information organization and show our evaluations of this system on TREC data. We introduce the off-line and on-line star clustering algorithms for information organization. Our evaluation experiments show that the off-line star algorithm outperforms the single link and average link clustering algorithms. Since the star algorithm

Javed A. Aslam; Katya Pelekhov; Daniela Rus

1998-01-01
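The off-line star algorithm described above can be sketched in a few lines: threshold the pairwise similarities at sigma to obtain a graph, then repeatedly take the highest-degree vertex not yet covered as a star center, with its neighbors as satellites. This follows the published description of star clustering as we understand it; the function name and the tie-breaking by document index are assumptions.

```python
def star_clusters(sim, sigma):
    """Off-line star clustering on a symmetric similarity matrix with threshold sigma."""
    n = len(sim)
    # Edges of the thresholded similarity graph G_sigma.
    adj = [{j for j in range(n) if j != i and sim[i][j] >= sigma} for i in range(n)]
    # Visit vertices from highest to lowest degree (stable sort: ties keep index order).
    order = sorted(range(n), key=lambda i: -len(adj[i]))
    covered = set()
    stars = []
    for v in order:
        if v in covered:
            continue
        # v becomes a star center; all its neighbors become satellites.
        star = {v} | adj[v]
        stars.append(star)
        covered |= star
    return stars
```

Because each star includes all neighbors of its center, a satellite may belong to more than one star, so the resulting clusters can overlap.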

359

January 11, 2002. This work was supported in part by the Norton Health Care System under Grant 97 USA (e-mail: farag@cairo.spd.louisville.edu). T. Moriarty is with the Department of Neurological-0062(02)04088-0. with different MRI acquisition parameters from patient to patient and from slice to slice. Therefore

Louisville, University of

360

Automated palynological analysis has been previously proposed but proved difficult to achieve. Here, the first instance of dinoflagellate cyst (dinocyst) image clustering from palynological samples based on morphological and textural image analysis (IA) features is presented. Dinocyst-dominated example images (including acritarchs and other algae) were acquired from Cretaceous, Paleogene and Holocene samples. IA features that cover a broad array of

Andrew F. Weller; Anthony J. Harris; J. Andrew Ware

2006-01-01

361

NASA Astrophysics Data System (ADS)

Infrared thermography has been used increasingly as an effective non-destructive technique to detect cracks on metal surfaces. Due to many factors, infrared thermal images have low definition compared with visible images. The contrast between cracks and sound areas varies greatly across the thermal image frames of a specimen, depending on the recorded time. An accurate detection can only be obtained by reviewing the whole thermal video, which is laborious. Moreover, the experience of the operator has a strong influence on the accuracy of the detection result. In this paper, an infrared thermal image processing framework based on a superpixel algorithm is proposed to accomplish crack detection automatically. Two popular superpixel algorithms are compared, and one of them is selected to generate superpixels in this application. Combined features of superpixels were selected from both the raw gray-level image and the high-pass filtered image. Fuzzy c-means clustering is used to cluster the superpixels in order to segment the infrared thermal image. Experimental results show that the proposed framework can automatically recognize cracks on metal surfaces in infrared thermal images.

Xu, Changhang; Xie, Jing; Chen, Guoming; Huang, Weiping

2014-11-01
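Fuzzy c-means, the clustering step in the crack-detection framework above (and in several other entries in this list), alternates between membership-weighted center updates and membership updates. The sketch below is the textbook algorithm with fuzzifier m = 2 and a random, seeded initialization. The two-dimensional superpixel features in the test (for instance, mean gray level and high-pass response) are hypothetical, and the paper's feature set and superpixel method are not reproduced.

```python
import math
import random

def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means; returns cluster centers and a membership row per point."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships; each point's memberships sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        u.append([x / total for x in row])
    centers = []
    for _ in range(iters):
        # Centers are means of the points weighted by membership ** m.
        centers = []
        for i in range(c):
            w = [u[j][i] ** m for j in range(n)]
            sw = sum(w)
            centers.append([sum(w[j] * points[j][d] for j in range(n)) / sw for d in range(dim)])
        # Memberships: u_ij = 1 / sum_k (d_ij / d_kj) ** (2 / (m - 1)).
        change = 0.0
        for j in range(n):
            dists = [math.dist(points[j], ctr) for ctr in centers]
            if min(dists) == 0.0:
                # Point sits exactly on a center: give it full membership there.
                zeros = [1.0 if d == 0.0 else 0.0 for d in dists]
                new = [z / sum(zeros) for z in zeros]
            else:
                new = [1.0 / sum((dists[i] / dists[k]) ** (2.0 / (m - 1.0)) for k in range(c))
                       for i in range(c)]
            change = max(change, max(abs(a - b) for a, b in zip(new, u[j])))
            u[j] = new
        if change < tol:
            break
    return centers, u

def hard_labels(u):
    """Defuzzify: assign each point to its highest-membership cluster."""
    return [max(range(len(row)), key=lambda i: row[i]) for row in u]
```

hard_labels turns the fuzzy memberships into a segmentation by taking the largest membership per superpixel.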

362

Join-Graph Propagation Algorithms

The paper investigates parameterized approximate message-passing schemes that are based on bounded inference and are inspired by Pearl's belief propagation algorithm (BP). We start with the bounded inference mini-clustering algorithm and then move to the iterative scheme called Iterative Join-Graph Propagation (IJGP), that combines both iteration and bounded inference. Algorithm IJGP belongs to the class of Generalized Belief Propagation algorithms, a framework that allowed connections with approximate algorithms from statistical physics and is shown empirically to surpass the performance of mini-clustering and belief propagation, as well as a number of other state-of-the-art algorithms on several classes of networks. We also provide insight into the accuracy of iterative BP and IJGP by relating these algorithms to well known classes of constraint propagation schemes. PMID:20740057

Mateescu, Robert; Kask, Kalev; Gogate, Vibhav; Dechter, Rina

2010-01-01

363

Image Segmentation Using Spectral Clustering

This paper focuses on how to automatically determine a suitable number of clusters in image segmentation and designs an algorithm, CANA, using spectral clustering. Experimental results indicate that ACNA can provide superior performance. An application sample in image punching is introduced.

Chong-jun Wang; Wu-jun Li; Lin Ding; Juan Tian; Shifu Chen

2005-01-01

364

Genetic algorithm with affinity propagation

The classical genetic algorithm suffers from the heavy cost of fitness evaluation in time-consuming optimization problems, e.g., aerodynamic design optimization and qualitative model learning in bioinformatics. To address this problem, we combine genetic algorithms with clustering methods. Specifically, the clustering method used in this paper is affinity propagation. The numerical experiments demonstrate that the proposed method performs promisingly for well-known benchmark

Chunguo Wu; Hao Gao; Lianjiang Yu; Yanchun Liang; Rongwu Xiang

2010-01-01

365

Clustering Web Search Results Using Fuzzy Ants

Steven Schockaert, Martine De Cock, Chris Cornelis and … Uncertainty Modelling Research Unit, Krijgslaan 281 (S9), B-9000 Gent, Belgium. Algorithms for clustering Web … existing approaches and illustrates how our algorithm can be applied to the problem of Web search results

Gent, Universiteit

366

WordNet-based Text Document Clustering

Text document clustering can greatly simplify browsing large collections of documents by reorganizing them into a smaller number of manageable clusters. Algorithms to solve this task exist; however, the algorithms are only as good as the data they work on. Problems include ambiguity and synonymy, the former allowing for erroneous groupings and the latter causing similarities

Julian Sedding; Dimitar Kazakov

2004-01-01

367

Detecting Promising Areas by Evolutionary Clustering Search

A challenge in hybrid evolutionary algorithms is to define efficient strategies to cover the whole search space, applying local search only in actually promising search areas. This paper proposes a way of detecting promising search areas based on clustering. In this approach, an iterative clustering process runs simultaneously with an evolutionary algorithm, accounting for the activity (selections or updates) in search areas

Alexandre César Muniz De Oliveira; Luiz Antonio Nogueira Lorena

2004-01-01

368

Hierarchical Clustering

[Figure: five clusters C1 to C5 and their distance/proximity matrix.] Intermediate state: merge the two closest clusters (C2 and C5) and update the distance matrix. [Figure: proximity matrix in which the row and column for C2 ∪ C5 are still to be recomputed.] Distance between two clusters: each cluster is a set of points

Terzi, Evimaria
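The merge step described in the slides above (merge the two closest clusters, there C2 and C5, then update the distance matrix) is the core of agglomerative clustering. The sketch below assumes single-link distances, where the merged cluster's distance to any other cluster is the smaller of its two previous distances; complete or average link would replace min with max or a size-weighted mean. The function name and merge-log format are illustrative.

```python
def single_linkage(dist, num_clusters):
    """Agglomerative clustering: repeatedly merge the two closest clusters (single link)."""
    n = len(dist)
    clusters = {i: {i} for i in range(n)}
    # Cluster-to-cluster distance matrix, initially the point distances.
    d = {i: {j: dist[i][j] for j in range(n) if j != i} for i in range(n)}
    merges = []
    while len(clusters) > num_clusters:
        # Find the closest pair of current clusters.
        a, b = min(((i, j) for i in d for j in d[i] if i < j), key=lambda p: d[p[0]][p[1]])
        merges.append((a, b, d[a][b]))
        clusters[a] |= clusters.pop(b)
        # Update the matrix: single link keeps the smaller of the two old distances.
        for k in d:
            if k not in (a, b):
                d[a][k] = d[k][a] = min(d[a][k], d[b][k])
        # Drop the row and column of the absorbed cluster b.
        del d[b]
        for k in d:
            d[k].pop(b, None)
    return list(clusters.values()), merges
```

For points at positions 0, 1, 5, 6 and 20, the first two merges join {0, 1} and {5, 6} at distance 1, and the third joins those two groups at distance 4, leaving the outlier at 20 on its own.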

369

In this paper, we propose a novel hybrid genetic algorithm (GA) that finds a globally optimal partition of given data into a specified number of clusters. GAs used earlier in clustering employ either an expensive crossover operator to generate valid child chromosomes from parent chromosomes, a costly fitness function, or both. To circumvent these expensive operations, we hybridize

K. Krishna; M. Narasimha Murty

1999-01-01

370

Background: Potentially inappropriate prescribing in older people is common in primary care and can result in increased morbidity, adverse drug events, hospitalizations and mortality. In Ireland, 36% of those aged 70 years or over received at least one potentially inappropriate medication, with an associated expenditure of over €45 million. The main objective of this study is to determine the effectiveness and acceptability of a complex, multifaceted intervention in reducing the level of potentially inappropriate prescribing in primary care. Methods/design: This study is a pragmatic cluster randomized controlled trial, conducted in primary care (OPTI-SCRIPT trial), involving 22 practices (clusters) and 220 patients. Practices will be allocated to intervention or control arms using minimization, with intervention participants receiving a complex multifaceted intervention incorporating academic detailing, medicines review with web-based pharmaceutical treatment algorithms that provide recommended alternative treatment options, and tailored patient information leaflets. Control practices will deliver usual care and receive simple patient-level feedback on potentially inappropriate prescribing. Routinely collected national prescribing data will also be analyzed for nonparticipating practices, acting as a contemporary national control. The primary outcomes are the proportion of participant patients with potentially inappropriate prescribing and the mean number of potentially inappropriate prescriptions per patient. In addition, economic and qualitative evaluations will be conducted. Discussion: This study will establish the effectiveness of a multifaceted intervention in reducing potentially inappropriate prescribing in older people in Irish primary care that is generalizable to countries with similar prescribing challenges. Trial registration: Current Controlled Trials ISRCTN41694007. PMID:23497575

2013-01-01

371

A GMBCG Galaxy Cluster Catalog of 55,424 Rich Clusters from SDSS DR7

We present a large catalog of optically selected galaxy clusters from the application of a new Gaussian Mixture Brightest Cluster Galaxy (GMBCG) algorithm to SDSS Data Release 7 data. The algorithm detects clusters by identifying the red sequence plus Brightest Cluster Galaxy (BCG) feature, which is unique to galaxy clusters and does not exist among field galaxies. Red sequence clustering in color space is detected using an Error Corrected Gaussian Mixture Model. We run GMBCG on 8240 square degrees of photometric data from SDSS DR7 to assemble the largest ever optical galaxy cluster catalog, consisting of over 55,000 rich clusters across the redshift range 0.1 < z < 0.55. We present Monte Carlo tests of completeness and purity and perform cross-matching with X-ray clusters and with the maxBCG sample at low redshift. These tests indicate high completeness and purity across the full redshift range for clusters with 15 or more members.

Hao, Jiangang; /Fermilab; McKay, Timothy A.; /Michigan U.; Koester, Benjamin P.; /Chicago U.; Rykoff, Eli S.; /UC, Santa Barbara /LBL, Berkeley; Rozo, Eduardo; /Chicago U.; Annis, James; /Fermilab; Wechsler, Risa H.; /SLAC; Evrard, August; /Michigan U.; Siegel, Seth R.; /Michigan U.; Becker, Matthew; /Chicago U.; Busha, Michael; /SLAC; Gerdes, David; /Michigan U.; Johnston, David E.; /Fermilab; Sheldon, Erin; /Brookhaven

2011-08-22

372

Optimal search space for clustering gene expression data via consensus.

Ensemble clustering methods have become increasingly important to ease the task of choosing the most appropriate cluster algorithm for a particular data analysis problem. The consensus clustering (CC) algorithm is a recognized ensemble clustering method that uses an artificial intelligence technique to optimize a fitness function. We formally prove the existence of a subspace of the search space for CC which contains all solutions of maximal fitness, and suggest two greedy algorithms to search this subspace. We evaluate the algorithms on two gene expression data sets and one synthetic data set, and compare the results with those of other ensemble clustering approaches. PMID:18052773

Hirsch, Michael; Swift, Stephen; Liu, Xiohui

2007-12-01

373

Approximate clustering via the mountain method

We develop a simple and effective approach for approximate estimation of the cluster centers on the basis of the concept of a mountain function. We call the procedure the mountain method. It can be useful for obtaining the initial values of the clusters that are required by more complex cluster algorithms. It also can be used as a stand alone

R. R. Yager; D. P. Filev

1994-01-01
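The mountain method above can be sketched as follows: build a "mountain" over candidate grid points by summing a decaying function of the distance to every data point, take the highest peak as a center, subtract a mountain centered there, and repeat. This one-dimensional version assumes an exponential kernel with hypothetical sharpness parameters alpha and beta and a hand-chosen grid; the original paper should be consulted for its exact function and stopping rule.

```python
import math

def mountain_method(points, grid, num_centers, alpha=5.0, beta=5.0):
    """Approximate cluster centers by peak-picking a mountain function over grid nodes (1-D)."""
    # Mountain function: M(v) = sum_k exp(-alpha * d(v, x_k)).
    heights = [sum(math.exp(-alpha * abs(v - x)) for x in points) for v in grid]
    centers = []
    for _ in range(num_centers):
        best = max(range(len(grid)), key=lambda i: heights[i])
        peak, center = heights[best], grid[best]
        centers.append(center)
        # Destroy the selected mountain: M'(v) = M(v) - peak * exp(-beta * d(v, center)).
        heights = [h - peak * math.exp(-beta * abs(v - center)) for h, v in zip(heights, grid)]
    return centers
```

The returned centers are grid nodes, so their precision is limited by the grid spacing; as the abstract notes, such approximate centers are mainly useful as initial values for a more refined clustering algorithm.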

374

Frequent Term-Based Text Clustering

Text clustering methods can be used to structure large sets of text or hypertext documents. The well-known methods of text clustering, however, do not really address the special problems of text clustering: very high dimensionality of the data, very large size of the databases and understandability of the cluster description. In this paper, we introduce a novel approach which uses frequent item (term) sets for text clustering. Such frequent sets can be efficiently discovered using algorithms for association rule mining. To cluster based on frequent term sets, we measure the mutual overlap of frequent sets with respect to the sets of supporting documents. We present two algorithms for frequent term-based text clustering: FTC, which creates flat clusterings, and HFTC, for hierarchical clustering. An experimental evaluation on classical text documents as well as on web documents demonstrates that the proposed algorithms obtain clusterings of comparable quality significantly more efficiently than state-of-the-art text clustering algorithms. Furthermore, our methods provide an understandable description of the discovered clusters by their frequent term sets.

Florian Beil; Martin Ester

2002-01-01
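The flat FTC idea above can be sketched as a greedy loop over frequent term sets. Two assumptions apply: frequent sets are enumerated by brute force rather than with an association-rule miner, and the selection criterion is an entropy-overlap score, the sum over covered documents of ln(f_j)/f_j, where f_j counts the candidate term sets covering document j. We believe this matches FTC's overlap criterion, but it should be checked against the paper.

```python
import math
from itertools import combinations

def frequent_term_sets(docs, min_support, max_size=2):
    """Brute-force enumeration of term sets contained in >= min_support documents."""
    terms = sorted(set().union(*docs))
    found = {}
    for size in range(1, max_size + 1):
        for combo in combinations(terms, size):
            cover = frozenset(i for i, d in enumerate(docs) if set(combo) <= d)
            if len(cover) >= min_support:
                found[frozenset(combo)] = cover
    return found

def ftc(docs, min_support, max_size=2):
    """Greedy flat clustering: repeatedly pick the term set with the least entropy overlap."""
    candidates = frequent_term_sets(docs, min_support, max_size)
    remaining = set(range(len(docs)))
    clustering = []
    while remaining and candidates:
        # f[j]: number of candidate term sets that still cover document j.
        f = {j: sum(1 for cov in candidates.values() if j in cov) for j in remaining}
        # Entropy overlap EO = sum_j -(1/f_j) * ln(1/f_j) = sum_j ln(f_j) / f_j.
        best = min(candidates, key=lambda ts: sum(math.log(f[j]) / f[j] for j in candidates[ts]))
        chosen = candidates.pop(best)
        clustering.append((set(best), set(chosen)))
        remaining -= chosen
        # Remove the newly clustered documents from other candidates; drop empty ones.
        candidates = {ts: cov - chosen for ts, cov in candidates.items() if cov - chosen}
    return clustering, remaining
```

Each documents is represented as a set of terms; the term set that defines a cluster doubles as its human-readable description, which is the understandability benefit the abstract emphasizes.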

375

A Spectroscopy of Texts for Effective Clustering

For many clustering algorithms, such as k-means, EM, and CLOPE, there is usually a requirement to set some parameters. Often, these parameters directly or indirectly control the number of clusters to return. In the presence of different data characteristics and analysis contexts, it is often difficult for the user to estimate the number of clusters in the data set.

Wenyuan Li; Wee-Keong Ng; Kok-leong Ong; Ee-peng Lim

2004-01-01

376

Astrophysical parameters of Galactic open clusters

We present a catalogue of astrophysical data for 520 Galactic open clusters. These are the clusters for which at least three most probable members (18 on average) could be identified in the ASCC-2.5, a catalogue of stars based on the Tycho-2 observations from the Hipparcos mission. We applied homogeneous methods and algorithms to determine angular sizes of cluster cores and

N. V. Kharchenko; A. E. Piskunov; S. Röser; E. Schilbach; R.-D. Scholz

2005-01-01

377

Characterization of Linkage-Based Clustering

Margareta Ackerman, joint work with Shai Ben-David … of properties that characterize single linkage clustering (UAI, 2009). Previous work … M. Ackerman, S. Ben-David, and D. Loker. Characterize linkage-based clustering algorithms using a set of intuitive properties

Ackerman, Margareta

378

A Text Clustering System based on k-means Type Subspace Clustering and Ontology

This paper presents a text clustering system developed based on a k-means type subspace clustering algorithm to cluster large, high-dimensional and sparse text data. In this algorithm, a new step is added to the k-means clustering process to automatically calculate the weights of keywords in each cluster, so that the important words of a cluster can be identified by their weight values. For understanding and interpretation of clustering results, a few keywords that best represent the semantic topic are extracted from each cluster. Two methods are used to extract the representative words. The candidate words are first selected according to their weights calculated by our new algorithm. Then, the candidates are fed into WordNet to identify the set of nouns and to consolidate synonyms and hyponyms. Experimental results have shown that the clustering algorithm is superior to other subspace clustering algorithms, such as PROCLUS and HARP, and to k-means type algorithms such as Bisecting-KMeans. Furthermore, the word extraction method is effective in selecting words to represent the topics of the clusters.

Liping Jing; Michael K. Ng; Xinhua Yang; Joshua Zhexue Huang

379

Graph-Theoretical Methods for Detecting and Describing Gestalt Clusters

A family of graph-theoretical algorithms based on the minimal spanning tree are capable of detecting several kinds of cluster structure in arbitrary point sets; description of the detected clusters is possible in some cases by extensions of the method. Development of these clustering algorithms was based on examples from two-dimensional space because we wanted to copy the human perception of

Charles T. Zahn

1971-01-01
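Zahn's approach above builds a minimum spanning tree (MST) and deletes "inconsistent" edges that are much longer than nearby edges; the remaining connected components are the clusters. The sketch uses Prim's algorithm and a simplified inconsistency test (edge weight greater than factor times the mean weight of the MST edges sharing an endpoint). Zahn's paper considers richer neighborhood statistics, so factor and the one-edge neighborhood are assumptions.

```python
import math

def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph, O(n^2)."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known connection to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges

def zahn_clusters(points, factor=2.0):
    """Cut 'inconsistent' MST edges, then return the connected components."""
    edges = mst_edges(points)
    kept = []
    for idx, (u, v, w) in enumerate(edges):
        # Simplified test: compare w with the mean of MST edges touching u or v.
        nearby = [e[2] for j, e in enumerate(edges) if j != idx and {e[0], e[1]} & {u, v}]
        if nearby and w > factor * sum(nearby) / len(nearby):
            continue  # inconsistent edge: delete it
        kept.append((u, v))
    # Union-find over the kept edges.
    root = list(range(len(points)))
    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]
            x = root[x]
        return x
    for u, v in kept:
        root[find(u)] = find(v)
    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())
```

Unlike k-means, this needs no cluster count in advance and can follow elongated or irregular point groups, which is the Gestalt-style perception the paper aims to mimic.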

380

Distributed and Incremental Clustering Based on Weighted Affinity Propagation

The recently proposed clustering algorithm Affinity Propagation (AP) is hindered by its quadratic complexity. The Weighted Affinity Propagation (WAP) proposed in this paper is used to eliminate this limitation and supports two scalable algorithms. Distributed AP clustering handles large datasets by merging the exemplars learned from subsets. Incremental AP extends AP to online clustering of data streams. The paper validates all

Xiangliang Zhang; Cyril Furtlehner; Michèle Sebag

2008-01-01

381

Incremental Hierarchical Clustering of Text Documents

Incremental hierarchical text document clustering algorithms are important in organizing documents generated from streaming online sources, such as newswire and blogs. However, this is a relatively unexplored area in the text document clustering literature. Popular incremental hierarchical clustering algorithms, namely Cobweb and Classit, have not been applied to text document data. We discuss why these algorithms, in their current form, are not suitable for text clustering and propose an alternative formulation. This includes changes to the underlying distributional assumption of the algorithm in order to conform with the empirical data. Both the original Classit algorithm and our proposed algorithm are evaluated using Reuters newswire articles and the Ohsumed dataset, and the gain from using a more appropriate distribution is demonstrated.

Nachiketa Sahoo; Adviser Jamie Callan

382

Toward Parallel Document Clustering

A key challenge to automated clustering of documents in large text corpora is the high cost of comparing documents in a multimillion-dimensional document space. The Anchors Hierarchy is a fast data structure and algorithm for localizing data based on a distance metric that obeys the triangle inequality. The algorithm strives to minimize the number of distance calculations needed to cluster the documents into “anchors” around reference documents called “pivots”. We extend the original algorithm to increase the amount of available parallelism and consider two implementations: a complex data structure which affords efficient searching, and a simple data structure which requires repeated sorting. The sorting implementation is integrated with a text corpora “Bag of Words” program, and initial performance results of an end-to-end document processing workflow are reported.

Mogill, Jace A.; Haglin, David J.

2011-09-01

383

Metal cluster chemistry is one of the most rapidly developing areas of inorganic and organometallic chemistry. Prior to 1960 only a few metal clusters were well characterized. However, shortly after the early development of boron cluster chemistry, the field of metal cluster chemistry began to grow at a very rapid rate and a structural and a qualitative theoretical understanding of clusters came quickly. Analyzed here is the chemistry and the general significance of clusters with particular emphasis on the cluster research within my group. The importance of coordinately unsaturated, very reactive metal clusters is the major subject of discussion.

Muetterties, Earl L.

1980-05-01

384

Local feature selection in text clustering

Feature selection has improved the performance of text clustering. Global feature selection tries to identify a single subset of features which are relevant to all clusters. However, the clustering process might be improved by considering different subsets of features for locally describing each cluster. In this work, we introduce the method ZOOM-IN to perform local feature selection for partitional hierarchical clustering of text collections. The proposed method explores the diversity of clusters generated by the hierarchical algorithm, selecting a variable number of features according to the size of the clusters. Experiments were conducted on the Reuters collection, evaluating the bisecting K-means algorithm with both global and local approaches to feature selection. The results of the experiments showed an improvement in clustering performance with the use of the proposed local method.

Marcelo N. Ribeiro; Manoel J. R. Neto; Ricardo B. C. Prudêncio

385

Initializing K-means Clustering Using Affinity Propagation

K-means clustering is widely used due to its fast convergence, but it is sensitive to the initial condition. Therefore, many methods of initializing K-means clustering have been proposed in the literature. Compared with K-means clustering, a novel clustering algorithm called affinity propagation (AP clustering), developed by Frey and Dueck, can produce a good set of cluster exemplars with

Yan Zhu; Jian Yu; Caiyan Jia

2009-01-01

386

Search the Optimal Preference of Affinity Propagation Algorithm

To further improve the clustering quality of the Affinity Propagation algorithm and obtain a more accurate number of clusters, this paper proposes a novel algorithm based on Particle Swarm Optimization, which uses the In-Group Proportion index as the fitness function to search for the optimal preference of the Affinity Propagation algorithm. Experimental results show that the predicted results had been tested with

Yi Zhong; Ming Zheng; Jianan Wu; Wei Shen; You Zhou; Chunguang Zhou

2012-01-01

387

When children walk on their toes for no known reason, the condition is called Idiopathic Toe Walking (ITW). Assessing the true severity of ITW can be difficult because children can alter their gait while under observation in clinic. The ability to monitor the foot angle during daily life outside of clinic may improve the assessment of ITW. A foot-worn, battery-powered inertial sensing device has been designed to monitor patients' foot angle during daily activities. The monitor includes a 3-axis accelerometer, 2-axis gyroscope, and a low-power microcontroller. The device is necessarily small, with limited battery capacity and processing power. Therefore a high-accuracy but low-complexity inertial sensing algorithm is needed. This paper compares several low-complexity algorithms' aptitude for foot-angle measurement: accelerometer-only measurement, finite impulse response (FIR) and infinite impulse response (IIR) complementary filtering, and a new dynamic predict-correct style algorithm developed using fuzzy c-means clustering. A total of 11 subjects each walked 20 m with the inertial sensing device fixed to one foot; 10 m with normal gait and 10 m simulating toe walking. A cross-validation scheme was used to obtain a low-bias estimate of each algorithm's angle measurement accuracy. The new predict-correct algorithm achieved the lowest angle measurement error: <5° mean error during normal and toe walking. The IIR complementary filtering algorithm achieved almost as good accuracy with less computational complexity. These two algorithms seem to have good aptitude for the foot-angle measurement problem, and would be good candidates for use in a long-term monitoring device for toe-walking assessment. PMID:24050952

Chalmers, Eric; Le, Jonathan; Sukhdeep, Dulai; Watt, Joe; Andersen, John; Lou, Edmond

2014-01-01
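Of the algorithms compared in the toe-walking study above, the IIR complementary filter is the simplest to sketch: integrate the gyroscope rate for short-term accuracy and blend in the accelerometer's gravity-based tilt to cancel long-term drift. The blend factor alpha, the sample period, and the axis convention in accel_angle are assumptions for illustration; the paper's filter coefficients and its fuzzy c-means predict-correct algorithm are not reproduced.

```python
import math

def accel_angle(ax, az):
    """Tilt angle in degrees from the gravity direction (assumed sensor axes)."""
    return math.degrees(math.atan2(ax, az))

def complementary_filter(samples, dt, alpha=0.98):
    """First-order IIR complementary filter fusing gyro rate and accelerometer tilt.

    samples: (ax, az, gyro_rate_deg_per_s) tuples taken every dt seconds.
    """
    angle = accel_angle(samples[0][0], samples[0][1])  # start from the accelerometer
    out = []
    for ax, az, rate in samples:
        # High-pass the integrated gyro, low-pass the accelerometer angle.
        angle = alpha * (angle + rate * dt) + (1.0 - alpha) * accel_angle(ax, az)
        out.append(angle)
    return out
```

With a constant gyroscope bias b and a stationary foot, pure integration drifts without bound, whereas this filter settles at alpha * b * dt / (1 - alpha): about 0.49° for b = 1°/s, dt = 0.01 s and alpha = 0.98.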

388

Misty Mountain clustering: application to fast unsupervised flow cytometry gating

BACKGROUND: There are many important clustering questions in computational biology for which no satisfactory method exists. Automated clustering algorithms, when applied to large, multidimensional datasets, such as flow cytometry data, prove unsatisfactory in terms of speed, problems with local minima or cluster shape bias. Model-based approaches are restricted by the assumptions of the fitting functions. Furthermore, model based clustering requires

István P. Sugár; Stuart C. Sealfon

2010-01-01

389

Dynamically Adaptive Data Clustering Using Intelligent Swarm-like Agents

Inspired by the self-organized behaviour of bird flocks, a new dynamic clustering approach based on Particle Swarm Optimization is proposed. This paper introduces a novel clustering method, the PSDC, a new Particle Swarm-like agents approach for Dynamically Adaptive data clustering. Unlike other partition clustering algorithms, this technique does not require initial partitioned seeds and it can dynamically adapt to the

Sherin M. Youssef; Mohamed Rizk; Mohamed El-Sherif

2007-01-01

390

Domain Based Punjabi Text Document Clustering

Text clustering is a text mining technique used to group similar documents into a single cluster, using some similarity measure, while separating dissimilar documents. Popular text clustering algorithms treat a document as a conglomeration of words and give no consideration to the syntactic or semantic relations between words. Many algorithms have been proposed to study and find connections among the words in a sentence using different concepts. In this paper, a hybrid algorithm for clustering Punjabi text documents has been developed that uses semantic relations among the words in a sentence to extract phrases. The extracted phrases form a feature vector for each document, which is used to compute the similarity among all documents. Experimental results reveal that the hybrid algorithm performs better on real-time data sets.

Saurabh Sharma; Vishal Gupta
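The final step, comparing documents through phrase-based feature vectors, can be sketched as follows. Here contiguous word n-grams stand in for the semantically extracted phrases, which is a simplifying assumption. Cosine similarity is used as one common choice of measure; the abstract does not say which measure the authors use. The example sentences are made up, and the code is language-agnostic as long as the text is tokenized.

```python
import math
from collections import Counter

def phrase_vector(tokens, n=2):
    """Bag of contiguous n-word phrases: a crude stand-in for semantically extracted phrases."""
    return Counter(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def cosine_similarity(u, v):
    """Cosine of the angle between two sparse phrase-count vectors (0.0 if either is empty)."""
    dot = sum(count * v[phrase] for phrase, count in u.items())  # Counter returns 0 for misses
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


if __name__ == "__main__":
    a = phrase_vector("crop prices rose in punjab".split())
    b = phrase_vector("crop prices rose again".split())
    print(round(cosine_similarity(a, b), 3))  # two shared phrases out of 4 and 3
```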

391

UNSUPERVISED VALIDITY MEASURES FOR VOCALIZATION CLUSTERING

This paper describes unsupervised speech/speaker cluster validity measures based on a dissimilarity metric, for the purpose of estimating the number of clusters in a speech data set as well as assessing the consistency of the clustering procedure. The number of clusters is estimated by minimizing the cross-data dissimilarity values, while algorithm consistency is evaluated by calculating the dissimilarity values across multiple experimental runs. The method is demonstrated on the task of Beluga whale vocalization clustering. Index Terms — speech/speaker clustering, unsupervised validity, dissimilarity value, validation of classifiers.

Kuntoro Adi; Kristine E. Sonstrom; Peter M. Scheifele; Michael T. Johnson

392

Functional classification aims at grouping genes according to their molecular function or the biological process they participate in. Evaluating the validity of such unsupervised gene classification remains a challenge given the variety of distance measures and classification algorithms that can be used. We evaluate here functional classification of genes with the help of reference sets: KEGG (Kyoto Encyclopaedia of Genes and Genomes) pathways and Pfam clans. These sets represent ground truth for any distance based on GO (Gene Ontology) biological process and molecular function annotations respectively. Overlaps between clusters and reference sets are estimated by the F-score method. We test our previously described IntelliGO semantic distance with hierarchical and fuzzy C-means clustering and we compare results with the state-of-the-art DAVID (Database for Annotation Visualisation and Integrated Discovery) functional classification method. Finally, study of best matching clusters to reference sets leads us to propose a set-difference method for discovering missing information. PMID:23013652

Devignes, Marie-Dominique; Benabderrahmane, Sidahmed; Smaïl-Tabbone, Malika; Napoli, Amedeo; Poch, Olivier

2012-01-01
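The F-score overlap between a cluster and a reference set (a KEGG pathway or a Pfam clan) is usually computed as the harmonic mean of precision and recall. The sketch below shows that standard form; the abstract does not spell out the authors' exact variant, and the gene identifiers are hypothetical. A plain set difference such as `reference - clusters[best]` then lists the reference members that the best cluster missed. This is one simple reading of the set-difference idea mentioned above, not necessarily the authors' procedure.

```python
def f_score(cluster, reference):
    """Harmonic mean of precision and recall for one cluster against one reference set."""
    overlap = len(cluster & reference)
    if overlap == 0:
        return 0.0
    precision = overlap / len(cluster)   # share of the cluster found in the reference set
    recall = overlap / len(reference)    # share of the reference set recovered by the cluster
    return 2 * precision * recall / (precision + recall)

def best_matching_cluster(clusters, reference):
    """Return (index, F-score) of the cluster that best matches a reference set."""
    scores = [f_score(c, reference) for c in clusters]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


if __name__ == "__main__":
    pathway = {"g1", "g2", "g3", "g5", "g6"}              # hypothetical reference set
    clusters = [{"g1", "g2", "g3", "g4"}, {"g7", "g8"}]   # hypothetical clustering output
    best, score = best_matching_cluster(clusters, pathway)
    print(best, round(score, 3), sorted(pathway - clusters[best]))
```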

393

When Network Coding improves the Performances of Clustered Wireless Networks

This paper introduces a network coding scheme that significantly increases the performance of clustering algorithms in wireless multi-hop networks

Paris-Sud XI, Université de

394

Fuzzy Clustering in the Analysis of Fourier Transform Infrared Spectra

5.2 VFC-SA Clustering Algorithm; 5.4 Evaluation of VFC-SA and SAFC Clustering of Oral Cancer Cells; 5.5 Summary

Aickelin, Uwe

395

An Efficient Approach to Clustering in Large Multimedia Databases with Noise

Several clustering algorithms can be applied to clustering in large multimedia databases. The effectiveness and efficiency of the existing algorithms, however, are somewhat limited, since clustering

Hinneburg, Alexander

396

Automatically determining the number of Affinity Propagation Clustering using Particle Swarm

Affinity propagation (AP) is a powerful new clustering algorithm based on message passing between data points. One of the major problems in clustering is determining the optimal number of clusters. In this paper, we propose a new approach called Affinity Propagation Clustering based on Particle Swarm Optimization (PSO-AP), which uses the Particle Swarm Optimization (PSO) algorithm to determine the optimal

Xian-hui Wang; Xuan-ping Zhang; Chun-xiao Zhuang; Zu-ning Chen; Zheng Qin

2010-01-01

397

A cluster validation index for GK cluster analysis based on relative degree of sharing

In this paper, the problems of traditional validity indices when applied to Gustafson-Kessel (GK) clustering are reviewed. A new cluster validity index for the GK algorithm is proposed. This validity index is defined as the average value of the relative degrees of sharing of all possible pairs of fuzzy clusters in the system. It computes the overlap of

Young-il Kim; Dae-won Kim; Doheon Lee; Kwang Hyung Lee

2004-01-01

398

Conceptual Clustering of Text Clusters

Common clustering techniques have the disadvantage that they do not provide intensional descriptions of the clusters obtained. Conceptual clustering techniques, on the other hand, provide such descriptions, but are known to be rather slow. In this paper, we discuss a way of combining both techniques. We first cluster the documents by a variant of k-Means, using a thesaurus as background

Andreas Hotho; Gerd Stumme

2002-01-01

399

Location-Aware Affinity Propagation Clustering in Wireless Sensor Networks

Clustering has become a crucial operation in wireless sensor networks (WSNs). Affinity propagation (AP) is a relatively new clustering technique that has been shown to possess several advantages over long-standing algorithms such as K-means, particularly in terms of quality of clustering and multi-criteria support. However, the original AP algorithm is computationally intensive making it unsuitable for clustering in WSNs. A

Mahmoud Elgammal; Mohamed Eltoweissy

2009-01-01

400

Clustering II Hierarchical Clustering

[Slide figures: the distance/proximity matrix over points p1, p2, …, p12 and clusters C1–C5, shown before and after merging clusters C2 and C5 (C2 ∪ C5).] Distance between two clusters. Each

Terzi, Evimaria
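The slide fragment concerns agglomerative clustering driven by a proximity matrix: repeatedly merge the two closest clusters and update the distances. The minimal sketch below assumes single-link ("MIN") inter-cluster distance. That is one of several standard choices, alongside complete link (MAX) and group average. For clarity it rescans all pairs on every merge rather than updating the matrix incrementally.

```python
def single_linkage(dist, k):
    """Agglomerate items 0..n-1 until k clusters remain, using MIN (single-link) distance.

    dist: symmetric n x n proximity matrix given as a list of lists
    """
    clusters = [{i} for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        # Scan every pair of current clusters for the closest pair of members.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] | clusters[b]  # merge the closest pair
        del clusters[b]                          # b > a, so index a is unaffected
    return clusters


if __name__ == "__main__":
    pts = [0, 1, 5, 6, 20]
    dist = [[abs(p - q) for q in pts] for p in pts]
    print(single_linkage(dist, 2))
```

Swapping `min` for `max` in the inner distance gives complete link. Replacing it with a mean over all member pairs gives group average.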

401

The last decade has witnessed a tremendous growth in the area of randomized algorithms. During this period, randomized algorithms went from being a tool in computational number theory to finding widespread application in many types of algorithms. Two benefits of randomization have spearheaded this growth: simplicity and speed. For many applications, a randomized algorithm is the simplest algorithm available, or the

Rajeev Motwani; Prabhakar Raghavan

1995-01-01

402

Real-Coded Genetic Optimization of Fuzzy Clustering

… constitute fundamental aspects of the clustering problem. A broad spectrum of clustering algorithms attempt to … the typical iterative numerical process used by the algorithm to search for an optimal partitioning, genetic algorithms have recently been applied to this and related problems [4, 5, 6, 15]. Genetic algorithms [9, 13

Blekas, Konstantinos; Stafylopatis, Andreas

403

A new clustering algorithm is presented that is based on dimensional information. The algorithm includes an inherent feature selection criterion, which is discussed. Further, a heuristic method for choosing the proper number of intervals for a frequency distribution histogram, a feature necessary for the algorithm, is presented. The algorithm, although usable as a stand-alone clustering technique, is then utilized as

Daryl J. Eigen; Frederick R. Fromm; Richard A. Northouse

1974-01-01

404

An automatic method for estimating the content of intramuscular fat (IMF) in beef M. longissimus dorsi (LD) was developed using a sequence of image processing algorithms. To extract IMF particles within the LD muscle from structural features of the intermuscular fat surrounding the muscle, a three-step image processing algorithm was developed: a bilateral filter for noise removal, kernel fuzzy c-means clustering (KFCM) for segmentation, and vector confidence connected and flood fill for IMF extraction. Bilateral filtering was first applied to reduce the noise and enhance the contrast of the beef image. KFCM was then used to segment the filtered beef image into lean, fat, and background. The IMF was finally extracted from the original beef image using the vector confidence connected and flood filling techniques. The performance of the algorithm was verified by correlation analysis between the IMF characteristics and the percentage of chemically extractable IMF content (P<0.05). Five IMF features are very significantly correlated with the fat content (P<0.001): the count densities of middle (CDMiddle) and large (CDLarge) fat particles, the area densities of middle and large fat particles, and the total fat area per unit LD area. The highest coefficient is 0.852, for CDLarge. PMID:22063863

Du, Cheng-Jin; Sun, Da-Wen; Jackman, Patrick; Allen, Paul

2008-12-01
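KFCM variants differ in their details. The sketch below follows a common Gaussian-kernel formulation in which the prototypes stay in the input space. Memberships use the kernel-induced distance 1 − K(x, v), and each prototype is a membership- and kernel-weighted mean. Several choices are assumptions, not the authors' settings: the kernel width `sigma`, the fuzzifier `m`, the fixed iteration count, and passing explicit initial prototypes `V0`. The one-dimensional "intensity" values are made up to mimic three classes such as background, lean and fat.

```python
import numpy as np

def kernel_fcm(X, V0, m=2.0, sigma=1.0, iters=100):
    """Kernel fuzzy c-means with a Gaussian kernel; prototypes stay in input space.

    X: (n, d) feature array (e.g. filtered pixel colours); V0: (c, d) initial prototypes
    m: fuzzifier (> 1); sigma: hypothetical kernel width
    """
    X = np.asarray(X, dtype=float)
    V = np.asarray(V0, dtype=float).copy()
    for _ in range(iters):
        sq = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)  # (n, c) squared distances
        K = np.exp(-sq / sigma ** 2)                             # Gaussian kernel K(x_k, v_i)
        d = np.maximum(1.0 - K, 1e-12)       # kernel-induced distance, up to a constant factor
        inv = d ** (-1.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)                # memberships; rows sum to 1
        W = (U ** m) * K
        V = (W.T @ X) / W.sum(axis=0)[:, None]                   # kernel-weighted prototype update
    return U, V


if __name__ == "__main__":
    X = np.array([[0.08], [0.10], [0.12], [0.48], [0.50], [0.52], [0.88], [0.90], [0.92]])
    U, V = kernel_fcm(X, V0=[[0.0], [0.4], [1.0]])
    print(np.round(V.ravel(), 3), U.argmax(axis=1))
```

A hard segmentation into classes then follows from taking, for each pixel, the cluster with the highest membership.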

405

NASA Astrophysics Data System (ADS)

In typical Case 2 waters, accurate remote sensing retrieval of chlorophyll a (chla) is still challenging. There is a widespread understanding that universally applicable water constituent retrieval algorithms are currently not feasible, which has shifted the research focus to regionally specific implementations of powerful inversion methods. This study takes advantage of regionally specific chla algorithms, developed by the authors of this abstract in previous works, and of the characteristics of the Medium Resolution Imaging Spectrometer (MERIS) in order to study harmful algal events in the optically complex waters of the Galician Rias (NW Spain). Harmful algal events are a frequent phenomenon in this area, with direct and indirect impacts on mussel production, which constitutes a very important economic activity for the local community. More than 240 × 10⁶ kg of mussels per year are produced in these upwelling systems of high primary productivity. A nine-year (2003-2012) MERIS archive was analysed using the regionally specific chla algorithms. These were developed based on multilayer perceptron (MLP) artificial neural networks and fuzzy c-means (FCM) clustering techniques. FCM specifies zones (based on water-leaving reflectances) where the retrieval algorithms normally provide more reliable results. Monthly chla anomalies and other statistics were calculated for the nine-year MERIS archive. These results were then related to upwelling indices and other associated measurements to determine the driving forces of specific phytoplankton blooms. The distribution and changes of chla are also discussed.

Gonzalez Vilas, L.; Castro Fernandez, M.; Spyrakos, E.; Torres Palenzuela, J.

2013-08-01
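The zone assignment that FCM provides can be illustrated by the standard fuzzy c-means membership of a new sample relative to fixed cluster centres. In this setting the sample would be a pixel's water-leaving reflectance spectrum, and the centres would come from training spectra. The centres and reflectance-like values below are hypothetical, as is the idea of gating retrieval on a membership threshold. The abstract does not give the authors' rule.

```python
import math

def fcm_membership(x, centers, m=2.0):
    """Fuzzy c-means membership of one sample in each of a set of fixed cluster centres."""
    d = [math.dist(x, c) for c in centers]
    on_centre = [di == 0.0 for di in d]
    if any(on_centre):
        # The sample coincides with one or more centres: share membership among them.
        n0 = sum(on_centre)
        return [1.0 / n0 if z else 0.0 for z in on_centre]
    p = 2.0 / (m - 1.0)
    # u_i = 1 / sum_j (d_i / d_j)^(2 / (m - 1)); memberships sum to 1.
    return [1.0 / sum((di / dj) ** p for dj in d) for di in d]


if __name__ == "__main__":
    centres = [(0.010, 0.020, 0.005), (0.004, 0.012, 0.010)]  # hypothetical class centres
    pixel = (0.008, 0.018, 0.006)                             # hypothetical reflectances
    print([round(u, 3) for u in fcm_membership(pixel, centres)])
```

A retrieval might then be treated as more reliable when the pixel's membership in the class an algorithm was trained on exceeds some chosen threshold.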

406

Bayesian K-Means as a "Maximization-Expectation" Algorithm

… agglomerative clustering algorithm. In experiments we compare this algorithm against a number of alternative … of the workhorses of machine learning. Faced with the exponential growth of data, researchers have recently … complex models. We also derive an alternative agglomerative clustering algorithm. Both algorithms can

Welling, Max

407

The right choice of an optimization algorithm can be crucially important in finding the right solutions for a given optimization problem. There exists a diverse range of algorithms for optimization, including gradient-based algorithms, derivative-free algorithms and metaheuristics. Modern metaheuristic algorithms are often nature-inspired, and they are suitable for global optimization. In this chapter, we will briefly introduce optimization algorithms such

Xin-She Yang

408

Cluster identification via Voronoi tessellation

We propose an automated method for detecting galaxy clusters in imaging surveys based on the Voronoi tessellation technique. It appears very promising, especially for its capability of detecting clusters independently of their shape. After a brief explanation of our use of the algorithm, we show an example application based on a strip of the ESP Key Programme complemented with galaxies of the COSMOS/UKST Southern Sky Catalogue supplied by the Anglo-Australian Observatory.

M. Ramella; M. Nonino; W. Boschin; D. Fadda

1998-10-08

409

Clustered-minority-pixel error diffusion.

We present a clustered-minority-pixel error-diffusion halftoning algorithm for which the quantizer threshold is modified on the basis of the past output and a dot activation map. Dot area, dot shape, and dot distribution are more controllable than with other clustered-dot halftone algorithms such as Levien's algorithm. This method also effectively reduces structured mazelike artifacts in midtones that occur in Levien's algorithm. The dot distribution is further improved by using different error-diffusion weights for different input gray levels. PMID:15260246

Li, Pingshan; Allebach, Jan P

2004-07-01
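For context, the baseline that such methods modify is plain Floyd-Steinberg error diffusion with a fixed quantizer threshold. The sketch below shows only that baseline. It does not implement the clustered-minority-pixel scheme, which changes the threshold per pixel using the past output and a dot activation map and varies the diffusion weights with gray level. The fixed `threshold` argument marks where such a modification would plug in. The 7/16, 3/16, 5/16, 1/16 weights are the classic Floyd-Steinberg values and are an assumption here.

```python
def error_diffuse(img, threshold=0.5):
    """Plain Floyd-Steinberg error diffusion of a grayscale image with values in [0, 1].

    Returns a binary image (1 = on, 0 = off). `threshold` is fixed here; the
    clustered-minority-pixel method instead modulates it per pixel.
    """
    h, w = len(img), len(img[0])
    buf = [list(row) for row in img]  # working copy that accumulates diffused error
    out = [[0] * w for _ in range(h)]
    weights = ((1, 0, 7 / 16), (-1, 1, 3 / 16), (0, 1, 5 / 16), (1, 1, 1 / 16))
    for y in range(h):
        for x in range(w):
            old = buf[y][x]
            new = 1 if old >= threshold else 0
            out[y][x] = new
            err = old - new
            # Push the quantization error onto neighbours not yet processed (raster order).
            for dx, dy, wt in weights:
                nx, ny = x + dx, y + dy
                if 0 <= nx < w and ny < h:
                    buf[ny][nx] += err * wt
    return out


if __name__ == "__main__":
    for row in error_diffuse([[0.25] * 8 for _ in range(4)]):
        print("".join("#" if v else "." for v in row))
```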

410

A Scalable Framework For Cluster Ensembles *

An ensemble of clustering solutions or partitions may be generated for a number of reasons. If the data set is very large, clustering may be done on tractably sized disjoint subsets. The data may be distributed at different sites, for which a distributed clustering solution with a final merging of partitions is a natural fit. In this paper, two new approaches to combining partitions, represented by sets of cluster centers, are introduced. The advantage of these approaches is that they provide a final partition of data that is comparable to the best existing approaches, yet scale to extremely large data sets. They can be 100,000 times faster while using much less memory. The new algorithms are compared against the best existing cluster ensemble merging approaches, clustering all the data at once and a clustering algorithm designed for very large data sets. The comparison is done for fuzzy and hard k-means based clustering algorithms. It is shown that the centroid-based ensemble merging algorithms presented here generate partitions of quality comparable to the best label vector approach or clustering all the data at once, while providing very large speedups. PMID:20160846

Hore, Prodip; Hall, Lawrence O.; Goldgof, Dmitry B.

2009-01-01

411

Computational Intelligence Algorithms and DNA Microarrays

In this chapter, we present Computational Intelligence algorithms, such as Neural Network algorithms, Evolutionary Algorithms, and clustering algorithms, and their application to DNA microarray experimental data analysis. Additionally, dimension reduction techniques are evaluated. Our aim is to study and compare various Computational Intelligence approaches and demonstrate their applicability as well as their weaknesses and shortcomings to efficient DNA microarray data

Dimitris K. Tasoulis; Vassilis P. Plagianakos; Michael N. Vrahatis

2008-01-01

412

Semi-supervised consensus clustering for gene expression data analysis

Background: Simple clustering methods such as hierarchical clustering and k-means are widely used for gene expression data analysis, but they are unable to deal with the noise and high dimensionality associated with microarray gene expression data. Consensus clustering appears to improve the robustness and quality of clustering results. Incorporating prior knowledge in the clustering process (semi-supervised clustering) has been shown to improve the consistency between the data partitioning and domain knowledge. Methods: We proposed semi-supervised consensus clustering (SSCC) to integrate consensus clustering with semi-supervised clustering for analyzing gene expression data. We investigated the roles of consensus clustering and prior knowledge in improving the quality of clustering. SSCC was compared with one semi-supervised clustering algorithm, one consensus clustering algorithm, and k-means. Experiments on eight gene expression datasets were performed using h-fold cross-validation. Results: Using prior knowledge improved the clustering quality by reducing the impact of noise and high dimensionality in microarray data. Integration of consensus clustering with semi-supervised clustering improved performance compared with using consensus clustering or semi-supervised clustering separately. Our SSCC method outperformed the others tested in this paper. PMID:24920961

2014-01-01
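Consensus clustering commonly summarizes repeated clustering runs in a co-association matrix: the fraction of runs that place each pair of genes in the same cluster. That matrix can then itself be clustered, for example hierarchically. The sketch below builds the matrix. The must-link/cannot-link override is a hypothetical, deliberately crude way of injecting prior knowledge, not the SSCC procedure. The label lists are made up.

```python
def co_association(partitions, must_link=(), cannot_link=()):
    """Consensus (co-association) matrix: share of partitions placing items i and j together.

    partitions: list of label lists, one per clustering run, all over the same n items
    must_link / cannot_link: optional (i, j) index pairs of prior knowledge, applied as hard
    overrides (a hypothetical simplification, not the SSCC procedure)
    """
    n, runs = len(partitions[0]), len(partitions)
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    M = [[c / runs for c in row] for row in counts]
    for i, j in must_link:
        M[i][j] = M[j][i] = 1.0
    for i, j in cannot_link:
        M[i][j] = M[j][i] = 0.0
    return M


if __name__ == "__main__":
    runs = [[0, 0, 1, 1], [0, 0, 0, 1], [1, 1, 0, 0]]  # three hypothetical clustering runs
    for row in co_association(runs):
        print([round(v, 2) for v in row])
```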

413

Feature Clustering for Accelerating Parallel Coordinate Descent

We demonstrate an approach for accelerating calculation of the regularization path for L1 sparse logistic regression problems. We show the benefit of feature clustering as a preconditioning step for parallel block-greedy coordinate descent algorithms.

Scherrer, Chad; Tewari, Ambuj; Halappanavar, Mahantesh; Haglin, David J.

2012-12-06

414

NSDL National Science Digital Library

The May featured molecules are discussed in the Viewpoints article "Boron Clusters Come of Age". The review paper by Russell N. Grimes on boron clusters reminds us both of the past impact that these interesting structures have had on the development of our understanding of cluster chemistry and of the future development of what one might refer to as "post-fullerene" clusters. The wide range of structures found in this paper admirably illustrates the structural flexibility arising from clusters of a variety of symmetries and degrees of boron replacement with carbon and other atoms.

415

An RSU-assisted Cluster Head Selection and Backup Protocol

In VANET, vehicles moving on the highway usually form clusters, and the wireless communications between vehicles are generally conducted by cluster heads. In most cluster-based protocols, however, the cluster head selection carried out through inter-cluster communications has some limitations, e.g., large communication and computation cost, quite complex algorithms, requirement of additional devices and so on. Another potential problem is that

Lin Sun; Ying Wu; Jingdong Xu; Yuwei Xu

2012-01-01

416

NASA Technical Reports Server (NTRS)

Genetic algorithms are mathematical, highly parallel, adaptive search procedures (i.e., problem-solving methods) based loosely on the processes of natural genetics and Darwinian survival of the fittest. Basic genetic algorithm concepts and applications are introduced, and results are presented from a project to develop a software tool that will enable the widespread use of genetic algorithm technology.

Wang, Lui; Bayer, Steven E.

1991-01-01
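The basic genetic algorithm concepts mentioned above can be illustrated with a toy generational GA on the standard "OneMax" bit-string problem. The ingredients are a population of candidate solutions, fitness-based selection, crossover and mutation. The problem choice, binary tournament selection, one-point crossover, elitism and all parameter values are illustrative assumptions, unrelated to the software tool the report describes.

```python
import random

def onemax_ga(n_bits=20, pop_size=30, generations=60, p_mut=0.02, seed=1):
    """Toy generational GA maximizing the number of 1 bits (the 'OneMax' problem).

    Returns the best individual and the best fitness recorded at each generation.
    """
    rng = random.Random(seed)
    fitness = sum  # OneMax fitness: count of 1 bits

    def tournament(pop):
        a, b = rng.sample(pop, 2)  # binary tournament selection
        return a if fitness(a) >= fitness(b) else b

    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = [fitness(max(pop, key=fitness))]
    for _ in range(generations):
        nxt = [max(pop, key=fitness)]  # elitism: carry the best individual over unchanged
        while len(nxt) < pop_size:
            p1, p2 = tournament(pop), tournament(pop)
            cut = rng.randrange(1, n_bits)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            nxt.append(child)
        pop = nxt
        history.append(fitness(max(pop, key=fitness)))
    return max(pop, key=fitness), history


if __name__ == "__main__":
    best, history = onemax_ga()
    print(sum(best), history[::10])
```

Because the best individual is always copied forward, the best fitness can never decrease from one generation to the next. Selection and crossover spread good building blocks, and mutation keeps the search from stalling.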

417

Using a Lagrangian-based approach, we present a more elegant derivation of the equations necessary for the variational optimization of the molecular orbitals (MOs) for the coupled-cluster doubles (CCD) method and second-order Møller-Plesset perturbation theory (MP2). These orbital-optimized theories are referred to as OO-CCD and OO-MP2 (or simply "OD" and "OMP2" for short), respectively. We also present an improved algorithm for orbital optimization in these methods. Explicit equations for response density matrices, the MO gradient, and the MO Hessian are reported both in spin-orbital and closed-shell spin-adapted forms. The Newton-Raphson algorithm is used for the optimization procedure using the MO gradient and Hessian. Further, orbital stability analyses are also carried out at correlated levels. The OD and OMP2 approaches are compared with the standard MP2, CCD, CCSD, and CCSD(T) methods. All these methods are applied to H(2)O, three diatomics, and the O(4)(+) molecule. Results demonstrate that the CCSD and OD methods give nearly identical results for H(2)O and diatomics; however, in symmetry-breaking problems as exemplified by O(4)(+), the OD method provides better results for vibrational frequencies. The OD method has further advantages over CCSD: its analytic gradients are easier to compute since there is no need to solve the coupled-perturbed equations for the orbital response, the computation of one-electron properties is easier because there is no response contribution to the particle density matrices, the variationally optimized orbitals can be readily extended to allow inactive orbitals, it avoids spurious second-order poles in its response function, and its transition dipole moments are gauge invariant. The OMP2 has these same advantages over canonical MP2, making it promising for excited state properties via linear response theory.
The quadratically convergent orbital-optimization procedure converges quickly for OMP2, and provides molecular properties that are somewhat different than those of MP2 for most of the test cases considered (although they are similar for H(2)O). Bond lengths are somewhat longer, and vibrational frequencies somewhat smaller, for OMP2 compared to MP2. In the difficult case of O(4)(+), results for several vibrational frequencies are significantly improved in going from MP2 to OMP2. PMID:21932872

Bozkaya, Uğur; Turney, Justin M; Yamaguchi, Yukio; Schaefer, Henry F; Sherrill, C David

2011-09-14

418

This article surveys the state of the art in quantum computer algorithms, including both black-box and non-black-box results. It is infeasible to detail all the known quantum algorithms, so a representative sample is given. This includes a summary of the early quantum algorithms, a description of the Abelian Hidden Subgroup algorithms (including Shor's factoring and discrete logarithm algorithms), quantum searching and amplitude amplification, quantum algorithms for simulating quantum mechanical systems, several non-trivial generalizations of the Abelian Hidden Subgroup Problem (and related techniques), the quantum walk paradigm for quantum algorithms, the paradigm of adiabatic algorithms, a family of "topological" algorithms, and algorithms for quantum tasks which cannot be done by a classical computer, followed by a discussion.

Michele Mosca

2008-08-04

419

The Sloan Nearby Cluster Weak Lensing Survey

We describe and present initial results of a weak lensing survey of nearby (z ≲ 0.1) galaxy clusters in the Sloan Digital Sky Survey (SDSS). In this first study, galaxy clusters are selected from the SDSS spectroscopic galaxy cluster catalogs of Miller et al. and Berlind et al. We report a total of seven individual low-redshift cluster weak lensing measurements that include A2048, A1767, A2244, A1066, A2199, and two clusters specifically identified with the C4 algorithm. Our program of weak lensing of nearby galaxy clusters in the SDSS will eventually reach ~200 clusters, making it the largest weak lensing survey of individual galaxy clusters to date.

Kubo, Jeffrey M. (Fermilab); Annis, James T. (Fermilab); Hardin, Frances Mei (Illinois Math. Sci. Acad.); Kubik, Donna (Fermilab); Lawhorn, Kelsey (Illinois Math. Sci. Acad.); Lin, Huan (Fermilab); Nicklaus, Liana (Illinois Math. Sci. Acad.); Nelson, Dylan (UC, Berkeley); Reis, Ribamar Rondon de Rezende (Fermilab); Seo, Hee-Jong (Fermilab); Soares-Santos, Marcelle (Fermilab; Inst. Geo. Astron., Havana; Sao Paulo U.)

2009-08-01

420

A complex networks approach for data clustering

NASA Astrophysics Data System (ADS)

This work proposes a method for data clustering based on complex networks theory. A data set is represented as a network by considering different metrics to establish the connection between each pair of objects. The clusters are obtained by taking into account five community detection algorithms. The network-based clustering approach is applied to two real-world databases and two sets of artificially generated data. The obtained results suggest that the exponential of the Minkowski distance is the most suitable metric to quantify the similarities between pairs of objects. In addition, the community identification method based on greedy optimization provides the best cluster solution. We compare the network-based clustering approach with some traditional clustering algorithms and verify that it provides the lowest classification error rate.

de Arruda, Guilherme F.; Costa, Luciano da Fontoura; Rodrigues, Francisco A.

2012-12-01

421

Geometric Clustering using the Information Bottleneck method

Princeton University, Princeton, NJ 08544, wbialek@princeton.edu; Léon Bottou, NEC Laboratories America … annealing algorithms for geometric clustering can be derived from the more general Information Bottleneck … be measured with the Euclidean norm, and then we could ask for a clustering of the points {x_i}, i = 1, 2

Bottou, Léon

423

Hierarchical speaker identification using speaker clustering

We explore an approach to speaker identification called speaker clustering in the GMM-based speaker recognition system in order to reduce the computational complexity. The ISODATA algorithm, adapted for our purpose, works well when we cluster speakers whose acoustic characteristics are similar under a distance measure. The time spent on HSI (hierarchical speaker identification) is approximately 30.3 percent more than that

Bing Sunt; Wenju Liu; Qiuhai Zhong

2003-01-01

424

An adaptive affinity propagation document clustering

The standard affinity propagation clustering algorithm suffers from one limitation: it is hard to know the value of the parameter "preference" that can yield an optimal clustering solution. To overcome this limitation, in this paper we propose an adaptive affinity propagation method. The method first finds the range of "preference", then searches the space of "preference" to find

Yancheng He; Qingcai Chen; Xiaolong Wang; Ruifeng Xu; Xiaohua Bai; Xianjun Meng

2010-01-01
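The compact NumPy sketch below makes the role of "preference" concrete. It implements standard affinity propagation message passing (Frey and Dueck): damped updates of responsibilities and availabilities, with the preference written onto the diagonal of the similarity matrix. It does not implement the adaptive search over the preference range described above. The negative-squared-distance similarities, damping factor, iteration count and preference value are assumptions chosen for illustration.

```python
import numpy as np

def affinity_propagation(S, preference, damping=0.5, iters=200):
    """Affinity propagation (Frey and Dueck) on a similarity matrix S.

    `preference` is written onto the diagonal s(k, k); lower values tend to give fewer
    exemplars. Returns (exemplar indices, exemplar label for every point).
    """
    S = np.array(S, dtype=float)
    n = len(S)
    np.fill_diagonal(S, preference)
    R = np.zeros((n, n))
    A = np.zeros((n, n))
    rows = np.arange(n)
    for _ in range(iters):
        # Responsibility r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        AS = A + S
        first = AS.argmax(axis=1)
        max1 = AS[rows, first]
        AS[rows, first] = -np.inf
        max2 = AS.max(axis=1)
        R_new = S - max1[:, None]
        R_new[rows, first] = S[rows, first] - max2
        R = damping * R + (1 - damping) * R_new
        # Availability a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k))); a(k,k) unclipped
        Rp = np.maximum(R, 0)
        Rp[rows, rows] = R[rows, rows]
        A_new = Rp.sum(axis=0)[None, :] - Rp
        diag = A_new[rows, rows].copy()
        A_new = np.minimum(A_new, 0)
        A_new[rows, rows] = diag
        A = damping * A + (1 - damping) * A_new
    exemplars = np.flatnonzero(np.diag(A + R) > 0)
    if exemplars.size == 0:
        return exemplars, np.full(n, -1)
    labels = exemplars[S[:, exemplars].argmax(axis=1)]
    labels[exemplars] = exemplars  # each exemplar labels itself
    return exemplars, labels


if __name__ == "__main__":
    x = np.array([0.0, 0.1, 0.2, 10.0, 10.1, 10.2])
    S = -(x[:, None] - x[None, :]) ** 2  # negative squared distances as similarities
    print(affinity_propagation(S, preference=-1.0))
```

One straightforward way to approximate the kind of search the abstract describes is to sweep `preference` across a range and score each result with a validity index. For example, the range could run from the minimum to the median of the off-diagonal similarities.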

425

Identification of cancer-associated gene clusters and genes via clustering penalization

Identification of genes associated with cancer development and progression using microarray data is challenging because of the high dimensionality and cluster structure of gene expressions. Here the clusters are composed of multiple genes with coordinated biological functions and/or correlated expressions. In this article, we first propose a hybrid approach for clustering gene expressions. The hybrid approach uses both pathological pathway information and correlations of gene expressions. We propose using the group bridge, a novel clustering penalization approach, for analysis of cancer microarray data. The group bridge approach explicitly accounts for the cluster structure of gene expressions, and is capable of selecting gene clusters and genes within those selected clusters that are associated with cancer. We also develop an iterative algorithm for computing the group bridge estimator. Analysis of three cancer microarray datasets shows that the proposed approach can identify biologically meaningful gene clusters and genes within those identified clusters. PMID:20057914

Ma, Shuangge; Huang, Jian; Shen, Shihao

2009-01-01

426

A Fast Implementation of the ISOCLUS Algorithm

NASA Technical Reports Server (NTRS)

Unsupervised clustering is a fundamental tool in numerous image processing and remote sensing applications. For example, unsupervised clustering is often used to obtain vegetation maps of an area of interest. This approach is useful when reliable training data are either scarce or expensive, and when relatively little a priori information about the data is available. Unsupervised clustering methods play a significant role in the pursuit of unsupervised classification. One of the most popular and widely used clustering schemes for remote sensing applications is the ISOCLUS algorithm, which is based on the ISODATA method. The algorithm is given a set of n data points (or samples) in d-dimensional space, an integer k indicating the initial number of clusters, and a number of additional parameters. The general goal is to compute a set of cluster centers in d-space. Although there is no specific optimization criterion, the algorithm is similar in spirit to the well-known k-means clustering method, in which the objective is to minimize the average squared distance of each point to its nearest center, called the average distortion. One significant feature of ISOCLUS over k-means is that clusters may be merged or split, and so the final number of clusters may be different from the number k supplied as part of the input. This algorithm will be described later in this paper. The ISOCLUS algorithm can run very slowly, particularly on large data sets. Given its wide use in remote sensing, its efficient computation is an important goal. We have developed a fast implementation of the ISOCLUS algorithm. Our improvement is based on a recent acceleration to the k-means algorithm, the filtering algorithm, by Kanungo et al. They showed that, by storing the data in a kd-tree, it was possible to significantly reduce the running time of k-means. We have adapted this method for the ISOCLUS algorithm.
For technical reasons, which are explained later, it is necessary to make a minor modification to the ISOCLUS specification. We provide empirical evidence, on both synthetic and Landsat image data sets, that our algorithm's performance is essentially the same as that of ISOCLUS, but with significantly lower running times. We show that our algorithm runs from 3 to 30 times faster than a straightforward implementation of ISOCLUS. Our adaptation of the filtering algorithm involves the efficient computation of a number of cluster statistics that are needed for ISOCLUS, but not for k-means.

Memarsadeghi, Nargess; Mount, David M.; Netanyahu, Nathan S.; LeMoigne, Jacqueline

2003-01-01

427

NASA Astrophysics Data System (ADS)

A knowledge-based fuzzy clustering (KBFC) MRI segmentation algorithm was proposed to obtain accurate tumor segmentation for tumor volume measurement of nasopharyngeal carcinoma (NPC). An initial segmentation was performed on T1 and contrast-enhanced T1 MR images using a semi-supervised fuzzy c-means (SFCM) algorithm. Then, three types of anatomical and spatial knowledge (symmetry, connectivity, and cluster center) were used for image analysis, contributing to the final tumor segmentation. After segmentation, tumor volume was obtained by the multi-planimetry method. Visual and quantitative validations were performed on a phantom model and six data volumes of NPC patients, compared with ground truth (GT) and with the results acquired using seed growing (SG) for tumor segmentation. Visually, KBFC produced better tumor segmentation than SG. In quantitative segmentation quality estimation on the phantom model, the matching percent (MP) / correspondence ratio (CR) was 94.1-96.4% / 0.888-0.925 for KBFC and 94.1-96.0% / 0.884-0.918 for SG, while on patient data volumes it was 92.1 ± 2.6% / 0.884 ± 0.014 for KBFC and 87.4 ± 4.3% / 0.843 ± 0.041 for SG. In tumor volume measurement on the phantom model, the measurement error was 4.2-5.0% for KBFC and 4.8-6.1% for SG, while on patient data volumes it was 6.6 ± 3.5% for KBFC and 8.8 ± 5.4% for SG. Based on these results, KBFC can provide high-quality MRI tumor segmentation for tumor volume measurement of NPC.

Zhou, Jiayin; Lim, Tuan-Kay; Chong, Vincent

2002-05-01
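Several abstracts in this listing (KBFC, context-based FCM, kernel FCM, tropical cyclone tracks) build on fuzzy c-means. The sketch below is plain, unsupervised FCM. It is not the semi-supervised SFCM variant or the knowledge-based post-processing described above. The function name, the fuzzifier default `m=2`, and the iteration limits are illustrative assumptions.

```python
import numpy as np

def fuzzy_c_means(x, c, m=2.0, n_iter=200, tol=1e-6, seed=0):
    """Standard fuzzy c-means. Returns (centers, memberships), where
    memberships has shape (c, n) and each column sums to 1."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat a 1-D array as n points with one feature (e.g. intensity)
    rng = np.random.default_rng(seed)
    u = rng.random((c, len(x)))
    u /= u.sum(axis=0)  # memberships of each point sum to 1
    for _ in range(n_iter):
        um = u ** m
        centers = (um @ x) / um.sum(axis=1, keepdims=True)  # weighted means, shape (c, d)
        # distance from every center to every point, shape (c, n)
        d = np.linalg.norm(x[None, :, :] - centers[:, None, :], axis=2)
        d = np.fmax(d, 1e-12)  # avoid division by zero when a point sits on a center
        inv = d ** (-2.0 / (m - 1.0))
        u_new = inv / inv.sum(axis=0)  # u_ik = 1 / sum_j (d_ik / d_jk)^(2/(m-1))
        converged = np.abs(u_new - u).max() < tol
        u = u_new
        if converged:
            break
    return centers, u
```

A hard segmentation can be read off with `u.argmax(axis=0)`. The soft memberships themselves are what knowledge-based refinements typically operate on.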

428

In this study, we present a new architecture of a granular neural network and provide a comprehensive design methodology as well as an algorithmic setup supporting its development. The proposed neural network relates to the broad category of radial basis function neural networks (RBFNNs) in the sense that its topology involves a collection of receptive fields. In contrast to the standard architectures encountered in RBFNNs, here we form individual receptive fields in subspaces of the original input space rather than in the entire input space. These subspaces can differ from one receptive field to another. The architecture of the network fully reflects the structure of the training data, which are granulated with the aid of clustering techniques. More specifically, the output space is granulated using K-means clustering. The information granules in the multidimensional input space are formed using the so-called context-based fuzzy C-means, which takes into account the structure already formed in the output space. The innovative development facet of the network is a dynamic reduction of the dimensionality of the input space. The information granules are formed in a subspace of the overall input space, obtained by selecting a subset of input variables such that the subspace retains the structure of the entire space. As this search is combinatorial, we use genetic optimization (more specifically, genetic algorithms (GAs)) to determine the optimal input subspaces. A series of numeric studies on synthetic data and on data from the Machine Learning Repository, University of California at Irvine, provides detailed insight into the nature of the algorithm and its parameters and offers some comparative analysis. PMID:19674950

Park, Ho-Sung; Pedrycz, Witold; Oh, Sung-Kwun

2009-10-01

429

Industrial clusters, especially export-oriented clusters, are rather new and emerging strategies for companies and countries to achieve export development throughout the world. According to Porter (1998) in his well-known paper "Clusters and the New Economics of Competition," “paradoxically, the enduring competitive advantages in a global economy lie increasingly in local things – knowledge, relationships, and motivation that distant rivals

Seyed Vahid Moosavi; Mahdi Noorizadegan

430

Mesoscale and clusters of synchrony in networks of bursting neurons

NASA Astrophysics Data System (ADS)

We study the role of network architecture in the formation of synchronous clusters in synaptically coupled networks of bursting neurons. We give a simple combinatorial algorithm that finds the largest synchronous clusters from the network topology. We demonstrate that networks with a certain degree of internal symmetry are likely to have cluster decompositions with relatively large clusters, potentially leading to cluster synchronization at the mesoscale network level. We also address the asymptotic stability of cluster synchronization in excitatory networks of Hindmarsh-Rose bursting neurons and derive explicit thresholds for the coupling strength that guarantee stable cluster synchronization.

Belykh, Igor; Hasler, Martin

2011-03-01

431

[Cluster analysis and its application].

The study exploits knowledge-oriented and context-based modifications of well-known (fuzzy) clustering algorithms. Fuzzy sets are also inherently suited to coping with linguistic domain knowledge. We aim to extract new information about the environment being explored from rich, diverse data and knowledge. PMID:19569578

Půlpán, Zdeněk

2002-01-01

432

An automatic method for estimating the intramuscular fat (IMF) content of beef M. longissimus dorsi (LD) was developed using a sequence of image processing algorithms. To extract IMF particles within the LD muscle from structural features of the intermuscular fat surrounding the muscle, a three-step image processing algorithm was developed, i.e. bilateral filter for noise removal, kernel fuzzy c-means

Cheng-Jin Du; Da-Wen Sun; Patrick Jackman; Paul Allen

2008-01-01

433

The Weighted Combined Algorithm: A Linkage Algorithm for Software Clustering

Software systems need to evolve as business requirements, technology and environment change. As software is modified to accommodate the required changes, its structure deteriorates. There is increased deviation from the actual design and architecture. Very often, documentation is not updated to reflect these changes thus making it more and more difficult to understand, manage and maintain these systems. Researchers have

Onaiza Maqbool; Haroon A. Babri

2004-01-01

434

Stereotyping: improving particle swarm performance with cluster analysis

Individuals in the particle swarm population were “stereotyped” by cluster analysis of their previous best positions. The cluster centers were then substituted for the individuals' and neighbors' best previous positions in the algorithm. The experiments, inspired by the social-psychological metaphor of social stereotyping, found that performance could generally be improved by substituting individuals', but not neighbors', cluster centers

James Kennedy

2000-01-01

435

On clustering heterogeneous social media objects with outlier links

The clustering of social media objects provides intrinsic understanding of the similarity relationships between documents, images, and their contextual sources. Both content and link structure provide important cues for an effective clustering algorithm of the underlying objects. While link information provides useful hints for improving the clustering process, it also contains a significant amount of noisy information. Therefore, a robust

Guo-Jun Qi; Charu C. Aggarwal; Thomas S. Huang

2012-01-01

436

Clustering Images Using the Latent Dirichlet Allocation Model

Clustering, in simple words, is grouping similar data items together. In the text domain, clustering is widely popular and fairly successful. In this work, we try to apply clustering methods that are used in the text domain collections using text matching and relevance algorithms, cluster large collections of text, and use machine

Dyer, Charles R.

437

A Face Annotation Framework with Partial Clustering and Interactive Labeling

Face annotation technology is important for a photo management system. In this paper, we propose a novel interactive face annotation framework combining unsupervised and interactive learning. There are two main contributions in our framework. In the unsupervised stage, a partial clustering algorithm is proposed to find the most evident clusters instead of grouping all instances into clusters,

Yuandong Tian; Wei Liu; Rong Xiao; Fang Wen; Xiaoou Tang

2007-01-01

438

When is Constrained Clustering Beneficial, and Why?

NASA Technical Reports Server (NTRS)

Several researchers have shown that constraints can improve the results of a variety of clustering algorithms. However, there can be a large variation in this improvement, even for a fixed number of constraints for a given data set. We present the first attempt to provide insight into this phenomenon by characterizing two constraint set properties: informativeness and coherence. We show that these measures can help explain why some constraint sets are more beneficial to clustering algorithms than others. Since they can be computed prior to clustering, these measures can aid in deciding which constraints to use in practice.

Wagstaff, Kiri L.; Basu, Sugato; Davidson, Ian

2006-01-01
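The constrained-clustering abstract above says informativeness can be computed before clustering with constraints. The sketch below assumes informativeness is the fraction of constraints that an unconstrained run of the same algorithm violates, meaning information the algorithm could not have found on its own. Coherence is not shown. The function name and argument layout are illustrative.

```python
def informativeness(labels, must_link, cannot_link):
    """Fraction of constraints violated by an unconstrained clustering.

    labels: cluster label per point from the *unconstrained* algorithm.
    must_link / cannot_link: lists of (i, j) index pairs.
    """
    violated = 0
    for i, j in must_link:
        if labels[i] != labels[j]:  # must-link broken: the pair was split apart
            violated += 1
    for i, j in cannot_link:
        if labels[i] == labels[j]:  # cannot-link broken: the pair was grouped together
            violated += 1
    total = len(must_link) + len(cannot_link)
    return violated / total if total else 0.0
```

A score near 0 suggests the constraints only restate what the algorithm already finds. A higher score indicates they carry information the algorithm's own bias would miss.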

439

Algorithms for Gene Clustering Analysis on Genomes

The increased availability of data in biological databases provides many opportunities for understanding biological processes through these data. As recent attention has shifted from sequence analysis to higher-level analysis of genes across...

Yi, Gang Man

2012-07-16

440

Computing and Informatics, Vol. 31, 2012, 1533-1555: DOCUMENT CLUSTERING WITH BURSTY

mining tasks, such as document retrieval, topic modeling and text categorization. For text clustering, we-Medoids clustering algorithms. The bursty distance measure did not only perform equally well on various text in [9], He et al. used Kleinberg's algorithm to extract burst detection for topic clustering on text

Zaki, Mohammed Javeed

441

A Poisson-based adaptive affinity propagation clustering for SAGE data

Serial analysis of gene expression (SAGE) is a powerful tool to obtain gene expression profiles. Clustering analysis is a valuable technique for analyzing SAGE data. In this paper, we propose an adaptive clustering method for SAGE data analysis, namely, PoissonAPS. The method incorporates a novel clustering algorithm, Affinity Propagation (AP). While AP algorithm has demonstrated good performance on many different

DongMing Tang; Qingxin Zhu; Fan Yang

2010-01-01

442

A hierarchical trajectory clustering algorithm is presented with the goal of clustering a set of mutually exclusive obstacle trajectory predictions for use in a contingency based path planner for an autonomous road vehicle. This clustering algorithm improves the computational scaling of the contingency planner by limiting the total number of required contingency paths while preserving the performance advantages of

Jason Hardy; Mark Campbell

2011-01-01

443

NSDL National Science Digital Library

CSC 325. (MAT 325) Numerical Algorithms (3) Prerequisite: CSC 112 or 121, MAT 162. An introduction to the numerical algorithms fundamental to scientific computer work. Includes elementary discussion of error, polynomial interpolation, quadrature, linear systems of equations, solution of nonlinear equations and numerical solution of ordinary differential equations. The algorithmic approach and the efficient use of the computer are emphasized.

Tagliarini, Gene

2003-04-21

444

NSDL National Science Digital Library

Content prepared for the Supercomputing 2002 session on "Using Clustering Technologies in the Classroom". Contains a series of exercises for teaching parallel computing concepts through kinesthetic activities.

Gray, Paul

445

Improving Sensor Network Lifetime Through Hierarchical Multihop Clustering

In this project, we developed an adaptive multihop clustering algorithm MaxLife for sensor networks. MaxLife significantly improves sensor network lifetime by balancing energy dissipation and minimizing energy consumption at the same time. The algorithm is compared to Random and MinEnergy algorithms and shows great performance gain. Random is extended from its original design of single hop clustering in (Wendi Rabiner

Maggie X. Cheng; Xuan Gong; Scott C.-H. Huang

2009-01-01

446

A global-local approach to cluster analysis

A global to local philosophy is presented as a methodology for cluster analysis. The global algorithm is presented as an estimator of initial cluster centers, while the local algorithm is presented as a refinement procedure of the global algorithm's estimate. Global-local techniques are discussed and experimental results are presented. This research was sponsored in part by NASA Grant NAS9-12931 and

R. A. Northouse; F. R. Fromm; D. J. Eigen

1973-01-01

447

In this paper, we design and develop a unified system, GE-Miner (Gene Expression Miner), that integrates cluster ensemble, text clustering and multi-document summarization and provides an environment for comprehensive gene expression data analysis. We present a novel cluster ensemble approach to generate high-quality gene clusters. In our text summarization module, given a gene cluster, our Expectation Maximization (EM) based algorithm can automatically identify subtopics and extract the most probable terms for each topic. Then the top k topical terms extracted from each subtopic are combined to form the biological explanation of each gene cluster. Experimental results demonstrate that our system can obtain high-quality clusters and provide informative key terms for the gene clusters.

Xiaohua Hu

448

Bipartite graph partitioning and data clustering

Many data types arising from data mining applications can be modeled as bipartite graphs. Examples include terms and documents in a text corpus, customers and purchased items in market basket analysis, and reviewers and movies in a movie recommender system. In this paper, the authors propose a new data clustering method based on partitioning the underlying bipartite graph. The partition is constructed by minimizing a normalized sum of edge weights between unmatched pairs of vertices of the bipartite graph. They show that an approximate solution to the minimization problem can be obtained by computing a partial singular value decomposition (SVD) of the associated edge weight matrix of the bipartite graph. They point out the connection of their clustering algorithm to correspondence analysis used in multivariate analysis. They also briefly discuss the issue of assigning data objects to multiple clusters. In the experimental results, they apply their clustering algorithm to the problem of document clustering to illustrate its effectiveness and efficiency.

Zha, Hongyuan; He, Xiaofeng; Ding, Chris; Gu, Ming; Simon, Horst D.

2001-05-07
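The bipartite partitioning abstract above reduces a normalized cut to an SVD of the degree-normalized edge-weight matrix. The sketch below works under stated assumptions. It uses a full `np.linalg.svd` for brevity, whereas the abstract describes a partial SVD. It bisects by the sign of the rescaled second singular vectors, one simple choice among several. Every row and column must have nonzero degree. The function name is hypothetical.

```python
import numpy as np

def bipartite_bisect(W):
    """Split rows (e.g. terms) and columns (e.g. documents) of a nonnegative
    edge-weight matrix W into two groups via the second singular pair."""
    W = np.asarray(W, dtype=float)
    d_rows = W.sum(axis=1)  # row-vertex degrees (must be > 0)
    d_cols = W.sum(axis=0)  # column-vertex degrees (must be > 0)
    # normalized matrix D_rows^{-1/2} W D_cols^{-1/2}
    Wn = W / np.sqrt(d_rows)[:, None] / np.sqrt(d_cols)[None, :]
    U, s, Vt = np.linalg.svd(Wn)
    # the top singular pair (value 1) is trivial; the second carries the cut
    x = U[:, 1] / np.sqrt(d_rows)
    y = Vt[1, :] / np.sqrt(d_cols)
    return x >= 0, y >= 0  # same boolean value = same side of the cut
```

Because `Wn @ v = s * u` holds for each singular pair, rows and columns that end up on the same side of the cut get matching signs. This is how terms and documents are co-clustered.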

449

Finding Galaxy Clusters using Voronoi Tessellations

We present an objective and automated procedure for detecting clusters of galaxies in imaging galaxy surveys. Our Voronoi Galaxy Cluster Finder (VGCF) uses galaxy positions and magnitudes to find clusters and determine their main features: size, richness and contrast above the background. The VGCF uses the Voronoi tessellation to evaluate the local density and identifies clusters as significant density fluctuations above the background. The significance threshold must be set by the user, but experimenting with different choices is easy because it does not require a whole new run of the algorithm. The VGCF is non-parametric and does not smooth the data. As a consequence, clusters are identified irrespective of their shape, and their identification is only slightly affected by border effects and by holes in the galaxy distribution on the sky. The algorithm is fast and automatically assigns members to structures.

M. Ramella; W. Boschin; D. Fadda; M. Nonino

2001-01-23

450

Efficient semidefinite spectral clustering via lagrange duality.

We propose an efficient approach to semidefinite spectral clustering (SSC), which addresses the Frobenius normalization with the positive semidefinite (p.s.d.) constraint for spectral clustering. Compared with the original Frobenius norm approximation-based algorithm, the proposed algorithm can more accurately find the closest doubly stochastic approximation to the affinity matrix by considering the p.s.d. constraint. In this paper, SSC is formulated as a semidefinite programming (SDP) problem. In order to solve the high computational complexity of SDP, we present a dual algorithm based on the Lagrange dual formalization. Two versions of the proposed algorithm are proffered: one with less memory usage and the other with faster convergence rate. The proposed algorithm has much lower time complexity than that of the standard interior-point-based SDP solvers. Experimental results on both the UCI data sets and real-world image data sets demonstrate that: 1) compared with the state-of-the-art spectral clustering methods, the proposed algorithm achieves better clustering performance and 2) our algorithm is much more efficient and can solve larger-scale SSC problems than those standard interior-point SDP solvers. PMID:24951690

Yan, Yan; Shen, Chunhua; Wang, Hanzi

2014-08-01

451

Recursive hybrid algorithm for non-linear system identification using radial basis function networks

Recursive identification of non-linear systems is investigated using radial basis function networks. A novel approach is adopted which employs a hybrid clustering and least squares algorithm. The recursive clustering algorithm adjusts the centres of the radial basis function network while the recursive least squares algorithm estimates the connection weights of the network. Because these two recursive learning rules are both

S. Chen; S. A. Billings; P. M. Grant

1992-01-01

452

NASA Astrophysics Data System (ADS)

Two mathematical models are presented for the nonlinear clustering regime of observed galaxies on the basis of interwoven fractal point sets; this multifractal description furnishes important insights into what is occurring in nonlinear clustering. It is shown that multifractality is related to the galaxy counts-in-cells, following such skewed statistical distributions as the lognormal. The model predicts the scaling properties of the mean and variance of the cell-count distribution with cell size, while furnishing insights into how the nonlinear clustering process works.

Jones, Bernard J. T.; Coles, Peter; Martinez, Vicent J.

1992-11-01

453

The majority of vertebrate protocadherin (Pcdh) genes are clustered in a single genomic locus, and this remarkable genomic organization is highly conserved from teleosts to humans. These clustered Pcdhs are differentially expressed in individual neurons, they engage in homophilic trans-interactions as multimers and they are required for diverse neurodevelopmental processes, including neurite self-avoidance. Here, we provide a concise overview of the molecular and cellular biology of clustered Pcdhs, highlighting how they generate single cell diversity in the vertebrate nervous system and how such diversity may be used in neural circuit assembly. PMID:23900538

Chen, Weisheng V.; Maniatis, Tom

2013-01-01

454

A SCALABLE HIERARCHICAL ALGORITHM FOR UNSUPERVISED CLUSTERING

Keywords: unsupervised clustering, hierarchical clustering, text mining, genomics, sparse matrices, principal directions. Chapter 1: A Scalable Hierarchical Algorithm for Unsupervised Clustering. Daniel Boley. Abstract: Top-down hierarchical clustering can be done in a scalable way. Here we describe a scalable unsupervised clustering

Boley, Daniel

455

MIP Reconstruction Techniques and Minimum Spanning Tree Clustering

The development of a tracking algorithm for minimum ionizing particles in the calorimeter and of a clustering algorithm based on the Minimum Spanning Tree approach are described. They do not depend on information from the central tracking system. Both are important components of a particle flow algorithm currently under development.

Mader, Wolfgang F.; /Iowa U.

2005-09-12
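The MIP/MST abstract above gives no details of the calorimeter algorithm. The sketch below is a generic minimum-spanning-tree clustering of hit positions. It builds a Euclidean MST with Prim's algorithm and then cuts every tree edge longer than a single fixed length; `max_edge` and the function name are hypothetical. A real particle-flow implementation would likely use detector-specific distances and thresholds.

```python
import math

def mst_clusters(points, max_edge):
    """Cluster points by building a Euclidean minimum spanning tree (Prim)
    and cutting every tree edge longer than max_edge. Returns one label per point."""
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge from the tree to each point
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # add the closest point not yet in the tree (O(n^2) Prim)
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    # keep only short tree edges, then label connected components (union-find)
    root = list(range(n))

    def find(i):
        while root[i] != i:
            root[i] = root[root[i]]  # path halving
            i = root[i]
        return i

    for a, b, w in edges:
        if w <= max_edge:
            root[find(a)] = find(b)
    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(n)]
```

Cutting long MST edges is equivalent to single-linkage clustering at that distance. This is why the approach needs no seed from the central tracker.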

456

Resolving the structure of interactomes with hierarchical agglomerative clustering

BACKGROUND: Graphs provide a natural framework for visualizing and analyzing networks of many types, including biological networks. Network clustering is a valuable approach for summarizing the structure in large networks, for predicting unobserved interactions, and for predicting functional annotations. Many current clustering algorithms suffer from a common set of limitations: poor resolution of top-level clusters; over-splitting of bottom-level clusters; requirements

Yongjin Park; Joel S Bader

2011-01-01

457

Lattice-Valued Hierarchical Clustering for Analyzing Information Systems

A generalization of hierarchical clustering is proposed in which the dendrogram is replaced by clusters attached to a lattice diagram. Hence the method is called lattice-valued hierarchical clustering. Different algorithms of lattice-valued clustering are described with application to information systems in the form of tables studied in rough sets. A simple example is given whereby how the concept of the

Sadaaki Miyamoto

2006-01-01

458

Studies of how galaxies and clusters of galaxies have formed have reached an interesting stage where good arguments can be adduced in favor of quite different scenarios. The author explains why he thinks the evidence for the

P. J. E. Peebles

1984-01-01

459

Online Clustering of Linguistic Data Princeton University Class of 2005, BSE

Independent Work Advised by Professor Moses Charikar Abstract Clustering text data online as it comes done on text clustering, it has not been fully explored. In this paper, we discuss previous methods in text clustering and then develop a single-pass text clustering algorithm designed specifically

Reyzin, Lev

460

Data clustering plays an important role in various fields, and many data clustering approaches have been designed in recent years. This investigation presents a data clustering algorithm to identify potential musical instrument teachers, using data on a total of 5125 candidates registered in 9 grades of the Taiwan United Music Grade Test during 2000-2008. Moreover, this study proposes a new data clustering

Cheng-Fa Tsai; Yu-Tai Su; Chiu-Yen Tsai; Chun-Yi Sung

2009-01-01

461

Characterization of Linkage-based Clustering (Margareta Ackerman, Shai Ben-David, David Loker)

... step in this direction by providing such a property-based characterization for the class of linkage-based clustering algorithms. Linkage-based clustering is one of the most commonly used and widely studied clustering

Ackerman, Margareta

462

Complementary ensemble clustering of biomedical data.

The rapidly growing availability of electronic biomedical data has increased the need for innovative data mining methods. Clustering in particular has been an active area of research in many different application areas, with existing clustering algorithms mostly focusing on one modality or representation of the data. Complementary ensemble clustering (CEC) is a recently introduced framework in which K-means is applied to a weighted, linear combination of the co-association matrices obtained from separate ensemble clustering of different data modalities. The strength of CEC is its extraction of information from multiple aspects of the data when forming the final clusters. This study assesses the utility of CEC for biomedical data, which often have multiple data modalities (e.g., text and images). It does so by applying CEC to two distinct biomedical datasets (PubMed images and radiology reports), each with two modalities. Relative to five different clustering approaches based on the K-means algorithm, CEC exhibited equal or better performance in terms of micro-averaged precision and Normalized Mutual Information on both datasets. The reference methods included clustering of single modalities as well as ensemble clustering of separate and merged data modalities. Our experimental results suggest that CEC is equivalent to or more efficient than comparable K-means-based clustering methods using either single or merged data modalities. PMID:23454721

Fodeh, Samah Jamal; Brandt, Cynthia; Luong, Thai Binh; Haddad, Ali; Schultz, Martin; Murphy, Terrence; Krauthammer, Michael

2013-06-01
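The CEC abstract above combines per-modality co-association matrices before a final K-means step. The sketch below covers only the combination step. Each partition in an ensemble is given as a label vector, and the modality weights are assumed to be normalized to sum to 1. The final K-means on the combined matrix is omitted, and the function names are illustrative.

```python
import numpy as np

def coassociation(partitions):
    """Co-association matrix: entry (i, j) is the fraction of partitions
    in the ensemble that put items i and j in the same cluster."""
    P = np.asarray(partitions)  # shape (n_partitions, n_items)
    same = P[:, :, None] == P[:, None, :]  # pairwise co-membership per partition
    return same.mean(axis=0)

def combine(coassoc_mats, weights):
    """Weighted linear combination of per-modality co-association matrices."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalize weights to sum to 1
    return sum(wi * C for wi, C in zip(w, coassoc_mats))
```

For example, a text ensemble and an image ensemble over the same four items would each yield a 4x4 matrix. With weights `[3, 1]`, the text view contributes 75% of every combined entry.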

463

Spatial clusters in a global-dependence model.

Spatial data often possess multiple components, such as local clusters and global clustering, and these effects are not easy to separate. In this study, we propose an approach for cases where global clustering and local clusters exist simultaneously. The proposed method is a two-stage approach that estimates the autocorrelation by an EM algorithm and detects the clusters by a generalized least squares method. It reduces the influence of global dependence on the detection of local clusters and yields fewer false alarms. Simulations and the North Carolina sudden infant death syndrome (SIDS) data are used to illustrate the difference between the proposed method and the spatial scan statistic. PMID:23725886

Wang, Tai-Chi; Yue, Ching-Syang Jack

2013-06-01

464

Optical Detection of Galaxy Clusters

This chapter provides an overview of past and present techniques for optical detection of galaxy clusters. It follows the progression of cluster detection techniques through time, allowing readers to understand the development of the field while explaining the variety of data and methodologies applied. Within each section we describe the datasets and algorithms used, pointing out their strengths and important limitations, especially with respect to the characterizability of the resulting catalogs. The next section provides a historical overview of pre-digital, photographic surveys that formed the basis for most cluster studies until the start of the twenty-first century. Section three describes the hybrid photo-digital surveys that created the largest current cluster catalogs. The fourth section is devoted to fully digital surveys, most specifically the Sloan Digital Sky Survey and the variety of methods used for cluster detection. We also describe smaller surveys, mostly for higher redshift systems. The fifth section gives an overview of the different algorithms used by these surveys, with an eye towards future improvements. The concluding section discusses various tests that remain to be done to fully understand any of the catalogs produced by these surveys, so that they can be compared to simulations.

Roy R. Gal

2006-01-10

465

On learning algorithms and balancing loads in Time Warp

We present, in this paper, an algorithm which integrates flow control and dynamic load balancing in order to improve the performance and stability of Time Warp. The algorithm is intended for use in a distributed memory environment such as a cluster of workstations connected by a high speed switch. Our flow control algorithm makes use of stochastic learning automata and

Myongsu Choe; Carl Tropper

1999-01-01

466

Spectral redemption in clustering sparse networks

Spectral algorithms are classic approaches to clustering and community detection in networks. However, for sparse networks the standard versions of these algorithms are suboptimal, in some cases completely failing to detect communities even when other algorithms such as belief propagation can do so. Here, we present a class of spectral algorithms based on a nonbacktracking walk on the directed edges of the graph. The spectrum of this operator is much better-behaved than that of the adjacency matrix or other commonly used matrices, maintaining a strong separation between the bulk eigenvalues and the eigenvalues relevant to community structure even in the sparse case. We show that our algorithm is optimal for graphs generated by the stochastic block model, detecting communities all of the way down to the theoretical limit. We also show the spectrum of the nonbacktracking operator for some real-world networks, illustrating its advantages over traditional spectral clustering. PMID:24277835

Krzakala, Florent; Moore, Cristopher; Mossel, Elchanan; Neeman, Joe; Sly, Allan; Zdeborova, Lenka; Zhang, Pan

2013-01-01
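The spectral redemption abstract above is built on a nonbacktracking walk on directed edges. The sketch below only constructs that operator (often called the Hashimoto matrix) for a small undirected graph. The paper's community-detection step uses eigenvectors of eigenvalues that separate from the bulk, and it is not shown here. The function name is illustrative.

```python
import numpy as np

def nonbacktracking_matrix(edges):
    """Nonbacktracking operator B of an undirected graph.

    Each undirected edge {u, v} yields directed edges u->v and v->u.
    B[(u->v), (w->x)] = 1 when v == w and x != u, i.e. the walk continues
    from the head of one edge without immediately stepping back.
    """
    directed = [(u, v) for u, v in edges] + [(v, u) for u, v in edges]
    index = {e: k for k, e in enumerate(directed)}
    B = np.zeros((len(directed), len(directed)))
    for (u, v), k in index.items():
        for (w, x), l in index.items():
            if v == w and x != u:
                B[k, l] = 1.0
    return B, directed
```

Each row of B sums to deg(v) - 1 for the head v of its edge. For a d-regular graph the spectral radius is therefore d - 1; the complete graph K4, for instance, gives 2.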

467

Joint clustering of protein interaction networks through Markov random walk.

Biological networks obtained by high-throughput profiling or human curation are typically noisy. For functional module identification, single-network clustering algorithms may not yield accurate and robust results. To borrow information across multiple sources and alleviate such data-quality problems, we propose a new joint network clustering algorithm, ASModel. We construct an integrated network that combines topological information from protein-protein interaction (PPI) datasets with homological information introduced by constituent similarity between proteins across networks. A novel random walk strategy on the integrated network is developed for joint network clustering. An optimization problem is formulated that searches for low-conductance sets defined on the derived transition matrix of the random walk, which fuses both topology and homology information. The optimization problem of joint clustering is solved by a derived spectral clustering algorithm. Network clustering using several state-of-the-art algorithms has been applied both to PPI networks within the same species (two yeast PPI networks and two human PPI networks) and to those from different species (a yeast PPI network and a human PPI network). Experimental results demonstrate that ASModel outperforms the existing single-network clustering algorithms as well as another recent joint clustering algorithm in terms of complex prediction and Gene Ontology (GO) enrichment analysis. PMID:24565376

Wang, Yijie; Qian, Xiaoning

2014-01-01

468

Efficient and Accurate Clustering for Large-Scale Genetic Mapping (Veronika Strnadova)

... this flood of new information presents a fundamental new challenge to genetic mapping, the process clustering algorithms they employ in the genetic marker-clustering stage of automatic genetic map. The fundamental problem of genetic map construct

California at Santa Barbara, University of

469

MODEL-BASED CLUSTERING FOR CLASSIFICATION OF AQUATIC SYSTEMS AND DIAGNOSIS OF ECOLOGICAL STRESS

Clustering approaches were developed using the classification likelihood, the mixture likelihood, and also using a randomization approach with a model index. Using a clustering approach based on the mixture and classification likelihoods, we have developed an algorithm that...

470

NASA Astrophysics Data System (ADS)

This study presents a pattern classification of tropical cyclone (TC) tracks over the western North Pacific (WNP) basin during the typhoon season (June through October) for 1965-2006 (42 years in total) using a fuzzy clustering method. After applying the fuzzy c-means clustering algorithm to TC trajectories interpolated into 20 segments of equal length, we divided all tracks into 7 patterns. The optimal number of fuzzy clusters is determined by several validity measures. The classified TC track patterns show quite different features in recurving latitude, genesis location, and geographical pathway:
- C1: TCs mainly forming in the northeastern part of the WNP and striking Korea and Japan.
- C2: mainly forming in the southwestern part of the WNP, traveling a long pathway, and partly striking Japan.
- C3: mainly striking Taiwan and East China.
- C4: traveling near the east coast of Japan.
- C5: traveling over the distant ocean east of Japan.
- C6: moving straight toward South China and Vietnam.
- C7: forming in the South China Sea.

The atmospheric environment associated with each cluster is physically consistent with its TC track pattern. The straight track pattern is closely linked to a developed anticyclonic circulation to the north of the TC. This implies that the ridge acts as a steering flow that forces TCs northwestward along a more west-oriented track. By contrast, recurving patterns commonly occur under the influence of strong anomalous westerlies over the TC pathway, but each pattern has its own characteristic anomalous circulation over the midlatitudes. Some clusters are closely related to well-known large-scale phenomena. C1 and C2 are highly related to the ENSO phase: TCs in C1 (C2) are more active during La Niña (El Niño). TC activity in C3 is associated with the WNP summer monsoon. TCs in C4 are more (less) vigorous during the easterly (westerly) phase of the stratospheric quasi-biennial oscillation.
This study may be applied to statistical-dynamic long-range forecast models of TC activity as well as to diagnostic studies of TC activity.

Kim, H.; Ho, C.; Kim, J.

2008-12-01
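The tropical cyclone study above clusters tracks after interpolating each one into 20 equal-length segments, so that tracks of different durations become fixed-length vectors. The sketch below resamples a track at equal arc-length spacing. It assumes planar distances in degrees of longitude and latitude, deliberately ignoring great-circle geometry, so it only approximates what the authors did. The function name is hypothetical.

```python
import numpy as np

def resample_track(lon, lat, n_segments=20):
    """Resample a track into n_segments pieces of equal arc length,
    returning n_segments + 1 (lon, lat) points. Distances are planar in
    degrees (a simplifying assumption)."""
    pts = np.column_stack([lon, lat]).astype(float)
    seg = np.linalg.norm(np.diff(pts, axis=0), axis=1)  # length of each original leg
    s = np.concatenate([[0.0], np.cumsum(seg)])          # cumulative arc length
    targets = np.linspace(0.0, s[-1], n_segments + 1)    # equally spaced positions
    new_lon = np.interp(targets, s, pts[:, 0])
    new_lat = np.interp(targets, s, pts[:, 1])
    return np.column_stack([new_lon, new_lat])
```

Flattening the result with `.ravel()` gives a 42-element feature vector per track when `n_segments=20`. Those vectors can then be clustered with fuzzy c-means.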

471

Buried landmine detection using multivariate normal clustering

NASA Astrophysics Data System (ADS)

A Bayesian classification algorithm is presented for discriminating buried land mines from buried and surface clutter in Ground Penetrating Radar (GPR) signals. The algorithm is based on multivariate normal (MVN) clustering, in which feature vectors are used to identify populations (clusters) of mines and clutter objects. The features are extracted from two-dimensional images created from ground penetrating radar scans. MVN clustering is used to determine the number of clusters in the data and to create probability density models for target and clutter populations, producing the MVN clustering classifier (MVNCC). The Bayesian Information Criterion (BIC) is used to evaluate each model to determine the number of clusters in the data. An extension of the MVNCC allows the model to adapt to local clutter distributions by treating each of the MVN cluster components as a Poisson process and adaptively estimating the intensity parameters. The algorithm is developed using data collected by the Mine Hunter/Killer Close-In Detector (MH/K CID) at prepared mine lanes. The Mine Hunter/Killer is a prototype mine detecting and neutralizing vehicle developed for the U.S. Army to clear roads of anti-tank mines.

Duston, Brian M.

2001-10-01

472

Text Clustering for Peer-to-Peer Networks with Probabilistic Guarantees

Text clustering is an established technique for improving quality in information retrieval, for both centralized and distributed environments. However, for highly distributed environments, such as peer-to-peer networks, current clustering algorithms fail to scale. Our algorithm for peer-to-peer clustering achieves high scalability by using a probabilistic approach for assigning documents to clusters. It enables a peer to compare

Odysseas Papapetrou; Wolf Siberski; Norbert Fuhr

2010-01-01

473

Fast and effective text mining using linear-time document clustering

Clustering is a powerful technique for large-scale topic discovery from text. It involves two phases: first, feature extraction maps each document or record to a point in high-dimensional space, then clustering algorithms automatically group the points into a hierarchy of clusters. We describe an unsupervised, near-linear time text clustering system that offers a number of algorithm choices for each phase.
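The two phases described above can be illustrated with a small pipeline: TF-IDF feature extraction into sparse vectors, followed by a clustering pass whose cost per iteration grows linearly with the number of documents. This is a hedged sketch, not the authors' system: spherical k-means is swapped in as the clustering algorithm, it produces a flat partition rather than a hierarchy, and whitespace tokenisation, seeding with the first k documents and all function names are illustrative assumptions.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Phase 1: map each document to a sparse, L2-normalised TF-IDF vector (term -> weight)."""
    tokenised = [doc.lower().split() for doc in docs]
    df = Counter(term for toks in tokenised for term in set(toks))
    n = len(docs)
    vectors = []
    for toks in tokenised:
        tf = Counter(toks)
        vec = {t: c * math.log(n / df[t]) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors

def cosine(a, b):
    """Dot product of two unit-length sparse vectors, i.e. their cosine similarity."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter vector
    return sum(w * b.get(t, 0.0) for t, w in a.items())

def spherical_kmeans(vectors, k, iters=10):
    """Phase 2: flat k-means on the unit sphere; each pass touches every document once per centroid."""
    centroids = [dict(v) for v in vectors[:k]]  # deterministic seeding: first k documents
    labels = [0] * len(vectors)
    for _ in range(iters):
        labels = [max(range(k), key=lambda j: cosine(v, centroids[j])) for v in vectors]
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if not members:
                continue  # keep the previous centroid for an empty cluster
            summed = Counter()
            for v in members:
                summed.update(v)  # adds the weights term by term
            norm = math.sqrt(sum(w * w for w in summed.values())) or 1.0
            centroids[j] = {t: w / norm for t, w in summed.items()}
    return labels
```

A hierarchy could be obtained by applying the same flat step recursively within each cluster, which is one way to keep the overall cost close to linear in the number of documents.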

Bjornar Larsen; Chinatsu Aone

1999-01-01

474

Clustering memes in social media streams

The problem of clustering content in social media has pervasive applications, including the identification of discussion topics, event detection, and content recommendation. Here we describe a streaming framework for online detection and clustering of memes in social media, specifically Twitter. A pre-clustering procedure, namely protomeme detection, first isolates atomic tokens of information carried by the tweets. Protomemes are thereafter aggregated, based on multiple similarity measures, to obtain memes as cohesive groups of tweets reflecting actual concepts or topics of discussion. The clustering algorithm takes into account various dimensions of the data and metadata, including natural language, the social network, and the patterns of information diffusion. As a result, our system can build clusters of semantically, structurally, and topically related tweets. The clustering process is based on a variant of Online K-means that incorporates a memory mechanism, used to "forget" old memes and replace them o...
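The abstract describes an Online K-means variant with a memory mechanism that "forgets" old memes, but does not spell out the mechanism. The sketch below is therefore a hypothetical illustration: each item is a single dense feature vector (the paper's protomemes and multiple similarity measures are not modelled), an item joins its most similar cluster if the cosine similarity reaches a threshold, and when all k slots are full a new item evicts the least recently updated cluster. The class name, threshold and eviction rule are assumptions.

```python
import math

def _cos(a, b):
    """Cosine similarity of two dense vectors (0 if either is the zero vector)."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

class ForgetfulOnlineKMeans:
    """Online k-means over a stream; when a new item matches no cluster and all k slots
    are full, the least recently updated cluster is forgotten and replaced."""

    def __init__(self, k, threshold):
        self.k = k
        self.threshold = threshold  # minimum cosine similarity needed to join a cluster
        self.clusters = []          # each: {"centroid", "size", "last_seen", "id"}
        self._next_id = 0

    def add(self, vec, t):
        """Assign vec (arriving at time t) to a cluster and return that cluster's id."""
        best, best_sim = None, -1.0
        for c in self.clusters:
            sim = _cos(vec, c["centroid"])
            if sim > best_sim:
                best, best_sim = c, sim
        if best is not None and best_sim >= self.threshold:
            # incremental mean update: centroid <- centroid + (x - centroid) / n
            best["size"] += 1
            best["centroid"] = [ci + (xi - ci) / best["size"]
                                for ci, xi in zip(best["centroid"], vec)]
            best["last_seen"] = t
            return best["id"]
        new = {"centroid": list(vec), "size": 1, "last_seen": t, "id": self._next_id}
        self._next_id += 1
        if len(self.clusters) >= self.k:
            # memory mechanism (assumed): drop the stalest cluster to make room
            stale = min(self.clusters, key=lambda c: c["last_seen"])
            self.clusters.remove(stale)
        self.clusters.append(new)
        return new["id"]
```

Evicting by recency rather than by size lets a burst of new discussion displace a large but inactive meme, which matches the stated goal of replacing old memes while keeping memory bounded.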

JafariAsbagh, Mohsen; Varol, Onur; Menczer, Filippo; Flammini, Alessandro

2014-01-01

475

This paper presents a text clustering system based on a k-means-type subspace clustering algorithm for clustering large, high-dimensional, sparse text data. The algorithm adds a new step to the k-means clustering process that automatically calculates the weights of keywords in each cluster, so that the important words of a cluster can be identified by their weight values. To aid understanding and interpretation of the clustering results, a few keywords that best represent the semantic topic are extracted from each cluster. Two methods are used to extract the representative words. Candidate words are first selected according to the weights calculated by the new algorithm. The candidates are then fed to WordNet to identify the set of nouns and to consolidate synonyms and hyponyms. Experimental results show that the clustering algorithm is superior to other subspace clustering algorithms, such as PROCLUS and HARP, and to k-means-type algorithms such as Bisecting-KMeans. Furthermore, the word extraction method is effective in selecting words that represent the topics of the clusters.
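The keyword-weighting step can be sketched as below. The abstract does not give the weighting formula, so this assumes the entropy-weighting rule of entropy-weighted k-means, in which the weight of term j in a cluster is proportional to exp(-D_j / γ), where D_j is the within-cluster dispersion of that term. The parameter γ, the restriction of candidates to terms that actually occur in the cluster, and the function names are hypothetical; the WordNet consolidation step is not shown.

```python
import math

def keyword_weights(members, gamma=1.0):
    """Assumed entropy-weighting rule: w_j = exp(-D_j / gamma) / sum_t exp(-D_t / gamma),
    where D_j is the sum of squared deviations of term j from the cluster centroid."""
    n, dim = len(members), len(members[0])
    centroid = [sum(x[j] for x in members) / n for j in range(dim)]
    disp = [sum((x[j] - centroid[j]) ** 2 for x in members) for j in range(dim)]
    d0 = min(disp)  # shifting by the minimum leaves the normalised weights unchanged
    raw = [math.exp(-(d - d0) / gamma) for d in disp]
    total = sum(raw)
    return [r / total for r in raw]

def top_keywords(members, vocab, n_words=3, gamma=1.0):
    """Candidate representative words: highest-weighted terms that occur in the cluster."""
    w = keyword_weights(members, gamma)
    n = len(members)
    # terms absent from every member also have zero dispersion, so exclude them explicitly
    present = [j for j in range(len(vocab)) if sum(x[j] for x in members) / n > 0]
    ranked = sorted(present, key=lambda j: w[j], reverse=True)
    return [vocab[j] for j in ranked[:n_words]]
```

Here each member is a term-frequency vector over `vocab`; a term that appears with a consistent frequency across the cluster's documents has low dispersion and therefore a high weight.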

Liping Jing; Michael K. Ng; Xinhua Yang; Joshua Zhexue Huang

476

We report results of recent high-resolution hydrodynamic simulations of the formation and evolution of X-ray clusters of galaxies carried out within a cosmological framework. We employ the highly accurate piecewise parabolic method (PPM) on fixed and adaptive meshes, which allows us to resolve the flow field in the intracluster gas. The excellent shock capturing and low numerical viscosity of PPM represent a substantial advance over previous