For comprehensive and current results, perform a real-time search at Science.gov.

1

An Improved Fuzzy c-Means Clustering Algorithm Based on Shadowed Sets and PSO

To organize a wide variety of data sets automatically and obtain accurate classifications, this paper presents a modified fuzzy c-means algorithm (SP-FCM) based on particle swarm optimization (PSO) and shadowed sets to perform feature clustering. SP-FCM uses the global search property of PSO to address the premature convergence of conventional fuzzy clustering. It uses the vagueness-balance property of shadowed sets to handle overlap among clusters, and it models uncertainty in class boundaries. The method uses the Xie-Beni index as its cluster validity measure and automatically finds the optimal cluster number within a specified range, producing partitions with compact and well-separated clusters. Experiments show that the proposed approach significantly improves clustering performance. PMID:25477953

Zhang, Jian; Shen, Ling

2014-01-01
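The conventional FCM iteration and the Xie-Beni-based choice of cluster number mentioned in the SP-FCM abstract can be sketched in Python with NumPy. This is only the plain FCM baseline plus a scan over c. It does not implement SP-FCM's PSO search or shadowed sets. The function names (fcm, xie_beni, best_c), the random initialization, and the tolerance are illustrative assumptions.

```python
import numpy as np

def fcm(X, c, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Conventional fuzzy c-means. X: (n, d). Returns centers V (c, d), memberships U (c, n)."""
    rng = np.random.default_rng(seed)
    U = rng.random((c, X.shape[0]))
    U /= U.sum(axis=0, keepdims=True)                # memberships of each point sum to 1
    for _ in range(max_iter):
        Um = U ** m
        V = (Um @ X) / Um.sum(axis=1, keepdims=True)  # membership-weighted means
        d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)
        d2 = np.fmax(d2, 1e-12)                       # avoid division by zero at a center
        inv = d2 ** (-1.0 / (m - 1))
        U_new = inv / inv.sum(axis=0, keepdims=True)  # u_ij proportional to d_ij^(-2/(m-1))
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return V, U

def xie_beni(X, V, U, m=2.0):
    """Compactness / (n * minimum squared center separation); lower is better."""
    d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)
    compactness = ((U ** m) * d2).sum()
    c = V.shape[0]
    separation = min(((V[i] - V[k]) ** 2).sum()
                     for i in range(c) for k in range(c) if i != k)
    return compactness / (X.shape[0] * separation)

def best_c(X, c_range, m=2.0):
    """Run FCM for each c in c_range and keep the c with the smallest Xie-Beni index."""
    scores = {c: xie_beni(X, *fcm(X, c, m), m) for c in c_range}
    return min(scores, key=scores.get), scores
```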

2

NASA Astrophysics Data System (ADS)

Computer forensics is the technology of applying computer technology to access, investigate and analyze the evidence of computer crime. It mainly includes determining and obtaining digital evidence, analyzing and extracting data, and filing and submitting results. Data analysis is the key link in computer forensics. Because real data are complex and fuzzy in character, evidence analysis often fails to obtain the desired results. This paper applies a fuzzy c-means clustering algorithm based on particle swarm optimization (FCMP) to computer forensics and obtains more satisfactory results.

Wang, Deguang; Han, Baochang; Huang, Ming
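The FCMP idea in the computer-forensics abstract uses PSO's global search to place FCM centers. It can be sketched as follows. This is a generic sketch under assumptions, not the authors' FCMP:

- each particle encodes all c centers;
- fitness is the FCM objective J_m, evaluated with the memberships that are optimal for those centers;
- the inertia and acceleration constants (w, c1, c2) are common textbook choices, not values from the paper.

```python
import numpy as np

def fcm_objective(X, V, m=2.0):
    """J_m for centers V (c, d), using the memberships that are optimal for those centers."""
    d2 = np.fmax(((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2), 1e-12)
    inv = d2 ** (-1.0 / (m - 1))
    U = inv / inv.sum(axis=0, keepdims=True)
    return float(((U ** m) * d2).sum())

def pso_fcm(X, c, m=2.0, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO over sets of c centers; returns the best centers and their J_m."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    lo, hi = X.min(axis=0), X.max(axis=0)
    pos = rng.uniform(lo, hi, size=(n_particles, c, d))  # one candidate center set per particle
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_f = np.array([fcm_objective(X, p, m) for p in pos])
    gbest = pbest[pbest_f.argmin()].copy()
    for _ in range(iters):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        # velocity update: inertia + pull toward personal best + pull toward global best
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = pos + vel
        f = np.array([fcm_objective(X, p, m) for p in pos])
        better = f < pbest_f
        pbest[better] = pos[better]
        pbest_f[better] = f[better]
        gbest = pbest[pbest_f.argmin()].copy()
    return gbest, float(pbest_f.min())
```

A hybrid could then hand the returned centers to a few ordinary FCM iterations for local refinement.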

3

A Genetic Algorithm That Exchanges Neighboring Centers for Fuzzy c-Means Clustering

ERIC Educational Resources Information Center

Clustering algorithms are widely used in pattern recognition and data mining applications. Due to their computational efficiency, partitional clustering algorithms are better suited for applications with large datasets than hierarchical clustering algorithms. K-means is among the most popular partitional clustering algorithms, but has a major…

Chahine, Firas Safwan

2012-01-01

4

Cluster Validity for the Fuzzy c-Means Clustering Algorithm.

The uniform data functional is a function that assigns a number to the output of the fuzzy c-means (Fc-M) or fuzzy isodata algorithm. This number measures the quality or validity of the clustering produced by the algorithm. For a preselected number of clusters c, the Fc-M algorithm produces c vectors in the space in which the data lie, called cluster centers, which represent points about which the data are concentrated. For each data point, it also produces c membership values: numbers between zero and one that measure the similarity of the data point to each of the cluster centers. It is these membership values that indicate how the point is classified. They also indicate how well the point has been classified. Values close to one indicate that the point is close to a particular center, but uniformly low memberships indicate that the point has not been classified clearly. The uniform data functional (UDF) combines the memberships to indicate how well the data have been classified, and it is computed as follows. For each data point, compute the ratio of its smallest membership to its largest. Then compute the probability that one could obtain a smaller ratio (indicating better classification) from a clustering of a standard data set in which there is no cluster structure. These probabilities are then averaged over the data set to obtain the value of the UDF. PMID:21869049

Windham, M P

1982-04-01
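The UDF computation described in this abstract can be sketched in plain Python. The abstract compares each point's min/max membership ratio with the distribution of ratios from clustering a standard data set that has no cluster structure. This sketch assumes that distribution is supplied as an empirical sample (reference_ratios), for example from running FCM on uniformly distributed data. That is an assumption, not necessarily Windham's exact reference distribution.

```python
from bisect import bisect_left

def membership_ratios(U):
    """U: list of per-point membership vectors (each of length c). Returns min/max per point."""
    return [min(u) / max(u) for u in U]

def udf(U, reference_ratios):
    """Average over points of the probability that an unstructured reference clustering
    yields a strictly smaller ratio; small values indicate clearly classified data."""
    ref = sorted(reference_ratios)
    # bisect_left counts the reference ratios strictly smaller than r
    probs = [bisect_left(ref, r) / len(ref) for r in membership_ratios(U)]
    return sum(probs) / len(probs)
```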

5

On cluster validity for the fuzzy c-means model

Many functionals have been proposed for validation of partitions of object data produced by the fuzzy c-means (FCM) clustering algorithm. We examine the role that a subtle but important parameter, the weighting exponent m of the FCM model, plays in determining the validity of FCM partitions. The functionals considered are the partition coefficient and entropy indexes of Bezdek, the Xie-Beni (1991), and extended

N. R. Pal; J. C. Bezdek

1995-01-01
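Bezdek's partition coefficient (PC) and partition entropy (PE) are two of the functionals examined in this paper. Both are simple functions of the membership matrix. A minimal plain-Python sketch follows; the natural logarithm is assumed for the entropy.

The sketch also hints at why m matters. As m grows, FCM memberships move toward 1/c. That drives PC toward 1/c and PE toward log c, whatever the structure of the data.

```python
import math

def partition_coefficient(U):
    """U: list of per-point membership vectors. Ranges from 1/c (maximally fuzzy) to 1 (crisp)."""
    return sum(u * u for point in U for u in point) / len(U)

def partition_entropy(U):
    """Ranges from 0 (crisp) to log(c) (maximally fuzzy); 0 * log(0) is taken as 0."""
    return -sum(u * math.log(u) for point in U for u in point if u > 0) / len(U)
```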

6

NASA Astrophysics Data System (ADS)

Soil moisture is a key variable of the hydrological cycle. For example, it controls the partitioning of rainfall into runoff and infiltration components and modulates physical, chemical and biological processes within the soil. For a better understanding of these processes, knowledge about the spatio-temporal distribution of soil moisture is indispensable. At scales from the field to the small catchment, with survey areas of up to a few square kilometres, numerous new and innovative ground-based and remote sensing technologies are available. These technologies have great potential to provide temporal information about soil moisture patterns. The aim of this work is to design an optimal soil moisture monitoring program for a low-mountain catchment in central Germany. In a first step, the fuzzy c-means clustering technique (Paasche et al., 2006) was used to identify structure-relevant patterns in a set of different terrain attributes derived from a DEM. Based on these patterns, optimal measurement locations were identified for in-situ soil moisture measurements. To capture different wetting and drying states in the catchment, several TDR measurement campaigns were conducted from April to October 2013. The TDR measurements have been integrated with the structure-relevant patterns obtained by the fuzzy cluster analysis to predict soil moisture regionally. In this study, we outline the conceptual framework of this integrative approach and present first results from field measurements. The results of the project are expected to improve the monitoring and understanding of small catchment-scale hydrological processes. They are also expected to contribute to a better representation of soil moisture dynamics in physically based hydrological models operating at the field to small-catchment scale. Reference: Paasche, H., J. Tronicke, K. Holliger, A.G. Green, and H. Maurer (2006): Integration of diverse physical-property models: Subsurface zonation and petrophysical parameter estimation based on fuzzy c-means cluster analyses. Geophysics 71(3), H33-H44, doi:10.1190/1.2192927.

Schröter, Ingmar; Paasche, Hendrik; Dietrich, Peter; Wollschläger, Ute

2014-05-01

7

New Inter-Cluster Proximity Index for Fuzzy c-Means Clustering

NASA Astrophysics Data System (ADS)

This letter presents a new inter-cluster proximity index for fuzzy partitions obtained from the fuzzy c-means algorithm. It is defined as the average proximity of all possible pairs of clusters. The proximity of each pair of clusters is determined by the overlap and the separation of the two clusters. The overlap is quantified using concepts from fuzzy rough set theory, and the separation is quantified by computing the distance between the cluster centroids. Experimental results indicate the effectiveness of the proposed index.

Li, Fan; Dai, Shijin; Liu, Qihe; Yang, Guowei

8

Automatic histogram-based fuzzy C-means clustering for remote sensing imagery

NASA Astrophysics Data System (ADS)

Fuzzy C-means (FCM) clustering has been widely used in analyzing and understanding remote sensing images. However, the conventional FCM algorithm is sensitive to initialization, and it requires estimates from expert users to determine the number of clusters. To overcome these limitations, this paper presents an automatic histogram-based fuzzy C-means (AHFCM) algorithm. The proposed algorithm has two primary steps:
(1) cluster each band of a multispectral image by calculating the slope at each point of the histogram in two directions, then execute the FCM clustering algorithm based on specific rules;
(2) automatically fuse the labeled images to initialize FCM and determine its number of clusters, for automatic multispectral image clustering.
The performance of the proposed algorithm is first tested on clustering a very high resolution aerial image for various numbers of clusters. It is then tested, for a constant number of clusters, on two very high resolution aerial images, a high resolution Worldview2 satellite image, a Landsat8 satellite image and an EO-1 hyperspectral image. The new method is compared with the well-known FCM, K-means, fast global FCM (FGFCM) and kernelized fast global FCM (KFGFCM) clustering algorithms. Its superiority is demonstrated both quantitatively, by calculating the DB, XB and SC indices, and qualitatively, by visualizing the cluster results.

Ghaffarian, Saman; Ghaffarian, Salar

2014-11-01
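Step 1 of AHFCM examines the slope at each histogram point in two directions. This sketch assumes one plausible reading, not the paper's actual rules: a bin is a peak when the slope arriving from the left is positive and the slope leaving to the right is non-positive. The number of peaks then suggests a cluster count for that band, and the peak positions can seed the FCM centers. A plain-Python sketch for 8-bit band values:

```python
def histogram(values, n_bins=256):
    """Count integer band values in [0, n_bins)."""
    h = [0] * n_bins
    for v in values:
        h[v] += 1
    return h

def histogram_peaks(h, min_count=1):
    """Bins whose left-hand slope is positive and right-hand slope is non-positive.
    Values outside the histogram range are treated as zero counts."""
    peaks = []
    for i in range(len(h)):
        left = h[i] - h[i - 1] if i > 0 else h[i]
        right = h[i + 1] - h[i] if i + 1 < len(h) else -h[i]
        if left > 0 and right <= 0 and h[i] >= min_count:
            peaks.append(i)
    return peaks
```

Requiring a strictly positive slope on the left only means a flat-topped peak is reported once, at its first bin.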

9

Particle swarm optimization of kernel-based fuzzy c-means for hyperspectral data clustering

NASA Astrophysics Data System (ADS)

Hyperspectral data classification using supervised approaches in general, and statistical algorithms in particular, needs training data of high quantity and quality. These requirements, together with the high dimensionality of hyperspectral data, are the most important obstacles to using supervised algorithms. As a solution, unsupervised (clustering) algorithms can be considered. One emerging clustering algorithm that can be used for this purpose is kernel-based fuzzy c-means (KFCM), which was developed by kernelizing the FCM algorithm. Nevertheless, several parameters affect the efficiency of KFCM clustering of hyperspectral data: the kernel parameters, the initial cluster centers, and the number of spectral bands. To address these problems, two new algorithms are developed in which the particle swarm optimization method is employed to optimize KFCM with respect to these parameters. The first algorithm optimizes KFCM with respect to the kernel parameters and initial cluster centers. The second also selects the optimum discriminative subset of bands. Evaluations of the experimental results show that the proposed algorithms are more efficient than the standard k-means and FCM algorithms for clustering hyperspectral remotely sensed data.

Niazmardi, Saeid; Naeini, Amin Alizadeh; Homayouni, Saeid; Safari, Abdolreza; Samadzadegan, Farhad

2012-01-01
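A common form of kernel-based FCM uses a Gaussian kernel K(x, v) = exp(-||x - v||^2 / sigma^2). It keeps the centers in input space and replaces the squared distance with 1 - K(x, v). The NumPy sketch below assumes that form. The kernel width sigma and the initial centers V0 are exactly the quantities this paper tunes with PSO, so here the caller simply passes them in. Band selection is not shown.

```python
import numpy as np

def kfcm(X, V0, sigma=1.0, m=2.0, max_iter=200, tol=1e-6):
    """Gaussian-kernel fuzzy c-means. X: (n, d); V0: (c, d) initial centers.
    Returns centers V (c, d) and memberships U (c, n)."""
    V = np.array(V0, dtype=float)
    for _ in range(max_iter):
        d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)
        K = np.exp(-d2 / sigma ** 2)                  # K(x_j, v_i)
        dist = np.fmax(1.0 - K, 1e-12)                # half of ||phi(x_j) - phi(v_i)||^2
        inv = dist ** (-1.0 / (m - 1))
        U = inv / inv.sum(axis=0, keepdims=True)
        W = (U ** m) * K                              # kernel-weighted memberships
        V_new = (W @ X) / W.sum(axis=1, keepdims=True)
        shift = np.abs(V_new - V).max()
        V = V_new
        if shift < tol:
            break
    return V, U
```

The kernel weight K makes a center largely ignore distant points. If two initial centers start in the same group, they may never leave it, which is one reason to search for good initial centers with PSO.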

10

Discretization of continuous numerical attributes is one of the important research problems in the preprocessing of celestial spectrum data. For the characteristic lines of celestial spectra, a soft discretization algorithm based on improved fuzzy C-means clustering is presented. First, candidate fuzzy clustering centers for a characteristic line are chosen using the density values of the sample data, which improves the algorithm's noise resistance. Second, the parameters of the fuzzy clustering are dynamically adjusted, with the compatibility of the decision table as the criterion, to achieve the optimal discretization of the characteristic line. Finally, experiments on three SDSS celestial spectrum data sets (high-redshift quasars, late-type stars and quasars) validate that the algorithm achieves a higher correct recognition rate. PMID:22827108

Zhang, Ji-fu; Li, Xin; Yang, Hai-feng

2012-05-01
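The first step of this method picks candidate FCM centers from the density of the sample data. It can be sketched as follows. Two details are assumptions for illustration, since the abstract does not give the paper's exact definitions:

- density is the count of samples within a radius;
- candidate centers must be more than one radius apart.

Isolated noisy values get low density and are therefore unlikely to become centers.

```python
def local_density(values, radius):
    """Number of samples (including the sample itself) within `radius` of each sample."""
    return [sum(1 for y in values if abs(x - y) <= radius) for x in values]

def candidate_centers(values, radius, k):
    """Pick up to k high-density samples that are more than `radius` apart."""
    dens = local_density(values, radius)
    order = sorted(range(len(values)), key=lambda i: -dens[i])  # stable: ties keep input order
    centers = []
    for i in order:
        if all(abs(values[i] - c) > radius for c in centers):
            centers.append(values[i])
            if len(centers) == k:
                break
    return centers
```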

11

A new liver segmentation algorithm is proposed. First, a threshold method was used to remove the ribs and spine from the initial image. The fuzzy C-means clustering algorithm and morphological reconstruction filtering were then used to segment the initial liver CT image. Then a multilayer perceptron neural network was trained on the segmentation result of the initial image with the back-propagation

Yuqian Zhao; Yunlong Zan; Xiaofang Wang; Guiyuan Li

2010-01-01

12

Mandarin Digital Speech Recognition Based on a Chaotic Neural Network and Fuzzy C-means Clustering

… degrees based on fuzzy set theory. Based on the KIII model, Mandarin digital speech is recognized … card number and so on. Mandarin digit pronunciations are all monosyllables and include some ambiguous

Freeman, Walter J.

13

Automatic online spike sorting with singular value decomposition and fuzzy C-mean clustering

Background: Understanding how neurons contribute to perception, motor functions and cognition requires the reliable detection of the spiking activity of individual neurons under a number of different experimental conditions. An important problem in computational neuroscience is thus to develop algorithms that automatically detect and sort the spiking activity of individual neurons from extracellular recordings. While many spike-sorting algorithms exist, accurate and fast online sorting remains a challenging problem. Results: Here we present a novel software tool, called FSPS (Fuzzy SPike Sorting), designed to optimize (i) fast and accurate detection, (ii) offline sorting and (iii) online classification of neuronal spikes with very limited or no human intervention. The method combines Singular Value Decomposition for fast and highly accurate pre-processing of spike shapes, unsupervised fuzzy C-means clustering, high-resolution alignment of extracted spike waveforms, optimal selection of the number of features to retain, automatic identification of the number of clusters, and quantitative quality assessment of the resulting clusters independent of their size. After being trained on a short testing data stream, the method can reliably perform supervised online classification and monitoring of single-neuron activity. The generalized procedure has been implemented in our FSPS spike sorting software (available free for non-commercial academic applications at the address: http://www.spikesorting.com) using LabVIEW (National Instruments, USA). We evaluated the performance of our algorithm both on benchmark simulated datasets with different levels of background noise and on real extracellular recordings from the premotor cortex of macaque monkeys. These tests showed excellent accuracy in discriminating low-amplitude and overlapping spikes under strong background noise. The performance of our method is competitive with that of other robust spike sorting algorithms. Conclusions: This new software provides neuroscience laboratories with a new tool for fast and robust online classification of single-neuron activity. This feature could become crucial in situations where online spike detection from multiple electrodes is paramount, such as in human clinical recordings or in brain-computer interfaces. PMID:22871125

2012-01-01
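The SVD pre-processing step in FSPS can be illustrated with a generic projection of spike waveforms onto their leading singular vectors. A fuzzy c-means step would then cluster the resulting low-dimensional features. This NumPy sketch assumes the waveforms are already detected and aligned. The number of retained features k is chosen by the caller rather than by FSPS's automatic selection.

```python
import numpy as np

def svd_features(waveforms, k):
    """waveforms: (n_spikes, n_samples). Returns the (n_spikes, k) projections onto the
    top-k right singular vectors of the mean-centred waveform matrix, plus all singular values."""
    W = waveforms - waveforms.mean(axis=0)
    _, s, Vt = np.linalg.svd(W, full_matrices=False)
    return W @ Vt[:k].T, s
```

The singular values s indicate how much waveform variance each feature carries, which is one way a caller might choose k.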

14

Self-organization and clustering algorithms

NASA Technical Reports Server (NTRS)

Kohonen's feature map approach to clustering is often likened to the k- or c-means clustering algorithms. Here, the author identifies some similarities and differences between the hard and fuzzy c-means (HCM/FCM) or ISODATA algorithms and Kohonen's self-organizing approach. The author concludes that some differences are significant, but at the same time there may be some important unknown relationships between the two methodologies. Several avenues of research are proposed.

Bezdek, James C.

1991-01-01

15

Segmentation of pomegranate MR images using spatial fuzzy c-means (SFCM) algorithm

NASA Astrophysics Data System (ADS)

Segmentation is one of the fundamental issues of image processing and machine vision, and it plays a prominent role in a variety of image processing applications. This paper explores an important application of image processing: the segmentation of pomegranate MR images. Pomegranate is a fruit with pharmacological properties such as antiviral and anticancer activity. A high-quality product is a critical factor in its marketing, and the internal quality of the product is very important in the sorting process. Qualitative features cannot be determined manually. Therefore, the internal structures of the fruit need to be segmented as accurately as possible in the presence of noise. The fuzzy c-means (FCM) algorithm is noise-sensitive and misclassifies noisy pixels. As a solution, this paper proposes the spatial FCM (SFCM) algorithm for segmenting pomegranate MR images. The algorithm incorporates spatial neighborhood information into FCM and modifies the fuzzy membership function of each class. Segmentation was tested on the original pomegranate MR images and on images corrupted by Gaussian, salt-and-pepper and speckle noise. The results show that the SFCM algorithm performs significantly better than the FCM algorithm. In addition, after several steps of qualitative and quantitative analysis, we concluded that SFCM with a 5×5 window performs better than with the other window sizes.

Moradi, Ghobad; Shamsi, Mousa; Sedaaghi, M. H.; Alsharif, M. R.

2011-10-01
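The spatial modification in this paper can be sketched with one widely used formulation. It is assumed here because the abstract does not give the paper's exact equations:

1. For each pixel, sum each class's memberships over a window x window neighbourhood to get h.
2. Re-weight the FCM membership as u^p h^q.
3. Renormalise across classes.

The default window of 5 mirrors the 5×5 size the authors found best. The exponents p and q are illustrative.

```python
import numpy as np

def spatial_membership(U, window=5, p=1, q=1):
    """U: (c, H, W) FCM memberships per pixel. Returns memberships re-weighted by the
    sum of each class's memberships over a window x window neighbourhood."""
    r = window // 2
    c, H, W = U.shape
    padded = np.pad(U, ((0, 0), (r, r), (r, r)))  # zero padding: border pixels have fewer neighbours
    h = np.zeros_like(U)
    for dy in range(window):
        for dx in range(window):
            h += padded[:, dy:dy + H, dx:dx + W]  # neighbourhood sum, including the pixel itself
    num = (U ** p) * (h ** q)
    return num / num.sum(axis=0, keepdims=True)
```

In a full SFCM loop, the re-weighted memberships would replace U before the next center update.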

16

Carotid artery image segmentation using modified spatial fuzzy c-means and ensemble clustering.

Disease diagnosis based on ultrasound imaging is popular because of its non-invasive nature. However, ultrasound imaging systems produce low-quality images due to speckle noise and wave interference. Because of this shortcoming, experts need considerable effort to diagnose a disease from carotid artery ultrasound images. Image segmentation is one technique that can help diagnose disease efficiently from these images. Most of the pixels in an image are highly correlated, so considering the spatial information of surrounding pixels during segmentation may further improve the results. When data are highly correlated, one pixel may belong to more than one cluster with different degrees of membership. In this paper, we present an image segmentation technique, namely improved spatial fuzzy c-means, and an ensemble clustering approach for carotid artery ultrasound images to identify the presence of plaque. Spatial, wavelet and gray level co-occurrence matrix (GLCM) features are extracted from carotid artery ultrasound images. Redundant and less important features are removed from the feature set using a genetic search process. Finally, segmentation is performed on the optimal (reduced) feature set. Ensemble clustering with the reduced feature set performs best in terms of both segmentation time and clustering accuracy. Intima-media thickness (IMT) is measured from the images segmented by the proposed approach. Based on the measured IMT values, a multi-layer back-propagation neural network (MLBPNN) is used to classify the images as normal or abnormal. Experimental results show the learning capability of the MLBPNN classifier and validate the effectiveness of the proposed technique. The proposed approach to segmentation and classification of carotid artery ultrasound images appears to be very useful for detecting plaque in the carotid artery. PMID:22981822

Hassan, Mehdi; Chaudhry, Asmatullah; Khan, Asifullah; Kim, Jin Young

2012-12-01

17

Alpha-Cut Implemented Fuzzy Clustering Algorithms and Switching Regressions

In the fuzzy c-means (FCM) clustering algorithm, almost none of the data points have a membership value of 1. Moreover, noise and outliers may cause difficulties in obtaining appropriate clustering results from the FCM algorithm. The embedding of FCM into switching regressions, called fuzzy c-regressions (FCRs), still has the same drawbacks as FCM. In this paper, we propose

Miin-shen Yang; Kuo-lung Wu; June-nan Hsieh; Jian Yu

2008-01-01
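The abstract is cut off before the method. The following is therefore only an assumption about what an alpha-cut implementation of FCM does with memberships: any point whose largest membership reaches alpha is hardened to a crisp assignment. This directly addresses the observation that almost no points in FCM receive a membership of 1. Choosing alpha above 0.5 guarantees that at most one class can qualify, because memberships sum to 1. A plain-Python sketch:

```python
def alpha_cut(U, alpha):
    """U: list of per-point membership vectors. Points whose largest membership is at least
    alpha become one-hot (crisp); all other points keep their fuzzy memberships."""
    out = []
    for u in U:
        k = max(range(len(u)), key=u.__getitem__)  # index of the largest membership
        if u[k] >= alpha:
            out.append([1.0 if i == k else 0.0 for i in range(len(u))])
        else:
            out.append(list(u))
    return out
```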

18

Effect of co-operative fuzzy c-means clustering on estimates of three parameters AVA inversion

NASA Astrophysics Data System (ADS)

We use the amplitude variation with angle (AVA) methodology on a synthetic gas hydrate model to determine how much model fitness to a true model varies. We do this using co-operative fuzzy c-means clustering constrained to a rock physics model. When a homogeneous starting model is used with only a traditional least-squares optimization scheme for inversion, the variance of the parameters is found to be comparatively high. In this co-operative methodology, the output of the least-squares inversion is fed as input to the fuzzy scheme. Tests of co-operative inversion used fuzzy c-means with the damped least-squares technique, together with constraints derived from an empirical relationship based on a rock-properties model. Compared with the standard inversion alone, these tests show improved stability, model fitness and variance for all three parameters.

Nair, Rajesh R.; Kandpal, Suresh Ch

2010-04-01

19

NASA Astrophysics Data System (ADS)

This paper deals with the application of the ant colony algorithm (AC) to a seismic dataset from the Dezful Embayment in southwest Iran. The objective of the approach is to generate an accurate representation of faults and discontinuities to assist in pertinent matters such as well planning and field optimization. The AC analyzed all spatial discontinuities in the seismic attributes from which features were extracted. True fault information in the attributes was detected by many artificial ants, whereas noise and the remains of the reflectors were eliminated. The fracture enhancement procedure was conducted in three steps on seismic data of the area. In the first step, several attributes such as chaos, variance/coherence and dip deviation were taken into account; the resulting maps indicate high-resolution contrast for the variance attribute. Subsequently, the spatial discontinuities were enhanced. Finally, the noise and the remains of non-faulting events were eliminated by simulating the behavior of ant colonies. After stepwise attribute optimization, focusing on chaos and variance in particular, an attribute fusion was generated and used in the ant colony algorithm. The resulting map displayed the highest performance in feature detection along the main structural feature trend, confined to a NW–SE direction. Thus, the optimized attribute fusion might be used with greater confidence to map the structural feature network with more accuracy and resolution. Fuzzy c-means clustering (FCMC) was applied to the same dataset, to assess the performance of the AC in feature detection and to cross-validate the reliability of the method. Comparing the maps illustrates the effectiveness of the AC approach, owing to its higher-resolution contrast for structural feature detection compared with the FCMC method. Accordingly, 3D planes of discontinuity determined the spatial distribution of fractures in the field to assist well planning. The results revealed that the high-impedance location probability was associated with areas in the vicinity of the faults. Low-impedance locations, by contrast, could indicate zones of high permeability, which act as flow conduits. The present analysis suggests that, given their orientation and magnitude, fractures following the main NW–SE trend in the Dezful Embayment are more susceptible to stimulation and more likely to open for fluid flow.

Nasseri, Aynur; Jafar Mohammadzadeh, Mohammad; Hashem Tabatabaei Raeisi, S.

2015-04-01

20

Survey of clustering algorithms

Data analysis plays an indispensable role in understanding various phenomena. Cluster analysis, primitive exploration with little or no prior knowledge, consists of research developed across a wide variety of communities. The diversity, on one hand, equips us with many tools. On the other hand, the profusion of options causes confusion. We survey clustering algorithms for data sets appearing in statistics,

Rui Xu; Donald Wunsch II

2005-01-01

21

Hierarchical modularization of biochemical pathways using fuzzy-c means clustering.

Biological systems that are representative of regulatory, metabolic, or signaling pathways can be highly complex. Mathematical models that describe such systems inherit this complexity. As a result, these models can often fail to provide a path toward the intuitive comprehension of these systems. Coarser information that allows perceptive insight into the system is sometimes needed in combination with the model to understand control hierarchies or lower-level functional relationships. In this paper, we present a method to identify relationships between components of dynamic models of biochemical pathways that reside in different functional groups. We find primary relationships and secondary relationships. The secondary relationships reveal connections present in the system that current techniques, which identify only primary relationships, are unable to show. We also identify how relationships between components change dynamically over time. The result is a method that provides the hierarchy of the relationships among components. This hierarchy can help us understand the low-level functional structure of the system and elucidate potential hierarchical control. As a proof of concept, we apply the algorithm to the epidermal growth factor signal transduction pathway and to the C3 photosynthesis pathway. We identify primary relationships among components that agree with previous computational decomposition studies. We also identify secondary relationships that uncover connections among components that current computational approaches were unable to reveal. PMID:24196983

de Luis Balaguer, Maria A; Williams, Cranos M

2014-08-01

22

This paper presents MRI segmentation techniques that use fuzzy clustering algorithms to differentiate abnormal and normal tissues in ophthalmology. In addition to the best-known fuzzy c-means (FCM) clustering algorithm, a newly proposed algorithm, called alternative fuzzy c-means (AFCM), was used for MRI segmentation in ophthalmology. These unsupervised segmentation algorithms can help ophthalmologists to reduce the medical imaging noise effects originating from

Miin-Shen Yang; Yu-Jen Hu; Karen Chia-Ren Lin; Charles Chia-Lee Lin

2002-01-01
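This sketch assumes AFCM follows the alternative c-means formulation. In that formulation, FCM's Euclidean distance is replaced by the exponential-type metric d(x, v) = 1 - exp(-beta * ||x - v||^2), with beta taken as the inverse of the data's mean squared deviation from their mean. Because this metric is bounded by 1, a noisy outlier cannot dominate the objective the way it can under squared Euclidean distance. A plain-Python sketch of the metric under that assumption:

```python
import math

def afcm_beta(X):
    """beta = 1 / (mean squared distance of the points from their mean)."""
    n, d = len(X), len(X[0])
    mean = [sum(x[k] for x in X) / n for k in range(d)]
    spread = sum(sum((x[k] - mean[k]) ** 2 for k in range(d)) for x in X) / n
    return 1.0 / spread

def afcm_distance(x, v, beta):
    """Bounded distance in [0, 1): 1 - exp(-beta * ||x - v||^2)."""
    sq = sum((a - b) ** 2 for a, b in zip(x, v))
    return 1.0 - math.exp(-beta * sq)
```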

23

A cluster algorithm for graphs

A cluster algorithm for graphs called the Markov Cluster algorithm (MCL algorithm) is introduced. The algorithm basically provides an interface to an algebraic process defined on stochastic matrices, called the MCL process. The graphs may be both weighted (with nonnegative weight) and directed. Let G be such a graph. The MCL algorithm simulates flow in G by first identifying G in a canonical way with

S. Van Dongen

2000-01-01
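The flow simulation behind MCL alternates two operations on a column-stochastic matrix:

- expansion: a matrix power;
- inflation: an element-wise power followed by column renormalisation.

A compact NumPy sketch for an undirected graph follows. Three details are common choices, assumed here rather than taken from the abstract: adding self-loops, setting expansion and inflation to 2, and reading clusters from the non-zero rows of the converged matrix.

```python
import numpy as np

def mcl(A, expansion=2, inflation=2.0, max_iter=100, tol=1e-9):
    """Markov clustering of a graph with non-negative adjacency matrix A (n, n).
    Returns clusters as sorted lists of node indices."""
    M = np.asarray(A, dtype=float) + np.eye(len(A))   # add self-loops
    M = M / M.sum(axis=0, keepdims=True)              # column-stochastic: column j = flow out of j
    for _ in range(max_iter):
        prev = M
        M = np.linalg.matrix_power(M, expansion)      # expansion: flow along longer walks
        M = M ** inflation                            # inflation: favour strong flows...
        M = M / M.sum(axis=0, keepdims=True)          # ...then renormalise columns
        if np.abs(M - prev).max() < tol:
            break
    clusters = []
    for row in M:                                     # non-empty rows belong to attractors
        members = np.nonzero(row > 1e-6)[0].tolist()
        if members and members not in clusters:
            clusters.append(members)
    return clusters
```

Larger inflation values tend to produce more, smaller clusters.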

24

Optimal algorithms for approximate clustering

In a clustering problem, the aim is to partition a given set of n points in d-dimensional space into k groups, called clusters, so that points within each cluster are near each other. Two objective functions frequently used to measure the performance of a clustering algorithm are, for any L4 metric, (a) the maximum distance between pairs of points in

Tomás Feder; Daniel H. Greene

1988-01-01
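For the pairwise maximum-distance objective named in this abstract, a standard baseline is the greedy farthest-first traversal. It is not necessarily the algorithm of this paper. The traversal picks k centers, assigns every point to its nearest center, and yields clusters whose diameter is at most twice the optimum. The argument runs as follows, where r is the final covering radius:

- the k centers plus the next farthest point are pairwise at least r apart, so any k-clustering must put two of them together, and the optimum diameter is at least r;
- every greedy cluster has diameter at most 2r.

A plain-Python sketch (math.dist needs Python 3.8+):

```python
import math

def farthest_first(points, k):
    """Greedy k-center traversal: start at points[0], then repeatedly add the point farthest
    from its nearest chosen center. Returns center indices and the covering radius."""
    centers = [0]
    nearest = [math.dist(p, points[0]) for p in points]  # distance to the nearest center so far
    while len(centers) < k:
        far = max(range(len(points)), key=nearest.__getitem__)
        centers.append(far)
        nearest = [min(d, math.dist(p, points[far])) for d, p in zip(nearest, points)]
    return centers, max(nearest)
```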

25

Purpose: The amount of fibroglandular tissue content in the breast as estimated mammographically, commonly referred to as breast percent density (PD%), is one of the most significant risk factors for developing breast cancer. Approaches to quantify breast density commonly rely on either semiautomated methods or visual assessment, both of which are highly subjective. Furthermore, most studies published to date investigating computer-aided assessment of breast PD% have been performed using digitized screen-film mammograms. Meanwhile, digital mammography is increasingly replacing screen-film mammography in breast cancer screening protocols. Digital mammography generates two types of images for analysis: raw (i.e., 'FOR PROCESSING') and vendor-postprocessed (i.e., 'FOR PRESENTATION'). Of these, postprocessed images are commonly used in clinical practice. An algorithm that effectively estimates breast PD% in both raw and postprocessed digital mammography images would be beneficial both for direct clinical application and for retrospective analysis. Methods: This work proposes a new algorithm for fully automated quantification of breast PD% based on adaptive multiclass fuzzy c-means (FCM) clustering and support vector machine (SVM) classification. The algorithm is optimized for the imaging characteristics of both raw and processed digital mammography images, as well as for individual patient and image characteristics. The algorithm first delineates the breast region within the mammogram. It uses an automated thresholding scheme to identify background air, followed by a straight-line Hough transform to extract the pectoral muscle region. It then applies adaptive FCM clustering to subdivide the breast into regions of similar gray-level intensity, using an optimal number of clusters derived from the image properties of the specific mammogram. Finally, an SVM classifier is trained to identify which clusters within the breast tissue are likely fibroglandular; these are aggregated into a final dense tissue segmentation that is used to compute breast PD%. The method is validated on a group of 81 women for whom bilateral, mediolateral oblique, raw and processed screening digital mammograms were available. Agreement is assessed with both continuous and categorical density estimates made by a trained breast-imaging radiologist. Results: Strong association between algorithm-estimated and radiologist-provided breast PD% was detected for both raw (r = 0.82, p < 0.001) and processed (r = 0.85, p < 0.001) digital mammograms on a per-breast basis. Stronger agreement was found when overall breast density was assessed on a per-woman basis for both raw (r = 0.85, p < 0.001) and processed (r = 0.89, p < 0.001) mammograms. Strong agreement between categorical density estimates was also seen (weighted Cohen's κ ≥ 0.79). Repeated measures analysis of variance demonstrated no statistically significant differences between the PD% estimates (p > 0.1) due to either presentation of the image (raw vs processed) or method of PD% assessment (radiologist vs algorithm). Conclusions: The proposed fully automated algorithm was successful in estimating breast percent density from both raw and processed digital mammographic images. Accurate assessment of a woman's breast density is critical for the estimate to be incorporated into risk assessment models. These results show promise for the clinical application of the algorithm in quantifying breast density in a repeatable manner, both at the time of imaging and in retrospective studies.

Keller, Brad M.; Nathan, Diane L.; Wang Yan; Zheng Yuanjie; Gee, James C.; Conant, Emily F.; Kontos, Despina [Department of Radiology, University of Pennsylvania, Philadelphia, Pennsylvania 19104 (United States); Applied Mathematics and Computational Science, University of Pennsylvania, Philadelphia, Pennsylvania 19104 (United States); Department of Radiology, University of Pennsylvania, Philadelphia, Pennsylvania 19104 (United States)

2012-08-15
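The final step of this pipeline reduces to an area ratio: PD% = 100 × (pixels in clusters labelled fibroglandular) / (pixels in the breast region). The sketch below assumes two inputs are already available: the FCM cluster map, and the set of dense cluster ids, which the paper obtains from its SVM. The label -1 for excluded pixels (background air, pectoral muscle) is a hypothetical convention.

```python
def percent_density(labels, dense_clusters, outside=-1):
    """labels: 2-D list of FCM cluster ids per pixel, with `outside` marking pixels excluded
    from the breast region. dense_clusters: ids classified as fibroglandular."""
    breast = dense = 0
    for row in labels:
        for lab in row:
            if lab == outside:
                continue
            breast += 1
            if lab in dense_clusters:
                dense += 1
    return 100.0 * dense / breast
```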

26

Comparing Clustering and Metaclustering Algorithms

In this paper, the accuracies of four meta-clustering algorithms and five different base-clustering algorithms are compared. These algorithms come from different knowledge areas such as statistics, neural networks and machine learning. The main advantage of these algorithms is their adaptiveness to specific datasets. The ensembles are based on bagging, voting and graph partitioning. These ensembles use relabeling techniques to

Elio Lozano; Edgar Acuña
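Voting-based ensembles need relabeling because cluster ids are arbitrary: the partitions [0, 0, 1] and [1, 1, 0] describe the same grouping. The minimal sketch below makes three assumptions:

- every base partition uses the ids 0..k-1;
- each partition is aligned to the first one by brute force over all k! permutations, which is practical only for small k (the Hungarian algorithm is the usual scalable choice);
- each object then takes its majority-vote label.

```python
from collections import Counter
from itertools import permutations

def align_labels(reference, labels, k):
    """Rename the ids in `labels` (0..k-1) to agree as much as possible with `reference`."""
    def agreement(perm):
        return sum(r == perm[l] for r, l in zip(reference, labels))
    best = max(permutations(range(k)), key=agreement)
    return [best[l] for l in labels]

def majority_vote(partitions, k):
    """Align every partition to the first one, then give each object its most frequent id."""
    reference = partitions[0]
    aligned = [reference] + [align_labels(reference, p, k) for p in partitions[1:]]
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*aligned)]
```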

27

The influence of early diagenesis on the natural remanent magnetisation (NRM) in sediments from the Calabrian ridge (Central Mediterranean) is analysed with the help of fuzzy c-means (FCM) cluster analysis and non-linear mapping (NLM). The sediments are variably coloured: white, beige, purplish, greenish and grey layers occur with occasionally intercalated sapropels. The NRM acquired depends on both depositional conditions and

M. J. Dekkers; C. G. Langereis; S. P. Vriend; P. J. M. van Santvoort; G. J. de Lange

1994-01-01

28

In this paper, a combined approach of partial least squares (PLS) and fuzzy c-means (FCM) clustering for the monitoring of an activated-sludge waste-water treatment plant is presented. Their properties are also investigated. Both methods were applied together in process monitoring. PLS was used for extracting the most useful information from the control and process variables in order to predict a

Pekka Teppola; Satu-Pia Mujunen; Pentti Minkkinen

1998-01-01

29

Fast Algorithms for Projected Clustering

The clustering problem is well known in the database literature for its numerous applications in problems such as customer segmentation, classification and trend analysis. Unfortunately, all known algorithms tend to break down in high dimensional spaces because of the inherent sparsity of the points. In such high dimensional spaces not all dimensions may be relevant to a given cluster. One

Charu C. Aggarwal; Cecilia Magdalena Procopiuc; Joel L. Wolf; Philip S. Yu; Jong Soo Park

1999-01-01

30

Cluster update algorithm and recognition

NASA Astrophysics Data System (ADS)

We present a fast and robust cluster update algorithm that is especially efficient in implementing the task of image segmentation using the method of superparamagnetic clustering. We apply it to a Potts model with spin interactions that are defined by gray-scale differences within the image. Motivated by biological systems, we introduce the concept of neural inhibition to the Potts model realization of the segmentation problem. Including the inhibition term in the Hamiltonian results in enhanced contrast and thereby significantly improves segmentation quality. As a second benefit, we can, after equilibration, directly identify the image segments as the clusters formed by the clustering algorithm. To construct a subsequent spin configuration, the algorithm performs the standard steps of (i) forming clusters and (ii) updating the spins in a cluster simultaneously. As opposed to standard algorithms, however, we share the interaction energy between the two steps. Thus, the update probabilities are not independent of the interaction energies. As a consequence, we observe an acceleration of the relaxation by a factor of 10 compared to the Swendsen and Wang [Phys. Rev. Lett. 58, 86 (1987)] procedure.

von Ferber, C.; Wörgötter, F.

2000-08-01

31

Priority based pheromone algorithm for image cluster

The clustering task aims at the unsupervised classification of patterns into different groups. The clustering problem has been approached from different disciplines. Many swarm intelligence algorithms have been developed to solve numerical and combinatorial problems. Clustering with swarm-based algorithms, especially ant colony algorithms, has been shown to give better results in a variety of real-world applications. This paper introduces a new algorithm

T. Karthikeyan; R. Balakrishnan; U. Karthick Kumar

2012-01-01

32

Scalable fuzzy clustering algorithms

Clustering is the most typical way to group unlabeled data. Today, there are very large unlabeled data sets available. Many of these data sets are too large to fit in the memory of a typical computer. Some of these data sets are so large that they can only be treated as data streams because not all of the data can

L. O. Hall

2008-01-01

33

Matrix-based algorithms for document clustering

The clustering problem is the task of assigning each document in a collection to clusters of similar documents. The clustering process does not begin with pre-specified categories; rather it is the purpose of the clustering algorithm to discover natural categories in the collection of documents that it processes. We assume that the initial data for the clustering algorithms consists of

S. Oliveira; S. C. Seok

34

DAU StatRefresher: Clustering Algorithms

NSDL National Science Digital Library

This interactive module helps students to understand the definition of and uses for clustering algorithms. Students will learn to categorize the types of clustering algorithms, to use the minimal spanning tree and the k-means clustering algorithm, and to solve exercise problems using clustering algorithms. Each component has a detailed explanation along with quiz questions. A series of questions is presented at the end to test the students' understanding of the lesson's entire concept.

35

On Spectral Clustering: Analysis and an algorithm

Despite many empirical successes of spectral clustering methods (algorithms that cluster points using eigenvectors of matrices derived from the distances between the points), there are several unresolved issues. First, there is a wide variety of algorithms that use the eigenvectors in slightly different ways. Second, many of these algorithms have no proof that they will actually compute a reasonable clustering. In this paper, we present a simple

Andrew Y. Ng; Michael I. Jordan; Yair Weiss

2001-01-01
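The spectral algorithm of Ng, Jordan and Weiss builds a Gaussian affinity matrix, normalizes it as D^(-1/2) A D^(-1/2), takes the k largest eigenvectors, renormalizes each row to unit length and clusters the rows with k-means. The NumPy sketch below follows those steps; the scale `sigma`, the farthest-point k-means initialization and the function name are our own choices, not details from the abstract.

```python
import numpy as np

def njw_spectral_clustering(points, k, sigma=1.0, n_iter=100, seed=0):
    """Spectral clustering in the style of Ng, Jordan and Weiss (2001)."""
    X = np.asarray(points, dtype=float)
    # 1. Gaussian affinity with zero diagonal
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    A = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)
    # 2. Normalized affinity L = D^(-1/2) A D^(-1/2)
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    L = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    # 3. k largest eigenvectors (eigh returns eigenvalues in ascending order)
    _, eigvecs = np.linalg.eigh(L)
    V = eigvecs[:, -k:]
    # 4. Renormalize each row to unit length
    Y = V / np.linalg.norm(V, axis=1, keepdims=True)
    # 5. k-means on the rows of Y, with farthest-point initialization
    rng = np.random.default_rng(seed)
    centers = [Y[rng.integers(len(Y))]]
    while len(centers) < k:
        dist = np.min([((Y - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.argmin(((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1), axis=1)
        new_centers = np.array([Y[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                                for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels
```

The row renormalization in step 4 is what maps points of well-separated groups to nearly orthogonal unit vectors, which makes the final k-means step easy.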

36

Scaling Clustering Algorithms to Large Databases

Practical clustering algorithms require multiple data scans to achieve convergence. For large databases, these scans become prohibitively expensive. We present a scalable clustering framework applicable to a wide class of iterative clustering. We require at most one scan of the database. In this work, the framework is instantiated and numerically justified with the popular K-Means clustering algorithm. The method is

Paul S. Bradley; Usama M. Fayyad; Cory Reina

1998-01-01

37

The goal of this study was to develop an automated method to segment breast masses on dynamic contrast-enhanced (DCE) magnetic resonance (MR) scans and to evaluate its potential for estimating tumor volume on pre- and postchemotherapy images and tumor change in response to treatment. A radiologist experienced in interpreting breast MR scans defined a cuboid volume of interest (VOI) enclosing the mass in the MR volume at one time point within the sequence of DCE-MR scans. The corresponding VOIs over the entire time sequence were then automatically extracted. A new 3D VOI representing the local pharmacokinetic activities in the VOI was generated from the 4D VOI sequence by summarizing the temporal intensity enhancement curve of each voxel with its standard deviation. The method then used the fuzzy c-means (FCM) clustering algorithm followed by morphological filtering for initial mass segmentation. The initial segmentation was refined by the 3D level set (LS) method. The velocity field of the LS method was formulated in terms of the mean curvature which guaranteed the smoothness of the surface, the Sobel edge information which attracted the zero LS to the desired mass margin, and the FCM membership function which improved segmentation accuracy. The method was evaluated on 50 DCE-MR scans of 25 patients who underwent neoadjuvant chemotherapy. Each patient had pre- and postchemotherapy DCE-MR scans on a 1.5 T magnet. The in-plane pixel size ranged from 0.546 to 0.703 mm and the slice thickness ranged from 2.5 to 4.5 mm. The flip angle was 15°, repetition time ranged from 5.98 to 6.7 ms, and echo time ranged from 1.2 to 1.3 ms. Computer segmentation was applied to the coronal T1-weighted images. For comparison, the same radiologist who marked the VOI also manually segmented the mass on each slice. 
The performance of the automated method was quantified using an overlap measure, defined as the ratio of the intersection of the computer and the manual segmentation volumes to the manual segmentation volume. Pre- and postchemotherapy masses had overlap measures of 0.81±0.13 (mean±s.d.) and 0.71±0.22, respectively. The percentage volume reduction (PVR) estimated by computer and the radiologist were 55.5±43.0% (mean±s.d.) and 57.8±51.3%, respectively. Paired Student’s t test indicated that the difference between the mean PVRs estimated by computer and the radiologist did not reach statistical significance (p=0.641). The automated mass segmentation method may have the potential to assist physicians in monitoring volume change in breast masses in response to treatment. PMID:19994516

Shi, Jiazheng; Sahiner, Berkman; Chan, Heang-Ping; Paramagul, Chintana; Hadjiiski, Lubomir M.; Helvie, Mark; Chenevert, Thomas

2009-01-01
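The overlap measure defined above divides the voxels shared by the computer and manual segmentations by the manual volume. A minimal sketch, representing each segmentation as a set of voxel coordinates, is shown below; the percentage volume reduction formula, 100 × (V_pre - V_post) / V_pre, is the conventional definition and is assumed here rather than stated in the abstract. Function names are hypothetical.

```python
def overlap_measure(computer_voxels, manual_voxels):
    """|C ∩ M| / |M|: shared voxels divided by the manual segmentation volume."""
    computer, manual = set(computer_voxels), set(manual_voxels)
    if not manual:
        raise ValueError("manual segmentation is empty")
    return len(computer & manual) / len(manual)

def percent_volume_reduction(pre_volume, post_volume):
    """Assumed conventional PVR: 100 * (V_pre - V_post) / V_pre."""
    return 100.0 * (pre_volume - post_volume) / pre_volume
```

Because the denominator is the manual volume alone, voxels that the computer adds outside the manual mask do not lower this measure.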

38

The goal of this study was to develop an automated method to segment breast masses on dynamic contrast-enhanced (DCE) magnetic resonance (MR) scans and to evaluate its potential for estimating tumor volume on pre- and postchemotherapy images and tumor change in response to treatment. A radiologist experienced in interpreting breast MR scans defined a cuboid volume of interest (VOI) enclosing the mass in the MR volume at one time point within the sequence of DCE-MR scans. The corresponding VOIs over the entire time sequence were then automatically extracted. A new 3D VOI representing the local pharmacokinetic activities in the VOI was generated from the 4D VOI sequence by summarizing the temporal intensity enhancement curve of each voxel with its standard deviation. The method then used the fuzzy c-means (FCM) clustering algorithm followed by morphological filtering for initial mass segmentation. The initial segmentation was refined by the 3D level set (LS) method. The velocity field of the LS method was formulated in terms of the mean curvature which guaranteed the smoothness of the surface, the Sobel edge information which attracted the zero LS to the desired mass margin, and the FCM membership function which improved segmentation accuracy. The method was evaluated on 50 DCE-MR scans of 25 patients who underwent neoadjuvant chemotherapy. Each patient had pre- and postchemotherapy DCE-MR scans on a 1.5 T magnet. The in-plane pixel size ranged from 0.546 to 0.703 mm and the slice thickness ranged from 2.5 to 4.5 mm. The flip angle was 15 degrees, repetition time ranged from 5.98 to 6.7 ms, and echo time ranged from 1.2 to 1.3 ms. Computer segmentation was applied to the coronal T1-weighted images. For comparison, the same radiologist who marked the VOI also manually segmented the mass on each slice. 
The performance of the automated method was quantified using an overlap measure, defined as the ratio of the intersection of the computer and the manual segmentation volumes to the manual segmentation volume. Pre- and postchemotherapy masses had overlap measures of 0.81 +/- 0.13 (mean +/- s.d.) and 0.71 +/- 0.22, respectively. The percentage volume reduction (PVR) estimated by computer and the radiologist were 55.5 +/- 43.0% (mean +/- s.d.) and 57.8 +/- 51.3%, respectively. Paired Student's t test indicated that the difference between the mean PVRs estimated by computer and the radiologist did not reach statistical significance (p = 0.641). The automated mass segmentation method may have the potential to assist physicians in monitoring volume change in breast masses in response to treatment. PMID:19994516

Shi, Jiazheng; Sahiner, Berkman; Chan, Heang-Ping; Paramagul, Chintana; Hadjiiski, Lubomir M; Helvie, Mark; Chenevert, Thomas

2009-11-01

39

A FCM clustering algorithm based on Semi-supervised and Point Density Weighted

The effectiveness of FCM depends on the distribution of the samples. The optimal clustering result may not be valid for data sets with a mass shape and a large discrepancy in the number of samples per class. Therefore, a Semi-supervised and Point Density Weighted Fuzzy C-means clustering algorithm (SSWFCM) is proposed. This algorithm uses distance-based semi-supervised learning to study the training data set and obtain a coefficient matrix

Xiaobin Zhang; Hui Huang; Shijing Zhang

2010-01-01

40

Robust parallel clustering algorithm for image segmentation

NASA Astrophysics Data System (ADS)

This paper describes a hierarchical parallel implementation of two clustering algorithms applied to the segmentation of multidimensional images and range images. The proposed hierarchical parallel implementation results in a fast, robust segmentation algorithm that can be applied to a number of practical computer vision problems. The clustering process is divided into two basic steps. First, a fast sequential clustering algorithm performs a simple analysis of the image data, which results in a suboptimal classification of the image features. Second, the resulting clusters are analyzed using the minimum volume ellipsoid estimator, and similar clusters are merged using the number and shape of the ellipsoidal clusters that best represent the data. Both algorithms are implemented on a parallel computer architecture that speeds up the classification task. The hierarchical clustering algorithm is compared against the fuzzy k-means clustering algorithm, and both approaches give comparable segmentation results. The hierarchical parallel implementation is tested on synthetic multidimensional images and real range images.

Tamez-Pena, Jose G.; Perez, Arnulfo

1996-02-01

41

K-nearest neighbors clustering algorithm

NASA Astrophysics Data System (ADS)

Cluster analysis, understood as an unsupervised method of assigning objects to groups solely on the basis of their measured characteristics, is a common method for analyzing DNA microarray data. We propose to classify the results of the one-nearest-neighbor (1NN) algorithm. The presented method copes well with complex, multidimensional data, and the number of groups is properly identified. Numerical experiments on benchmark microarray data show that the presented algorithm gives better results than k-means clustering.

Gauza, Dariusz; ?ukowska, Anna; Nowak, Robert

2014-11-01
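The abstract does not spell out how the 1NN results are grouped. One plausible reading, sketched below purely as an assumption, links every point to its single nearest neighbor and treats the connected components of that graph as clusters, so the number of groups emerges from the data rather than being fixed in advance as in k-means. The function name is hypothetical.

```python
import math

def one_nn_clusters(points):
    """Link every point to its nearest neighbor (1NN graph) and return the
    connected components as integer labels 0, 1, ... in order of first appearance."""
    n = len(points)
    parent = list(range(n))  # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        nearest = min((j for j in range(n) if j != i),
                      key=lambda j: math.dist(points[i], points[j]))
        parent[find(i)] = find(nearest)  # merge i's component with its neighbor's

    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]
```

Every component of a 1NN graph contains at least two points, so this reading never produces singleton clusters.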

42

This paper proposes two edge detection methods for medical images by integrating the advantages of Gabor wavelet transform (GWT) and unsupervised clustering algorithms. The GWT is used to enhance the edge information in an image while suppressing noise. Following this, the k-means and Fuzzy c-means (FCM) clustering algorithms are used to convert a gray level image into a binary image. The proposed methods are tested using medical images obtained through Computed Tomography (CT) and Magnetic Resonance Imaging (MRI) devices, and a phantom image. The results prove that the proposed methods are successful for edge detection, even in noisy cases. PMID:24790590

Ergen, Burhan

2014-01-01
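In the method above, k-means (or FCM) turns the Gabor-enhanced gray-level image into a binary image. The sketch below shows only that binarization step, assuming a two-cluster k-means on raw pixel intensities; the Gabor wavelet filtering is omitted and the helper name is hypothetical.

```python
def kmeans_binarize(image, n_iter=100):
    """Binarize a gray-level image (list of rows) with 1-D two-cluster k-means:
    pixels assigned to the brighter centroid become 1, the rest 0."""
    values = [v for row in image for v in row]
    lo, hi = float(min(values)), float(max(values))  # initial centroids
    for _ in range(n_iter):
        threshold = (lo + hi) / 2.0  # in 1-D the k-means boundary is the midpoint
        dark = [v for v in values if v <= threshold]
        bright = [v for v in values if v > threshold]
        new_lo = sum(dark) / len(dark) if dark else lo
        new_hi = sum(bright) / len(bright) if bright else hi
        if (new_lo, new_hi) == (lo, hi):
            break
        lo, hi = new_lo, new_hi
    threshold = (lo + hi) / 2.0
    return [[1 if v > threshold else 0 for v in row] for row in image]
```

For two clusters in one dimension, k-means reduces to iterating a single threshold at the midpoint of the two cluster means.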

43

This paper proposes two edge detection methods for medical images by integrating the advantages of Gabor wavelet transform (GWT) and unsupervised clustering algorithms. The GWT is used to enhance the edge information in an image while suppressing noise. Following this, the k-means and Fuzzy c-means (FCM) clustering algorithms are used to convert a gray level image into a binary image. The proposed methods are tested using medical images obtained through Computed Tomography (CT) and Magnetic Resonance Imaging (MRI) devices, and a phantom image. The results prove that the proposed methods are successful for edge detection, even in noisy cases. PMID:24790590

Ergen, Burhan

2014-01-01

44

Quantum Clustering Algorithm based on Exponent Measuring Distance

The principal advantages and shortcomings of the quantum clustering algorithm (QC) are analyzed. To address these shortcomings, an improved algorithm, the exponent distance-based quantum clustering algorithm (EQDC), is proposed. It improves the iterative procedure of the QC algorithm and uses an exponent distance formula to measure the distance between data points and the cluster centers. Experimental results demonstrate that the cluster accuracy of

Zhang Yao; Wang Peng; Chen Gao-yun; Chen Dong-Dong; Ding Rui; Zhang Yan

2008-01-01

45

An architecture and algorithms for multi-run clustering

This paper addresses two main challenges for clustering which require extensive human effort: selecting appropriate parameters for an arbitrary clustering algorithm and identifying alternative clusters. We propose an architecture and a concrete system MR-CLEVER for multi-run clustering that integrates active learning with clustering algorithms. The key hypothesis of this work is that better clustering results can be obtained by combining

Rachsuda Jiamthapthaksin; Christoph F. Eick; Vadeerat Rinsurongkawong

2009-01-01

46

An algorithm for spatial hierarchy clustering

NASA Technical Reports Server (NTRS)

A method for utilizing both spectral and spatial redundancy in compacting and preclassifying images is presented. In multispectral satellite images, a high correlation exists between neighboring image points which tend to occupy dense and restricted regions of the feature space. The image is divided into windows of the same size where the clustering is made. The classes obtained in several neighboring windows are clustered, and then again successively clustered until only one region corresponding to the whole image is obtained. By employing this algorithm only a few points are considered in each clustering, thus reducing computational effort. The method is illustrated as applied to LANDSAT images.

Dejesusparada, N. (principal investigator); Velasco, F. R. D.

1981-01-01

47

Efficient Fuzzy C-Means Architecture for Image Segmentation

This paper presents a novel VLSI architecture for image segmentation. The architecture is based on the fuzzy c-means algorithm with spatial constraint for reducing the misclassification rate. In the architecture, the usual iterative operations for updating the membership matrix and cluster centroid are merged into one single updating process to evade the large storage requirement. In addition, an efficient pipelined circuit is used for the updating process for accelerating the computational speed. Experimental results show that the proposed circuit is an effective alternative for real-time image segmentation with low area cost and low misclassification rate. PMID:22163980

Li, Hui-Ya; Hwang, Wen-Jyi; Chang, Chia-Yen

2011-01-01
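The architecture above merges the usual alternating updates of the membership matrix and the cluster centroids. For reference, the plain software iteration it starts from alternates u_ij = 1 / Σ_k (d_ij / d_ik)^(2/(m-1)) and v_j = Σ_i u_ij^m x_i / Σ_i u_ij^m. The Python sketch below implements that textbook fuzzy c-means with fuzzifier m = 2 and a deterministic farthest-point initialization (our choice); it does not include the spatial constraint or the merged single-pass update described in the paper.

```python
import math

def _farthest_point_init(points, c):
    """Deterministic start: the first point, then repeatedly the point farthest from the chosen centers."""
    centers = [list(points[0])]
    while len(centers) < c:
        far = max(points, key=lambda p: min(math.dist(p, v) for v in centers))
        centers.append(list(far))
    return centers

def fuzzy_c_means(points, c, m=2.0, n_iter=100, tol=1e-6):
    """Plain fuzzy c-means. Returns (centers, u), where u[i][j] is the membership of point i in cluster j."""
    centers = _farthest_point_init(points, c)
    exponent = 2.0 / (m - 1.0)
    dim = len(points[0])
    u = []
    for _ in range(n_iter):
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1))
        u = []
        for x in points:
            d = [math.dist(x, v) for v in centers]
            if 0.0 in d:  # point sits on a center: crisp membership there
                row = [1.0 if dj == 0.0 else 0.0 for dj in d]
                total = sum(row)
                row = [r / total for r in row]
            else:
                row = [1.0 / sum((dj / dk) ** exponent for dk in d) for dj in d]
            u.append(row)
        # Center update: v_j = sum_i u_ij^m x_i / sum_i u_ij^m
        new_centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(len(points))]
            s = sum(w)
            new_centers.append([sum(wi * x[k] for wi, x in zip(w, points)) / s for k in range(dim)])
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, u
```

The separate membership and center loops make the two-phase structure, and hence the intermediate storage the VLSI design avoids, explicit.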

48

Classification of posture maintenance data with fuzzy clustering algorithms

NASA Technical Reports Server (NTRS)

Sensory inputs from the visual, vestibular, and proprioceptive systems are integrated by the central nervous system to maintain postural equilibrium. Sustained exposure to microgravity causes neurosensory adaptation during spaceflight, which results in decreased postural stability until readaptation occurs upon return to the terrestrial environment. Data which simulate sensory inputs under various conditions were collected in conjunction with JSC postural control studies using a Tilt-Translation Device (TTD). The University of West Florida proposed applying the Fuzzy C-Means Clustering (FCM) Algorithms to this data with a view towards identifying various states and stages. Data supplied by NASA/JSC were submitted to the FCM algorithms in an attempt to identify and characterize cluster substructure in a mixed ensemble of pre- and post-adaptational TTD data. Following several unsuccessful trials with FCM using a full 11-dimensional data set, a set of two channels (features) was found to enable FCM to separate pre- from post-adaptational TTD data. The main conclusions are that: (1) FCM seems able to separate pre- from post-TTD data for subject no. 2 on the one trial that was used, but only in certain subintervals of time; and (2) channels 2 (right rear transducer force) and 8 (hip sway bar) contain better discrimination information than the other supersets and combinations of the data that have been tried so far.

Bezdek, James C.

1991-01-01

49

Hybrid Evolutionary Algorithms and Clustering Search

A challenge in hybrid evolutionary algorithms is to employ efficient strategies to cover all the search space, applying local search only in actually promising search areas. The inspiration in nature has been pursued to design flexible, coherent, and efficient computational models. In this chapter, the clustering search (*CS) is proposed as a generic way of combining search metaheuristics with clustering

Alexandre C. M. Oliveira; Luiz A. N. Lorena

50

CURE: An Efficient Clustering Algorithm for Large Databases

Clustering, in data mining, is useful for discovering groups and identifying interesting distributions in the underlying data. Traditional clustering algorithms either favor clusters with spherical shapes and similar sizes, or are very fragile in the presence of outliers. We propose a new clustering algorithm called CURE that is more robust to outliers, and identifies clusters having non-spherical shapes

Sudipto Guha; Rajeev Rastogi; Kyuseok Shim

1998-01-01
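CURE represents each cluster by a few well-scattered points shrunk toward the cluster centroid, which lets it follow non-spherical shapes while damping the effect of outliers; the pair of clusters with the closest representatives is merged next. The sketch below shows only the representative-selection and cluster-distance steps; the number of representatives and the shrink factor `alpha` are illustrative defaults, the function names are hypothetical, and the sampling, partitioning and heap-based merging of the full algorithm are omitted.

```python
import math

def cure_representatives(cluster, n_rep=4, alpha=0.3):
    """Pick up to n_rep well-scattered points of a cluster and shrink them
    toward the centroid by a fraction alpha."""
    dim = len(cluster[0])
    mean = [sum(p[k] for p in cluster) / len(cluster) for k in range(dim)]
    scattered = []
    for _ in range(min(n_rep, len(cluster))):
        candidates = [p for p in cluster if p not in scattered]
        if not scattered:  # first: the point farthest from the mean
            best = max(candidates, key=lambda p: math.dist(p, mean))
        else:              # then: the point farthest from those already chosen
            best = max(candidates, key=lambda p: min(math.dist(p, q) for q in scattered))
        scattered.append(best)
    return [tuple(p[k] + alpha * (mean[k] - p[k]) for k in range(dim)) for p in scattered]

def cure_distance(reps_a, reps_b):
    """Distance between two clusters: closest pair of their representatives."""
    return min(math.dist(a, b) for a in reps_a for b in reps_b)
```

With `alpha = 0` the representatives stay on the cluster boundary (closer to single linkage); with `alpha = 1` they collapse onto the centroid (closer to centroid-based methods).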

51

Fast k-means algorithm clustering

k-means has recently been recognized as one of the best algorithms for clustering unsupervised data. Since k-means depends mainly on distance calculations between all data points and the centers, the time cost is high when the dataset is large (for example, more than 500 million points). We propose a two-stage algorithm to reduce the time

Raied Salman; Vojislav Kecman; Qi Li; Robert Strack

2011-01-01

52

Cluster Algorithm Special Purpose Processor

NASA Astrophysics Data System (ADS)

We describe a Special Purpose Processor, realizing the Wolff algorithm in hardware, which is fast enough to study the critical behaviour of 2D Ising-like systems containing more than one million spins. The processor has been checked to produce correct results for a pure Ising model and for Ising model with random bonds. Its data also agree with the Nishimori exact results for spin glass. Only minor changes of the SPP design are necessary to increase the dimensionality and to take into account more complex systems such as Potts models.

Talapov, A. L.; Shchur, L. N.; Andreichenko, V. B.; Dotsenko, Vl. S.
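The Wolff algorithm that the processor implements in hardware grows a single cluster from a random seed spin, adding each aligned nearest neighbor with probability p = 1 - exp(-2βJ), and then flips the whole cluster. The software sketch below performs one such update on a 2D Ising lattice with periodic boundaries; it illustrates the algorithm only and says nothing about the processor design. The function and parameter names are our own.

```python
import math
import random

def wolff_update(spins, beta, J=1.0, rng=random):
    """One Wolff single-cluster update on an L x L Ising lattice (periodic boundaries).

    spins: list of lists of +1/-1, modified in place. Returns the flipped cluster's size.
    """
    L = len(spins)
    p_add = 1.0 - math.exp(-2.0 * beta * J)  # probability of adding an aligned neighbor
    i, j = rng.randrange(L), rng.randrange(L)
    seed_spin = spins[i][j]
    cluster = {(i, j)}
    stack = [(i, j)]
    while stack:
        x, y = stack.pop()
        neighbors = (((x + 1) % L, y), ((x - 1) % L, y), (x, (y + 1) % L), (x, (y - 1) % L))
        for nx, ny in neighbors:
            if (nx, ny) not in cluster and spins[nx][ny] == seed_spin and rng.random() < p_add:
                cluster.add((nx, ny))
                stack.append((nx, ny))
    for x, y in cluster:  # flip the whole cluster at once
        spins[x][y] = -seed_spin
    return len(cluster)
```

Near the critical temperature the clusters become large, which is why this update avoids the critical slowing down that single-spin Metropolis updates suffer from.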

53

A practical clustering algorithm for static and dynamic information organization

We present and analyze the off-line star algorithm for clustering static information systems and the on-line star algorithm for clustering dynamic information systems. These algorithms organize a document collection into a number of clusters that is naturally induced by the collection via a computationally efficient cover by dense subgraphs. We further show a lower bound on the quality of the

Javed A. Aslam; Katya Pelekhov; Daniela Rus

1999-01-01
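The off-line star algorithm builds a similarity graph whose edges join documents with similarity at least σ, then repeatedly picks the highest-degree unmarked vertex as a star center and marks it together with its neighbors (satellites). A compact sketch follows; the similarity function is supplied by the caller, and the tie-breaking by vertex index is our own choice.

```python
def offline_star_clusters(n, similarity, sigma):
    """Off-line star clustering of items 0..n-1.

    Items are joined in a similarity graph when similarity(u, v) >= sigma; unmarked
    vertices are then taken in order of decreasing degree as star centers, and each
    center is marked together with its neighbors (satellites). Returns
    {center: set of satellites}; a satellite may belong to several stars.
    """
    neighbors = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if similarity(u, v) >= sigma:
                neighbors[u].add(v)
                neighbors[v].add(u)
    marked, stars = set(), {}
    for v in sorted(range(n), key=lambda v: -len(neighbors[v])):  # stable sort: ties by index
        if v not in marked:
            stars[v] = set(neighbors[v])
            marked.add(v)
            marked |= neighbors[v]
    return stars
```

Because satellites are not removed from the graph, a document can appear in more than one star, so the resulting clusters may overlap.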

54

The Star Clustering Algorithm for Static and Dynamic Information Organization

We present and analyze the off-line star algorithm for clustering static information systems and the on-line star algorithm for clustering dynamic information systems. These algorithms organize a document collection into a number of clusters that is naturally induced by the collection via a computationally efficient cover by dense subgraphs. We further show a lower

Javed A. Aslam; Ekaterina Pelekhov; Daniela Rus

2004-01-01

55

A Cross Unequal Clustering Routing Algorithm for Sensor Network

NASA Astrophysics Data System (ADS)

In routing protocols for wireless sensor networks, the cluster size in clustering routing algorithms is generally fixed, which can easily lead to the "hot spot" problem. Furthermore, most routing algorithms barely consider the high energy consumption caused by long-distance communication between adjacent cluster heads. Therefore, this paper proposes a new cross unequal clustering routing algorithm based on the EEUC algorithm. To remedy the defects of EEUC, this algorithm takes each node's position and remaining energy into account when calculating its competition radius, which makes the load of the cluster heads more balanced. At the same time, cluster-adjacent nodes are used to transport data, reducing the energy loss of the cluster heads. Simulation experiments show that, compared with LEACH and EEUC, the proposed algorithm effectively reduces the energy loss of cluster heads, balances the energy consumption among all nodes in the network, and improves the network lifetime.

Tong, Wang; Jiyi, Wu; He, Xu; Jinghua, Zhu; Munyabugingo, Charles

2013-08-01
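EEUC assigns each candidate cluster head a competition radius that shrinks as the node gets closer to the base station: R_c = (1 - c · (d_max - d(s_i, BS)) / (d_max - d_min)) · R_c^0. The sketch below implements that formula. The paper's extension also uses residual energy, but its exact form is not given in the abstract, so the optional energy term here is a purely hypothetical illustration, and the function name is our own.

```python
def competition_radius(d_to_bs, d_min, d_max, r0, c=0.5,
                       residual_energy=None, initial_energy=None, energy_weight=0.0):
    """EEUC-style competition radius: nodes nearer the base station get smaller radii
    (hence smaller clusters). With energy_weight > 0, a HYPOTHETICAL residual-energy
    term also shrinks the radius of depleted nodes; it is not the paper's formula."""
    distance_term = c * (d_max - d_to_bs) / (d_max - d_min)
    energy_term = 0.0
    if energy_weight and residual_energy is not None and initial_energy:
        energy_term = energy_weight * (1.0 - residual_energy / initial_energy)
    return (1.0 - distance_term - energy_term) * r0
```

With the default `energy_weight=0.0`, the function reduces exactly to the EEUC formula; a node at the maximum distance gets the full radius r0, and a node at the minimum distance gets (1 - c) · r0.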

56

On Cluster Validity Indexes in Fuzzy and Hard Clustering Algorithms for Image Segmentation

Moumen addresses the issue of assessing the quality of the clusters found by fuzzy and hard clustering algorithms. In particular, it seeks an answer to the question of how well cluster validity indexes can automatically

Farag, Aly A.
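Cluster validity indexes score a partition without ground truth. Which indexes this work compares is not visible in the truncated abstract; as an illustration, the sketch below computes two common fuzzy indexes from points, centers and a membership matrix u[i][j]: Bezdek's partition coefficient (closer to 1 means crisper) and the Xie-Beni index (smaller means more compact, better-separated clusters). Function names are our own.

```python
import math

def partition_coefficient(u):
    """Bezdek's partition coefficient: (1/n) * sum_ij u_ij^2 (1 = crisp, 1/c = maximally fuzzy)."""
    return sum(uij ** 2 for row in u for uij in row) / len(u)

def xie_beni(points, centers, u, m=2.0):
    """Xie-Beni index: fuzzy within-cluster compactness divided by
    n * (minimum squared distance between two centers)."""
    compactness = sum(u[i][j] ** m * math.dist(x, v) ** 2
                      for i, x in enumerate(points) for j, v in enumerate(centers))
    separation = min(math.dist(a, b) ** 2
                     for j, a in enumerate(centers) for k, b in enumerate(centers) if j < k)
    return compactness / (len(points) * separation)
```

To choose the number of clusters, one typically runs the clustering for several values of c and keeps the partition with the smallest Xie-Beni value.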

57

Dimensionality Reduction Particle Swarm Algorithm for High Dimensional Clustering

The Particle Swarm Optimization (PSO) clustering algorithm can generate more compact clustering results than the traditional K-means clustering algorithm. However, when clustering high-dimensional datasets, the PSO clustering algorithm is notoriously slow because its computation cost increases exponentially with the dimension of the dataset. Dimensionality reduction techniques offer solutions that both significantly improve the computation time and yield reasonably accurate clustering results in high-dimensional data analysis. In this paper, we introduce research that combines different dimensionality reduction techniques with the PSO clustering algorithm in order to reduce the complexity of high-dimensional datasets and speed up the PSO clustering process. We report significant improvements in total runtime. Moreover, the clustering accuracy of the dimensionality reduction PSO clustering algorithm is comparable to that of the algorithm using the full-dimensional space.

Cui, Xiaohui [ORNL]; ST Charles, Jesse Lee [ORNL]; Potok, Thomas E [ORNL]; Beaver, Justin M [ORNL]

2008-01-01

58

A hybrid discrete Artificial Bee Colony - GRASP algorithm for clustering

This paper presents a new hybrid algorithm, which is based on the concepts of the artificial bee colony (ABC) and greedy randomized adaptive search procedure (GRASP), for optimally clustering N objects into K clusters. The proposed algorithm is a two phase algorithm which combines an artificial bee colony optimization algorithm for the solution of the feature selection problem and a

Y. Marinakis; M. Marinaki; N. Matsatsinis

2009-01-01

59

Cluster compression algorithm: A joint clustering/data compression concept

NASA Technical Reports Server (NTRS)

The Cluster Compression Algorithm (CCA), which was developed to reduce costs associated with transmitting, storing, distributing, and interpreting LANDSAT multispectral image data, is described. The CCA is a preprocessing algorithm that uses feature extraction and data compression to represent the information in the image data more efficiently. The format of the preprocessed data enables simple look-up-table decoding and direct use of the extracted features to reduce user computation for either image reconstruction or computer interpretation of the image data. Basically, the CCA uses spatially local clustering to extract features from the image data that describe the spectral characteristics of the data set. In addition, the features may be used to form a sequence of scalar numbers that define each picture element in terms of the cluster features. This sequence, called the feature map, is then efficiently represented by using source encoding concepts. Various forms of the CCA are defined and experimental results are presented to show trade-offs and characteristics of the various implementations. Examples are provided that demonstrate the application of the cluster compression concept to multispectral images from LANDSAT and other sources.

Hilbert, E. E.

1977-01-01

60

CLASSY: An adaptive maximum likelihood clustering algorithm

NASA Technical Reports Server (NTRS)

The CLASSY clustering method alternates maximum likelihood iterative techniques for estimating the parameters of a mixture distribution with an adaptive procedure for splitting, combining, and eliminating the resultant components of the mixture. The adaptive procedure is based on maximizing the fit of a mixture of multivariate normal distributions to the observed data using its first through fourth central moments. It generates estimates of the number of multivariate normal components in the mixture as well as the proportion, mean vector, and covariance matrix for each component. The basic mathematical model for CLASSY and the actual operation of the algorithm as currently implemented are described. Results of applying CLASSY to real and simulated LANDSAT data are presented and compared with those generated by the iterative self-organizing clustering system algorithm on the same data sets.

Lennington, R. K.; Rassbach, M. E. (principal investigators)

1979-01-01

61

Chaotic map clustering algorithm for EEG analysis

NASA Astrophysics Data System (ADS)

The non-parametric chaotic map clustering algorithm has been applied to the analysis of electroencephalographic signals in order to recognize Huntington's disease, one of the most dangerous pathologies of the central nervous system. The performance of the method has been compared with that obtained through parametric algorithms, such as K-means and deterministic annealing, and a supervised multi-layer perceptron. While supervised neural networks need a training phase, performed by means of data tagged by the genetic test, and the parametric methods require a prior choice of the number of classes to find, chaotic map clustering gives natural evidence of the pathological class without any training or supervision, thus providing a new, efficient methodology for recognizing patterns affected by Huntington's disease.

Bellotti, R.; De Carlo, F.; Stramaglia, S.

2004-03-01

62

First Cluster Algorithm Special Purpose Processor

NASA Astrophysics Data System (ADS)

We describe the architecture of the special purpose processor built to realize the Wolff cluster algorithm in hardware; this algorithm is not hampered by critical slowing down. The processor simulates two-dimensional Ising-like spin systems. With minor changes, the same very effective architecture, which can be defined as a Memory Machine, can be used to study phase transitions in a wide range of models in two or three dimensions.

Talapov, A. L.; Andreichenko, V. B.; Dotsenko, Vl. S.; Shchur, L. N.

63

A clustering routing algorithm based on improved ant colony clustering for wireless sensor networks

NASA Astrophysics Data System (ADS)

Considering the uniformity of node distribution in real wireless sensor networks, this paper presents a clustering strategy based on the ant colony clustering algorithm (ACC-C). To reduce the energy consumption of the cluster heads near the base station and of the whole network, the algorithm applies ant colony clustering to form non-uniform clusters. An improved route optimality degree is introduced to evaluate the performance of the chosen route. Simulation results show that, compared with other algorithms such as LEACH and the improved particle swarm clustering algorithm (PSC-C), the proposed approach is able to avoid nodes with less residual energy, which can prolong the network lifetime.

Xiao, Xiaoli; Li, Yang

64

Parallelization of Edge Detection Algorithm using MPI on Beowulf Cluster

NASA Astrophysics Data System (ADS)

In this paper, we present the design of a parallel Sobel edge detection algorithm using Foster's methodology. The parallel algorithm is implemented using the MPI message-passing library and a master/slave algorithm. Every processor performs the same sequential algorithm but on a different part of the image. Experimental results obtained on a Beowulf cluster are presented to demonstrate the performance of the parallel algorithm.

Haron, Nazleeni; Amir, Ruzaini; Aziz, Izzatdin A.; Jung, Low Tan; Shukri, Siti Rohkmah
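The decomposition described above gives every processor the same sequential Sobel code and a different horizontal strip of the image. The sketch below mimics that master/worker split with a Python thread pool standing in for MPI ranks, an assumption made to keep the example self-contained; a real MPI implementation would scatter each strip plus one halo row on each side and gather the results. Border pixels are simply set to zero, and the function names are our own.

```python
import math
from concurrent.futures import ThreadPoolExecutor

GX = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))   # Sobel kernel, horizontal gradient
GY = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))   # Sobel kernel, vertical gradient

def sobel_rows(image, r0, r1):
    """Sequential Sobel gradient magnitude for rows r0..r1-1 (border pixels = 0).
    Row r0 reads row r0-1 and row r1-1 reads row r1: the strip's halo rows."""
    h, w = len(image), len(image[0])
    out = []
    for r in range(r0, r1):
        row = [0.0] * w
        if 0 < r < h - 1:
            for c in range(1, w - 1):
                gx = sum(GX[i][j] * image[r + i - 1][c + j - 1] for i in range(3) for j in range(3))
                gy = sum(GY[i][j] * image[r + i - 1][c + j - 1] for i in range(3) for j in range(3))
                row[c] = math.hypot(gx, gy)
        out.append(row)
    return out

def parallel_sobel(image, n_workers=4):
    """Master splits the image into horizontal strips, workers run the same
    sequential code on their strip, and the master concatenates the results in order."""
    h = len(image)
    bounds = [(k * h // n_workers, (k + 1) * h // n_workers) for k in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        strips = pool.map(lambda b: sobel_rows(image, *b), bounds)
        return [row for strip in strips for row in strip]
```

Because every worker sees the same neighborhoods it would see in the sequential run, the parallel output matches the single-strip result exactly; the speedup comes only from dividing the rows.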

65

ROCK: A Robust Clustering Algorithm for Categorical Attributes

Clustering, in data mining, is useful for discovering distribution patterns in the underlying data. Clustering algorithms usually employ a distance-metric-based (e.g., Euclidean) similarity measure in order to partition the database such that data points in the same partition are more similar than points in different partitions. In this paper, we study clustering algorithms for data with boolean and

Sudipto Guha; Rajeev Rastogi; Kyuseok Shim

2000-01-01

66

A novel clustering algorithm inspired by membrane computing.

P systems are a class of distributed parallel computing models; this paper presents a novel clustering algorithm, called the membrane clustering algorithm, which is inspired by the mechanism of a tissue-like P system with a loop structure of cells. The objects of the cells represent the candidate cluster centers and are evolved by the evolution rules. Based on the loop membrane structure, the communication rules realize a local neighborhood topology, which helps the coevolution of the objects and improves the diversity of objects in the system. The tissue-like P system can effectively search for the optimal partitioning with the help of its parallel computing advantage. The proposed clustering algorithm is evaluated on four artificial data sets and six real-life data sets. Experimental results show that the proposed clustering algorithm is superior or comparable to the k-means algorithm and to several evolutionary clustering algorithms recently reported in the literature. PMID:25874264

Peng, Hong; Luo, Xiaohui; Gao, Zhisheng; Wang, Jun; Pei, Zheng

2015-01-01

67

A Novel Clustering Algorithm Inspired by Membrane Computing

P systems are a class of distributed parallel computing models; this paper presents a novel clustering algorithm, called the membrane clustering algorithm, which is inspired by the mechanism of a tissue-like P system with a loop structure of cells. The objects of the cells represent the candidate cluster centers and are evolved by the evolution rules. Based on the loop membrane structure, the communication rules realize a local neighborhood topology, which helps the coevolution of the objects and improves the diversity of objects in the system. The tissue-like P system can effectively search for the optimal partitioning with the help of its parallel computing advantage. The proposed clustering algorithm is evaluated on four artificial data sets and six real-life data sets. Experimental results show that the proposed clustering algorithm is superior or comparable to the k-means algorithm and to several evolutionary clustering algorithms recently reported in the literature.

Luo, Xiaohui; Gao, Zhisheng; Wang, Jun; Pei, Zheng

2015-01-01

68

CURE: An Efficient Clustering Algorithm for Large Data sets

clustering algorithm called CURE that is more robust to outliers and identifies clusters having non-spherical shapes and wide variances in size. CURE achieves this by representing each cluster by a certain fixed number of points that are generated by selecting well-scattered points from the cluster and then

Sudipto Guha; Rajeev Rastogi; Kyuseok Shim

1998-01-01

69

Scalable Clustering Algorithms with Balancing Constraints

... and Computer Engg., University of Texas at Austin, ghosh@ece.utexas.edu. Abstract: Clustering methods for data ... and populating the initial clusters with the remaining data, followed by refinements. First, we show that a simple ... to populate and refine the clusters. The algorithm for populating the clusters is based on a generalization

Banerjee, Arindam

70

Multi-Parent Clustering Algorithms from Stochastic Grammar Data Models

NASA Technical Reports Server (NTRS)

We introduce a statistical data model and an associated optimization-based clustering algorithm which allows data vectors to belong to zero, one or several "parent" clusters. For each data vector the algorithm makes a discrete decision among these alternatives. Thus, a recursive version of this algorithm would place data clusters in a Directed Acyclic Graph rather than a tree. We test the algorithm with synthetic data generated according to the statistical data model. We also illustrate the algorithm using real data from large-scale gene expression assays.

Mjolsness, Eric; Castano, Rebecca; Gray, Alexander

1999-01-01

71

Automatic Detection Algorithm of Fuzzy Kohonen Clustering Networks' Topological Structure

We propose a new algorithm for automatically detecting the topological structure of Fuzzy Kohonen Clustering Networks (FKCN). Using the Hong validity index as the basis of detection, the new algorithm can stably and accurately find a network topology suited to a given data set. It thereby improves the efficiency and accuracy of clustering analysis and achieves a better

Yan, Ying; Zeng, Wen-hua; Jiang, Qing-shan

72

An unsupervised self-optimizing gene clustering algorithm.

We have devised a gene-clustering algorithm that is completely unsupervised in that no parameters need be set by the user, and the clustering of genes is self-optimizing to yield the set of clusters that minimizes within-cluster distance and maximizes between-cluster distance. This algorithm was implemented in Java, and tested on a randomly selected 200-gene subset of 3000 genes from cell-cycle data in S. cerevisiae. AlignACE was used to evaluate the resulting optimized cluster set for upstream cis-regulons. The optimized cluster set was found to be of comparable quality to cluster sets obtained by two established methods (complete linkage and k-means), even when provided with only a small, randomly selected subset of the data (200 vs 3000 genes), and with absolutely no supervision. MAP and specificity scores of the highest ranking motifs identified in the largest clusters were comparable. PMID:12463911

Schachter, Asher D.; Kohane, Isaac S.

2002-01-01

73

Fast k-means algorithm clustering

k-means has recently been recognized as one of the best algorithms for clustering unsupervised data. Since k-means depends mainly on distance calculations between all data points and the centers, the time cost will be high when the dataset is large (for example, more than 500 million points). We propose a two-stage algorithm to reduce the time cost of distance calculation for huge datasets. The first stage is a fast distance calculation that uses only a small portion of the data to produce the best possible location of the centers. The second stage is a slow distance calculation whose initial centers are taken from the first stage. The terms fast and slow refer to the speed at which the centers move. In the slow stage, the whole dataset can be used to get the exact location of the centers. The time cost of the distance calculation for the fast stage is very low due to the small size of the training data chosen. The time cost of the distance calculation for the slow stage is also mi...

Salman, Raied; Li, Qi; Strack, Robert; Test, Erik

2011-01-01
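The two-stage idea described in this abstract can be sketched with plain Lloyd iterations: run k-means on a small random sample first (the "fast" stage), then use the resulting centers to initialize k-means on the full data (the "slow" stage). This is a minimal sketch, not the authors' implementation; the sample size, the random choice of initial centers within the sample, and the helper names `lloyd` and `two_stage_kmeans` are assumptions.

```python
import random


def _sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def lloyd(points, centers, max_iter=100):
    """Plain Lloyd iterations of k-means starting from the given centers."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: _sq_dist(p, centers[i]))
            groups[nearest].append(p)
        # New center = mean of its group; an empty group keeps its old center.
        new_centers = [
            tuple(sum(coord) / len(g) for coord in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def two_stage_kmeans(points, k, sample_size, seed=0):
    rng = random.Random(seed)
    sample = rng.sample(points, sample_size)
    # Stage 1 ("fast"): iterate on a small random sample only.
    centers = lloyd(sample, rng.sample(sample, k))
    # Stage 2 ("slow"): refine on the whole data set, starting from stage-1 centers.
    return lloyd(points, centers)
```

Because stage 2 starts close to a solution, it usually needs only a few passes over the full data, which is where the saving in distance calculations comes from; how small the sample can be before stage 1 misplaces centers depends on the data.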

74

A new particle swarm optimization algorithm for dynamic image clustering

In this paper, we present ACPSO, a new dynamic image clustering algorithm based on particle swarm optimization. ACPSO can partition an image into compact and well-separated clusters without any knowledge of the real number of clusters. It uses a swarm of variable-length particles, which evolve dynamically using mutation operators. Experimental results on real images demonstrate that

Salima Ouadfel; Mohamed Batouche; Abdelmalik Taleb-Ahmed

2010-01-01

75

Incremental Clustering Algorithm For Earth Science Data Mining

Remote sensing data play a key role in understanding complex geographic phenomena. Clustering is a useful tool for discovering interesting patterns and structures within multivariate geospatial data. One of the key issues in clustering is the specification of an appropriate number of clusters, which is not obvious in many practical situations. In this paper, we provide an extension of the G-means algorithm that automatically learns the number of clusters present in the data and avoids overestimating the number of clusters. Experimental evaluation on simulated and remotely sensed image data shows the effectiveness of our algorithm.

Vatsavai, Raju [ORNL]

2009-01-01

76

Probabilistic analysis of the RNN-CLINK clustering algorithm

Clustering is among the oldest techniques used in data mining applications. Typical implementations of hierarchical agglomerative clustering methods (HACM) require O(N²) space for N data objects, making such algorithms impractical for problems involving large datasets. The well-known clustering algorithm RNN-CLINK requires only O(N) space, but O(N³) time in the worst case, although the average time appears

Sheau-Dong Lang; Li-Jen Mao; Wen-Lin Hsu

1999-01-01

77

Far efficient K-means clustering algorithm

Clustering in data analysis means data with similar features are grouped together within a particular valid cluster. Each cluster consists of data that are more similar among themselves and dissimilar to data of other clusters. Clustering can be viewed as an unsupervised learning concept from machine learning perspective. In this paper, we have proposed an effective method to obtain better

Bikram Keshari Mishra; Amiya Rath; Nihar Ranjan Nayak; Sagarika Swain

2012-01-01

78

Potential Function Agglomeration Clustering Algorithm for Sparse Component Analysis

In this paper, the Potential Function Agglomeration Clustering (PFAC) algorithm is proposed for estimating the mixing matrix in underdetermined Sparse Component Analysis (SCA), in which the number of mixtures is less than the number of sources. In contrast to many existing SCA methods, the PFAC algorithm can accurately estimate both the number of sources and the mixing matrix. The algorithm

Ye Zhang; Fei Li; Jianhua Wu

2010-01-01

79

Clustering algorithms: Sensitivity of mass determination using A 3581

NASA Astrophysics Data System (ADS)

In this paper, we discuss various clustering methods for estimating the mass of a galaxy cluster, focusing on the cluster A 3581. Possible galaxy cluster candidates are selected using virtual observatory (VO) tools. Using Kaye's Mixture Model (KMM) algorithm and the Gaussian Mixture Model (GMM), we determine the most likely cluster member candidates. We then compare the results to those obtained with SIMBAD's method of hierarchy. The mass of A 3581 was calculated and checked against literature values. We discuss the sensitivity of the mass determination and show that the GMM provides a very robust method for determining member candidates for cluster A 3581.

Wilson, S.; Oozeer, N.; Loubser, S. I.

2013-04-01

80

Using Clustering Algorithms in Legacy Systems Remodularization

Motivated by the observation that cluster analysis and the remodularization of software systems solve similar problems, we have researched both areas in order to provide a theoretical background for applying cluster analysis to systems remodularization. We present an overview of cluster analysis and of systems remodularization. It appears that system remodularization techniques often either reinvent clustering

T. A. Wiggerts

1997-01-01

81

Many types of clustering techniques for chemical structures have been used in the literature, but it is known that no single method always gives the best results for all types of applications. Recent work on consensus clustering methods is motivated by the success of combining multiple classifiers in many areas and by the ability of consensus clustering to improve the robustness, novelty, consistency and stability of individual clusterings. In this paper, the Cluster-based Similarity Partitioning Algorithm (CSPA) was examined for improving the quality of chemical structure clustering. The effectiveness of clustering was evaluated by the ability to separate active from inactive molecules in each cluster, and the results were compared with those of Ward's clustering method. The MDL Drug Data Report (MDDR) database was used for the experiments. The results obtained by combining multiple clusterings showed that the consensus clustering method can improve the robustness, novelty and stability of chemical structure clustering. PMID:24429501

Saeed, Faisal; Salim, Naomie; Abdo, Ammar

2014-01-01
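CSPA, as it is usually described, first turns r input clusterings into an n×n similarity matrix whose (a, b) entry is the fraction of clusterings that place objects a and b in the same cluster, and then repartitions that similarity graph (originally with the METIS graph partitioner). The sketch below computes that matrix directly but, as a labeled simplification, swaps METIS for connected components over edges whose similarity exceeds a threshold. The names `co_association` and `consensus_labels` and the 0.5 threshold are hypothetical, not taken from this paper.

```python
def co_association(clusterings):
    """CSPA similarity: fraction of clusterings that put objects a and b together."""
    n, r = len(clusterings[0]), len(clusterings)
    return [
        [sum(lab[a] == lab[b] for lab in clusterings) / r for b in range(n)]
        for a in range(n)
    ]


def consensus_labels(clusterings, threshold=0.5):
    """Stand-in for METIS: connected components of the graph of pairs with similarity > threshold."""
    s = co_association(clusterings)
    n = len(s)
    labels = [-1] * n
    next_label = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:  # depth-first search over strongly co-clustered pairs
            a = stack.pop()
            for b in range(n):
                if labels[b] == -1 and s[a][b] > threshold:
                    labels[b] = next_label
                    stack.append(b)
        next_label += 1
    return labels
```

For example, the three clusterings `[0, 0, 1, 1, 2]`, `[1, 1, 0, 0, 0]` and `[0, 0, 0, 1, 1]` agree that objects 0 and 1 belong together and, by a two-thirds majority, that objects 2, 3 and 4 do, so the consensus is two groups.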

82

A Clustering Algorithm for Recombinant Jazz Improvisations

Music, one of the most structurally analyzed forms of human creativity, provides an opportune platform for computer simulation of human artistic choice. This thesis addresses the question of how well a computer model can capture and imitate the improvisational style of a jazz soloist. How closely can improvisational style be approximated by a set of rules? Can a computer program write music that, even to the trained ear, is indistinguishable from a piece improvised by a well-known player? We discuss computer models for jazz improvisation and introduce a new system, Recombinant Improvisations from Jazz Riffs (Riff Jr.), based on Hidden Markov Models, the global structure of jazz solos, and a clustering algorithm. Our method's improvements stem largely from the attention paid to the full structure of improvisations. To verify the effectiveness of our program, we tested whether listeners could tell the difference between human solos and computer improvisations. In a survey asking subjects to identify which of four solos were by Charlie Parker and which

Jonathan Gillick

2009-01-01

83

APPROXIMATION ALGORITHMS FOR CLUSTERING TO MINIMIZE THE SUM OF DIAMETERS

We consider the problem of partitioning the nodes of a complete edge-weighted graph into k clusters so as to minimize the sum of the diameters of the clusters. Since the problem is NP-complete, our focus is on the development of good approximation algorithms. When edge weights satisfy the triangle inequality, we present the first approximation algorithm for the problem. The approximation algorithm yields a solution that has no more than 10k clusters such that the total diameter of these clusters is within a factor O(log(n/k)) of the optimal value for k clusters, where n is the number of nodes in the complete graph. For any fixed k, we present an approximation algorithm that produces k clusters whose total diameter is at most twice the optimal value. When the distances are not required to satisfy the triangle inequality, we show that, unless P = NP, for any ρ ≥ 1 there is no polynomial-time approximation algorithm that can provide a performance guarantee of ρ, even when the number of clusters is fixed at 3. Other results include a polynomial-time algorithm for the problem when the underlying graph is a tree with edge weights.

Kopp, S.; Mortveit, H.S.; Reidys, S.M.

2000-02-01

84

The Enhanced Hoshen-Kopelman Algorithm for Cluster Analysis

NASA Astrophysics Data System (ADS)

In 1976, Hoshen and Kopelman (J. Hoshen and R. Kopelman, Phys. Rev. B 14, 3438 (1976)) introduced a breakthrough algorithm for cluster analysis, known today as the Hoshen-Kopelman algorithm. This algorithm revolutionized Monte Carlo cluster calculations in percolation theory because it enables the analysis of very large lattices containing 10^11 or more sites. Initially, the HK algorithm was used primarily in the pure and basic sciences. Later it began finding applications in diverse fields of technology and applied science. Examples of such applications are two- and three-dimensional image analysis, composite material modeling, polymers, remote sensing, brain modeling and food processing. While the original HK algorithm provides cluster size data for only one class of sites, the Enhanced HK (EHK) algorithm presented in this paper enables calculation of cluster spatial moments (characteristics of cluster shapes) for multiple classes of sites. These enhancements preserve the time and space complexities of the original HK algorithm, so that very large lattices can still be analyzed for cluster sizes, classes and shapes simultaneously in a single pass through the lattice.

Hoshen, Joseph

1997-08-01
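The core of the HK algorithm is a single raster scan that assigns provisional labels and merges them with a union-find ("label of labels") structure, keeping only the previous row of labels in memory. The sketch below follows that idea and, loosely in the spirit of the EHK extension described above, treats each nonzero grid value as a site class and accumulates per-cluster size and first spatial moments (sums of row and column indices) during the same pass, folding them together whenever two provisional clusters merge. It is an illustrative sketch on a 2D grid with 4-neighbour connectivity, not the paper's EHK code; the function name and output format are assumptions.

```python
def hoshen_kopelman(grid):
    """Single raster pass over `grid` (list of rows; 0 = empty, k > 0 = site class k).

    4-neighbour sites of the same class join one cluster. Returns a list of
    (class, size, mean_row, mean_col) tuples, one per cluster.
    """
    parent = [0]    # union-find over provisional labels; label 0 means "no label"
    stats = [None]  # stats[label] = [class, size, sum_row, sum_col]; valid at roots

    def find(label):
        while parent[label] != label:
            parent[label] = parent[parent[label]]  # path halving
            label = parent[label]
        return label

    prev, prev_row = [], None  # labels and values of the previous row only
    for r, row in enumerate(grid):
        cur = [0] * len(row)
        for c, cls in enumerate(row):
            if cls == 0:
                continue
            up = find(prev[c]) if prev_row is not None and prev_row[c] == cls else 0
            left = find(cur[c - 1]) if c > 0 and row[c - 1] == cls else 0
            if not up and not left:
                label = len(parent)  # open a new provisional cluster
                parent.append(label)
                stats.append([cls, 0, 0, 0])
            elif up and left and up != left:
                label, other = min(up, left), max(up, left)
                parent[other] = label  # merge: fold the other root's moments into label
                for i in (1, 2, 3):
                    stats[label][i] += stats[other][i]
            else:
                label = up or left
            cur[c] = label
            s = stats[label]
            s[1] += 1
            s[2] += r
            s[3] += c
        prev, prev_row = cur, row
    return [
        (s[0], s[1], s[2] / s[1], s[3] / s[1])
        for label, s in enumerate(stats)
        if label and parent[label] == label
    ]
```

On the U-shaped grid `[[1, 0, 1], [1, 1, 1]]`, the two arms start with different provisional labels and are merged when the scan reaches the bottom-right site, giving one cluster of size 5 whose mean position is row 0.6, column 1.0.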

85

A fuzzy clustering algorithm to detect planar and quadric shapes

NASA Technical Reports Server (NTRS)

In this paper, we introduce a new fuzzy clustering algorithm to detect an unknown number of planar and quadric shapes in noisy data. The proposed algorithm is computationally and implementationally simple, and it overcomes many of the drawbacks of the existing algorithms that have been proposed for similar tasks. Since the clustering is performed in the original image space, and since no features need to be computed, this approach is particularly suited for sparse data. The algorithm may also be used in pattern recognition applications.

Krishnapuram, Raghu; Frigui, Hichem; Nasraoui, Olfa

1992-01-01

86

The Ordered Clustered Travelling Salesman Problem: A Hybrid Genetic Algorithm

The ordered clustered travelling salesman problem is a variation of the usual travelling salesman problem in which the vertices of the network (except the starting vertex) are divided into prespecified clusters. The objective is to find the least-cost Hamiltonian tour in which the vertices of any cluster are visited contiguously and the clusters are visited in the prespecified order. The problem is NP-hard, and it arises in practical transportation and sequencing problems. This paper develops a hybrid genetic algorithm using sequential constructive crossover, 2-opt search, and a local search to obtain heuristic solutions to the problem. The efficiency of the algorithm has been examined against two existing algorithms on asymmetric and symmetric TSPLIB instances of various sizes. The computational results show that the proposed algorithm is very effective in terms of solution quality and computational time. Finally, we present solutions to some additional symmetric TSPLIB instances. PMID:24701148

Ahmed, Zakir Hussain

2014-01-01

87

A Fast Implementation of the ISODATA Clustering Algorithm

NASA Technical Reports Server (NTRS)

Clustering is central to many image processing and remote sensing applications. ISODATA is one of the most popular and widely used clustering methods in geoscience applications, but it can run slowly, particularly with large data sets. We present a more efficient approach to ISODATA clustering, which achieves better running times by storing the points in a kd-tree and through a modification of the way in which the algorithm estimates the dispersion of each cluster. We also present an approximate version of the algorithm which allows the user to further improve the running time, at the expense of lower fidelity in computing the nearest cluster center to each point. We provide both theoretical and empirical justification that our modified approach produces clusterings that are very similar to those produced by the standard ISODATA approach. We also provide empirical studies on both synthetic data and remotely sensed Landsat and MODIS images that show that our approach has significantly lower running times.

Memarsadeghi, Nargess; Mount, David M.; Netanyahu, Nathan S.; LeMoigne, Jacqueline

2005-01-01

88

Parallelizing the Fuzzy ARTMAP Algorithm on a Beowulf Cluster

Jimmy Secretan, José Castro ... However, the time that it takes Fuzzy ARTMAP to converge to a solution increases rapidly as the number ... with the match-tracking mechanism. Results run on a Beowulf cluster with a well-known large database (Forrest

89

Efficient Clustering Algorithms for Self-Organizing Wireless Sensor Networks

Rajesh Krishnan ... Abstract: Self-organization of wireless sensor networks, which involves network decomposition ... organization in wireless sensor networks. We first present a novel approach for message-efficient clustering, in which

Starobinski, David

90

A Fast Algorithm for Subspace Clustering by Pattern Similarity

... data analysis, target marketing, web usage analysis, etc. However, state-of-the-art pattern-based clustering methods (e.g., the pCluster algorithm) can only handle datasets of thousands of records, which ... expression on a genome-wide scale. By quantifying the relative abundance of thousands of mRNA transcripts

Pei, Jian

91

WCA: A Weighted Clustering Algorithm for Mobile Ad Hoc Networks

In this paper, we propose an on-demand distributed clustering algorithm for multi-hop packet radio networks. These types of networks, also known as ad hoc networks, are dynamic in nature due to the mobility of nodes. The association and dissociation of nodes to and from clusters perturb the stability of the network topology, and hence a reconfiguration of the system is often

Mainak Chatterjee; Sajal K. Das; Damla Turgut

2002-01-01

92

A New Clustering Algorithm Based Upon Flocking On Complex Network

We propose a model based on flocking on a complex network, and then develop two clustering algorithms on the basis of it. In the algorithms, a k-nearest-neighbor (kNN) graph, weighted and directed, is first built over all data points in a dataset, each of which is regarded as an agent that can move in space,

Qiang Li; Yan He; Jing-ping Jiang

2008-01-01

93

CCL: an algorithm for the efficient comparison of clusters

The systematic comparison of the atomic structure of solids and clusters has become an important task in crystallography, chemistry, physics and materials science, in particular in the context of structure prediction and structure determination of nanomaterials. In this work, an efficient and robust algorithm for the comparison of cluster structures is presented, which is based on the mapping of the point patterns of the two clusters onto each other. This algorithm has been implemented as the module CCL in the structure visualization and analysis program KPLOT. PMID:23682193

Hundt, R.; Schön, J. C.; Neelamraju, S.; Zagorac, J.; Jansen, M.

2013-01-01

94

A Novel Complex Networks Clustering Algorithm Based on the Core Influence of Nodes

In complex networks, cluster structure, identified by the heterogeneity of nodes, has become a common and important topological property. Network clustering methods are thus significant for the study of complex networks. Currently, many typical clustering algorithms have weaknesses such as inaccuracy and slow convergence. In this paper, we propose a clustering algorithm based on the core influence of nodes. The clustering process simulates the process of cluster formation in sociology. The algorithm detects the nodes with core influence through their betweenness centrality, and builds each cluster's core structure with discriminant functions. It then obtains the final cluster structure by assigning the remaining nodes in the network with an optimization method. Experiments on different datasets show that the clustering accuracy of this algorithm is superior to that of the classical Fast-Newman algorithm. It also clusters faster and helps reveal the true cluster structure of complex networks more precisely. PMID:24741359

Dai, Bin; Xie, Zhongyu

2014-01-01

95

Efficient Cluster Algorithm for CP(N-1) Models

Despite several attempts, no efficient cluster algorithm has been constructed for CP(N-1) models in the standard Wilson formulation of lattice field theory. In fact, there is a no-go theorem that prevents the construction of an efficient Wolff-type embedding algorithm. In this paper, we construct an efficient cluster algorithm for ferromagnetic SU(N)-symmetric quantum spin systems. Such systems provide a regularization for CP(N-1) models in the framework of D-theory. We present detailed studies of the autocorrelations and find a dynamical critical exponent that is consistent with z = 0.

B. B. Beard; M. Pepe; S. Riederer; U.-J. Wiese

2006-02-14

96

Symmetric nonnegative matrix factorization: algorithms and applications to probabilistic clustering.

Nonnegative matrix factorization (NMF) is an unsupervised learning method useful in various applications, including image processing and semantic analysis of documents. This paper focuses on symmetric NMF (SNMF), which is a special case of NMF decomposition. Three parallel multiplicative update algorithms that directly use level 3 Basic Linear Algebra Subprograms (BLAS) are developed for this problem. First, by minimizing the Euclidean distance, a multiplicative update algorithm is proposed, and its convergence under mild conditions is proved. Based on it, we further propose two fast parallel methods: the α-SNMF and β-SNMF algorithms. All of them are easy to implement. These algorithms are applied to probabilistic clustering. We demonstrate their effectiveness for facial image clustering, document categorization, and pattern clustering in gene expression. PMID:22042156

He, Zhaoshui; Xie, Shengli; Zdunek, Rafal; Zhou, Guoxu; Cichocki, Andrzej

2011-12-01

97

Multilayer cellular neural network and fuzzy C-mean classifiers: comparison and performance analysis

NASA Astrophysics Data System (ADS)

Neural networks and fuzzy systems are considered two of the most important artificial intelligence techniques; both provide classification capabilities obtained through different learning schemes, which capture knowledge and process it according to particular rule-based algorithms. These methods are especially suited to exploiting the tolerance for uncertainty and vagueness in cognitive reasoning. By applying these methods with relevant knowledge-based rules extracted using different data analysis tools, it is possible to obtain robust classification performance for a wide range of applications. This paper focuses on non-destructive testing quality control systems, in particular the classification of metallic structures according to corrosion time using a novel cellular neural network architecture, which is explained in detail. Additionally, we compare these results with those obtained using the Fuzzy C-means clustering algorithm and analyse both classifiers according to their classification capabilities.

Trujillo San-Martin, Maite; Hlebarov, Vejen; Sadki, Mustapha

2004-11-01

98

Sampling Within k-Means Algorithm to Cluster Large Datasets

Because of current data collection technology, our ability to gather data has surpassed our ability to analyze it. In particular, k-means, one of the simplest and fastest clustering algorithms, is ill-equipped to handle extremely large datasets even on the most powerful machines. Our new algorithm uses a sample from a dataset to decrease runtime by reducing the amount of data analyzed. We perform a simulation study to compare our sampling-based k-means to the standard k-means algorithm by analyzing both the speed and the accuracy of the two methods. Results show that our algorithm is significantly more efficient than the existing algorithm, with comparable accuracy. Further work on this project might include a more comprehensive study on more varied test datasets as well as on real weather datasets. This is especially important considering that this preliminary study was performed on rather tame datasets. Such a study should also analyze the performance of the algorithm for varied values of k. Lastly, this paper showed that the algorithm was accurate for relatively low sample sizes. We would like to analyze this further to see how accurate the algorithm is for even lower sample sizes. By manipulating width and confidence level, we could find the lowest sample sizes for which the algorithm would be acceptably accurate. For our algorithm to be a success, it needs to meet two benchmarks: match the accuracy of the standard k-means algorithm and significantly reduce runtime. Both goals are accomplished for all six datasets analyzed. However, on datasets of three and four dimensions, as the data become more difficult to cluster, both algorithms fail to obtain the correct classifications on some trials. Nevertheless, our algorithm consistently matches the performance of the standard algorithm while being remarkably more efficient in runtime. Therefore, we conclude that analysts can use our algorithm and expect accurate results in considerably less time.

Bejarano, Jeremy [Brigham Young University]; Bose, Koushiki [Brown University]; Brannan, Tyler [North Carolina State University]; Thomas, Anita [Illinois Institute of Technology]; Adragni, Kofi [University of Maryland]; Neerchal, Nagaraj [University of Maryland]; Ostrouchov, George [ORNL]

2011-08-01

99

Algorithms for Gene Clustering Analysis on Genomes

multiple genomes, there is a need to develop efficient algorithms for these large-scale applications that can help us understand the functions of genes. The overall objective of my research was to develop improved methods which can automatically assign...

Yi, Gang Man

2012-07-16

100

Probabilistic analysis of the RNN-CLINK clustering algorithm

NASA Astrophysics Data System (ADS)

Clustering is among the oldest techniques used in data mining applications. Typical implementations of hierarchical agglomerative clustering methods (HACM) require O(N²) space for N data objects, making such algorithms impractical for problems involving large datasets. The well-known clustering algorithm RNN-CLINK requires only O(N) space, but O(N³) time in the worst case, although the average time appears to be O(N² log N). We provide a probabilistic interpretation of the average time complexity of the algorithm. We also report experimental results, using randomly generated bit vectors and NETNEWS articles as input, to support our theoretical analysis.

Lang, Sheau-Dong; Mao, Li-Jen; Hsu, Wen-Lin

1999-02-01

101

A Decentralized Fuzzy C-Means-Based Energy-Efficient Routing Protocol for Wireless Sensor Networks

Energy conservation in wireless sensor networks (WSNs) is a vital consideration when designing wireless networking protocols. In this paper, we propose a Decentralized Fuzzy Clustering Protocol, named DCFP, which minimizes total network energy dissipation to promote maximum network lifetime. The infrastructure for a given WSN is constructed only once, at the beginning of the protocol, at a base station, and it remains unchanged throughout the network's lifetime. In this initial construction step, a fuzzy C-means algorithm is adopted to allocate sensor nodes to their most appropriate clusters. Subsequently, the protocol runs in rounds, each divided into a CH-Election phase and a Data Transmission phase. In the CH-Election phase, new cluster heads are elected locally in each cluster, using a new multicriteria objective function proposed to enhance the quality of the elected cluster heads. In the Data Transmission phase, each sensor node senses and transmits data to its respective cluster head, and the cluster heads in turn aggregate the sensed data and send it to the base station. Simulation results demonstrate that the proposed protocol improves network lifetime, data delivery, and energy consumption compared to other well-known energy-efficient protocols. PMID:25162060

2014-01-01
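DCFP's one-off setup step assigns sensor nodes to clusters with fuzzy c-means. The standard FCM iteration alternates two updates: each center is the mean of all points weighted by u_ij^m, v_j = Σ_i u_ij^m x_i / Σ_i u_ij^m, and each membership is u_ij = 1 / Σ_k (‖x_i − v_j‖ / ‖x_i − v_k‖)^(2/(m−1)). The sketch below is a minimal version of that textbook iteration, not DCFP's code: the fuzzifier m = 2, random initial memberships, Euclidean distance, the stopping rule and the name `fuzzy_c_means` are assumptions, and the cluster-head election phase is not modeled.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Standard fuzzy c-means: returns (centers, memberships u[i][j])."""
    rng = random.Random(seed)
    dim = len(points[0])
    # Random initial memberships; each point's memberships sum to 1.
    u = []
    for _ in points:
        w = [rng.random() for _ in range(c)]
        total = sum(w)
        u.append([x / total for x in w])
    exponent = 2.0 / (m - 1.0)
    centers = []
    for _ in range(max_iter):
        # Center update: mean of the points weighted by u_ij ** m.
        centers = []
        for j in range(c):
            w = [row[j] ** m for row in u]
            total = sum(w)
            centers.append(
                tuple(sum(wi * p[d] for wi, p in zip(w, points)) / total for d in range(dim))
            )
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)).
        new_u = []
        for p in points:
            dist = [math.dist(p, v) for v in centers]
            if min(dist) == 0.0:
                # Point sits exactly on a center: share membership among coinciding centers.
                hits = [1.0 if d == 0.0 else 0.0 for d in dist]
                new_u.append([h / sum(hits) for h in hits])
            else:
                new_u.append([1.0 / sum((dj / dk) ** exponent for dk in dist) for dj in dist])
        shift = max(abs(a - b) for old, new in zip(u, new_u) for a, b in zip(old, new))
        u = new_u
        if shift < tol:
            break
    return centers, u
```

To obtain the crisp node-to-cluster allocation that a protocol like DCFP needs, each node can be assigned to the cluster with its largest membership value.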

102

A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise

Clustering algorithms are attractive for the task of class identification in spatial databases. However, the application to large spatial databases raises the following requirements for clustering algorithms: minimal requirements of domain knowledge to determine the input parameters, discovery of clusters with arbitrary shape, and good efficiency on large databases. The well-known clustering algorithms offer no solution to

Martin Ester; Hans-Peter Kriegel; Jörg Sander; Xiaowei Xu

1996-01-01
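The abstract above breaks off before the method itself. For orientation, the density-based idea this paper introduced (DBSCAN) can be sketched as follows: a point with at least `min_pts` points (itself included) within distance `eps` is a core point, clusters grow outward from core points, and points that are not reachable from any core point are labeled noise. This is a minimal sketch with a brute-force O(n²) neighbourhood query; the paper relies on a spatial index (an R*-tree) for efficiency, and the function name `dbscan` and the `NOISE = -1` convention are illustrative choices.

```python
import math

NOISE = -1


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)  # None = not yet visited

    def region_query(i):
        # Brute-force eps-neighbourhood (includes the point itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(i)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region_query(j)
            if len(j_neighbours) >= min_pts:  # j is also a core point: keep expanding
                queue.extend(j_neighbours)
        cluster += 1
    return labels
```

Only `eps` and `min_pts` must be chosen by the user, and clusters of arbitrary shape emerge from chains of core points, which corresponds to the requirements listed in the abstract.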

103

Performance Evaluation of Some Clustering Algorithms and Validity Indices

In this article, we evaluate the performance of three clustering algorithms, hard K-Means, single linkage, and a simulated annealing (SA) based technique, in conjunction with four cluster validity indices, namely the Davies-Bouldin index, Dunn's index, the Calinski-Harabasz index, and a recently developed index I. Based on a relation between index I and Dunn's index, a lower bound of the value of the

Ujjwal Maulik; Sanghamitra Bandyopadhyay

2002-01-01
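One of the validity indices named in this abstract, the Davies-Bouldin index, can be computed directly from a labelled partition. The sketch below makes three assumptions:

- Euclidean distance;
- centroid prototypes;
- scatter defined as the mean distance of a cluster's points to its centroid.

It requires at least two clusters with distinct centroids. Index I and the SA-based technique are not reproduced here.

```python
import math


def centroid(cluster):
    """Coordinate-wise mean of a list of points."""
    dim = len(cluster[0])
    return tuple(sum(p[d] for p in cluster) / len(cluster) for d in range(dim))


def davies_bouldin(clusters):
    """Davies-Bouldin index of a partition given as a list of point lists (lower is better)."""
    cents = [centroid(c) for c in clusters]
    # Scatter: average distance of each cluster's points to its own centroid.
    scatter = [sum(math.dist(p, cents[i]) for p in c) / len(c)
               for i, c in enumerate(clusters)]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # Worst-case similarity of cluster i to any other cluster.
        total += max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
                     for j in range(k) if j != i)
    return total / k
```

Lower values are better. Take the points (0, 0), (0, 2), (10, 0) and (10, 2). Splitting them into the two vertical pairs gives 0.2, whereas the two horizontal pairs give 5.0.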

104

A Domain Driven Mining Algorithm on Gene Sequence Clustering

Recent biological experiments argue that similar gene sequences, as measured by permutation of the nucleotides, do not necessarily share functional similarity. As a result, state-of-the-art clustering algorithms that annotate genes with similar function based solely on sequence composition may fail. The recent study of gene clustering techniques that incorporate prior knowledge of the biological domain is deemed

Yun Xiong; Ming Chen; Yangyong Zhu

105

A gauge invariant cluster algorithm for the Ising spin glass

The frustrated Ising model in two dimensions is revisited. The frustration is quantified in terms of the number of non-trivial plaquettes, which is invariant under the Nishimori gauge symmetry. The exact ground state energy is calculated using Edmonds' algorithm. A novel cluster algorithm is designed which treats gauge-equivalent spin glasses on an equal footing and allows for efficient simulations near criticality. As a first application, the specific heat near criticality is investigated.

K. Langfeld; M. Quandt; W. Lutz; H. Reinhardt

2006-06-14

106

Two-Stage Clustering with k Means Algorithm

k-means has recently been recognized as one of the best algorithms for clustering unsupervised data. Because k-means relies mainly on computing the distances between all data points and the centers, its cost is high when the dataset is large (for example, more than 500MG points). We suggested a two-stage algorithm to reduce the cost

Raied Salman; Vojislav Kecman; Qi Li; Robert Strack
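The abstract traces the cost of k-means to its distance calculations. To show where that cost comes from, here is plain Lloyd-style k-means. This is not the two-stage method, whose details the abstract does not give. Each iteration computes the distance from every point to every center, so it performs n·k distance evaluations. Initializing from the first k points and the fixed iteration cap are simplifying assumptions.

```python
import math


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means; returns (centers, labels, number of distance evaluations)."""
    centers = [tuple(p) for p in points[:k]]  # naive initialization: first k points
    labels = None
    evals = 0
    for _ in range(max_iter):
        # Assignment step: one distance per (point, center) pair, n * k per iteration.
        new_labels = []
        for p in points:
            dists = [math.dist(p, c) for c in centers]
            evals += len(dists)
            new_labels.append(dists.index(min(dists)))
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Update step: move each center to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return centers, labels, evals
```

Doubling either the number of points or the number of centers doubles the distance work per iteration. This is the cost the two-stage approach aims to reduce.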

107

We address the problem of identifying skin color, and we adapt a spatial data mining method to this task and integrate it with a segmentation method to identify significant … reflects the light in a different way compared with other surfaces, we relied once again on data mining for the skin-color segmentation problem.

Chahir, Youssef

108

An Energy Efficient Hierarchical Clustering Algorithm for Wireless Sensor Networks

A wireless network consisting of a large number … (approximately 1 cubic millimeter) sensors. An ad-hoc wireless network of large numbers of such inexpensive … (Seema; IN, USA; {seema, coyle}@ecn.purdue.edu)

Chen, Ing-Ray

109

A weighted clustering algorithm for clarifying vehicle GPS traces

This paper presents a weighted clustering algorithm based on the physical attraction model. It improves the model by assigning each position point on a GPS trace a weight according to its velocity and directional changes, which yields faster convergence. The physical attraction model pulls together traces that belong on the same road in response to simulated

Jing Wang; Xiaoping Rui; Xianfeng Song; Chaoling Wang; Lingli Tang; Chuanrong Li; Venkatesh Raghvan

2011-01-01

110

High-dimensional cluster analysis with the masked EM algorithm.

Cluster analysis faces two problems in high dimensions: the "curse of dimensionality" that can lead to overfitting and poor generalization performance and the sheer time taken for conventional algorithms to process large amounts of high-dimensional data. We describe a solution to these problems, designed for the application of spike sorting for next-generation, high-channel-count neural probes. In this problem, only a small subset of features provides information about the cluster membership of any one data vector, but this informative feature subset is not the same for all data points, rendering classical feature selection ineffective. We introduce a "masked EM" algorithm that allows accurate and time-efficient clustering of up to millions of points in thousands of dimensions. We demonstrate its applicability to synthetic data and to real-world high-channel-count spike sorting data. PMID:25149694

Kadir, Shabnam N; Goodman, Dan F M; Harris, Kenneth D

2014-11-01

111

High-dimensional cluster analysis with the Masked EM Algorithm

Cluster analysis faces two problems in high dimensions: first, the “curse of dimensionality” that can lead to overfitting and poor generalization performance; and second, the sheer time taken for conventional algorithms to process large amounts of high-dimensional data. We describe a solution to these problems, designed for the application of “spike sorting” for next-generation high channel-count neural probes. In this problem, only a small subset of features provides information about the cluster membership of any one data vector, but this informative feature subset is not the same for all data points, rendering classical feature selection ineffective. We introduce a “Masked EM” algorithm that allows accurate and time-efficient clustering of up to millions of points in thousands of dimensions. We demonstrate its applicability to synthetic data, and to real-world high-channel-count spike sorting data. PMID:25149694

Kadir, Shabnam N.; Goodman, Dan F. M.; Harris, Kenneth D.

2014-01-01

112

K-Cluster Algorithm for Automatic Discovery of Subgoals in Reinforcement Learning

Options have proven useful for accelerating an agent's learning in many reinforcement learning tasks, and determining useful subgoals is a key step for an agent to create options. A K-cluster algorithm for automatic discovery of subgoals is presented in this paper. This algorithm extracts subgoals from trajectories collected online by clustering them. The experiments show that the K-cluster algorithm

Ben-nian Wang; Yang Gao; Zhao-Qian Chen; Jun-yuan Xie; Shi-Fu Chen

2005-01-01

113

clusterMaker: a multi-algorithm clustering plugin for Cytoscape

Background: In the post-genomic era, the rapid increase in high-throughput data calls for computational tools capable of integrating data of diverse types and facilitating recognition of biologically meaningful patterns within them. For example, protein-protein interaction data sets have been clustered to identify stable complexes, but scientists lack easily accessible tools to facilitate combined analyses of multiple data sets from different types of experiments. Here we present clusterMaker, a Cytoscape plugin that implements several clustering algorithms and provides network, dendrogram, and heat map views of the results. The Cytoscape network is linked to all of the other views, so that a selection in one is immediately reflected in the others. clusterMaker is the first Cytoscape plugin to implement such a wide variety of clustering algorithms and visualizations, including the only implementations of hierarchical clustering, dendrogram plus heat map visualization (tree view), k-means, k-medoid, SCPS, AutoSOME, and native (Java) MCL. Results: Results are presented in the form of three scenarios of use: analysis of protein expression data using a recently published mouse interactome and a mouse microarray data set of nearly one hundred diverse cell/tissue types; the identification of protein complexes in the yeast Saccharomyces cerevisiae; and the cluster analysis of the vicinal oxygen chelate (VOC) enzyme superfamily. For scenario one, we explore functionally enriched mouse interactomes specific to particular cellular phenotypes and apply fuzzy clustering. For scenario two, we explore the prefoldin complex in detail using both physical and genetic interaction clusters. For scenario three, we explore the possible annotation of a protein as a methylmalonyl-CoA epimerase within the VOC superfamily. Cytoscape session files for all three scenarios are provided in the Additional Files section.
Conclusions: The Cytoscape plugin clusterMaker provides a number of clustering algorithms and visualizations that can be used independently or in combination for analysis and visualization of biological data sets, and for confirming or generating hypotheses about biological function. Several of these visualizations and algorithms are only available to Cytoscape users through the clusterMaker plugin. clusterMaker is available via the Cytoscape plugin manager. PMID:22070249

2011-01-01

114

MECHANISTIC-BASED GENETIC ALGORITHM SEARCH ON A BEOWULF CLUSTER OF LINUX PCS

Jin-Ping Gwo … Abstract: A simple genetic algorithm (SGA) was implemented on a cluster of Linux PCs … Beowulf Linux cluster. … effective. Introduction: The advent of Beowulf-style computers has brought cluster computing within the reach …

Hoffman, Forrest M.

115

Locally Linear Embedding Clustering Algorithm for Natural Imagery

The ability to characterize the color content of natural imagery is an important application of image processing. The pixel-by-pixel coloring of images may be viewed naturally as points in color space, and the inherent structure and distribution of these points affords a quantization, through clustering, of the color information in the image. In this paper, we present a novel topologically driven clustering algorithm that permits segmentation of the color features in a digital image. The algorithm blends Locally Linear Embedding (LLE) and vector quantization by mapping color information to a lower dimensional space, identifying distinct color regions, and classifying pixels together based on both a proximity measure and color content. It is observed that these techniques permit a significant reduction in color resolution while maintaining the visually important features of images.

Ziegelmeier, Lori; Peterson, Chris

2012-01-01

116

Clustering with Spectral Norm and the k-means Algorithm

There has been much progress on efficient algorithms for clustering data points generated by a mixture of $k$ probability distributions under the assumption that the means of the distributions are well-separated, i.e., the distance between the means of any two distributions is at least $\Omega(k)$ standard deviations. These results generally make heavy use of the generative model and particular properties

Amit Kumar; Ravindran Kannan

2010-01-01

117

Cluster algorithm for Monte Carlo simulations of spin ice

NASA Astrophysics Data System (ADS)

We present an algorithm for Monte Carlo simulations of a nearest-neighbor spin-ice model based on its cluster representation. To assess its performance, we estimate a relaxation time and find that, in contrast to the Metropolis algorithm, our algorithm does not develop spin freezing. To demonstrate its efficiency, we also calculate the spin and charge structure factors and observe pinch points in a high-resolution color map. We then find that Debye screening operates among defects and brings about short-range correlations, and that the deconfinement transition triggered by the defect fugacity $z$ is dictated by a singular part of the free-energy density, $f_s(z)$.

Otsuka, Hiromi

2014-12-01

118

A Fast Clustering Algorithm for Data with a Few Labeled Instances

The diameter of a cluster is the maximum intracluster distance between pairs of instances within the same cluster, and the split of a cluster is the minimum distance between instances within the cluster and instances outside the cluster. Given a few labeled instances, this paper addresses two problems. First, we present a simple and fast clustering algorithm with the following property: if the ratio of the minimum split to the maximum diameter (RSD) of the optimal solution is greater than one, the algorithm returns optimal solutions for three clustering criteria. Second, we study the metric learning problem: learning a distance metric that makes the RSD as large as possible. Compared with existing metric learning algorithms, one of our metric learning algorithms is computationally efficient: it is a linear programming model rather than the semidefinite programming model used by most existing algorithms. We demonstrate empirically that the supervision and the learned metric can improve the clustering quality.

Yang, Jinfeng; Xiao, Yong; Wang, Jiabing; Ma, Qianli; Shen, Yanhua

2015-01-01
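The diameter, split, and RSD quantities defined in this abstract can be computed by brute force. The minimal sketch below assumes Euclidean distance. It also assumes that at least one cluster has two or more points, since otherwise the maximum diameter is zero. It only evaluates RSD for a given partition and does not implement the paper's clustering or metric-learning algorithms.

```python
import math
from itertools import combinations


def diameter(cluster):
    """Maximum intracluster distance (0.0 for a single-point cluster)."""
    return max((math.dist(a, b) for a, b in combinations(cluster, 2)), default=0.0)


def split(cluster, outside):
    """Minimum distance between a point in the cluster and a point outside it."""
    return min(math.dist(a, b) for a in cluster for b in outside)


def rsd(clusters):
    """Ratio of the minimum split to the maximum diameter over a partition."""
    max_diam = max(diameter(c) for c in clusters)
    min_split = min(
        split(c, [p for j, other in enumerate(clusters) if j != i for p in other])
        for i, c in enumerate(clusters))
    return min_split / max_diam
```

Consider two tight pairs of points, each one unit wide, placed five units apart. Their RSD is 5.0, above the threshold of one that the abstract's optimality guarantee requires.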

119

A New Method to Calculate Weights of Attributes in Spectral Clustering Algorithms

To improve spectral clustering, one existing algorithm takes the weights of different attributes into account when computing the distances used to build the similarity matrix; however, the methods for calculating attribute weights have significant limitations that affect the clustering results. Consequently, inspired by the MIV method, we construct a method named MDIV that is

Zhendong Li; Wei Sun

2011-01-01

120

A new energy-efficient clustering algorithm for Wireless Sensor Networks

In recent years, there has been growing interest in wireless sensor networks. One of the major issues in wireless sensor networks is developing an energy-efficient clustering protocol. Hierarchical clustering algorithms are very important in increasing the network's lifetime. Each clustering algorithm is composed of two phases, the setup phase and the steady-state phase. The hot point in these

Farzad Tashtarian; A. T. Haghighat; Mohsen Tolou Honary; Hamid Shokrzadeh

2007-01-01

121

This paper presents a new memetic algorithm, which is based on the concepts of genetic algorithms (GAs), particle swarm optimization (PSO) and greedy randomized adaptive search procedure (GRASP), for optimally clustering N objects into K clusters. The proposed algorithm is a two phase algorithm which combines a memetic algorithm for the solution of the feature selection problem and a GRASP

Yannis Marinakis; Magdalene Marinaki; Nikolaos F. Matsatsinis; Constantin Zopounidis

2009-01-01

122

Clustering techniques have received attention in many fields of study such as engineering, medicine, biology and data mining. The aim of clustering is to group data points. The K-means algorithm is one of the most common techniques used for clustering. However, the results of K-means depend on the initial state, and the algorithm converges to local optima. In order to overcome local

Taher Niknam; Elahe Taherian Fard; Narges Pourjafarian; Alireza Rousta

2011-01-01

123

A Simple Alternative to Jet-Clustering Algorithms

I describe a class of iterative jet algorithms that are based on maximizing a fixed function of the total 4-momentum rather than clustering of pairs of jets. I describe some of the properties of the simplest examples of this class, appropriate for jets at an $e^+e^-$ machine. These examples are sufficiently simple that many features of the jets that they define can be determined analytically with ease. The jets constructed in this way have some potentially useful properties, including a strong form of infrared safety.

Howard Georgi

2014-08-31

124

Classification of posture maintenance data with fuzzy clustering algorithms

NASA Technical Reports Server (NTRS)

Sensory inputs from the visual, vestibular, and proprioceptive systems are integrated by the central nervous system to maintain postural equilibrium. Sustained exposure to microgravity causes neurosensory adaptation during spaceflight, which results in decreased postural stability until readaptation occurs upon return to the terrestrial environment. Data which simulate sensory inputs under various sensory organization test (SOT) conditions were collected in conjunction with Johnson Space Center postural control studies using a tilt-translation device (TTD). The University of West Florida applied fuzzy c-means (FCM) clustering algorithms to these data with a view toward identifying various states and stages of subjects experiencing such changes. Feature analysis, time step analysis, pooling data, response of the subjects, and the algorithms used are discussed.

Bezdek, James C.

1992-01-01

125

GX-Means: A model-based divide and merge algorithm for geospatial image clustering

One of the practical issues in clustering is the specification of the appropriate number of clusters, which is not obvious when analyzing geospatial datasets, partly because they are huge (both in size and spatial extent) and high dimensional. In this paper we present a computationally efficient model-based split and merge clustering algorithm that incrementally finds model parameters and the number of clusters. Additionally, we attempt to provide insights into this problem and other data mining challenges that are encountered when clustering geospatial data. The basic algorithm we present is similar to the G-means and X-means algorithms; however, our proposed approach avoids certain limitations of these well-known clustering algorithms that are pertinent when dealing with geospatial data. We compare the performance of our approach with the G-means and X-means algorithms. Experimental evaluation on simulated data and on multispectral and hyperspectral remotely sensed image data demonstrates the effectiveness of our algorithm.

Vatsavai, Raju [ORNL]; Symons, Christopher T [ORNL]; Chandola, Varun [ORNL]; Jun, Goo [University of Michigan]

2011-01-01

126

Differential Evolution Based Fuzzy Clustering

NASA Astrophysics Data System (ADS)

In this work, two new fuzzy clustering (FC) algorithms based on Differential Evolution (DE) are proposed. Five well-known data sets, viz. Iris, Wine, Glass, E. Coli and Olive Oil, are used to demonstrate the effectiveness of DEFC-1 and DEFC-2. They are compared with the Fuzzy C-Means (FCM) algorithm and the Threshold Accepting Based Fuzzy Clustering algorithms proposed by Ravi et al. [1]. The Xie-Beni index is used to arrive at the 'optimal' number of clusters. Based on the numerical experiments, we infer that, in terms of least objective function value, these variants can be used as viable alternatives to the FCM algorithm.

Ravi, V.; Aggarwal, Nupur; Chauhan, Nikunj
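The Xie-Beni index used in this work to pick the 'optimal' number of clusters can be sketched as follows. This minimal version assumes Euclidean distance and the common fuzzifier choice m = 2. The memberships and centers are supplied by whichever fuzzy clustering method produced them, such as FCM, DEFC-1 or DEFC-2. Lower values indicate more compact, better-separated partitions. One would typically compute the index for each candidate number of clusters and keep the minimum.

```python
import math


def xie_beni(points, centers, u, m=2.0):
    """Xie-Beni index: fuzzy compactness / (n * minimum squared center separation).

    u[j][i] is the membership of point j in cluster i; lower values are better.
    """
    n = len(points)
    compactness = sum(u[j][i] ** m * math.dist(points[j], centers[i]) ** 2
                      for j in range(n) for i in range(len(centers)))
    min_sep = min(math.dist(centers[i], centers[k]) ** 2
                  for i in range(len(centers)) for k in range(i + 1, len(centers)))
    return compactness / (n * min_sep)
```

Take two crisp clusters {(0, 0), (2, 0)} and {(10, 0), (12, 0)}, with centers (1, 0) and (11, 0). The index is 4 / (4 · 100) = 0.01.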

127

jClustering, an Open Framework for the Development of 4D Clustering Algorithms

We present jClustering, an open framework for the design of clustering algorithms in dynamic medical imaging. We developed this tool because of the difficulty involved in manually segmenting dynamic PET images and the lack of availability of source code for published segmentation algorithms. Providing an easily extensible open tool encourages publication of source code to facilitate the process of comparing algorithms and provide interested third parties with the opportunity to review code. The internal structure of the framework allows an external developer to implement new algorithms easily and quickly, focusing only on the particulars of the method being implemented and not on image data handling and preprocessing. This tool has been coded in Java and is presented as an ImageJ plugin in order to take advantage of all the functionalities offered by this imaging analysis platform. Both binary packages and source code have been published, the latter under a free software license (GNU General Public License) to allow modification if necessary. PMID:23990913

Mateos-Pérez, José María; García-Villalba, Carmen; Pascau, Javier; Desco, Manuel; Vaquero, Juan J.

2013-01-01

128

Evaluation and Comparison of Clustering Algorithms in Analyzing ES Cell Gene Expression Data

Abstract: Many clustering algorithms have been used to analyze microarray gene expression data … some genes unclustered. The first type is most frequently used in the literature and we restrict our …

129

In view of the speckle noise in the SAR images, utilizing the Contourlet's advantages of multiscale, localization, directionality and anisotropy, a new SAR image fusion segmentation algorithm based on the persistence and clustering in the Contourlet domain is proposed in this paper. The algorithm captures the persistence and clustering of the Contourlet transform, which is modeled by HMT and MRF,

Yan Wu; Ping Xiao; Haitao Zong; Xin Wang; Ming Li

2009-01-01

130

Contributions to "k"-Means Clustering and Regression via Classification Algorithms

ERIC Educational Resources Information Center

The dissertation deals with clustering algorithms and transforming regression problems into classification problems. The main contributions of the dissertation are twofold; first, to improve (speed up) the clustering algorithms and second, to develop a strict learning environment for solving regression problems as classification tasks by using…

Salman, Raied

2012-01-01

131

A New Nonparametric Pairwise Clustering Algorithm Based on Iterative Estimation of Distance Profiles

We present a novel pairwise clustering method. Given a proximity matrix of pairwise relations (i.e. pairwise similarity or dissimilarity estimates) between data points, our algorithm extracts the two most prominent clusters in the data set. The algorithm, which is completely nonparametric, iteratively employs a two-step transformation on the proximity matrix. The first step of the transformation represents each point

Shlomo Dubnov; Ran El-yaniv; Yoram Gdalyahu; Elad Schneidman; Naftali Tishby; Golan Yona

2002-01-01

132

Security clustering algorithm based on reputation in hierarchical peer-to-peer network

NASA Astrophysics Data System (ADS)

To address the security problems of the hierarchical P2P network (HPN), the paper presents a security clustering algorithm based on reputation (CABR). The algorithm uses a reputation mechanism to ensure the security of transactions and uses clusters to manage the reputation mechanism. To improve security, reduce the network cost of reputation management, and enhance cluster stability, we select reputation, historical average online time, and network bandwidth as the basic factors of a node's comprehensive performance. Simulation results showed that the proposed algorithm improved security, reduced network overhead, and enhanced cluster stability.

Chen, Mei; Luo, Xin; Wu, Guowen; Tan, Yang; Kita, Kenji

2013-03-01

133

Identifying Prototypical Components in Behaviour Using Clustering Algorithms

Quantitative analysis of animal behaviour is required to understand the task-solving strategies of animals and the underlying control mechanisms. The identification of repeatedly occurring behavioural components is a key element of a structured quantitative description. However, the complexity of most behaviours makes identifying such components a challenging problem. We propose an automatic and objective approach for determining and evaluating prototypical behavioural components. Behavioural prototypes are identified using clustering algorithms and then evaluated with respect to their ability to represent the whole behavioural data set. The prototypes allow for a meaningful segmentation of behavioural sequences. We applied our clustering approach to identify prototypical head movements of blowflies during cruising flight. The results confirm the previously established saccadic gaze strategy: the set of prototypes divides into predominantly translational and predominantly rotational movements. The prototypes also reveal details of the saccadic and intersaccadic flight sections that had not been resolved before. Successful application of the proposed approach to behavioural data shows its ability to automatically identify prototypical behavioural components within a large and noisy database and to evaluate them with respect to their quality and stability. Hence, this approach might be applied to a broad range of behavioural and neural data obtained from different animals and in different contexts. PMID:20179763

Braun, Elke; Geurten, Bart; Egelhaaf, Martin

2010-01-01

134

Existing cluster analysis methods are reviewed and a new approach using a rank order clustering algorithm is described which is particularly relevant to the problem of machine-component group formation. A relaxation and regrouping procedure is developed whereby the basic rank order clustering method may be extended to the case where there are bottleneck machines.

J. R. KING

1980-01-01
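The basic rank order clustering step is sketched below, following the commonly taught form of King's ROC:

1. Read each row of the binary machine-component matrix as a binary number and sort the rows in decreasing order.
2. Do the same for the columns.
3. Repeat until neither order changes.

The relaxation and regrouping procedure for bottleneck machines is not included. The example matrix is made up.

```python
def rank_order_clustering(matrix, max_iter=100):
    """King-style rank order clustering of a 0/1 machine-component matrix.

    Returns (row_order, col_order) as index lists into the original matrix.
    """
    rows = list(range(len(matrix)))
    cols = list(range(len(matrix[0])))

    def row_value(r):
        # Read row r as a binary number; the leftmost current column is most significant.
        return int("".join(str(matrix[r][c]) for c in cols), 2)

    def col_value(c):
        # Read column c top to bottom in the current row order.
        return int("".join(str(matrix[r][c]) for r in rows), 2)

    for _ in range(max_iter):
        # sorted() is stable (also with reverse=True), so ties keep their current order.
        new_rows = sorted(rows, key=row_value, reverse=True)
        changed = new_rows != rows
        rows = new_rows
        new_cols = sorted(cols, key=col_value, reverse=True)
        changed = changed or new_cols != cols
        cols = new_cols
        if not changed:
            break
    return rows, cols
```

In the 4 × 4 example, machines 0 and 2 process parts 0 and 2, and machines 1 and 3 process parts 1 and 3. The procedure converges to row and column order [0, 2, 1, 3], exposing two diagonal blocks (machine-component groups).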

135

A Min-max Cut Algorithm for Graph Partitioning and Data Clustering

An important application of graph partitioning is data clustering using a graph model: the pairwise similarities between all data objects form a weighted graph adjacency matrix that contains all the information necessary for clustering. Here we propose a new algorithm for graph partitioning with an objective function that follows the min-max clustering principle. The relaxed version of the optimization of

Chris H. Q. Ding; Xiaofeng He; Hongyuan Zha; Ming Gu; Horst D. Simon

2001-01-01
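The min-max cut objective can be made concrete for the two-way case. There it is commonly written as Mcut(A, B) = s(A, B)/s(A, A) + s(A, B)/s(B, B), where s(X, Y) is the total edge weight between vertex sets X and Y. The sketch below evaluates this objective by exhaustive search over the bipartitions of a tiny hypothetical graph. It illustrates the objective only, not the relaxed optimization the paper derives, and it is feasible only for very small graphs.

```python
import math
from itertools import combinations


def s(w, a, b):
    """Total similarity between vertex sets a and b: sum of w[i][j] for i in a, j in b."""
    return sum(w[i][j] for i in a for j in b)


def min_max_cut(w, a, b):
    """Two-way min-max cut objective s(A,B)/s(A,A) + s(A,B)/s(B,B)."""
    within_a, within_b = s(w, a, a), s(w, b, b)
    if within_a == 0 or within_b == 0:
        return math.inf  # a side with no internal similarity is never preferred
    cut = s(w, a, b)
    return cut / within_a + cut / within_b


def best_bipartition(w):
    """Exhaustive search over all bipartitions; returns (value, A, B)."""
    n = len(w)
    best = (math.inf, None, None)
    for size in range(1, n // 2 + 1):
        for a in combinations(range(n), size):
            b = tuple(v for v in range(n) if v not in a)
            value = min_max_cut(w, a, b)
            if value < best[0]:
                best = (value, a, b)
    return best
```

Consider two triangles joined by a single weak edge of weight 0.1. The search returns the two triangles, with Mcut = 0.1/6 + 0.1/6 ≈ 0.033. Dividing by the within-cluster weights penalizes splits that leave one side loosely connected.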

136

Comparison and evaluation of network clustering algorithms applied to genetic interaction networks.

The goal of network clustering algorithms is to detect dense clusters in a network, providing a first step towards understanding large-scale biological networks. With numerous recent advances in biotechnologies, large-scale genetic interaction data are widely available, but there is limited understanding of which clustering algorithms may be most effective. To address this problem, we conducted a systematic study to compare and evaluate six clustering algorithms in analyzing genetic interaction networks, and investigated the factors that influence the choice of algorithm. The algorithms considered in this comparison include hierarchical clustering, topological overlap matrix, bi-clustering, Markov clustering, Bayesian discriminant analysis based community detection, and variational Bayes approach to modularity. Both experimentally identified and synthetically constructed networks were used in this comparison. The accuracy of the algorithms is measured by the Jaccard index in comparing predicted gene modules with benchmark gene sets. The results suggest that the best choice differs according to the network topology and evaluation criteria. Hierarchical clustering proved best at predicting protein complexes; Bayesian discriminant analysis based community detection proved best on epistatic miniarray profile (EMAP) datasets; and the variational Bayes approach to modularity was noticeably better than the other algorithms on the genome-scale networks. PMID:22202027

Hou, Lin; Wang, Lin; Berg, Arthur; Qian, Minping; Zhu, Yunping; Li, Fangting; Deng, Minghua

2012-01-01
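The Jaccard index used in this study to score predicted modules is |A ∩ B| / |A ∪ B|. A small sketch follows. Matching each predicted module to its best-scoring benchmark set is an assumed aggregation scheme, not necessarily the one used in the study. The gene names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| between two gene sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention: two empty sets are identical
    return len(a & b) / len(a | b)


def best_match_scores(modules, benchmarks):
    """For each predicted module, its highest Jaccard index against any benchmark set."""
    return [max(jaccard(m, ref) for ref in benchmarks) for m in modules]
```

For example, the modules {A, B} and {X} scored against the benchmark sets {A, B, C} and {X, Y} give best-match scores of 2/3 and 1/2.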

137

NASA Astrophysics Data System (ADS)

In this paper we construct an efficient adaptive Mahalanobis k-means algorithm. In addition, we propose a new efficient algorithm to search for a globally optimal partition obtained by using the adaptive Mahalanobis distance-like function. The algorithm is a generalization of the previously proposed incremental algorithm (Scitovski and Scitovski, 2013). It successively finds optimal partitions with k = 2, 3, … clusters. Therefore, it can also be used to estimate the most appropriate number of clusters in a partition by using various validity indexes. The algorithm has been applied to the seismic catalogues of Croatia and the Iberian Peninsula. Both regions are characterized by moderate seismic activity. One of the main advantages of the algorithm is its ability to discover not only circular but also elliptical shapes, whose geometry fits the faults better. Three seismogenic zonings are proposed for Croatia and two for the Iberian Peninsula and adjacent areas, according to the clusters discovered by the algorithm.

Morales-Esteban, Antonio; Martínez-Álvarez, Francisco; Scitovski, Sanja; Scitovski, Rudolf

2014-12-01
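A small example shows why a Mahalanobis-type distance suits elliptical clusters. The sketch computes only the standard squared Mahalanobis distance for a fixed 2-D covariance matrix. The paper's adaptive distance-like function and incremental search are not reproduced, and the covariance values are made up.

```python
def mahalanobis_sq(x, mu, cov):
    """Squared Mahalanobis distance (x - mu)^T cov^-1 (x - mu) for 2-D points."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if a <= 0 or det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    # Closed-form inverse of a 2x2 matrix.
    i00, i01, i10, i11 = d / det, -b / det, -c / det, a / det
    dx, dy = x[0] - mu[0], x[1] - mu[1]
    return dx * (i00 * dx + i01 * dy) + dy * (i10 * dx + i11 * dy)
```

Take the assumed covariance ((4, 0), (0, 1)). The points (2, 0) and (0, 2) are equally far from the origin in Euclidean terms. Their squared Mahalanobis distances, however, are 1 and 4, so a cluster elongated along the x-axis absorbs the first point more readily.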

138

A clustering algorithm for the design of efficient vector quantizers to be followed by entropy coding is proposed. The algorithm, called entropy-constrained pairwise nearest neighbor (ECPNN), designs codebooks by merging the pair of Voronoi regions which gives the least increase in distortion for a given decrease in entropy. The algorithm can be used as an alternative to the entropy-constrained vector

Diego P. De Garrido; William A. Pearlman; Weiler A. Finamore

1995-01-01

139

… technique to design fast algorithms to cluster huge data sets. In this paper we present PBIRCH … algorithm … is divided into p blocks of roughly equal size and distributed among the processors. Each processor inserts … Experiments show that for very large data sets the algorithm scales nearly linearly with the increasing num…

Gupta, Neelima

140

An Image Classification Method Based on a SK Sub-Vector Multi-Hierarchy Clustering Algorithm

To organize image databases more effectively, we present an SK (sequence clustering plus K-means clustering) sub-vector multi-hierarchy clustering algorithm in this paper. It automatically clusters images into a number of classes according to human perception. It uses HSV histograms, wavelet texture features, color-texture moments, a gray gradient co-occurrence matrix, and hierarchical distribution features to put similar semantic images into

Xianbo Lang; Guochang Gu; Hongxun Yao; Jun Ni

2007-01-01

141

Two generalizations of Kohonen clustering

NASA Technical Reports Server (NTRS)

The relationship between the sequential hard c-means (SHCM), learning vector quantization (LVQ), and fuzzy c-means (FCM) clustering algorithms is discussed. LVQ and SHCM suffer from several major problems. For example, they depend heavily on initialization. If the initial values of the cluster centers are outside the convex hull of the input data, such algorithms, even if they terminate, may not produce meaningful results in terms of prototypes for cluster representation. This is due in part to the fact that they update only the winning prototype for every input vector. The impact and interaction of these two families with Kohonen's self-organizing feature mapping (SOFM), which is not a clustering method but often inspires ideas for clustering algorithms, are discussed. Then two generalizations of LVQ that are explicitly designed as clustering algorithms are presented: generalized LVQ (GLVQ) and fuzzy LVQ (FLVQ). Learning rules are derived to optimize an objective function whose goal is to produce 'good clusters'. GLVQ/FLVQ (may) update every node in the clustering net for each input vector. Neither GLVQ nor FLVQ depends upon a choice for the update neighborhood or learning rate distribution; these are taken care of automatically. Segmentation of a gray tone image is used as a typical application of these algorithms to illustrate the performance of GLVQ/FLVQ.

Bezdek, James C.; Pal, Nikhil R.; Tsao, Eric C. K.

1993-01-01
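The initialization problem described in this abstract stems from winner-take-all updating. Below is a minimal sketch of sequential hard c-means with a fixed learning rate. The learning rate `alpha` and the epoch count are assumptions. GLVQ and FLVQ, which may update every node, are not implemented here.

```python
import math


def sequential_hard_c_means(data, prototypes, alpha=0.1, epochs=20):
    """Winner-take-all prototype updates (SHCM / unsupervised LVQ style).

    For each input vector, only the nearest prototype moves toward it.
    """
    protos = [tuple(p) for p in prototypes]
    for _ in range(epochs):
        for x in data:
            # Index of the winning (nearest) prototype.
            win = min(range(len(protos)), key=lambda i: math.dist(x, protos[i]))
            protos[win] = tuple(v + alpha * (xi - v) for v, xi in zip(protos[win], x))
    return protos
```

Suppose the data lie on the unit square and a second prototype starts at (100, 100). That prototype never wins a competition, so it stays where it started. This is the failure mode the abstract attributes to updating only the winning prototype.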

142

Clustering: a neural network approach.

Clustering is a fundamental data analysis method. It is widely used for pattern recognition, feature extraction, vector quantization (VQ), image segmentation, function approximation, and data mining. As an unsupervised classification technique, clustering identifies some inherent structures present in a set of objects based on a similarity measure. Clustering methods can be based on statistical model identification (McLachlan & Basford, 1988) or competitive learning. In this paper, we give a comprehensive overview of competitive learning based clustering methods. Importance is attached to a number of competitive learning based clustering neural networks such as the self-organizing map (SOM), the learning vector quantization (LVQ), the neural gas, and the ART model, and clustering algorithms such as the C-means, mountain/subtractive clustering, and fuzzy C-means (FCM) algorithms. Associated topics such as the under-utilization problem, fuzzy clustering, robust clustering, clustering based on non-Euclidean distance measures, supervised clustering, hierarchical clustering as well as cluster validity are also described. Two examples are given to demonstrate the use of the clustering methods. PMID:19758784

Du, K-L

2010-01-01

143

An important component of a spatial clustering algorithm is the distance measure between sample points in object space. In this paper, the traditional Euclidean distance measure is replaced with a new obstacle distance measure for spatial clustering under obstacle constraints. First, we present a path searching algorithm that approximates the obstacle distance between two points, accounting for obstacles and facilitators. Taking obstacle distance as the similarity metric, we then propose the artificial immune clustering with obstacle entity (AICOE) algorithm for clustering spatial point data in the presence of obstacles and facilitators. Finally, the paper presents a comparative analysis of the AICOE algorithm and the classical clustering algorithms. Our clustering model based on an artificial immune system is also applied to a public facility location problem to establish the practical applicability of our approach. By using the clone selection principle and updating the cluster centers based on the elite antibodies, the AICOE algorithm is able to achieve the global optimum and a better clustering effect. PMID:25435862

Sun, Liping; Luo, Yonglong; Ding, Xintao; Zhang, Ji

2014-01-01

144

A highly efficient multi-core algorithm for clustering extremely large datasets

Background In recent years, the demand for computational power in computational biology has increased due to rapidly growing data sets from microarray and other high-throughput technologies. This demand is likely to increase. Standard algorithms for analyzing data, such as cluster algorithms, need to be parallelized for fast processing. Unfortunately, most approaches for parallelizing algorithms rely largely on network communication protocols that connect multiple computers. One answer to this problem is to use the intrinsic capabilities of current multi-core hardware to distribute tasks among the different cores of one computer. Results We introduce a multi-core parallelization of the k-means and k-modes cluster algorithms, based on the design principles of transactional memory, for clustering gene expression microarray type data and categorical SNP data. Our new shared-memory parallel algorithms prove to be highly efficient. We demonstrate their computational power and show their utility in cluster stability and sensitivity analysis employing repeated runs with slightly changed parameters. Compared with single-core implementations and a recently published network-based parallelization, our Java-based algorithm increased computation speed by a factor of 10 for large data sets while preserving computational accuracy. Conclusions Most desktop computers and even notebooks provide at least dual-core processors. Our multi-core algorithms show that, using modern algorithmic concepts, parallelization makes it possible to perform even such laborious tasks as cluster sensitivity analysis and cluster number estimation on the laboratory computer. PMID:20370922

2010-01-01
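The abstract names k-modes without giving its update rules. The sketch below is a minimal, single-threaded illustration. It assumes the standard k-modes formulation: simple-matching dissimilarity (count of differing categories), with each cluster mode updated attribute by attribute to the most frequent category. The function names `mismatches` and `k_modes` are hypothetical. This is not the authors' Java multi-core implementation. A comment marks the assignment loop, which is the step a shared-memory version would split across cores.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Record = Tuple[str, ...]


def mismatches(a: Sequence[str], b: Sequence[str]) -> int:
    """Simple-matching dissimilarity: number of attributes whose categories differ."""
    return sum(x != y for x, y in zip(a, b))


def k_modes(data: List[Record], init_modes: List[Record], max_iter: int = 20):
    """Minimal k-modes: assign each record to its nearest mode, then recompute modes.

    init_modes: starting modes, e.g. k records picked from data.
    Returns (labels, modes).
    """
    modes = [tuple(m) for m in init_modes]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: independent per record, so this is the loop a
        # multi-core version would split across cores.
        new_labels = [min(range(len(modes)), key=lambda j: mismatches(r, modes[j]))
                      for r in data]
        if new_labels == labels:
            break  # no record changed cluster
        labels = new_labels
        # Update step: for every attribute, take the most frequent category.
        for j in range(len(modes)):
            members = [r for r, lab in zip(data, labels) if lab == j]
            if members:  # an empty cluster keeps its previous mode
                modes[j] = tuple(Counter(col).most_common(1)[0][0]
                                 for col in zip(*members))
    return labels, modes
```

As a usage example, records such as `("A", "A", "C")` could stand for genotype calls at three SNP positions. Calling `k_modes(data, [data[0], data[2]])` starts from two records as initial modes.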

145

A Special Local Clustering Algorithm for Identifying the Genes Associated With Alzheimer’s Disease

Clustering is the grouping of similar objects into a class. The local clustering feature refers to the phenomenon whereby one group of data is separated from another, and the data from these different groups are clustered locally. A compact class is defined as a cluster in which all similar elements cluster tightly. Here, the essence of the local clustering feature, revealed by mathematical manipulation, leads to a novel clustering algorithm, termed the special local clustering (SLC) algorithm, which was used to process gene microarray data related to Alzheimer’s disease (AD). The SLC algorithm was able to group together genes with similar expression patterns and to identify significantly varied gene expression values as isolated points. If a gene belongs to a compact class in control data and appears as an isolated point in incipient, moderate, and/or severe AD gene microarray data, this gene is possibly associated with AD. The application of clustering algorithms to disease-associated gene identification, as in AD, is rarely reported. PMID:20089478

Pang, Chao-Yang; Hu, Wei; Hu, Ben-Qiong; Shi, Ying; Vanderburg, Charles R.; Rogers, Jack T.

2010-01-01

146

Block clustering based on difference of convex functions (DC) programming and DC algorithms.

We investigate difference of convex functions (DC) programming and the DC algorithm (DCA) to solve the block clustering problem in the continuous framework, which traditionally requires solving a hard combinatorial optimization problem. DC reformulation techniques and exact penalty in DC programming are developed to build an appropriate equivalent DC program of the block clustering problem. They lead to an elegant and explicit DCA scheme for the resulting DC program. Computational experiments show the robustness and efficiency of the proposed algorithm and its superiority over standard algorithms such as two-mode K-means, two-mode fuzzy clustering, and block classification EM. PMID:23777526

Le, Hoai Minh; Le Thi, Hoai An; Dinh, Tao Pham; Huynh, Van Ngai

2013-10-01
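The abstract relies on the general DCA scheme without stating it. To minimize f = g − h with g and h convex, DCA repeats two steps. First, take y_k ∈ ∂h(x_k). Second, set x_{k+1} ∈ argmin_x g(x) − ⟨y_k, x⟩. The sketch below applies this scheme to a one-dimensional toy problem chosen for illustration. It is not the block clustering DC program from the paper. The toy problem is f(x) = x⁴ − 2x², split as g(x) = x⁴ and h(x) = 2x². Here y_k = 4x_k, and the subproblem's optimality condition 4x³ = 4x_k gives x_{k+1} = cbrt(x_k).

```python
import math


def dca_quartic(x0: float, tol: float = 1e-10, max_iter: int = 200) -> float:
    """DCA for f(x) = g(x) - h(x) with g(x) = x**4 and h(x) = 2 * x**2.

    Step 1: y_k = h'(x_k) = 4 * x_k          (linearize the concave part -h)
    Step 2: x_{k+1} = argmin_x g(x) - y_k * x, i.e. solve 4 * x**3 = y_k,
            whose real solution is the cube root of x_k.
    """
    x = x0
    for _ in range(max_iter):
        y = 4.0 * x  # (sub)gradient of h at the current iterate
        # Real cube root of y / 4 (math.cbrt needs Python 3.11+, so use copysign).
        x_new = math.copysign(abs(y / 4.0) ** (1.0 / 3.0), y)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x
```

Started from any x0 > 0, the iterates x0^(1/3), x0^(1/9), … converge to 1, a minimizer with f(1) = −1. Starting from x0 < 0 gives −1. Starting from x0 = 0 stays at the critical point 0. This matches DCA's general guarantee of reaching critical points, not necessarily global minima.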

148

Traditional K-means clustering algorithms have the drawback of getting stuck at local optima that depend on the random values of the initial centroids. Optimization algorithms have the advantage of guiding iterative computation to search for global optima while avoiding local optima. With multiple search agents in action, these algorithms help speed up the clustering process by converging to a global optimum early. Some contemporary nature-inspired optimization algorithms, including the Ant, Bat, Cuckoo, Firefly, and Wolf search algorithms, mimic swarming behavior, which allows them to cooperatively steer toward an optimal objective within a reasonable time. These so-called nature-inspired optimization algorithms are known to have their own characteristics, as well as pros and cons in different applications. When these algorithms are combined with the K-means clustering mechanism to enhance its clustering quality by avoiding local optima and finding global optima, the new hybrids are anticipated to produce unprecedented performance. In this paper, we report the results of our evaluation experiments on the integration of nature-inspired optimization methods into K-means algorithms. In addition to the standard metrics for evaluating clustering quality, the extended K-means algorithms empowered by nature-inspired optimization methods are applied to image segmentation as a case study of an application scenario. PMID:25202730

Fong, Simon; Deb, Suash; Yang, Xin-She; Zhuang, Yan

2014-01-01

149

NASA Technical Reports Server (NTRS)

We describe the clustering algorithm used by the Lightning Imaging Sensor (LIS) and the Optical Transient Detector (OTD) to combine lightning pulse data into events, groups, flashes, and areas. Events are single pixels that exceed the LIS/OTD background level during a single frame (2 ms). Groups are clusters of events that occur within the same frame and in adjacent pixels. Flashes are clusters of groups that occur within 330 ms and within either 5.5 km (for LIS) or 16.5 km (for OTD) of each other. Areas are clusters of flashes that occur within 16.5 km of each other. Because many investigators use the LIS/OTD flash data, we test how variations in the event-group and group-flash clustering algorithms affect the flash count for a subset of the LIS data. We divided the subset into areas with low (1-3), medium (4-15), high (16-63), and very high (64+) flash counts to see how changes in the clustering parameters affect the flash rates in areas of these different sizes. We found that as long as the clustering parameters are within about a factor of two of the current values, the flash counts change by no more than about 20%. Therefore, the flash clustering algorithm used by the LIS and OTD sensors produces flash rates that are relatively insensitive to reasonable variations in the clustering algorithms.

Mach, Douglas M.; Christian, Hugh J.; Blakeslee, Richard; Boccippio, Dennis J.; Goodman, Steve J.; Boeck, William

2006-01-01
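The group-to-flash rule above can be read as single-linkage clustering with a time threshold and a distance threshold. The sketch below makes that reading concrete under simplifying assumptions. Each group is a tuple (time in ms, x in km, y in km) on a flat plane. Two groups are linked when they fall within both thresholds, and each flash is a connected component of linked groups. The operational LIS/OTD code may apply further rules that the abstract does not describe. The function name `cluster_flashes` is hypothetical.

```python
import math
from typing import List, Tuple

Group = Tuple[float, float, float]  # (time in ms, x in km, y in km): assumed layout


def cluster_flashes(groups: List[Group], max_dt_ms: float = 330.0,
                    max_dist_km: float = 5.5) -> List[int]:
    """Single-linkage clustering of groups into flashes.

    Groups closer than max_dt_ms in time and max_dist_km in space are linked;
    flashes are the connected components. Returns a flash id per input group.
    """
    parent = list(range(len(groups)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    order = sorted(range(len(groups)), key=lambda i: groups[i][0])  # by time
    for pos, a in enumerate(order):
        ta, xa, ya = groups[a]
        for b in order[pos + 1:]:
            tb, xb, yb = groups[b]
            if tb - ta > max_dt_ms:
                break  # every later group is even further away in time
            if math.hypot(xb - xa, yb - ya) <= max_dist_km:
                parent[find(b)] = find(a)  # merge the two flashes
    # Relabel component roots as consecutive flash ids in input order.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(len(groups))]
```

Passing `max_dist_km=16.5` gives the OTD spatial threshold. Single linkage lets chains form: two groups 6 km apart can end up in the same flash when a third group lies within 5.5 km of both. This is one way the flash count depends on the clustering parameters the study varies.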

150

K-Means Clustering Analysis Based on Genetic Algorithm

The traditional K-Means algorithm is sensitive to the initial centers and easily gets stuck at a local optimum. To solve these problems, this paper presents an improved K-Means algorithm based on a genetic algorithm. It combines the local search capability of K-Means with the global optimization capability of the genetic algorithm, and introduces the K-Means operation into the genetic algorithm of

LAI Yu-xia; LIU Jian-ping; YANG Guo-xing

2008-01-01

151

Clustering algorithms for area geographical entities in spatial data mining

Spatial data mining is the process of identifying or extracting valid, novel, potentially useful, and ultimately understandable patterns from spatial data sets, and spatial clustering analysis is one of its most important research directions. Clustering criteria implicit in massive data can be discovered by spatial clustering analysis, which can be used to explore deeper

Guang-xue Chen; Xiao-zhou Li; Qi-feng Chen

2010-01-01

152

This paper presents a novel maximum margin clustering method with immune evolution (IEMMC) for the automatic diagnosis of electrocardiogram (ECG) arrhythmias. The diagnostic system consists of signal processing, feature extraction, and the IEMMC algorithm for clustering ECG arrhythmias. First, the raw ECG signal is processed by an adaptive ECG filter based on wavelet transforms, and the waveform of the ECG signal is detected. Then, features are extracted from the ECG signal, and the IEMMC algorithm uses them to cluster different types of arrhythmias. Three performance indicators, namely sensitivity, specificity, and accuracy, are used to assess the IEMMC method for ECG arrhythmias. Compared with the K-means and iterSVR algorithms, the IEMMC algorithm performs better not only in clustering results but also in global search ability and convergence ability, which demonstrates its effectiveness for the detection of ECG arrhythmias. PMID:23690875

Zhu, Bohui; Ding, Yongsheng; Hao, Kuangrong

2013-01-01

153

Longitudinally-invariant k⊥-clustering algorithms for hadron-hadron collisions

We propose a version of the QCD-motivated "k⊥" jet-clustering algorithm for hadron-hadron collisions that is invariant under boosts along the beam directions. This leads to improved factorization properties and closer correspondence to experimental practice at hadron colliders. We examine alternative definitions of the resolution variables and cluster recombination scheme, and show that the algorithm can be implemented efficiently on a

S. Catani; Yu. L. Dokshitzer; Michael H Seymour; Bryan R Webber

1993-01-01

154

NASA Astrophysics Data System (ADS)

The label propagation algorithm (LPA) is a graph-based semi-supervised learning algorithm that predicts the labels of unlabeled nodes from a few labeled nodes. It is also used as a community detection method for complex networks. The algorithm is easy to implement, has low complexity, and performs remarkably well, so it is widely applied in various fields. However, the randomness of label propagation makes the algorithm poorly robust, and its classification results are unstable. This paper proposes an LPA based on the edge clustering coefficient. Each node updates its label from the neighbor whose edge clustering coefficient is highest, rather than from a random neighbor, which effectively restrains the random spread of labels. Experimental results show that the LPA based on the edge clustering coefficient improves the stability and accuracy of the algorithm.

Zhang, Xian-Kun; Tian, Xue; Li, Ya-Nan; Song, Chen

2014-08-01
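The abstract does not define the edge clustering coefficient (ECC). The sketch below assumes the commonly used definition C_ij = (z_ij + 1) / min(k_i − 1, k_j − 1), where z_ij is the number of triangles containing edge i-j and k is node degree. It clamps the denominator to at least 1 so that edges at degree-1 nodes are defined; this clamp is also an assumption. Each node repeatedly copies the label of the neighbor joined by its highest-ECC edge until no label changes. Ties are broken by a symmetric, deterministic key, which prevents the preferred-neighbor links from forming long cycles. The function names are hypothetical, and the paper's exact update schedule may differ.

```python
from typing import Dict, Set

Graph = Dict[int, Set[int]]  # undirected graph as symmetric adjacency sets


def edge_clustering_coefficient(g: Graph, i: int, j: int) -> float:
    """(triangles through edge i-j + 1) / min(k_i - 1, k_j - 1), denominator >= 1."""
    triangles = len(g[i] & g[j])
    denom = max(min(len(g[i]) - 1, len(g[j]) - 1), 1)
    return (triangles + 1) / denom


def lpa_ecc(g: Graph, max_iter: int = 100) -> Dict[int, int]:
    """Label propagation in which each node copies the label of the neighbor
    joined by its highest-ECC edge instead of a randomly chosen neighbor."""
    # The preferred neighbor depends only on the graph, so compute it once.
    # Ties are broken by the node pair, identically from both endpoints.
    best = {}
    for i in g:
        if g[i]:
            best[i] = max(g[i], key=lambda j: (edge_clustering_coefficient(g, i, j),
                                                -min(i, j), -max(i, j)))
    labels = {i: i for i in g}  # every node starts with its own label
    for _ in range(max_iter):
        changed = False
        for i in sorted(g):  # fixed, deterministic update order
            if i in best and labels[i] != labels[best[i]]:
                labels[i] = labels[best[i]]
                changed = True
        if not changed:
            break
    return labels
```

Consider two triangles joined by a single bridge edge. The bridge lies in no triangle, so its ECC is low. Neither endpoint adopts a label across it, and the two triangles end up as separate communities.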

155

Clusters of galaxies are the most massive objects in the Universe and mapping their location is an important astronomical problem. This paper describes an algorithm (based on statistical signal processing methods), a software architecture (based on a hybrid layered approach) and a parallelization scheme (based on a client/server model) for finding clusters of galaxies in large astronomical databases. The Adaptive Matched Filter (AMF) algorithm presented here identifies clusters by finding the peaks in a cluster likelihood map generated by convolving a galaxy survey with a filter based on a cluster model and a background model. The method has proved successful in identifying clusters in real and simulated data. The implementation is flexible and readily executed in parallel on a network of workstations.

Jeremy Kepner; Rita Kim

2000-04-21

156

An Improved Ant-Colony Clustering Algorithm Based on the Innovational Distance Calculation Formula

Focusing on the disadvantages of the classical Euclidean distance in data clustering analysis, we propose an improved distance calculation formula that describes the local compactness and global connectivity between data points. We then improve the ant-colony clustering algorithm by using this improved distance formula. Theoretical analysis and experiments show that this method is more efficient and has the ability to identify

Shanfei Li; Kewei Yang; Wei Huang; Yuejin Tan

2010-01-01

157

CLICKS: An Effective Algorithm for Mining Subspace Clusters in Categorical Datasets

... Algorithms. Keywords: Clustering, Categorical Data, K-partite Graph, Maximal Cliques, Data Mining. © 2005 ACM ... We summarize the dataset as a k-partite graph and mine for k-partite maximal cliques. Unlike previous methods, CLICKS mines subspace clusters. It uses

Zaki, Mohammed Javeed

158

A maximum profit coverage algorithm with application to small molecules cluster

... a priori to the clustering procedure. In this paper we present CIM and model it as a maximum profit coverage problem (MPCP) ... In this article we model and analyze

Hassin, Refael

159

Clustering online social network communities using genetic algorithms

Mustafa H. Hajeer, Alka Singh ... misinformation, help law enforcement in resource allocation in crowd management, etc. The paper presents this GA-based ... of online social behavior and used unweighted social network data for clustering [2], i.e. every edge

Sanyal, Sugata

160

Maximum Class Separability for Rough-Fuzzy C-Means Based Brain MR Image Segmentation

Pradipta Maji ... of brain MR images. The RFCM algorithm comprises a judicious integration of the ... of rough sets, fuzzy sets ... with vagueness and incompleteness in class definition of brain MR images, the membership function of fuzzy sets

Pal, Sankar Kumar

161

Soft Threshold Based Cluster-Head Selection Algorithm for Wireless Sensor Networks

In recent years, many clustering algorithms have been proposed, of which LEACH is the best known. However, in LEACH, once a node has been selected as a cluster head (CH), its threshold is set to 0 for the rest of the current 1/p rounds. The node therefore loses the chance to participate in cluster-head selection, even if it still has enough energy. In

Rong Ding; Bing Yang; Lei Yang; Jiawei Wang

2009-01-01
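The abstract criticizes the LEACH threshold without stating it. For reference, here is a minimal sketch of the classic LEACH rule. In round r, a node in G (the set of nodes that have not been cluster heads in the last 1/p rounds) becomes a cluster head when a uniform random draw falls below T(n) = p / (1 − p · (r mod 1/p)). For nodes outside G, T(n) = 0, which is the "hard" rule at issue. The paper's soft-threshold replacement is not reproduced here. The function names are hypothetical, and 1/p is assumed to be an integer.

```python
import random
from typing import Optional


def leach_threshold(p: float, r: int, eligible: bool) -> float:
    """Classic LEACH cluster-head threshold T(n) for round r.

    p: desired fraction of cluster heads per round.
    eligible: True if the node is in G (not a cluster head in the last 1/p rounds).
    """
    if not eligible:
        return 0.0  # hard exclusion, regardless of remaining energy
    period = round(1 / p)  # epoch length 1/p, assumed integral
    return p / (1 - p * (r % period))


def elect(p: float, r: int, eligible: bool,
          rng: Optional[random.Random] = None) -> bool:
    """A node becomes cluster head when a uniform draw in [0, 1) is below T(n)."""
    rng = rng or random.Random()
    return rng.random() < leach_threshold(p, r, eligible)
```

With p = 0.1, an eligible node's threshold rises from 0.1 in round 0 to about 1 in round 9. Every node still in G is then elected before the 10-round epoch restarts, while an ineligible node stays at 0 regardless of its remaining energy.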

162

AN INTERIOR POINT ALGORITHM FOR MINIMUM SUM-OF-SQUARES CLUSTERING

An exact algorithm is proposed for minimum sum-of-squares nonhierarchical clustering, i.e., for partitioning a given set of points from a Euclidean m-space into a given number of clusters so as to minimize the sum of squared distances from all points to the centroid of the cluster to which they belong. This problem is expressed as a constrained hyperbolic program

O. DU MERLE; P. HANSEN; B. JAUMARD

2000-01-01

163

Chinese Text Clustering Algorithm Based k-means

NASA Astrophysics Data System (ADS)

Text clustering is an important technique in text mining. This work focuses on Chinese text clustering based on k-means. Experiments showed that the new center of a cluster was easily affected by isolated texts. The average similarity of a cluster was therefore used as a parameter and multiplied by a coefficient between 0.75 and 1.25 to obtain a similarity threshold. Texts whose similarity to the original cluster center was greater than or equal to the threshold were collected as a candidate set, and the cluster center was then updated to the center of this candidate set. Experiments show that the improved method increased purity and F-value by about 10 percent on average over the original method.

Yao, Mingyu; Pi, Dechang; Cong, Xiangxiang
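The thresholded center update described above can be sketched as follows. The abstract does not specify the similarity measure or how "average similarity" is computed. This sketch assumes cosine similarity on term-frequency vectors and takes the average as the members' mean similarity to the current center. It also adds a fallback to the plain mean if no member reaches the threshold, which can happen when the coefficient exceeds 1. The names `cosine` and `robust_center` are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two term-frequency vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def robust_center(members: List[Sequence[float]], center: Sequence[float],
                  coeff: float = 1.0) -> List[float]:
    """Recompute a cluster center from candidate texts only.

    threshold = coeff * (mean similarity of the members to the current center),
    with coeff in [0.75, 1.25]; members below the threshold (e.g. isolated
    texts) do not contribute to the new center.
    """
    if not members:
        raise ValueError("cluster has no members")
    if not 0.75 <= coeff <= 1.25:
        raise ValueError("coeff should lie in [0.75, 1.25]")
    sims = [cosine(m, center) for m in members]
    threshold = coeff * sum(sims) / len(sims)
    candidates = [m for m, s in zip(members, sims) if s >= threshold]
    if not candidates:  # assumption: fall back to all members
        candidates = list(members)
    return [sum(m[d] for m in candidates) / len(candidates)
            for d in range(len(center))]
```

Take two texts close to the center [1, 0] and one isolated text [0, 1]. The isolated text falls below the threshold, so the new center is [0.95, 0.05]. A plain k-means update would move the center to about [0.63, 0.37].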

164

An Efficient Document Clustering Algorithm and Its Application to a Document Browser.

ERIC Educational Resources Information Center

Presents a document-clustering algorithm that uses a term frequency vector for each document in a Japanese collection to produce a hierarchy in the form of a document classification tree. Introduces an application of this algorithm to a Japanese-to-English translation-aid system. (Author/LRW)

Tanaka, Hideki; Kumano, Tadashi; Uratani, Noriyoshi; Ehara, Terumasa

1999-01-01

165

... automatic target recognition, and (4) onboard data compression. A massively parallel Beowulf cluster ... of the proposed algorithms. In order to explore the viability of developing onboard, real-time hyperspectral data ... performance of algorithm analysis. However, near real-time performance is a very ambitious goal since

Plaza, Antonio J.

166

Cluster Computing Architectures and Algorithms for Passive Sonar Arrays by A. George, W. Rosen

... experimentation, and analysis of distributed, parallel computing algorithms and architectures for autonomous sonar ... requires the implementation of high-element-count sonar arrays and leads to a corresponding increase

George, Alan D.

167

A Novel Clustering Algorithm Based on a Modified Model of Random Walk

We introduce a modified model of random walk and then develop two novel clustering algorithms based on it. In the algorithms, each data point in a dataset is treated as a particle that can move at random in space according to the preset rules of the modified model. This data point may also be viewed as a local control

Qiang Li; Yan He; Jing-ping Jiang

2008-01-01

168

The Applied Research of Dynamic Clustering Algorithm in Identifying Vegetable Oil Species

More practical, efficient, and fast identification of food raw materials can help improve the current food security situation. To this end, this paper presents a method for discriminating vegetable oils that applies an improved K-Means algorithm to the gas chromatography (GC) data of the oils. The algorithm is improved in how it selects the initial cluster centers so that the

Xintian Cheng; Hongmei Zhang

2009-01-01

169

A Reactive Tabu Search algorithm with variable clustering for the Unicost Set Covering Problem

We develop a Reactive Tabu Search (RTS) algorithm for solving the Unicost Set Covering Problem (SCP) - USCP-TS. We solve a Linear Programming (LP) relaxation of the problem and use the LP optimum to construct a quality solution profile. We cluster the problem variables based on this profile and partition the solution space into orbits. We tested our algorithm on

Gary W. Kinney; J. Wesley Barnes; Bruce W. Colletti

170

A Hybrid Algorithm for Clustering of Time Series Data Based on Affinity Search Technique

Time series clustering is an important solution to various problems in numerous fields of research, including business, medical science, and finance. However, conventional clustering algorithms are not practical for time series data because they are essentially designed for static data. This impracticality results in poor clustering accuracy in several systems. In this paper, a new hybrid clustering algorithm is proposed based on the similarity in shape of time series data. Time series data are first grouped into subclusters based on similarity in time. The subclusters are then merged using the k-Medoids algorithm based on similarity in shape. This model makes two contributions: (1) it is more accurate than other conventional and hybrid approaches, and (2) it determines the similarity in shape among time series data with low complexity. To evaluate its accuracy, the model is tested extensively on synthetic and real-world time series datasets. PMID:24982966

Aghabozorgi, Saeed; Ying Wah, Teh; Herawan, Tutut; Jalab, Hamid A.; Shaygan, Mohammad Amin; Jalali, Alireza

2014-01-01

171

An improved clustering algorithm of tunnel monitoring data for cloud computing.

With the rapid development of urban construction, the number of urban tunnels is increasing and the data they produce are becoming more and more complex. As a result, traditional clustering algorithms cannot handle the massive volume of tunnel data. To solve this problem, an improved parallel clustering algorithm based on k-means is proposed. It uses MapReduce in a cloud computing environment to process the data, so it can handle massive data and is also more efficient. Moreover, it computes the average dissimilarity degree of each cluster in order to clean abnormal data. PMID:24982971

Zhong, Luo; Tang, KunHao; Li, Lin; Yang, Guang; Ye, JingJing

2014-01-01

172

Although there are many good collaborative recommendation methods, increasing their accuracy and diversity to fulfill users' preferences is still a challenge. In this paper, we propose a novel collaborative filtering recommendation approach based on the K-means clustering algorithm. During clustering, we use the artificial bee colony (ABC) algorithm to overcome the tendency of K-means to get trapped in local optima. We then adopt a modified cosine similarity to compute the similarity between users in the same cluster. Finally, we generate recommendation results for the corresponding target users. Detailed numerical analysis on the benchmark MovieLens dataset and a real-world dataset indicates that our collaborative filtering approach based on user clustering outperforms many other recommendation methods. PMID:24381525

Ju, Chunhua

2013-01-01

174

A randomized algorithm for two-cluster partition of a set of vectors

NASA Astrophysics Data System (ADS)

A randomized algorithm is substantiated for the strongly NP-hard problem of partitioning a finite set of vectors in Euclidean space into two clusters of given sizes according to the minimum sum-of-squared-distances criterion. The centroid of one cluster is assumed to be optimized and is determined as the mean of all vectors in this cluster, while the centroid of the other cluster is fixed at the origin. For an established parameter value, the algorithm finds an approximate solution in time that is linear in the space dimension and in the input size of the problem, for given values of the relative error and failure probability. Conditions are established under which the algorithm is asymptotically exact and runs in time that is linear in the space dimension and quadratic in the input size of the problem.

Kel'manov, A. V.; Khandeev, V. I.

2015-02-01
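The objective above can be written as cost(C) = Σ_{y∈C} ‖y − ȳ(C)‖² + Σ_{y∉C} ‖y‖², where |C| = M and ȳ(C) is the mean of C. For a fixed candidate center c, the cost of using c instead of the mean equals Σ‖y‖² + Σ_{y∈C}(‖c‖² − 2⟨y, c⟩). The best size-M cluster for that center is therefore the M vectors with the largest inner product ⟨y, c⟩. The sketch below builds a generic random-sampling heuristic on this observation, with candidate centers taken as means of small random samples. It also includes an exhaustive solver for checking tiny inputs. This is not the authors' algorithm, and it carries none of their accuracy or running-time guarantees. All function names are hypothetical.

```python
import itertools
import random
from typing import List, Sequence, Tuple

Vec = Tuple[float, ...]


def sq(v: Sequence[float]) -> float:
    return sum(x * x for x in v)


def cost(points: List[Vec], cluster: Sequence[int]) -> float:
    """SSD to the mean inside the cluster plus SSD to the origin outside it."""
    inside = set(cluster)
    dim = len(points[0])
    mean = [sum(points[i][d] for i in inside) / len(inside) for d in range(dim)]
    return (sum(sq([p - m for p, m in zip(points[i], mean)]) for i in inside)
            + sum(sq(points[i]) for i in range(len(points)) if i not in inside))


def best_for_center(points: List[Vec], center: Sequence[float], m: int) -> List[int]:
    """For a fixed center, the optimal size-m cluster is the m points with the
    largest inner product <y, center>."""
    dots = [sum(p * c for p, c in zip(y, center)) for y in points]
    return sorted(range(len(points)), key=lambda i: -dots[i])[:m]


def sampled_partition(points: List[Vec], m: int, sample_size: int = 2,
                      trials: int = 200, seed: int = 0):
    """Heuristic: try centers that are means of small random samples."""
    rng = random.Random(seed)
    best = None
    for _ in range(trials):
        sample = rng.sample(range(len(points)), sample_size)
        center = [sum(points[i][d] for i in sample) / sample_size
                  for d in range(len(points[0]))]
        cluster = best_for_center(points, center, m)
        value = cost(points, cluster)
        if best is None or value < best[0]:
            best = (value, sorted(cluster))
    return best


def exact_partition(points: List[Vec], m: int):
    """Brute force over all size-m subsets; only for checking tiny inputs."""
    return min((cost(points, c), list(c))
               for c in itertools.combinations(range(len(points)), m))
```

In a small test with three points near (5, 5) and three near the origin, both functions select the three distant points as the optimized cluster.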

175

An algorithm for detecting and identifying image clusters, or "blobs", based on color information is developed for an autonomous mobile robot. The input image data are first processed using a crisp color fuzzifier, a binary smoothing filter, and a median filter. The processed image data are then input to the image cluster detection and identification program. The program employs the concept of an "elastic rectangle" that stretches until the whole blob is enclosed in a rectangle. A C program was developed to test the algorithm. The algorithm was tested only on 8×8 image data containing different numbers of blobs. The algorithm works very well in detecting and identifying image clusters.

Uy, D.L.

1996-02-01
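The abstract does not detail how the elastic rectangle stretches. The sketch below substitutes a swapped-in technique that reaches the same end state: flood-fill each blob in a binary image and grow its bounding rectangle pixel by pixel until the whole blob is enclosed. It assumes 8-connectivity (diagonal neighbors belong to the same blob). It is written in Python rather than the paper's C, and the name `blob_boxes` is hypothetical.

```python
from collections import deque
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive


def blob_boxes(img: List[List[int]]) -> List[Box]:
    """Return one enclosing rectangle per 8-connected blob of 1-pixels."""
    rows, cols = len(img), len(img[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r0 in range(rows):
        for c0 in range(cols):
            if img[r0][c0] != 1 or seen[r0][c0]:
                continue
            # Start a 1x1 rectangle and stretch it as the blob is flood-filled.
            top, left, bottom, right = r0, c0, r0, c0
            seen[r0][c0] = True
            queue = deque([(r0, c0)])
            while queue:
                r, c = queue.popleft()
                top, bottom = min(top, r), max(bottom, r)
                left, right = min(left, c), max(right, c)
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        nr, nc = r + dr, c + dc
                        if (0 <= nr < rows and 0 <= nc < cols
                                and img[nr][nc] == 1 and not seen[nr][nc]):
                            seen[nr][nc] = True
                            queue.append((nr, nc))
            boxes.append((top, left, bottom, right))
    return boxes
```

For an 8×8 image, such as the test size reported in the abstract, the scan visits each pixel once and the flood fill visits each blob pixel once, so the cost is negligible.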

176

A fast density-based clustering algorithm for real-time Internet of Things stream.

Data streams are continuously generated over time by Internet of Things (IoT) devices. The faster these data are analyzed, their hidden trends and patterns discovered, and new strategies created, the faster action can be taken, creating greater value for organizations. Density-based methods are a prominent class of algorithms for clustering data streams. They can detect clusters of arbitrary shape and handle outliers, and they do not need the number of clusters in advance. Density-based clustering algorithms are therefore a suitable choice for clustering IoT streams. Recently, several density-based algorithms have been proposed for clustering data streams. However, density-based clustering in limited time is still a challenging issue. In this paper, we propose a density-based clustering algorithm for IoT streams. The method is fast enough to be applicable to real-time IoT applications. Experimental results show that the proposed approach obtains high-quality results with low computation time on real and synthetic datasets. PMID:25110753

Amini, Amineh; Saboohi, Hadi; Wah, Teh Ying; Herawan, Tutut

2014-01-01

178

A Novel Coverage-Preserving Clustering Algorithm for Wireless Sensor Networks

NASA Astrophysics Data System (ADS)

Sensing coverage is one of the crucial characteristics of wireless sensor networks, and it has to be considered in the design of routing protocols. LEACH (Low-Energy Adaptive Clustering Hierarchy) is a significant and representative routing protocol that organizes sensing nodes by clustering. For LEACH, residual energy should be considered in order to overcome unequal rates of energy dissipation. Taking both coverage and residual energy into account, we have proposed a coverage-preserving energy-based clustering algorithm (CEC), an improved version of LEACH. By improving the threshold for cluster-head selection, CEC achieved more effective results than the other baseline protocols.

Di, Xin

179

A Fast Algorithm to Cluster High Dimensional Basket Data

... [13]. The EM algorithm is a general statistical method of maximum likelihood estimation [7], [14], [17] ... algorithm builds a statistical model so that the user can understand transactions at a high level. Items

Ordonez, Carlos

180

A Community Detection Algorithm Based on Topology Potential and Spectral Clustering

Community detection is of great value for understanding the inherent laws of complex networks and predicting their behavior. Spectral clustering algorithms have been successfully applied to community detection. Such methods have two shortcomings. First, the input matrices they use cannot provide sufficient structural information for community detection. Second, they cannot always derive the proper number of communities from the ladder distribution of eigenvector elements. To solve these problems, this paper puts forward a novel community detection algorithm based on topology potential and spectral clustering. The new algorithm constructs the normalized Laplacian matrix with nodes' topology potential, which contains rich structural information about the network. In addition, the new algorithm can automatically obtain the optimal number of communities from the local maximum potential nodes. Experimental results show that the new algorithm performs excellently on artificial and real-world networks and outperforms other community detection methods. PMID:25147846

Wang, Zhixiao; Chen, Zhaotong; Zhao, Ya; Chen, Shaoda

2014-01-01
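The abstract derives the number of communities from nodes of locally maximal topology potential but does not state the potential function. The sketch below assumes the commonly used form φ(v_i) = Σ_j m_j · exp(−(d_ij / σ)²). Here d_ij is the hop distance, every node mass m_j is 1, and σ is an impact factor set to 1 by default. These are all assumptions, not the paper's settings. The count of nodes whose potential exceeds that of every neighbor then serves as the community-number estimate. The Laplacian construction itself is not reproduced, and the function names are hypothetical.

```python
import math
from collections import deque
from typing import Dict, List, Set

Graph = Dict[int, Set[int]]  # undirected graph as symmetric adjacency sets


def hop_distances(g: Graph, src: int) -> Dict[int, int]:
    """Breadth-first search distances (in hops) from src to reachable nodes."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in g[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def topology_potential(g: Graph, sigma: float = 1.0) -> Dict[int, float]:
    """phi(i) = sum_j m_j * exp(-(d_ij / sigma)**2) with every mass m_j = 1."""
    return {i: sum(math.exp(-(d / sigma) ** 2) for d in hop_distances(g, i).values())
            for i in g}


def local_maxima(g: Graph, phi: Dict[int, float]) -> List[int]:
    """Nodes whose potential is strictly greater than that of every neighbor."""
    return sorted(i for i in g if all(phi[i] > phi[j] for j in g[i]))
```

In a small network of two hubs, each with its own leaves and linked through a short path of leaves, the two hubs are the only local maxima. This suggests two communities.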

181

An Efficient k-Means Clustering Algorithm: Analysis and Implementation

In k-means clustering, we are given a set of n data points in d-dimensional space R^d and an integer k, and the problem is to determine a set of k points in R^d, called centers, so as to minimize the mean squared distance from each data point to its nearest center. A popular heuristic for k-means clustering is Lloyd's (1982)

Tapas Kanungo; David M. Mount; Nathan S. Netanyahu; Christine D. Piatko; Ruth Silverman; Angela Y. Wu

2002-01-01
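The abstract is cut off at Lloyd's heuristic, which the paper builds on. Its filtering algorithm uses a kd-tree to speed up the assignment step. The sketch below shows plain Lloyd iteration, without the kd-tree, to make the baseline concrete: assign every point to its nearest center, then move each center to the mean of its points, until no assignment changes. Each iteration costs O(nkd). The random initialization and the function names are assumptions for illustration.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lloyd(points: List[Point], k: int, max_iter: int = 100, seed: int = 0):
    """Lloyd's heuristic: alternate nearest-center assignment and centroid update.

    Returns (centers, labels). Converges to a local optimum of the mean
    squared distance, which depends on the initial centers.
    """
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]  # k distinct input points
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: the O(n k d) part that kd-tree filtering accelerates.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        # Update step: each center moves to the centroid of its points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    return centers, labels
```

Changing `seed` changes the initial centers. On harder data, different seeds can lead to different local optima, which is the weakness several other entries in this listing address.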

182

A multi-level spatial clustering algorithm for detection of disease outbreaks.

In this paper, we propose a Multi-level Spatial Clustering (MSC) algorithm for the rapid, prospective detection of emerging disease outbreaks. We used semi-synthetic data for algorithm evaluation. We applied the BARD algorithm [1] to generate outbreak counts simulating an aerosol release of anthrax. We compared MSC with two spatial clustering algorithms: Kulldorff's spatial scan statistic [2] and the Bayesian spatial scan statistic [3]. The evaluation showed no significant difference among the three algorithms in the areas under the ROC curves, nor in the areas under the AMOC curves. MSC demonstrated significant computational efficiency (more than 100 times faster) and higher PPV. However, MSC showed a 2-6 hour delay on average in outbreak detection when the false alarm rate was lower than 1 false alarm per 4 weeks. We concluded that the MSC algorithm is computationally efficient and is able to provide more precise and compact clusters in a timely manner while keeping high detection accuracy (cluster sensitivity) and low false alarm rates. PMID:18999304

Que, Jialan; Tsui, Fu-Chiang

2008-01-01
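
For context on the comparison baseline: Kulldorff's spatial scan statistic scores each candidate zone with a Poisson log-likelihood ratio and reports the zone with the largest value. Significance is usually assessed by Monte Carlo replication, which is not shown here. The Python sketch below computes that ratio. The zone names and counts in the example are made up for illustration, and nothing here reproduces the MSC algorithm itself.

```python
import math

def poisson_scan_llr(c, e, total):
    """Kulldorff's Poisson log-likelihood ratio for one candidate zone.

    c     -- observed cases inside the zone
    e     -- expected cases inside the zone under the null hypothesis
    total -- total observed cases in the study region (also the total expected)
    """
    if c <= e:
        return 0.0                      # only zones with an excess of cases count
    inside = c * math.log(c / e)
    out_obs, out_exp = total - c, total - e
    outside = out_obs * math.log(out_obs / out_exp) if out_obs > 0 else 0.0
    return inside + outside

def most_likely_cluster(zones, total):
    """zones: iterable of (name, observed, expected); returns (name, llr) with max LLR."""
    scored = [(name, poisson_scan_llr(c, e, total)) for name, c, e in zones]
    return max(scored, key=lambda item: item[1])

# Hypothetical counts for three candidate zones in a region with 100 cases.
print(most_likely_cluster([("A", 20, 10), ("B", 12, 10), ("C", 3, 10)], total=100))
```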

184

KiWi: A Scalable Subspace Clustering Algorithm for Gene Expression Analysis

Subspace clustering has gained increasing popularity in the analysis of gene expression data. Among subspace cluster models, the recently introduced order-preserving sub-matrix (OPSM) has demonstrated high promise. An OPSM, essentially a pattern-based subspace cluster, is a subset of rows and columns in a data matrix for which all the rows induce the same linear ordering of columns. Existing OPSM discovery methods do not scale well to increasingly large expression datasets. In particular, twig clusters having few genes and many experiments incur explosive computational costs and are completely pruned off by existing methods. However, it is of particular interest to determine small groups of genes that are tightly coregulated across many conditions. In this paper, we present KiWi, an OPSM subspace clustering algorithm that is scalable to massive datasets, capable of discovering twig clusters and identifying negative as well as positive correlations. We extensively validate KiWi using relevant biological datase...

Griffith, Obi L; Bilenky, Mikhail; Prichyna, Yuliya; Ester, Martin; Jones, Steven J M

2009-01-01
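
The OPSM definition in the abstract above (all selected rows induce the same linear ordering of the selected columns) can be checked directly. This Python sketch is only a membership test for a given row/column subset, not KiWi's search procedure. It assumes there are no tied values within a row, and it checks positive correlation only: a negatively correlated row would induce the reversed ordering.

```python
def column_order(matrix, row, cols):
    """Columns of `cols` sorted by the values of `row` (ascending)."""
    return tuple(sorted(cols, key=lambda c: matrix[row][c]))

def is_opsm(matrix, rows, cols):
    """True if every selected row induces the same ordering of the selected columns."""
    return len({column_order(matrix, r, cols) for r in rows}) == 1

expression = [
    [1, 5, 3, 9],
    [2, 8, 4, 7],
    [10, 1, 2, 3],
]
print(is_opsm(expression, rows=[0, 1], cols=[0, 1, 2]))     # rows 0 and 1 agree
print(is_opsm(expression, rows=[0, 1], cols=[0, 1, 2, 3]))   # column 3 breaks the order
```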

185

Plot enchaining algorithm: a novel approach for clustering flocks of birds

NASA Astrophysics Data System (ADS)

In this study, an intuitive method for tracking flocks of birds is proposed and compared with a simple cluster-seeking algorithm on real radar observations. For a group of targets such as a flock of birds, there is no need to track each target individually; instead, a cluster can be used to represent the closely spaced tracks of a possible group. Treating a group of targets as a single target for tracking provides a significant performance improvement with almost no loss of information.

Büyükaksoy Kaplan, Gülay; Lana, Adnan

2014-06-01

186

Methodologies for Comparing Clustering Algorithms in Wireless Sensor Networks

… energy consumer is the radio communication, which implies the need for energy-efficient solutions in order … between runtime, memory demand, communication cost, and energy consumption. The latter is important for nodes without renewable energy resources. Apart from the complexity, the quality of the clustered network …

Turau, Volker

188

Analytic Root Clustering: A Complete Algorithm using Soft Zero Tests

… to current theories of computing in the continua is the proper treatment of the zero test. Such tests … we design complete algorithms with soft zero tests? We address the basic problem of determining … 22]). Nevertheless, there are barriers when we address non-linear and/or non-algebraic problems. We are therefore …

189

Track clustering and vertexing algorithm for L1 trigger

One of the keystones of the canceled BTeV experiment (proposed for Fermilab's Tevatron) was its sophisticated three-level trigger. The trigger was designed to reject 99.9% of light-quark background events while retaining a large number of B decays. The BTeV Pixel Detector provided a three-dimensional, high-resolution tracking system to detect B signatures. The Level 1 pixel detector trigger was proposed as a two-stage process, a track-segment finder and a vertex finder, which analyzed every accelerator crossing. In simulations, the track-segment finder stage outputs an average of 200 track segments per accelerator crossing (2.5 MHz). The vertexing stage finds vertices and associates track segments with the vertices found. This paper proposes a novel adaptive pattern recognition model to find the number and estimated locations of vertices and to cluster track segments around those vertices. Track clustering and vertex finding are done in parallel. The pattern recognition model also estimates other important parameters, such as the covariance matrix of the cluster vertices and the minimum distances from the tracks to the vertices, which are needed to find detached tracks.

Cancelo, Gustavo I. [Fermilab]

2005-10-01

190

Experimental realization of the Deutsch-Jozsa algorithm with a six-qubit cluster state

We describe an experimental realization of the Deutsch-Jozsa quantum algorithm to evaluate the properties of a two-bit Boolean function in the framework of one-way quantum computation. For this purpose, a two-photon six-qubit cluster state was engineered. Its peculiar topological structure underlies the original measurement pattern that enables the algorithm to be realized. The good agreement of the experimental results with the theoretical predictions, obtained at a success rate of approximately 1 kHz, demonstrates the correct implementation of the algorithm.

Vallone, Giuseppe [Museo Storico della Fisica e Centro Studi e Ricerche Enrico Fermi, Via Panisperna 89/A, Compendio del Viminale, IT-00184 Roma (Italy); Dipartimento di Fisica, Universita Sapienza di Roma, IT-00185 Roma (Italy); Donati, Gaia; Bruno, Natalia; Chiuri, Andrea [Dipartimento di Fisica, Universita Sapienza di Roma, IT-00185 Roma (Italy); Mataloni, Paolo [Dipartimento di Fisica, Universita Sapienza di Roma, IT-00185 Roma (Italy); Istituto Nazionale di Ottica (INO-CNR), L.go E. Fermi 6, IT-50125 Florence (Italy)

2010-05-15
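
The Deutsch-Jozsa logic for a two-bit Boolean function can be checked with a small state-vector simulation. This Python/NumPy sketch uses the circuit model with a phase oracle, not the one-way (measurement-based) cluster-state scheme realized in the experiment. After Hadamards, the oracle, and Hadamards again, the all-zeros outcome has probability 1 for a constant function and 0 for a balanced one.

```python
import itertools
import numpy as np

def deutsch_jozsa(f, n=2):
    """Classify f: {0,1}^n -> {0,1}, promised to be constant or balanced."""
    h = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)
    H = h
    for _ in range(n - 1):
        H = np.kron(H, h)                 # n-qubit Hadamard
    state = np.zeros(2 ** n)
    state[0] = 1.0                        # start in |0...0>
    state = H @ state                     # uniform superposition
    inputs = itertools.product([0, 1], repeat=n)   # same order as the basis index
    oracle = np.array([(-1) ** f(x) for x in inputs])
    state = H @ (oracle * state)          # phase oracle, then Hadamards again
    p_zero = state[0] ** 2                # probability of measuring |0...0>
    return "constant" if np.isclose(p_zero, 1.0) else "balanced"

print(deutsch_jozsa(lambda x: 0), deutsch_jozsa(lambda x: x[0] ^ x[1]))
```

A single evaluation of the oracle decides the question, whereas a classical deterministic test of a two-bit function may need three queries in the worst case.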

191

MECHANISTIC-BASED GENETIC ALGORITHM SEARCH ON A BEOWULF CLUSTER OF LINUX PCS

A simple genetic algorithm (SGA) was implemented on a cluster of Linux PCs to search for the most likely fracture networks in a soil column. The objective is to evaluate the performance of SGAs in a distributed computing environment that is widely and inexpensively available to environmental researchers and engineers. The Beowulf computer was built out of surplus personal computers

Jin-Ping Gwo; Forrest M. Hoffman; William W. Hargrove

192

Conceptual Clustering Using Lingo Algorithm: Evaluation on Open Directory Project Data

The search results clustering problem is defined as the automatic, on-line grouping of similar documents in a list of search hits returned from a search engine. In this paper we present the results of an experimental evaluation of a new algorithm named Lingo. We use the Open Directory Project as a source of high-quality, narrow-topic document references and mix them into several

Stanislaw Osinski; Dawid Weiss

2004-01-01

193

Color Image Segmentation Using a Spatial K-Means Clustering Algorithm

This paper details the implementation of a new adaptive technique for color … with respect to texture and color since no local constraints are applied to impose spatial continuity …

Ilea, Dana Elena; Whelan, Paul F.

194

A Cluster Warhead Projection-Time Self-Adaptive Algorithm and Simulation

This paper proposes a self-adaptive algorithm for the projection time of a cluster warhead that is suitable for low-cost engineering implementation. It retains a simplified correction strategy that keeps the distance between projection points constant and, by detecting the deviation of the actual trajectory data at a specific time with respect

Qiang Shen; Mian Ge; Jie Li

2009-01-01

195

MapReduce Intrusion Detection System based on a Particle Swarm Optimization Clustering Algorithm

… networks to be analyzed imposes new challenges on an intrusion detection system. Since data in computer … to be done within a reasonable amount of time. Some of the past and current intrusion detection systems …

Ludwig, Simone

196

An Energy-Efficient Clustering Algorithm for Multihop Data Gathering in Wireless Sensor Networks

… at an unprecedented fidelity. To fully realize this vision, these networks have to be self-organizing, self-healing, economical, and energy-efficient simultaneously. Since the communication task is a significant power consumer …

Selvadurai, Selvakennedy

197

Scheduling of a design project is complex because design activities often have information dependencies between each other. This study proposes a network-based model to schedule design projects and generate probabilistic project durations. The proposed model applies a modified cluster identification algorithm to evaluate information dependencies between design activities to facilitate the establishment of a schedule network (and regroup activities to

2005-01-01

198

Atom-probe tomography is a materials characterization method ideally suited for the investigation of clustering and precipitation phenomena. To distinguish the clusters from the surrounding matrix, the maximum separation algorithm is widely employed. However, the results of the cluster analysis strongly depend on the parameters used in the algorithm and hence, a wrong choice of parameters leads to erroneous results, e.g., for the cluster number density, concentration, and size. Here, a new method to determine the optimum value of the parameter dmax is proposed, which relies only on information contained in the measured atom-probe data set. Atom-probe simulations are employed to verify the method and to determine the sensitivity of the maximum separation algorithm to other input parameters. In addition, simulations are used to assess the accuracy of cluster analysis in the presence of trajectory aberrations caused by the local magnification effect. In the case of Cu-rich precipitates (Cu concentration 40-60 at% and radius 0.25-1.0 nm) in a bcc Fe-Si-Cu matrix, it is shown that the error in concentration is below 10 at% and the error in radius is <0.15 nm for all simulated conditions, provided that the correct value for dmax, as determined with the newly proposed method, is employed. PMID:25327827

Jägle, Eric Aimé; Choi, Pyuck-Pa; Raabe, Dierk

2014-12-01
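
The maximum separation algorithm links solute atoms that lie within d_max of one another into chains and keeps chains with at least N_min atoms as clusters. A brute-force Python/NumPy sketch of that core linking step follows, assuming coordinates in nm. It omits refinement steps that are often applied afterwards (such as adding nearby matrix atoms), and it does not implement the paper's method for choosing d_max.

```python
import numpy as np

def max_separation_clusters(positions, d_max, n_min):
    """Union atoms closer than d_max; return index lists of clusters with >= n_min atoms."""
    positions = np.asarray(positions, dtype=float)
    n = len(positions)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]    # path halving
            i = parent[i]
        return i

    for i in range(n - 1):
        dist = np.linalg.norm(positions[i + 1:] - positions[i], axis=1)
        for j in np.nonzero(dist <= d_max)[0] + i + 1:
            ri, rj = find(i), find(int(j))
            if ri != rj:
                parent[ri] = rj              # merge the two chains
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return [g for g in groups.values() if len(g) >= n_min]

# Hypothetical solute positions (nm): a 3-atom chain, a 2-atom pair, one isolated atom.
atoms = [[0, 0, 0], [0.2, 0, 0], [0.4, 0, 0], [5, 5, 5], [5.1, 5, 5], [10, 0, 0]]
print(max_separation_clusters(atoms, d_max=0.25, n_min=2))
```

Atoms 0 and 2 are 0.4 nm apart but still end up in one cluster through atom 1. This chaining is why the outcome is so sensitive to d_max.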

199

MetaCluster 4.0: A Novel Binning Algorithm for NGS Reads and Huge Number of Species

No existing tool, including AbundanceBin and MetaCluster 3.0, has demonstrated reasonable performance on a sample with more than 20 species. In this article, we introduce MetaCluster 4.0, an unsupervised binning

Wang, Yi; Henry …; Chin, Francis Y.L.

200

BMI optimization by using parallel UNDX real-coded genetic algorithm with Beowulf cluster

NASA Astrophysics Data System (ADS)

This paper deals with a global optimization algorithm for Bilinear Matrix Inequalities (BMIs) based on the Unimodal Normal Distribution Crossover (UNDX) GA. First, an analysis of the structure of BMIs confirms the existence of typical difficult structures. Then, to improve the algorithm's performance, based on this structural analysis and on the characteristic properties of BMIs, we propose an algorithm that uses a primary search direction together with relaxed Linear Matrix Inequality (LMI) convex estimation. Moreover, we propose two types of evaluation methods for GA individuals, based on LMI calculations, that take the characteristic properties of BMIs further into account. In addition, to reduce computation time, we parallelize the real-coded GA (RCGA) using a master-worker paradigm with cluster computing.

Handa, Masaya; Kawanishi, Michihiro; Kanki, Hiroshi

2007-12-01
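
UNDX generates an offspring around the midpoint of two parents. The spread is along the line joining them and, with a different variance, in the orthogonal directions, where it is scaled by a third parent's distance from that line. The Python/NumPy sketch below follows the commonly cited form with sigma_xi = alpha and sigma_eta = beta * D / sqrt(n), using alpha = 0.5 and beta = 0.35 as assumed defaults. It does not include the paper's LMI-based search direction or its BMI-specific evaluation.

```python
import numpy as np

def undx(p1, p2, p3, rng, alpha=0.5, beta=0.35):
    """One UNDX offspring from primary parents p1, p2 and secondary parent p3."""
    n = p1.size
    m = 0.5 * (p1 + p2)                          # midpoint of the primary parents
    d = p2 - p1                                  # primary search direction
    e0 = d / np.linalg.norm(d)
    v = p3 - p1
    D = np.linalg.norm(v - np.dot(v, e0) * e0)   # distance of p3 from the p1-p2 line
    # Orthonormal basis of the subspace orthogonal to d (QR of [e0 | random columns]).
    q, _ = np.linalg.qr(np.column_stack([e0, rng.standard_normal((n, n - 1))]))
    ortho = q[:, 1:]
    xi = rng.normal(0.0, alpha)                               # step along d
    eta = rng.normal(0.0, beta * D / np.sqrt(n), size=n - 1)  # steps orthogonal to d
    return m + xi * d + ortho @ eta

rng = np.random.default_rng(0)
child = undx(np.array([0.0, 0.0, 0.0]), np.array([2.0, 0.0, 0.0]),
             np.array([1.0, 1.0, 0.0]), rng)
print(child)
```

When the third parent lies on the line through the first two (D = 0), the offspring stays on that line. This behavior is a quick sanity check of the orthogonal term.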

201

MixSim: An R Package for Simulating Data to Study Performance of Clustering Algorithms

The R package MixSim is a new tool for simulating mixtures of Gaussian distributions with different levels of overlap between mixture components. Pairwise overlap, defined as the sum of two misclassification probabilities, measures the degree of interaction between components and can readily be used to control the clustering complexity of datasets simulated from mixtures. These datasets can then be used for systematic performance investigation of clustering and finite mixture modeling algorithms. Other capabilities of MixSim include computing the exact overlap for Gaussian mixtures, simulating Gaussian and non-Gaussian data, simulating outliers and noise variables, calculating various measures of agreement between two partitionings, and constructing parallel distribution plots for the graphical display of finite mixture models. All features of the package are illustrated in detail. The utility of the package is highlighted through a small comparison study of several popular clustering algorithms.

Melnykov, Volodymyr [University of Alabama, Tuscaloosa]; Chen, Wei-Chen [ORNL]; Maitra, Ranjan [Iowa State University]

2012-01-01
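
To make the overlap definition concrete, consider components i and j with weights pi. The misclassification probability omega_{j|i} is the chance that a point drawn from component i is assigned to j because pi_j f_j(x) > pi_i f_i(x), and the pairwise overlap is omega_{i|j} + omega_{j|i}. MixSim computes this exactly; the Python sketch below is only a Monte Carlo approximation for two univariate Gaussian components and does not use MixSim's own functions. For equal weights and equal variances, the overlap equals 2 Phi(-|mu_1 - mu_2| / (2 sigma)), which gives a simple check.

```python
import math
import numpy as np

def normal_pdf(x, mu, sd):
    """Univariate Gaussian density."""
    return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def misclassification(i, j, weights, means, sds, n_draws, rng):
    """Monte Carlo estimate of omega_{j|i}: X ~ component i, but pi_j f_j(X) > pi_i f_i(X)."""
    x = rng.normal(means[i], sds[i], size=n_draws)
    wrong = (weights[j] * normal_pdf(x, means[j], sds[j])
             > weights[i] * normal_pdf(x, means[i], sds[i]))
    return float(np.mean(wrong))

def pairwise_overlap(weights, means, sds, n_draws=200_000, seed=0):
    """omega_{1|2} + omega_{2|1} for a two-component univariate mixture."""
    rng = np.random.default_rng(seed)
    return (misclassification(0, 1, weights, means, sds, n_draws, rng)
            + misclassification(1, 0, weights, means, sds, n_draws, rng))

estimate = pairwise_overlap(weights=[0.5, 0.5], means=[0.0, 2.0], sds=[1.0, 1.0])
exact = 2 * 0.5 * math.erfc(1.0 / math.sqrt(2.0))   # 2 * Phi(-1)
print(round(estimate, 3), round(exact, 3))
```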

202

A New Monte Carlo Method and Its Implications for Generalized Cluster Algorithms

We describe a novel switching algorithm based on a "reverse" Monte Carlo method, in which the potential is stochastically modified before the system configuration is moved. This new algorithm facilitates a generalized formulation of cluster-type Monte Carlo methods, and the generalization makes it possible to derive cluster algorithms for systems with both discrete and continuous degrees of freedom. The roughening transition in the sine-Gordon model has been studied with this method, and high-accuracy simulations for system sizes up to 1024 × 1024 were carried out to examine the logarithmic divergence of the surface roughness above the transition temperature, revealing clear evidence for universal scaling of the Kosterlitz-Thouless type.

C. H. Mak; Arun K. Sharma

2007-04-12

203

NASA Astrophysics Data System (ADS)

A new method for detecting microcalcifications in regions of interest (ROIs) extracted from digitized mammograms is proposed. The top-hat transform is a technique based on mathematical morphology operations; in this paper, it is used to enhance the contrast of the microcalcifications. To improve microcalcification detection, a novel image sub-segmentation approach based on the possibilistic fuzzy c-means algorithm is used. Window-based features, such as the mean and standard deviation, were extracted from the original ROIs and used as the input vector to a classifier. The classifier is based on an artificial neural network and identifies patterns belonging to microcalcifications and to healthy tissue. Our results show that the proposed method is a good alternative for automatically detecting microcalcifications, an important step in early breast cancer detection.

Quintanilla-Domínguez, Joel; Ojeda-Magaña, Benjamín; Marcano-Cedeño, Alexis; Cortina-Januchs, María G.; Vega-Corona, Antonio; Andina, Diego

2011-12-01
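
The white top-hat transform used for contrast enhancement is the image minus its morphological opening (erosion followed by dilation). It therefore keeps bright details smaller than the structuring element and suppresses the smooth background. The NumPy-only Python sketch below assumes a flat square structuring element of odd size; the abstract does not state which structuring element the paper used.

```python
import numpy as np

def _window_filter(img, size, reducer):
    """Apply np.min or np.max over every size x size window (edges replicated)."""
    r = size // 2
    padded = np.pad(img, r, mode="edge")
    h, w = img.shape
    shifted = [padded[dy:dy + h, dx:dx + w] for dy in range(size) for dx in range(size)]
    return reducer(np.stack(shifted), axis=0)

def white_top_hat(img, size=3):
    """img minus its grey-level opening with a flat size x size square."""
    eroded = _window_filter(img, size, np.min)      # erosion
    opened = _window_filter(eroded, size, np.max)   # dilation of the erosion
    return img - opened

roi = np.full((7, 7), 10.0)
roi[3, 3] = 50.0                     # one-pixel bright spot (microcalcification-like)
print(white_top_hat(roi)[3, 3])
```

A bright region at least as large as the structuring element survives the opening and is removed by the subtraction. The structuring element size therefore sets the largest detail that the transform enhances.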

204

Clustering by Fuzzy Neural Gas and Evaluation of Fuzzy Clusters

We consider some modifications of the neural gas algorithm. First, fuzzy assignments as known from fuzzy c-means and neighborhood cooperativeness as known from self-organizing maps and neural gas are combined to obtain a basic Fuzzy Neural Gas. Further, a kernel variant and a simulated annealing approach are derived. Finally, we introduce a fuzzy extension of the ConnIndex to obtain an evaluation measure for clusterings based on fuzzy vector quantization. PMID:24396342

Geweniger, Tina; Fischer, Lydia; Kaden, Marika; Lange, Mandy; Villmann, Thomas

2013-01-01
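
The fuzzy assignments that Fuzzy Neural Gas borrows from fuzzy c-means can be illustrated with the standard FCM alternating updates. Memberships are u_ik = 1 / sum_j (d_ik / d_ij)^(2/(m-1)), and centers are membership-weighted means with weights u_ik^m. The Python/NumPy sketch below is plain FCM with an assumed fuzzifier m = 2. It does not include the neighborhood cooperativeness, kernel variant, or annealing described in the abstract.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, n_iter=200, tol=1e-6, seed=0):
    """Standard FCM; returns cluster centers and the N x c membership matrix."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)                 # memberships of each point sum to 1
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]  # membership-weighted means
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)                         # guard against d = 0
        # ratio[n, k, j] = (d_nk / d_nj)^(2/(m-1)); u_nk = 1 / sum_j ratio[n, k, j]
        ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
        U_new = 1.0 / ratio.sum(axis=2)
        shift = np.abs(U_new - U).max()
        U = U_new
        if shift < tol:
            break
    return centers, U

X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
centers, U = fuzzy_c_means(X, c=2)
print(np.round(U, 3))
```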

205

A Priori Data-Driven Multi-Clustered Reservoir Generation Algorithm for Echo State Network

Echo state networks (ESNs) with a multi-clustered reservoir topology perform better in reservoir computing and robustness than those with a random reservoir topology. However, these ESNs have a complex reservoir topology, which makes reservoir generation difficult. This study focuses on the reservoir generation problem when an ESN is used in environments where sufficient a priori data are available. Accordingly, an a priori data-driven multi-cluster reservoir generation algorithm is proposed. The a priori data are used to evaluate reservoirs by calculating the precision and standard deviation of the ESNs. Reservoirs are produced using the clustering method, and a new reservoir replaces the previous one only if it achieves a better evaluation score. The final reservoir is obtained when its evaluation score reaches the preset requirement. Prediction experiments on the Mackey-Glass chaotic time series show that the proposed reservoir generation algorithm improves the prediction precision of ESNs and increases the structural complexity of the network. Further experiments also reveal appropriate values for the number of clusters and the time window size that yield optimal performance. The information entropy of the reservoir reaches its maximum when the ESN achieves its greatest precision. PMID:25875296

Li, Xiumin; Zhong, Ling; Xue, Fangzheng; Zhang, Anguo

2015-01-01

206

Cluster-Based Multipolling Sequencing Algorithm for Collecting RFID Data in Wireless LANs

NASA Astrophysics Data System (ADS)

With the growing use of RFID (Radio Frequency Identification), it is becoming important to devise ways to read RFID tags in real time. Access points (APs) of IEEE 802.11-based wireless Local Area Networks (LANs) are being integrated with RFID networks that can efficiently collect real-time RFID data. Several schemes, such as multipolling methods based on the dynamic search algorithm and on random sequencing, have been proposed. However, as the number of RFID readers associated with an AP increases, it becomes difficult for the dynamic search algorithm to derive the multipolling sequence in real time. Although multipolling methods can eliminate polling overhead, the performance of multipolling methods based on random sequencing still needs to be improved. To that end, we propose a real-time cluster-based multipolling sequencing algorithm that eliminates more than 90% of the polling overhead, particularly when the dynamic search algorithm fails to derive the multipolling sequence in real time.

Choi, Woo-Yong; Chatterjee, Mainak

2015-03-01

207

A genetic algorithmic approach to antenna null-steering using a cluster computer.

NASA Astrophysics Data System (ADS)

We apply a genetic algorithm (GA) to the problem of electronically steering the maxima and nulls of an antenna array to desired positions (a null toward an enemy listener/jammer, a maximum toward a friendly listener/transmitter). The antenna pattern itself is computed using NEC2, which is called by the main GA program. Since a GA lends itself naturally to parallelization, this simulation was run on our new twin 64-node cluster computers (Gemini). Design issues and uses of the Gemini cluster in our group are also discussed.

Recine, Greg; Cui, Hong-Liang

2001-06-01

208

Collaborative fuzzy clustering from multiple weighted views.

Clustering with multiview data is becoming a hot topic in data mining, pattern recognition, and machine learning. To realize effective multiview clustering, two issues must be addressed: how to combine the clustering results from each view and how to identify the importance of each view. In this paper, based on a newly proposed objective function that explicitly incorporates two penalty terms, a basic multiview fuzzy clustering algorithm, called collaborative fuzzy c-means (Co-FCM), is first proposed. It is then extended into a weighted-view version, called weighted view collaborative fuzzy c-means (WV-Co-FCM), which identifies the importance of each view. The WV-Co-FCM algorithm thus tackles both issues simultaneously. Its relationship with the latest multiview fuzzy clustering algorithm, Collaborative Fuzzy K-Means (Co-FKM), is also revealed. Extensive experimental results on various multiview datasets indicate that the proposed WV-Co-FCM algorithm outperforms, or is at least comparable to, existing state-of-the-art multitask and multiview clustering algorithms, and that the importance of the different views of the datasets can be effectively identified. PMID:25069132

Jiang, Yizhang; Chung, Fu-Lai; Wang, Shitong; Deng, Zhaohong; Wang, Jun; Qian, Pengjiang

2015-04-01

209

A fast hierarchical clustering algorithm for large-scale protein sequence data sets.

TRIBE-MCL is a Markov clustering algorithm that operates on a graph built from pairwise similarity information of the input data. Edge weights stored in the stochastic similarity matrix are alternately fed to the two main operations, inflation and expansion, and are normalized in each main loop to maintain the probabilistic constraint. In this paper we propose an efficient implementation of the TRIBE-MCL clustering algorithm, suitable for fast and accurate grouping of protein sequences. A modified sparse matrix structure is introduced that can efficiently handle most operations of the main loop. Taking advantage of the symmetry of the similarity matrix, a fast matrix squaring formula is also introduced to speed up the time-consuming expansion step. The proposed algorithm was tested on protein sequence databases such as SCOP95. In terms of efficiency, the proposed solution improves execution speed by two orders of magnitude compared with recently published efficient solutions, reducing the total runtime to well below 1 min for the 11,944 proteins of SCOP95. This improvement in computation time is achieved without any loss of partition quality. Convergence is generally reached in approximately 50 iterations. The efficient execution enabled us to perform a thorough evaluation of classification results and to formulate recommendations regarding the choice of the algorithm's parameter values. PMID:24657908

Szilágyi, Sándor M; Szilágyi, László

2014-05-01
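
The expansion/inflation loop described in the abstract above can be sketched with dense matrices. Expansion squares the column-stochastic matrix; inflation raises each entry to a power r and renormalizes each column. This Python/NumPy sketch is the textbook MCL loop, with self-loops and r = 2 as assumptions. It uses none of the paper's sparse structures or its symmetric squaring shortcut, so it is suitable only for small graphs.

```python
import numpy as np

def mcl(adjacency, inflation=2.0, n_iter=100, tol=1e-9):
    """Markov clustering on a small dense adjacency matrix."""
    M = np.asarray(adjacency, dtype=float) + np.eye(len(adjacency))  # add self-loops
    M /= M.sum(axis=0, keepdims=True)                   # make columns stochastic
    for _ in range(n_iter):
        expanded = M @ M                                 # expansion
        inflated = expanded ** inflation                 # inflation (element-wise power)
        inflated /= inflated.sum(axis=0, keepdims=True)  # renormalize columns
        converged = np.abs(inflated - M).max() < tol
        M = inflated
        if converged:
            break
    # Rows that keep mass are attractors; their non-zero columns form a cluster.
    clusters = {tuple(int(c) for c in np.nonzero(row > 1e-6)[0]) for row in M}
    clusters.discard(())
    return sorted(clusters)

# Two separate triangles: {0, 1, 2} and {3, 4, 5}.
A = np.zeros((6, 6))
for u, v in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    A[u, v] = A[v, u] = 1.0
print(mcl(A))
```

Higher inflation values sharpen the columns faster and tend to produce more, smaller clusters. This is the main parameter whose choice such evaluations address.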

210

Study of cluster reconstruction and track fitting algorithms for CGEM-IT at BESIII

Considering the aging effects of the existing Inner Drift Chamber (IDC) of BESIII, a GEM-based inner tracker (CGEM-IT) is proposed to be designed and constructed as an upgrade candidate for the IDC. This paper introduces a full simulation package of CGEM-IT with a simplified digitization model and describes the development of software for cluster reconstruction and for a track fitting algorithm based on the Kalman filter method for CGEM-IT. Preliminary results from the reconstruction algorithms are obtained using a Monte Carlo sample of single-muon events in CGEM-IT.

Guo, Yue; Ju, Xu-Dong; Wu, Ling-Hui; Xiu, Qing-Lei; Wang, Hai-Xia; Dong, Ming-Yi; Hu, Jing-Ran; Li, Wei-Dong; Li, Wei-Guo; Liu, Huai-Min; Ou-Yang, Qun; Shen, Xiao-Yan; Yuan, Ye; Zhang, Yao

2015-01-01

211

Robustness of ‘cut and splice’ genetic algorithms in the structural optimization of atomic clusters

NASA Astrophysics Data System (ADS)

We return to the geometry optimization problem of Lennard-Jones clusters to analyze the performance dependence of 'cut and splice' genetic algorithms (GAs) on the employed population size. We generally find that admixing twinning mutation moves leads to an improved robustness of the algorithm efficiency with respect to this a priori unknown technical parameter. The resulting very stable performance of the corresponding mutation + mating GA implementation over a wide range of population sizes is an important feature when addressing unknown systems with computationally involved first-principles based GA sampling.

Froltsov, Vladimir A.; Reuter, Karsten

2009-05-01
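
The following Python/NumPy sketch is a simplified illustration of the two ingredients named in the abstract above: the Lennard-Jones energy in reduced units and a "cut and splice" mating step in the spirit of Deaven and Ho. To keep the atom count fixed, the sketch takes the cut by sorting atoms along a random direction (n//2 atoms from one parent, the rest from the other) instead of using a plane through the center; this is an assumption. The twinning mutation and the local relaxation that normally follow mating are not shown.

```python
import numpy as np

def lj_energy(pos):
    """Total Lennard-Jones energy in reduced units (epsilon = sigma = 1)."""
    r = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=2)
    iu = np.triu_indices(len(pos), k=1)           # each pair counted once
    inv6 = r[iu] ** -6
    return float(np.sum(4.0 * (inv6 ** 2 - inv6)))

def cut_and_splice(parent_a, parent_b, rng):
    """Child with n//2 atoms from the 'upper' side of A and the rest from the 'lower' side of B."""
    n = len(parent_a)
    direction = rng.normal(size=3)
    direction /= np.linalg.norm(direction)        # random cutting direction
    a = parent_a - parent_a.mean(axis=0)          # centre both parents on the origin
    b = parent_b - parent_b.mean(axis=0)
    k = n // 2
    top_a = a[np.argsort(a @ direction)[::-1][:k]]    # k atoms of A highest along direction
    bottom_b = b[np.argsort(b @ direction)[:n - k]]   # n-k atoms of B lowest along direction
    return np.vstack([top_a, bottom_b])

dimer = np.array([[0.0, 0.0, 0.0], [2 ** (1 / 6), 0.0, 0.0]])
print(lj_energy(dimer))   # the LJ pair minimum, at r = 2^(1/6)
```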

212

Fuzzy clustering algorithms have been successfully applied to POLSAR classification, but not to POLInSAR. In this paper, a fuzzy c-means (FCM) clustering algorithm that integrates the complementary physical information and statistical properties contained in both polarimetric and interferometric data is used for POLInSAR classification. First, the area dominated by volume scattering is extracted from polarimetric information using unsupervised H-A-Alpha

Huan-Min Luo; Er-Xue Chen; Xiao-Wen Li; Jian Cheng; Min Li

2010-01-01

213

Randomized Algorithms for Fast Bayesian Hierarchical Clustering

[Figure 1: (a) Schematic of a portion of a tree where Ti and Tj …]

Heller, Katherine A.; Ghahramani, Zoubin [Gatsby Computational Neuroscience Unit, University College London, London, WC1N 3AR, UK]

214

Utilizing unsupervised learning to cluster data in the Bayesian data reduction algorithm

NASA Astrophysics Data System (ADS)

In this paper, unsupervised learning is used to illustrate the ability of the Bayesian Data Reduction Algorithm (BDRA) to cluster unlabeled training data. The BDRA is based on the assumption that the discrete symbol probabilities of each class are a priori uniformly Dirichlet distributed, and it employs a "greedy" approach (similar to a backward sequential feature search) for removing irrelevant features from the training data of each class. Note that removing irrelevant features is synonymous here with selecting the features that provide the best classification performance; the metric for making data-reduction decisions is an analytic formula for the probability of error conditioned on the training data. The contribution of this work is to demonstrate how clustering performance varies depending on the method used for unsupervised training. Performance is illustrated using simulated data. In general, the results of this work have implications for finding clusters in data mining applications.

Lynch, Robert S., Jr.; Willett, Peter K.

2005-03-01

215

NASA Astrophysics Data System (ADS)

The incorporation of hyperspectral sensors aboard airborne/satellite platforms is currently producing a nearly continual stream of multidimensional image data, and this high data volume has introduced new processing challenges. The price paid for the wealth of spatial and spectral information available from hyperspectral sensors is the enormous amount of data they generate. In several applications, however, the desired information must be calculated quickly enough for practical use. High computing performance is particularly important in homeland defense and security applications, in which swift decisions often involve detection of (sub-pixel) military targets (including hostile weaponry, camouflage, concealment, and decoys) or chemical/biological agents. To speed up hyperspectral imaging algorithms, this paper develops several fast parallel data processing techniques covering four classes of algorithms: (1) unsupervised classification, (2) spectral unmixing, (3) automatic target recognition, and (4) onboard data compression. A massively parallel Beowulf cluster (Thunderhead) at NASA's Goddard Space Flight Center in Maryland is used to measure the parallel performance of the proposed algorithms. To explore the viability of developing onboard, real-time hyperspectral data compression algorithms, a Xilinx Virtex-II field programmable gate array (FPGA) is also used in experiments. Our quantitative and comparative assessment of parallel techniques and strategies may help image analysts select parallel hyperspectral algorithms for specific applications.

Plaza, Antonio; Chang, Chein-I.; Plaza, Javier; Valencia, David

2006-05-01

216

Parallel OSEM Reconstruction Algorithm for Fully 3-D SPECT on a Beowulf Cluster.

To improve the computation speed of the ordered subset expectation maximization (OSEM) algorithm for fully 3-D single photon emission computed tomography (SPECT) reconstruction, an experimental Beowulf-type cluster was built and several parallel reconstruction schemes were described. We implemented a single-program-multiple-data (SPMD) parallel 3-D OSEM reconstruction algorithm based on the message passing interface (MPI) and tested it with combinations of different numbers of processors and different voxel grid sizes in reconstruction (64×64×64 and 128×128×128). Parallelization performance was evaluated in terms of the speedup factor and parallel efficiency. This parallel implementation methodology is expected to help make fully 3-D OSEM algorithms more feasible in clinical SPECT studies. PMID:17282575

Rong, Zhou; Tianyu, Ma; Yongjie, Jin

2005-01-01
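
For reference, the OSEM update for a subset S of projection bins is x_j <- x_j * [sum_{i in S} a_ij * y_i / (A x)_i] / [sum_{i in S} a_ij]. The sequential Python/NumPy sketch below applies this update with interleaved subsets, which is an assumption. The system matrix A is a small dense stand-in, and the SPMD/MPI distribution across cluster nodes described in the abstract is not shown.

```python
import numpy as np

def osem(A, y, n_subsets, n_iter, x0=None):
    """Ordered-subset EM reconstruction with interleaved subsets of projection bins."""
    n_bins, n_vox = A.shape
    x = np.ones(n_vox) if x0 is None else np.asarray(x0, dtype=float).copy()
    y = np.asarray(y, dtype=float)
    subsets = [np.arange(s, n_bins, n_subsets) for s in range(n_subsets)]
    for _ in range(n_iter):
        for S in subsets:
            As = A[S]
            forward = As @ x                                  # forward projection
            ratio = np.divide(y[S], forward, out=np.zeros_like(forward), where=forward > 0)
            back = As.T @ ratio                               # back-projection of the ratios
            sens = As.sum(axis=0)                             # subset sensitivity image
            safe = np.where(sens > 0, sens, 1.0)
            x = np.where(sens > 0, x * back / safe, x)        # voxels unseen by S unchanged
    return x

print(osem(np.eye(4), [1, 2, 3, 4], n_subsets=2, n_iter=1))
```

Each subset update is cheap, which is why OSEM accelerates plain EM. The forward and back projections inside each update are also the natural work to split across processors.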

217

Rough-fuzzy clustering for grouping functionally similar genes from microarray data.

Gene expression data clustering is one of the important tasks of functional genomics, as it provides a powerful tool for studying functional relationships of genes in a biological process. Identifying coexpressed groups of genes is the basic challenge in the gene clustering problem. In this regard, a gene clustering algorithm, termed robust rough-fuzzy c-means, is proposed that judiciously integrates the merits of rough sets and fuzzy sets. While the concept of lower and upper approximations of rough sets deals with uncertainty, vagueness, and incompleteness in cluster definition, the integration of probabilistic and possibilistic memberships of fuzzy sets enables efficient handling of overlapping partitions in a noisy environment. The concept of the possibilistic lower bound and probabilistic boundary of a cluster, introduced in robust rough-fuzzy c-means, enables efficient selection of gene clusters. An efficient method is proposed to select initial prototypes of different gene clusters, which enables the proposed c-means algorithm to converge to an optimal or near-optimal solution and helps to discover coexpressed gene clusters. The effectiveness of the algorithm, along with a comparison with other algorithms, is demonstrated both qualitatively and quantitatively on 14 yeast microarray data sets. PMID:22848138

Maji, Pradipta; Paul, Sushmita

2013-01-01

218

… -size restriction at no extra I/O or communication cost. Experimental evidence on a Beowulf cluster shows … to design and engineer algorithms that minimize disk-I/O times. These algorithms are called … out is predetermined [25, 26]. In other words, the result of a comparison has no effect on the indices of elements

Cormen, Thomas H.

219

Using clustering and a modified classification algorithm for automatic text summarization

NASA Astrophysics Data System (ADS)

In this paper we describe a modified classification method intended for extractive summarization. The classification in this method does not need a learning corpus; it uses the input text instead. First, we cluster the document sentences to exploit the diversity of topics; then we apply a learning algorithm (here, Naive Bayes) to each cluster, treating it as a class. After obtaining the classification model, we calculate the score of a sentence in each class using a scoring model derived from the classification algorithm. These scores are then used to reorder the sentences, and the first ones are extracted as the output summary. We conducted experiments using a corpus of scientific papers and compared our results with another summarization system, UNIS. We also examine the impact of tuning the clustering threshold on the resulting summary, as well as the impact of adding more features to the classifier. We found that this method performs well, and that adding new features (which is simple with this method) can improve the summary's accuracy.

Aries, Abdelkrime; Oufaida, Houda; Nouali, Omar

2013-01-01

220

An improved scheduling algorithm for 3D cluster rendering with platform LSF

NASA Astrophysics Data System (ADS)

High-quality photorealistic rendering of 3D models requires powerful computing systems. To meet this demand, highly efficient management of cluster resources has developed rapidly. This paper focuses on how to improve the efficiency of 3D rendering tasks on a cluster. It investigates a dynamic feedback load balance (DFLB) algorithm, the working principle of Load Sharing Facility (LSF), and optimization of an external scheduler plug-in. The algorithm can be applied in the match and allocation phases of a scheduling cycle: candidate hosts are prepared in sequence in the match phase, and the scheduler makes allocation decisions for each job in the allocation phase. With the dynamic mechanism, a new weight is assigned to each candidate host for rearrangement, and the most suitable host is selected for rendering. A new plug-in module implementing this algorithm has been designed and integrated into the internal scheduler. Simulation experiments demonstrate that the improved plug-in module is superior to the default one for rendering tasks. It can help avoid load imbalance among servers, increase system throughput, and improve system utilization.

Xu, Wenli; Zhu, Yi; Zhang, Liping

2013-10-01

221

Background Accurate genotype calling is a prerequisite for a successful Genome-Wide Association Study (GWAS). Although most genotyping algorithms can achieve an accuracy rate greater than 99% for genotyping DNA samples without copy number alterations (CNAs), almost none of these algorithms are designed for genotyping tumor samples, which are known to have large regions of CNAs. Results This study aims to develop a statistical method that can accurately genotype tumor samples with CNAs. The proposed method adds a Bayesian layer to a cluster regression model and is termed the Bayesian Cluster Regression-based genotyping algorithm (BCRgt). We demonstrate that high concordance rates with HapMap calls can be achieved without using reference/training samples when CNAs do not exist. By adding a training step, we obtained higher genotyping concordance rates without requiring large sample sizes. When CNAs exist in the samples, accuracy can be dramatically improved in regions with DNA copy loss and slightly improved in regions with copy number gain, compared with the Bayesian Robust Linear Model with Mahalanobis distance classifier (BRLMM). Conclusions We have demonstrated that BCRgt can provide accurate genotyping calls for tumor samples with CNAs. PMID:24629125

2014-01-01

222

Cloud classification from satellite data using a fuzzy sets algorithm: A polar example

NASA Technical Reports Server (NTRS)

Where spatial boundaries between phenomena are diffuse, classification methods which construct mutually exclusive clusters seem inappropriate. The Fuzzy c-means (FCM) algorithm assigns each observation to all clusters, with membership values as a function of distance to the cluster center. The FCM algorithm is applied to AVHRR data for the purpose of classifying polar clouds and surfaces. Careful analysis of the fuzzy sets can provide information on which spectral channels are best suited to the classification of particular features, and can help determine likely areas of misclassification. General agreement in the resulting classes and cloud fraction was found between the FCM algorithm, a manual classification, and an unsupervised maximum likelihood classifier.

Key, J. R.; Maslanik, J. A.; Barry, R. G.

1988-01-01
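As a rough illustration of how FCM assigns every observation to all clusters, with memberships determined by distance to the cluster centers, the following Python sketch implements the standard FCM membership and center updates with fuzzifier m = 2 (an assumed default). It is a generic textbook version operating on plain coordinate tuples, not the authors' AVHRR processing code; the function names and the fixed iteration count are illustrative choices.

```python
import math


def memberships(points, centers, m):
    """Standard FCM membership update: u[i][j] is the membership of point i in cluster j."""
    exponent = 2.0 / (m - 1.0)
    rows = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        if min(dists) == 0.0:
            # Point sits exactly on a center: split full membership among coincident centers.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            total = sum(hits)
            rows.append([h / total for h in hits])
        else:
            rows.append([1.0 / sum((dj / dk) ** exponent for dk in dists) for dj in dists])
    return rows


def update_centers(points, u, m):
    """Each center is the mean of all points weighted by membership**m."""
    dim = len(points[0])
    centers = []
    for j in range(len(u[0])):
        weights = [row[j] ** m for row in u]
        total = sum(weights)
        centers.append(tuple(sum(w * p[k] for w, p in zip(weights, points)) / total
                             for k in range(dim)))
    return centers


def fcm(points, initial_centers, m=2.0, iterations=50):
    """Alternate membership and center updates for a fixed number of iterations."""
    centers = list(initial_centers)
    for _ in range(iterations):
        u = memberships(points, centers, m)
        centers = update_centers(points, u, m)
    return centers, memberships(points, centers, m)
```

Each membership row sums to 1, so a point midway between two centers, such as (1, 0) between (0, 0) and (2, 0), receives 0.5 in each.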

223

Cloud classification from satellite data using a fuzzy sets algorithm - A polar example

NASA Technical Reports Server (NTRS)

Where spatial boundaries between phenomena are diffuse, classification methods which construct mutually exclusive clusters seem inappropriate. The Fuzzy c-means (FCM) algorithm assigns each observation to all clusters, with membership values as a function of distance to the cluster center. The FCM algorithm is applied to AVHRR data for the purpose of classifying polar clouds and surfaces. Careful analysis of the fuzzy sets can provide information on which spectral channels are best suited to the classification of particular features, and can help determine likely areas of misclassification. General agreement in the resulting classes and cloud fraction was found between the FCM algorithm, a manual classification, and an unsupervised maximum likelihood classifier.

Key, J. R.; Maslanik, J. A.; Barry, R. G.

1989-01-01

224

A Multi-Core FPGA-Based 2D-Clustering Algorithm for High-Throughput Data-Intensive Applications

NASA Astrophysics Data System (ADS)

A multi-core FPGA-based clustering algorithm for high-throughput, data-intensive applications is presented. The algorithm is optimized for data with a two-dimensional organization (e.g., image processing and pixel detectors for high-energy physics experiments). It uses a moving window of generic size to adjust to the application's processing requirements (the cluster sizes and shapes that appear in the input data sets). One or more windows (cores) can be used to identify clusters in parallel, allowing the design either to increase performance or to reduce the resources used. In addition to this inherent parallelism, the algorithm is executed in a pipeline, allowing readout to proceed in parallel with cluster identification.

Sotiropoulou, Calliope-Louisa; Nikolaidis, Spyridon; Annovi, Alberto; Beretta, Matteo; Volpi, Guido; Giannetti, Paola; Luciano, Pierluigi

2014-06-01
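The FPGA design itself is not described in enough detail to reproduce, but the task its moving-window cores perform in the entry above, grouping neighbouring active pixels of 2D data into clusters, can be sketched sequentially. The Python below is only a software reference under the assumption that clusters are 8-connected groups of hit pixels; it does not model the windows, the multiple cores, or the pipeline.

```python
from collections import deque


def find_clusters(hits):
    """Group active pixels, given as (row, col) tuples, into 8-connected clusters."""
    remaining = set(hits)
    clusters = []
    while remaining:
        seed = remaining.pop()
        cluster = [seed]
        queue = deque([seed])
        while queue:
            r, c = queue.popleft()
            # Examine the 3x3 neighbourhood around the current pixel.
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    nb = (r + dr, c + dc)
                    if nb in remaining:
                        remaining.remove(nb)
                        cluster.append(nb)
                        queue.append(nb)
        clusters.append(sorted(cluster))
    # Sort for a deterministic output order.
    return sorted(clusters)
```

A hardware implementation would process a bounded window at a time rather than holding the whole set of hits, which is where the moving-window and multi-core design choices come in.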

225

NASA Astrophysics Data System (ADS)

Excavating cairns in southern Arabia is a way for anthropologists to understand which factors led ancient settlers to transition from a pastoral lifestyle and tribal narrative to the formation of states that exist today. Locating these monuments has traditionally been done in the field, relying on eyewitness reports and costly searches through the arid landscape. In this thesis, an algorithm for automatically detecting cairns in satellite imagery is presented. The algorithm uses a set of filters in a window based approach to eliminate background pixels and other objects that do not look like cairns. The resulting set of detected objects constitutes fewer than 0.001% of the pixels in the satellite image, and contains the objects that look the most like cairns in imagery. When a training set of cairns is available, a further reduction of this set of objects can take place, along with a likelihood-based ranking system. To aid in cairn detection, the satellite image is also clustered to determine land-form classes that tend to be consistent with the presence of cairns. Due to the large number of pixels in the image, a subsample spectral clustering algorithm called "Multiple Sample Data Spectroscopic clustering" is used. This multiple sample clustering procedure is motivated by perturbation studies on single sample spectral algorithms. The studies, presented in this thesis, show that sampling variability in the single sample approach can cause an unsatisfactory level of instability in clustering results. The multiple sample data spectroscopic clustering algorithm is intended to stabilize this perturbation by combining information from different samples. While sampling variability is still present, the use of multiple samples mitigates its effect on cluster results. Finally, a step-through of the cairn detection algorithm and satellite image clustering are given for an image in the Hadramawt region of Yemen. 
The top ranked detected objects are presented, and a discussion of parameter selection and future work follows.

Schuetter, Jared Michael

226

Development of a Genetic Algorithm to Automate Clustering of a Dependency Structure Matrix

NASA Technical Reports Server (NTRS)

Much technology assessment and organization design data exists in Microsoft Excel spreadsheets. Tools are needed to put this data into a form that can be used by design managers to make design decisions. One need is to cluster data that is highly coupled. Tools such as the Dependency Structure Matrix (DSM) and a Genetic Algorithm (GA) can be of great benefit. However, no tool currently combines the DSM and a GA to solve the clustering problem. This paper describes a new software tool that interfaces a GA written as an Excel macro with a DSM in spreadsheet format. The results of several test cases are included to demonstrate how well this new tool works.

Rogers, James L.; Korte, John J.; Bilardo, Vincent J.

2006-01-01

227

CLUSTAG & WCLUSTAG: Hierarchical Clustering Algorithms for Efficient Tag-SNP Selection

NASA Astrophysics Data System (ADS)

More than 6 million single nucleotide polymorphisms (SNPs) in the human genome have been genotyped by the HapMap project. Although only a proportion of these SNPs are functional, all can be considered candidate markers for indirect association studies to detect disease-related genetic variants. Complete screening of a gene or a chromosomal region is nevertheless an expensive undertaking for association studies. A key strategy for improving the efficiency of association studies is to select a subset of informative SNPs, called tag SNPs, for analysis. In this chapter, hierarchical clustering algorithms are proposed for efficient tag SNP selection.

Ao, Sio-Iong
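A minimal sketch of hierarchical clustering for tag SNP selection may help make the idea concrete. It assumes a precomputed symmetric matrix of pairwise linkage-disequilibrium r² values, complete linkage, and a fixed r² threshold; these choices, and the rule for picking the tag within each cluster, are assumptions for illustration and not necessarily what CLUSTAG or WCLUSTAG do.

```python
def complete_linkage_tags(r2, threshold):
    """Agglomerate SNPs while every within-cluster pair has r^2 >= threshold.

    r2: symmetric matrix (list of lists) of pairwise r^2 values.
    Returns (clusters, tags), with one tag SNP index per cluster.
    """
    clusters = [[i] for i in range(len(r2))]

    def link(a, b):
        # Complete linkage: cluster similarity is the weakest pairwise r^2.
        return min(r2[i][j] for i in a for j in b)

    while len(clusters) > 1:
        best, pair = -1.0, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = link(clusters[x], clusters[y])
                if s > best:
                    best, pair = s, (x, y)
        if best < threshold:
            break  # no merge would keep all pairs above the threshold
        x, y = pair
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x is unaffected

    tags = []
    for cl in clusters:
        # Pick the member whose weakest r^2 to the rest of its cluster is largest.
        tags.append(max(cl, key=lambda i: min((r2[i][j] for j in cl if j != i), default=1.0)))
    return clusters, tags
```

Because complete linkage keeps the weakest within-cluster r² at or above the threshold, any chosen tag captures every other SNP in its cluster at that level.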

228

Meanie3D - a mean-shift based, multivariate, multi-scale clustering and tracking algorithm

NASA Astrophysics Data System (ADS)

Project OASE is one of five work groups at the HErZ (Hans Ertel Centre for Weather Research), an ongoing effort by the German weather service (DWD) to further research at universities concerning weather prediction. The goal of project OASE is to gain an object-based perspective on convective events by identifying them early in the onset of convective initiation and following them through their entire lifecycle. Following objects in this fashion requires new ways of defining and tracking objects that incorporate all the available data sets of interest, such as satellite imagery, weather radar, or lightning counts. The Meanie3D algorithm provides the necessary tool for this purpose. Core features of this new approach to clustering (object identification) and tracking are the ability to identify objects using the mean-shift algorithm applied to a multitude of variables (multivariate), as well as the ability to detect objects on various scales (multi-scale) using elements of scale-space theory. The algorithm works in 2D as well as 3D without modifications. It is an extension of a method well known from computer vision and image processing, tailored to serve the needs of the meteorological community. Although the application demonstrated here is specific (convective initiation), the algorithm is easily tailored to provide clustering and tracking for a wide class of data sets and problems. In this talk, the demonstration is carried out on two of the OASE group's own composite sets: a 2D nationwide composite of Germany including C-band radar (2D) and satellite information, and a 3D local composite of the Bonn/Jülich area containing a high-resolution 3D X-band radar composite.

Simon, Jürgen-Lorenz; Malte, Diederich; Silke, Troemel

2014-05-01
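The mean-shift step at the core of Meanie3D can be illustrated with a minimal Python sketch: each point is repeatedly moved to the Gaussian-weighted mean of the data until it settles on a density mode, and points reaching the same mode form one object. The Gaussian kernel, the single bandwidth, and the mode-merging rule (modes closer than half a bandwidth are merged) are assumptions; the multivariate weighting and scale-space machinery of Meanie3D are not reproduced.

```python
import math


def mean_shift(points, bandwidth, iterations=100, tol=1e-6):
    """Return (mode_centers, labels) for plain Gaussian-kernel mean shift."""
    modes = []
    for p in points:
        x = list(p)
        for _ in range(iterations):
            # Gaussian weight of every data point relative to the current position.
            weights = [math.exp(-math.dist(x, q) ** 2 / (2 * bandwidth ** 2)) for q in points]
            total = sum(weights)
            new_x = [sum(w * q[k] for w, q in zip(weights, points)) / total
                     for k in range(len(x))]
            shift = math.dist(x, new_x)
            x = new_x
            if shift < tol:
                break
        modes.append(tuple(x))

    # Points whose modes lie within half a bandwidth share one cluster label.
    centers, labels = [], []
    for mode in modes:
        for idx, c in enumerate(centers):
            if math.dist(mode, c) < bandwidth / 2:
                labels.append(idx)
                break
        else:
            centers.append(mode)
            labels.append(len(centers) - 1)
    return centers, labels
```

The bandwidth plays the role of a scale: running the same procedure with several bandwidths is, loosely, the intuition behind detecting objects on multiple scales.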

229

KANTS: a stigmergic ant algorithm for cluster analysis and swarm art.

KANTS is a swarm intelligence clustering algorithm inspired by the behavior of social insects. It uses stigmergy as a strategy for clustering large datasets and, as a result, displays a typical behavior of complex systems: self-organization and global patterns emerging from the local interaction of simple units. This paper introduces a simplified version of KANTS and describes recent experiments with the algorithm in the context of a contemporary artistic and scientific trend called swarm art, a type of generative art in which swarm intelligence systems are used to create artwork or ornamental objects. KANTS is used here for generating color drawings from the input data that represent real-world phenomena, such as electroencephalogram sleep data. However, the main proposal of this paper is an art project based on well-known abstract paintings, from which the chromatic values are extracted and used as input. Colors and shapes are therefore reorganized by KANTS, which generates its own interpretation of the original artworks. The project won the 2012 Evolutionary Art, Design, and Creativity Competition. PMID:23912505

Fernandes, Carlos M; Mora, Antonio M; Merelo, Juan J; Rosa, Agostinho C

2014-06-01

230

Contiguity-enhanced k-means clustering algorithm for unsupervised multispectral image segmentation

NASA Astrophysics Data System (ADS)

The recent and continuing construction of multi- and hyper-spectral imagers will provide detailed data cubes with information in both the spatial and spectral domains. These data show great promise for remote sensing applications ranging from environmental and agricultural to national security interests. Reducing this voluminous data to useful intermediate forms is necessary both for downlinking all those bits and for interpreting them. Smart on-board hardware is required, as well as sophisticated earth-bound processing. A segmented image is one kind of intermediate form, and it provides some measure of data compression. Traditional image segmentation algorithms treat pixels independently and cluster them according only to their spectral information. This neglects the implicit spatial information that is available in the image. We suggest a simple approach, a variant of the standard k-means algorithm, that uses both spatial and spectral properties of the image. The segmented image has the property that spatially contiguous pixels are more likely to be in the same class than are random pairs of pixels. This property naturally comes at some cost in terms of the compactness of the clusters in the spectral domain, but we have found that the spatial contiguity and spectral compactness properties are nearly 'orthogonal', which means that we can make considerable improvements in one with minimal loss in the other.

Theiler, James P.; Gisler, Galen

1997-10-01
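The abstract does not give the exact form of the contiguity term, so the following is only a hypothetical sketch of one way to bias a k-means assignment step toward spatial contiguity: each pixel's cost for a class is its squared spectral distance to that class centroid minus a bonus lam for every 4-neighbour currently in that class. The penalty form and the parameter name lam are assumptions, not the authors' formulation; with lam = 0 the sweep reduces to the ordinary k-means assignment step, and a full algorithm would alternate it with the usual centroid update.

```python
def contiguous_assign(image, centroids, labels, lam):
    """One assignment sweep of a (hypothetical) contiguity-biased k-means.

    image[r][c] is a spectral vector, labels[r][c] the pixel's current class,
    centroids[k] the spectral centroid of class k.
    """
    rows, cols = len(image), len(image[0])
    new_labels = [row[:] for row in labels]
    for r in range(rows):
        for c in range(cols):
            # Current labels of the in-bounds 4-neighbours.
            neighbours = [labels[rr][cc]
                          for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                          if 0 <= rr < rows and 0 <= cc < cols]

            def cost(k):
                spectral = sum((a - b) ** 2 for a, b in zip(image[r][c], centroids[k]))
                return spectral - lam * neighbours.count(k)

            new_labels[r][c] = min(range(len(centroids)), key=cost)
    return new_labels
```

For a 1x3 strip with values 0.0, 0.6, 0.0 and centroids 0.0 and 1.0, lam = 0 assigns the middle pixel to class 1, while lam = 0.5 pulls it into class 0 with its neighbours, trading a little spectral compactness for contiguity.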

231

Are judgments a form of data clustering? Reexamining contrast effects with the k-means algorithm.

A number of theories have been proposed to explain in precise mathematical terms how statistical parameters and sequential properties of stimulus distributions affect category ratings. Various contextual factors such as the mean, the midrange, and the median of the stimuli; the stimulus range; the percentile rank of each stimulus; and the order of appearance have been assumed to influence judgmental contrast. A data clustering reinterpretation of judgmental relativity is offered wherein the influence of the initial choice of centroids on judgmental contrast involves two combined frequency and consistency tendencies. Accounts of the k-means algorithm are provided, showing good agreement with effects observed on multiple distribution shapes and with a variety of interaction effects relating to the number of stimuli, the number of response categories, and the method of skewing. Experiment 1 demonstrates that centroid initialization accounts for contrast effects obtained with stretched distributions. Experiment 2 demonstrates that the iterative convergence inherent to the k-means algorithm accounts for the contrast reduction observed across repeated blocks of trials. The concept of within-cluster variance minimization is discussed, as is the applicability of a backward k-means calculation method for inferring, from empirical data, the values of the centroids that would serve as a representation of the judgmental context. PMID:25706770

Boillaud, Eric; Molina, Guylaine

2015-04-01
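Since the reinterpretation above hinges on the initial choice of centroids and on iterative convergence, a minimal one-dimensional k-means (Lloyd's algorithm) with explicitly supplied starting centroids may help make the mechanism concrete. This Python sketch is generic: treating stimuli as 1D values and stopping when the centroids no longer change are assumptions, and it does not implement the authors' backward k-means calculation.

```python
def kmeans_1d(values, centroids, iterations=100):
    """Lloyd's algorithm on scalar values, starting from the given centroids."""
    centroids = list(centroids)
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid (ties go to the lower index).
        groups = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda j: abs(v - centroids[j]))
            groups[nearest].append(v)
        # Update step: move each centroid to the mean of its group (keep it if the group is empty).
        new = [sum(g) / len(g) if g else c for g, c in zip(groups, centroids)]
        if new == centroids:
            break
        centroids = new
    return centroids
```

For example, kmeans_1d([1, 2, 3, 10, 11, 12], [0, 20]) passes through the intermediate centroids [4.0, 11.5] before settling on [2.0, 11.0]; the path taken, and with less separated data the end point, depends on the starting centroids.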

232

Automatic segmentation of corpus callosum using Gaussian mixture modeling and Fuzzy C means methods.

This paper presents a comparative study of the success and performance of the Gaussian mixture modeling and Fuzzy C means methods in determining the volume and cross-sectional areas of the corpus callosum (CC) using simulated and real MR brain images. The Gaussian mixture model (GMM) uses a weighted sum of Gaussian distributions and applies statistical decision procedures to define image classes. In Fuzzy C means (FCM), the image classes are represented by membership functions derived from fuzziness information expressing the distance from the cluster centers. In this study, automatic segmentation of the midsagittal section of the CC was achieved on simulated and real brain images. The volume of the CC was obtained from the areas of the sagittal sections. To compare the success of the methods, segmentation accuracy, Jaccard similarity, and segmentation time were calculated. The results show that the GMM method produced slightly more accurate segmentation (midsagittal section segmentation accuracy of 98.3% and 97.01% for GMM and FCM, respectively); however, the FCM method segmented faster than GMM. With this study, an accurate and automatic segmentation system was developed that gives doctors the opportunity for quantitative comparison when planning treatment and diagnosing diseases that affect the size of the CC. This study can be adapted to perform segmentation on other regions of the brain, and thus it may be of practical use in the clinic. PMID:23871683

İçer, Semra

2013-10-01
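The Jaccard similarity used in the entry above to compare segmentations is the size of the intersection of two binary masks divided by the size of their union. A minimal Python sketch, assuming masks given as equally sized nested lists of 0/1 values and, as an arbitrary convention, returning 1.0 when both masks are empty:

```python
def jaccard(mask_a, mask_b):
    """Jaccard similarity |A & B| / |A | B| of two equally sized binary masks."""
    inter = union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            inter += bool(a) and bool(b)
            union += bool(a) or bool(b)
    # Two empty masks are treated as identical (assumed convention).
    return inter / union if union else 1.0
```

A value of 1.0 means the automatic segmentation and the reference coincide exactly; partial overlaps fall between 0 and 1.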

233

We propose an integrated registration and clustering algorithm, called “consistency clustering”, that automatically constructs a probabilistic white-matter atlas from a set of multi-subject diffusion weighted MR images. We formulate the atlas creation as a maximum likelihood problem which the proposed method solves using a generalized Expectation Maximization (EM) framework. Additionally, the algorithm employs an outlier rejection and denoising strategy to produce sharp probabilistic maps of certain bundles of interest. We test this algorithm on synthetic and real data, and evaluate its stability against initialization. We demonstrate labeling a novel subject using the resulting spatial atlas and evaluate the accuracy of this labeling. Consistency clustering is a viable tool for completely automatic white-matter atlas construction for sub-populations and the resulting atlas is potentially useful for making diffusion measurements in a common coordinate system to identify pathology related changes or developmental trends. PMID:20442792

Ziyan, Ulas; Sabuncu, Mert R.; Grimson, W. Eric L.

2010-01-01

234

`Inter-Arrival Time' Inspired Algorithm and its Application in Clustering and Molecular Phylogeny

NASA Astrophysics Data System (ADS)

Bioinformatics, being a multidisciplinary field, involves applications of various methods from allied areas of science for data mining using computational approaches. Clustering and molecular phylogeny are key areas in bioinformatics that help in the study of the classification and evolution of organisms. Molecular phylogeny algorithms can be divided into distance-based and character-based methods. However, most of these methods depend on pre-alignment of sequences and become computationally intensive as the size of the data increases, and hence efficient alternative approaches are needed. The 'inter-arrival time distribution' (IATD) is a popular concept in the theory of stochastic system modeling, but its potential in molecular data analysis has not been fully explored. The present study reports the application of IATD in bioinformatics for clustering and molecular phylogeny. The proposed method computes IATDs of nucleotides in genomic sequences. A distance function based on statistical parameters of the IATDs is proposed, and the distance matrix thus obtained is used for clustering and molecular phylogeny. The method is applied to a dataset of 3' non-coding region (NCR) sequences of Dengue virus type 3 (DENV-3), subtype III, reported in 2008. The resulting phylogram revealed the geographical distribution of DENV-3 isolates. Sri Lankan DENV-3 isolates were further observed to cluster in two sub-clades corresponding to pre- and post-Dengue hemorrhagic fever emergence groups. These results are consistent with those reported earlier, which were obtained using pre-aligned sequence data as input. These findings encourage applications of the IATD-based method in molecular phylogenetic analysis in particular and data mining in general.

Kolekar, Pandurang S.; Kale, Mohan M.; Kulkarni-Kale, Urmila

2010-10-01
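A minimal sketch of alignment-free inter-arrival time features for a nucleotide sequence. It assumes that the inter-arrival time of a base is the gap between successive positions at which it occurs and that the "statistical parameters" are the mean and population standard deviation of those gaps. The Euclidean distance between feature vectors is likewise an assumed stand-in for the authors' distance function.

```python
import math
from statistics import mean, pstdev


def iatd_features(seq):
    """[mean gap, std of gaps] for each of A, C, G, T (zeros if a base occurs < 2 times)."""
    seq = seq.upper()
    features = []
    for base in "ACGT":
        positions = [i for i, ch in enumerate(seq) if ch == base]
        gaps = [b - a for a, b in zip(positions, positions[1:])]
        features.extend([mean(gaps), pstdev(gaps)] if gaps else [0.0, 0.0])
    return features


def iatd_distance(seq1, seq2):
    """Euclidean distance between IATD feature vectors (an assumed distance function)."""
    return math.dist(iatd_features(seq1), iatd_features(seq2))
```

A pairwise matrix of such distances over a set of sequences could then feed any standard distance-based clustering or tree-building method, without first aligning the sequences.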

235

NASA Astrophysics Data System (ADS)

Land use/cover (LUC) classification plays an important role in remote sensing and land change science. Because of the complexity of ground covers, LUC classification is still regarded as a difficult task. This study proposed a fusion algorithm that combines support vector machines (SVM) and fuzzy k-means (FKM) clustering. The scheme was divided into two steps. First, a clustering map was obtained from the original remote sensing image using FKM; simultaneously, a normalized difference vegetation index (NDVI) layer was extracted from the original image. Then, the classification map was generated using an SVM classifier. Three different classification algorithms were compared, tested, and verified: parametric (maximum likelihood), nonparametric (SVM), and hybrid (unsupervised-supervised, fusion of SVM and FKM) classifiers. The proposed algorithm obtained the highest overall accuracy in our experiments.

He, Tao; Sun, Yu-Jun; Xu, Ji-De; Wang, Xue-Jun; Hu, Chang-Ru

2014-01-01
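The NDVI layer mentioned in the LUC entry above is conventionally computed per pixel as (NIR - Red) / (NIR + Red). A minimal Python sketch, assuming co-registered near-infrared and red reflectance bands given as nested lists, and assigning 0.0 where both bands are zero (an arbitrary convention to avoid division by zero):

```python
def ndvi(nir, red):
    """Per-pixel NDVI = (NIR - Red) / (NIR + Red) for two equally sized 2D bands."""
    return [[(n - r) / (n + r) if (n + r) != 0 else 0.0
             for n, r in zip(nir_row, red_row)]
            for nir_row, red_row in zip(nir, red)]
```

Values near +1 indicate dense vegetation; stacked with the FKM cluster map, such a layer is one plausible way to build the per-pixel input for the SVM step.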

236

We present a new algorithm to search for distant clusters of galaxies in catalogues derived from imaging data, such as those of the ESO Imaging Survey. Our algorithm is a matched filter, similar to that adopted by Postman et al. (1996), aimed at identifying cluster candidates by using positional and photometric data simultaneously. The main novelty of our approach is that the spatial and luminosity filters are run separately on the catalogue, and no assumption is made about the typical size or the typical M* of clusters, as these parameters enter our algorithm as a typical angular scale (sigma) and a typical apparent magnitude m*. Moreover, we estimate the background locally for each candidate, allowing us to overcome the hazards of inhomogeneous datasets. As a consequence, our algorithm has a lower contamination rate, without loss of completeness, than other techniques, as tested through extensive simulations. We provide catalogues of galaxy cluster candidates obtained by applying our algorithm to the I-band data of the EIS-wide patches A and B.

C. Lobo; A. Iovino; D. Lazzati; G. Chincarini

2000-06-29

237

Location Fingerprint Positioning Based on Interval-valued Data FCM Algorithm

NASA Astrophysics Data System (ADS)

To reduce the power consumed by positioning calculations on a ZigBee module, this paper proposes a fingerprint positioning method based on an interval-valued data fuzzy c-means (FCM) algorithm. Fingerprints were treated as interval-valued data, which can reflect their uncertainty caused by measurement error and interference. In a high-dimensional feature space spanned by interval midpoints and lengths, the fingerprints were clustered by the FCM algorithm to lower computational complexity. Compared with traditional clustering techniques, such as c-means, the method achieved better clustering of location fingerprints in the positioning experiment designed in the paper. Results from the clustering and positioning experiments show that the method provides a feasible way to remarkably decrease the positioning calculation power consumption of the ZigBee module while maintaining positioning precision.

Li, Fang; Tong, Weiming; Wang, Tiecheng
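A minimal sketch of turning repeated signal-strength readings into the midpoint/length features described in the entry above. It assumes each fingerprint interval is simply the minimum-to-maximum range of the readings from one anchor node; the paper's actual interval construction is not stated. The resulting vectors could then be passed to an FCM routine.

```python
def interval_features(samples_per_anchor):
    """Map each anchor's readings to an interval [lo, hi], then to (midpoint, length).

    samples_per_anchor: list of reading lists, one per anchor node (assumed layout).
    Returns the flat vector [mid_1, len_1, mid_2, len_2, ...].
    """
    features = []
    for samples in samples_per_anchor:
        lo, hi = min(samples), max(samples)
        features.extend([(lo + hi) / 2.0, hi - lo])
    return features
```

The interval length carries the measurement spread that a single averaged reading would hide, which is the motivation the abstract gives for treating fingerprints as interval-valued data.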

238

NASA Astrophysics Data System (ADS)

This paper presents a new algorithm, called B-ANFIS, for building an adaptive neuro-fuzzy inference system (ANFIS) from a training data set. To increase the accuracy of the model, the following steps are carried out. First, a data merging rule is proposed to build and perform a data-clustering strategy. Subsequently, a combination of clustering processes in the input data space and in the joint input-output data space is presented. The main reason for this step is to overcome problems related to initialization and contradictory fuzzy rules, which usually arise when building ANFIS. The clustering process in the input data space is accomplished with a proposed merging-possibilistic clustering (MPC) algorithm. The effectiveness of this process is evaluated, and clustering then resumes in the joint input-output data space. The optimal parameters obtained after completion of the clustering process are used to build ANFIS. Simulations based on numerical data, 'Daily Data of Stock A', and measured data sets of a smart damper are performed to analyze and estimate accuracy. In addition, convergence and robustness of the proposed algorithm are investigated with both theoretical and testing approaches.

Nguyen, Sy Dzung; Nguyen, Quoc Hung; Choi, Seung-Bok

2015-01-01

239

Purpose The objective of our study was to analyze differences in apparent diffusion coefficient (ADC) partitions (created using the K-Means algorithm) between benign and malignant neck lesions and to evaluate their benefit in distinguishing these entities. Material and methods MRI studies of 10 benign and 10 malignant proven neck pathologies were post-processed on a PC using in-house software developed in MATLAB (The MathWorks, Inc., Natick, MA). Lesions were manually contoured by two neuroradiologists, and the ADC values within each lesion were clustered into two partitions (low ADC, ADCL; high ADC, ADCH) and three partitions (ADCL; intermediate ADC, ADCI; ADCH) using the K-Means clustering algorithm. An unpaired two-tailed Student's t-test was performed for all metrics to determine statistical differences in the means between the benign and malignant pathologies. Results A statistically significant difference between the mean ADCL clusters in benign and malignant pathologies was seen in the 3-cluster models of both readers (p=0.03 and 0.022, respectively) and in the 2-cluster model of reader 2 (p=0.04), with the other metrics (ADCH, ADCI, whole-lesion mean ADC) not revealing any significant differences. Receiver operating characteristic curves demonstrated the quantitative difference between mean ADCH and ADCL in both the 2- and 3-cluster models to be predictive of malignancy (2 clusters: p=0.008, area under curve=0.850; 3 clusters: p=0.01, area under curve=0.825). Conclusion The K-Means clustering algorithm that generates partitions of large datasets may provide a better characterization of neck pathologies and may be of additional benefit in distinguishing benign and malignant neck pathologies compared to whole-lesion mean ADC alone. PMID:20007723

Srinivasan, A.; Galbán, C.J.; Johnson, T.D.; Chenevert, T.L.; Ross, B.D.; Mukherji, S.K.

2014-01-01
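The unpaired two-tailed Student's t-test used in the entry above compares group means using a pooled variance estimate. A minimal Python sketch of the statistic using only the standard library; the input lists in any usage are placeholders, not the study's data.

```python
import math
from statistics import mean, variance


def unpaired_t(a, b):
    """Student's unpaired t statistic with pooled (equal) variance; returns (t, df)."""
    na, nb = len(a), len(b)
    # Pooled estimate from the two sample variances (n - 1 denominators).
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    se = math.sqrt(pooled * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / se, na + nb - 2
```

A two-tailed p-value follows from the t distribution with the returned degrees of freedom; for example, scipy.stats.ttest_ind(a, b), which assumes equal variances by default, returns both the statistic and the p-value.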

240

NASA Astrophysics Data System (ADS)

Context. The density-based spatial clustering of applications with noise (DBSCAN) is a topometric algorithm used to cluster spatial data that are affected by background noise. For the first time, we propose this method to detect sources in γ-ray astrophysical images obtained from the Fermi-LAT data, where each point corresponds to the arrival direction of a photon. Aims: We investigate the detection performance of the γ-ray DBSCAN in terms of detection efficiency and rejection of spurious clusters. Methods: We used a parametric approach, exploring a large volume of the γ-ray DBSCAN parameter space. By means of simulated data, we statistically characterized the γ-ray DBSCAN, finding signatures that distinguish purely random fields from fields with sources. We defined a significance level for the detected clusters and successfully tested this significance with our simulated data. We applied the method to real data and found excellent agreement with the results obtained with simulated data. Results. We find that the γ-ray DBSCAN can be successfully used to detect clusters in γ-ray data. The significance returned by our algorithm is strongly correlated with that provided by the maximum likelihood analysis with standard Fermi-LAT software and can be used to safely remove spurious clusters. The positional accuracy of the reconstructed cluster centroid is comparable to that returned by standard maximum likelihood analysis, allowing one to look for astrophysical counterparts in narrow regions, which minimizes the chance probability in the counterpart association. Conclusions. We found that γ-ray DBSCAN is a powerful tool for detecting clusters in γ-ray data. It can be used to look for both point-like and extended sources, and can potentially be applied to any astrophysical field related to detecting clusters in data.
In a companion paper we will present the application of the γ-ray DBSCAN to the full Fermi-LAT sky, discussing the potential of the algorithm to discover new sources.

Tramacere, A.; Vecchio, C.

2013-01-01
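For readers unfamiliar with DBSCAN, the following minimal Python sketch shows the core logic: points with at least min_pts neighbours (counting themselves) within radius eps are core points, clusters grow through chains of core points, and points not reachable from any core point are labelled noise (-1). It uses a brute-force neighbour search and Euclidean distance on plane coordinates; the Fermi-LAT application works with photon arrival directions on the sky, and its significance estimation is not reproduced here.

```python
import math


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    labels = [None] * len(points)

    def neighbours(i):
        # Brute-force range query; includes the point itself.
        return [j for j in range(len(points)) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts:
                queue.extend(nbrs)  # j is a core point: keep expanding from it
    return labels
```

Real photon lists are far larger, so a spatial index would replace the quadratic neighbour search in practice.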

241

NASA Astrophysics Data System (ADS)

The minimal spanning tree (MST) algorithm is a graph-theoretical cluster-finding method. We previously applied it to γ-ray two-dimensional images, showing that it is quite sensitive in finding faint sources. Possible sources are associated with the regions where the photon arrival directions cluster. MST selects clusters starting from a particular "tree" connecting all the points of the image, performing a cut based on the angular distance between photons, and keeping clusters with a number of events higher than a given threshold. In this paper, we show how further filtering, based on some parameters linked to the cluster properties, can be applied to reduce spurious detections. We find that the most efficient parameter for this secondary selection is the magnitude M of a cluster, defined as the product of its number of events and its clustering degree. We test the sensitivity of the method by means of simulated and real Fermi Large Area Telescope (LAT) fields. Our results show that M is strongly correlated with other statistical significance parameters, derived from a wavelet-based algorithm and maximum likelihood (ML) analysis, and that it can be used as a good estimator of the statistical significance of MST detections. We apply the method to a 2-year LAT image at energies higher than 3 GeV, and we show the presence of new clusters, likely associated with BL Lac objects.

Campana, R.; Bernieri, E.; Massaro, E.; Tinebra, F.; Tosti, G.

2013-09-01
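A minimal Python sketch of MST-based cluster finding as described in the entry above: build the minimum spanning tree with Prim's algorithm, cut edges longer than a threshold, keep groups with at least min_events points, and score each group by M = N * g. Plane Euclidean distances stand in for angular distances, and the clustering degree g is assumed here to be the mean MST edge length of the whole field divided by the mean edge length inside the cluster; the abstract does not define g, so treat that formula as an illustrative assumption.

```python
import math


def mst_edges(points):
    """Prim's algorithm on the complete graph; returns (length, i, j) edges."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def mst_clusters(points, cut, min_events):
    """Return [(sorted member indices, M)] for groups that survive the cut (needs >= 2 points)."""
    edges = mst_edges(points)
    mean_all = sum(e[0] for e in edges) / len(edges)
    kept = [e for e in edges if e[0] <= cut]
    # Union-find over the edges that survive the cut.
    root = list(range(len(points)))

    def find(i):
        while root[i] != i:
            root[i] = root[root[i]]
            i = root[i]
        return i

    for _, i, j in kept:
        root[find(i)] = find(j)
    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    results = []
    for members in groups.values():
        if len(members) < max(min_events, 2):  # singletons have no internal edges
            continue
        inner = [e[0] for e in kept if find(e[1]) == find(members[0])]
        mean_inner = sum(inner) / len(inner)
        g = mean_all / mean_inner if mean_inner > 0 else math.inf  # assumed definition of g
        results.append((sorted(members), len(members) * g))
    return results
```

Under this assumed definition, a tight group of many photons in an otherwise sparse field gets both a large N and a large g, which is consistent with the abstract's use of M as a significance estimator.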

242

Magnetic resonance (MR) brain section images are segmented and then synthetically colored to give visual representations of the original data with three approaches: the literal and approximate fuzzy c-means unsupervised clustering algorithms, and a supervised computational neural network. Initial clinical results are presented on normal volunteers and selected patients with brain tumors surrounded by edema. Supervised and unsupervised segmentation techniques

L. O. Hall; A. M. Bensaid; L. P. Clarke; R. P. Velthuizen; M. S. Silbiger; J. C. Bezdek

1992-01-01

243

NASA Astrophysics Data System (ADS)

The detection of community structure in complex networks is crucial since it provides insight into the substructures of the whole network. Spectral clustering algorithms that employ the eigenvalues and eigenvectors of an appropriate input matrix have been successfully applied in this field. Despite its empirical success in community detection, spectral clustering has been criticized for its inefficiency when dealing with large scale data sets. This is confirmed by the fact that the time complexity for spectral clustering is cubic with respect to the number of instances; even the memory efficient iterative eigensolvers, such as the power method, may converge slowly to the desired solutions. In efforts to improve the complexity and performance, many non-traditional spectral clustering algorithms have been proposed. Rather than using the real eigenvalues and eigenvectors as in the traditional methods, the non-traditional clusterings employ additional topological structure information characterized by the spectrum of a matrix associated with the network involved, such as the complex eigenvalues and their corresponding complex eigenvectors, eigenspaces and semi-supervised labels. However, to the best of our knowledge, no work has been devoted to comparison among these newly developed approaches. This is the main goal of this paper, through evaluating the effectiveness of these spectral algorithms against some benchmark networks. The experimental results demonstrate that the spectral algorithm based on the eigenspaces achieves the best performance but is the slowest algorithm; the semi-supervised spectral algorithm is the fastest but its performance largely depends on the prior knowledge; and the spectral method based on the complement network shows similar performance to the conventional ones.

Ma, Xiaoke; Gao, Lin

2011-05-01

244

Decomposition of structural domains is an essential task in classifying protein structures, predicting protein function, and many other proteomics problems. As the number of known protein structures in the PDB grows exponentially, the need for accurate automatic domain decomposition methods becomes more pressing. In this article, we introduce a bottom-up algorithm for assigning protein domains using a graph-theoretical approach. This algorithm is based on a center-based clustering approach. To construct the initial clusters, the members of an independent dominating set of the graph representation of a protein are taken as the centers. A distance matrix is then defined for these clusters. To obtain the final domains, these clusters are merged using the compactness principle of domains and a method similar to the neighbor-joining algorithm, subject to some thresholds. The thresholds are computed using a training set of 50 protein chains. The algorithm, named ProDomAs, is implemented in C++. To assess the performance of ProDomAs, its results are compared with those of seven automatic methods on five publicly available benchmarks. The results show that ProDomAs outperforms the other methods on these benchmarks. The performance of ProDomAs is also evaluated on 6342 chains obtained from ASTRAL SCOP 1.71. ProDomAs is freely available at http://www.bioinf.cs.ipm.ir/software/prodomas. PMID:24596179
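The center-selection step described above can be illustrated with a greedy maximal independent set, which is always an independent dominating set: no two centers are adjacent, and every vertex is a center or adjacent to one. This is a minimal sketch; the abstract does not say how ProDomAs picks its set, so the degree-ordered greedy rule, the adjacency-dict input (e.g. a residue contact graph), and both function names are hypothetical.

```python
def greedy_independent_dominating_set(neighbors):
    """Return a maximal independent set of an undirected graph given as
    {vertex: [adjacent vertices]}. Maximal independence implies domination."""
    chosen, covered = [], set()
    # Visit high-degree vertices first (a common greedy heuristic).
    for v in sorted(neighbors, key=lambda u: (-len(neighbors[u]), u)):
        if v not in covered:       # not adjacent to any chosen center yet
            chosen.append(v)
            covered.add(v)
            covered.update(neighbors[v])
    return chosen

def initial_clusters(neighbors, centers):
    """Map each vertex to a center: itself if it is one, else its lowest-numbered adjacent center."""
    center_set = set(centers)
    return {v: v if v in center_set else min(c for c in neighbors[v] if c in center_set)
            for v in neighbors}
```

On the path graph 0-1-2-3-4 this picks centers [1, 3] and assigns vertex 2 to center 1 as a tie-break.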

Ansari, Elnaz Saberi; Eslahchi, Changiz; Pezeshk, Hamid; Sadeghi, Mehdi

2014-09-01

245

Anticipation versus adaptation in Evolutionary Algorithms: The case of Non-Stationary Clustering

NASA Astrophysics Data System (ADS)

From a technological point of view, it is usually more important to ensure the ability to react promptly to changing environmental conditions than to try to forecast them. Evolutionary Algorithms were initially proposed to drive the adaptation of complex systems to varying or uncertain environments. In the general setting, the adaptive-anticipatory dilemma reduces to where the interaction with the environment is placed in the computational schema. Adaptation consists of estimating the proper parameters from present data in order to react to the present environmental situation. Anticipation consists of estimating from present data in order to react to a future environmental situation. In the Evolutionary Computation paradigm, this duality is expressed by exactly where present data enter the computation of the individuals' fitness function. In this paper we consider several instances of Evolutionary Algorithms applied to a precise problem and perform an experiment that tests their response as anticipative and adaptive mechanisms. The non-stationary problem considered is Non-Stationary Clustering, more precisely the adaptive Color Quantization of image sequences. The experiment illustrates our ideas and gives some quantitative results that may support proposing the Evolutionary Computation paradigm for other tasks that require interaction with a non-stationary environment.

González, A. I.; Graña, M.; D'Anjou, A.; Torrealdea, F. J.

1998-07-01

246

NASA Astrophysics Data System (ADS)

Accurate measurements of human body fat distribution are desirable because excessive body fat is associated with impaired insulin sensitivity, type 2 diabetes mellitus (T2DM), and cardiovascular disease. In this study, we hypothesized that water-suppressed (WS) MRI performs better than non-water-suppressed (NWS) MRI for volumetric assessment of abdominal subcutaneous (SAT), intramuscular (IMAT), visceral (VAT), and total (TAT) adipose tissues. We acquired T1-weighted images on a 3T MRI system (TIM Trio, Siemens), which were analyzed using semi-automated segmentation software that employs a fuzzy c-means (FCM) clustering algorithm. Sixteen contiguous axial slices, centered at the L4-L5 level of the abdomen, were acquired in eight T2DM subjects with water suppression (WS) and without (NWS). Histograms from WS images show better separation of non-fatty tissue pixels from fatty tissue pixels than NWS images. Paired t-tests of WS versus NWS showed a statistically significant lower volume of lipid in the WS images for VAT (145.3 cc less, p=0.006) and IMAT (305 cc less, p<0.001), but not for SAT (14.1 cc more, NS). WS measurements of TAT also resulted in lower fat volumes (436.1 cc less, p=0.002). There is strong correlation between WS and NWS quantification methods for SAT measurements (r=0.999), but poorer correlation for VAT studies (r=0.845). These results suggest that NWS pulse sequences may overestimate adipose tissue volumes and that WS pulse sequences are preferable because of the higher contrast they generate between fatty and non-fatty tissues.
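For readers unfamiliar with FCM, the sketch below runs standard fuzzy c-means on scalar pixel intensities, the kind of input the segmentation step above works on. It alternates the usual membership update u_ik = 1 / Σ_j (d_ik / d_ij)^(2/(m-1)) and the weighted-mean center update. The software's actual settings are not given, so c = 2 (fatty vs. non-fatty), the fuzzifier m = 2, the evenly spread initialization, and the function name are all assumptions.

```python
def fuzzy_c_means_1d(values, c=2, m=2.0, iters=100, tol=1e-6):
    """Standard fuzzy c-means on scalar values (e.g. pixel intensities).

    Returns the cluster centers and the membership matrix u, where u[i][k]
    is the degree to which values[i] belongs to cluster k (rows sum to 1)."""
    lo, hi = min(values), max(values)
    # Spread the initial centers evenly over the intensity range.
    centers = [lo + (hi - lo) * (k + 0.5) / c for k in range(c)]
    for _ in range(iters):
        u = []
        for x in values:
            d = [abs(x - v) for v in centers]
            zeros = d.count(0.0)
            if zeros:
                # The value sits exactly on a center: crisp membership there.
                row = [1.0 / zeros if dk == 0.0 else 0.0 for dk in d]
            else:
                # u_ik = 1 / sum_j (d_ik / d_ij)^(2 / (m - 1))
                row = [1.0 / sum((d[k] / d[j]) ** (2.0 / (m - 1.0)) for j in range(c))
                       for k in range(c)]
            u.append(row)
        # v_k = sum_i u_ik^m * x_i / sum_i u_ik^m
        new_centers = []
        for k in range(c):
            w = [row[k] ** m for row in u]
            new_centers.append(sum(wi * x for wi, x in zip(w, values)) / sum(w))
        shift = max(abs(a - b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, u
```

Thresholding each pixel's membership in the brighter cluster (for example at 0.5) would give a crisp fat mask. The better WS histogram separation reported above would show up here as memberships closer to 0 or 1.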

Valaparla, Sunil K.; Peng, Qi; Gao, Feng; Clarke, Geoffrey D.

2014-03-01

247

NASA Technical Reports Server (NTRS)

Learning of discriminant hyperplanes in imperfectly supervised or unsupervised training sample sets with unreliably labeled samples along the fuzzy joint boundaries between sample clusters is discussed, with the discriminant hyperplane designed to be a least-squares fit to the unreliably labeled data points. (Samples along the fuzzy boundary jump back and forth from one cluster to the other in recursive cluster stabilization and are considered unreliably labeled.) Minimization of the distances of these unreliably labeled samples from the hyperplanes does not sacrifice the ability to discriminate between classes represented by reliably labeled subsets of samples. An equivalent unconstrained linear inequality problem is formulated and algorithms for its solution are indicated. Landsat earth sensing data were used in confirming the validity and computational feasibility of the approach, which should be useful in deriving discriminant hyperplanes separating clusters with fuzzy boundaries, given supervised training sample sets with unreliably labeled boundary samples.
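The least-squares criterion above can be sketched in two dimensions: the line w·x + b = 0 (with |w| = 1) that minimizes the sum of squared perpendicular distances of the unreliably labeled boundary samples passes through their centroid, and its normal is the direction of least scatter. This is an illustration under assumptions. We read "least-squares fit" as total least squares on perpendicular distances, restrict to 2-D for brevity, and leave out the paper's equivalent unconstrained linear-inequality formulation. The function names are hypothetical.

```python
import math

def least_squares_hyperplane_2d(points):
    """Fit the line w.x + b = 0 (|w| = 1) minimizing the sum of squared
    perpendicular distances of the given points (total least squares)."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points)
    syy = sum((p[1] - my) ** 2 for p in points)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points)
    # Angle of the largest-variance (principal) direction of the scatter matrix.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    # The normal is perpendicular to the principal direction.
    w = (-math.sin(theta), math.cos(theta))
    b = -(w[0] * mx + w[1] * my)
    return w, b

def side(w, b, p):
    """Signed distance of p from the line; its sign gives the predicted class."""
    return w[0] * p[0] + w[1] * p[1] + b
```

Reliably labeled samples from the two clusters should then fall on opposite sides, which is the property the abstract says is preserved.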

Dasarathy, B. V.

1976-01-01

248

Reproducible Clusters from Microarray Research: Whither?

Motivation In cluster analysis, the validity of specific solutions, algorithms, and procedures presents significant challenges because there is no null hypothesis to test and no 'right answer'. It has been noted that a replicable classification is not necessarily a useful one, but a useful one that characterizes some aspect of the population must be replicable. By replicable we mean reproducible across multiple samplings from the same population. Methodologists have suggested that the validity of clustering methods should be based on classifications that yield reproducible findings beyond chance levels. We used this approach to determine the performance of commonly used clustering algorithms and the degree of replicability achieved on several microarray datasets. Methods We considered four commonly used iterative partitioning algorithms (Self Organizing Maps (SOM), K-means, Clustering LARge Applications (CLARA), and Fuzzy C-means) and evaluated their performance on 37 microarray datasets, with sample sizes ranging from 12 to 172. We assessed the reproducibility of each clustering algorithm by measuring the strength of the relationship between the clustering outputs of subsamples of the 37 datasets. Cluster stability was quantified using Cramér's V² from a k × k table. Cramér's V² is equivalent to the squared canonical correlation coefficient between two sets of nominal variables. Potential scores range from 0 to 1, with 1 denoting perfect reproducibility. Results All four clustering routines show increased stability with larger sample sizes. K-means and SOM showed a gradual increase in stability with increasing sample size. CLARA and Fuzzy C-means, however, yielded low stability scores until sample sizes approached 30 and then increased gradually thereafter. Average stability never exceeded 0.55 for the four clustering routines, even at a sample size of 50.
These findings suggest several plausible scenarios: (1) microarray datasets lack natural clustering structure, thereby producing low stability scores for all four methods; (2) the algorithms studied do not produce reliable results; and/or (3) the sample sizes typically used in microarray research may be too small to support the derivation of reliable clustering results. Further research should evaluate the stability of more clustering algorithms on more datasets, especially datasets with larger sample sizes and with larger numbers of clusters considered. PMID:16026595
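The stability measure described above can be computed directly from two labelings of the same samples: build their contingency table and take V² = χ² / (n · (k − 1)), where k is the smaller table dimension (k for a k × k table). This is a minimal sketch with a hypothetical function name. It assumes both labelings contain at least two clusters; with a single cluster, k − 1 = 0.

```python
from collections import Counter

def cramers_v_squared(labels_a, labels_b):
    """Cramér's V^2 = chi^2 / (n * (min(r, c) - 1)) for the contingency
    table of two cluster labelings of the same n samples."""
    n = len(labels_a)
    joint = Counter(zip(labels_a, labels_b))
    row = Counter(labels_a)
    col = Counter(labels_b)
    chi2 = 0.0
    for a in row:
        for b in col:
            expected = row[a] * col[b] / n
            observed = joint[(a, b)]  # Counter returns 0 for empty cells
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row), len(col))
    return chi2 / (n * (k - 1))
```

Because only the table matters, the score is invariant to how cluster labels are numbered. Two labelings that agree up to relabeling score 1, and statistically independent labelings score 0.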

Garge, Nikhil R; Page, Grier P; Sprague, Alan P; Gorman, Bernard S; Allison, David B

2005-01-01

249

A clustering algorithm for machine cell formation in group technology using minimum spanning trees

I address the machine cell and part family formation problem in group technology. The minimum spanning tree (MST) for machines is constructed, from which seeds to cluster components are generated. Seeds to cluster machines are obtained from the component clusters. The process of alternating seed generation and clustering is continued until feasible solutions are obtained. Edges are removed from the MST to
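The MST step above rests on a standard fact: removing the k − 1 heaviest edges of a minimum spanning tree leaves k clusters, which is the same as stopping Kruskal's algorithm once k components remain. The sketch below shows only that step, not the paper's alternating seed generation. The Jaccard distance between the part sets each machine processes is an assumed dissimilarity, and both function names are hypothetical.

```python
def jaccard_distance(parts_a, parts_b):
    """1 - |A & B| / |A | B| between the sets of parts two machines process."""
    return 1.0 - len(parts_a & parts_b) / len(parts_a | parts_b)

def mst_clusters(dist, k):
    """Partition items into k clusters by running Kruskal's MST algorithm
    on a full distance matrix and stopping when k components remain."""
    n = len(dist)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted((dist[i][j], i, j) for i in range(n) for j in range(i + 1, n))
    components = n
    for _, i, j in edges:
        if components == k:
            break  # equivalent to dropping the k-1 heaviest MST edges
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            components -= 1
    return [find(i) for i in range(n)]  # cluster id = root of each item
```

For example, machines processing {1, 2} and {1, 2, 3} end up in one cell, and machines processing {7, 8} and {8, 9} end up in the other.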

G. SRINIVASAN

1994-01-01

250

Distance-based clustering algorithms can group genes that show similar expression values under multiple experimental conditions, but they are unable to identify a group of genes that have a similar pattern of variation in their expression values. Previously we developed an algorithm called the divisive correlation clustering algorithm (DCCA) to tackle this situation, based on the concept of correlation clustering. However, this algorithm may also fail in certain cases. To overcome these situations, we propose a new clustering algorithm, called the average correlation clustering algorithm (ACCA), which produces better clustering solutions than several other methods. ACCA is able to find groups of genes having more common transcription factors and a similar pattern of variation in their expression values. Moreover, ACCA is more efficient than DCCA in terms of execution time. Like DCCA, we use the concept of correlation clustering introduced by Bansal et al. ACCA uses the correlation matrix in such a way that all genes in a cluster have the highest average correlation values with the genes in that cluster. We have applied ACCA and some well-known conventional methods, including DCCA, to two artificial and nine gene expression datasets, and compared the performance of the algorithms. The clustering results of ACCA are found to be more significantly relevant to the biological annotations than those of the other methods. Analysis of the results shows the superiority of ACCA over some other methods in determining a group of genes having more common transcription factors and a similar pattern of variation in their expression profiles. Availability of the software: The software has been developed using C and Visual Basic, and can be executed on Microsoft Windows platforms. The software may be downloaded as a zip file from http://www.isical.ac.in/~rajat. Then it needs to be installed.
Two word files (included in the zip file) need to be consulted before installation and execution of the software. PMID:20144735
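The core idea of ACCA, that every gene should sit in the cluster with which it has the highest average correlation, can be sketched as an iterative reassignment loop. This is a simplified illustration, not the published ACCA. The initial labels are supplied by the caller, genes are visited in index order, and the loop stops when no gene moves. Pearson correlation is assumed as the correlation measure, and constant (zero-variance) profiles are not handled. The function names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_correlation_clustering(profiles, labels, max_rounds=50):
    """Repeatedly move each gene to the cluster whose other members have the
    highest average correlation with it, until no gene changes cluster."""
    n = len(profiles)
    corr = [[pearson(profiles[i], profiles[j]) for j in range(n)] for i in range(n)]
    labels = list(labels)
    for _ in range(max_rounds):
        changed = False
        for i in range(n):
            best, best_score = labels[i], -2.0  # correlations lie in [-1, 1]
            for k in sorted(set(labels)):
                members = [j for j in range(n) if labels[j] == k and j != i]
                if not members:
                    continue
                score = sum(corr[i][j] for j in members) / len(members)
                if score > best_score:
                    best, best_score = k, score
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:
            break
    return labels
```

Because correlation ignores scale and offset, a gene with profile [2, 4, 6, 8] joins [1, 2, 3, 4] even though their expression values differ. That is the "similar pattern of variation" property the abstract contrasts with distance-based methods.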

Bhattacharya, Anindya; De, Rajat K

2010-08-01

251

Evolution Strategy for the C-Means Algorithm: Application to multimodal image

Francesco Masulli (DIBRIS - Dipartimento di Informatica, Bioingegneria, Robotica e Ingegneria dei Sistemi); Andrea Schenone (DIBRIS - Dipartimento di Informatica, Bioingegneria, Robotica e Ingegneria dei Sistemi); contact: massone@spin.cnr.it

Masulli, Francesco

252

High-performance clusters have been widely deployed to solve challenging and rigorous scientific and engineering tasks. On the one hand, high performance is certainly an important consideration in designing clusters to run parallel applications. On the other hand, the ever-increasing cost of energy requires us to conserve energy effectively in clusters. To achieve the goal of optimizing both performance and energy efficiency

Ziliang Zong; Adam Manzanares; Xiaojun Ruan; Xiao Qin

2011-01-01

253

An examination of the effect of six types of error perturbation on fifteen clustering algorithms

An evaluation of several clustering methods was conducted. Artificial clusters which exhibited the properties of internal cohesion and external isolation were constructed. The true cluster structure was subsequently hidden by six types of error-perturbation. The results indicated that the hierarchical methods were differentially sensitive to the type of error perturbation. In addition, generally poor recovery performance was obtained when random

Glenn W. Milligan

1980-01-01

254

An Algorithm for Testing Unidimensionality and Clustering Items in Rasch Measurement

ERIC Educational Resources Information Center

A new approach to identify item clusters fitting the Rasch model is described and evaluated using simulated and real data. The proposed method is based on hierarchical cluster analysis and constructs clusters of items that show a good fit to the Rasch model. It thus gives an estimate of the number of independent scales satisfying the postulates of…

Debelak, Rudolf; Arendasy, Martin

2012-01-01

255

NASA Technical Reports Server (NTRS)

A clustering method, CLASSY, was developed that alternates maximum likelihood iteration with a procedure for splitting, combining, and eliminating the resulting statistics. The method maximizes the fit of a mixture of normal distributions to the observed first through fourth central moments of the data and produces an estimate of the proportions, means, and covariances in this mixture. The mathematical model that is the basis for CLASSY and the actual operation of the algorithm are described. Data comparing the performance of CLASSY and ISOCLS on simulated and actual LACIE data are presented.
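As a rough illustration of the "maximum likelihood iteration" that CLASSY alternates with its split/combine/eliminate procedure, here is the standard EM algorithm for a one-dimensional mixture of normal distributions. It estimates proportions, means, and variances. This is generic EM, not CLASSY itself: the moment-matching criterion and the splitting, combining, and eliminating logic are not shown. The 1-D restriction, the caller-supplied initial means, the variance floor, and the function names are assumptions.

```python
import math

def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def em_normal_mixture(data, means, iters=200, min_var=1e-6):
    """Maximum-likelihood fit of a 1-D normal mixture by EM, starting from the
    given component means with equal proportions and the pooled variance."""
    k, n = len(means), len(data)
    means = list(means)
    overall = sum(data) / n
    variances = [sum((x - overall) ** 2 for x in data) / n] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: posterior probability that component j generated x.
        resp = []
        for x in data:
            p = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
            s = sum(p)
            resp.append([pj / s for pj in p])
        # M-step: re-estimate proportions, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = max(min_var,
                               sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj)
    return weights, means, variances
```

A splitting rule in the spirit of CLASSY would wrap this loop, for instance by splitting a component whose higher moments disagree with a single normal. How CLASSY actually decides this is described in the report, not here.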

Lennington, R. K.; Malek, H.

1978-01-01

256

Understanding the compressive strength of concrete is important for activities such as construction planning, prestressing operations, proportioning new mixtures, and quality assurance. Regression techniques are the most widely used for prediction tasks in which the relationship between the independent variables and the dependent (predicted) variable is identified. The accuracy of regression-based prediction can be improved if clustering is used along with regression, since combining the two yields more accurate curve fitting between the dependent and independent variables. In this work, a cluster regression technique is applied to estimate the compressive strength of concrete, and a novel state-of-the-art approach is proposed for predicting concrete compressive strength. The objective of this work is to demonstrate that clustering combined with regression yields lower prediction errors when estimating concrete compressive strength. The proposed technique consists of two major stages: in the first stage, clustering is used to group concrete data with similar characteristics, and in the second stage, regression techniques are applied to these clusters (groups) to predict the compressive strength from the individual clusters. Experiments show that clustering combined with regression gives the lowest errors for predicting the compressive strength of concrete, and that the fuzzy c-means clustering algorithm performs better than k-means. PMID:25374939
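The two-stage idea above (cluster first, then fit one regression per cluster) can be sketched with a single input feature. This is an illustrative assumption, not the paper's setup. For brevity, stage 1 uses plain k-means rather than the fuzzy c-means the abstract reports as better, stage 2 uses ordinary least-squares lines, and the function names are hypothetical. The paper's concrete features and regression techniques are not specified here.

```python
def kmeans_1d(xs, k, iters=100):
    """Plain k-means on scalar inputs, centers initialized at evenly spaced quantiles."""
    s = sorted(xs)
    centers = [s[(2 * i + 1) * len(s) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: abs(x - centers[c])) for x in xs]
        new = []
        for c in range(k):
            members = [x for x, lab in zip(xs, labels) if lab == c]
            new.append(sum(members) / len(members) if members else centers[c])
        if new == centers:
            break
        centers = new
    return labels

def fit_line(xs, ys):
    """Ordinary least squares y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return my - b * mx, b

def sse(xs, ys, a, b):
    """Sum of squared prediction errors of the line y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

def cluster_regression_sse(xs, ys, k):
    """Stage 1: cluster the inputs. Stage 2: one regression per cluster. Returns total SSE."""
    labels = kmeans_1d(xs, k)
    total = 0.0
    for c in set(labels):
        cx = [x for x, lab in zip(xs, labels) if lab == c]
        cy = [y for y, lab in zip(ys, labels) if lab == c]
        total += sse(cx, cy, *fit_line(cx, cy))
    return total
```

On data made of two groups with different linear trends, the per-cluster lines fit exactly while a single global line cannot. This is the effect the abstract attributes to combining clustering with regression.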

Nagwani, Naresh Kumar; Deo, Shirish V.

2014-01-01

258

In this paper, an attribute weighting method based on cluster centers, aimed at increasing the discrimination between classes, is proposed and applied to nonlinearly separable datasets, including two medical datasets (the mammographic mass dataset and the BUPA liver disorders dataset) and a 2-D spiral dataset. The goal of this method is to gather the data points near each cluster center together, transforming nonlinearly separable datasets into linearly separable ones. As clustering algorithms, k-means clustering, fuzzy c-means clustering, and subtractive clustering have been used. The proposed attribute weighting methods are k-means clustering based attribute weighting (KMCBAW), fuzzy c-means clustering based attribute weighting (FCMCBAW), and subtractive clustering based attribute weighting (SCBAW); they are applied prior to classifiers including the C4.5 decision tree and the adaptive neuro-fuzzy inference system (ANFIS). To evaluate the proposed method, recall, precision, true negative rate (TNR), G-mean1, G-mean2, f-measure, and classification accuracy have been used. The results show that subtractive clustering based attribute weighting was the best attribute weighting method with respect to classification performance on the three datasets. PMID:21611787

Polat, Kemal

2012-08-01

259

KtJet: A C++ implementation of the Kt clustering algorithm

A C++ implementation of the Kt jet algorithm for high energy particle collisions is presented. The time performance of this implementation is comparable to the widely used Fortran implementation. Identical algorithmic functionality is provided, with a clean and intuitive user interface and additional recombination schemes. A short description of the algorithm and examples of its use are given.
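To show what the kt algorithm computes, here is a minimal Python sketch of inclusive kt clustering on (pt, y, φ) tuples. It uses the standard distances d_iB = pt_i² and d_ij = min(pt_i², pt_j²) · ΔR_ij² / R². At each step the smallest distance is found: a pair is merged, or the particle is promoted to a final jet. This is not KtJet's interface. The pt-weighted recombination in (y, φ) is one simplifying choice among several recombination schemes, and the O(N³) loop is for clarity only.

```python
import math

def delta_phi(a, b):
    """Azimuthal difference wrapped into (-pi, pi]."""
    d = (a - b) % (2.0 * math.pi)
    return d - 2.0 * math.pi if d > math.pi else d

def kt_cluster(particles, R=1.0):
    """Inclusive kt clustering of (pt, y, phi) tuples; returns the final jets."""
    active = [tuple(p) for p in particles]
    jets = []
    while active:
        best = (active[0][0] ** 2, 0, None)  # (distance, i, j); j=None means beam
        for i, (pti, yi, phii) in enumerate(active):
            if pti ** 2 < best[0]:
                best = (pti ** 2, i, None)
            for j in range(i + 1, len(active)):
                ptj, yj, phij = active[j]
                dr2 = (yi - yj) ** 2 + delta_phi(phii, phij) ** 2
                dij = min(pti ** 2, ptj ** 2) * dr2 / R ** 2
                if dij < best[0]:
                    best = (dij, i, j)
        _, i, j = best
        if j is None:
            jets.append(active.pop(i))  # closest to the beam: final jet
        else:
            (pt1, y1, phi1), (pt2, y2, phi2) = active[i], active[j]
            pt = pt1 + pt2
            # pt-weighted recombination (a simplification of the available schemes)
            y = (pt1 * y1 + pt2 * y2) / pt
            phi = phi1 + pt2 * delta_phi(phi2, phi1) / pt
            active.pop(j)               # j > i, so index i stays valid
            active[i] = (pt, y, phi)
    return jets
```

With R = 0.6, two nearby particles of pt 50 and 30 merge into one jet of pt 80, while a distant particle of pt 40 forms its own jet.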

J. M. Butterworth; J. P. Couchman; B. E. Cox; B. M. Waugh

2002-10-01

260

K. Daqrouq, Emad Khalaf, O. Daoud, and A. Al-Qawasmi, K-means Clustering Algorithm Identification as distinguishable classification features. As a classification method, a new approach based on the K-means algorithm is proposed, which uses the average of the sums of point-to-centroid distances in the 1-by-K vector. To verify

261

The three-dimensional (3-D) shape of microcalcification clusters is an important indicator in early breast cancer detection. In fact, there is a relationship between the cluster topology and the type of lesion (malignant or benign). This paper presents a 3-D reconstruction method for such clusters using two 2-D views acquired during standard mammographic examinations. For this purpose, the mammographic unit was

Christian Daul; P. Graebling; A. Tiedeu; D. Wolf

2005-01-01

262

NASA Astrophysics Data System (ADS)

The study in this paper belongs to a more general research effort on discovering facial sub-clusters in face databases of different ethnicities. These new sub-clusters, along with other metadata (such as race, sex, etc.), yield a vector for each face in the database, where each vector component represents the likelihood that the face belongs to a given cluster. This vector is then used as a feature vector in a human identification and tracking system based on face and other biometrics. The first stage of this system is a clustering method that evaluates and compares the clustering results of five different clustering algorithms (average, complete, and single hierarchical algorithms, k-means, and DIGNET) and selects the best strategy for each data collection. In this paper we present the comparative performance of DIGNET and four clustering algorithms (average, complete, and single hierarchical, and k-means) on fabricated 2D and 3D samples and on actual face images from various databases, using four standard metrics: the silhouette figure, the mean silhouette coefficient, the Hubert Γ test coefficient, and the classification accuracy of each clustering result. The results showed that, in general, DIGNET gives more trustworthy results than the other algorithms when the metric values are above a specific acceptance threshold. However, when the metric values are below the acceptance threshold but not too low (too low corresponds to ambiguous or false results), the clustering results need to be verified by the other algorithms.
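The mean silhouette coefficient used as a metric above can be computed as follows. For each point, a is its mean distance to the other members of its own cluster and b is the smallest mean distance to any other cluster. The point's score is s = (b − a) / max(a, b), and the metric averages s over all points. This is a minimal sketch with Euclidean distance and a hypothetical function name. It assumes at least two clusters and follows the usual convention that points in singleton clusters score 0.

```python
import math

def mean_silhouette(points, labels):
    """Mean silhouette coefficient of a clustering (Euclidean distance)."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    total = 0.0
    for i, p in enumerate(points):
        own = [j for j in clusters[labels[i]] if j != i]
        if not own:          # singleton cluster: s(i) = 0 by convention
            continue
        a = sum(math.dist(p, points[j]) for j in own) / len(own)
        b = min(sum(math.dist(p, points[j]) for j in members) / len(members)
                for lab, members in clusters.items() if lab != labels[i])
        total += (b - a) / max(a, b)
    return total / len(points)
```

Values near 1 indicate compact, well-separated clusters, and negative values indicate points that sit closer to another cluster than to their own.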

Thanos, Konstantinos-Georgios; Thomopoulos, Stelios C. A.

2014-06-01

263

MetaCluster 4.0: a novel binning algorithm for NGS reads and huge number of species.

Next-generation sequencing (NGS) technologies allow the sequencing of microbial communities directly from the environment without prior culturing. The output of environmental DNA sequencing consists of many reads from the genomes of different unknown species, which makes clustering together reads from the same (or similar) species (also known as binning) a crucial step. The difficulty of the binning problem is due to four factors: (1) the lack of reference genomes; (2) uneven abundance ratios of species; (3) short NGS reads; and (4) a large number of species (possibly more than a hundred). None of the existing binning tools can handle all four factors. No tools, including both AbundanceBin and MetaCluster 3.0, have demonstrated reasonable performance on a sample with more than 20 species. In this article, we introduce MetaCluster 4.0, an unsupervised binning algorithm that can accurately (with about 80% precision and sensitivity in all cases and at least 90% in some cases) and efficiently bin short reads with varying abundance ratios and is able to handle datasets with 100 species. The novelty of MetaCluster 4.0 stems from solving a few important problems: how to divide reads into groups by a probabilistic approach, how to estimate the 4-mer distribution of each group, how to estimate the number of species, and how to modify MetaCluster 3.0 to handle a large number of species. We show that MetaCluster 4.0 is effective on both simulated and real datasets. Supplementary Material is available at www.liebertonline.com/cmb. PMID:22300323
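The 4-mer distribution mentioned above is the normalized frequency of each of the 256 tetranucleotides in a sequence. The sketch below computes it for a single read. It is only the basic feature: reverse-complement handling and the group-level probabilistic estimation the abstract describes are not attempted. Windows containing characters other than A, C, G, T are skipped, and the names are hypothetical.

```python
from itertools import product

KMERS = ["".join(p) for p in product("ACGT", repeat=4)]  # all 256 tetranucleotides
INDEX = {kmer: i for i, kmer in enumerate(KMERS)}

def tetranucleotide_profile(read):
    """Normalized 4-mer frequency vector of a read (length-4 sliding window)."""
    counts = [0] * len(KMERS)
    for i in range(len(read) - 3):
        j = INDEX.get(read[i:i + 4].upper())
        if j is not None:          # skip windows with N or other symbols
            counts[j] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts
```

For a short NGS read, this vector is very sparse and noisy. That is one reason to estimate the distribution per group of reads rather than per read.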

Wang, Yi; Leung, Henry C M; Yiu, S M; Chin, Francis Y L

2012-02-01

264

Background Quantitative characterization of the topological characteristics of protein-protein interaction (PPI) networks can enable the elucidation of biological functional modules. Here, we present a novel clustering methodology for PPI networks in which the biological and topological influence of each protein on other proteins is modeled using the probability that the series of interactions necessary to link a pair of distant proteins in the network occurs within a time constant (the occurrence probability). Results CASCADE selects representative nodes for each cluster and iteratively refines clusters based on a combination of the occurrence probability and the graph topology between every protein pair. The CASCADE approach is compared to nine competing approaches. The clusters obtained by each technique are compared for enrichment of biological function. CASCADE generates larger clusters, and the clusters it identifies have p-values for biological function that are approximately 1000-fold better than those of the other methods on the yeast PPI network dataset. An important strength of CASCADE is that the percentage of proteins discarded to create clusters is much lower than for the other approaches, which have an average discard rate of 45% on the yeast protein-protein interaction network. Conclusion CASCADE is effective at detecting biologically relevant clusters of interactions. PMID:18230159

Hwang, Woochang; Cho, Young-Rae; Zhang, Aidong; Ramanathan, Murali

2008-01-01

265

Convergence and Other Aspects of the k-modes Algorithm for Clustering Categorical data

… the phenomenon of Cluster Death, where a cluster has no data vectors associated with it, are also discussed for k … most frequently in each row of the vectors in X. For example, if X = car car boat bike bike plane train …
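The k-modes algorithm named in this record adapts k-means to categorical data. It replaces means with modes (the most frequent category in each attribute) and Euclidean distance with a simple matching dissimilarity (the number of mismatched attributes). The sketch below follows those standard definitions. The function names, tie-breaking by first occurrence, and letting an empty cluster keep its previous mode (one simple response to "cluster death") are assumptions, not the paper's treatment.

```python
def matching_dissimilarity(a, b):
    """Number of attributes on which two categorical vectors disagree."""
    return sum(x != y for x, y in zip(a, b))

def column_modes(rows):
    """Most frequent category in each attribute; ties go to the value seen first."""
    modes = []
    for col in zip(*rows):
        counts = {}
        for v in col:
            counts[v] = counts.get(v, 0) + 1
        modes.append(max(counts, key=counts.get))
    return tuple(modes)

def k_modes(data, initial_modes, iters=20):
    """Alternate nearest-mode assignment and mode update until the modes stop changing."""
    modes = [tuple(m) for m in initial_modes]
    for _ in range(iters):
        labels = [min(range(len(modes)), key=lambda k: matching_dissimilarity(x, modes[k]))
                  for x in data]
        new_modes = []
        for k, m in enumerate(modes):
            members = [x for x, lab in zip(data, labels) if lab == k]
            # Cluster death: a cluster with no members keeps its old mode here.
            new_modes.append(column_modes(members) if members else m)
        if new_modes == modes:
            break
        modes = new_modes
    return labels, modes
```

Starting from two identical initial modes shows cluster death directly: every vector ties between them, `min` picks the first, and the second cluster stays empty.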

266

A neural network clustering algorithm for the ATLAS silicon pixel detector

A novel technique to identify and split clusters created by multiple charged particles in the ATLAS pixel detector using a set of artificial neural networks is presented. Such merged clusters are a common feature of tracks originating from highly energetic objects, such as jets. Neural networks are trained using Monte Carlo samples produced with a detailed detector simulation. This technique replaces the former clustering approach based on a connected component analysis and charge interpolation. The performance of the neural network splitting technique is quantified using data from proton-proton collisions at the LHC collected by the ATLAS detector in 2011 and from Monte Carlo simulations. This technique reduces the number of clusters shared between tracks in highly energetic jets by up to a factor of three. It also provides more precise position and error estimates of the clusters in both the transverse and longitudinal impact parameter resolution.
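For context on the "former clustering approach" mentioned above, here is a minimal sketch of connected-component clustering of fired pixels. Pixels that touch, including diagonally (8-connectivity), form one cluster. A charge-weighted centroid stands in for charge interpolation. This is an assumption-laden illustration, not the ATLAS reconstruction code: the 8-connectivity, the centroid formula, the input format, and the function name are ours, and the neural-network splitting is not shown.

```python
def pixel_clusters(hits):
    """Group fired pixels into 8-connected clusters.

    hits: dict mapping (row, col) -> collected charge.
    Returns a list of (sorted pixel list, charge-weighted centroid) pairs."""
    seen, clusters = set(), []
    for start in sorted(hits):
        if start in seen:
            continue
        stack, members = [start], []
        seen.add(start)
        while stack:                      # depth-first flood fill
            r, c = stack.pop()
            members.append((r, c))
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    nb = (r + dr, c + dc)
                    if nb in hits and nb not in seen:
                        seen.add(nb)
                        stack.append(nb)
        q = sum(hits[p] for p in members)
        centroid = (sum(hits[p] * p[0] for p in members) / q,
                    sum(hits[p] * p[1] for p in members) / q)
        clusters.append((sorted(members), centroid))
    return clusters
```

The weakness the neural networks address is visible here: two particles whose pixel footprints touch produce one connected component, so a single merged cluster and a single centroid.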

ATLAS collaboration

2014-06-30

267

A neural network clustering algorithm for the ATLAS silicon pixel detector

NASA Astrophysics Data System (ADS)

A novel technique to identify and split clusters created by multiple charged particles in the ATLAS pixel detector using a set of artificial neural networks is presented. Such merged clusters are a common feature of tracks originating from highly energetic objects, such as jets. Neural networks are trained using Monte Carlo samples produced with a detailed detector simulation. This technique replaces the former clustering approach based on a connected component analysis and charge interpolation. The performance of the neural network splitting technique is quantified using data from proton-proton collisions at the LHC collected by the ATLAS detector in 2011 and from Monte Carlo simulations. This technique reduces the number of clusters shared between tracks in highly energetic jets by up to a factor of three. It also provides more precise position and error estimates of the clusters in both the transverse and longitudinal impact parameter resolution.

The ATLAS collaboration

2014-09-01

268

Wireless sensor networks (WSNs) have emerged as a promising solution for various applications due to their low cost and easy deployment. Because WSNs typically have limited power (they are battery powered), extending network lifetime is a key challenge. Many hierarchical protocols in the literature show better energy efficiency. In addition, data reduction based on the correlation of sensed readings can efficiently reduce the number of required transmissions. Therefore, we use a sub-clustering procedure based on spatial data correlation to further partition the hierarchical (clustered) architecture of a WSN. The proposed algorithm (2TC-cor) is composed of two procedures: the prediction model construction procedure and the sub-clustering procedure. Energy is conserved through reduced transmissions, which depend on the prediction model, and it can be conserved further through the representative mechanism of sub-clustering. Simulation results show that 2TC-cor can effectively conserve energy and monitor the environment with accuracy within an acceptable level. PMID:25412220

Tsai, Ming-Hui; Huang, Yueh-Min

2014-01-01

269

Using Bi-clustering Algorithm for Analyzing Online Users Activity in a Virtual Campus

Data mining algorithms have proved useful for processing large data sets in order to extract relevant information and knowledge. Such algorithms are also important for analyzing data collected from users' activity. One such family of analyses is the mining of log files of online applications that register the actions of online

Fatos Xhafa; S. Caballé; Leonard Barolli; Alberto Molina; Rozeta Miho

2010-01-01

270

Randomized Algorithms and NLP: Using Locality Sensitive Hash Function for High Speed Noun Clustering

In this paper, we explore the power of randomized algorithms to address the challenge of working with very large amounts of data. We apply these algorithms to generate noun similarity lists from 70 million pages. We reduce the running time from quadratic to practically linear in the number of elements to be computed.
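One standard randomized technique for this kind of speed-up is random-hyperplane locality sensitive hashing for cosine similarity. Each vector is reduced to a short bit signature, one bit per random hyperplane recording which side the vector falls on. Two vectors at angle θ disagree on a bit with probability θ/π, so cos(π · hamming / bits) estimates their cosine similarity. The sketch below shows only this estimator. The parameters and names are illustrative assumptions, and the paper's fast nearest-neighbor search over signatures is not reproduced.

```python
import math
import random

def random_hyperplanes(dim, n_bits, seed=0):
    """n_bits random Gaussian normal vectors defining the hash hyperplanes."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def signature(vec, planes):
    """One bit per hyperplane: 1 if vec lies on its non-negative side."""
    return [1 if sum(p * v for p, v in zip(plane, vec)) >= 0 else 0 for plane in planes]

def estimated_cosine(sig_a, sig_b):
    """Pr[bits differ] = theta / pi, so cos(theta) ~ cos(pi * hamming / n_bits)."""
    hamming = sum(a != b for a, b in zip(sig_a, sig_b))
    return math.cos(math.pi * hamming / len(sig_a))
```

Comparing fixed-length bit strings is much cheaper than computing dot products over very high-dimensional context vectors, and more bits give a tighter estimate.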

Deepak Ravichandran; Patrick Pantel; Eduard Hovy

271

This thesis examines two methods for speeding up MCNP KCODE calculations. The first approach is assembly of a low cost Beowulf Cluster for parallel computation. The first half describes the MIT Nuclear Engineering Department's ...

Carstens, Nathan, 1978-

2004-01-01

272

Clustering Mutational Spectra via Classification Likelihood and Markov Chain Monte Carlo Algorithms

We have analyzed a set of 39 mutational spectra of the supF gene that were generated by different mutagenic agents and under different experimental conditions. The cluster analysis was performed using a newly developed clustering procedure. The clustering criterion used in the procedure was developed by applying the classification likelihood approach to multinomial observations. We also developed a Gibbs sampling-based optimization procedure that outperformed previously developed methods in

Mario Medvedovic; Paul Succop; Rakesh Shukla; Kathleen Dixon

273

Clustering mutational spectra via classification likelihood and markov chain monte carlo algorithms

We have analyzed a set of 39 mutational spectra of the supF gene that were generated by different mutagenic agents and under different experimental conditions. The cluster analysis was performed using a newly developed clustering procedure. The clustering criterion used in the procedure was developed by applying the classification likelihood approach to multinomial observations. We also developed a Gibbs sampling-based optimization procedure
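The classification likelihood criterion described above can be sketched for multinomial count data. Each cluster gets its own probability vector, estimated by pooling its members' counts (the maximum-likelihood estimate for a fixed partition), and the criterion sums count × log(probability) over all spectra. Multinomial coefficients are omitted because they do not depend on the partition. This shows only the criterion, not the Gibbs sampling optimizer. Representing each spectrum as a vector of mutation counts per site is an assumption, and the function name is hypothetical.

```python
import math

def classification_log_likelihood(spectra, labels):
    """Classification log-likelihood of a partition of multinomial count vectors,
    using each cluster's pooled-count MLE probabilities (up to a constant)."""
    pooled = {}
    for counts, lab in zip(spectra, labels):
        acc = pooled.setdefault(lab, [0] * len(counts))
        for j, c in enumerate(counts):
            acc[j] += c
    ll = 0.0
    for counts, lab in zip(spectra, labels):
        total = sum(pooled[lab])
        for c, pc in zip(counts, pooled[lab]):
            if c:                 # pc >= c > 0, so the log is defined
                ll += c * math.log(pc / total)
    return ll
```

An optimizer (Gibbs sampling in the paper) searches over partitions for a high value of this criterion. Grouping spectra with similar hotspot profiles scores higher than mixing dissimilar ones.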

Mario Medvedovic; Paul Succop; Rakesh Shukla; Kathleen Dixon

2001-01-01

274

A Robust Pose Estimation Algorithm for Mobile Robot Based on Clusters

Pose estimation is a key component of a mobile robot system. In this paper, a new pose estimation method for mobile robots is developed based on 2D laser radar. First, the scan data points in each frame are divided into clusters. Then the current scan and the previous scan are matched according to the clusters to obtain two types of match
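The abstract does not say how scan points are divided into clusters. A common simple choice, assumed here, is to walk the scan in angular order and start a new cluster wherever consecutive returns are farther apart than a gap threshold. The sketch below does that and also computes cluster centroids, one simple feature that could be used when matching clusters between scans. The threshold value and function names are hypothetical, and the matching step itself is not shown.

```python
import math

def split_scan(points, max_gap=0.3):
    """Split consecutive 2-D laser returns (in scan order) into clusters wherever
    the distance between neighbouring returns exceeds max_gap (same units)."""
    clusters = []
    for p in points:
        if clusters and math.dist(clusters[-1][-1], p) <= max_gap:
            clusters[-1].append(p)
        else:
            clusters.append([p])
    return clusters

def centroid(cluster):
    """Mean position of a cluster's points."""
    n = len(cluster)
    return (sum(x for x, _ in cluster) / n, sum(y for _, y in cluster) / n)
```

A fixed gap works poorly far from the sensor, where neighbouring returns are naturally farther apart. Range-dependent thresholds are a common refinement.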

Yuhua Xu; Chongwei Zhang; Wei Bao; Ling Su; Mulan Wang

2008-01-01

275

The concept can be used to estimate future resource requirements and to make call admission decisions in wireless networks. Shadow clusters can be used to decide whether a new call can be admitted to a wireless network based on its quality-of-service (QoS) requirements and local traffic conditions. The shadow cluster concept can be especially useful in future wireless networks with

David A. Levine; Ian F. Akyildiz; Mahmoud Naghshineh

1997-01-01

276

Point defects and pores in diamond affect its optical and electrical properties. We generated and evaluated a large number of vacancy V(n) clusters representing nanosized voids in diamond for n up to 65. Our generational algorithm spawns the new generation n + 1 from the list of the most stable structures in the previous generation n. With energy as the only criterion, we generate a large structural diversity that allows their unbiased analysis. Since π-electron delocalization is important for carbon, we used quantum mechanical tight-binding density functional theory (TBDFT). Adamantane-like globular shapes are preferred for n up to ≈22. Beginning around n ≈ 35, the most stable structures show overall oblate shapes with some irregularities. These novel structures have not been seen before because hitherto only highly regular structures were considered. We see local graphitization in these relaxed structures, providing an atomistic justification for the widely used "slit pore" model. The preference for structures with a minimum number of cut bonds diminishes as n increases. There are no particularly stable "magic" sizes for vacancy clusters larger than n = 22, indicating that these larger voids can easily incorporate small vacancies and vacancy clusters. Radial distribution analysis shows that unusual contact or bond distances in the 1.6 to 2.8 Å range appear in the vicinity of the internal surfaces of the vacancy clusters. Extremely long C-C bonds emerge as a result of structural relaxation of the dangling bonds in the vicinity of the vacancy clusters that cannot be simply described by ordinary sp(2)/sp(3) hybridization. PMID:20856969

Slepetz, Brad; Laszlo, Istvan; Gogotsi, Yury; Hyde-Volpe, David; Kertesz, Miklos

2010-11-14
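The generational scheme above (spawn generation n + 1 only from the most stable structures of generation n) is essentially a beam search. The toy sketch below shows that control flow on a 2D square lattice. It replaces the paper's TBDFT energies with a hypothetical proxy, the number of cut bonds (vacancy to occupied-site bonds), and all names and the `keep` beam width are illustrative assumptions.

```python
NEIGHBORS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def cut_bonds(sites):
    """Toy energy proxy: bonds between a vacancy and an occupied lattice site."""
    return sum((x + dx, y + dy) not in sites for x, y in sites for dx, dy in NEIGHBORS)


def canonical(sites):
    """Translate a cluster so its minimum x and y are 0, removing translated duplicates."""
    mx = min(x for x, _ in sites)
    my = min(y for _, y in sites)
    return frozenset((x - mx, y - my) for x, y in sites)


def generational_search(n_max, keep=5):
    """Grow vacancy clusters one site at a time; only the `keep` lowest-energy
    structures of generation n are used as parents of generation n + 1."""
    generation = [frozenset({(0, 0)})]
    history = {1: generation}
    for n in range(1, n_max):
        children = set()
        for parent in generation:
            for x, y in parent:
                for dx, dy in NEIGHBORS:
                    site = (x + dx, y + dy)
                    if site not in parent:
                        children.add(canonical(parent | {site}))
        # rank by energy, with a deterministic tie-break on the site list
        generation = sorted(children, key=lambda s: (cut_bonds(s), sorted(s)))[:keep]
        history[n + 1] = generation
    return history
```

On this toy lattice the best four-vacancy cluster found is the compact 2 × 2 square with 8 cut bonds, echoing the abstract's observation that compact shapes are favoured at small n.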

277

NASA Astrophysics Data System (ADS)

The phase space obtained using the isospin quantum molecular dynamical (IQMD) model is analyzed by applying the binding energy cut in the most widely used secondary cluster recognition algorithm. In addition, for the present study, the energy contribution from momentum-dependent and symmetry potentials is also included in the calculation of the total binding energy, which was absent in clusterization algorithms used earlier. The stability of fragments and isospin effects are explored using the new clusterization algorithm. The findings are summarized as follows: (1) The clusterization algorithm identifies the fragments at quite an early time. (2) It is more sensitive for free nucleons and light charged particles than for intermediate mass fragments, which results in the enhanced (reduced) production of free nucleons (light charged particles, or LCPs). (3) It has affected the yields of the isospin-sensitive observables, namely neutrons (n), protons (p), ³H, ³He, and the single ratio R(n/p), to a greater extent in the mid-rapidity and low kinetic energy region. In conclusion, the inclusion of the binding energy cut in the clusterization algorithm is found to play a crucial role in the study of isospin physics. This study will give another direction for the determination of symmetry energy in heavy-ion collisions at intermediate energies.

Kumar, Sanjeev; Ma, Y. G.

2015-03-01

278

Risk Mapping of Cutaneous Leishmaniasis via a Fuzzy C Means-based Neuro-Fuzzy Inference System

NASA Astrophysics Data System (ADS)

Identifying pathogenic factors and how they spread in the environment has recently become a global need. Cutaneous leishmaniasis (CL), caused by Leishmania, is a parasitic disease that is transmitted to humans by Phlebotomus sand fly vectors. Studies show that the economic situation, cultural issues, and environmental and ecological conditions can affect the prevalence of this disease. In this study, data mining is used to predict the CL prevalence rate and obtain a risk map, based on the environmental parameters that affect CL, using a neuro-fuzzy system. Neuro-fuzzy systems combine the learning capacity of neural networks with the reasoning power of fuzzy systems, which makes them very efficient to use. To predict the CL prevalence rate, an adaptive neuro-fuzzy inference system whose fuzzy inference structure is built by fuzzy C-means clustering was applied to determine the initial membership functions. Owing to the high incidence of CL in Ilam province, the counties of Ilam, Mehran, and Dehloran were examined and evaluated. The CL prevalence rate for 2012 was predicted from maps of effective environmental and topographic properties, including temperature, moisture, annual rainfall, vegetation, and elevation. The results indicate that the model with the fuzzy C-means clustering structure yields acceptable RMSE values for both the training and checking data, which supports our analyses. Using the proposed data mining technique, the spatial distribution pattern of the disease and the vulnerable areas become identifiable, and the map can be used by public health experts and decision makers as a useful tool for management and optimal decision-making.

Akhavan, P.; Karimi, M.; Pahlavani, P.

2014-10-01
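The fuzzy C-means step mentioned above, used there to set up the initial membership functions, can be sketched as the standard alternating update of memberships and centres. This is a generic FCM sketch under the usual assumptions (Euclidean distance, fuzzifier `m = 2`, random initial memberships); how the paper converts FCM output into ANFIS membership functions is not shown, and the function name and parameters are illustrative.

```python
import numpy as np


def fuzzy_c_means(X, c, m=2.0, n_iter=300, tol=1e-6, seed=0):
    """Standard fuzzy c-means. Returns (centres (c, n_features), memberships (n_samples, c))."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]                       # treat a 1-D array as one feature
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)        # memberships of each sample sum to 1

    def centres_from(U):
        W = U ** m                           # fuzzified weights u_ik^m
        return (W.T @ X) / W.sum(axis=0)[:, None]

    for _ in range(n_iter):
        V = centres_from(U)
        d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
        d2 = np.fmax(d2, 1e-12)              # avoid division by zero at a centre
        inv = d2 ** (-1.0 / (m - 1.0))       # proportional to d_ik^(-2/(m-1))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.max(np.abs(U_new - U)) < tol
        U = U_new
        if converged:
            break
    return centres_from(U), U
```

In a neuro-fuzzy setting, the resulting centres (and, for example, the spread of points around them) could seed the initial membership functions, but that mapping is an assumption here.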

279

NASA Astrophysics Data System (ADS)

The ligand migration network for O2-diffusion in truncated Hemoglobin N is analyzed based on three different clustering schemes. For coordinate-based clustering, the conventional k-means and the kinetics-based Markov Clustering (MCL) methods are employed, whereas the locally scaled diffusion map (LSDMap) method is a collective-variable-based approach. It is found that all three methods agree well in their geometrical definition of the most important docking site, and all experimentally known docking sites are recovered by all three methods. Also, for most of the states, their population coincides quite favourably, whereas the kinetics of and between the states differs. One of the major differences between k-means and MCL clustering on the one hand and LSDMap on the other is that the latter finds one large primary cluster containing the Xe1a, IS1, and ENT states. This is related to the fact that the motion within the state occurs on similar time scales, whereas structurally the state is found to be quite diverse. In agreement with previous explicit atomistic simulations, the Xe3 pocket is found to be a highly dynamical site which points to its potential role as a hub in the network. This is also highlighted in the fact that LSDMap cannot identify this state. First passage time distributions from MCL clusterings using a one- (ligand-position) and two-dimensional (ligand-position and protein-structure) descriptor suggest that ligand- and protein-motions are coupled. The benefits and drawbacks of the three methods are discussed in a comparative fashion and highlight that depending on the questions at hand the best-performing method for a particular data set may differ.

Cazade, Pierre-André; Zheng, Wenwei; Prada-Gracia, Diego; Berezovska, Ganna; Rao, Francesco; Clementi, Cecilia; Meuwly, Markus

2015-01-01

280

The ligand migration network for O2-diffusion in truncated Hemoglobin N is analyzed based on three different clustering schemes. For coordinate-based clustering, the conventional k-means and the kinetics-based Markov Clustering (MCL) methods are employed, whereas the locally scaled diffusion map (LSDMap) method is a collective-variable-based approach. It is found that all three methods agree well in their geometrical definition of the most important docking site, and all experimentally known docking sites are recovered by all three methods. Also, for most of the states, their population coincides quite favourably, whereas the kinetics of and between the states differs. One of the major differences between k-means and MCL clustering on the one hand and LSDMap on the other is that the latter finds one large primary cluster containing the Xe1a, IS1, and ENT states. This is related to the fact that the motion within the state occurs on similar time scales, whereas structurally the state is found to be quite diverse. In agreement with previous explicit atomistic simulations, the Xe3 pocket is found to be a highly dynamical site which points to its potential role as a hub in the network. This is also highlighted in the fact that LSDMap cannot identify this state. First passage time distributions from MCL clusterings using a one- (ligand-position) and two-dimensional (ligand-position and protein-structure) descriptor suggest that ligand- and protein-motions are coupled. The benefits and drawbacks of the three methods are discussed in a comparative fashion and highlight that depending on the questions at hand the best-performing method for a particular data set may differ. PMID:25591387

Cazade, Pierre-André; Zheng, Wenwei; Prada-Gracia, Diego; Berezovska, Ganna; Rao, Francesco; Clementi, Cecilia; Meuwly, Markus

2015-01-14
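Markov Clustering (MCL), one of the three schemes compared above, alternates "expansion" (taking a power of a column-stochastic matrix) with "inflation" (raising entries to a power and renormalizing) until the flow matrix stops changing. The sketch below assumes a dense symmetric weight matrix (for instance, symmetrized transition counts between states) and the common default parameters `expansion=2`, `inflation=2.0`. It is a generic MCL sketch, not the pipeline used in the study.

```python
import numpy as np


def markov_cluster(weights, expansion=2, inflation=2.0, n_iter=100, tol=1e-9):
    """Markov Clustering on a dense, symmetric, non-negative weight matrix."""
    W = np.asarray(weights, dtype=float) + np.eye(len(weights))  # add self-loops
    M = W / W.sum(axis=0, keepdims=True)          # column-stochastic flow matrix
    for _ in range(n_iter):
        M_new = np.linalg.matrix_power(M, expansion)  # expansion: flow spreads
        M_new = M_new ** inflation                    # inflation: strong flows win
        M_new /= M_new.sum(axis=0, keepdims=True)
        done = np.allclose(M_new, M, atol=tol)
        M = M_new
        if done:
            break
    # Each row that keeps mass marks an attractor; its nonzero columns form
    # that attractor's cluster (identical rows are merged).
    clusters = []
    for row in M:
        members = frozenset(np.flatnonzero(row > 1e-6).tolist())
        if members and members not in clusters:
            clusters.append(members)
    return clusters
```

Because inflation is the main parameter controlling cluster granularity, higher values typically give more and smaller clusters.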

281

Extraction of image semantic features with spatial-range mean shift clustering algorithm

In recent years, the Bag-of-visual Words image representation has led to many significant results in visual object recognition and categorization. However, experiments show that the unsupervised clustering of primitive visual features tends to limit the discriminative ability of the visual codebook, since it does not take the spatial relationship between visual primitives into account. This paper aims at

Mengyue Wang; Changlin Zhang; Yan Song

2010-01-01

282

…-Red Imaging Spectrometer (AVIRIS), developed by NASA Jet Propulsion Laboratory, allows their exploitation … applications require a response in real time, few solutions are available to provide fast and efficient … of Parallelism of Barcelona, and the Thunderhead Beowulf cluster at NASA's Goddard Space Flight Center.

Plaza, Antonio J.

283

Sensitivity evaluation of dynamic speckle activity measurements using clustering methods

We evaluate and compare the use of competitive neural networks, self-organizing maps, the expectation-maximization algorithm, K-means, and fuzzy C-means techniques as partitional clustering methods, when the sensitivity of the activity measurement of dynamic speckle images needs to be improved. The temporal history of the acquired intensity generated by each pixel is analyzed in a wavelet decomposition framework, and it is shown that the mean energy of its corresponding wavelet coefficients provides a suitable feature space for clustering purposes. The sensitivity obtained by using the evaluated clustering techniques is also compared with the well-known methods of Konishi-Fujii, weighted generalized differences, and wavelet entropy. The performance of the partitional clustering approach is evaluated using simulated dynamic speckle patterns and also experimental data.

Etchepareborda, Pablo; Federico, Alejandro; Kaufmann, Guillermo H.

2010-07-01
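The feature described above, the mean energy of the wavelet coefficients of each pixel's intensity history, can be sketched with a hand-written orthonormal Haar transform so that no wavelet library is needed. The abstract does not state which wavelet or how many levels were used; Haar and the `levels` argument are assumptions, and the function names are hypothetical.

```python
import numpy as np


def haar_detail_energies(signal, levels):
    """Mean energy of orthonormal Haar detail coefficients per level (finest first).

    The signal length must be divisible by 2**levels."""
    approx = np.asarray(signal, dtype=float)
    if len(approx) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    energies = []
    for _ in range(levels):
        even, odd = approx[0::2], approx[1::2]
        detail = (even - odd) / np.sqrt(2.0)   # Haar high-pass
        approx = (even + odd) / np.sqrt(2.0)   # Haar low-pass, input to next level
        energies.append(float(np.mean(detail ** 2)))
    return energies


def speckle_features(stack, levels):
    """stack: (n_frames, height, width) intensities -> (height * width, levels) features."""
    stack = np.asarray(stack, dtype=float)
    series = stack.reshape(stack.shape[0], -1).T   # one time history per pixel
    return np.array([haar_detail_energies(s, levels) for s in series])
```

Each row of `speckle_features` is a per-pixel feature vector that could then be passed to any of the partitional clustering methods listed above; a static pixel yields all-zero detail energies, while a fluctuating one does not.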

284

Implementing a systolic algorithm for QR factorization on multicore clusters with PaRSEC

… routines on a supercomputer called Kraken, which shows that high-level programming environments, such as Pa… of size 64 × 32 × 32 [1], Kraken, a Cray XT 5, is a 3D torus of size 25 × 16 × 24 [2]. In addition… abled us to implement, validate, and evaluate the algorithm on Kraken, within a few weeks of

Dongarra, Jack

285

DWT-CEM: an algorithm for scale-temporal clustering in fMRI

The number of studies using functional magnetic resonance imaging (fMRI) has grown very rapidly since the first description of the technique in the early 1990s. Most published studies have utilized data analysis methods based on voxel-wise application of general linear models (GLM). On the other hand, temporal clustering analysis (TCA) focuses on the identification of relationships between cortical areas by

João Ricardo Sato; André Fujita; Edson Amaro Jr.; Janaina Mourão Miranda; Pedro Alberto Morettin; Michal John Brammer

2007-01-01

286

NASA Astrophysics Data System (ADS)

We implement a multiorbital cluster dynamical mean-field theory (DMFT) by improving a sample update algorithm in the continuous-time quantum Monte Carlo method based on the interaction expansion. The proposed sampling scheme for the spin-flip and pair-hopping interactions in two-orbital systems mitigates the sign problem, giving an efficient way to deal with these interactions. In particular, in the single-site DMFT, we see that the negative signs vanish. We apply the method to the two-dimensional two-orbital Hubbard model at half-filling, where we take into account the short-range spatial correlation effects within a four-site cluster. We show that, compared to the single-site DMFT results, the critical interaction value for the metal-insulator transition decreases and that the effects of the spin-flip and pair-hopping terms are less significant in the parameter region we have studied. The present method provides a firm starting point for the study of intersite correlations in multiorbital systems. It is also widely applicable to realistic calculations in conjunction with density functional theory.

Nomura, Yusuke; Sakai, Shiro; Arita, Ryotaro

2014-05-01

287

The suitability of seven thresholding methods (six algorithms: isodata, Otsu, minimum error, moment-preserving, Pun and fuzzy; and a manual method) to consistently segment bread crumb images was investigated in comparison with the previously reported k-means clustering technique. Thresholding performance was assessed by two criteria: uniformity and busyness of the binary images. Crumb features (cell density, mean cell area, cell uniformity

Ursula Gonzales-Barron; Francis Butler

2006-01-01
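Otsu's method, one of the six thresholding algorithms compared above, picks the histogram split that maximizes the between-class variance σ_B² = (μ_T ω₀ − μ)² / (ω₀ ω₁). The sketch below is a generic NumPy implementation; the bin count and the choice of the first maximizing bin are assumptions, not details taken from the study.

```python
import numpy as np


def otsu_threshold(values, n_bins=256):
    """Otsu's method on a histogram of `values`; values below the threshold form the lower class."""
    values = np.asarray(values, dtype=float).ravel()
    hist, edges = np.histogram(values, bins=n_bins)
    centers = (edges[:-1] + edges[1:]) / 2.0
    p = hist / hist.sum()                     # normalized histogram
    w0 = np.cumsum(p)                         # probability of the class at or below bin i
    w1 = np.clip(1.0 - w0, 0.0, None)         # probability of the class above bin i
    mu = np.cumsum(p * centers)               # cumulative first moment
    mu_t = mu[-1]                             # overall mean
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b2 = (mu_t * w0 - mu) ** 2 / (w0 * w1)   # between-class variance
    sigma_b2 = np.nan_to_num(sigma_b2, nan=0.0, posinf=0.0, neginf=0.0)
    k = int(np.argmax(sigma_b2))              # first bin achieving the maximum
    return float(edges[k + 1])
```

For a grey-level crumb image, `image < otsu_threshold(image)` would give one of the two binary phases whose uniformity and busyness the study evaluates.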

288

Effective Top-Down Clustering Algorithms for Location Database Systems

… when there remain a few levels of the tree to be processed. We also propose a capacity constraint top-down … or even get down due to late updates of the location databases. In this paper, we propose a top-down

Lee, Kwang-Jo; Yang, Sung-Bong

289

Implementing a Systolic Algorithm for QR Factorization on Multicore Clusters with PaRSEC

… performance with state-of-the-art QR routines on a supercomputer called Kraken, which shows that high- … coming HPC systems. For instance, Blue Gene/L is a 3D torus of size 64 × 32 × 32 [1], Kraken, a Cray XT 5, is a 3… validate, and evaluate the algorithm on Kraken, within a few weeks of development. Although we use a high

Paris-Sud XI, Université de

290

Grape clusters and foliage detection algorithms for autonomous selective vineyard sprayer

While much of modern agriculture is based on mass mechanized production, advances in sensing and manipulation technologies may facilitate precision autonomous operations that could improve crop yield and quality while saving energy, reducing manpower, and being environmentally friendly. In this paper, we focus on autonomous spraying in vineyards and present four machine vision algorithms that facilitate selective spraying. In the

Ron Berenstein; Ohad Ben Shahar; Amir Shapiro; Yael Edan

2010-01-01

291

NASA Astrophysics Data System (ADS)

We report the theory and implementation of vibrational coupled cluster (VCC) damped response functions. From the imaginary part of the damped VCC response function, the absorption as a function of frequency can be obtained, formally requiring the solution of the now complex VCC response equations. The absorption spectrum can in this formulation be seen as a matrix function of the characteristic VCC Jacobian response matrix. The asymmetric matrix version of the Lanczos method is used to generate a tridiagonal representation of the VCC response Jacobian. Solving the complex response equations in the relevant Lanczos space provides a method for calculating the VCC damped response functions and thereby subsequently the absorption spectra. The convergence behaviour of the algorithm is discussed theoretically and tested for different levels of completeness of the VCC expansion. Comparison is made with results from the recently reported [P. Seidler, M. B. Hansen, W. Györffy, D. Toffoli, and O. Christiansen, J. Chem. Phys. 132, 164105 (2010)] vibrational configuration interaction damped response function calculated using a symmetric Lanczos algorithm. Calculations of IR spectra of oxazole, cyclopropene, and uracil illustrate the usefulness of the new VCC based method.

Thomsen, Bo; Hansen, Mikkel Bo; Seidler, Peter; Christiansen, Ove

2012-03-01

292

We report the theory and implementation of vibrational coupled cluster (VCC) damped response functions. From the imaginary part of the damped VCC response function, the absorption as a function of frequency can be obtained, formally requiring the solution of the now complex VCC response equations. The absorption spectrum can in this formulation be seen as a matrix function of the characteristic VCC Jacobian response matrix. The asymmetric matrix version of the Lanczos method is used to generate a tridiagonal representation of the VCC response Jacobian. Solving the complex response equations in the relevant Lanczos space provides a method for calculating the VCC damped response functions and thereby subsequently the absorption spectra. The convergence behaviour of the algorithm is discussed theoretically and tested for different levels of completeness of the VCC expansion. Comparison is made with results from the recently reported [P. Seidler, M. B. Hansen, W. Györffy, D. Toffoli, and O. Christiansen, J. Chem. Phys. 132, 164105 (2010)] vibrational configuration interaction damped response function calculated using a symmetric Lanczos algorithm. Calculations of IR spectra of oxazole, cyclopropene, and uracil illustrate the usefulness of the new VCC based method. PMID:22462829

Thomsen, Bo; Hansen, Mikkel Bo; Seidler, Peter; Christiansen, Ove

2012-03-28

293

NASA Astrophysics Data System (ADS)

A set of analytical and computational tools based on transition path theory (TPT) is proposed to analyze flows in complex networks. Specifically, TPT is used to study the statistical properties of the reactive trajectories by which transitions occur between specific groups of nodes on the network. Sampling tools built upon the outputs of TPT make it possible to generate these reactive trajectories directly, or even transition paths that travel from one group of nodes to the other without making any detour and carry the same probability current as the reactive trajectories. These objects make it possible to characterize the mechanism of the transitions, for example by quantifying the width of the tubes through which these transitions occur and the location and distribution of their dynamical bottlenecks. These tools are applied to a network modeling the dynamics of the Lennard-Jones cluster with 38 atoms (LJ38) and used to understand the mechanism by which this cluster rearranges itself between its two most likely states at various temperatures.

Cameron, Maria; Vanden-Eijnden, Eric

2014-08-01
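A central quantity in transition path theory is the forward committor q_i, the probability that a walk started at node i reaches group B before group A; the reactive currents described above are built from it. The sketch below solves for q on a discrete-time Markov chain with a row-stochastic matrix `P`. This discrete-time setting is an assumption and may differ from the network model used in the paper, and the function name is hypothetical.

```python
import numpy as np


def committor(P, A, B):
    """Forward committor of a discrete-time Markov chain.

    P: (n, n) row-stochastic transition matrix; A, B: disjoint lists of node indices.
    Solves q = P q on the intermediate nodes with q = 0 on A and q = 1 on B."""
    P = np.asarray(P, dtype=float)
    n = P.shape[0]
    A, B = list(A), list(B)
    boundary = set(A) | set(B)
    inter = [i for i in range(n) if i not in boundary]
    q = np.zeros(n)
    q[B] = 1.0
    if inter:
        P_II = P[np.ix_(inter, inter)]             # moves among intermediate nodes
        rhs = P[np.ix_(inter, B)].sum(axis=1)      # one-step probability of entering B
        q[inter] = np.linalg.solve(np.eye(len(inter)) - P_II, rhs)
    return q
```

For a symmetric random walk on five nodes in a line with A = {0} and B = {4}, the committor is linear: 0, 0.25, 0.5, 0.75, 1.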

294

Purpose: Breast magnetic resonance imaging (MRI) plays an important role in the clinical management of breast cancer. Studies suggest that the relative amount of fibroglandular (i.e., dense) tissue in the breast as quantified in MR images can be predictive of the risk for developing breast cancer, especially for high-risk women. Automated segmentation of the fibroglandular tissue and volumetric density estimation in breast MRI could therefore be useful for breast cancer risk assessment.

Methods: In this work the authors develop and validate a fully automated segmentation algorithm, namely, an atlas-aided fuzzy C-means (FCM-Atlas) method, to estimate the volumetric amount of fibroglandular tissue in breast MRI. The FCM-Atlas is a 2D segmentation method working on a slice-by-slice basis. FCM clustering is first applied to the intensity space of each 2D MR slice to produce an initial voxelwise likelihood map of fibroglandular tissue. Then a previously learned fibroglandular tissue likelihood atlas is incorporated to refine the initial FCM likelihood map to achieve enhanced segmentation, from which the absolute volume of the fibroglandular tissue (|FGT|) and the relative amount (i.e., percentage) of the |FGT| relative to the whole breast volume (FGT%) are computed. The authors' method is evaluated on a representative dataset of 60 3D bilateral breast MRI scans (120 breasts) that span the full breast density range of the American College of Radiology Breast Imaging Reporting and Data System. The automated segmentation is compared to manual segmentation obtained by two experienced breast imaging radiologists. Segmentation performance is assessed by linear regression, Pearson's correlation coefficients, Student's paired t-test, and Dice's similarity coefficients (DSC).

Results: The inter-reader correlation is 0.97 for FGT% and 0.95 for |FGT|. When compared to the average of the two readers' manual segmentation, the proposed FCM-Atlas method achieves a correlation of r = 0.92 for FGT% and r = 0.93 for |FGT|, and the automated segmentation is not statistically significantly different (p = 0.46 for FGT% and p = 0.55 for |FGT|). The bilateral correlation between left breasts and right breasts for the FGT% is 0.94, 0.92, and 0.95 for reader 1, reader 2, and the FCM-Atlas, respectively; likewise, for the |FGT|, it is 0.92, 0.92, and 0.93, respectively. For the spatial segmentation agreement, the automated algorithm achieves a DSC of 0.69 ± 0.1 when compared to reader 1 and 0.61 ± 0.1 for reader 2, while the DSC between the two readers' manual segmentations is 0.67 ± 0.15. Additional robustness analysis shows that the segmentation performance of the authors' method is stable both with respect to selecting different cases and to varying the number of cases needed to construct the prior probability atlas. The authors' results also show that the proposed FCM-Atlas method outperforms the commonly used two-cluster FCM-alone method. The authors' method runs in ~5 min for each 3D bilateral MR scan (56 slices) for computing the FGT% and |FGT|, compared to ~55 min needed for manual segmentation for the same purpose.

Conclusions: The authors' method achieves robust segmentation and can serve as an efficient tool for processing large clinical datasets for quantifying the fibroglandular tissue content in breast MRI. It holds great potential to support clinical applications in the future, including breast cancer risk assessment.

Wu, Shandong; Weinstein, Susan P.; Conant, Emily F.; Kontos, Despina, E-mail: despina.kontos@uphs.upenn.edu [Department of Radiology, University of Pennsylvania, Philadelphia, Pennsylvania 19104 (United States)]

2013-12-15
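The two headline quantities in this abstract, the Dice similarity coefficient between two segmentations and the fibroglandular percentage FGT%, can be computed from binary masks as sketched below. The Dice formula 2|A ∩ B| / (|A| + |B|) is standard. Restricting the FGT mask to the breast mask and measuring volumes in voxels are assumptions (|FGT| in physical units would additionally multiply by the voxel volume), and the function names are hypothetical.

```python
import numpy as np


def dice_coefficient(seg_a, seg_b):
    """Dice similarity coefficient 2|A ∩ B| / (|A| + |B|) of two binary masks."""
    a = np.asarray(seg_a, dtype=bool)
    b = np.asarray(seg_b, dtype=bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0          # convention: two empty masks agree perfectly
    return 2.0 * np.logical_and(a, b).sum() / denom


def fgt_percent(fgt_mask, breast_mask):
    """Fibroglandular tissue as a percentage of breast volume (voxel counts)."""
    fgt = np.asarray(fgt_mask, dtype=bool)
    breast = np.asarray(breast_mask, dtype=bool)
    return 100.0 * np.logical_and(fgt, breast).sum() / breast.sum()
```

Both functions accept masks of any shape, so they apply equally to a single 2D slice or to a stacked 3D volume.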

295

A Neural-Network Clustering-Based Algorithm for Privacy Preserving Data Mining

NASA Astrophysics Data System (ADS)

The increasing use of fast and efficient data mining algorithms on huge collections of personal data, facilitated by the exponential growth of technology, in particular in electronic data storage media and processing power, has raised serious ethical, philosophical, and legal issues related to privacy protection. To cope with these concerns, several privacy-preserving methodologies have been proposed. They fall into two categories: methodologies that aim at protecting the sensitive data and those that aim at protecting the mining results. In our work, we focus on sensitive data protection and compare existing techniques according to the anonymity degree they achieve, the information loss they incur, and their performance characteristics. The ℓ-diversity principle is combined with k-anonymity concepts, so that background information cannot be exploited to attack the privacy of the data subjects to whom the data refer. Based on Kohonen Self-Organizing Feature Maps (SOMs), we first organize data sets in subspaces according to their information-theoretical distance to each other, then create the most relevant classes, paying special attention to rare sensitive attribute values, and finally generalize attribute values to the minimum extent required, so that both the data disclosure probability and the information loss are kept as close to negligible as possible. Furthermore, we propose information-theoretical measures for assessing the anonymity degree achieved and empirical tests to demonstrate it.

Tsiafoulis, S.; Zorkadis, V. C.; Karras, D. A.
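The two privacy properties combined above can be checked directly on a released table. In the sketch below, records are dictionaries. k-anonymity requires every group of records sharing the same quasi-identifier values to contain at least k records. For ℓ-diversity, the simplest "distinct" variant is assumed: each such group must contain at least ℓ different sensitive values. The SOM-based generalization procedure itself is not reproduced, and the function names are hypothetical.

```python
from collections import defaultdict


def equivalence_classes(records, quasi_identifiers):
    """Group records (dicts) by their values on the quasi-identifier attributes."""
    groups = defaultdict(list)
    for r in records:
        groups[tuple(r[q] for q in quasi_identifiers)].append(r)
    return groups


def is_k_anonymous(records, quasi_identifiers, k):
    """Every equivalence class contains at least k records."""
    return all(len(g) >= k for g in equivalence_classes(records, quasi_identifiers).values())


def is_l_diverse(records, quasi_identifiers, sensitive, ell):
    """Distinct l-diversity: every class has at least `ell` distinct sensitive values."""
    return all(len({r[sensitive] for r in g}) >= ell
               for g in equivalence_classes(records, quasi_identifiers).values())


records = [
    {"zip": "130**", "age": "<30", "disease": "flu"},
    {"zip": "130**", "age": "<30", "disease": "flu"},
    {"zip": "148**", "age": ">=40", "disease": "cancer"},
    {"zip": "148**", "age": ">=40", "disease": "flu"},
]
```

This toy table is 2-anonymous but not 2-diverse: everyone in the first class has "flu", so an attacker who knows a person's ZIP prefix and age band learns their diagnosis. That homogeneity attack is exactly what adding ℓ-diversity is meant to prevent.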

296

With progress toward inexpensive, large-scale DNA assembly, the demand for simulation tools that allow the rapid construction of synthetic biological devices with predictable behaviors continues to increase. By combining engineered transcript components, such as ribosome binding sites, transcriptional terminators, ligand-binding aptamers, catalytic ribozymes, and aptamer-controlled ribozymes (aptazymes), gene expression in bacteria can be fine-tuned, with many corollaries and applications in yeast and mammalian cells. The successful design of genetic constructs that implement these kinds of RNA-based control mechanisms requires modeling and analyzing kinetically determined co-transcriptional folding pathways. Transcript design methods using stochastic kinetic folding simulations to search spacer sequence libraries for motifs enabling the assembly of RNA component parts into static ribozyme- and dynamic aptazyme-regulated expression devices with quantitatively predictable functions (rREDs and aREDs, respectively) have been described (Carothers et al., Science 334:1716-1719, 2011). Here, we provide a detailed practical procedure for computational transcript design by illustrating a high throughput, multiprocessor approach for evaluating spacer sequences and generating functional rREDs. This chapter is written as a tutorial, complete with pseudo-code and step-by-step instructions for setting up a computational cluster with an Amazon, Inc. web server and performing the large numbers of kinefold-based stochastic kinetic co-transcriptional folding simulations needed to design functional rREDs and aREDs. The method described here should be broadly applicable for designing and analyzing a variety of synthetic RNA parts, devices and transcripts. PMID:25487092

Thimmaiah, Tim; Voje, William E; Carothers, James M

2015-01-01

297

Adaptive fuzzy leader clustering of complex data sets in pattern recognition

NASA Technical Reports Server (NTRS)

A modular, unsupervised neural network architecture for clustering and classification of complex data sets is presented. The adaptive fuzzy leader clustering (AFLC) architecture is a hybrid neural-fuzzy system that learns on-line in a stable and efficient manner. The initial classification is performed in two stages: a simple competitive stage and a distance metric comparison stage. The cluster prototypes are then incrementally updated by relocating the centroid positions from fuzzy C-means system equations for the centroids and the membership values. The AFLC algorithm is applied to the Anderson Iris data and laser-luminescent fingerprint image data. It is concluded that the AFLC algorithm successfully classifies features extracted from real data, discrete or continuous.

Newton, Scott C.; Pemmaraju, Surya; Mitra, Sunanda

1992-01-01

298

A tutorial on spectral clustering

In recent years, spectral clustering has become one of the most popular modern clustering algorithms. It is simple to implement, can be solved efficiently by standard linear algebra software, and very often outperforms traditional clustering algorithms such as the k-means algorithm. At first glance spectral clustering appears slightly mysterious, and it is not obvious why it works

Ulrike Von Luxburg

2007-01-01
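A minimal sketch of unnormalized spectral clustering, in the form usually presented in tutorials like the one above: build a similarity graph, form the Laplacian L = D − W, embed each point using the k eigenvectors with the smallest eigenvalues, and run k-means on the embedded rows. The fully connected Gaussian similarity graph, the width `sigma`, and the bare-bones Lloyd's k-means are assumptions made to keep the example self-contained.

```python
import numpy as np


def spectral_clustering(X, k, sigma=1.0, n_iter=100, seed=0):
    """Unnormalized spectral clustering with a fully connected Gaussian similarity graph."""
    X = np.asarray(X, dtype=float)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-d2 / (2.0 * sigma ** 2))     # Gaussian similarities
    np.fill_diagonal(W, 0.0)                 # no self-edges
    L = np.diag(W.sum(axis=1)) - W           # graph Laplacian L = D - W
    _, vecs = np.linalg.eigh(L)              # eigenvalues in ascending order
    U = vecs[:, :k]                          # row i is the embedding of point i
    # Lloyd's k-means on the embedded rows
    rng = np.random.default_rng(seed)
    centres = U[rng.choice(len(U), size=k, replace=False)]
    for _ in range(n_iter):
        dist = ((U[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(dist, axis=1)
        new = np.array([U[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
                        for j in range(k)])
        if np.allclose(new, centres):
            break
        centres = new
    return labels
```

For well-separated groups the Laplacian is nearly block-diagonal, so points in the same group get almost identical embedding rows. This is why the final k-means step becomes easy even when the groups are not convex in the original space.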

299

NASA Astrophysics Data System (ADS)

Terrestrial laser scanning is becoming a common surveying technique for quickly and accurately measuring dense 3-D point clouds. It simplifies measurement tasks on site. However, the massive volume of 3-D point measurements presents a challenge, not only because of acquisition time and the management of huge volumes of data, but also because of processing limitations on PCs. Raw laser scanner point clouds require a great deal of processing before final products can be derived. Segmentation thus becomes an essential step whenever points with common attributes need to be grouped, and it is necessary for applications requiring the labelling of point clouds, surface extraction, and classification into homogeneous areas. Segmentation algorithms can be classified as surface growing algorithms or clustering algorithms. This paper presents an unsupervised robust clustering approach based on fuzzy methods. Fuzzy parameters are analysed to adapt the unsupervised clustering methods to the segmentation of laser scanner data. Both the Fuzzy C-Means (FCM) algorithm and the Possibilistic C-Means (PCM) mode-seeking algorithm are reviewed and used in combination with a similarity-driven cluster merging method. They constitute the kernel of the unsupervised fuzzy clustering method presented herein. It is applied to three point clouds acquired with different terrestrial laser scanners and scenarios: the first is an artificial (synthetic) data set that simulates a structure with different planar blocks; the second is a composition of three metric ceramic gauge blocks (Grade 0, flatness tolerance ± 0.1 µm) recorded with a Konica Minolta Vivid 9i optical triangulation digitizer; the last is an outdoor data set of a modern architectural building, collected from the centre of an open square. The amplitude-modulated continuous-wave (AMCW) terrestrial laser scanner system, the Faro 880, was used to acquire the latter data set. Experimental analyses show that the results of the proposed unsupervised planar segmentation process are promising.

Biosca, Josep Miquel; Lerma, José Luis

300

Use of solvent-mapping, based on multiple-copy minimization (MCM) techniques, is common in structure-based drug discovery. The minima of small-molecule probes define locations for complementary interactions within a binding pocket. Here, we present improved methods for MCM. In particular, a Jarvis-Patrick method is outlined for grouping the final locations of minimized probes into physical clusters. This algorithm has been tested through a study of protein-protein interfaces, showing the process to be robust, deterministic, and fast in the mapping of protein “hot spots”. Improvements in the initial placement of probe molecules are also described. A final application to HIV-1 protease shows how our automated technique can be used to partition data too complicated to analyze by hand. These new automated methods may be easily and quickly extended to other protein systems, and our clustering methodology may be readily incorporated into other clustering packages. PMID:18679808

Lerner, Michael G.; Meagher, Kristin L.; Carlson, Heather A.

2010-01-01
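The Jarvis-Patrick grouping mentioned above links two points when each is among the other's nearest neighbours and their neighbour lists overlap enough; clusters are the connected groups of linked points. The sketch below applies this to generic coordinates (for example, final probe positions). The neighbour count, the shared-neighbour threshold, and the convention that a point is not its own neighbour are assumptions, since variants of the rule differ on these details.

```python
import numpy as np


def jarvis_patrick(X, n_neighbors, min_shared):
    """Jarvis-Patrick clustering: i and j are linked if each is among the other's
    `n_neighbors` nearest neighbours and their neighbour lists share at least
    `min_shared` points. Returns a label (0, 1, ...) per point."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)                       # a point is not its own neighbour
    nn = [set(np.argsort(row, kind="stable")[:n_neighbors].tolist()) for row in d2]

    parent = list(range(n))                            # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]              # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in nn[i]:
            if i in nn[j] and len(nn[i] & nn[j]) >= min_shared:
                parent[find(i)] = find(j)              # merge the two groups

    remap = {}                                         # roots -> 0, 1, 2, ... in order seen
    return [remap.setdefault(find(i), len(remap)) for i in range(n)]
```

Because the rule depends only on neighbour ranks rather than a global distance cutoff, the result is deterministic for fixed inputs and adapts to regions of different density.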

301

NASA Technical Reports Server (NTRS)

Both the iterative self-organizing clustering system (ISOCLS) and the CLASSY algorithms were applied to forest and nonforest classes for one 1:24,000 quadrangle map of northern Idaho and the classification and mapping accuracies were evaluated with 1:30,000 color infrared aerial photography. Confusion matrices for the two clustering algorithms were generated and studied to determine which is most applicable to forest and rangeland inventories in future projects. In an unsupervised mode, ISOCLS requires many trial-and-error runs to find the proper parameters to separate desired information classes. CLASSY tells more in a single run concerning the classes that can be separated, shows more promise for forest stratification than ISOCLS, and shows more promise for consistency. One major drawback to CLASSY is that important forest and range classes that are smaller than a minimum cluster size will be combined with other classes. The algorithm requires so much computer storage that only data sets as small as a quadrangle can be used at one time.

Werth, L. F. (principal investigator)

1981-01-01
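
The accuracy comparison in the entry above is based on confusion matrices. As a hedged sketch, the usual summary statistics can be computed as shown below. The convention used here, with rows as reference (photo-interpreted) classes and columns as mapped classes, is an assumption, since the report's exact layout is not given. The example counts are hypothetical.

```python
from typing import List, Sequence, Tuple

def accuracy_report(cm: Sequence[Sequence[int]]) -> Tuple[float, List[float], List[float]]:
    """cm[i][j] = number of reference samples of class i assigned to map class j.
    Returns (overall accuracy, producer's accuracies, user's accuracies)."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    overall = sum(cm[i][i] for i in range(n)) / total
    # Producer's accuracy: correct / reference total (row sums).
    producers = [cm[i][i] / sum(cm[i]) for i in range(n)]
    # User's accuracy: correct / mapped total (column sums).
    users = [cm[j][j] / sum(cm[i][j] for i in range(n)) for j in range(n)]
    return overall, producers, users

# Hypothetical forest / nonforest counts.
overall, producers, users = accuracy_report([[40, 10], [5, 45]])
```

With these counts, the overall accuracy is 0.85, and the producer's accuracies are 0.8 and 0.9.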

302

NASA Astrophysics Data System (ADS)

In 2D non-contact body measurement, a transform model that converts 2D human-body girth data into 3D girth data is required. However, a single integrated model is hard to obtain, because different body-type categories require different model parameters. Accurate classification of human body types based on the measurement data is therefore very important. A canonical transformation is used to strengthen the similarity of data features within the same type and to broaden the diversity of data features between different types. The "accumulating dead bodies" ant colony algorithm is improved in this paper by using path-information densities to help each ant select a probable path leading to a site where dead bodies accumulate while it carries data. In this way, the randomness and blindness of the ants' walking are eliminated, and the convergence speed of the algorithm is improved. To avoid uneven visiting frequencies among the data units, an access mechanism for the union data is employed, which keeps the algorithm from falling into a local optimum. A clustering validity function is selected to verify the clustering result of the human measurement data. The experimental results indicate the effectiveness and efficiency of human body clustering based on the improved ant colony algorithm. Based on the classification result, an accurate 3D body-data transform model can be established, which should improve the accuracy of non-contact body measurement.

Zhan, Qun; Zhao, Nanxiang

2011-08-01
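
The paper's own improvements (path guidance and the union-data access mechanism) are not specified in enough detail to reproduce. The sketch below shows only the standard pick-up and drop rules that "accumulating dead bodies" ant clustering builds on: Deneubourg-style probabilities driven by a Lumer-Faieta-style local similarity density. The constants k1 = 0.1 and k2 = 0.15, the dissimilarity scale alpha, and the neighbourhood size are assumed values.

```python
import math
from typing import Sequence, Tuple

Item = Tuple[float, ...]

def local_density(item: Item, neighbours: Sequence[Item], alpha: float, s2: int) -> float:
    """Similarity density f(i) of an item with respect to the items in its
    s x s grid neighbourhood (s2 = s * s cells); alpha scales dissimilarity."""
    total = sum(1.0 - math.dist(item, other) / alpha for other in neighbours)
    return max(0.0, total / s2)

def pick_probability(f: float, k1: float = 0.1) -> float:
    """An unladen ant is likely to pick up an item lying among dissimilar items (low f)."""
    return (k1 / (k1 + f)) ** 2

def drop_probability(f: float, k2: float = 0.15) -> float:
    """A laden ant is likely to drop its item where similar items are dense (high f)."""
    return (f / (k2 + f)) ** 2
```

An isolated item (f = 0) is always picked up and never dropped, which is what gradually moves items into piles of similar data.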

303

NASA Astrophysics Data System (ADS)

The novel surface mode of the Birmingham Cluster Genetic Algorithm (S-BCGA) is employed for the global optimisation of noble metal tetramers on an MgO(100) substrate at the GGA-DFT level of theory. The effect of element identity and alloying in surface-bound neutral subnanometre clusters is determined by energetic comparison between all compositions of $\mathrm{Pd}_n\mathrm{Ag}_{4-n}$ and $\mathrm{Pd}_n\mathrm{Pt}_{4-n}$. While the binding strengths to the surface increase in the order Pt > Pd > Ag, the excess energy profiles suggest a preference for mixed clusters in both cases. The binding of CO is also modelled, showing that the adsorption site can be predicted solely by electrophilicity. Comparison to CO binding on a single metal atom shows a reversal of the 5σ-d activation process for clusters, weakening the cluster-surface interaction on CO adsorption. Charge localisation determines homotop, CO binding and surface site preferences. The electronic behaviour, which is intermediate between that of molecular and metallic particles, allows for tunable features in the subnanometre size range.

Heard, Christopher J.; Heiles, Sven; Vajda, Stefan; Johnston, Roy L.

2014-09-01
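
The "excess energy profiles" in the entry above compare each bimetallic cluster with a composition-weighted average of the two pure clusters of the same size. The sketch below uses the common definition $\Delta = E(A_nB_{N-n}) - \tfrac{n}{N}E(A_N) - \tfrac{N-n}{N}E(B_N)$. It assumes this is the form the authors used, which the abstract does not state. A negative value indicates that mixing is favoured. The energies in the example are hypothetical.

```python
def excess_energy(e_mixed: float, n_a: int, n_total: int,
                  e_pure_a: float, e_pure_b: float) -> float:
    """Excess (mixing) energy of an A_n B_(N-n) cluster relative to the pure
    N-atom clusters A_N and B_N; negative values favour mixing."""
    n_b = n_total - n_a
    return e_mixed - n_a * e_pure_a / n_total - n_b * e_pure_b / n_total

# Hypothetical tetramer energies (eV): pure A4 = -10.0, pure B4 = -6.0, A2B2 = -8.5.
delta = excess_energy(-8.5, n_a=2, n_total=4, e_pure_a=-10.0, e_pure_b=-6.0)
```

Here `delta` is -0.5 eV. Scanning n from 0 to 4 gives the excess energy profile, which is zero at both pure endpoints by construction.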

304

NASA Astrophysics Data System (ADS)

We present a genetic-algorithm-based investigation of structural fragmentation in dicationic noble gas clusters, $\mathrm{Ar}_n^{2+}$, $\mathrm{Kr}_n^{2+}$, and $\mathrm{Xe}_n^{2+}$, where $n$ denotes the size of the cluster. Dications are predicted to be stable above a threshold cluster size when the positive charges are assumed to remain localized on two noble gas atoms and the Lennard-Jones potential, along with bare Coulomb and ion-induced dipole interactions, is taken into account to describe the potential energy surface. Our cutoff values are close to those obtained experimentally [P. Scheier and T. D. Mark, J. Chem. Phys. 11, 3056 (1987)] and theoretically [J. G. Gay and B. J. Berne, Phys. Rev. Lett. 49, 194 (1982)]. When the charges are allowed to be equally distributed over four noble gas atoms in the cluster and the nonpolarization interaction terms are kept unchanged, our method successfully identifies the size threshold for stability as well as the nature of the dissociation channels as a function of cluster size. In $\mathrm{Ar}_n^{2+}$, for example, fissionlike fragmentation is predicted for $n = 55$, while for $n = 43$ the predicted outcome is nonfission fragmentation, in complete agreement with earlier work [Golberg et al., J. Chem. Phys. 100, 8277 (1994)].

Nandy, Subhajit; Chaudhury, Pinaki; Bhattacharyya, S. P.

2010-06-01
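
The potential energy surface described above combines three pairwise terms. As a hedged sketch, the energy a genetic algorithm would minimise for a given geometry can be evaluated as follows. This assumes atomic units, a purely pairwise-additive ion-induced dipole term $-\alpha q^2 / (2 r^4)$ between each charged and each neutral atom, and hypothetical values for eps, sigma and alpha, not the authors' parameters.

```python
import itertools
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]

def cluster_energy(coords: Sequence[Vec], charges: Sequence[float],
                   eps: float, sigma: float, alpha: float) -> float:
    """Pairwise energy: Lennard-Jones for every pair, bare Coulomb between charged
    atoms, and ion-induced dipole attraction between a charged and a neutral atom."""
    energy = 0.0
    for i, j in itertools.combinations(range(len(coords)), 2):
        r = math.dist(coords[i], coords[j])
        sr6 = (sigma / r) ** 6
        energy += 4.0 * eps * (sr6 * sr6 - sr6)          # Lennard-Jones
        qi, qj = charges[i], charges[j]
        if qi and qj:
            energy += qi * qj / r                          # Coulomb repulsion
        elif qi or qj:
            q = qi or qj
            energy -= alpha * q * q / (2.0 * r ** 4)       # ion-induced dipole
    return energy
```

Two neutral atoms at the Lennard-Jones minimum, $r = 2^{1/6}\sigma$, give an energy of exactly $-\epsilon$. This is a quick sanity check before adding charges.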

305

... the mesh of a 2D surface of an airfoil or a 3D engine cylinder. Partitioning such a mesh into subdomains ... A Min-max Cut Algorithm for Graph Partitioning and Data Clustering. Chris H. Q. Ding, Xiaofeng He ... function that follows the min-max clustering principle. The relaxed version of the optimization ...

Ding, Chris
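
The min-max clustering principle named in the snippet above minimises the similarity between clusters while maximising the similarity within them. As a sketch, assuming the usual two-way form $\mathrm{Mcut}(A,B) = \frac{\mathrm{cut}(A,B)}{W(A)} + \frac{\mathrm{cut}(A,B)}{W(B)}$, with $W(A) = \sum_{i,j \in A} w_{ij}$, the objective can be evaluated for a candidate split of a symmetric weight matrix. This computes only the objective, not the relaxed (spectral) optimisation the snippet mentions.

```python
from typing import Sequence, Set

def mcut(W: Sequence[Sequence[float]], A: Set[int]) -> float:
    """Two-way min-max cut objective for splitting nodes into A and its complement B."""
    B = set(range(len(W))) - A
    cut = sum(W[i][j] for i in A for j in B)        # similarity between clusters
    w_a = sum(W[i][j] for i in A for j in A)        # similarity within A
    w_b = sum(W[i][j] for i in B for j in B)        # similarity within B
    return cut / w_a + cut / w_b

# Hypothetical graph: two strongly linked pairs with weak cross links.
W = [[0, 1, 0.1, 0.1],
     [1, 0, 0.1, 0.1],
     [0.1, 0.1, 0, 1],
     [0.1, 0.1, 1, 0]]
```

Here `mcut(W, {0, 1})` is 0.4, whereas the unnatural split `{0, 2}` scores 22.0, so the natural pairing has the lower objective.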

306

In spite of the increasing interest in clustering research over the last decades, a unified clustering theory that is independent of a particular algorithm, the underlying data structure, or even the objective function has not been formulated so far. In the paper at hand, we take the first steps towards a theoretical foundation of clustering by proposing a new

Gerasimos S. Antzoulatos; Michael N. Vrahatis

307

In this paper, a combined approach for enhancement and segmentation of mammograms is proposed. In the preprocessing stage, a contrast limited adaptive histogram equalization (CLAHE) method is applied to obtain mammograms with better contrast. After this, the proposed combined methods are applied. In the first step of the proposed approach, a two-dimensional (2D) discrete wavelet transform (DWT) is applied to all the input images. In the second step, a proposed nonlinear complex diffusion based unsharp masking and crispening method is applied to the approximation coefficients of the wavelet-transformed images to further highlight abnormalities such as microcalcifications and tumours, and to reduce false positives (FPs). Third, a modified fuzzy c-means (FCM) segmentation method is applied to the output of the second step. In the modified FCM method, mutual information is proposed as a similarity measure in place of the conventional Euclidean-distance-based dissimilarity measure. Finally, the inverse 2D-DWT is applied. The efficacy of the proposed unsharp masking and crispening method for image enhancement is evaluated in terms of signal-to-noise ratio (SNR), and that of the proposed segmentation method is evaluated in terms of the Rand index (RI), global consistency error (GCE), and variation of information (VoI). The performance of the proposed segmentation approach is compared with other commonly used segmentation approaches such as Otsu's thresholding, texture-based segmentation, k-means, and FCM clustering, as well as thresholding. From the obtained results, it is observed that the proposed segmentation approach performs better and takes less processing time than the standard FCM and the other segmentation methods under consideration. PMID:25190996

Srivastava, Subodh; Sharma, Neeraj; Singh, S. K.; Srivastava, R.

2014-01-01
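
The RI score used above to evaluate segmentations is conventionally the Rand index. It is the fraction of pixel pairs on which two labelings agree, either both in the same segment or both in different segments. The minimal sketch below assumes flat label sequences. It is quadratic in the number of pixels, so it is meant for illustration rather than full-size mammograms.

```python
from itertools import combinations
from typing import Hashable, Sequence

def rand_index(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Fraction of item pairs on which two labelings agree (same/same or different/different)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must have equal length")
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum((labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j]) for i, j in pairs)
    return agree / len(pairs)
```

The index ignores label names. For example, `[0, 0, 1, 1]` and `[1, 1, 0, 0]` score 1.0, while `[0, 0, 1, 1]` and `[0, 1, 0, 1]` agree on only 2 of 6 pairs.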

308

In an IaaS (infrastructure as a service) cloud environment, users are provisioned with virtual machines (VMs). To allocate resources to users dynamically and effectively, accurate prediction of resource demands is essential. For this purpose, this paper proposes a self-adaptive prediction method using an ensemble model and a subtractive-fuzzy clustering based fuzzy neural network (ESFCFNN). We analyze the characteristics of user preferences and demands. Then the architecture of the prediction model is constructed. We adopt several base predictors to compose the ensemble model. Then the structure and learning algorithm of the fuzzy neural network are studied. To obtain the number of fuzzy rules and the initial values of the premise and consequent parameters, this paper proposes combining fuzzy c-means with the subtractive clustering algorithm, that is, subtractive-fuzzy clustering. Finally, we adopt different criteria to evaluate the proposed method. The experimental results show that the method is accurate and effective in predicting resource demands. PMID:25691896

Chen, Zhijia; Zhu, Yuanchang; Di, Yanqiang; Feng, Shaochong

2015-01-01
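
In the entry above, subtractive clustering supplies the number of fuzzy rules and the initial centres that fuzzy c-means then refines. The following is a simplified sketch of Chiu-style subtractive clustering. It assumes data scaled to [0, 1], a neighbourhood radius ra = 0.5, a squash factor of 1.5, and a single stopping ratio in place of Chiu's accept/reject thresholds. All of these values are assumptions, not the paper's settings.

```python
import numpy as np

def subtractive_clustering(X: np.ndarray, ra: float = 0.5, squash: float = 1.5,
                           stop_ratio: float = 0.15) -> np.ndarray:
    """Simplified subtractive clustering on X (n x d). Returns the selected centres,
    which can seed fuzzy c-means or define the number of fuzzy rules."""
    alpha = 4.0 / ra ** 2
    beta = 4.0 / (squash * ra) ** 2
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)   # pairwise squared distances
    potential = np.exp(-alpha * sq).sum(axis=1)                # density-like potential
    first = potential.max()
    centres = []
    while True:
        k = int(np.argmax(potential))
        if potential[k] < stop_ratio * first:
            break
        centres.append(X[k])
        # Points near the new centre lose most of their potential, so the next
        # centre is chosen from a different dense region.
        potential = potential - potential[k] * np.exp(-beta * sq[k])
    return np.array(centres)
```

On two tight, well-separated blobs, the loop stops after selecting one centre per blob. The length of the returned array then gives the number of rules.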

310

... SCENARIOS OF A NUCLEAR POWER PLANT DIGITAL INSTRUMENTATION AND CONTROL SYSTEM. Francesco Di Maio ... of accident scenarios generated in a dynamic safety and reliability analysis of a Nuclear Power Plant (NPP ... nuclear power plants. Author manuscript (hal-00609634, version 1, 27 Jul 2012), published in "IEEE Transactions ...

Paris-Sud XI, Université de

311

Distributed Clustering for Ad Hoc Networks

A Distributed Clustering Algorithm (DCA) and a Distributed Mobility-Adaptive Clustering (DMAC) algorithm are presented that partition the nodes of a fully mobile network (ad hoc network) into clusters, thus giving the network a hierarchical organization.

Stefano Basagni

1999-01-01
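
DCA elects clusterheads by node weight. As a hedged illustration of the resulting organization, the sketch below is a hypothetical centralized simulation of that weight-based rule, not the distributed message-passing protocol itself. It assumes a static topology, symmetric neighbour sets, and distinct weights. Nodes are visited from heaviest to lightest. Each node joins its heaviest neighbouring clusterhead if one exists and otherwise becomes a clusterhead.

```python
from typing import Dict, Set

def weight_based_clustering(neighbours: Dict[int, Set[int]],
                            weight: Dict[int, float]) -> Dict[int, int]:
    """Map each node to its clusterhead (a clusterhead maps to itself)."""
    head_of: Dict[int, int] = {}
    for v in sorted(neighbours, key=lambda u: weight[u], reverse=True):
        # Only heavier neighbours have been decided so far.
        heads = [u for u in neighbours[v] if head_of.get(u) == u]
        head_of[v] = max(heads, key=lambda u: weight[u]) if heads else v
    return head_of
```

On a path 1-2-3-4 with decreasing weights, nodes 1 and 3 become clusterheads, and nodes 2 and 4 join them. This illustrates the property that no two clusterheads are neighbours.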

312

INVESTIGATING DISTANCE METRICS IN SEMI-SUPERVISED FUZZY C-MEANS FOR BREAST CANCER CLASSIFICATION

their effect on classification results. Mahalanobis, Euclidean and kernel-based distance metrics were used with those of Soria et al. The superiority of Euclidean distance to Mahalanobis distance is unexpected as it can only generate spherical clusters while Mahalanobis distance can generate hyperellipsoidal ones

Aickelin, Uwe
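
The spherical-versus-hyperellipsoidal contrast in the snippet above follows from the two distance definitions. The sketch below uses a hypothetical diagonal covariance. Two points at the same Euclidean distance from a centre can be at different Mahalanobis distances, because the covariance stretches the level sets into ellipses.

```python
import numpy as np

def euclidean(x: np.ndarray, centre: np.ndarray) -> float:
    return float(np.linalg.norm(x - centre))

def mahalanobis(x: np.ndarray, centre: np.ndarray, cov: np.ndarray) -> float:
    """sqrt((x - c)^T cov^{-1} (x - c)), using a linear solve instead of an explicit inverse."""
    diff = x - centre
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))

cov = np.diag([4.0, 1.0])        # hypothetical: cluster spread along the first axis
centre = np.zeros(2)
```

Both `(2, 0)` and `(0, 2)` are at Euclidean distance 2. Their Mahalanobis distances are 1 and 2, respectively.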

313

A Simple Approach to Global Classification of Tree Plant Functional Types Using Cluster Analysis

NASA Astrophysics Data System (ADS)

Classification of vegetation species diversity into a restricted number of Plant Functional Types (PFTs) simplifies modelling of large-scale (global) vegetation dynamics. One purpose of such classification is to quantify relationships between a PFT's geographic distribution and environmental factors to identify structural and functional PFTs at global scale. Here, we aimed at a simple integrated algorithm to relate climate space to global distributions of tree PFTs. Multivariate clustering was used to analyze the statistical homogeneity of the climate space where individual tree PFTs exist. In this way, primary tree PFTs identified from the satellite-based GLC2000 classification were separated into tropical, temperate and boreal PFTs for use in the Canadian Terrestrial Ecosystem Model (CTEM). Global datasets of monthly minimum temperature, growing degree days, an index of climatic moisture and estimated PFT cover fractions were used as variables in the cluster analysis. The sequence of statistical analysis procedures included a comparison of the K-means and Fuzzy C-means clustering algorithms, followed by cluster validation and merging. Overall, K-means produced clusters with narrower dispersions about their centers. The optimal number of clusters for K-means was determined from the Cubic Clustering Criterion, Pseudo F and R², combined with examination of geographical distributions. Where the optimal number of clusters exceeded the number of desired PFTs, clusters were merged following a decision rule based on overlapping temperature ranges. The statistical results for individual PFT clusters were found to be consistent with those from other global scale PFT classifications. As an improvement in the quantification of the climatic limitations on PFT distributions, the results also demonstrated overlap between PFT cluster boundaries that reflected vegetation transitions, e.g., between tropical and temperate biomes.
The resulting global database should provide a better basis for simulating the interaction of climate change and terrestrial ecosystem dynamics including carbon cycling using global vegetation models.

Wang, A.; Price, D. T.

2005-12-01
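
Two of the criteria used above to choose the number of K-means clusters can be computed directly from a hard partition. Pseudo F is the Calinski-Harabasz ratio of between-cluster to within-cluster sums of squares, each scaled by its degrees of freedom. R² is the between-cluster share of the total sum of squares. The minimal sketch below covers only these two criteria; the Cubic Clustering Criterion is omitted because its formula is more involved. The data in the example are hypothetical.

```python
import numpy as np

def pseudo_f_and_r2(X: np.ndarray, labels: np.ndarray):
    """Pseudo F (Calinski-Harabasz) and R^2 = between-cluster SS / total SS."""
    n = X.shape[0]
    clusters = np.unique(labels)
    k = clusters.size
    total_ss = ((X - X.mean(axis=0)) ** 2).sum()
    within_ss = sum(((X[labels == c] - X[labels == c].mean(axis=0)) ** 2).sum()
                    for c in clusters)
    between_ss = total_ss - within_ss
    pseudo_f = (between_ss / (k - 1)) / (within_ss / (n - k))
    return pseudo_f, between_ss / total_ss

X = np.array([[0.0], [2.0], [10.0], [12.0]])     # hypothetical 1-D climate variable
pf, r2 = pseudo_f_and_r2(X, np.array([0, 0, 1, 1]))
```

For this toy partition, the total sum of squares is 104 and the within-cluster sum is 4, which gives a pseudo F of 50 and an R² of about 0.96. In practice, the statistics are compared across candidate k values.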

314

NASA Astrophysics Data System (ADS)

Based on an analysis of the clustering algorithms previously proposed for MANETs, this paper proposes a novel clustering strategy. Trust is defined by statistical hypothesis testing in probability theory, and the cluster head is selected according to node trust and node mobility. The strategy can therefore detect malicious nodes, a function neglected by other clustering algorithms. It also overcomes a deficiency of the MOBIC algorithm, which cannot compute the relative mobility metric of the corresponding nodes when the receiving power of two consecutive HELLO packets cannot be measured. It is an effective solution for clustering MANETs securely.

Feng, Jian-xin; Tang, Jia-fu; Wang, Guang-xing

2007-04-01
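
The MOBIC deficiency described above concerns its relative mobility metric. That metric compares the received power of two successive HELLO packets from the same neighbour: $M^{rel} = 10\log_{10}(P_{new}/P_{old})$. A node's aggregate mobility is the variance about zero of these values over its neighbours. The sketch below follows that commonly cited form. If either power reading is missing, the metric cannot be computed, which is the gap the entry refers to.

```python
import math
from typing import Sequence, Tuple

def relative_mobility(rx_new: float, rx_old: float) -> float:
    """Relative mobility (dB) of a neighbour from two successive HELLO receive powers."""
    return 10.0 * math.log10(rx_new / rx_old)

def aggregate_mobility(pairs: Sequence[Tuple[float, float]]) -> float:
    """Variance about zero of the neighbours' relative mobilities; a lower value
    makes a node a better clusterhead candidate."""
    values = [relative_mobility(new, old) for new, old in pairs]
    return sum(v * v for v in values) / len(values)
```

A neighbour whose signal grows tenfold contributes +10 dB, and one whose signal falls tenfold contributes -10 dB. Together they give an aggregate mobility of 100.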

315

Tensor hypercontraction is a method that allows the representation of a high-rank tensor as a product of lower-rank tensors. In this paper, we show how tensor hypercontraction can be applied to both the electron repulsion integral tensor and the two-particle excitation amplitudes used in the parametric 2-electron reduced density matrix (p2RDM) algorithm. Because only $O(r)$ auxiliary functions are needed in both of these approximations, our overall algorithm can be shown to scale as $O(r^4)$, where $r$ is the number of single-particle basis functions. We apply our algorithm to several small molecules, hydrogen chains, and alkanes to demonstrate its low formal scaling and practical utility. Provided we use enough auxiliary functions, we obtain accuracy similar to that of the standard p2RDM algorithm, somewhere between that of CCSD and CCSD(T). PMID:23927246

Shenvi, Neil; van Aggelen, Helen; Yang, Yang; Yang, Weitao; Schwerdtfeger, Christine; Mazziotti, David

2013-08-01
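
The entry above relies on the tensor hypercontraction form $V_{pqrs} \approx \sum_{PQ} X_{pP} X_{qP} Z_{PQ} X_{rQ} X_{sQ}$. Its benefit is that contractions can be routed through the small factors without ever forming the rank-4 tensor. The NumPy sketch below uses random factors and hypothetical sizes to illustrate a Coulomb-like contraction with a density-like matrix D. It is an illustration of the factorised form, not the p2RDM algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
n_basis, n_aux = 6, 10                        # hypothetical r and number of auxiliary points
X = rng.standard_normal((n_basis, n_aux))     # THC factors X[p, P]
Z = rng.standard_normal((n_aux, n_aux))
Z = 0.5 * (Z + Z.T)                           # symmetric core Z[P, Q]

# Explicit rank-4 tensor, formed here only to check the factored route.
V = np.einsum("pP,qP,PQ,rQ,sQ->pqrs", X, X, Z, X, X)
D = rng.standard_normal((n_basis, n_basis))   # density-like matrix
J_full = np.einsum("pqrs,rs->pq", V, D)       # touches all r^4 elements

# Factored route: V is never formed.
rho = np.einsum("rQ,sQ,rs->Q", X, X, D)       # project D onto the auxiliary points
w = Z @ rho                                   # couple auxiliary points through Z
J_thc = (X * w) @ X.T                         # J[p, q] = sum_P X[p,P] w[P] X[q,P]
```

`J_full` and `J_thc` agree to rounding error. The factored route costs only about $O(r^2 N_{aux} + N_{aux}^2)$ operations.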

316

Image-derived input function (IDIF) obtained by manually drawing carotid arteries (manual-IDIF) can be reliably used in [¹¹C](R)-rolipram positron emission tomography (PET) scans. However, manual-IDIF is time consuming and subject to inter- and intra-operator variability. To overcome this limitation, we developed a fully automated technique for deriving IDIF with a supervised clustering algorithm (SVCA). To validate this technique, 25 healthy controls and 26 patients with moderate to severe major depressive disorder (MDD) underwent T1-weighted brain magnetic resonance imaging (MRI) and a 90-minute [¹¹C](R)-rolipram PET scan. For each subject, the metabolite-corrected input function was measured from the radial artery. SVCA templates were obtained from 10 additional healthy subjects who underwent the same MRI and PET procedures. Cluster-IDIF was obtained as follows: 1) template mask images were created for the carotid and surrounding tissue; 2) a parametric image of blood weights was created using SVCA; 3) the mask images were inversely normalized to the individual PET image; 4) carotid and surrounding-tissue time activity curves (TACs) were obtained from weighted and unweighted averages, respectively, of the voxel activities in each mask; 5) partial volume effects and radiometabolites were corrected using individual arterial data at four points. Logan distribution volume ($V_T/f_P$) values obtained by cluster-IDIF were similar to reference results obtained using arterial data, as well as to those obtained using manual-IDIF; 39 of 51 subjects had a $V_T/f_P$ error of <5%, and only one had an error >10%. With automatic voxel selection, cluster-IDIF curves were less noisy than manual-IDIF and free of operator-related variability. Cluster-IDIF showed a widespread decrease of about 20% in [¹¹C](R)-rolipram binding in the MDD group.
Taken together, the results suggest that cluster-IDIF is a good alternative to the full arterial input function for estimating Logan-$V_T/f_P$ in [¹¹C](R)-rolipram PET clinical scans. This technique enables fully automated extraction of IDIF and can be applied to other radiotracers with similar kinetics. PMID:24586526

Lyoo, Chul Hyoung; Zanotti-Fregonara, Paolo; Zoghbi, Sami S; Liow, Jeih-San; Xu, Rong; Pike, Victor W; Zarate, Carlos A; Fujita, Masahiro; Innis, Robert B

2014-01-01
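
The study above reports Logan distribution volumes. In Logan graphical analysis, for times after $t^*$, $\int_0^t C_T\,d\tau / C_T(t)$ is regressed on $\int_0^t C_p\,d\tau / C_T(t)$, and the slope estimates $V_T$. The sketch below checks this on synthetic one-tissue-compartment data with hypothetical rate constants, for which the relation is exactly linear with slope $K_1/k_2$. Normalization by the plasma free fraction $f_P$ and all IDIF-specific steps are omitted.

```python
import numpy as np

def cumulative_trapezoid(y: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Running integral of y(t) from t[0] (trapezoidal rule), same length as t."""
    return np.concatenate(([0.0], np.cumsum(0.5 * (y[1:] + y[:-1]) * np.diff(t))))

def logan_vt(t, c_plasma, c_tissue, t_star):
    """Slope of the Logan plot over t >= t_star, an estimate of V_T."""
    int_cp = cumulative_trapezoid(c_plasma, t)
    int_ct = cumulative_trapezoid(c_tissue, t)
    late = t >= t_star
    x = int_cp[late] / c_tissue[late]
    y = int_ct[late] / c_tissue[late]
    slope, _intercept = np.polyfit(x, y, 1)
    return float(slope)

# Synthetic one-tissue model: C_T = K1 * (C_p convolved with exp(-k2 t)).
t = np.arange(0.0, 90.0 + 1e-9, 0.1)                   # minutes
K1, k2 = 0.3, 0.05                                      # true V_T = K1 / k2 = 6
amps, rates = (10.0, 1.0), (0.3, 0.01)                  # bi-exponential plasma input
c_p = sum(A * np.exp(-a * t) for A, a in zip(amps, rates))
c_t = K1 * sum(A * (np.exp(-k2 * t) - np.exp(-a * t)) / (a - k2) for A, a in zip(amps, rates))
vt_estimate = logan_vt(t, c_p, c_t, t_star=30.0)
```

On this noise-free input, `vt_estimate` is very close to 6. With measured curves, the choice of $t^*$ and the noise in $C_T$ dominate the error.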

317

... Forecasting tropical cyclogenesis over the Atlantic basin using large-scale data. MWR, 131, 2927-2940. Knapp, K., 2008: Hurricane Satellite (HURSAT) data sets: Low-earth orbit infrared and microwave data. 28th ... tropical convection, or "cloud clusters", are a necessary precursor for tropical cyclogenesis. Several past ...

Hennon, Christopher C.

318

Energy loss (kWh) estimation for distribution systems is an important task in system operation and planning. Because the losses are obtained through estimation, it is essential to provide engineers with a fuzzy range of losses. This paper proposes a new method based on fuzzy-c-number (FCN) and cluster-wise fuzzy regression (CWFR) analysis for developing loss formulas to estimate losses. A

Ying-Yi Hong; Zuei-Tien Chao

2002-01-01

319

Delineation of river bed-surface patches by clustering high-resolution spatial grain size data

NASA Astrophysics Data System (ADS)

The beds of gravel-bed rivers commonly display distinct sorting patterns, which at length scales of ~0.1-1 channel widths appear to form an organization of patches or facies. This paper explores alternatives to traditional visual facies mapping by investigating methods of patch delineation in which clustering analysis is applied to a high-resolution grid of spatial grain-size distributions (GSDs) collected during a flume experiment. Specifically, we examine four clustering techniques: 1) partitional clustering of grain-size distributions with the k-means algorithm (assigning each GSD to a type of patch based solely on its distribution characteristics), 2) spatially constrained agglomerative clustering ("growing" patches by merging adjacent GSDs, thus generating a hierarchical structure of patchiness), 3) spectral clustering using Normalized Cuts (using the spatial distance between GSDs and the distribution characteristics to generate a matrix describing the similarity between all GSDs, and using the eigenvalues of this matrix to divide the bed into patches), and 4) fuzzy clustering with the fuzzy c-means algorithm (assigning each GSD a membership probability for every patch type). For each clustering method, we calculate metrics describing how well separated the cluster-average GSDs are and how patches are arranged in space. We use these metrics to compute optimal clustering parameters, to compare the clustering methods against each other, and to compare clustering results with patches mapped visually during the flume experiment. All clustering methods produced better-separated patch GSDs than the visually delineated patches. Although they do not produce crisp cluster assignments, fuzzy algorithms provide useful information that can characterize the uncertainty of a location on the bed belonging to any particular type of patch, and they can be used to characterize zones of transition from one patch to another.
The extent to which spatial information influences clustering leads to a trade-off between the quality of GSD separation between patch types and the spatial coherence of patches. Methods incorporating spatial information during the clustering process tended to produce a finite number of types of patches. As methods improve for collecting high-resolution grain size data, the approaches described here can be scaled up to field studies to better characterize the grain size heterogeneity of river beds.

Nelson, Peter A.; Bellugi, Dino; Dietrich, William E.

2014-01-01
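
The fuzzy c-means option above (method 4) assigns each GSD a membership in every patch type. The sketch below is the standard Euclidean form of FCM (Bezdek), with fuzzifier m = 2 and random initial memberships. Those choices, the iteration limits, and the seed are assumptions, not the authors' settings. Each GSD is represented as a feature vector, for example its binned fractions.

```python
import numpy as np

def fuzzy_c_means(X: np.ndarray, c: int, m: float = 2.0, n_iter: int = 100,
                  tol: float = 1e-6, seed: int = 0):
    """Standard fuzzy c-means. X: (n, d) data.
    Returns (centres (c, d), memberships U (n, c)); each row of U sums to 1."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        centres = (W.T @ X) / W.sum(axis=0)[:, None]            # membership-weighted means
        dist = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)                              # guard against a point on a centre
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)             # u_ij ∝ d_ij^(-2/(m-1))
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centres, U
```

A crisp patch map, comparable to k-means, follows from `U.argmax(axis=1)`. Rows without a dominant membership mark the transition zones the abstract describes.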

320

It is a high-quality algorithm for hierarchical clustering of large software source code. It makes it possible to break down the complexity of tens of millions of lines of source code, so that a human software engineer can comprehend a software system at a high level by looking at its architectural diagram, which is reconstructed automatically from the source code. The architectural diagram shows a tree of subsystems with OOP classes in its leaves (in other words, a nested software decomposition). The tool reconstructs missing (inconsistent, incomplete, or nonexistent) architectural documentation for a software system from its source code. This facilitates software maintenance: change requests can be performed substantially faster. Simply put, this unique tool makes it possible to lift the comprehensible grain of object-oriented software systems from the OOP class level to the subsystem level. It is estimated that a commercial tool, developed on the basis of this work, will reduce software mainte...

Rogatch, Sarge

2012-01-01

321

ECSAGO: Evolutionary Clustering with Self Adaptive Genetic Operators

We present an algorithm for Evolutionary Clustering with Self Adaptive Genetic Operators (ECSAGO). This algorithm is based on the Unsupervised Niche Clustering (UNC) and Hybrid Adaptive Evolutionary (HAEA) algorithms. The UNC is a genetic clustering algorithm that is robust to noise and is able to determine the number of clusters automatically. HAEA is a parameter adaptation technique that automatically learns

Elizabeth Leon; Olfa Nasraoui; Jonatan Gomez

2006-01-01

322

NASA Astrophysics Data System (ADS)

On obtaining a new data set, the researcher is immediately faced with the challenge of obtaining a high-level understanding from the observations. What does a typical item look like? What are the dominant trends? How many distinct groups are included in the data set, and how is each one characterized? Which observable values are common, and which rarely occur? Which items stand out as anomalies or outliers from the rest of the data? This challenge is exacerbated by the steady growth in data set size [11] as new instruments push into new frontiers of parameter space, via improvements in temporal, spatial, and spectral resolution, or by the desire to "fuse" observations from different modalities and instruments into a larger-picture understanding of the same underlying phenomenon. Data clustering algorithms provide a variety of solutions for this task. They can generate summaries, locate outliers, compress data, identify dense or sparse regions of feature space, and build data models. It is useful to note up front that "clusters" in this context refer to groups of items within some descriptive feature space, not (necessarily) to "galaxy clusters" which are dense regions in physical space. The goal of this chapter is to survey a variety of data clustering methods, with an eye toward their applicability to astronomical data analysis. In addition to improving the individual researcher’s understanding of a given data set, clustering has led directly to scientific advances, such as the discovery of new subclasses of stars [14] and gamma-ray bursts (GRBs) [38]. All clustering algorithms seek to identify groups within a data set that reflect some observed, quantifiable structure. Clustering is traditionally an unsupervised approach to data analysis, in the sense that it operates without any direct guidance about which items should be assigned to which clusters. 
There has been a recent trend in the clustering literature toward supporting semisupervised or constrained clustering, in which some partial information about item assignments or other components of the resulting output is already known and must be accommodated by the solution. Some algorithms seek a partition of the data set into distinct clusters, while others build a hierarchy of nested clusters that can capture taxonomic relationships. Some produce a single optimal solution, while others construct a probabilistic model of cluster membership. More formally, clustering algorithms operate on a data set $X$ composed of items represented by one or more features (dimensions). These could include physical location, such as right ascension and declination, as well as other properties such as brightness, color, temporal change, size, texture, and so on. Let $D$ be the number of dimensions used to represent each item, $x_i \in \mathbb{R}^D$. The clustering goal is to produce an organization $P$ of the items in $X$ that optimizes an objective function $f: P \to \mathbb{R}$, which quantifies the quality of solution $P$. Often $f$ is defined so as to maximize similarity within a cluster and minimize similarity between clusters. To that end, many algorithms make use of a measure $d: X \times X \to \mathbb{R}$ of the distance between two items. A partitioning algorithm produces a set of clusters $P = \{c_1, \ldots, c_k\}$ such that the clusters are nonoverlapping ($c_i \cap c_j = \emptyset$ for $i \neq j$) subsets of the data set ($\bigcup_i c_i = X$). Hierarchical algorithms produce a series of partitions $P = \{p_1, \ldots, p_{n'}\}$. For a complete hierarchy, the number of partitions $n' = n$, the number of items in the data set; the top partition is a single cluster containing all items, and the bottom partition contains $n$ clusters, each containing a single item. For model-based clustering, each cluster $c_j$ is represented by a model $m_j$, such as the cluster center or a Gaussian distribution.
The wide array of available clustering algorithms may seem bewildering, and covering all of them is beyond the scope of this chapter. Choosing among them for a particular application involves considerations of the kind

Wagstaff, Kiri L.

2012-03-01
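
The formal definition of a partition in the chapter summary above has two conditions: the clusters are pairwise disjoint ($c_i \cap c_j = \emptyset$ for $i \neq j$), and together they cover $X$ ($\bigcup_i c_i = X$). Both can be checked mechanically. The following minimal sketch represents clusters as Python sets; that representation is an assumption made for illustration.

```python
from typing import Hashable, Iterable, Sequence, Set

def is_partition(clusters: Sequence[Set[Hashable]], items: Iterable[Hashable]) -> bool:
    """True if the clusters are pairwise disjoint and their union is exactly the item set."""
    seen: Set[Hashable] = set()
    for cluster in clusters:
        if seen & cluster:          # overlaps an earlier cluster
            return False
        seen |= cluster
    return seen == set(items)
```

Overlapping or fuzzy clusterings fail this check by design, while every level of a complete hierarchy passes it.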

323

NASA Technical Reports Server (NTRS)

Magnetic resonance (MR) brain section images are segmented and then synthetically colored to give visual representations of the original data with three approaches: the literal and approximate fuzzy c-means unsupervised clustering algorithms and a supervised computational neural network, a dynamic multilayered perceptron trained with the cascade correlation learning algorithm. Initial clinical results are presented on both normal volunteers and selected patients with brain tumors surrounded by edema. Supervised and unsupervised segmentation techniques provide broadly similar results. Unsupervised fuzzy algorithms were visually observed to show better segmentation when compared with raw image data for volunteer studies. However, for a more complex segmentation problem with tumor/edema or cerebrospinal fluid boundary, where the tissues have similar MR relaxation behavior, inconsistency in rating among experts was observed.

Hall, Lawrence O.; Bensaid, Amine M.; Clarke, Laurence P.; Velthuizen, Robert P.; Silbiger, Martin S.; Bezdek, James C.

1992-01-01

324

NASA Astrophysics Data System (ADS)

Quantitative PET image reconstruction requires an accurate map of the attenuation coefficients of the tissue under investigation at 511 keV (μ-map) in order to correct the emission data for attenuation. The use of MRI-based attenuation correction (MRAC) has recently received considerable attention in the scientific literature. One of the major difficulties facing MRAC arises in areas where bone and air meet, e.g., the ethmoidal sinuses in the head. Bone is intrinsically not detectable by conventional MRI, making it difficult to distinguish air from bone. Therefore, the development of more versatile MR sequences that label bone structure, e.g., ultra-short echo-time (UTE) sequences, plays a significant role in novel methodological developments. However, the long acquisition time and complexity of UTE sequences limit their clinical application. To overcome this problem, we developed a novel combination of a short-TE (ShTE) pulse sequence to detect bone signal with a 2-point Dixon technique for water-fat discrimination, along with a robust image segmentation method based on fuzzy C-means (FCM) clustering to segment the head area into four classes: air, bone, soft tissue, and adipose tissue. The imaging protocol was implemented on a clinical 3 T Tim Trio and a 1.5 T Avanto (Siemens Medical Solutions, Erlangen, Germany), employing a triple-echo-time pulse sequence in the head area. The acquisition parameters were TE1/TE2/TE3 = 0.98/4.925/6.155 ms, TR = 8 ms, FA = 25° on the 3 T system, and TE1/TE2/TE3 = 1.1/2.38/4.76 ms, TR = 16 ms, FA = 18° on the 1.5 T system. The second and third echo times were used for the Dixon decomposition to distinguish soft and adipose tissues. To quantify the accuracy, sensitivity, and specificity of the bone segmentation algorithm, the resulting MR-based bone segmentation was compared with manual segmentation by our expert neuroradiologist.
Results for both the 3 T and 1.5 T systems show that bone segmentation applied in several slices yields average accuracy, sensitivity, and specificity higher than 90%. The results indicate that FCM is an appropriate technique for tissue classification in the sinus area, where there is an air-bone interface. Furthermore, using the Dixon method, fat and brain tissues were successfully separated.

Khateri, Parisa; Rad, Hamidreza Saligheh; Jafari, Amir Homayoun; Ay, Mohammad Reza

2014-01-01
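
The bone-segmentation scores above compare an automatic mask with the expert's manual mask voxel by voxel. As a minimal sketch, assuming both masks are binary arrays of the same shape (the toy arrays in the example are hypothetical), accuracy, sensitivity, and specificity follow from the four confusion counts.

```python
import numpy as np

def segmentation_scores(predicted: np.ndarray, reference: np.ndarray):
    """Voxel-wise (accuracy, sensitivity, specificity) of a binary mask against a reference."""
    p = predicted.astype(bool)
    r = reference.astype(bool)
    tp = np.sum(p & r)        # bone in both
    tn = np.sum(~p & ~r)      # non-bone in both
    fp = np.sum(p & ~r)       # labelled bone, actually not
    fn = np.sum(~p & r)       # missed bone
    accuracy = (tp + tn) / p.size
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return float(accuracy), float(sensitivity), float(specificity)

acc, sens, spec = segmentation_scores(np.array([1, 1, 0, 0, 1]), np.array([1, 0, 0, 0, 1]))
```

In this toy case, one air voxel is mislabelled as bone. The result is an accuracy of 0.8, a sensitivity of 1.0, and a specificity of 2/3.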

325

Initialization Free Graph Based Clustering

This paper proposes an original approach to clustering multi-component data sets, including an estimation of the number of clusters. Starting from the construction of a minimal spanning tree with Prim's algorithm, and assuming that the vertices are approximately distributed according to a Poisson distribution, the number of clusters is estimated by thresholding Prim's trajectory. The corresponding cluster centroids are

Laurent Galluccio; Olivier J. J. Michel; Pierre Comon; Eric Slezak; Alfred O. Hero

2009-01-01
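
In the approach above, Prim's trajectory is the sequence of edge lengths in the order Prim's algorithm adds them to the minimal spanning tree. Long jumps occur where the tree crosses from one cluster to the next. The sketch below computes that trajectory on a complete Euclidean graph. It then counts clusters using a fixed, hypothetical threshold rather than the Poisson-based threshold derived in the paper.

```python
import math
from typing import List, Sequence, Tuple

def prim_trajectory(points: Sequence[Tuple[float, ...]], start: int = 0) -> List[float]:
    """Edge lengths in the order Prim's algorithm adds them to the minimal spanning tree."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n             # cheapest known edge from the tree to each vertex
    best[start] = 0.0
    trajectory = []
    for step in range(n):
        v = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[v] = True
        if step > 0:
            trajectory.append(best[v])
        for u in range(n):
            if not in_tree[u]:
                best[u] = min(best[u], math.dist(points[v], points[u]))
    return trajectory

def estimate_clusters(points, threshold: float) -> int:
    """Hypothetical rule: one cluster plus one per trajectory jump above threshold."""
    return 1 + sum(length > threshold for length in prim_trajectory(points))
```

For the points (0, 0), (1, 0), (10, 0), (11, 0), the trajectory is [1.0, 9.0, 1.0]. With a threshold of 5, this gives two clusters.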

326

NASA Astrophysics Data System (ADS)

The loop algorithm for the world-line quantum Monte Carlo method on quantum lattice models is presented. After introducing the path integral representation that maps a quantum model to a classical one, we describe the continuous imaginary time limit, the cluster algorithm, and the rejection-free scheme, which are the major improvements to the quantum Monte Carlo technique over the last decades. By means of the loop algorithm, one can simulate various unfrustrated quantum lattice models of millions of sites at extremely low temperatures with absolute accuracy, free from critical and fine-mesh slowing down and from the Suzuki-Trotter discretization error. We also discuss some technical aspects of the algorithm, such as effective implementation and parallelization.

Todo, Synge

327

NASA Astrophysics Data System (ADS)

We present sample CUDA programs for the GPU computing of the Swendsen-Wang multi-cluster spin flip algorithm. We deal with the classical spin models: the Ising model, the q-state Potts model, and the classical XY model. Both the 2D (square) lattice and the 3D (simple cubic) lattice are treated. We have already reported the idea of the GPU implementation for 2D models (Komura and Okabe, 2012). Here we explain the details of the sample programs and discuss the performance of the present GPU implementation for the 3D Ising and XY models. We also show the calculated results of the moment ratio for these models and discuss phase transitions.
Catalogue identifier: AERM_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AERM_v1_0.html
Program obtainable from: CPC Program Library, Queen’s University, Belfast, N. Ireland
Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html
No. of lines in distributed program, including test data, etc.: 5632
No. of bytes in distributed program, including test data, etc.: 14688
Distribution format: tar.gz
Programming language: C, CUDA.
Computer: System with an NVIDIA CUDA-enabled GPU.
Operating system: System with an NVIDIA CUDA-enabled GPU.
Classification: 23.
External routines: NVIDIA CUDA Toolkit 3.0 or newer
Nature of problem: Monte Carlo simulation of classical spin systems. The Ising, q-state Potts, and classical XY models are treated on both two-dimensional and three-dimensional lattices.
Solution method: GPU-based Swendsen-Wang multi-cluster spin flip Monte Carlo method. The CUDA implementation of cluster labeling is based on the work by Hawick et al. [1] and by Kalentev et al. [2].
Restrictions: The system size is limited by the memory of the GPU.
Running time: For the parameters used in the sample programs, each program takes about a minute; the running time depends on the system size, the number of Monte Carlo steps, etc.
References:
[1] K.A. Hawick, A. Leist, and D.P. Playne, Parallel Computing 36 (2010) 655-678.
[2] O. Kalentev, A. Rai, S. Kemnitz, and R. Schneider, J. Parallel Distrib. Comput. 71 (2011) 615-620.
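The cluster step at the heart of the Swendsen-Wang method can be illustrated serially. The sketch below is a minimal CPU version for the 2D Ising model only, assuming ferromagnetic coupling J > 0, periodic boundaries and L ≥ 3; it labels clusters with a plain union-find rather than the GPU labeling schemes of Hawick et al. [1] and Kalentev et al. [2] used in the distributed CUDA programs, and the function name `swendsen_wang_step` is hypothetical. Aligned neighbours are bonded with probability 1 - exp(-2βJ), and each resulting cluster is flipped with probability 1/2.

```python
import math
import random


def swendsen_wang_step(spins, L, beta, J=1.0, rng=random):
    """One Swendsen-Wang update of the 2D Ising model on an L x L periodic lattice.

    spins is a flat list of +1/-1 values indexed as site = y * L + x (assumes L >= 3).
    Returns the updated spins and the number of clusters that were built.
    """
    n = L * L
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Aligned nearest neighbours are bonded with probability 1 - exp(-2 beta J)
    p_bond = 1.0 - math.exp(-2.0 * beta * J)
    for y in range(L):
        for x in range(L):
            i = y * L + x
            # Right and down neighbours visit every lattice bond exactly once
            for j in (y * L + (x + 1) % L, ((y + 1) % L) * L + x):
                if spins[i] == spins[j] and rng.random() < p_bond:
                    ri, rj = find(i), find(j)
                    if ri != rj:
                        parent[ri] = rj

    # Flip every cluster independently with probability 1/2
    flip = {}
    new_spins = list(spins)
    for i in range(n):
        root = find(i)
        if root not in flip:
            flip[root] = rng.random() < 0.5
        if flip[root]:
            new_spins[i] = -spins[i]
    return new_spins, len(flip)


if __name__ == "__main__":
    rng = random.Random(42)
    L, beta = 16, 0.44  # beta close to the 2D critical coupling, for illustration
    spins = [rng.choice((-1, 1)) for _ in range(L * L)]
    for _ in range(100):
        spins, _ = swendsen_wang_step(spins, L, beta, rng=rng)
    print(abs(sum(spins)) / (L * L))  # |magnetization| per site
```

In a GPU version, the bond-placement loop and the cluster-flip loop are the naturally data-parallel stages; the union-find labeling is the part that dedicated GPU cluster-labeling algorithms replace.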

Komura, Yukihiro; Okabe, Yutaka

2014-03-01

328

The preliminary framework of a combined radiobiological model is developed and calibrated in the current work. The model simulates the production of individual cells forming a tumour, the spatial distribution of individual ionization events (using Geant4-DNA), and the stochastic biochemical repair of DNA double strand breaks (DSBs), leading to a prediction of the survival or death of each individual cell. In the current work, we expand upon a previously developed tumour generation and irradiation model to include a stochastic ionization damage clustering and DNA lesion repair model. The Geant4 code enabled the position of each ionization event in the cells to be simulated and recorded for analysis. An algorithm was developed to cluster the ionization events in each cell into simple and complex double strand breaks. The two lesion kinetic (TLK) model was then adapted to predict DSB repair kinetics and the resultant cell survival curve. The parameters in the cell survival model were calibrated using experimental survival data for V79 cells after low-energy proton irradiation. A monolayer of V79 cells was simulated using the previously developed tumour generation code, and the cells were irradiated by protons with mean energies of 0.76 MeV and 1.9 MeV using a customized version of Geant4. By replicating the experimental parameters of a low-energy proton irradiation experiment and calibrating the model with two sets of data, the model is now capable of predicting V79 cell survival after low-energy (<2 MeV) proton irradiation for a custom set of input parameters. The novelty of this model lies in the realistic cellular geometry, which can be irradiated using Geant4-DNA, and in the method by which double strand breaks are predicted from the clustering of the spatial distribution of ionization events. Unlike the original TLK model, which calculates a tumour-averaged cell survival probability, the present model calculates a survival probability for each cell in the geometric tumour model. It uses fundamental, measurable microscopic quantities such as genome length rather than macroscopic radiobiological quantities such as alpha/beta ratios. This means that, once calibrated for a given cell line, the model can in principle be used under a wide range of conditions with a single set of input parameters. PMID:25813497

Douglass, Michael; Bezak, Eva; Penfold, Scott

2015-04-21

329

A robust elicitation algorithm for discovering DNA motifs using fuzzy self-organizing maps.

It is important to identify DNA motifs in promoter regions to understand the mechanism of gene regulation. Computational approaches for finding DNA motifs are well recognized as useful tools for biologists, greatly helping to save experimental time and cost in wet laboratories. Self-organizing maps (SOMs), as a powerful clustering tool, have demonstrated good potential for problem solving. However, current SOM-based motif discovery algorithms treat data samples lying near cluster boundaries unfairly by assigning each of them to a single node, which may result in unreliable system performance. This paper aims to develop a robust framework for discovering DNA motifs, in which fuzzy SOMs, integrating fuzzy c-means membership functions with a standard batch-learning scheme, are employed to extract putative motifs of varying length in a recursive manner. Experimental results on eight real datasets show that our proposed algorithm outperforms other search tools such as SOMBRERO, SOMEA, MEME, AlignACE, and WEEDER in terms of the F-measure and algorithm reliability. An improvement of 24.6% is observed compared to the state-of-the-art SOMBRERO. Furthermore, in finding multiple motifs on five artificial datasets, our algorithm achieves improvements of 20% and 6.6% over SOMBRERO and SOMEA, respectively. PMID:24808603
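For a sample lying between two nodes, a fuzzy c-means membership function assigns a graded degree of belonging to each node instead of a single winner. Below is a minimal sketch of the standard FCM membership formula, assuming Euclidean distance and the common fuzzifier m = 2; the helper name `fcm_memberships` is hypothetical, and how the authors plug these memberships into the batch SOM update is not reproduced here.

```python
import math


def fcm_memberships(x, centers, m=2.0):
    """Fuzzy c-means membership of sample x in each cluster (fuzzifier m > 1).

    u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1)), with d_i the distance to centre i.
    """
    dists = [math.dist(x, c) for c in centers]
    # A sample lying exactly on one or more centres belongs to them crisply
    on_center = [i for i, d in enumerate(dists) if d == 0.0]
    if on_center:
        return [1.0 / len(on_center) if i in on_center else 0.0 for i in range(len(centers))]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** power for d_k in dists) for d_i in dists]


if __name__ == "__main__":
    # A boundary sample is shared between two nodes instead of being forced into one
    print(fcm_memberships((1.0, 0.0), [(0.0, 0.0), (3.0, 0.0)]))
```

The memberships always sum to 1, so a sample halfway between two nodes contributes roughly equally to both, which is the behaviour the fuzzy SOM relies on near cluster boundaries.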

Wang, Dianhui; Tapan, Sarwar

2013-10-01

330

Haplotyping Problem, A Clustering Approach

NASA Astrophysics Data System (ADS)

Construction of two haplotypes from a set of Single Nucleotide Polymorphism (SNP) fragments is called the haplotype reconstruction problem. One of the most popular computational models for this problem is Minimum Error Correction (MEC). Since MEC is an NP-hard problem, we propose a novel heuristic algorithm for haplotype reconstruction based on clustering analysis in data mining. Based on the Hamming distance and similarity between two fragments, our iterative algorithm produces two clusters of fragments; in each iteration, the algorithm assigns a fragment to one of the clusters. Our results suggest that the algorithm has a lower reconstruction error rate than other algorithms.
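A simplified two-cluster heuristic in the spirit of this description can be sketched as follows. It assumes fragments are strings over '0', '1' and '-' (uncovered site); the seeding rule (most distant pair), the majority-vote consensus with ties broken toward '0', and the stopping rule are illustrative choices, not the authors' algorithm, and all function names are hypothetical. The returned MEC score counts fragment entries that disagree with the haplotype of their closer cluster.

```python
from itertools import combinations


def hamming(a, b):
    """Mismatches between two SNP fragments over sites both cover ('-' = not covered)."""
    return sum(1 for x, y in zip(a, b) if x != "-" and y != "-" and x != y)


def consensus(fragments, length):
    """Majority allele per site (ties -> '0'); '-' where no fragment covers the site."""
    hap = []
    for j in range(length):
        column = [f[j] for f in fragments if f[j] != "-"]
        if not column:
            hap.append("-")
        else:
            hap.append("1" if column.count("1") > column.count("0") else "0")
    return "".join(hap)


def two_cluster_haplotypes(fragments, max_iter=20):
    """Cluster fragments into two groups and return (haplotype1, haplotype2, MEC score)."""
    length = len(fragments[0])
    # Seed the clusters with the most dissimilar pair of fragments
    i, j = max(combinations(range(len(fragments)), 2),
               key=lambda pair: hamming(fragments[pair[0]], fragments[pair[1]]))
    h1, h2 = fragments[i], fragments[j]
    for _ in range(max_iter):
        # Assign every fragment to the closer haplotype, then re-estimate both
        c1 = [f for f in fragments if hamming(f, h1) <= hamming(f, h2)]
        c2 = [f for f in fragments if hamming(f, h1) > hamming(f, h2)]
        new_h1 = consensus(c1, length) if c1 else h1
        new_h2 = consensus(c2, length) if c2 else h2
        if (new_h1, new_h2) == (h1, h2):
            break
        h1, h2 = new_h1, new_h2
    # MEC: entries that must be corrected for each fragment to match its haplotype
    mec = sum(min(hamming(f, h1), hamming(f, h2)) for f in fragments)
    return h1, h2, mec


if __name__ == "__main__":
    reads = ["0011-", "001--", "-0110", "1100-", "110--", "-1001"]
    print(two_cluster_haplotypes(reads))
```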

Eslahchi, Changiz; Sadeghi, Mehdi; Pezeshk, Hamid; Kargar, Mehdi; Poormohammadi, Hadi

2007-09-01

331

Alternative Clustering Analysis: A Review

Chapter 21. Alternative Clustering Analysis: A Review. James Bailey, Department of Computing … Clustering analysis provides a way to automatically identify patterns and …

Bailey, James

332

Optimal cluster selection based on Fisher class separability measure

In this paper, a novel hierarchical clustering algorithm is proposed, where the number of clusters is optimally determined according to the Fisher class separability measure. The clustering algorithm consists of two phases: (1) Generation of sub-clusters based on the similarity metric; (2) Merging of sub-clusters based on the Fisher class separability measure. The proximity matrices are constructed. Each subcluster comprises
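The merging phase can be illustrated with the classical one-dimensional Fisher criterion, J = (mean_a - mean_b)^2 / (var_a + var_b), which is large when two sub-clusters are well separated relative to their spread. Because the abstract is truncated, the greedy rule below (merge the least separable pair until every pair scores at least a threshold) and the threshold value are assumptions, and the function names are hypothetical; for multivariate data one would first project onto Fisher's discriminant direction.

```python
from statistics import fmean, pvariance


def fisher_ratio(a, b):
    """Fisher separability of two 1-D sub-clusters: (mean_a - mean_b)^2 / (var_a + var_b)."""
    between = (fmean(a) - fmean(b)) ** 2
    within = pvariance(a) + pvariance(b)
    if within == 0.0:
        # Two point-like sub-clusters: identical ones are inseparable, distinct ones perfectly separable
        return 0.0 if between == 0.0 else float("inf")
    return between / within


def merge_subclusters(clusters, threshold):
    """Greedily merge the least separable pair until every pair scores >= threshold."""
    clusters = [list(c) for c in clusters]
    while len(clusters) > 1:
        pairs = [(fisher_ratio(clusters[i], clusters[j]), i, j)
                 for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        score, i, j = min(pairs)
        if score >= threshold:
            break
        clusters[i].extend(clusters.pop(j))  # j > i, so index i stays valid
    return clusters


if __name__ == "__main__":
    subs = [[1.0, 1.2, 0.9], [1.1, 1.3, 1.0], [5.0, 5.2, 4.9]]
    print(merge_subclusters(subs, threshold=1.0))
```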

Xudong Wang; Vassilis L. Syrmos

2005-01-01

333

To deal with sequential decision problems with large or continuous state spaces, feature representation and function approximation have been major research topics in reinforcement learning (RL). In this paper, a clustering-based graph Laplacian framework is presented for feature representation and value function approximation (VFA) in RL. Using clustering-based techniques, that is, K-means clustering or fuzzy C-means clustering, a graph Laplacian is constructed by subsampling in Markov decision processes (MDPs) with continuous state spaces. The basis functions for VFA can be generated automatically from spectral analysis of the graph Laplacian. The clustering-based graph Laplacian is integrated with a class of approximate policy iteration algorithms called representation policy iteration (RPI) for RL in MDPs with continuous state spaces. Simulation and experimental results show that, compared with previous RPI methods, the proposed approach needs fewer sample points to compute an efficient set of basis functions, and that learning control performance can be improved for a variety of parameter settings. PMID:24802018
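The spectral step can be sketched on a handful of already-computed cluster centres. This minimal version assumes a fully connected graph with Gaussian weights of width sigma (the paper's graph construction and subsampling details may differ) and forms the combinatorial Laplacian L = D - W; the eigenvectors with the smallest eigenvalues then serve as basis functions evaluated at the centres. A cyclic Jacobi solver keeps the example dependency-free; in practice a library routine such as `numpy.linalg.eigh` would be used. All function names are hypothetical.

```python
import math


def gaussian_laplacian(centers, sigma):
    """Combinatorial graph Laplacian L = D - W over cluster centres.

    W[i][j] = exp(-||c_i - c_j||^2 / (2 sigma^2)) for i != j; D holds the row sums of W.
    """
    n = len(centers)
    w = [[0.0 if i == j else
          math.exp(-math.dist(centers[i], centers[j]) ** 2 / (2.0 * sigma ** 2))
          for j in range(n)] for i in range(n)]
    return [[(sum(w[i]) if i == j else 0.0) - w[i][j] for j in range(n)] for i in range(n)]


def jacobi_eigh(a, max_sweeps=100, tol=1e-20):
    """Eigenpairs of a small symmetric matrix via cyclic Jacobi rotations.

    Returns (eigenvalues, eigenvectors) in ascending eigenvalue order; each eigenvector
    is a list holding one basis-function value per graph node.
    """
    n = len(a)
    a = [row[:] for row in a]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(a[i][j] * a[i][j] for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle that zeroes a[p][q]
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # columns p, q: A <- A P and V <- V P
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
                for k in range(n):  # rows p, q: A <- P^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    order = sorted(range(n), key=lambda i: a[i][i])
    return [a[i][i] for i in order], [[v[k][i] for k in range(n)] for i in order]


if __name__ == "__main__":
    centres = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
    values, basis = jacobi_eigh(gaussian_laplacian(centres, sigma=1.0))
    print(values[:2])  # the smallest eigenvalue is ~0 with a constant eigenvector
    print(basis[1])    # smooth basis function separating the two pairs of nearby centres
```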

Xu, Xin; Huang, Zhenhua; Graves, Daniel; Pedrycz, Witold

2014-12-01

334

Unsupervised Optimal Fuzzy Clustering

This study reports on a method for carrying out fuzzy classification without a priori assumptions on the number of clusters in the data set. Assessment of cluster validity is based on performance measures using hypervolume and density criteria. An algorithm is derived from a combination of the fuzzy K-means algorithm and fuzzy maximum-likelihood estimation. The unsupervised fuzzy partition-optimal number of

Isak Gath; Amir B. Geva

1989-01-01

335

NASA Astrophysics Data System (ADS)

Formosat-2 imagery is high-spatial-resolution (2 m GSD) remote sensing satellite data comprising one panchromatic band and four multispectral bands (blue, green, red, near-infrared). An essential step in the daily processing of received Formosat-2 images is estimating the cloud statistics of each image with the Automatic Cloud Coverage Assessment (ACCA) algorithm. These cloud statistics are subsequently recorded as important metadata in the image product catalog. In this paper, we propose an ACCA method with two consecutive stages: pre-processing and post-processing analysis. For pre-processing analysis, unsupervised K-means classification, Sobel's method, a thresholding method, non-cloudy pixel reexamination, and a cross-band filter method are applied in sequence to determine the cloud statistics. For post-processing analysis, the box-counting fractal method is applied. In other words, the cloud statistics are first determined in the pre-processing analysis, and their correctness across the different spectral bands is then cross-examined qualitatively and quantitatively in the post-processing analysis. The selection of an appropriate thresholding method is critical to the result of the ACCA method. Therefore, in this work, we first conduct a series of experiments comparing clustering-based and spatial thresholding methods, namely Otsu's, Local Entropy (LE), Joint Entropy (JE), Global Entropy (GE), and Global Relative Entropy (GRE). The results show that Otsu's and GE methods both perform better than the others for Formosat-2 images. Additionally, our proposed ACCA method, with Otsu's method selected as the thresholding method, has successfully extracted the cloudy pixels of Formosat-2 images for accurate cloud statistic estimation.
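Otsu's method chooses the grey level that maximizes the between-class variance of the two resulting pixel classes. Below is a minimal sketch for a single band of integer pixel values in [0, 255]; how the paper combines the threshold with K-means, Sobel edges, and cross-band filtering is not reproduced, and the function name `otsu_threshold` is hypothetical.

```python
def otsu_threshold(pixels, levels=256):
    """Return threshold t maximizing between-class variance; pixels > t form the bright class."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w0 = 0      # number of pixels with value <= t
    sum0 = 0    # sum of those pixel values
    best_t, best_var = 0, -1.0
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue
        if w1 == 0:
            break
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Between-class variance, up to the constant factor 1 / total**2
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


if __name__ == "__main__":
    band = [12, 15, 10, 200, 210, 190, 14, 205]  # toy pixel values from one band
    t = otsu_threshold(band)
    cloudy = [p > t for p in band]
    print(t, sum(cloudy))
```

Because the criterion depends only on the histogram, the cost is one pass over the pixels plus one pass over the grey levels, which suits routine daily processing of whole scenes.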

Hsu, Kuo-Hsien

2012-11-01

336

Metamodel-based global optimization using fuzzy clustering for design space reduction

NASA Astrophysics Data System (ADS)

High-fidelity analyses are used in modern engineering design optimization problems that involve expensive black-box models. For computation-intensive engineering design problems, efficient global optimization methods must be developed to relieve the computational burden. A new metamodel-based global optimization method using fuzzy clustering for design space reduction (MGO-FCR) is presented. Uniformly distributed initial sample points are generated by Latin hypercube design to construct a radial basis function metamodel, whose accuracy improves gradually as the number of sample points increases. The fuzzy c-means method and the Gath-Geva clustering method are applied to divide the design space into several smaller cluster spaces of interest for low- and high-dimensional problems, respectively. Modeling efficiency and accuracy are directly related to the design space, so spaces of no interest are eliminated by the proposed reduction principle and two pseudo reduction algorithms. The reduction principle is developed to determine whether the current design space should be reduced and which space is eliminated. The first pseudo reduction algorithm improves the speed of clustering, while the second ensures that the design space is reduced. Comparative studies with the adaptive response surface method, the approximated unimodal region elimination method, and mode-pursuing sampling are carried out on several numerical benchmark functions. The optimization results reveal that this method captures the true global optimum for all the numerical benchmark functions, and the numbers of function evaluations show that its efficiency is favorable, especially for high-dimensional problems. Using this global design optimization method, a design optimization of a lifting surface in high-speed flow is carried out, saving about 10 h compared with genetic algorithms.
This method performs favorably in efficiency, robustness, and capability of global convergence, and offers a new optimization strategy for engineering design optimization problems involving expensive black-box models.
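Latin hypercube design, used above to place the initial sample points, splits each variable's range into n equal strata and samples every stratum exactly once per variable. Below is a minimal sketch, assuming box bounds per variable and random placement within each stratum (the paper may use a different LHS variant); the function name `latin_hypercube` is hypothetical.

```python
import random


def latin_hypercube(n_samples, bounds, rng=random):
    """Latin hypercube design over box bounds [(lo, hi), ...].

    Each variable's range is cut into n_samples equal strata; every stratum is used
    exactly once per variable, with strata randomly paired across variables.
    """
    columns = []
    for lo, hi in bounds:
        strata = list(range(n_samples))
        rng.shuffle(strata)  # random pairing of strata between variables
        width = (hi - lo) / n_samples
        # One uniformly placed point inside each stratum
        columns.append([lo + (s + rng.random()) * width for s in strata])
    return [tuple(col[i] for col in columns) for i in range(n_samples)]


if __name__ == "__main__":
    design = latin_hypercube(5, [(0.0, 1.0), (-2.0, 2.0)], rng=random.Random(7))
    for point in design:
        print(point)
```

Compared with plain uniform sampling, the stratification guarantees that every slice of each variable's range is represented, which helps an initial radial basis function metamodel avoid large unsampled gaps.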

Li, Yulin; Liu, Li; Long, Teng; Dong, Weili

2013-09-01

337

Scalable Clustering Using Graphics Processors

We present new algorithms for scalable clustering using graphics processors. Our basic approach is based on k-means, but it reorders the way of determining object labels, and exploits the high computational power and pipeline of graphics processing units (GPUs). The core operations in clustering algorithms, i.e., distance computing and comparison, are performed by utilizing the fragment vector processing
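The two core operations named above, distance computation and comparison for labeling, followed by the centroid update, form one iteration of Lloyd's k-means. The CPU sketch below is only a reference for what a GPU version accelerates; it assumes the initial centres are supplied, does not model the reordered labeling scheme or the fragment-processor mapping of the paper, and uses a hypothetical function name.

```python
import math


def kmeans(points, centers, iterations=100):
    """Lloyd's k-means: alternate nearest-centre labelling and centroid updates."""
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(iterations):
        # Labelling step: distance computation and comparison for every point
        # (the data-parallel part that a GPU implementation accelerates)
        labels = [min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
                  for p in points]
        # Update step: each centre moves to the mean of its members
        new_centers = []
        for k, c in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centers.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                new_centers.append(c)  # keep an empty cluster's centre unchanged
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(kmeans(pts, [(0, 0), (10, 10)]))
```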

Feng Cao; Anthony K. H. Tung; Aoying Zhou

2006-01-01

338

Algorithms and Algorithmic Languages.

ERIC Educational Resources Information Center

This paper is intended as an introduction to a number of problems connected with the description of algorithms and algorithmic languages, particularly the syntaxes and semantics of algorithmic languages. The terms "letter, word, alphabet" are defined and described. The concept of the algorithm is defined and the relation between the algorithm and…

Veselov, V. M.; Koprov, V. M.

339

NASA Astrophysics Data System (ADS)

Energy surfaces of metal clusters usually show a large variety of local minima. For homo-metallic species the energetically lowest one can be found reliably with genetic algorithms in combination with density functional theory without system-specific parameters. For mixed-metallic clusters this is much more difficult, since for a given arrangement of nuclei one must additionally find the best of many possible assignments of different metal types to the individual positions. Within electronic structure methods, this second issue can be treated at comparatively low cost, at least for elements with similar atomic numbers, by means of first-order perturbation theory, as shown previously [F. Weigend, C. Schrodt, and R. Ahlrichs, J. Chem. Phys. 121, 10380 (2004)]. In the present contribution, an extension of a genetic algorithm with the re-assignment of atom types to atom sites is proposed and tested in the search for the global minima of PtHf12 and [LaPb7Bi7]4-. In both cases the (putative) global minimum is reliably found with the extended technique, which is not the case for the "pure" genetic algorithm.

Weigend, Florian

2014-10-01

340

Exploratory and inferential analysis of gene cluster neighborhood graphs

Background: Many different cluster methods are frequently used in gene expression data analysis to find groups of co-expressed genes. However, cluster algorithms with the ability to visualize the resulting clusters are usually preferred. The visualization of gene clusters gives practitioners an understanding of the cluster structure of their data and makes it easier to interpret the cluster results. Results: In

Theresa Scharl; Ingo Voglhuber; Friedrich Leisch

2009-01-01

341

City block distance and rough-fuzzy clustering for identification of co-expressed microRNAs.

MicroRNAs (miRNAs) are short, endogenous RNAs that can regulate mRNA expression at the post-transcriptional level. Various studies have revealed that miRNAs tend to cluster on chromosomes. Members of a cluster that are in close proximity on chromosomes are highly likely to be processed as co-transcribed units. Therefore, a large proportion of miRNAs are co-expressed. Expression profiling of miRNAs generates a huge volume of data, and complicated networks of miRNA-mRNA interaction increase the challenge of comprehending and interpreting it. In this regard, this paper presents a clustering algorithm to extract meaningful information from miRNA expression data. It judiciously integrates the merits of rough sets, fuzzy sets, the c-means algorithm, and the normalized range-normalized city block distance to discover co-expressed miRNA clusters. While the membership functions of fuzzy sets enable efficient handling of overlapping partitions in a noisy environment, the concept of lower and upper approximations of rough sets deals with uncertainty, vagueness, and incompleteness in cluster definition. The city block distance is used to compute the membership functions of fuzzy sets and to find the initial partition of a data set, and therefore helps to handle minute differences between two miRNA expression profiles. The effectiveness of the proposed approach, along with a comparison with other related methods, is demonstrated on several miRNA expression data sets using different cluster validity indices. Moreover, gene ontology is used to analyze the functional consistency and biological significance of the generated miRNA clusters. PMID:24682049
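The distance and the rough-fuzzy assignment rule can be sketched as follows. The exact normalization in the paper may differ: this version divides each attribute's absolute difference by that attribute's range and averages over attributes. The lower-approximation rule (the best membership exceeds the runner-up by more than a threshold delta, otherwise the object is a boundary object shared by both clusters) follows a common rough-fuzzy c-means convention and is an assumption here, as are all helper names.

```python
def attribute_ranges(data):
    """Range (max - min) of every column in a list of equal-length expression profiles."""
    return [max(col) - min(col) for col in zip(*data)]


def range_normalized_cityblock(x, y, ranges):
    """City block (L1) distance with each coordinate scaled by its attribute range,
    averaged over attributes; attributes with zero range contribute nothing."""
    return sum(abs(a - b) / r for a, b, r in zip(x, y, ranges) if r > 0) / len(ranges)


def rough_assignment(memberships, delta):
    """Lower approximation of the best cluster if it beats the runner-up by more than
    delta; otherwise a boundary object in the upper approximations of both."""
    order = sorted(range(len(memberships)), key=lambda k: memberships[k], reverse=True)
    best, second = order[0], order[1]
    if memberships[best] - memberships[second] > delta:
        return ("lower", [best])
    return ("boundary", [best, second])


if __name__ == "__main__":
    profiles = [[0.0, 0.0], [10.0, 2.0], [5.0, 1.0]]
    r = attribute_ranges(profiles)
    print(range_normalized_cityblock(profiles[0], profiles[2], r))
    print(rough_assignment([0.45, 0.40, 0.15], delta=0.3))
```

Scaling by the attribute range keeps an miRNA with a wide dynamic range from dominating the distance, so small differences in low-variance attributes still register.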

Paul, Sushmita; Maji, Pradipta

2014-06-01

342

Clustering with Normalized Cuts is Clustering with a Hyperplane

… algorithm of Shi and Malik [1], originally presented as a graph-theoretic algorithm, can be interpreted … The clustering algorithm of Shi and Malik [1] views the data set as a graph, where nodes represent data points … are unlabeled and the goal is to recover the labels. In Section 7, we show how the SVM margin, typically used

Darrell, Trevor

343

A Graph-Theoretic Approach to Nonparametric Cluster Analysis

Nonparametric clustering algorithms, including mode-seeking, valley-seeking, and unimodal set algorithms, are capable of identifying generally shaped clusters of points in metric spaces. Most mode and valley-seeking algorithms, however, are iterative and the clusters obtained are dependent on the starting classification and the assumed number of clusters. In this paper, we present a noniterative, graph-theoretic approach to nonparametric cluster analysis. The

Warren L. G. Koontz; Patrenahalli M. Narendra; Keinosuke Fukunaga

1976-01-01

344

Clustering is ill-defined. Unlike supervised learning where labels lead to crisp performance criteria such as accuracy and squared error, clustering quality depends on how the clusters will be used. Devising clustering criteria that capture what users need is difficult. Most clustering algorithms search for one optimal clustering based on a prespecified clustering criterion. Once that clustering has

Rich Caruana; Mohamed Farid Elhawary; Nam Nguyen; Casey Smith

2006-01-01

345

… at molecular levels that occur before the change in morphology seen under the light microscope. An advantage … functional groups of chemical compounds absorb infrared radiation (IR) at characteristic frequencies

Garibaldi, Jon

346

The risk of breast cancer is increased by a number of factors, including breast density, considered to be the proportion of fibroglandular tissue in the breast. Breast density can be assessed in three-dimensional breast MR images. This involves analysis of a large volume of image data. Most MR-based density estimation methods are designed to work on images

G. Ertas; S. Reed; M. O. Leach

347

Static and Dynamic Information Organization with Star Clusters

Static and Dynamic Information Organization with Star Clusters. Javed Aslam, Katya Pelekhov, Daniela … on TREC data. We introduce the off-line and on-line star clustering algorithms for information organization … and average link clustering algorithms. Since the star algorithm is also highly efficient and simple

Aslam, Javed

348

Bayesian Decision Theoretical Framework for Clustering

ERIC Educational Resources Information Center

In this thesis, we establish a novel probabilistic framework for the data clustering problem from the perspective of Bayesian decision theory. The Bayesian decision theory view justifies the important questions: what is a cluster and what a clustering algorithm should optimize. We prove that the spectral clustering (to be specific, the…

Chen, Mo

2011-01-01

349

CACTUS—clustering categorical data using summaries

Clustering is an important data mining problem. Most of the earlier work on clustering focussed on numeric attributes which have a natural ordering on their attribute values. Recently, clustering data with categorical attributes, whose attribute values do not have a natural ordering, has received some attention. However, previous algorithms do not give a formal description of the clusters they discover

Venkatesh Ganti; Johannes Gehrke; Raghu Ramakrishnan

1999-01-01

350

Exploratory Consensus of Hierarchical Clusterings for Melanoma and Breast Cancer

Finding subtypes of heterogeneous diseases is the biggest challenge in the area of biology. Often, clustering is used to provide a hypothesis for the subtypes of a heterogeneous disease. However, there are usually discrepancies between the clusterings produced by different algorithms. This work introduces a simple method which provides the most consistent clusters across three different clustering algorithms for a

Pritha Mahata

2010-01-01

351

Time series clustering analysis of health-promoting behavior

NASA Astrophysics Data System (ADS)

Health promotion must be emphasized to achieve the World Health Organization goal of health for all. Since the global population is aging rapidly, the ComCare elder health-promoting service was developed by the Taiwan Institute for Information Industry in 2011. Based on the Pender health promotion model, the ComCare service offers five categories of health-promoting functions to address the everyday needs of seniors: nutrition management, social support, exercise management, health responsibility, and stress management. To assess the overall ComCare service and to improve understanding of the health-promoting behavior of elders, this study analyzed health-promoting behavioral data automatically collected by the ComCare monitoring system. In 30,638 session records collected for 249 elders from January 2012 to March 2013, behavior patterns were identified by a fuzzy c-means time series clustering algorithm combined with autocorrelation-based representation schemes. The analysis showed that the time series data for elder health-promoting behavior can be classified into four clusters. Each cluster reveals different health-promoting needs, frequencies, numbers of functions, and behaviors. The results can assist policymakers, health-care providers, and experts in medicine, public health, nursing, and psychology, and have been provided to the Taiwan National Health Insurance Administration to assess elder health-promoting behavior.
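An autocorrelation-based representation turns each behavioural time series into a fixed-length vector of autocorrelation coefficients that fuzzy c-means can then cluster. Below is a minimal sketch of the standard sample autocorrelation, assuming the analyst chooses the maximum lag; the study's exact representation schemes are not specified in the abstract, and the function name `acf` is hypothetical.

```python
from statistics import fmean


def acf(series, max_lag):
    """Sample autocorrelation at lags 1..max_lag (biased estimator normalised by lag 0)."""
    n = len(series)
    mu = fmean(series)
    dev = [v - mu for v in series]
    c0 = sum(d * d for d in dev)
    if c0 == 0:
        return [0.0] * max_lag  # a constant series carries no autocorrelation structure
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
            for k in range(1, max_lag + 1)]


if __name__ == "__main__":
    weekly_sessions = [3, 1, 4, 1, 5, 2, 6, 2, 5, 1, 4, 1]  # toy usage counts
    print(acf(weekly_sessions, max_lag=3))
```

Each elder's series would be mapped to `acf(series, max_lag)`, and the resulting vectors clustered, for example into the four groups reported above; because every vector has the same length, series of different durations become directly comparable.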

Yang, Chi-Ta; Hung, Yu-Shiang; Deng, Guang-Feng

2013-10-01

352

Clustering and Metaclustering with Nonnegative Matrix Decompositions

Although very widely used in unsupervised data mining, most clustering methods are affected by the instability of the resulting clusters w.r.t. the initialization of the algorithm (as e.g. in k-means). Here we show that this problem can be elegantly and efficiently tackled by meta-clustering the clusters produced in several different runs of the algorithm, especially if …

Liviu Badea

2005-01-01

353

Static and dynamic information organization with star clusters

In this paper we present a system for static and dynamic information organization and show our evaluations of this system on TREC data. We introduce the off-line and on-line star clustering algorithms for information organization. Our evaluation experiments show that the off-line star algorithm outperforms the single link and average link clustering algorithms. Since the star algorithm
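Because the abstract does not spell out the algorithm, the sketch below follows the commonly described off-line star procedure: keep only document pairs whose similarity is at least a threshold sigma, then repeatedly take the highest-degree unmarked vertex as a star centre and form a cluster from it and its neighbours (satellites). It is a minimal sketch assuming a dense symmetric similarity matrix; ties are broken by index, clusters may overlap because a satellite can neighbour more than one centre, and the function name is hypothetical.

```python
def star_clusters(similarity, sigma):
    """Off-line star clustering over a symmetric similarity matrix.

    Returns clusters as sorted lists of vertex indices; clusters may overlap.
    """
    n = len(similarity)
    # sigma-similarity graph: connect i and j when their similarity is at least sigma
    neighbours = [{j for j in range(n) if j != i and similarity[i][j] >= sigma}
                  for i in range(n)]
    marked = set()
    clusters = []
    # Degrees never change, so one pass in decreasing-degree order picks the
    # highest-degree unmarked vertex as the next star centre each time
    for v in sorted(range(n), key=lambda i: len(neighbours[i]), reverse=True):
        if v in marked:
            continue
        cluster = {v} | neighbours[v]  # centre plus its satellites
        clusters.append(sorted(cluster))
        marked |= cluster
    return clusters


if __name__ == "__main__":
    sim = [[1.0, 0.9, 0.8, 0.1],
           [0.9, 1.0, 0.7, 0.2],
           [0.8, 0.7, 1.0, 0.6],
           [0.1, 0.2, 0.6, 1.0]]
    print(star_clusters(sim, sigma=0.5))
```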

Javed A. Aslam; Katya Pelekhov; Daniela Rus

1998-01-01

354

DNA clustering and genome complexity.

Early global measures of genome complexity (power spectra, the analysis of fluctuations in DNA walks, or compositional segmentation) uncovered a high degree of complexity in eukaryotic genome sequences. The main evolutionary mechanisms leading to increases in genome complexity (i.e., gene duplication and transposon proliferation) can all potentially produce increases in DNA clustering. To quantify such clustering and provide a genome-wide description of the resulting clusters, we developed GenomeCluster, an algorithm able to detect clusters of any genome element identified by chromosome coordinates. We obtained a detailed description of clusters for ten categories of human genome elements, including functional (genes, exons, introns), regulatory (CpG islands, TFBSs, enhancers), variant (SNPs), and repeat (Alus, LINE1) elements, as well as DNase hypersensitivity sites. For each category, we located their clusters in the human genome, quantified cluster length and composition, and estimated the clustering level as the proportion of clustered genome elements. On average, we found 27% of elements in clusters, although there is considerable variation among categories. Genes form the lowest number of clusters, but these are the longest ones, both in bp and in average number of components, while the shortest clusters are formed by SNPs. Functional and regulatory elements (genes, CpG islands, TFBSs, enhancers) show the highest clustering level compared to DNase sites, repeats (Alus, LINE1), or SNPs. Many of the genome elements we analyzed are known to be composed of clusters of low-level entities. In addition, we found that the clusters generated by GenomeCluster can in turn be clustered into high-level super-clusters.
The observation of 'clusters-within-clusters' parallels the 'domains within domains' phenomenon previously detected through global statistical methods in eukaryotic sequences, and reveals a complex human genome landscape dominated by hierarchical clustering. PMID:25182383

Dios, Francisco; Barturen, Guillermo; Lebrón, Ricardo; Rueda, Antonio; Hackenberg, Michael; Oliver, José L

2014-12-01

355

Color segmentation using MDL clustering

NASA Astrophysics Data System (ADS)

This paper describes a procedure for segmentation of color face images. A cluster analysis algorithm uses a subsample of the input image color pixels to detect clusters in color space. The clustering program consists of two parts. The first part searches for a hierarchical clustering using the NIHC algorithm. The second part searches the resultant cluster tree for a level clustering having minimum description length (MDL). One of the primary advantages of the MDL paradigm is that it enables writing robust vision algorithms that do not depend on user-specified threshold parameters or other "magic numbers." This technical note describes an application of minimal length encoding in the analysis of digitized human face images at the NTT Human Interface Laboratories. We use MDL clustering to segment color images of human faces. For color segmentation we search for clusters in color space. Using only a subsample of points from the original face image, our clustering program detects color clusters corresponding to the hair, skin, and background regions in the image. A maximum likelihood classifier then assigns the remaining pixels to each class. The clustering program tends to group small facial features such as the nostrils, mouth, and eyes together, but they can be separated from the larger classes through connected components analysis.

Wallace, Richard S.; Suenaga, Yasuhito

1991-02-01

356

NASA Astrophysics Data System (ADS)

Infrared thermography has been used increasingly as an effective non-destructive technique to detect cracks on metal surfaces. Due to many factors, infrared thermal images have low definition compared to visible images. The contrast between cracks and sound areas in different thermal image frames of a specimen varies greatly with the recorded time. An accurate detection can only be obtained by looking over the whole thermal video, which is laborious. Moreover, the operator's experience strongly influences the accuracy of the detection result. In this paper, an infrared thermal image processing framework based on a superpixel algorithm is proposed to accomplish crack detection automatically. Two popular superpixel algorithms are compared, and one of them is selected to generate superpixels in this application. Combined superpixel features were selected from both the raw gray-level image and the high-pass filtered image. Fuzzy c-means clustering is used to cluster the superpixels in order to segment the infrared thermal image. Experimental results show that the proposed framework can automatically recognize cracks on metal surfaces in infrared thermal images.

Xu, Changhang; Xie, Jing; Chen, Guoming; Huang, Weiping

2014-11-01

357

Comparing clustering and partitioning strategies

NASA Astrophysics Data System (ADS)

In this work we use balance and edge-cut evaluation metrics to compare the performance of two well-known graph data-grouping algorithms applied to four web and social network graphs. One of the algorithms employs a partitioning technique using the Kmetis tool, and the other employs a clustering technique using the Scluster tool. Because clustering algorithms use a similarity measure between graph nodes, whereas partitioning algorithms use a dissimilarity measure (weight), it was necessary to apply a normalization function to convert weighted graphs into similarity matrices.
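The two evaluation metrics can be computed directly from a node-to-group assignment. Below is a minimal sketch, assuming an undirected weighted edge list and defining balance as the largest group size divided by the ideal size n/k (1.0 means perfectly balanced); the paper's exact definitions and its normalization function are not given in the abstract, and the function names are hypothetical.

```python
def edge_cut(edges, part):
    """Total weight of edges (u, v, w) whose endpoints lie in different groups."""
    return sum(w for u, v, w in edges if part[u] != part[v])


def balance(part, k):
    """Largest group size divided by the ideal size n / k (1.0 = perfectly balanced)."""
    sizes = [0] * k
    for p in part.values():
        sizes[p] += 1
    return max(sizes) / (len(part) / k)


if __name__ == "__main__":
    edges = [("a", "b", 1.0), ("b", "c", 2.0), ("c", "d", 1.0), ("d", "a", 3.0)]
    grouping = {"a": 0, "b": 0, "c": 1, "d": 1}
    print(edge_cut(edges, grouping), balance(grouping, k=2))
```

Lower edge-cut and balance values closer to 1.0 are both desirable, so the two metrics expose the usual trade-off between keeping related nodes together and keeping groups evenly sized.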

Afonso, Carlos; Ferreira, Fábio; Exposto, José; Pereira, Ana I.

2012-09-01

358

Dynamic Trajectory Extraction from Stereo Vision Using Fuzzy Clustering

NASA Astrophysics Data System (ADS)

In recent years, many human-tracking studies have been proposed to analyze human dynamic trajectories. These studies provide general technology applicable to various fields, such as customer purchase analysis in a shopping environment and safety control at a (railroad) crossing. In this paper, we present a new approach for tracking human positions from stereo images. We use a two-step clustering framework with the k-means method and fuzzy clustering to detect human regions. In the initial clustering, the k-means method quickly forms middle clusters from objective features extracted by stereo vision. In the final clustering, the fuzzy c-means method groups the middle clusters into human regions based on their attributes. By expressing ambiguity with fuzzy clustering, our method clusters correctly even when many people are close to each other. The validity of our technique was evaluated in an experiment extracting the trajectories of doctors and nurses in a hospital emergency room.

Onishi, Masaki; Yoda, Ikushi

359

Swarm Intelligence in Text Document Clustering

Social animals or insects in nature often exhibit a form of emergent collective behavior. The research field that attempts to design algorithms or distributed problem-solving devices inspired by the collective behavior of social insect colonies is called Swarm Intelligence. Compared to traditional algorithms, swarm algorithms are usually flexible, robust, decentralized, and self-organized. These characteristics make swarm algorithms suitable for solving complex problems such as document collection clustering. A major challenge of today's information society is that users are overwhelmed with information on any topic they search for. Fast and high-quality document clustering algorithms play an important role in helping users to effectively navigate, summarize, and organize this information. In this chapter, we introduce three nature-inspired swarm intelligence approaches for document clustering analysis. These clustering algorithms use stochastic and heuristic principles discovered from observing bird flocks, fish schools, and ant foraging.

Cui, Xiaohui [ORNL]; Potok, Thomas E [ORNL]

2008-01-01

360

The applicability and effectiveness of cluster analysis

NASA Technical Reports Server (NTRS)

An insight into the characteristics that determine the performance of a clustering algorithm is presented. For the techniques examined to cluster data accurately, two conditions must be satisfied simultaneously: first, the data must have a particular structure, and second, the parameters chosen for the clustering algorithm must be correct. Examining the structure of the data from the Cl flight line makes it clear that no single set of parameters can be used to accurately cluster all the different crops. The effectiveness of either a noniterative or an iterative clustering algorithm in accurately clustering data representative of the Cl flight line is therefore questionable. Thus, extensive a priori knowledge is required to use cluster analysis in its present form for applications such as assisting in the definition of field boundaries and evaluating the homogeneity of a field. New or modified techniques are necessary for clustering to be a reliable tool.

Ingram, D. S.; Actkinson, A. L.

1973-01-01

361

Online Software for Clustering

NSDL National Science Digital Library

This metasite provides informal reviews and links (mainly taken from electronic mailing lists and newsgroups) to clustering software that is free on the Internet. The software is accessible by anonymous FTP, Gopher, or World Wide Web. Examples of links annotated here include LVQ_PAK for Learning Vector Quantization algorithms, Tooldiag for the analysis and visualization of sensorial data, and Fixed Point Cluster Analysis. The site is maintained by Fionn Murtagh, Associate Professor of Astronomy at Louis Pasteur University's Strasbourg Observatory, France. This site is worth browsing by scientists interested in cluster analysis techniques for a variety of disciplines.

Murtagh, Fionn.

362

Speaker Clustering in Speech Recognition

The paper presents a combination of speaker and speech recognition techniques aimed at improving speech recognition rates. The combination is achieved by clustering the speaker models created from the training material. Each speaker model is a codebook obtained by the Vector Quantization (VQ) approach. We propose a metaclustering algorithm to group the codebooks into clusters and calculate the centroid codebooks. The last are thought

Olga Grebenskaya; Tomi Kinnunen; Pasi Fränti

2005-01-01

363

A framework for clustering evolving data streams

Clustering is a difficult problem in the data stream domain, because the large volumes of data arriving in a stream render most traditional algorithms too inefficient. In recent years, a few one-pass clustering algorithms have been developed for the data stream problem. Although such methods address the scalability issues of the clustering problem,

Charu C. Aggarwal; Jiawei Han; Jianyong Wang; Philip S. Yu

2003-01-01

364

Clustering Web Search Results Using Fuzzy Ants

Steven Schockaert, Martine De Cock, Chris Cornelis and … Uncertainty Modelling Research Unit, Krijgslaan 281 (S9), B-9000 Gent, Belgium. Algorithms for clustering Web … existing approaches and illustrates how our algorithm can be applied to the problem of Web search results

Gent, Universiteit

365

Gene Expression Data Knowledge Discovery using Global and Local Clustering

To understand complex biological systems, the research community has produced a huge corpus of gene expression data. A large number of clustering approaches have been proposed for the analysis of gene expression data; however, extracting important biological knowledge from these data remains difficult. To address this task, this paper uses a hybrid Hierarchical k-Means algorithm for clustering, together with biclustering of the gene expression data, so that both local and global clustering structure can be discovered. A validation technique, the Figure of Merit, is used to determine the quality of the clustering results. Appropriate knowledge is mined from the clusters by embedding a BLAST similarity search program into the clustering and biclustering process.

H, Swathi

2010-01-01

366

A GMBCG Galaxy Cluster Catalog of 55,424 Rich Clusters from SDSS DR7

We present a large catalog of optically selected galaxy clusters from the application of a new Gaussian Mixture Brightest Cluster Galaxy (GMBCG) algorithm to SDSS Data Release 7 data. The algorithm detects clusters by identifying the red sequence plus Brightest Cluster Galaxy (BCG) feature, which is unique to galaxy clusters and does not exist among field galaxies. Red sequence clustering in color space is detected using an Error Corrected Gaussian Mixture Model. We run GMBCG on 8240 square degrees of photometric data from SDSS DR7 to assemble the largest ever optical galaxy cluster catalog, consisting of over 55,000 rich clusters across the redshift range 0.1 < z < 0.55. We present Monte Carlo tests of completeness and purity and perform cross-matching with X-ray clusters and with the maxBCG sample at low redshift. These tests indicate high completeness and purity across the full redshift range for clusters with 15 or more members.
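
As a rough illustration of mixture-model clustering in color space, the sketch below fits a one-dimensional Gaussian mixture by expectation-maximization, so that a narrow red-sequence component can be separated from a broader background component. It is a plain GMM under stated assumptions: the function name `gmm_1d_em` and the starting parameters are hypothetical, and unlike the Error Corrected Gaussian Mixture Model named above, it ignores per-galaxy color errors.

```python
import math

def gmm_1d_em(x, mu, sigma, weight, n_iter=200):
    """Fit a 1-D Gaussian mixture to values x by expectation-maximization.

    mu, sigma, weight: initial means, standard deviations and mixing weights.
    Measurement errors on x are ignored (a plain, not error-corrected, GMM).
    """
    k = len(mu)
    mu, sigma, weight = list(mu), list(sigma), list(weight)
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each value
        resp = []
        for xi in x:
            dens = [weight[j] / (sigma[j] * math.sqrt(2 * math.pi))
                    * math.exp(-0.5 * ((xi - mu[j]) / sigma[j]) ** 2)
                    for j in range(k)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate each component from its responsibility-weighted values
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weight[j] = nj / len(x)
            mu[j] = sum(r[j] * xi for r, xi in zip(resp, x)) / nj
            var = sum(r[j] * (xi - mu[j]) ** 2 for r, xi in zip(resp, x)) / nj
            sigma[j] = max(math.sqrt(var), 1e-6)  # guard against a collapsed component
    return mu, sigma, weight
```

In this reading, the narrower, redder component plays the role of the red sequence. Starting values matter, since EM only finds a local optimum of the likelihood.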

Hao, Jiangang; /Fermilab; McKay, Timothy A.; /Michigan U.; Koester, Benjamin P.; /Chicago U.; Rykoff, Eli S.; /UC, Santa Barbara /LBL, Berkeley; Rozo, Eduardo; /Chicago U.; Annis, James; /Fermilab; Wechsler, Risa H.; /SLAC; Evrard, August; /Michigan U.; Siegel, Seth R.; /Michigan U.; Becker, Matthew; /Chicago U.; Busha, Michael; /SLAC; Gerdes, David; /Michigan U.; Johnston, David E.; /Fermilab; Sheldon, Erin; /Brookhaven

2011-08-22

367

Hierarchical clustering in minimum spanning trees.

The identification of clusters or communities in complex networks is a recurring problem. The minimum spanning tree (MST), the tree connecting all nodes with minimum total weight, is regarded as an important transport backbone of the original weighted graph. We hypothesize that clustering the MST reveals insight into the hierarchical structure of weighted graphs. However, existing theories and algorithms have difficulty defining and identifying clusters in trees. Here, we first define clustering in trees and then propose a tree agglomerative hierarchical clustering (TAHC) method for the detection of clusters in MSTs. We then demonstrate that the TAHC method can detect clusters in artificial trees, and also in MSTs of weighted social networks, for which the clusters are in agreement with the previously reported clusters of the original weighted networks. Our results therefore not only indicate that clusters can be found in MSTs, but also that the MSTs contain information about the underlying clusters of the original weighted network. PMID:25725643

Yu, Meichen; Hillebrand, Arjan; Tewarie, Prejaas; Meier, Jil; van Dijk, Bob; Van Mieghem, Piet; Stam, Cornelis Jan

2015-02-01

368

Hierarchical clustering in minimum spanning trees

NASA Astrophysics Data System (ADS)

The identification of clusters or communities in complex networks is a recurring problem. The minimum spanning tree (MST), the tree connecting all nodes with minimum total weight, is regarded as an important transport backbone of the original weighted graph. We hypothesize that clustering the MST reveals insight into the hierarchical structure of weighted graphs. However, existing theories and algorithms have difficulty defining and identifying clusters in trees. Here, we first define clustering in trees and then propose a tree agglomerative hierarchical clustering (TAHC) method for the detection of clusters in MSTs. We then demonstrate that the TAHC method can detect clusters in artificial trees, and also in MSTs of weighted social networks, for which the clusters are in agreement with the previously reported clusters of the original weighted networks. Our results therefore not only indicate that clusters can be found in MSTs, but also that the MSTs contain information about the underlying clusters of the original weighted network.

Yu, Meichen; Hillebrand, Arjan; Tewarie, Prejaas; Meier, Jil; van Dijk, Bob; Van Mieghem, Piet; Stam, Cornelis Jan

2015-02-01

369

Astrophysical parameters of Galactic open clusters

We present a catalogue of astrophysical data for 520 Galactic open clusters. These are the clusters for which at least three most probable members (18 on average) could be identified in the ASCC-2.5, a catalogue of stars based on the Tycho-2 observations from the Hipparcos mission. We applied homogeneous methods and algorithms to determine angular sizes of cluster cores and

N. V. Kharchenko; A. E. Piskunov; S. Röser; E. Schilbach; R.-D. Scholz

2005-01-01

370

An Evolutionary Approach to Multiobjective Clustering

The framework of multiobjective optimization is used to tackle the unsupervised learning problem, data clustering, following a formulation first proposed in the statistics literature. The conceptual advantages of the multiobjective formulation are discussed and an evolutionary approach to the problem is developed. The resulting algorithm, multiobjective clustering with automatic k-determination, is compared with a number of well-established single-objective clustering algorithms,

Julia Handl; Joshua D. Knowles

2007-01-01

371

Graph-Theoretical Methods for Detecting and Describing Gestalt Clusters

A family of graph-theoretical algorithms based on the minimal spanning tree is capable of detecting several kinds of cluster structure in arbitrary point sets; in some cases, the detected clusters can also be described by extensions of the method. Development of these clustering algorithms was based on examples from two-dimensional space because we wanted to copy the human perception of
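
A minimal sketch of MST-based clustering, assuming Euclidean points and a known number of clusters `k`: build the minimum spanning tree with Prim's algorithm and delete its `k - 1` longest edges, so that each remaining connected component is a cluster. This is a simplified stand-in rather than Zahn's exact procedure, which removes edges judged "inconsistent" relative to nearby edge lengths instead of the globally longest ones; `mst_clusters` is a hypothetical name.

```python
import math

def mst_clusters(points, k):
    """Label points by cutting the k-1 longest edges of their Euclidean MST."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n      # cheapest known edge weight from the tree to each point
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):         # Prim's algorithm, O(n^2) on the complete graph
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    # Keep the n-k shortest of the n-1 MST edges, i.e. drop the k-1 longest
    edges.sort()
    root = list(range(n))

    def find(i):
        while root[i] != i:
            root[i] = root[root[i]]  # path halving
            i = root[i]
        return i

    for _, a, b in edges[: n - k]:
        root[find(a)] = find(b)
    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(n)]
```

Cutting the globally longest MST edges yields the same partition as single-linkage clustering, which is why it follows elongated groupings well but can chain across narrow bridges of points.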

Charles T. Zahn

1971-01-01

372

Grid-based DBSCAN Algorithm with Referential Parameters

NASA Astrophysics Data System (ADS)

A new algorithm, GRPDBSCAN (Grid-based DBSCAN Algorithm with Referential Parameters), is proposed in this paper. GRPDBSCAN combines the grid partition technique with a multi-density-based clustering algorithm, which improves its efficiency. In addition, because the Eps and MinPts parameters of the DBSCAN algorithm are generated automatically, they are more objective. Experimental results show that the new algorithm not only better distinguishes noise from clusters and discovers clusters of arbitrary shape, but is also more robust.
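
For reference, the sketch below is plain DBSCAN with `eps` (Eps) and `min_pts` (MinPts) supplied by hand; it does not reproduce GRPDBSCAN's grid partitioning or its automatic parameter generation, and the function name is hypothetical. Points with at least `min_pts` neighbors within `eps` (counting themselves) are core points, clusters grow through chains of core points, and anything unreachable is labeled noise.

```python
import math

def dbscan(points, eps, min_pts):
    """Plain DBSCAN: returns a cluster id per point, or -1 for noise."""
    n = len(points)
    labels = [None] * n  # None marks an unvisited point

    def neighbors(i):
        # Brute-force range query; a grid index would speed this up
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1  # noise for now; may later turn out to be a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:  # j is a core point: keep expanding through it
                queue.extend(j_nbrs)
    return labels
```

The quadratic brute-force neighbor search is exactly the cost that a grid partition is meant to reduce.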

Darong, Huang; Peng, Wang

373

Star clusters are observed in almost every galaxy. In this thesis we address several fundamental problems concerning the formation, evolution and disruption of star clusters. From observations of (young) star clusters in the interacting galaxy M51, we found that clusters are formed in complexes of stars and star clusters. These complexes share similar properties with giant molecular clouds, from which

M. Gieles

2006-01-01

374

When children walk on their toes for no known reason, the condition is called Idiopathic Toe Walking (ITW). Assessing the true severity of ITW can be difficult because children can alter their gait while under observation in the clinic. The ability to monitor foot angle during daily life outside the clinic may improve the assessment of ITW. A foot-worn, battery-powered inertial sensing device has been designed to monitor patients' foot angle during daily activities. The monitor includes a 3-axis accelerometer, a 2-axis gyroscope, and a low-power microcontroller. The device is necessarily small, with limited battery capacity and processing power. Therefore, a high-accuracy but low-complexity inertial sensing algorithm is needed. This paper compares how well several low-complexity algorithms measure foot angle: accelerometer-only measurement, finite impulse response (FIR) and infinite impulse response (IIR) complementary filtering, and a new dynamic predict-correct style algorithm developed using fuzzy c-means clustering. A total of 11 subjects each walked 20 m with the inertial sensing device fixed to one foot: 10 m with normal gait and 10 m simulating toe walking. A cross-validation scheme was used to obtain a low-bias estimate of each algorithm's angle measurement accuracy. The new predict-correct algorithm achieved the lowest angle measurement error: <5° mean error during normal and toe walking. The IIR complementary filtering algorithm achieved nearly the same accuracy with less computational complexity. These two algorithms appear well suited to the foot-angle measurement problem and would be good candidates for use in a long-term monitoring device for toe-walking assessment. PMID:24050952
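
The IIR complementary filter mentioned above can be sketched as a first-order blend: integrate the gyroscope rate to track short-term changes, and pull slowly toward the accelerometer's gravity-based angle to cancel drift. The weight `alpha`, the sample period `dt`, the function names, and the axis convention in `accel_pitch_deg` are assumptions for illustration, not the parameters used in the study.

```python
import math

def accel_pitch_deg(ax, az):
    """Tilt angle in degrees from two gravity components (assumed axis convention)."""
    return math.degrees(math.atan2(ax, az))

def complementary_filter(gyro_dps, accel_deg, dt, alpha=0.98, angle0=0.0):
    """First-order (IIR) complementary filter.

    gyro_dps: angular-rate samples in deg/s; accel_deg: accelerometer angle samples in deg.
    alpha near 1 trusts the integrated gyro at short time scales, while the
    (1 - alpha) accelerometer term corrects long-term drift.
    """
    angle = angle0
    out = []
    for rate, acc in zip(gyro_dps, accel_deg):
        angle = alpha * (angle + rate * dt) + (1 - alpha) * acc
        out.append(angle)
    return out
```

With a constant gyroscope bias `b` and a steady accelerometer reading, the estimate settles at an offset of `alpha * b * dt / (1 - alpha)` from the accelerometer angle instead of drifting without bound, which is the reason for the accelerometer term.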

Chalmers, Eric; Le, Jonathan; Sukhdeep, Dulai; Watt, Joe; Andersen, John; Lou, Edmond

2014-01-01

375

Toward Parallel Document Clustering

A key challenge in the automated clustering of documents in large text corpora is the high cost of comparing documents in a multimillion-dimensional document space. The Anchors Hierarchy is a fast data structure and algorithm for localizing data based on a distance metric that obeys the triangle inequality; the algorithm strives to minimize the number of distance calculations needed to cluster the documents into “anchors” around reference documents called “pivots”. We extend the original algorithm to increase the amount of available parallelism and consider two implementations: a complex data structure that affords efficient searching, and a simple data structure that requires repeated sorting. The sorting implementation is integrated with a text corpora “Bag of Words” program, and initial performance results for an end-to-end document processing workflow are reported.
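
The triangle-inequality trick behind the Anchors Hierarchy can be shown with a simplified, serial sketch. Each anchor keeps its members sorted by decreasing distance to its pivot. When a new pivot q appears, a member x of the anchor with pivot p can only move if d(x, p) >= d(p, q)/2, because otherwise d(x, q) >= d(p, q) - d(x, p) > d(x, p), so scanning can stop at the first member below that bound. Choosing the farthest member of the widest anchor as the next pivot is an assumed heuristic, `build_anchors` is a hypothetical name, and none of the parallel extensions described above are included.

```python
import math

def build_anchors(points, n_anchors):
    """Greedy anchors construction with triangle-inequality pruning.

    Each anchor is [pivot_index, members], where members is a list of
    (distance_to_pivot, point_index) in decreasing distance order.
    Returns (anchors, number_of_distance_calls).
    """
    calls = 0

    def d(i, j):
        nonlocal calls
        calls += 1
        return math.dist(points[i], points[j])

    # Start with a single anchor whose pivot is point 0
    members = sorted(((d(0, i), i) for i in range(len(points))), reverse=True)
    anchors = [[0, members]]
    while len(anchors) < n_anchors:
        # Assumed heuristic: new pivot = farthest member of the widest anchor
        src = max(anchors, key=lambda a: a[1][0][0] if a[1] else -1.0)
        if not src[1] or src[1][0][0] == 0.0:
            break  # every point already coincides with its pivot
        new_pivot = src[1][0][1]
        stolen = []
        for anchor in anchors:
            pivot, mem = anchor
            half = d(pivot, new_pivot) / 2
            keep = []
            for pos, (dist_p, i) in enumerate(mem):
                if dist_p < half:
                    # d(i, new) >= 2*half - dist_p > dist_p, so this member and
                    # every later (closer) one stays without computing a distance
                    keep.extend(mem[pos:])
                    break
                dist_new = d(i, new_pivot)
                if dist_new < dist_p:
                    stolen.append((dist_new, i))
                else:
                    keep.append((dist_p, i))
            anchor[1] = keep
        anchors.append([new_pivot, sorted(stolen, reverse=True)])
    return anchors, calls
```

The returned call count makes the saving visible: on well-separated groups, most members of distant anchors are skipped by the early exit rather than compared against every new pivot.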

Mogill, Jace A.; Haglin, David J.

2011-09-01

376

Object Based Image Segmentation Using Fuzzy Clustering

Existing shape-based clustering algorithms, including fuzzy k-rings, fuzzy k-elliptical, circular c-shell, and fuzzy c-shell ellipsoidal, are all designed to segment regular geometrically shaped objects such as circles, ellipses, or combinations of both. These algorithms, however, are unsuitable for segmenting arbitrary-shaped objects, so, in an attempt to address this issue, a fuzzy image segmentation of generic shaped clusters (FISG) algorithm was

M. Ameer Ali; Laurence S Dooley; Gour C Karmakar

2006-01-01

377

Semi-supervised clustering methods

Cluster analysis methods seek to partition a data set into homogeneous subgroups. They are useful in a wide variety of applications, including document processing and modern genetics. Conventional clustering methods are unsupervised, meaning that there is no outcome variable and nothing is known about the relationship between the observations in the data set. In many situations, however, information about the clusters is available in addition to the values of the features. For example, the cluster labels of some observations may be known, or certain observations may be known to belong to the same cluster. In other cases, one may wish to identify clusters that are associated with a particular outcome variable. This review describes several clustering algorithms (known as “semi-supervised clustering” methods) that can be applied in these situations. The majority of these methods are modifications of the popular k-means clustering method, and several of them will be described in detail. A brief description of some other semi-supervised clustering algorithms is also provided. PMID:24729830
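
One simple member of the k-means family described above, often called seeded k-means, uses the labeled observations only to place the initial centroids and then runs ordinary k-means. The sketch assumes every cluster has at least one labeled seed. `seeded_kmeans` and the `seeds` format are hypothetical, and constraint-based variants (must-link or cannot-link pairs) would need a different assignment step.

```python
import math

def seeded_kmeans(points, seeds, n_iter=100):
    """k-means whose initial centroids are the means of labeled seed points.

    seeds: dict mapping cluster id (0..k-1) to a list of indices of labeled points.
    Returns (labels, centroids).
    """
    k = len(seeds)
    dims = len(points[0])

    def mean(idx):
        return tuple(sum(points[i][d] for i in idx) / len(idx) for d in range(dims))

    centroids = [mean(seeds[c]) for c in range(k)]  # seeds fix the starting point
    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest centroid for every point
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: recompute centroids, keeping the old one if a cluster empties
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if members:
                centroids[c] = mean(members)
    return labels, centroids
```

Swapping the seed labels swaps the output labels, which is how the known labels carry through to the final partition.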

Bair, Eric

2013-01-01

378

Characterizing cytoarchitecture is crucial for understanding brain functions and neural diseases. In neuroanatomy, an important task is to accurately extract the centroids and contours of cell populations. Recent advances have permitted imaging of an entire mouse brain at single-cell resolution using the Nissl staining method. However, it is difficult to segment numerous cells precisely, especially cells that touch each other. Here, we have developed an automated three-dimensional detection and segmentation method for Nissl staining data with two key steps: 1) clustering of concave points to determine the seed points of touching cells; and 2) random walker segmentation to obtain cell contours. We also evaluated the performance of the proposed method on several mouse brain datasets, including closely touching cells, captured with the micro-optical sectioning tomography imaging system. Compared with traditional detection and segmentation methods, our approach shows promising detection accuracy and high robustness. PMID:25111442

Gong, Hui; Chen, Shangbin; Zhang, Bin; Ding, Wenxiang; Luo, Qingming; Li, Anan

2014-01-01

379

Metal cluster chemistry is one of the most rapidly developing areas of inorganic and organometallic chemistry. Prior to 1960, only a few metal clusters were well characterized. However, shortly after the early development of boron cluster chemistry, the field of metal cluster chemistry began to grow very rapidly, and a structural and qualitative theoretical understanding of clusters came quickly. Analyzed here are the chemistry and general significance of clusters, with particular emphasis on the cluster research within my group. The importance of coordinatively unsaturated, very reactive metal clusters is the major subject of discussion.

Muetterties, Earl L.

1980-05-01

380

Determining the volume of an acute cerebral infarct on magnetic resonance imaging has prognostic value. However, semiautomatic segmentation is time-consuming and has high interrater variability. Using diffusion-weighted imaging and apparent diffusion coefficient maps from patients with acute infarction within 10 days, we aimed to develop a fully automatic algorithm to measure infarct volume. The algorithm includes an unsupervised classification step in which fuzzy c-means clustering of the histogram distribution defines self-adjusted intensity thresholds. The proposed method attained high agreement with the semiautomatic method, with a similarity index of 89.9 ± 6.5%, in detecting cerebral infarct lesions from 22 acute stroke patients. We demonstrated the accuracy of the proposed computer-assisted prompt segmentation method, which appears promising as a replacement for laborious, time-consuming, and operator-dependent semiautomatic segmentation. PMID:24738080
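
A minimal sketch of fuzzy c-means on scalar intensities, the kind of step that could derive an intensity threshold from an image's value distribution, is given below. It uses the standard updates: memberships proportional to d^(-2/(m-1)), normalized over clusters, and centers computed as u^m-weighted means, with fuzzifier m = 2. The function name, the starting centers, and the use of raw values rather than binned histogram counts are assumptions, not details from the study.

```python
def fcm_1d(x, centers, m=2.0, n_iter=100, tol=1e-9):
    """Fuzzy c-means on scalar values such as voxel intensities.

    Returns (centers, memberships); memberships[j][i] is the degree to which
    x[j] belongs to cluster i, as computed in the last iteration.
    """
    c = [float(ci) for ci in centers]
    p = 2.0 / (m - 1.0)
    for _ in range(n_iter):
        u = []
        for xj in x:
            d = [abs(xj - ci) for ci in c]
            if 0.0 in d:
                # Value sits exactly on a center: give it crisp membership there
                row = [1.0 if di == 0.0 else 0.0 for di in d]
                u.append([r / sum(row) for r in row])
            else:
                u.append([1.0 / sum((di / dk) ** p for dk in d) for di in d])
        # Centers are the u^m-weighted means of the data
        new_c = [sum(u[j][i] ** m * x[j] for j in range(len(x)))
                 / sum(u[j][i] ** m for j in range(len(x)))
                 for i in range(len(c))]
        shift = max(abs(a - b) for a, b in zip(new_c, c))
        c = new_c
        if shift < tol:
            break
    return c, u
```

With two clusters in one dimension, the memberships are equal exactly at the midpoint of the two centers, so that midpoint is one natural self-adjusting threshold.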

Tsai, Jang-Zern; Chen, Yu-Wei; Wang, Kuo-Wei; Wu, Hsiao-Kuang; Lin, Yun-Yu; Lee, Ying-Ying; Chen, Chi-Jen; Lin, Huey-Juan; Smith, Eric Edward; Hsin, Yue-Loong

2014-01-01

381

Using Star Clusters for Filtering

Javed Aslam, Katya Pelekhov, Daniela Rus, Department of Computer … to the filtering task. We use the on-line version of the star algorithm [JPR98, JPR99] as the clustering tool … algorithm for organizing static and dynamic information by topic using the star cluster algorithm. We do
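
The offline star algorithm can be sketched as a greedy cover of a thresholded similarity graph: connect documents whose similarity is at least `sigma`, then repeatedly take the highest-degree unmarked vertex as a star center and mark its neighbors as satellites. Cosine similarity over term-weight dictionaries is an assumed choice here, the function names are hypothetical, and the on-line version cited above, which updates the cover as documents arrive, is not shown.

```python
import math

def cosine(a, b):
    """Cosine similarity of two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def star_clusters(docs, sigma):
    """Offline star clustering; returns a list of (center, members)."""
    n = len(docs)
    # sigma-similarity graph as adjacency lists
    adj = [[j for j in range(n) if j != i and cosine(docs[i], docs[j]) >= sigma]
           for i in range(n)]
    marked = [False] * n
    clusters = []
    for v in sorted(range(n), key=lambda i: -len(adj[i])):  # highest degree first
        if marked[v]:
            continue
        marked[v] = True         # v becomes a star center
        for u in adj[v]:
            marked[u] = True     # its neighbors become satellites
        clusters.append((v, [v] + adj[v]))
    return clusters
```

Because a satellite may be adjacent to several centers, clusters can overlap, while no two star centers are adjacent to each other.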

Aslam, Javed

382

Using Star Clusters for Filtering

Javed Aslam, Katya Pelekhov, Daniela Rus, Department of Computer … to the filtering task. We use the on-line version of the star algorithm [JPR98, JPR99] as the clustering tool … efficient algorithm for organizing static and dynamic information by topic using the star cluster algorithm. We do

Aslam, Javed

383

Clustering Genes Using Gene Expression and Text Literature Data

Clustering of gene expression data is a standard technique used to identify closely related genes. In this paper, we develop a new clustering algorithm, MSC (Multi-Source Clustering), to perform exploratory analysis using two or more diverse sources of data. In particular, we investigate the problem of improving the clustering by integrating information obtained from gene expression data

Chengyong Yang; Erliang Zeng; Tao Li; Giri Narasimhan

2005-01-01

384

Automatic subspace clustering of high dimensional data for data mining applications

Data mining applications place special requirements on clustering algorithms including: the ability to find clusters embedded in subspaces of high dimensional data, scalability, end-user comprehensibility of the results, non-presumption of any canonical data distribution, and insensitivity to the order of input records. We present CLIQUE, a clustering algorithm that satisfies each of these requirements. CLIQUE identifies dense clusters in subspaces
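
CLIQUE's bottom-up search rests on a monotonicity property: if a unit is dense in a k-dimensional subspace, its projections onto lower-dimensional subspaces are dense too. The sketch below, which assumes coordinates scaled to [0, 1), finds dense one-dimensional units and then tests only the two-dimensional units built from them. The grid size `xi` and density threshold `tau` are the model's inputs. The function name is hypothetical, and higher dimensions, joining adjacent dense units into clusters, and minimal cluster descriptions are omitted.

```python
from collections import Counter
from itertools import combinations

def dense_units(points, xi, tau):
    """Dense grid units in 1-D and 2-D subspaces, found bottom-up.

    Each dimension is split into xi equal intervals over [0, 1). A unit is a
    frozenset of (dimension, interval) pairs; it is dense if it holds more
    than a fraction tau of all points.
    """
    n, dims = len(points), len(points[0])

    def cell(p, d):
        return min(int(p[d] * xi), xi - 1)

    def count(units):
        c = Counter()
        for p in points:
            for unit in units:
                if all(cell(p, d) == k for d, k in unit):
                    c[unit] += 1
        return c

    level1 = {frozenset([(d, cell(p, d))]) for p in points for d in range(dims)}
    dense1 = {u for u, c in count(level1).items() if c / n > tau}
    # Apriori-style pruning: only join dense 1-D units from different dimensions
    candidates = {a | b for a, b in combinations(dense1, 2)
                  if next(iter(a))[0] != next(iter(b))[0]}
    dense2 = {u for u, c in count(candidates).items() if c / n > tau}
    return dense1, dense2
```

The pruning step is what keeps the search tractable: a two-dimensional unit is never counted unless both of its one-dimensional projections already passed the density test.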

Rakesh Agrawal; Johannes E. Gehrke; Dimitrios Gunopulos; Prabhakar Raghavan

1998-01-01

385

A Knowledge-Driven Method to Evaluate Multi-source Clustering

Traditional exploratory analysis of gene expression data involves the application of clustering algorithms to obtain clusters of related genes. Recent research has focused on improving such analyses using additional biological information. It has been demonstrated that biological literature can complement the information extracted from gene expression data to obtain better gene clusters. The Multi-Source Clustering (MSC) algorithm, which was recently

Chengyong Yang; Erliang Zeng; Tao Li; Giri Narasimhan

2005-01-01

386

NASA Astrophysics Data System (ADS)

In typical Case 2 waters, accurate remote-sensing retrieval of chlorophyll a (chla) is still challenging. There is a widespread understanding that universally applicable water-constituent retrieval algorithms are currently not feasible, which has shifted the research focus to regionally specific implementations of powerful inversion methods. This study takes advantage of regionally specific chla algorithms, developed by the authors in previous work, and the characteristics of the Medium Resolution Imaging Spectrometer (MERIS) to study harmful algal events in the optically complex waters of the Galician Rias (NW Spain). Harmful algal events are a frequent phenomenon in this area, with direct and indirect impacts on mussel production, which is a very important economic activity for the local community. More than 240 × 10^6 kg of mussels per year are produced in these upwelling systems of high primary productivity. A MERIS archive covering nine years (2003-2012) was analysed using regionally specific chla algorithms, which were developed using multilayer perceptron (MLP) artificial neural networks and fuzzy c-means (FCM) clustering. FCM specifies zones (based on water-leaving reflectances) where the retrieval algorithms normally provide more reliable results. Monthly chla anomalies and other statistics were calculated for the nine-year MERIS archive. These results were then related to upwelling indices and other associated measurements to determine the driving forces behind specific phytoplankton blooms. The distribution and changes of chla are also discussed.

Gonzalez Vilas, L.; Castro Fernandez, M.; Spyrakos, E.; Torres Palenzuela, J.

2013-08-01

387

We introduce cluster superalgebras, a class of $\mathbb{Z}_2$-graded commutative algebras generalizing the cluster algebras of Fomin and Zelevinsky. These algebras contain odd coordinates that anticommute with each other and square to zero. A cluster superalgebra is defined with the help of a quiver satisfying certain conditions and specific transformations called mutations. The generators of a cluster superalgebra are Laurent polynomials whose denominators are even monomials. Both mutations and exchange relations generalize the classical ones. Every cluster superalgebra admits a presymplectic form invariant under mutations. Our main series of examples of cluster superalgebras is provided by superfriezes (arXiv:1501.07476), analogous to Coxeter's frieze patterns.
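
For comparison with the classical theory being generalized, the ordinary cluster algebra of type $A_2$ (no odd coordinates) shows the Laurent phenomenon in its simplest form. Starting from the initial cluster $(x_1, x_2)$, each mutation replaces one variable through the exchange relation $x_{n-1}\,x_{n+1} = 1 + x_n$, and every resulting cluster variable is a Laurent polynomial in $x_1, x_2$:

$$
\begin{aligned}
x_3 &= \frac{1 + x_2}{x_1}, &
x_4 &= \frac{1 + x_3}{x_2} = \frac{1 + x_1 + x_2}{x_1 x_2}, \\
x_5 &= \frac{1 + x_4}{x_3} = \frac{1 + x_1}{x_2}, &
x_6 &= \frac{1 + x_5}{x_4} = x_1, \qquad x_7 = x_2 .
\end{aligned}
$$

The sequence is periodic with period 5, so type $A_2$ has exactly five cluster variables.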

Valentin Ovsienko

2015-03-24

388

Clustering of financial time series

NASA Astrophysics Data System (ADS)

This paper addresses the topic of classifying financial time series in a fuzzy framework, proposing two fuzzy clustering models, both based on GARCH models. In general, because of their peculiar features, clustering financial time series requires the definition of suitable distance measures. To this end, the first fuzzy clustering model exploits the autoregressive representation of GARCH models and employs the classical autoregressive metric within a partitioning-around-medoids algorithm. The second fuzzy clustering model, also based on a partitioning-around-medoids algorithm, uses the Caiado distance, a Mahalanobis-like distance based on estimated GARCH parameters and covariances that takes into account information about the volatility structure of the time series. To illustrate the merits of the proposed fuzzy approaches, an application to the problem of classifying 29 time series of Euro exchange rates against international currencies is presented and discussed, and the fuzzy models are compared with their crisp versions.
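
Both models above share a fuzzy partitioning-around-medoids core that can be sketched independently of the distance: memberships take the fuzzy c-means form applied to dissimilarities, and each medoid is replaced by the object that minimizes the membership-weighted dissimilarity to it. The sketch takes a precomputed matrix `dist`, into which an autoregressive metric or a Caiado-type distance could be plugged. The function name, the fuzzifier `m = 2`, and the toy data are assumptions, and no GARCH estimation is included.

```python
def fuzzy_c_medoids(dist, medoids, m=2.0, n_iter=50):
    """Fuzzy c-medoids on a precomputed dissimilarity matrix dist[i][j].

    medoids: initial medoid indices. Returns (medoids, memberships), where
    memberships[j][i] is the degree to which object j belongs to cluster i.
    """
    n, k = len(dist), len(medoids)
    medoids = list(medoids)
    for _ in range(n_iter):
        u = []
        for j in range(n):
            d = [dist[j][c] for c in medoids]
            if 0.0 in d:
                # Object coincides with a medoid: crisp membership
                row = [1.0 if di == 0.0 else 0.0 for di in d]
                u.append([r / sum(row) for r in row])
            else:
                w = [(1.0 / di) ** (1.0 / (m - 1.0)) for di in d]
                u.append([wi / sum(w) for wi in w])
        # Each new medoid minimizes the membership-weighted dissimilarity of its cluster
        new = [min(range(n), key=lambda c: sum(u[j][i] ** m * dist[j][c] for j in range(n)))
               for i in range(k)]
        if new == medoids:
            break
        medoids = new
    return medoids, u
```

As with other medoid methods, the result depends on the starting medoids. For example, on the values 0, 1, 2, 10, 11, 12 with distance |a - b|, starting medoids at 0 and 12 converge to medoids at 1 and 11, while starting at 0 and 1 leaves both medoids in the lower group (at 0 and 2).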

D'Urso, Pierpaolo; Cappelli, Carmela; Di Lallo, Dario; Massari, Riccardo

2013-05-01

389

This article surveys the state of the art in quantum computer algorithms, including both black-box and non-black-box results. It is infeasible to detail all the known quantum algorithms, so a representative sample is given. This includes a summary of the early quantum algorithms, a description of the Abelian Hidden Subgroup algorithms (including Shor's factoring and discrete logarithm algorithms), quantum searching and amplitude amplification, quantum algorithms for simulating quantum mechanical systems, several non-trivial generalizations of the Abelian Hidden Subgroup Problem (and related techniques), the quantum walk paradigm for quantum algorithms, the paradigm of adiabatic algorithms, a family of "topological" algorithms, and algorithms for quantum tasks which cannot be done by a classical computer, followed by a discussion.

Michele Mosca

2008-08-04

390

Nonlinear analysis of EAS clusters

We apply certain methods of nonlinear time series analysis to the extensive air shower clusters found earlier in the data set obtained with the EAS-1000 Prototype array. In particular, we use the Grassberger-Procaccia algorithm to compute the correlation dimension of samples in the vicinity of the clusters. The validity of the results is checked by surrogate data tests and some
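
The Grassberger-Procaccia estimate can be sketched in a few lines: the correlation sum C(r) is the fraction of distinct point pairs closer than r, and the correlation dimension is the slope of log C(r) against log r in the scaling region. For a scalar record, points are usually built by time-delay embedding. The embedding dimension, lag, the two radii, and the function names are assumptions for illustration, and the surrogate-data tests mentioned above are not shown.

```python
import math

def delay_embed(series, dim, lag=1):
    """Time-delay embedding of a scalar series into dim-dimensional vectors."""
    return [tuple(series[i + k * lag] for k in range(dim))
            for i in range(len(series) - (dim - 1) * lag)]

def correlation_sum(points, r):
    """Grassberger-Procaccia C(r): fraction of distinct point pairs closer than r."""
    n = len(points)
    close = sum(1 for i in range(n) for j in range(i + 1, n)
                if math.dist(points[i], points[j]) < r)
    return 2.0 * close / (n * (n - 1))

def correlation_dimension(points, r1, r2):
    """Slope of log C(r) between two radii assumed to lie in the scaling region."""
    c1, c2 = correlation_sum(points, r1), correlation_sum(points, r2)
    return (math.log(c2) - math.log(c1)) / (math.log(r2) - math.log(r1))
```

A fuller analysis would fit the slope over many radii and repeat it for increasing embedding dimension, checking that the estimate saturates.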

M. Yu. Zotov; G. V. Kulikov; Yu. A. Fomin

2002-01-01

391

A Scalable Framework For Cluster Ensembles

An ensemble of clustering solutions or partitions may be generated for a number of reasons. If the data set is very large, clustering may be done on disjoint subsets of tractable size. The data may also be distributed across different sites, for which a distributed clustering solution with a final merging of partitions is a natural fit. In this paper, two new approaches to combining partitions, represented by sets of cluster centers, are introduced. The advantage of these approaches is that they provide a final partition of data that is comparable to the best existing approaches, yet scale to extremely large data sets. They can be 100,000 times faster while using much less memory. The new algorithms are compared against the best existing cluster ensemble merging approaches, against clustering all the data at once, and against a clustering algorithm designed for very large data sets. The comparison is done for fuzzy and hard k-means-based clustering algorithms. It is shown that the centroid-based ensemble merging algorithms presented here generate partitions of quality comparable to the best label vector approach or to clustering all the data at once, while providing very large speedups. PMID:20160846

Hore, Prodip; Hall, Lawrence O.; Goldgof, Dmitry B.

2009-01-01

392

Cluster merging based on weighted Mahalanobis distance with application in digital mammography

A new clustering algorithm that uses a weighted Mahalanobis distance as a distance metric to perform partitional clustering is proposed. The covariance matrices of the generated clusters are used to determine cluster similarity and closeness, so that clusters which are similar in shape and close in Mahalanobis distance can be merged together, serving the ultimate goal of automatically determining the

K. Younis; M. Karim; R. Hardie; J. Loomis; S. Rogers; M. DeSimio

1998-01-01

393

Gene expression data clustering using a multiobjective symmetry based clustering technique.

The invention of microarrays has rapidly changed the state of biological and biomedical research. Clustering algorithms play an important role in clustering microarray data sets, where identifying groups of co-expressed genes is a very difficult task. Here, we pose the problem of clustering microarray data as a multiobjective clustering problem. A new symmetry-based fuzzy clustering technique is developed to solve this problem. The effectiveness of the proposed technique is demonstrated on five publicly available benchmark data sets. Results are compared with some widely used microarray clustering techniques. Statistical and biological significance tests have also been carried out. PMID:24209942

Saha, Sriparna; Ekbal, Asif; Gupta, Kshitija; Bandyopadhyay, Sanghamitra

2013-11-01

394

SMART: Unique Splitting-While-Merging Framework for Gene Clustering

Successful clustering algorithms are highly dependent on parameter settings. Clustering performance degrades significantly unless parameters are properly set, and yet it is difficult to set these parameters a priori. To address this issue, in this paper we propose a unique splitting-while-merging clustering framework, named “splitting merging awareness tactics” (SMART), which does not require any a priori knowledge of either the number of clusters or even the possible range of this number. Unlike existing self-splitting algorithms, which over-cluster the dataset into a large number of clusters and then merge some similar clusters, our framework can split and merge clusters automatically during the process and produces the most reliable clustering results by intrinsically integrating many clustering techniques and tasks. The SMART framework is implemented with two distinct clustering paradigms in two algorithms: competitive learning and finite mixture model. Moreover, many other algorithms for different clustering paradigms can be derived within the proposed SMART framework. The minimum message length algorithm is integrated into the framework as the clustering selection criterion. The usefulness of the SMART framework and its algorithms is tested on demonstration datasets and simulated gene expression datasets. Two real microarray gene expression datasets are also studied using this approach. Across many performance metrics, all numerical results show that SMART is superior to the existing self-splitting algorithms and traditional algorithms it was compared with. The three main properties of the proposed SMART framework are: (1) it needs no parameters dependent on the respective dataset or a priori knowledge about the datasets, (2) it is extendible to many different applications, and (3) it offers superior performance compared with counterpart algorithms. PMID:24714159

Fa, Rui; Roberts, David J.; Nandi, Asoke K.

2014-01-01

395

To detect and classify network intrusion data effectively and accurately, this paper introduces a general regression neural network (GRNN) based on the artificial immune algorithm with elitist strategies (AIAE). The elitist archive and elitist crossover were combined with the artificial immune algorithm (AIA) to produce the AIAE-GRNN algorithm, with the aim of improving its adaptivity and accuracy. In this paper, the mean squared error (MSE) was used as the affinity function. The AIAE was used to optimize the smoothing factors of the GRNN; the optimal smoothing factor was then obtained and substituted into the trained GRNN, and the intrusion data were classified. For comparison, the paper also considered a GRNN separately optimized using a genetic algorithm (GA), particle swarm optimization (PSO), and fuzzy c-means clustering (FCM). The results first showed that AIAE-GRNN achieves higher classification accuracy than PSO-GRNN, but at the cost of a long running time. FCM and GA-GRNN were eliminated because of their deficiencies in accuracy and convergence. To improve the running speed, the paper adopted principal component analysis (PCA) to reduce the dimensionality of the intrusion data. With the reduced dimensionality, PCA-AIAE-GRNN loses less accuracy and has better convergence than PCA-PSO-GRNN, and the running speed of PCA-AIAE-GRNN was relatively improved. The experimental results show that AIAE-GRNN has higher robustness and accuracy than the other algorithms considered and can thus be used to classify intrusion data. PMID:25807466
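
A GRNN's prediction is a Gaussian-kernel-weighted average of the training targets, so in its basic form a single smoothing factor `sigma` largely controls its behavior. The sketch below shows that prediction rule with one-hot class targets, plus a plain grid search over `sigma` on validation MSE as a stand-in for the AIAE, PSO, and GA optimizers compared in the paper. The function names and candidate values are hypothetical.

```python
import math

def grnn_predict(train_x, train_y, x, sigma):
    """GRNN output: Gaussian-kernel-weighted average of training target vectors."""
    w = [math.exp(-math.dist(x, xi) ** 2 / (2 * sigma ** 2)) for xi in train_x]
    total = sum(w)
    if total == 0.0:
        # Every kernel underflowed: fall back to the nearest training sample
        nearest = min(range(len(train_x)), key=lambda i: math.dist(x, train_x[i]))
        return list(train_y[nearest])
    return [sum(wi * yi[k] for wi, yi in zip(w, train_y)) / total
            for k in range(len(train_y[0]))]

def tune_sigma(train_x, train_y, val_x, val_y, candidates):
    """Return the candidate smoothing factor with the lowest validation MSE."""
    def mse(sigma):
        err = 0.0
        for x, y in zip(val_x, val_y):
            pred = grnn_predict(train_x, train_y, x, sigma)
            err += sum((p - t) ** 2 for p, t in zip(pred, y)) / len(y)
        return err / len(val_x)
    return min(candidates, key=mse)
```

A class would be read off as the index of the largest output. A population-based optimizer would replace only `tune_sigma`; the prediction rule stays the same.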

Wu, Jianfa; Peng, Dahao; Li, Zhuping; Zhao, Li; Ling, Huanzhang

2015-01-01
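A GRNN predicts a Gaussian-kernel-weighted average of the training targets, with the smoothing factor σ setting the kernel width. The minimal Python sketch below shows that prediction rule and selects σ by validation MSE (the affinity function named in the abstract). It swaps in a plain grid search for the AIAE optimizer; `grnn_predict`, `mse`, and `select_sigma` are hypothetical names, not the authors' code.

```python
import math

def grnn_predict(train_x, train_y, x, sigma):
    """GRNN output: Gaussian-kernel-weighted average of the training targets."""
    weights = []
    for xi in train_x:
        d2 = sum((a - b) ** 2 for a, b in zip(xi, x))
        weights.append(math.exp(-d2 / (2.0 * sigma ** 2)))
    total = sum(weights)
    if total == 0.0:
        # Every kernel underflowed; fall back to the mean target.
        return sum(train_y) / len(train_y)
    return sum(w * y for w, y in zip(weights, train_y)) / total

def mse(train_x, train_y, val_x, val_y, sigma):
    """Validation mean squared error for one smoothing factor."""
    errors = [(grnn_predict(train_x, train_y, x, sigma) - y) ** 2
              for x, y in zip(val_x, val_y)]
    return sum(errors) / len(errors)

def select_sigma(train_x, train_y, val_x, val_y, candidates):
    """Hypothetical stand-in for AIAE: grid search for the lowest validation MSE."""
    return min(candidates, key=lambda s: mse(train_x, train_y, val_x, val_y, s))
```

For example, `select_sigma(train_x, train_y, val_x, val_y, [0.1, 0.5, 2.0])` returns the candidate with the smallest validation error; an evolutionary optimizer such as AIAE would search the same objective over a continuous range instead.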

397

Feature Clustering for Accelerating Parallel Coordinate Descent

We demonstrate an approach for accelerating calculation of the regularization path for L1 sparse logistic regression problems. We show the benefit of feature clustering as a preconditioning step for parallel block-greedy coordinate descent algorithms.

Scherrer, Chad; Tewari, Ambuj; Halappanavar, Mahantesh; Haglin, David J.

2012-12-06

398

On evaluating clustering procedures for use in classification

NASA Technical Reports Server (NTRS)

The problem of evaluating clustering algorithms and their respective computer programs for use in a preprocessing step for classification is addressed. In clustering for classification the probability of correct classification is suggested as the ultimate measure of accuracy on training data. A means of implementing this criterion and a measure of cluster purity are discussed. Examples are given. A procedure for cluster labeling that is based on cluster purity and sample size is presented.

Pore, M. D.; Moritz, T. E.; Register, D. T.; Yao, S. S.; Eppler, W. G. (principal investigators)

1979-01-01
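As an illustration of purity-based labeling, the sketch below labels each cluster with its majority training class only when that class's share (the purity) and the cluster's sample count clear thresholds. It then reports the fraction of training samples whose cluster label matches their class, which estimates the probability of correct classification on training data. The thresholds `min_purity` and `min_size` and all function names are assumptions, not the report's procedure.

```python
from collections import Counter

def label_clusters(assignments, classes, min_purity=0.6, min_size=5):
    """Label each cluster with its majority class if it is pure and large enough.

    assignments[i] is the cluster of training sample i; classes[i] is its class.
    Clusters failing either (illustrative) threshold are labeled None.
    """
    members = {}
    for c, y in zip(assignments, classes):
        members.setdefault(c, []).append(y)
    labels, purity = {}, {}
    for c, ys in members.items():
        majority, count = Counter(ys).most_common(1)[0]
        purity[c] = count / len(ys)
        ok = purity[c] >= min_purity and len(ys) >= min_size
        labels[c] = majority if ok else None
    return labels, purity

def training_accuracy(assignments, classes, labels):
    """Fraction of training samples whose cluster label equals their true class."""
    hits = sum(1 for c, y in zip(assignments, classes) if labels.get(c) == y)
    return hits / len(classes)
```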

399

A Fast Implementation of the ISOCLUS Algorithm

NASA Technical Reports Server (NTRS)

Unsupervised clustering is a fundamental tool in numerous image processing and remote sensing applications. For example, unsupervised clustering is often used to obtain vegetation maps of an area of interest. This approach is useful when reliable training data are either scarce or expensive, and when relatively little a priori information about the data is available. Unsupervised clustering methods play a significant role in the pursuit of unsupervised classification. One of the most popular and widely used clustering schemes for remote sensing applications is the ISOCLUS algorithm, which is based on the ISODATA method. The algorithm is given a set of n data points (or samples) in d-dimensional space, an integer k indicating the initial number of clusters, and a number of additional parameters. The general goal is to compute a set of cluster centers in d-space. Although there is no specific optimization criterion, the algorithm is similar in spirit to the well-known k-means clustering method, in which the objective is to minimize the average squared distance of each point to its nearest center, called the average distortion. One significant feature of ISOCLUS over k-means is that clusters may be merged or split, and so the final number of clusters may be different from the number k supplied as part of the input. This algorithm is described later in this paper. The ISOCLUS algorithm can run very slowly, particularly on large data sets. Given its wide use in remote sensing, its efficient computation is an important goal. We have developed a fast implementation of the ISOCLUS algorithm. Our improvement is based on a recent acceleration to the k-means algorithm, the filtering algorithm, by Kanungo et al. They showed that, by storing the data in a kd-tree, it was possible to significantly reduce the running time of k-means. We have adapted this method for the ISOCLUS algorithm.
For technical reasons, which are explained later, it is necessary to make a minor modification to the ISOCLUS specification. We provide empirical evidence, on both synthetic and Landsat image data sets, that our algorithm's performance is essentially the same as that of ISOCLUS, but with significantly lower running times. We show that our algorithm runs from 3 to 30 times faster than a straightforward implementation of ISOCLUS. Our adaptation of the filtering algorithm involves the efficient computation of a number of cluster statistics that are needed for ISOCLUS, but not for k-means.

Memarsadeghi, Nargess; Mount, David M.; Netanyahu, Nathan S.; LeMoigne, Jacqueline

2003-01-01
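The average distortion mentioned above is simple to compute, and the basic k-means (Lloyd) iteration that ISOCLUS resembles alternates nearest-center assignment with mean updates. This Python sketch assumes squared Euclidean distance and a fixed number of iterations; it omits the ISOCLUS split/merge steps and the kd-tree filtering acceleration.

```python
def nearest(point, centers):
    """Index of the center closest to point (squared Euclidean distance)."""
    return min(range(len(centers)),
               key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centers[j])))

def average_distortion(points, centers):
    """Mean squared distance from each point to its nearest center."""
    total = 0.0
    for p in points:
        c = centers[nearest(p, centers)]
        total += sum((a - b) ** 2 for a, b in zip(p, c))
    return total / len(points)

def lloyd_kmeans(points, centers, iterations=20):
    """Plain Lloyd iterations: assign to the nearest center, then recompute means."""
    centers = [list(c) for c in centers]
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for p in points:
            groups[nearest(p, centers)].append(p)
        for j, g in enumerate(groups):
            if g:  # an empty cluster keeps its previous center
                centers[j] = [sum(coord) / len(g) for coord in zip(*g)]
    return centers
```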

400

A new approach to effective circuit clustering

It is pointed out that the complexity of next-generation VLSI systems will exceed the capabilities of top-down layout synthesis algorithms, particularly in netlist partitioning and module placement. Bottom-up clustering is needed to condense the netlist so that the problem size becomes tractable to existing optimization methods. Here, the DS quality measure, a general metric for evaluation of clustering algorithms, is

Lars Hagen; A. B. Kahng

1992-01-01

401

Analyzing geographic clustered response

In the study of geographic disease clusters, an alternative to traditional methods based on rates is to analyze case locations on a transformed map in which population density is everywhere equal. Although the analyst's task is thereby simplified, the specification of the density equalizing map projection (DEMP) itself is not simple and continues to be the subject of considerable research. Here a new DEMP algorithm is described, which avoids some of the difficulties of earlier approaches. The new algorithm (a) avoids illegal overlapping of transformed polygons; (b) finds the unique solution that minimizes map distortion; (c) provides constant magnification over each map polygon; (d) defines a continuous transformation over the entire map domain; (e) defines an inverse transformation; (f) can accept optional constraints such as fixed boundaries; and (g) can use commercially supported minimization software. Work is continuing to improve computing efficiency and improve the algorithm. 21 refs., 15 figs., 2 tabs.

Merrill, D.W.; Selvin, S.; Mohr, M.S.

1991-08-01

402

The Sloan Nearby Cluster Weak Lensing Survey

We describe and present initial results of a weak lensing survey of nearby (z ≲ 0.1) galaxy clusters in the Sloan Digital Sky Survey (SDSS). In this first study, galaxy clusters are selected from the SDSS spectroscopic galaxy cluster catalogs of Miller et al. and Berlind et al. We report a total of seven individual low-redshift cluster weak lensing measurements that include A2048, A1767, A2244, A1066, A2199, and two clusters specifically identified with the C4 algorithm. Our program of weak lensing of nearby galaxy clusters in the SDSS will eventually reach ~200 clusters, making it the largest weak lensing survey of individual galaxy clusters to date.

Kubo, Jeffrey M.; /Fermilab; Annis, James T.; /Fermilab; Hardin, Frances Mei; /Illinois Math. Sci. Acad.; Kubik, Donna; /Fermilab; Lawhorn, Kelsey; /Illinois Math. Sci. Acad.; Lin, Huan; /Fermilab; Nicklaus, Liana; /Illinois Math. Sci. Acad.; Nelson, Dylan; /UC, Berkeley; Reis, Ribamar Rondon de Rezende; /Fermilab; Seo, Hee-Jong; /Fermilab; Soares-Santos, Marcelle; /Fermilab /Inst. Geo. Astron., Havana /Sao Paulo U. /Fermilab

2009-08-01

403

NASA Astrophysics Data System (ADS)

Star clusters are observed in almost every galaxy. In this thesis we address several fundamental problems concerning the formation, evolution and disruption of star clusters. From observations of (young) star clusters in the interacting galaxy M51, we found that clusters are formed in complexes of stars and star clusters. These complexes share similar properties with giant molecular clouds, from which they are formed. Many (70%) of the young clusters will not survive the first 10 Myr, due to the removal of leftover gas. We study the evolution of clusters that have survived this first 10 Myr to become bound star clusters that have cleared their primordial gas content. We determined the lifetime of such star clusters in M51 and the solar neighbourhood and compared these values, including existing values from the literature, to the results of N-body simulations. These simulations consider realistic star clusters, with a stellar initial mass function, stellar evolution, accurate treatments of binaries and the tidal field of the host galaxy. We found that the observed disruption times of clusters in the solar neighbourhood and M51 are shorter than predicted by the simulations by a factor of 5 and 10, respectively. We studied the effect of additional perturbations by spiral arm crossings and encounters with giant molecular clouds with N-body simulations. We found that the mass loss due to these external perturbations, combined with the mass loss due to stellar evolution and the galactic tidal field, can explain the observed disruption times. The star clusters in the solar neighbourhood have much lower masses than the young clusters observed in merging and interacting galaxies. We show that this can be largely explained by size-of-sample effects, that is, when more star clusters are observed, the chance of finding a more massive one is higher.
However, we showed that there can exist a physical maximum to the cluster mass, which should be observable in the cluster luminosity function. We found this observational signature in the luminosity function of clusters in M51. A comparison to a cluster population model, which was developed for this thesis research, suggests that the maximum cluster mass in M51 is 5×10^5 solar masses. In the merging Antennae galaxies a similar luminosity function was observed. However, the maximum mass is four times higher there, suggesting that the maximum mass depends on galactic environment.

Gieles, M.

2006-10-01

404

Minimum Spanning Tree Partitioning Algorithm for Microaggregation

This paper presents a clustering algorithm for partitioning a minimum spanning tree with a constraint on minimum group size. The problem is motivated by microaggregation, a disclosure limitation technique in which similar records are aggregated into groups containing a minimum of k records. Heuristic clustering methods are needed since the minimum information loss microaggregation problem is NP-hard. Our MST partitioning

Michael Laszlo; Sumitra Mukherjee

2005-01-01
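One simple greedy way to partition a minimum spanning tree under a minimum group size k is to visit MST edges from longest to shortest and cut an edge only when both resulting subtrees keep at least k records. The sketch assumes Euclidean distances and an O(n²) Prim construction; it illustrates the idea and is not necessarily the authors' exact MST partitioning rule. It also does not cap group sizes at 2k-1, as microaggregation methods often do.

```python
import math

def prim_mst(points):
    """MST edges (weight, u, v) of the complete Euclidean graph via O(n^2) Prim."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges

def component(adj, start):
    """Vertices reachable from start in the forest adj."""
    seen, stack = {start}, [start]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen

def mst_partition(points, k):
    """Greedily cut the longest MST edges whose removal leaves both sides with >= k points."""
    adj = {i: set() for i in range(len(points))}
    edges = prim_mst(points)
    for _, u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    for _, u, v in sorted(edges, reverse=True):
        adj[u].discard(v)
        adj[v].discard(u)
        if len(component(adj, u)) < k or len(component(adj, v)) < k:
            adj[u].add(v)  # the cut would create a group smaller than k; undo it
            adj[v].add(u)
    groups, seen = [], set()
    for i in adj:
        if i not in seen:
            g = component(adj, i)
            seen |= g
            groups.append(sorted(g))
    return groups
```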

405

Algorithms for color image edge enhancement using potential functions

This letter deals with color image edge enhancement using clustering ideas and potential functions (Parzen windows). Two algorithms are proposed. The first uses potential functions (PFs) and selects the output as the vector maximizing the PF. The second elaborates further by employing the mountain clustering method and modifying it appropriately. Both algorithms

D. Sindoukas; N. Laskaris; S. Fotopoulos

1997-01-01
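The first algorithm's selection rule can be sketched as follows: within a filter window, each color vector's potential is the sum of Gaussian (Parzen) kernels centred on all window vectors, and the vector with the largest potential is output. The Gaussian kernel, the width `h`, and the function names are assumptions for illustration.

```python
import math

def potential(candidate, window, h):
    """Parzen-window potential of a color vector: sum of Gaussian kernels over the window."""
    return sum(math.exp(-sum((a - b) ** 2 for a, b in zip(candidate, v)) / (2 * h * h))
               for v in window)

def pf_select(window, h=20.0):
    """Return the window vector with the highest potential (ties go to the first)."""
    return max(window, key=lambda v: potential(v, window, h))
```

Because outlying colors have low potential, the output tends to be a vector from the dominant color population in the window.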

406

A new approach for evolving clusters

The identification of clusters in data is important to many disciplines including artificial intelligence (knowledge acquisition), data mining, and pattern recognition. In this paper, we present a unique approach for using genetic algorithms (GAs) to perform supervised clustering. This technique is discussed in the context of the Genetic Rule and Classifier Construction Environment (GRaCCE), a data mining tool. While primarily

Robert E. Marmelstein; Gary B. Lamont

1999-01-01

407

Approximate Graph Matching and Computing Median Graph for Graph Clustering

We propose in this paper a new algorithm for computing the median of a set of graphs. The median graph is a useful tool for the clustering problem. The concept of median allows the extension of conventional algorithms such as the k-means to graph clustering, helping to bridge the gap between statistical and structural approaches to pattern recognition. An experimental

Adel Hlaoui; Shengrui Wang

408

Clustering Binary Data in the Presence of Masking Variables

ERIC Educational Resources Information Center

A number of important applications require the clustering of binary data sets. Traditional nonhierarchical cluster analysis techniques, such as the popular K-means algorithm, can often be successfully applied to these data sets. However, the presence of masking variables in a data set can impede the ability of the K-means algorithm to recover the…

Brusco, Michael J.

2004-01-01

409

Image Segmentation Using Higher-Order Correlation Clustering

… in the framework. Correlation clustering (CC), which is a graph-partitioning algorithm, was recently shown … segmentation. It derives its partitioning result from a pairwise graph by optimizing a global objective … of segmentations in an image. Correlation clustering (CC) is a graph-partitioning algorithm [13] that

Kohli, Pushmeet

411

An effective particle swarm optimization method for data clustering

Data clustering analysis is generally applied to image processing, customer relationship management, and product family construction. This paper applies the particle swarm optimization (PSO) algorithm to data clustering problems. Two reflex schemes are implemented in the PSO algorithm to improve its efficiency. The proposed methods were tested on seven datasets, and their performance is compared with those of PSO, K-means and two

I. W. Kao; C. Y. Tsai; Y. C. Wang

2007-01-01
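For context, a baseline PSO clustering approach encodes each particle as k concatenated centroids and minimizes the sum of squared distances from points to their nearest centroid. The sketch below uses the standard global-best velocity update with commonly used coefficients (w = 0.72, c1 = c2 = 1.49). It does not implement the paper's two reflex schemes, whose details are not given here, and `sse` and `pso_cluster` are hypothetical names.

```python
import random

def sse(flat, points, k, dim):
    """Sum of squared distances from each point to its nearest centroid."""
    cents = [flat[j * dim:(j + 1) * dim] for j in range(k)]
    return sum(min(sum((p[d] - c[d]) ** 2 for d in range(dim)) for c in cents)
               for p in points)

def pso_cluster(points, k, n_particles=20, iters=100, w=0.72, c1=1.49, c2=1.49, seed=0):
    """Global-best PSO over flattened centroid vectors; returns (centroids, SSE)."""
    rng = random.Random(seed)
    dim = len(points[0])
    # Each particle starts as k randomly chosen data points, flattened into one vector.
    pos = [[x for _ in range(k) for x in rng.choice(points)] for _ in range(n_particles)]
    vel = [[0.0] * (k * dim) for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_f = [sse(p, points, k, dim) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(k * dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            f = sse(pos[i], points, k, dim)
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i][:], f
                if f < gbest_f:
                    gbest, gbest_f = pos[i][:], f
    return [gbest[j * dim:(j + 1) * dim] for j in range(k)], gbest_f
```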

412

Clustering PPI data by combining FA and SHC method

Clustering is one of the main methods for identifying functional modules in protein-protein interaction (PPI) data. Nevertheless, traditional clustering methods may not be effective for clustering PPI data. In this paper, we propose a novel method for clustering PPI data that combines the firefly algorithm (FA) and the synchronization-based hierarchical clustering (SHC) algorithm. First, the PPI data are preprocessed via spectral clustering (SC), which transforms the high-dimensional similarity matrix into a low-dimensional matrix. Then the SHC algorithm is used to perform clustering. In the SHC algorithm, hierarchical clustering is achieved by continuously enlarging the neighborhood radius of synchronized objects, but it is very difficult for the hierarchical search to find the optimal synchronization neighborhood radius, and its efficiency is low. We therefore adopt the firefly algorithm to determine the optimal threshold of the synchronization neighborhood radius automatically. The proposed algorithm is tested on the MIPS PPI dataset. The results show that our algorithm outperforms the traditional algorithms in precision, recall, and F-measure. PMID:25707632

Lei, Xiujuan; Ying, Chao; Wu, Fang-Xiang; Xu, Jin

2015-01-01

413

ERIC Educational Resources Information Center

This volume contains a series of papers on algorithmic learning. Included are six reviews of research pertaining to various aspects of algorithmic learning, six reports of pilot experiments in this area, a theoretical discussion of "The Conditions for Algorithmic Imagination," and an annotated bibliography. All the papers assume a common…

Suydam, Marilyn N., Ed.; Osborne, Alan R., Ed.

414

NSDL National Science Digital Library

CSC 325. (MAT 325) Numerical Algorithms (3) Prerequisite: CSC 112 or 121, MAT 162. An introduction to the numerical algorithms fundamental to scientific computer work. Includes elementary discussion of error, polynomial interpolation, quadrature, linear systems of equations, solution of nonlinear equations and numerical solution of ordinary differential equations. The algorithmic approach and the efficient use of the computer are emphasized.

Dr Gene Tagliarini

415

Parallel K-Means Clustering Based on MapReduce

NASA Astrophysics Data System (ADS)

Data clustering has received considerable attention in many applications, such as data mining, document retrieval, image segmentation, and pattern classification. The growing volume of information produced by technological progress makes clustering very large-scale data a challenging task. To deal with this problem, many researchers try to design efficient parallel clustering algorithms. In this paper, we propose a parallel k-means clustering algorithm based on MapReduce, which is a simple yet powerful parallel programming technique. The experimental results demonstrate that the proposed algorithm scales well and efficiently processes large datasets on commodity hardware.

Zhao, Weizhong; Ma, Huifang; He, Qing
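A common way to express one k-means iteration in MapReduce form is for each map task to assign its points to the nearest center and emit per-center partial sums and counts (a combiner step), and for the reduce step to merge them into new means. The sketch below simulates this in plain Python over a list of input splits; it is not tied to Hadoop or to the authors' implementation, and the function names are hypothetical.

```python
def nearest(point, centers):
    """Index of the closest center (squared Euclidean distance)."""
    return min(range(len(centers)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(point, centers[j])))

def map_split(split, centers):
    """Map + combine for one input split: emit (center_id, (partial_sum, count))."""
    partial = {}
    for p in split:
        j = nearest(p, centers)
        s, n = partial.get(j, ([0.0] * len(p), 0))
        partial[j] = ([a + b for a, b in zip(s, p)], n + 1)
    return list(partial.items())

def reduce_centers(mapped, old_centers):
    """Reduce: merge partial sums per center id and divide by the total count."""
    sums, counts = {}, {}
    for j, (s, n) in mapped:
        sums[j] = [a + b for a, b in zip(sums[j], s)] if j in sums else list(s)
        counts[j] = counts.get(j, 0) + n
    # A center that received no points keeps its old position.
    return [[x / counts[j] for x in sums[j]] if j in sums else list(c)
            for j, c in enumerate(old_centers)]

def kmeans_iteration(splits, centers):
    """One MapReduce-style k-means iteration over data partitioned into splits."""
    mapped = [pair for split in splits for pair in map_split(split, centers)]
    return reduce_centers(mapped, centers)
```

Emitting partial sums instead of raw points keeps the shuffled data proportional to the number of centers per split rather than the number of points.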

416

A Study on Prognosis of Brain Tumors Using Fuzzy Logic and Genetic Algorithm Based Techniques

In the present study, an attempt has been made to determine the degree of malignancy of brain tumors using artificial intelligence. The suspicious regions in the brain, as suggested by radiologists, have been segmented using the fuzzy c-means clustering technique. Fourier descriptors are utilized for precise extraction of the boundary features of the tumor region. As Fourier descriptors introduce a large number of feature

Arpita Das; Mahua Bhattacharya

2009-01-01

417

New algorithm to test percolation conditions within the Newman-Ziff algorithm

NASA Astrophysics Data System (ADS)

A new algorithm to test percolation conditions for the solution of percolation problems on a lattice and continuum percolation for spaces of an arbitrary dimension has been proposed within the Newman-Ziff algorithm. The algorithm is based on the use of bitwise operators and does not reduce the efficiency of the operation of the Newman-Ziff algorithm as a whole. This algorithm makes it possible to verify the existence of both clusters touching boundaries at an arbitrary point and single-loop clusters continuously connecting the opposite boundaries in a percolating system with periodic boundary conditions. The existence of a cluster touching the boundaries of the system at an arbitrary point for each direction, the formation of a one-loop cluster, and the formation of a cluster with an arbitrary number of loops on a torus can be identified in one calculation by combining the proposed algorithm with the known approaches for the identification of the existence of a percolation cluster. The operation time of the proposed algorithm is linear in the number of objects in the system.

Tronin, I. V.

2014-05-01
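A sketch of the general idea, assuming site percolation on an L×L lattice with open boundaries: sites are occupied in random order as in Newman-Ziff, clusters are merged with a weighted union-find, and each cluster root carries a bitmask of the boundaries it touches, combined with a bitwise OR on every merge. It detects only a left-to-right spanning cluster; the wrapping and multi-loop tests on a torus described in the abstract are not covered, and `spanning_threshold` is a hypothetical name.

```python
import random

LEFT, RIGHT, TOP, BOTTOM = 1, 2, 4, 8  # one bit per lattice boundary

def find(parent, i):
    """Union-find root lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i

def spanning_threshold(L, seed=0):
    """Occupy sites of an L x L open lattice in random order and return how many
    sites were occupied when one cluster first touched both LEFT and RIGHT."""
    rng = random.Random(seed)
    order = list(range(L * L))
    rng.shuffle(order)
    parent = list(range(L * L))
    size = [1] * (L * L)
    mask = [0] * (L * L)
    occupied = [False] * (L * L)
    for n, site in enumerate(order, start=1):
        r, c = divmod(site, L)
        occupied[site] = True
        mask[site] = ((LEFT if c == 0 else 0) | (RIGHT if c == L - 1 else 0)
                      | (TOP if r == 0 else 0) | (BOTTOM if r == L - 1 else 0))
        root = site
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < L and 0 <= cc < L and occupied[rr * L + cc]:
                other = find(parent, rr * L + cc)
                if other != root:
                    if size[other] > size[root]:
                        root, other = other, root
                    parent[other] = root  # weighted union: smaller tree under larger
                    size[root] += size[other]
                    mask[root] |= mask[other]  # boundary flags merge with a bitwise OR
        if (mask[root] & (LEFT | RIGHT)) == (LEFT | RIGHT):
            return n
    return None
```

Because the spanning check is a single AND on the root's mask, it adds constant work per occupied site, which is consistent with the abstract's point that the test does not slow the Newman-Ziff sweep.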

418

The hierarchical algorithms--theory and applications

NASA Astrophysics Data System (ADS)

Monte Carlo simulations are one of the most important numerical techniques for investigating statistical physical systems. Among these systems, spin models are a typical example which also play an essential role in constructing the abstract mechanism for various complex systems. Unfortunately, traditional Monte Carlo algorithms are afflicted with "critical slowing down" near continuous phase transitions, and the efficiency of the Monte Carlo simulation goes to zero as the size of the lattice is increased. To combat critical slowing down, a very different type of collective-mode algorithm, in contrast to the traditional single-spin-flip mode, was proposed by Swendsen and Wang in 1987 for Potts spin models. Since then, there has been an explosion of work attempting to understand, improve, or generalize it. In these so-called "cluster" algorithms, clusters of spins are treated as single units and are updated at each step of the Monte Carlo procedure. In implementing these algorithms, cluster labeling is a major time-consuming bottleneck; it is also isomorphic to the problem of computing connected components of an undirected graph, seen in other application areas such as pattern recognition. A number of cluster labeling algorithms for sequential computers have long existed. However, the dynamic, irregular nature of clusters complicates the task of finding good parallel algorithms, and this is particularly true on SIMD (single-instruction-multiple-data) machines. Our design of the Hierarchical Cluster Labeling Algorithm aims at alleviating this problem by building a hierarchical structure on the problem domain and by incorporating local and nonlocal communication schemes. We present an estimate for the computational complexity of cluster labeling and prove the key features of this algorithm (such as lower computational complexity, data locality, and easy implementation) compared with the methods formerly known.
In particular, this algorithm can be viewed as a generalized scan scheme applicable to problem domains of any high dimension and of arbitrary geometry (scan is an important primitive of parallel computing). In addition, from implementation results, the hierarchical cluster labeling algorithm has proved to work equally well on MIMD machines, though originally designed for SIMD machines. Based on this success, we further study the hierarchical structure hidden in the algorithm. Hierarchical structure is a conceptual framework frequently used in building models for the study of a great variety of problems. This structure serves not only to describe the complexity of the system at different levels, but also to achieve some goals targeted by the problem, i.e., an algorithm to solve the problem. In this regard, we investigate the similarities and differences between this algorithm and others, including the FFT and the Barnes-Hut method, in terms of their hierarchical structures.

Su, Zheng-Yao
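Cluster labeling amounts to finding connected components, and a sequential baseline on a 2D lattice is breadth-first search. The sketch below labels 4-connected domains of equal spin values. It is a serial illustration rather than the hierarchical SIMD/MIMD algorithm, and labeling like-spin domains (instead of Swendsen-Wang bond clusters) is a simplifying assumption; `label_components` is a hypothetical name.

```python
from collections import deque

def label_components(grid):
    """Label 4-connected clusters of equal values in a 2D grid with breadth-first search.

    Returns (labels, count), where labels[r][c] is a cluster id starting at 0.
    """
    rows, cols = len(grid), len(grid[0])
    labels = [[-1] * cols for _ in range(rows)]
    count = 0
    for r0 in range(rows):
        for c0 in range(cols):
            if labels[r0][c0] != -1:
                continue  # already part of a labeled cluster
            labels[r0][c0] = count
            queue = deque([(r0, c0)])
            while queue:
                r, c = queue.popleft()
                for rr, cc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                    if (0 <= rr < rows and 0 <= cc < cols and labels[rr][cc] == -1
                            and grid[rr][cc] == grid[r][c]):
                        labels[rr][cc] = count
                        queue.append((rr, cc))
            count += 1
    return labels, count
```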

419

NASA Astrophysics Data System (ADS)

In this study we tested for groups of flares (flare clusters) in which successive flares occur within a fixed time, the linking window. The data set used is the flare waiting times provided by the X-ray flare detectors on the Geostationary Operational Environmental Satellites (GOES). The study was limited to flares of magnitude C5 and greater obtained during cycle 23. While many flares in a cluster may come from the same active region, the larger clusters often have origins in multiple regions. The longest cluster of the last cycle lasted more than 42 days with an average time separation between successive flares of 5 hours, where no two flares were separated by more than 36 hours. The flare rate in clusters is 4 to 6 times greater than the rate in solar maximum outside of flares. There are indications that flare clustering is associated with periods of multiple sunspot nests, but they are much rarer.

Title, Alan M.

2015-04-01
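The linking-window rule can be sketched as one-dimensional single linkage over sorted event times: an event joins the current cluster when its gap to the previous event is at most the window. Times in hours and the window value in the example are assumptions; `link_events` and `summarize` are hypothetical names.

```python
def link_events(times, window):
    """Group event times into clusters: consecutive events separated by at most
    `window` belong to the same cluster (single linkage in one dimension)."""
    clusters = []
    for t in sorted(times):
        if clusters and t - clusters[-1][-1] <= window:
            clusters[-1].append(t)
        else:
            clusters.append([t])
    return clusters

def summarize(cluster):
    """Duration and mean separation between successive events in one cluster."""
    duration = cluster[-1] - cluster[0]
    gaps = len(cluster) - 1
    return duration, (duration / gaps if gaps else 0.0)
```

For example, `link_events([0, 5, 30, 100, 104], 36)` gives `[[0, 5, 30], [100, 104]]`, and the first cluster has a duration of 30 and a mean separation of 15.0.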

420

Temporal event clustering for digital photo collections

We present similarity-based methods to cluster digital photos by time and image content. The approach is general, unsupervised, and makes minimal assumptions regarding the structure or statistics of the photo collection. We present results for the algorithm based solely on temporal similarity, and jointly on temporal and content-based similarity. We also describe a supervised algorithm based on learning vector quantization.

Matthew L. Cooper; Jonathan Foote; Andreas Girgensohn; Lynn Wilcox

2003-01-01

421

Cluster Analysis: An Application of Lagrangian Relaxation

This paper presents and tests an effective optimization algorithm for clustering homogeneous data. The algorithm iteratively employs a subgradient method for determining lower bounds and a simple search procedure for determining upper bounds. The overall objective is to assign n objects to m mutually exclusive

John M. Mulvey; Harlan P. Crowder

1979-01-01

422

Clustering and dimensionality reduction on Riemannian manifolds

We propose a novel algorithm for clustering data sampled from multiple submanifolds of a Riemannian manifold. First, we learn a representation of the data using generalizations of local nonlinear dimensionality reduction algorithms from Euclidean to Riemannian spaces. Such generalizations exploit geometric properties of the Riemannian space, particularly its Riemannian metric. Then, assuming that the data points from different groups

Alvina Goh; René Vidal

2008-01-01

423

Misty Mountain clustering: application to fast unsupervised flow cytometry gating

Background: There are many important clustering questions in computational biology for which no satisfactory method exists. Automated clustering algorithms, when applied to large, multidimensional datasets such as flow cytometry data, prove unsatisfactory in terms of speed, problems with local minima, or cluster shape bias. Model-based approaches are restricted by the assumptions of the fitting functions. Furthermore, model-based clustering requires serial clustering for all cluster numbers within a user-defined interval. The final cluster number is then selected by various criteria. These supervised serial clustering methods are time-consuming, and different criteria frequently result in different optimal cluster numbers. Various unsupervised heuristic approaches that have been developed, such as affinity propagation, are too expensive to be applied to datasets on the order of 10^6 points that are often generated by high-throughput experiments. Results: To circumvent these limitations, we developed a new, unsupervised density contour clustering algorithm, called Misty Mountain, that is based on percolation theory and that efficiently analyzes large data sets. The approach can be envisioned as a progressive top-down removal of clouds covering a data histogram relief map to identify clusters by the appearance of statistically distinct peaks and ridges. This is a parallel clustering method that finds every cluster after analyzing the cross sections of the histogram only once. The overall run time for the composite steps of the algorithm increases linearly with the number of data points. The clustering of 10^6 data points in 2D data space takes place within about 15 seconds on a standard laptop PC. Comparison of the performance of this algorithm with other state-of-the-art automated flow cytometry gating methods indicates that Misty Mountain provides substantial improvements in both run time and the accuracy of cluster assignment.
Conclusions: Misty Mountain is fast, unbiased for cluster shape, identifies stable clusters and is robust to noise. It provides a useful, general solution for multidimensional clustering problems. We demonstrate its suitability for automated gating of flow cytometry data. PMID:20932336

2010-01-01
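The cross-section idea can be sketched as follows: threshold a 2D histogram at each level from the top down and count the 4-connected groups of bins at or above that level, so that separate peaks show up as separate groups. This is a simplification; Misty Mountain's statistical test for distinct peaks and its cluster assignment step are not reproduced, and `count_islands` and `peak_profile` are hypothetical names.

```python
from collections import deque

def count_islands(hist, level):
    """Number of 4-connected groups of histogram bins whose count is >= level."""
    rows, cols = len(hist), len(hist[0])
    seen = [[False] * cols for _ in range(rows)]
    islands = 0
    for r0 in range(rows):
        for c0 in range(cols):
            if hist[r0][c0] < level or seen[r0][c0]:
                continue
            islands += 1
            seen[r0][c0] = True
            queue = deque([(r0, c0)])
            while queue:
                r, c = queue.popleft()
                for rr, cc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                    if (0 <= rr < rows and 0 <= cc < cols and not seen[rr][cc]
                            and hist[rr][cc] >= level):
                        seen[rr][cc] = True
                        queue.append((rr, cc))
    return islands

def peak_profile(hist):
    """Island count for each cross-section, sweeping the level from the top down."""
    top = max(max(row) for row in hist)
    return {level: count_islands(hist, level) for level in range(top, 0, -1)}
```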

424

NSDL National Science Digital Library

Content prepared for the Supercomputing 2002 session on "Using Clustering Technologies in the Classroom". Contains a series of exercises for teaching parallel computing concepts through kinesthetic activities.

Paul Gray

425

Industrial clusters, especially export-oriented clusters, are rather new and emerging strategies for companies and countries to achieve export development throughout the world. According to Porter (1998) in his well-known paper "Clusters and the new economics of competition", “paradoxically, the enduring competitive advantages in a global economy lie increasingly in local things – knowledge, relationships, and motivation that distant rivals

Seyed Vahid Moosavi; Mahdi Noorizadegan

426

Clustering Algorithm for Mutually Constraining Heterogeneous Features

Jet Propulsion Laboratory, California Institute of Technology, Pasadena, CA 91109-8099; wolfgang.fink@.jpl.nasa.gov; rebecca.castano@jpl.nasa.gov; Ashley Davies; Eric Mjolsness; Jet Propulsion Laboratory, California Institute

427

CLUSTER AUTOMORPHISMS AND COMPATIBILITY OF CLUSTER VARIABLES

Ibrahim Assem, Ralf Schiffler. … a notion of unistructural cluster algebras, for which the set of cluster variables uniquely determines the clusters. We prove that cluster algebras of Dynkin type and cluster algebras of rank 2 are unistructural

Paris-Sud XI, Université de

428

Photometry of Standard Stars and Open Star Clusters

NASA Astrophysics Data System (ADS)

Photometric CCD observations of open star clusters and standard stars were carried out at the McDonald Observatory in Fort Davis, Texas. These data were analyzed using aperture photometry algorithms (DAOPHOT II and ALLSTAR) and the IRAF software package. Color-magnitude diagrams of these clusters were produced, showing the evolution of each cluster along the main sequence.

Jefferies, Amanda; Frinchaboy, Peter

2010-10-01

429

Comparison of Similarity Measures in Cluster Analysis with Binary Data.

ERIC Educational Resources Information Center

One set of approaches to the problem of clustering with dichotomous data in cluster analysis (CA) was studied. The techniques developed for clustering with binary data involve calculating distances between observations based on the variables and then applying one of the standard CA algorithms to these distances. One of the groups of distances that…

Finch, Holmes; Huynh, Huynh
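As an example of the distance-based approach, two widely used binary similarity coefficients can be computed from the 2×2 agreement counts of a pair of observations; 1 minus either value gives a distance that a standard clustering algorithm can consume. Which coefficients the study compared is not stated in the snippet, so these two are illustrative, and the function names are hypothetical.

```python
def binary_counts(x, y):
    """2x2 agreement counts for two 0/1 vectors: (a, b, c, d) = (1-1, 1-0, 0-1, 0-0)."""
    a = sum(1 for u, v in zip(x, y) if u == 1 and v == 1)
    b = sum(1 for u, v in zip(x, y) if u == 1 and v == 0)
    c = sum(1 for u, v in zip(x, y) if u == 0 and v == 1)
    d = sum(1 for u, v in zip(x, y) if u == 0 and v == 0)
    return a, b, c, d

def simple_matching(x, y):
    """Share of positions that agree, counting joint absences (0-0) as agreement."""
    a, b, c, d = binary_counts(x, y)
    return (a + d) / (a + b + c + d)

def jaccard(x, y):
    """Joint presences over positions where either is present; ignores 0-0 matches.
    Two all-zero vectors are treated as identical by convention here."""
    a, b, c, _ = binary_counts(x, y)
    return a / (a + b + c) if (a + b + c) else 1.0
```

The two coefficients differ only in whether shared absences count as similarity, which can change cluster recovery when many variables are rarely endorsed.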

430

A framework for ontology-driven subspace clustering

Traditional clustering is a descriptive task that seeks to identify homogeneous groups of objects based on the values of their attributes. While domain knowledge is always the best way to justify clustering, few clustering algorithms have ever taken domain knowledge into consideration. In this paper, the domain knowledge is represented by a hierarchical ontology. We develop a framework by directly incorporating

Jinze Liu; Wei Wang; Jiong Yang

2004-01-01

431

Structure-Based Statistical Features and Multivariate Time Series Clustering

We propose a new method for clustering multivariate time series. A univariate time series can be represented by a fixed-length vector whose components are statistical features of the time series, capturing the global structure. These descriptive vectors, one for each component of the multivariate time series, are concatenated, before being clustered using a standard fast clustering algorithm such as k-means

Xiaozhe Wang; Anthony Wirth; Liang Wang

2007-01-01
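Here is a rough sketch of the pipeline described in the abstract above, under stated assumptions. Each component series is summarized by a fixed-length feature vector, the vectors are concatenated, and the result is clustered with k-means. The three features used here (mean, standard deviation, lag-1 autocorrelation) are an illustrative subset chosen for brevity, not the paper's feature set. The helper names and toy data are hypothetical.

```python
import math

def features(series):
    """Illustrative per-series features: mean, standard deviation, lag-1 autocorrelation."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series) / n
    acf1 = 0.0
    if var > 0:
        acf1 = sum((series[t] - mean) * (series[t + 1] - mean) for t in range(n - 1)) / (n * var)
    return [mean, math.sqrt(var), acf1]

def describe(multivariate):
    """Concatenate the feature vectors of every component series into one vector."""
    return [f for component in multivariate for f in features(component)]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, max_iter=100):
    """Lloyd's k-means with a deterministic farthest-first initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centre if a cluster becomes empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Six toy two-component series: three trending, three oscillating.
trending = [[[t + c for t in range(20)], [2 * t for t in range(20)]] for c in range(3)]
oscillating = [[[c + (-1) ** t for t in range(20)], [3 * (-1) ** t for t in range(20)]] for c in range(3)]
vectors = [describe(s) for s in trending + oscillating]
labels = kmeans(vectors, 2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

In practice the features would usually be rescaled to comparable ranges before clustering. Raw values are used here only to keep the example short.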

432

Characteristic-Based Clustering for Time Series Data

With the growing importance of time series clustering research, particularly for similarity searches amongst long time series such as those arising in medicine or finance, it is critical for us to find a way to resolve the outstanding problems that make most clustering methods impractical under certain circumstances. When the time series is very long, some clustering algorithms may fail

Xiaozhe Wang; Kate A. Smith; Rob J. Hyndman

2006-01-01

433

Coupled two-way clustering analysis of gene microarray data

We present a coupled two-way clustering approach to gene microarray data analysis. The main idea is to identify subsets of the genes and samples, such that when one of these is used to cluster the other, stable and significant partitions emerge. The search for such subsets is a computationally complex task. We present an algorithm, based on iterative clustering,

G. Getz; Erel Levine; E. Domany

2000-01-01

434

An empirical study on Principal Component Analysis for clustering

An empirical study on Principal Component Analysis for clustering gene expression data. Ka Yee Yeung, Walter L. Ruzzo, Dept. of Computer Science. … data analysis techniques and different clustering algorithms to analyze the same data set can lead

Borenstein, Elhanan

435

Detection and Analysis of Galaxy Clusters in SDSS

Detection and Analysis of Galaxy Clusters in SDSS. Jordan Levy, Dept. of Physics & Astronomy. … distribution of galaxies throughout the universe, showing that galaxies often form systems of clusters whose … of galaxies, we detect galaxy clusters using a custom implementation of the HOP algorithm. Mass functions

Leka, K. D.
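The snippet only names the HOP algorithm. As commonly described, HOP works in three steps:

1. Estimate a density at every point.
2. Let each point "hop" to the densest of its nearest neighbours.
3. Group all points whose hop chains end at the same local density maximum.

The sketch below follows that idea under several assumptions. It uses a simple k-nearest-neighbour density estimate in 2-D and hypothetical parameter names `n_dens` and `n_hop`. It also omits the regrouping step that merges groups across density saddles.

```python
import math

def knn(points, i, k):
    """Indices of the k nearest neighbours of point i, including i itself."""
    order = sorted(range(len(points)), key=lambda j: math.dist(points[i], points[j]))
    return order[:k]

def hop_groups(points, n_dens=4, n_hop=4):
    n = len(points)
    # k-NN density estimate: n_dens points inside a disc reaching the n_dens-th neighbour
    density = []
    for i in range(n):
        r = math.dist(points[i], points[knn(points, i, n_dens)[-1]])
        density.append(math.inf if r == 0 else n_dens / (math.pi * r * r))
    # each point hops to the densest of its n_hop nearest neighbours (itself included);
    # the index tie-break makes every hop strictly increase (density, -index)
    hop_to = [max(knn(points, i, n_hop), key=lambda j: (density[j], -j)) for i in range(n)]
    # follow each chain of hops to the local maximum that labels the group
    labels = []
    for i in range(n):
        j = i
        while hop_to[j] != j:
            j = hop_to[j]
        labels.append(j)
    return labels

pts = [(0, 0), (0.1, 0), (0, 0.1), (-0.1, 0), (0, -0.1),
       (10, 10), (10.1, 10), (10, 10.1), (9.9, 10), (10, 9.9)]
print(hop_groups(pts))  # [0, 0, 0, 0, 0, 5, 5, 5, 5, 5]
```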

436

Analysis of Agglomerative Clustering

Analysis of Agglomerative Clustering. Marcel R. Ackermann, Johannes Blömer, Daniel Kuntze. … by this algorithm is an O(log k)-approximation to the diameter k-clustering problem. Moreover, our analysis does … for example [8, 11]). Later, biological taxonomy became one of the driving forces of cluster analysis. In [14
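The excerpt does not name the algorithm it analyses. Complete linkage is the agglomerative method naturally paired with the diameter k-clustering objective, and we assume that is what is meant here. The sketch below is a naive complete-linkage implementation: it repeatedly merges the two clusters whose farthest pair of points is closest. It also includes the diameter objective the result is measured against. All names are illustrative.

```python
import math
from itertools import combinations

def complete_link(a, b):
    """Complete-linkage distance: the largest pairwise distance between two clusters."""
    return max(math.dist(p, q) for p in a for q in b)

def diameter(cluster):
    """Largest pairwise distance inside one cluster (0 for a singleton)."""
    return max((math.dist(p, q) for p, q in combinations(cluster, 2)), default=0.0)

def agglomerate(points, k):
    """Merge the closest pair of clusters under complete linkage until k clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: complete_link(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters

points = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,), (12.0,)]
clusters = agglomerate(points, 2)
print(clusters)                            # [[(0.0,), (1.0,), (2.0,)], [(10.0,), (11.0,), (12.0,)]]
print(max(diameter(c) for c in clusters))  # 2.0
```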

437

Multi-feature object trajectory clustering for video analysis

Multi-feature object trajectory clustering for video analysis. Nadeem Anjum and Andrea Cavallaro. … [2]). Clustering is a key component of trajectory analysis when instead of modeling and analyzing … analysis algorithms use only one feature space for clustering ([6], [7], [8], [9], [10], [11]). Even when

Cavallaro, Andrea

438

Stereotyping: improving particle swarm performance with cluster analysis

Individuals in the particle swarm population were “stereotyped” by cluster analysis of their previous best positions. The cluster centers then were substituted for the individuals' and neighbors' best previous positions in the algorithm. The experiments, which were inspired by the social-psychological metaphor of social stereotyping, found that performance could be generally improved by substituting individuals', but not neighbors', cluster centers

James Kennedy

2000-01-01
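This is a minimal sketch of the stereotyping idea from the abstract above. At every iteration the particles' previous best positions are clustered. Each particle's own best is then replaced by its cluster centre in the velocity update, while the neighbourhood best is left unchanged. Several choices here are assumptions, not taken from the paper:

- a global-best topology, so the neighbours' best is the swarm best;
- plain k-means on the personal bests;
- the inertia and acceleration constants shown;
- the sphere test function.

```python
import random

def sphere(x):
    """Simple test objective with its minimum 0 at the origin."""
    return sum(v * v for v in x)

def cluster_centres(points, k, iters=10):
    """Plain k-means seeded with the first k points; returns labels and centres."""
    centres = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centres[c])))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centres

def stereotyped_pso(f, dim=2, n=20, k=4, iters=100, seed=1):
    rng = random.Random(seed)
    w, c1, c2 = 0.729, 1.49445, 1.49445  # common constriction-style settings (assumed)
    x = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n)]
    v = [[0.0] * dim for _ in range(n)]
    pbest = [p[:] for p in x]
    pval = [f(p) for p in x]
    g = min(range(n), key=pval.__getitem__)
    gbest, gval = pbest[g][:], pval[g]
    history = [gval]
    for _ in range(iters):
        labels, centres = cluster_centres(pbest, k)
        for i in range(n):
            stereo = centres[labels[i]]  # cluster centre replaces the particle's own best
            for d in range(dim):
                v[i][d] = (w * v[i][d]
                           + c1 * rng.random() * (stereo[d] - x[i][d])
                           + c2 * rng.random() * (gbest[d] - x[i][d]))
                x[i][d] += v[i][d]
            val = f(x[i])
            if val < pval[i]:
                pbest[i], pval[i] = x[i][:], val
                if val < gval:
                    gbest, gval = x[i][:], val
        history.append(gval)
    return gbest, gval, history

best, value, history = stereotyped_pso(sphere)
print(value)
```

Only the cognitive (individual) term is stereotyped here. This mirrors the abstract's finding that substituting cluster centres for individuals' bests, but not for neighbours' bests, tended to help.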

439

Analysis of Agglomerative Clustering

Analysis of Agglomerative Clustering. Marcel R. Ackermann, Johannes Blömer, Daniel Kuntze. … by this algorithm is an O(log k)-approximation to the diameter k-clustering problem. Moreover, our analysis does … of the driving forces of cluster analysis. In [14] the authors, who were the first biologists using computers

Paris-Sud XI, Université de

440

Dedicated Filter for Defects Clustering in Radiographic Image

Defect clusters such as linear or clustered porosity are in some cases even more important than single flaws. This paper presents two methods of defect clustering and an algorithm for calculating distances between flaws in a digital radiographic image. A dedicated lookup-table-based filter is used to calculate distances between objects within a specified range. Two functions were developed for defect clustering. The first is based on the MMD (Minimum Mean Distance) algorithm. The second uses hierarchical procedures for clustering defects of various types, shapes and sizes.

Sikora, R.; Swiadek, K.; Chady, T. [Szczecin University of Technology, Department of Electrical Engineering, 70-313 Szczecin (Poland)]

2009-03-03

441

OPTICS: Ordering Points To Identify the Clustering Structure

Cluster analysis is a primary method for database mining. It is either used as a stand-alone tool to get insight into the distribution of a data set, e.g. to focus further analysis and data processing, or as a preprocessing step for other algorithms operating on the detected clusters. Almost all of the well-known clustering algorithms require input parameters which are

Mihael Ankerst; Markus M. Breunig; Hans-Peter Kriegel; Jörg Sander

1999-01-01
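The snippet is truncated before it describes the method. To illustrate the idea in the title, the sketch below computes an OPTICS ordering. Points are visited in order of their reachability distance from already processed points, and each reachability is bounded below by the core distance of the point it is reached from. This is a naive O(n^2) sketch, not the authors' implementation. Counting a point as its own neighbour when computing the core distance is a convention choice, and the test data are made up.

```python
import heapq
import math

def optics(points, eps, min_pts):
    """Return the OPTICS ordering and the reachability distance of each point in that order."""
    n = len(points)
    reach = [math.inf] * n
    processed = [False] * n
    order = []

    def neighbours(i):
        # eps-neighbourhood, including the point itself
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    def core_distance(i, nbrs):
        # distance to the min_pts-th closest neighbour (self counted), or None if not a core point
        if len(nbrs) < min_pts:
            return None
        return sorted(math.dist(points[i], points[j]) for j in nbrs)[min_pts - 1]

    def update(i, nbrs, core, seeds):
        for j in nbrs:
            if not processed[j]:
                r = max(core, math.dist(points[i], points[j]))
                if r < reach[j]:
                    reach[j] = r
                    heapq.heappush(seeds, (r, j))  # lazy decrease-key

    for p in range(n):
        if processed[p]:
            continue
        processed[p] = True
        order.append(p)
        nbrs = neighbours(p)
        core = core_distance(p, nbrs)
        if core is None:
            continue
        seeds = []
        update(p, nbrs, core, seeds)
        while seeds:
            r, q = heapq.heappop(seeds)
            if processed[q] or r > reach[q]:
                continue  # stale queue entry
            processed[q] = True
            order.append(q)
            q_nbrs = neighbours(q)
            q_core = core_distance(q, q_nbrs)
            if q_core is not None:
                update(q, q_nbrs, q_core, seeds)
    return order, [reach[i] for i in order]

pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
order, reach = optics(pts, eps=2.0, min_pts=3)
print(order, reach)
```

Valleys in the reachability values, meaning runs of small values between large ones, correspond to clusters. Cutting them at a threshold no larger than `eps` gives a DBSCAN-like partition.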

442

Learner Typologies Development Using OIndex and Data Mining Based Clustering Techniques

ERIC Educational Resources Information Center

This explorative data mining project used distance-based clustering algorithms to study 3 indicators, called OIndex, of student behavioral data and stabilized at a 6-cluster scenario following an exhaustive explorative study of 4, 5, and 6 cluster scenarios produced by K-Means and TwoStep algorithms. Using principles in data mining, the study…

Luan, Jing

2004-01-01

443

Recent Advances in Conceptual Clustering: CLUSTER3

Conceptual clustering is a form of unsupervised learning that seeks clusters in data that represent simple and understandable concepts, rather than groupings of entities with high intra-cluster and low inter-cluster similarity, as in conventional clustering. Another difference from conventional clustering is that conceptual clustering produces not only clusters but also their generalized descriptions, and that the descriptions are used for cluster

Ryszard S. Michalski; William D. Seeman

444

Spatial cluster detection using dynamic programming

Background: The task of spatial cluster detection involves finding spatial regions where some property deviates from the norm or the expected value. In a probabilistic setting this task can be expressed as finding a region where some event is significantly more likely than usual. Spatial cluster detection is of interest in fields such as biosurveillance, mining of astronomical data, military surveillance, and analysis of fMRI images. In almost all such applications we are interested both in whether a cluster exists in the data and, if it does, in finding the most accurate characterization of that cluster. Methods: We present a general dynamic programming algorithm for grid-based spatial cluster detection. The algorithm can be used both for Bayesian maximum a posteriori (MAP) estimation of the most likely spatial distribution of clusters and for Bayesian model averaging over a large space of spatial cluster distributions to compute the posterior probability of an unusual spatial clustering. The algorithm is explained and evaluated in the context of a biosurveillance application, specifically the detection and identification of influenza outbreaks based on emergency department visits. A relatively simple underlying model is constructed for the purpose of evaluating the algorithm, and the algorithm is evaluated using the model and semi-synthetic test data. Results: When compared to baseline methods, tests indicate that the new algorithm can improve MAP estimates under certain conditions. The greedy algorithm we compared our method to was more sensitive to smaller outbreaks, but as outbreaks grow, in terms of both area affected and proportion of individuals affected, our method overtakes the greedy algorithm in spatial precision and recall. The new algorithm performs on par with baseline methods in the task of Bayesian model averaging.
Conclusions: We conclude that the dynamic programming algorithm performs on par with other available methods for spatial cluster detection, and we point to its low computational cost and extensibility as advantages in favor of further research and use of the algorithm. PMID:22443103

2012-01-01

445

Recursive hybrid algorithm for non-linear system identification using radial basis function networks

Recursive identification of non-linear systems is investigated using radial basis function networks. A novel approach is adopted which employs a hybrid clustering and least squares algorithm. The recursive clustering algorithm adjusts the centres of the radial basis function network while the recursive least squares algorithm estimates the connection weights of the network. Because these two recursive learning rules are both

S. Chen; S. A. Billings; P. M. Grant

1992-01-01
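The hybrid scheme in the abstract above can be sketched in one dimension under three assumptions:

- Gaussian basis functions with a fixed width;
- a nearest-centre competitive update as the "recursive clustering" rule;
- textbook recursive least squares (RLS) for the output weights.

The learning rate, width, initial centres and sine target in the usage lines are illustrative choices, not values from the paper.

```python
import math

class RLS:
    """Recursive least squares for y ≈ w·phi (forgetting factor lam, P0 = delta·I)."""
    def __init__(self, n, delta=1e6, lam=1.0):
        self.w = [0.0] * n
        self.P = [[delta if i == j else 0.0 for j in range(n)] for i in range(n)]
        self.lam = lam

    def update(self, phi, y):
        n = len(phi)
        p_phi = [sum(self.P[i][j] * phi[j] for j in range(n)) for i in range(n)]
        denom = self.lam + sum(phi[i] * p_phi[i] for i in range(n))
        gain = [v / denom for v in p_phi]
        err = y - sum(wi * fi for wi, fi in zip(self.w, phi))  # a-priori error
        self.w = [wi + gi * err for wi, gi in zip(self.w, gain)]
        # P <- (P - gain * (P phi)^T) / lam, using the symmetry of P
        self.P = [[(self.P[i][j] - gain[i] * p_phi[j]) / self.lam for j in range(n)]
                  for i in range(n)]
        return err

def rbf_features(x, centres, width):
    """Gaussian basis functions plus a constant bias term."""
    return [math.exp(-(x - c) ** 2 / (2 * width ** 2)) for c in centres] + [1.0]

def hybrid_train(xs, ys, centres, width=0.5, rate=0.05):
    centres = list(centres)
    rls = RLS(len(centres) + 1)
    for x, y in zip(xs, ys):
        # recursive clustering: pull the nearest centre towards the new input
        j = min(range(len(centres)), key=lambda c: abs(x - centres[c]))
        centres[j] += rate * (x - centres[j])
        # recursive least squares on the output-layer weights
        rls.update(rbf_features(x, centres, width), y)
    return centres, rls.w

xs = [i / 10 for i in range(-30, 31)] * 5  # five passes over a grid on [-3, 3]
ys = [math.sin(x) for x in xs]
centres, weights = hybrid_train(xs, ys, [-2.0, -1.0, 0.0, 1.0, 2.0])
```

Each RLS step costs O(h^2) for h basis functions and needs no matrix inversion. This is what makes the recursive form attractive for online identification.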

446

Improving Sensor Network Lifetime Through Hierarchical Multihop Clustering

In this project, we developed an adaptive multihop clustering algorithm, MaxLife, for sensor networks. MaxLife significantly improves sensor network lifetime by balancing energy dissipation and minimizing energy consumption at the same time. The algorithm is compared to the Random and MinEnergy algorithms and shows a large performance gain. Random is extended from its original design of single hop clustering in (Wendi Rabiner

Maggie X. Cheng; Xuan Gong; Scott C.-H. Huang

2009-01-01

447

Large-scale metagenomic sequence clustering on map-reduce clusters.

Taxonomic clustering of species from millions of DNA fragments sequenced from their genomes is an important and frequently arising problem in metagenomics. In this paper, we present a parallel algorithm for taxonomic clustering of large metagenomic samples with support for overlapping clusters. We develop sketching techniques, akin to those created for web document clustering, to deduce significant similarities between pairs of sequences without resorting to expensive all vs. all comparison. We formulate the metagenomic classification problem as that of maximal quasi-clique enumeration in the resulting similarity graph, at multiple levels of the hierarchy as prescribed by different similarity thresholds. We cast execution of the underlying algorithmic steps as applications of the map-reduce framework to achieve a cloud-ready implementation. We show that the resulting framework can produce high-quality clustering of metagenomic samples consisting of millions of reads, in reasonable time limits, when executed on a modest-size cluster. PMID:23427983

Yang, Xiao; Zola, Jaroslaw; Aluru, Srinivas

2013-02-01
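The abstract does not spell out its sketching technique. MinHash over k-mers, shown below, is one standard method of the kind developed for near-duplicate web documents. The fraction of agreeing positions in two signatures estimates the Jaccard similarity of the underlying k-mer sets, without an all-against-all comparison of the sequences. The k-mer length, number of hash functions and hash construction are assumptions for illustration.

```python
import hashlib
import random

PRIME = (1 << 61) - 1  # Mersenne prime used for universal hashing

def kmers(seq, k):
    """Set of all length-k substrings (k-mers) of a DNA sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def base_hash(item):
    """Stable 64-bit hash of a string (Python's built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.blake2b(item.encode(), digest_size=8).digest(), "big")

def minhash(items, num_hashes=128, seed=0):
    """MinHash signature: the minimum of each of num_hashes random affine hashes over the set."""
    rng = random.Random(seed)
    params = [(rng.randrange(1, PRIME), rng.randrange(PRIME)) for _ in range(num_hashes)]
    values = [base_hash(x) % PRIME for x in items]
    return [min((a * h + b) % PRIME for h in values) for a, b in params]

def estimated_jaccard(sig_a, sig_b):
    """Fraction of positions where the two signatures agree."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)

a = minhash(kmers("ACGTACGTTGCA", 4))
b = minhash(kmers("ACGTACGTTGCC", 4))
print(estimated_jaccard(a, b))
```

The two example reads share 7 of 9 distinct 4-mers, so the estimate should land near their true Jaccard similarity of 7/9. Both signatures must use the same seed so that they apply the same hash functions.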

448

Bipartite graph partitioning and data clustering

Many data types arising from data mining applications can be modeled as bipartite graphs; examples include terms and documents in a text corpus, customers and purchased items in market basket analysis, and reviewers and movies in a movie recommender system. In this paper, the authors propose a new data clustering method based on partitioning the underlying bipartite graph. The partition is constructed by minimizing a normalized sum of edge weights between unmatched pairs of vertices of the bipartite graph. They show that an approximate solution to the minimization problem can be obtained by computing a partial singular value decomposition (SVD) of the associated edge weight matrix of the bipartite graph. They point out the connection of their clustering algorithm to correspondence analysis used in multivariate analysis. They also briefly discuss the issue of assigning data objects to multiple clusters. In the experimental results, they apply their clustering algorithm to the problem of document clustering to illustrate its effectiveness and efficiency.

Zha, Hongyuan; He, Xiaofeng; Ding, Chris; Gu, Ming; Simon, Horst D.

2001-05-07
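Below is a minimal sketch of the SVD relaxation described above, assuming a two-way split. It normalizes the edge-weight matrix by the square roots of the row and column vertex degrees and takes the second left and right singular vectors. It then rescales those vectors by the degrees and splits each vertex set by sign. Thresholding at zero is our simplification; the paper may assign vertices differently. The toy document-term matrix is made up.

```python
import numpy as np

def bipartite_split(W):
    """Two-way split of a bipartite graph from the second singular vectors of
    the degree-normalised edge-weight matrix D_r^{-1/2} W D_c^{-1/2}."""
    W = np.asarray(W, dtype=float)
    d_rows = W.sum(axis=1)
    d_cols = W.sum(axis=0)
    Wn = W / np.sqrt(np.outer(d_rows, d_cols))
    U, s, Vt = np.linalg.svd(Wn)
    x = U[:, 1] / np.sqrt(d_rows)   # row-vertex (e.g. document) embedding
    y = Vt[1, :] / np.sqrt(d_cols)  # column-vertex (e.g. term) embedding
    return x >= 0, y >= 0

# Rows: 4 documents, columns: 4 terms, with weak cross-links between two topics.
W = [[2, 1, 0.1, 0],
     [1, 2, 0, 0.1],
     [0.1, 0, 2, 1],
     [0, 0.1, 1, 2]]
docs, terms = bipartite_split(W)
print(docs, terms)
```

Rows and columns that land on the same side of the split form a co-cluster, here documents 0 and 1 with terms 0 and 1. The overall sign of a singular vector pair is arbitrary, so only the grouping is meaningful, not which side is labelled True.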

449

Data comparison algorithms for arms control treaty verification

Arms control treaty verification measures often require comparison of measurements made on treaty-limited items (TLIs) with nominal or representative values in order to verify the nature or identity of the TLIs in question. This paper discusses some algorithms for comparing measurements on TLIs, including algorithms based on least-squares fitting techniques and multivariate algorithms based on cluster analysis and Mahalanobis distances.

Bieber, A.M. Jr.

1993-08-01
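One of the multivariate comparisons mentioned above is the Mahalanobis distance between a new measurement and a set of reference measurements. It can be sketched as follows. The reference data are made up. Any acceptance threshold applied to the returned distance would be a policy choice not given in the abstract.

```python
import numpy as np

def mahalanobis(x, reference):
    """Mahalanobis distance of measurement x from the mean of the reference measurements."""
    ref = np.asarray(reference, dtype=float)
    mu = ref.mean(axis=0)
    cov = np.cov(ref, rowvar=False)  # sample covariance (ddof=1)
    diff = np.asarray(x, dtype=float) - mu
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))

reference = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
print(mahalanobis([1.0, 0.0], reference))  # about 1.2247, i.e. sqrt(1.5)
```

Unlike a plain Euclidean distance, this measure scales each direction by the spread of the reference items. A deviation along a naturally noisy feature therefore counts for less than the same deviation along a tightly controlled one.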

450

An interactive approach to multiobjective clustering of gene expression patterns.

Some recent studies have posed the problem of data clustering as a multiobjective optimization problem, where several cluster validity indices are simultaneously optimized to obtain tradeoff clustering solutions. A number of cluster validity index measures are available in the literature. However, none of the measures can perform equally well in all kinds of datasets. Depending on the dataset properties and its inherent clustering structure, different cluster validity measures perform differently. Therefore, it is important to find the best set of validity indices that should be optimized simultaneously to obtain good clustering results. In this paper, a novel interactive genetic algorithm-based multiobjective approach is proposed that simultaneously finds the clustering solution as well as evolves the set of validity measures that are to be optimized simultaneously. The proposed method interactively takes the input from the human decision maker (DM) during execution and adaptively learns from that input to obtain the final set of validity measures along with the final clustering result. The algorithm is applied for clustering real-life benchmark gene expression datasets and its performance is compared with that of several other existing clustering algorithms to demonstrate its effectiveness. The results indicate that the proposed method outperforms the other existing algorithms for all the datasets considered here. PMID:23033427

Mukhopadhyay, Anirban; Maulik, Ujjwal; Bandyopadhyay, Sanghamitra

2013-01-01

451

MIP Reconstruction Techniques and Minimum Spanning Tree Clustering

The development of a tracking algorithm for minimum ionizing particles in the calorimeter and of a clustering algorithm based on the Minimum Spanning Tree approach are described. They do not depend on information from the central tracking system. Both are important components of a particle flow algorithm currently under development.

Mader, Wolfgang F.; /Iowa U.

2005-09-12
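The abstract gives no details of its Minimum Spanning Tree clustering. A generic version of the approach builds the MST over the hit positions and cuts every edge longer than a threshold, so each remaining tree component is one cluster. The sketch below does that with Prim's algorithm. The 2-D hit coordinates and the `cut` value are hypothetical, and a calorimeter implementation would likely use detector-specific distances.

```python
import math

def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph, O(n^2); returns (u, v, length) edges."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge from the tree to each point
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges

def mst_clusters(points, cut):
    """Keep MST edges no longer than `cut`; the connected components are the clusters."""
    root = list(range(len(points)))

    def find(i):
        while root[i] != i:
            root[i] = root[root[i]]  # path halving
            i = root[i]
        return i

    for u, v, length in mst_edges(points):
        if length <= cut:
            root[find(u)] = find(v)
    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

hits = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
print(mst_clusters(hits, cut=2.0))  # [[0, 1, 2], [3, 4]]
```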

452

Differential evolution and particle swarm optimisation in partitional clustering

Many partitional clustering algorithms based on genetic algorithms (GA) have been proposed to tackle the problem of finding the optimal partition of a data set. Very few studies considered alternative stochastic search heuristics other than GAs or simulated annealing. Two promising algorithms for numerical optimisation, which are hardly known outside the search heuristics field, are particle swarm optimisation (PSO) and

Sandra Paterlini; Thiemo Krink

2006-01-01

453

Clustering of High Throughput Gene Expression Data

High throughput biological data need to be processed, analyzed, and interpreted to address problems in life sciences. Bioinformatics, computational biology, and systems biology deal with biological problems using computational methods. Clustering is one of the methods used to gain insight into biological processes, particularly at the genomics level. Clearly, clustering can be used in many areas of biological data analysis. However, this paper presents a review of the current clustering algorithms designed especially for analyzing gene expression data. It is also intended to introduce one of the main problems in bioinformatics - clustering gene expression data - to the operations research community. PMID:23144527

Pirim, Harun; Ekşioğlu, Burak; Perkins, Andy; Yüceer, Çetin

2012-01-01

454

Open Clusters versus Globular Clusters

NSDL National Science Digital Library

In this activity, students will describe similarities and differences between galactic star clusters and globular clusters. This is activity five in "The Hidden Lives of Galaxies" information and activity booklet. The booklet includes student worksheets and background information for the teacher.

2012-08-03

455

NASA Technical Reports Server (NTRS)

Penetrating 25,000 light-years of obscuring dust and myriad stars, NASA's Hubble Space Telescope has provided the clearest view yet of one of the largest young clusters of stars inside our Milky Way galaxy, located less than 100 light-years from the very center of the Galaxy. With a mass equivalent to more than 10,000 stars like our Sun, the monster cluster is ten times larger than typical young star clusters scattered throughout our Milky Way. It is destined to be ripped apart in just a few million years by gravitational tidal forces in the galaxy's core, but in its brief lifetime it shines more brightly than any other star cluster in the Galaxy. The Quintuplet Cluster is 4 million years old. It has stars on the verge of blowing up as supernovae. It is the home of the brightest star seen in the galaxy, called the Pistol star. This image was taken in infrared light by Hubble's NICMOS camera in September 1997. The false colors correspond to infrared wavelengths. The galactic center stars are white, the red stars are enshrouded in dust or behind dust, and the blue stars are foreground stars between us and the Milky Way's center. The cluster is hidden from direct view behind black dust clouds in the constellation Sagittarius. If the cluster could be seen from Earth, it would appear to the naked eye as a 3rd magnitude star, 1/6th of a full moon's diameter apart.

1999-01-01

456

Clusters of galaxies are the most recently assembled, massive, bound structures in the Universe. As predicted by General Relativity, given their masses, clusters strongly deform space-time in their vicinity. Clusters act as some of the most powerful gravitational lenses in the Universe. Light rays traversing through clusters from distant sources are hence deflected, and the resulting images of these distant objects therefore appear distorted and magnified. Lensing by clusters occurs in two regimes, each with unique observational signatures. The strong lensing regime is characterized by effects readily seen by eye, namely, the production of giant arcs, multiple-images, and arclets. The weak lensing regime is characterized by small deformations in the shapes of background galaxies only detectable statistically. Cluster lenses have been exploited successfully to address several important current questions in cosmology: (i) the study of the lens(es) - understanding cluster mass distributions and issues pertaining to cluster formation and evolution, as well as constraining the nature of dark matter; (ii) the study of the lensed objects - probing the properties of the background lensed galaxy population - which is statistically at higher redshifts and of lower intrinsic luminosity thus enabling the probing of galaxy formation at the earliest times right up to the Dark Ages; and (iii) the study of the geometry of the Universe - as the strength of lensing depends on the ratios of angular diameter distances between the lens, source and observer, lens deflections are sensitive to the value of cosmological parameters and offer a powerful geometric tool to probe Dark Energy. In this review, we present the basics of cluster lensing and provide a current status report of the field.

Jean-Paul Kneib; Priyamvada Natarajan

2012-02-03
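Point (iii) above says lensing strength depends on ratios of angular diameter distances. The simplest case makes this concrete: the textbook Einstein radius of a point-mass lens of mass M. This is an idealisation, since cluster mass is extended, and the formula is not quoted from the review.

```latex
\theta_E = \sqrt{\frac{4 G M}{c^{2}} \, \frac{D_{ls}}{D_{l} \, D_{s}}}
```

Here D_l, D_s and D_ls are the angular diameter distances from observer to lens, from observer to source, and from lens to source. Cosmological parameters enter through these distances, so the measured angular scale of lensing features carries information about them once the lens mass is constrained.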

457

Dynamic clustering using particle swarm optimization with application in image segmentation

A new dynamic clustering approach (DCPSO), based on particle swarm optimization, is proposed. This approach is applied to image segmentation. The proposed approach automatically determines the “optimum” number of clusters and simultaneously clusters the data set with minimal user interference. The algorithm starts by partitioning the data set into a relatively large number of clusters to reduce the effects of

Mahamed G. H. Omran; Ayed A. Salman; Andries Petrus Engelbrecht

2006-01-01

458

Discovery of alternative clusterings is an important method for exploring complex datasets. It provides the capability for the user to view clustering behaviour from different perspectives and thus explore new hypotheses. However, current algorithms for alternative clustering have focused mainly on linear scenarios and may not perform as desired for datasets containing clusters with nonlinear shapes. Our goal in

Xuan-Hong Dang; James Bailey

2010-01-01

459

We show that when fuzzy C-means (FCM) algorithm is used in an over-partitioning mode, the resulting membership values can be further utilized for building a connectivity graph that represents the relative distribution of the computed centroids. Standard graph-theoretic procedures and recent algorithms from manifold learning theory are subsequently applied to this graph. This facilitates the accomplishment of a great variety

Nikolaos A. Laskaris; Stefanos P. Zafeiriou

2008-01-01
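The sketch below covers the first stage described above: standard fuzzy c-means run with more clusters than expected, followed by a graph over the centroids built from their membership values. This snippet does not give the paper's exact graph construction. The summed membership overlap min(u_a, u_b) used in `membership_graph` is a hypothetical stand-in, as are the toy data and the parameter defaults.

```python
import numpy as np

def fcm(X, c, m=2.0, iters=100, seed=0, eps=1e-12):
    """Standard fuzzy c-means; returns centroids V (c x d) and memberships U (c x n)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    U = rng.random((c, len(X)))
    U /= U.sum(axis=0)  # each column sums to 1
    for _ in range(iters):
        Um = U ** m
        V = (Um @ X) / Um.sum(axis=1, keepdims=True)  # membership-weighted centroids
        D = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2) + eps
        # u_ij = 1 / sum_k (d_ij / d_kj)^(2/(m-1))
        U = 1.0 / ((D[:, None, :] / D[None, :, :]) ** (2 / (m - 1))).sum(axis=1)
    return V, U

def membership_graph(U):
    """Hypothetical centroid connectivity: summed overlap min(u_a, u_b) of memberships."""
    G = np.minimum(U[:, None, :], U[None, :, :]).sum(axis=2)
    np.fill_diagonal(G, 0.0)
    return G

X = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
V, U = fcm(X, 2)
labels = U.argmax(axis=0)
V4, U4 = fcm(X, 4)  # over-partitioning mode
G = membership_graph(U4)
print(labels)
print(np.round(G, 3))
```

In over-partitioning mode, two centroids that share many fuzzily assigned points get a heavy edge. Graph procedures such as connected components or spectral methods can then be run on `G` instead of on the raw data.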

460

Model-Based Clustering for Classification of Aquatic Systems and Diagnosis of Ecological Stress

Clustering approaches were developed using the classification likelihood, the mixture likelihood, and also using a randomization approach with a model index. Using a clustering approach based on the mixture and classification likelihoods, we have developed an algorithm that...