1

Segmentation of Vessels in Fundus Images using Spatially Weighted Fuzzy c-Means Clustering Algorithm

This paper presents an algorithm for extracting blood vessels from fundus images using a matched filter and thresholding based on the Spatially Weighted Fuzzy c-Means (SWFCM) clustering algorithm. Such a tool should prove useful to eyecare specialists for patient screening, treatment, and clinical study. We make use of a set of linear filters sensitive to

Giri Babu Kande; T. Satya Savithri; P. V. Subbaiah

2007-01-01

2

A Genetic Algorithm That Exchanges Neighboring Centers for Fuzzy c-Means Clustering

ERIC Educational Resources Information Center

Clustering algorithms are widely used in pattern recognition and data mining applications. Due to their computational efficiency, partitional clustering algorithms are better suited for applications with large datasets than hierarchical clustering algorithms. K-means is among the most popular partitional clustering algorithms, but has a major…

Chahine, Firas Safwan

2012-01-01

3

NASA Astrophysics Data System (ADS)

Segmentation of point cloud data is a key but difficult problem in 3D architectural reconstruction. Compared with reverse engineering data, point clouds of ancient architecture contain more noise along edges because of mirror reflection. Traditional segmentation methods are hard, that is, not fuzzy, so they cannot represent borderline points that belong to two regions, and they struggle to meet the demands of segmenting ancient architecture point cloud data. Ancient architecture is mostly composed of columniation, plinths, arches, girders, and tiles arranged in a specific order. The surface of each component is regular and smooth, and the membership of borderline points is very blurry. Based on these characteristics, a modified fuzzy c-means (MFCM) clustering algorithm is proposed that adds geometrical information during clustering. In addition, the method improves the membership constraints to reduce the influence of noise on the segmentation result. The algorithm is used in the project "Digital surveying of ancient architecture: Forbidden City". Experiments show that the method has good noise resistance, accuracy, and adaptability, and that it greatly reduces the need for human intervention. After segmentation, interior points and edge points can be distinguished according to the membership of every point, which facilitates subsequent surface feature extraction and model identification and provides effective support for the three-dimensional reconstruction of ancient buildings.

Zhao, Jianghong; Li, Deren; Wang, Yanmin

2008-12-01

4

NASA Astrophysics Data System (ADS)

The integration between polarization and intensity images possessing complementary and discriminative information has emerged as a new and important research area. On the basis of the consideration that the resulting image has different clarity and layering requirement for the target and background, we propose a novel fusion method based on non-subsampled Contourlet transform (NSCT) and fuzzy C-means (FCM) segmentation for IR polarization and light intensity images. First, the polarization characteristic image is derived from fusion of the degree of polarization (DOP) and the angle of polarization (AOP) images using local standard variation and abrupt change degree (ACD) combined criteria. Then, the polarization characteristic image is segmented with FCM algorithm. Meanwhile, the two source images are respectively decomposed by NSCT. The regional energy-weighted and similarity measure are adopted to combine the low-frequency sub-band coefficients of the object. The high-frequency sub-band coefficients of the object boundaries are integrated through the maximum selection rule. In addition, the high-frequency sub-band coefficients of internal objects are integrated by utilizing local variation, matching measure and region feature weighting. The weighted average and maximum rules are employed independently in fusing the low-frequency and high-frequency components of the background. Finally, an inverse NSCT operation is accomplished and the final fused image is obtained. The experimental results illustrate that the proposed IR polarization image fusion algorithm can yield an improved performance in terms of the contrast between artificial target and cluttered background and a more detailed representation of the depicted scene.

Yu, Xuelian; Chen, Qian; Gu, Guohua; Qian, Weixian; Xu, Mengxi

2014-11-01

5

This paper presents a novel two-step approach that incorporates fuzzy c-means (FCMs) clustering and gradient vector flow (GVF) snake algorithm for lesions contour segmentation on breast magnetic resonance imaging (BMRI). Manual delineation of the lesions by expert MR radiologists was taken as a reference standard in evaluating the computerized segmentation approach. The proposed algorithm was also compared with the FCMs clustering based method. With a database of 60 mass-like lesions (22 benign and 38 malignant cases), the proposed method demonstrated sufficiently good segmentation performance. The morphological and texture features were extracted and used to classify the benign and malignant lesions based on the proposed computerized segmentation contour and radiologists' delineation, respectively. Features extracted by the computerized characterization method were employed to differentiate the lesions with an area under the receiver-operating characteristic curve (AUC) of 0.968, in comparison with an AUC of 0.914 based on the features extracted from radiologists' delineation. The proposed method in current study can assist radiologists to delineate and characterize BMRI lesion, such as quantifying morphological and texture features and improving the objectivity and efficiency of BMRI interpretation with a certain clinical value. PMID:22952558

Pang, Yachun; Li, Li; Hu, Wenyong; Peng, Yanxia; Liu, Lizhi; Shao, Yuanzhi

2012-01-01

7

c-means clustering with the l1 and l∞ norms

An extension of the hard and fuzzy c-means (HCM/FCM) clustering algorithms is described. Specifically, these models are extended to admit the case where the (dis)similarity measure on pairs of numerical vectors includes two members of the Minkowski or p-norm family, viz., the p=1 and p=∞ norms. In the absence of theoretically necessary conditions to guide a numerical solution of the

Leon Bobrowski; James C. Bezdek

1991-01-01
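
The two Minkowski-family distances named in the abstract above can be sketched in a few lines. This is only an illustration of the p=1 and p=∞ dissimilarities and of how the choice changes a hard (nearest-center) assignment; the function names are hypothetical, and the paper's actual solution procedure is not reproduced here.

```python
def l1_distance(x, v):
    """City-block (p = 1) distance between two equal-length vectors."""
    return sum(abs(a - b) for a, b in zip(x, v))


def linf_distance(x, v):
    """Sup-norm (p = infinity) distance: the largest coordinate difference."""
    return max(abs(a - b) for a, b in zip(x, v))


def nearest_center(x, centers, distance):
    """Index of the closest center under the chosen distance (hard c-means step)."""
    return min(range(len(centers)), key=lambda i: distance(x, centers[i]))


# The same point can be assigned to different centers under the two norms.
point = [0, 0]
centers = [[3, 3], [0, 5]]
print(nearest_center(point, centers, l1_distance))    # compares 6 against 5
print(nearest_center(point, centers, linf_distance))  # compares 3 against 5
```

The example point is chosen so that the two norms disagree, which is the practical reason the choice of dissimilarity measure matters.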

8

NASA Astrophysics Data System (ADS)

Soil moisture is a key variable of the hydrological cycle. For example, it controls the partitioning of rainfall into a runoff and an infiltration component and modulates physical, chemical and biological processes within the soil. For a better understanding of these processes, knowledge about the spatio-temporal distribution of soil moisture is indispensable. For the field to the small catchment scale with survey areas up to a few square kilometres, there are numerous new and innovative ground-based and remote sensing technologies available which have great potential to provide temporal information about soil moisture patterns. The aim of this work is to design an optimal soil moisture monitoring program for a low-mountain catchment in central Germany. In a first step, the fuzzy c-means clustering technique (Paasche et al., 2006) was used to identify structure-relevant patterns in a set of different terrain attributes derived from a DEM. Based on these patterns optimal measurement locations were identified to conduct in-situ soil moisture measurements. To consider different wetting and drying states in the catchment, several TDR measurement campaigns were conducted from April to October 2013. The TDR measurements have been integrated with the structure-relevant patterns obtained by the fuzzy cluster analysis to regionally predict soil moisture. In this study, we outline the conceptual framework of this integrative approach and present first results from field measurements. The results of the project are expected to improve the monitoring and understanding of small catchment-scale hydrological processes and to contribute to a better representation of soil moisture dynamics in physically-based, hydrological models operating at the field to the small catchment scale. Reference: Paasche, H., J. Tronicke, K. Holliger, A.G. Green, and H. Maurer (2006): Integration of diverse physical-property models: Subsurface zonation and petrophysical parameter estimation based on fuzzy c-means cluster analyses. Geophysics 71(3), H33-H44, doi:10.1190/1.2192927.

Schröter, Ingmar; Paasche, Hendik; Dietrich, Peter; Wollschläger, Ute

2014-05-01

9

Automatic histogram-based fuzzy C-means clustering for remote sensing imagery

NASA Astrophysics Data System (ADS)

Fuzzy C-means (FCM) clustering has been widely used in analyzing and understanding remote sensing images. However, the conventional FCM algorithm is sensitive to initialization, and it requires estimations from expert users to determine the number of clusters. To overcome the limitations of the FCM algorithm, an automatic histogram-based fuzzy C-means (AHFCM) algorithm is presented in this paper. Our proposed algorithm has two primary steps: 1 - clustering each band of a multispectral image by calculating the slope for each point of the histogram, in two directions, and executing the FCM clustering algorithm based on specific rules, and 2 - automatic fusion of labeled images is used to initialize and determine the number of clusters in the FCM algorithm for automatic multispectral image clustering. The performance of our proposed algorithm is first tested on clustering a very high resolution aerial image for various numbers of clusters and, next, on clustering two very high resolution aerial images, a high resolution Worldview2 satellite image, a Landsat8 satellite image and an EO-1 hyperspectral image, for a constant number of clusters. The superiority of the new method is demonstrated by comparing it with the well-known methods of FCM, K-means, fast global FCM (FGFCM) and kernelized fast global FCM (KFGFCM) clustering algorithms, both quantitatively by calculating the DB, XB and SC indices and qualitatively by visualizing the cluster results.

Ghaffarian, Saman; Ghaffarian, Salar

2014-11-01
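
The abstract above notes that conventional FCM depends on its initialization and on a user-supplied number of clusters. A minimal sketch of the conventional algorithm on scalar data makes both inputs visible. It is not the AHFCM method; the function name `fcm_1d`, the fuzzifier default `m=2.0`, and the toy data are assumptions for illustration.

```python
def fcm_1d(data, centers, m=2.0, iters=100, tol=1e-9):
    """Plain fuzzy c-means on scalar data, starting from the given centers.

    Returns the final centers and the memberships computed in the last iteration.
    The number of clusters is fixed by how many initial centers are passed in.
    """
    centers = list(centers)
    e = 2.0 / (m - 1.0)
    u = []
    for _ in range(iters):
        # Membership update: closer centers receive larger memberships.
        u = []
        for x in data:
            d = [max(abs(x - v), 1e-12) for v in centers]  # floor avoids division by zero
            inv = [di ** -e for di in d]
            s = sum(inv)
            u.append([w / s for w in inv])
        # Center update: mean of the data weighted by membership ** m.
        new = []
        for k in range(len(centers)):
            w = [row[k] ** m for row in u]
            new.append(sum(wi * x for wi, x in zip(w, data)) / sum(w))
        shift = max(abs(a - b) for a, b in zip(new, centers))
        centers = new
        if shift < tol:
            break
    return centers, u


data = [0.8, 1.0, 1.2, 4.8, 5.0, 5.2]
good, _ = fcm_1d(data, [0.0, 6.0])   # distinct starting centers
stuck, _ = fcm_1d(data, [3.0, 3.0])  # identical starting centers never separate
print(good, stuck)
```

With identical starting centers every point gets equal memberships, so both centers stay at the data mean. This is an extreme case of the initialization sensitivity the abstract refers to.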

10

Fuzzy c-means clustering with spatial information for image segmentation

A conventional FCM algorithm does not fully utilize the spatial information in the image. In this paper, we present a fuzzy c-means (FCM) algorithm that incorporates spatial information into the membership function for clustering. The spatial function is the summation of the membership function in the neighborhood of each pixel under consideration. The advantages of the new method are the

Keh-Shih Chuang; Hong-Long Tzeng; Sharon Chen; Jay Wu; Tzong-Jer Chen

2006-01-01
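
The spatial function described in the abstract above, a sum of memberships over each pixel's neighborhood, can be sketched as follows. The abstract is truncated, so the way the spatial term is combined with the original membership here (a product with exponents `p` and `q`, then renormalization) is an assumption, as are the function names and the square window.

```python
def spatial_function(u, radius=1):
    """Sum one cluster's membership map over a square window around each pixel."""
    rows, cols = len(u), len(u[0])
    h = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            for rr in range(max(0, r - radius), min(rows, r + radius + 1)):
                for cc in range(max(0, c - radius), min(cols, c + radius + 1)):
                    h[r][c] += u[rr][cc]
    return h


def spatial_memberships(memberships, p=1, q=1, radius=1):
    """Reweight memberships by their spatial function and renormalize per pixel.

    memberships: list of 2D maps, one per cluster, summing to 1 at each pixel.
    p, q: assumed exponents weighting the original membership and the spatial term.
    """
    hs = [spatial_function(u, radius) for u in memberships]
    rows, cols = len(memberships[0]), len(memberships[0][0])
    out = [[[0.0] * cols for _ in range(rows)] for _ in memberships]
    for r in range(rows):
        for c in range(cols):
            weights = [u[r][c] ** p * h[r][c] ** q for u, h in zip(memberships, hs)]
            total = sum(weights)
            for k, w in enumerate(weights):
                out[k][r][c] = w / total
    return out


# A 3x3 patch that belongs to cluster 0 except for one noisy center pixel.
u0 = [[0.9, 0.9, 0.9], [0.9, 0.4, 0.9], [0.9, 0.9, 0.9]]
u1 = [[1.0 - v for v in row] for row in u0]
smoothed = spatial_memberships([u0, u1])
print(smoothed[0][1][1])  # center pixel's membership in cluster 0 after reweighting
```

In this toy patch the noisy pixel's membership in cluster 0 rises above 0.5 once its neighbors are taken into account. This is the kind of noise suppression that spatial information is meant to provide.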

11

Particle swarm optimization of kernel-based fuzzy c-means for hyperspectral data clustering

NASA Astrophysics Data System (ADS)

Hyperspectral data classification using supervised approaches, in general, and the statistical algorithms, in particular, need high quantity and quality training data. However, these limitations, and the high dimensionality of these data, are the most important problems for using the supervised algorithms. As a solution, unsupervised or clustering algorithms can be considered to overcome these problems. One of the emerging clustering algorithms that can be used for this purpose is the kernel-based fuzzy c-means (KFCM), which has been developed by kernelizing the FCM algorithm. Nevertheless, there are some parameters that affect the efficiency of KFCM clustering of hyperspectral data. These parameters include kernel parameters, initial cluster centers, and the number of spectral bands. To address these problems, two new algorithms are developed. In these algorithms, the particle swarm optimization method is employed to optimize the KFCM with respect to these parameters. The first algorithm is designed to optimize the KFCM with respect to kernel parameters and initial cluster centers, while the second one selects the optimum discriminative subset of bands and the former parameters as well. The evaluations of the results of experiments show that the proposed algorithms are more efficient than the standard k-means and FCM algorithms for clustering hyperspectral remotely sensed data.

Niazmardi, Saeid; Naeini, Amin Alizadeh; Homayouni, Saeid; Safari, Abdolreza; Samadzadegan, Farhad

2012-01-01

12

Fuzzy C-means Method for Clustering Microarray Data

Motivation: Clustering analysis of data from DNA microarray hybridization studies is essential for identifying biologically relevant groups of genes. Partitional clustering methods such as K-means or self-organizing maps assign each gene to a single cluster. However, these methods do not provide information about the influence of a given gene for the overall shape of clusters. Here

Doulaye Dembélé; Philippe Kastner

2003-01-01

13

The relative fibroglandular tissue content in the breast, commonly referred to as breast density, has been shown to be the most significant risk factor for breast cancer after age. Currently, the most common approaches to quantify density are based on either semi-automated methods or visual assessment, both of which are highly subjective. This work presents a novel multi-class fuzzy c-means (FCM) algorithm for fully-automated identification and quantification of breast density, optimized for the imaging characteristics of digital mammography. The proposed algorithm involves adaptive FCM clustering based on an optimal number of clusters derived by the tissue properties of the specific mammogram, followed by generation of a final segmentation through cluster agglomeration using linear discriminant analysis. When evaluated on 80 bilateral screening digital mammograms, a strong correlation was observed between algorithm-estimated PD% and radiological ground-truth of r=0.83 (p<0.001) and an average Jaccard spatial similarity coefficient of 0.62. These results show promise for the clinical application of the algorithm in quantifying breast density in a repeatable manner. PMID:22003744

Keller, Brad; Nathan, Diane; Wang, Yan; Zheng, Yuanjie; Gee, James; Conant, Emily; Kontos, Despina

2011-01-01
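
The Jaccard spatial similarity coefficient reported in the abstract above compares a computed segmentation with a reference mask. A minimal sketch of that measure on binary masks follows; the function name and the toy masks are illustrative assumptions, not the study's data.

```python
def jaccard(mask_a, mask_b):
    """Jaccard coefficient |A intersect B| / |A union B| of two same-shaped binary masks."""
    inter = union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            inter += 1 if (a and b) else 0
            union += 1 if (a or b) else 0
    # Two empty masks are treated as identical by convention (an assumption here).
    return inter / union if union else 1.0


algorithm_mask = [[1, 1, 0], [0, 1, 0]]
reference_mask = [[1, 0, 0], [0, 1, 1]]
print(jaccard(algorithm_mask, reference_mask))  # 2 shared pixels out of 4 in the union
```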

14

A hybrid model for bankruptcy prediction using genetic algorithm, fuzzy c-means and mars

Bankruptcy prediction is very important for all organizations, since bankruptcy affects the economy and gives rise to many social problems with high costs. A large number of techniques have been developed to predict bankruptcy, which helps decision makers such as investors and financial analysts. One such bankruptcy prediction model is the hybrid model using fuzzy c-means clustering and MARS, which uses static ratios taken from bank financial statements for prediction and has its own theoretical advantages. The performance of the existing bankruptcy model can be improved by dynamically selecting the best features depending on the nature of the firm. This dynamic selection can be accomplished by a genetic algorithm, and it improves the performance of the prediction model.

Martin, A; Saranya, G; Gayathri, P; Venkatesan, Prasanna

2011-01-01

15

We present a novel algorithm for obtaining fuzzy segmentations of images that are subject to multiplicative intensity inhomogeneities, such as magnetic resonance images. The algorithm is formulated by modifying the objective function in the fuzzy C-means algorithm to include a multiplier field, which allows the centroids for each class to vary across the image. First and second order regularization terms

Dzung L. Pham; Jerry L. Prince

1999-01-01

16

A wavelet relational fuzzy C-means algorithm for 2D gel image segmentation.

One of the most famous algorithms that appeared in the area of image segmentation is the Fuzzy C-Means (FCM) algorithm. This algorithm has been used in many applications such as data analysis, pattern recognition, and image segmentation. It has the advantages of producing high quality segmentation compared to the other available algorithms. Many modifications have been made to the algorithm to improve its segmentation quality. The proposed segmentation algorithm in this paper is based on the Fuzzy C-Means algorithm adding the relational fuzzy notion and the wavelet transform to it so as to enhance its performance especially in the area of 2D gel images. Both proposed modifications aim to minimize the oversegmentation error incurred by previous algorithms. The experimental results of comparing both the Fuzzy C-Means (FCM) and the Wavelet Fuzzy C-Means (WFCM) to the proposed algorithm on real 2D gel images acquired from human leukemias, HL-60 cell lines, and fetal alcohol syndrome (FAS) demonstrate the improvement achieved by the proposed algorithm in overcoming the segmentation error. In addition, we investigate the effect of denoising on the three algorithms. This investigation proves that denoising the 2D gel image before segmentation can improve (in most of the cases) the quality of the segmentation. PMID:24174990

Rashwan, Shaheera; Faheem, Mohamed Talaat; Sarhan, Amany; Youssef, Bayumy A B

2013-01-01

18

In this paper an automatic unsupervised method for the segmentation of retinal vessels is proposed. Three features are extracted from the tested image. The features are scaled down by a factor of 2 and mapped into a Self-Organizing Map. A modified Fuzzy C-Means clustering algorithm is used to divide the neuron units of the map in 2 classes. The entire

Carmen Alina Lupascu; Domenico Tegolo

19

An incremental clustering algorithm based on Mahalanobis distance

NASA Astrophysics Data System (ADS)

The classical fuzzy c-means clustering algorithm is inadequate for clustering datasets with non-spherical or elliptical distributions. This paper replaces the Euclidean distance of classical fuzzy c-means clustering with the Mahalanobis distance and, because of its merits, applies the Mahalanobis distance to incremental learning. A fuzzy incremental clustering learning algorithm based on the Mahalanobis distance is proposed. Experimental results show that the algorithm not only remedies this defect of the fuzzy c-means algorithm but also increases training accuracy.

Aik, Lim Eng; Choon, Tan Wee

2014-12-01
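
The substitution described in the abstract above, Mahalanobis distance in place of Euclidean distance, can be illustrated for two-dimensional points. This sketch shows only the distance itself, assuming a known 2x2 cluster covariance; the incremental learning scheme is not reproduced, and the function names are hypothetical.

```python
import math


def euclidean(x, v):
    """Ordinary Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, v)))


def mahalanobis_2d(x, v, cov):
    """Mahalanobis distance in 2D: sqrt((x - v)^T cov^{-1} (x - v))."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if a <= 0 or det <= 0:
        raise ValueError("covariance must be positive definite")
    dx, dy = x[0] - v[0], x[1] - v[1]
    # Closed-form inverse of a 2x2 matrix: (1 / det) * [[d, -b], [-c, a]].
    q = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.sqrt(q)


center = [0.0, 0.0]
elongated = [[4.0, 0.0], [0.0, 1.0]]  # cluster stretched along the x axis
print(euclidean([2.0, 0.0], center), mahalanobis_2d([2.0, 0.0], center, elongated))
print(euclidean([0.0, 2.0], center), mahalanobis_2d([0.0, 2.0], center, elongated))
```

Both test points are equally far from the center in Euclidean terms, but the point lying along the cluster's long axis is closer in Mahalanobis terms. This is why the measure suits elliptical clusters.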

20

Automatic online spike sorting with singular value decomposition and fuzzy C-mean clustering

Background: Understanding how neurons contribute to perception, motor functions and cognition requires the reliable detection of spiking activity of individual neurons during a number of different experimental conditions. An important problem in computational neuroscience is thus to develop algorithms to automatically detect and sort the spiking activity of individual neurons from extracellular recordings. While many algorithms for spike sorting exist, the problem of accurate and fast online sorting still remains a challenging issue. Results: Here we present a novel software tool, called FSPS (Fuzzy SPike Sorting), which is designed to optimize: (i) fast and accurate detection, (ii) offline sorting and (iii) online classification of neuronal spikes with very limited or null human intervention. The method is based on a combination of Singular Value Decomposition for fast and highly accurate pre-processing of spike shapes, unsupervised Fuzzy C-mean, high-resolution alignment of extracted spike waveforms, optimal selection of the number of features to retain, automatic identification of the number of clusters, and quantitative quality assessment of resulting clusters independent of their size. After being trained on a short testing data stream, the method can reliably perform supervised online classification and monitoring of single neuron activity. The generalized procedure has been implemented in our FSPS spike sorting software (available free for non-commercial academic applications at the address: http://www.spikesorting.com) using LabVIEW (National Instruments, USA). We evaluated the performance of our algorithm both on benchmark simulated datasets with different levels of background noise and on real extracellular recordings from premotor cortex of Macaque monkeys. The results of these tests showed an excellent accuracy in discriminating low-amplitude and overlapping spikes under strong background noise. The performance of our method is competitive with respect to other robust spike sorting algorithms. Conclusions: This new software provides neuroscience laboratories with a new tool for fast and robust online classification of single neuron activity. This feature could become crucial in situations when online spike detection from multiple electrodes is paramount, such as in human clinical recordings or in brain-computer interfaces. PMID:22871125

2012-01-01

21

Image watermarking using a dynamically weighted fuzzy c-means algorithm

NASA Astrophysics Data System (ADS)

Digital watermarking has received extensive attention as a new method of protecting multimedia content from unauthorized copying. In this paper, we present a nonblind watermarking system using a proposed dynamically weighted fuzzy c-means (DWFCM) technique combined with discrete wavelet transform (DWT), discrete cosine transform (DCT), and singular value decomposition (SVD) techniques for copyright protection. The proposed scheme efficiently selects blocks in which the watermark is embedded using new membership values of DWFCM as the embedding strength. We evaluated the proposed algorithm in terms of robustness against various watermarking attacks and imperceptibility compared to other algorithms [DWT-DCT-based and DCT- fuzzy c-means (FCM)-based algorithms]. Experimental results indicate that the proposed algorithm outperforms other algorithms in terms of robustness against several types of attacks, such as noise addition (Gaussian noise, salt and pepper noise), rotation, Gaussian low-pass filtering, mean filtering, median filtering, Gaussian blur, image sharpening, histogram equalization, and JPEG compression. In addition, the proposed algorithm achieves higher values of peak signal-to-noise ratio (approximately 49 dB) and lower values of measure-singular value decomposition (5.8 to 6.6) than other algorithms.

Kang, Myeongsu; Ho, Linh Tran; Kim, Yongmin; Kim, Cheol Hong; Kim, Jong-Myon

2011-10-01
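
The peak signal-to-noise ratio quoted in the abstract above is a standard imperceptibility measure. A minimal sketch of the usual definition, 10·log10(peak² / MSE), is given below, assuming 8-bit grayscale images stored as nested lists; the function name is illustrative and the example values are not from the paper.

```python
import math


def psnr(original, distorted, peak=255.0):
    """Peak signal-to-noise ratio in dB between two same-shaped grayscale images."""
    squared_error = 0.0
    count = 0
    for row_o, row_d in zip(original, distorted):
        for o, d in zip(row_o, row_d):
            squared_error += (o - d) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


host = [[100, 120], [140, 160]]
marked = [[101, 121], [141, 161]]  # every pixel changed by one gray level
print(psnr(host, marked))
```

For scale, a uniform error of one gray level on an 8-bit image gives a PSNR of about 48.13 dB under this definition.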

22

Remote sensing ocean data analyses using fuzzy C-Means clustering

NASA Astrophysics Data System (ADS)

As understanding and exploitation of the ocean deepen, more and more precision instruments are installed on survey ships and other marine vessels. The high cost and complexity of corrosion place ever-increasing demands on analysis of the surrounding ocean environment. In this paper, fuzzy C-Means clustering is used to analyze the surrounding ocean environment with remote sensing data. The studied ocean area is treated as a two-dimensional grid, or image, and the fuzzy C-Means clustering technique is used to reveal the underlying relationships among the elements and to segment the ocean into regions with similar spectral properties with respect to their influence on instrument corrosion. The influence of environmental elements on instrument corrosion is studied, and a priori spatial information is added to improve the segmentation result. A fitness function containing neighbor information is set up based on the gray-level information and the neighbor relations between pixels. Using the global search ability of predator-prey particle swarm optimization, the optimal cluster centers are obtained by iterative optimization and the segmentation is accomplished. The calculation results show that the segmentation is accurate and reasonable. The results of this ocean environment analysis have been used in real applications and have proved valuable in monitoring ship instrument corrosion and in guiding other ocean activities.

Xu, Suqin; Chen, Jie; Gao, Guoxing

2009-10-01

23

Segmentation of pomegranate MR images using spatial fuzzy c-means (SFCM) algorithm

NASA Astrophysics Data System (ADS)

Segmentation is one of the fundamental issues of image processing and machine vision, and it plays a prominent role in a variety of image processing applications. This paper explores one such application: MRI segmentation of pomegranate. Pomegranate is a fruit with pharmacological properties such as being anti-viral and anti-cancer. A high quality product is a critical factor in its marketing, and the internal quality of the product is important in the sorting process. Qualitative features cannot be determined manually. Therefore, the internal structures of the fruit need to be segmented as accurately as possible in the presence of noise. The fuzzy c-means (FCM) algorithm is noise-sensitive, and noisy pixels are misclassified. As a solution, this paper proposes the spatial FCM (SFCM) algorithm for segmenting pomegranate MR images. The algorithm incorporates spatial neighborhood information into FCM and modifies the fuzzy membership function for each class. Segmentation results on the original pomegranate MR images and on images corrupted by Gaussian, salt-and-pepper, and speckle noise show that the SFCM algorithm performs considerably better than the FCM algorithm. After several steps of qualitative and quantitative analysis, we conclude that the SFCM algorithm with a 5×5 window performs better than with the other window sizes.

Moradi, Ghobad; Shamsi, Mousa; Sedaaghi, M. H.; Alsharif, M. R.

2011-10-01

24

T1- and T2-weighted spatially constrained fuzzy c-means clustering for brain MRI segmentation

NASA Astrophysics Data System (ADS)

The segmentation of brain tissue in magnetic resonance imaging (MRI) plays an important role in clinical analysis and is useful for many applications including studying brain diseases, surgical planning and computer assisted diagnoses. In general, accurate tissue segmentation is a difficult task, not only because of the complicated structure of the brain and the anatomical variability between subjects, but also because of the presence of noise and low tissue contrasts in the MRI images, especially in neonatal brain images. Fuzzy clustering techniques have been widely used in automated image segmentation. However, since the standard fuzzy c-means (FCM) clustering algorithm does not consider any spatial information, it is highly sensitive to noise. In this paper, we present an extension of the FCM algorithm to overcome this drawback, by combining information from both T1-weighted (T1-w) and T2-weighted (T2-w) MRI scans and by incorporating spatial information. This new spatially constrained FCM (SCFCM) clustering algorithm preserves the homogeneity of the regions better than existing FCM techniques, which often have difficulties when tissues have overlapping intensity profiles. The performance of the proposed algorithm is tested on simulated and real adult MR brain images with different noise levels, as well as on neonatal MR brain images with the gestational age of 39 weeks. Experimental quantitative and qualitative segmentation results show that the proposed method is effective and more robust to noise than other FCM-based methods. Also, SCFCM appears as a very promising tool for complex and noisy image segmentation of the neonatal brain.

Despotović, Ivana; Goossens, Bart; Vansteenkiste, Ewout; Philips, Wilfried

2010-03-01

25

In this paper, we propose a novel approach for the automatic breast boundary segmentation using spatial fuzzy c means clustering and active contours models. We will evaluate the performance of the approach on screen film mammographic images digitized by specific scanner devices and full-field digital mammographic images at different spatial and pixel resolutions. Expert radiologists have supplied the reference boundary

Arianna Mencattini; Marcello Salmeri; Paola Casti; Grazia Raguso; Samuela L'Abbate; Loredana Chieppa; Antonietta Ancona; Fabio Mangieri; Maria Luisa Pepe

2011-01-01

26

Magnetic resonance images are often corrupted by intensity inhomogeneity, which manifests itself as slow intensity variations of the same tissue over the image domain. Such shading artifacts must be corrected before doing computerized analysis such as intensity-based segmentation and quantitative analysis. In this paper, we present a fuzzy c-means (FCM) based algorithm that simultaneously estimates the shading effect while segmenting

Weijie Chen; Maryellen L. Giger

2004-01-01

27

Alpha-Cut Implemented Fuzzy Clustering Algorithms and Switching Regressions

In the fuzzy c-means (FCM) clustering algorithm, almost none of the data points have a membership value of 1. Moreover, noise and outliers may cause difficulties in obtaining appropriate clustering results from the FCM algorithm. The embedding of FCM into switching regressions, called the fuzzy c-regressions (FCRs), still has the same drawbacks as FCM. In this paper, we propose

Miin-shen Yang; Kuo-lung Wu; June-nan Hsieh; Jian Yu

2008-01-01

28

Survey of clustering algorithms

Data analysis plays an indispensable role for understanding various phenomena. Cluster analysis, primitive exploration with little or no prior knowledge, consists of research developed across a wide variety of communities. The diversity, on one hand, equips us with many tools. On the other hand, the profusion of options causes confusion. We survey clustering algorithms for data sets appearing in statistics,

Rui Xu; Donald Wunsch II

2005-01-01

29

Fuzzy C-Means (FCM) is a standard technique for exploratory analysis and is readily adaptable to integrate unique data characteristics and auxiliary feature relations. Distinguishing between the spatial and temporal features of functional magnetic resonance imaging (fMRI) time courses (TC) has proved effective in reducing the presence of false positives for stimulation studies. The fuzzy partitions generated by this FCM variant

M. D. Alexiuk; N. J. Pizzi

2004-01-01

30

Hierarchical modularization of biochemical pathways using fuzzy-c means clustering.

Biological systems that are representative of regulatory, metabolic, or signaling pathways can be highly complex. Mathematical models that describe such systems inherit this complexity. As a result, these models can often fail to provide a path toward the intuitive comprehension of these systems. More coarse information that allows a perceptive insight of the system is sometimes needed in combination with the model to understand control hierarchies or lower level functional relationships. In this paper, we present a method to identify relationships between components of dynamic models of biochemical pathways that reside in different functional groups. We find primary relationships and secondary relationships. The secondary relationships reveal connections that are present in the system, which current techniques that only identify primary relationships are unable to show. We also identify how relationships between components dynamically change over time. This results in a method that provides the hierarchy of the relationships among components, which can help us to understand the low level functional structure of the system and to elucidate potential hierarchical control. As a proof of concept, we apply the algorithm to the epidermal growth factor signal transduction pathway, and to the C3 photosynthesis pathway. We identify primary relationships among components that are in agreement with previous computational decomposition studies, and identify secondary relationships that uncover connections among components that current computational approaches were unable to reveal. PMID:24196983

de Luis Balaguer, Maria A; Williams, Cranos M

2014-08-01

31

Exudates are the primary sign of Diabetic Retinopathy. Early detection can potentially reduce the risk of blindness. An automatic method to detect exudates from low-contrast digital images of retinopathy patients with non-dilated pupils using a Fuzzy C-Means (FCM) clustering is proposed. Contrast enhancement preprocessing is applied before four features, namely intensity, standard deviation on intensity, hue and a number of edge pixels, are extracted to supply as input parameters to coarse segmentation using FCM clustering method. The first result is then fine-tuned with morphological techniques. The detection results are validated by comparing with expert ophthalmologists’ hand-drawn ground-truths. Sensitivity, specificity, positive predictive value (PPV), positive likelihood ratio (PLR) and accuracy are used to evaluate overall performance. It is found that the proposed method detects exudates successfully with sensitivity, specificity, PPV, PLR and accuracy of 87.28%, 99.24%, 42.77%, 224.26 and 99.11%, respectively. PMID:22574005

Sopharak, Akara; Uyyanonvara, Bunyarit; Barman, Sarah

2009-01-01
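The evaluation measures named in the record above (sensitivity, specificity, PPV, PLR and accuracy) all follow from the four counts of a binary confusion matrix. The following is a minimal sketch under that assumption; the function name `detection_metrics` and the example counts are hypothetical and are not taken from the paper, whose reported figures may have been aggregated differently.

```python
def detection_metrics(tp, fp, tn, fn):
    """Return common binary-detection measures from confusion-matrix counts.

    tp, fp, tn, fn are hypothetical counts of true positives, false
    positives, true negatives and false negatives (e.g. per pixel).
    """
    sensitivity = tp / (tp + fn)          # fraction of real positives detected
    specificity = tn / (tn + fp)          # fraction of real negatives rejected
    ppv = tp / (tp + fp)                  # positive predictive value
    # Positive likelihood ratio; unbounded when there are no false positives.
    if fp == 0:
        plr = float("inf")
    else:
        plr = sensitivity / (1.0 - specificity)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "plr": plr,
        "accuracy": accuracy,
    }


if __name__ == "__main__":
    # Illustrative counts only.
    for name, value in detection_metrics(tp=80, fp=100, tn=900, fn=20).items():
        print(f"{name}: {value:.4f}")
```

When positives are rare, as exudate pixels usually are, PPV can be low even at high specificity, so it is worth reporting alongside accuracy.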

32

Analysis of B-mode ultrasound images of the carotid atheromatous plaque includes the estimation of texture from static images and the estimation of motion from image sequences. The combination of these two types of information may be valuable for accurate diagnosis of vascular disease. The purpose of this paper was to study texture and motion patterns of carotid atherosclerosis and select the optimal combination of features that can characterize plaque. B-mode ultrasound images of 10 symptomatic and 9 asymptomatic plaques were interrogated. A total of 99 texture features were estimated using first-order statistics, second-order statistics, Laws texture energy and the fractal dimension. Only five texture features were significantly different between the two groups. In the same subjects, the motion of selected plaque regions was estimated using region tracking and block-matching and expressed through (a) maximal surface velocity (MSV) and (b) maximal relative surface velocity (MRSV). MSV and MRSV were significantly lower in asymptomatic plaques, suggesting more homogeneous motion patterns. Clustering using fuzzy c-means correctly classified 74% of plaques based on texture features only, and 79% of plaques based on motion features only. Classification performance reached 84% when a combination of motion and texture features was used. PMID:17271957

Stoitsis, J; Golemati, S; Nikita, K S; Nicolaides, A N

2004-01-01

33

Purpose: The amount of fibroglandular tissue content in the breast as estimated mammographically, commonly referred to as breast percent density (PD%), is one of the most significant risk factors for developing breast cancer. Approaches to quantify breast density commonly focus on either semiautomated methods or visual assessment, both of which are highly subjective. Furthermore, most studies published to date investigating computer-aided assessment of breast PD% have been performed using digitized screen-film mammograms, while digital mammography is increasingly replacing screen-film mammography in breast cancer screening protocols. Digital mammography imaging generates two types of images for analysis, raw (i.e., 'FOR PROCESSING') and vendor postprocessed (i.e., 'FOR PRESENTATION'), of which postprocessed images are commonly used in clinical practice. Development of an algorithm which effectively estimates breast PD% in both raw and postprocessed digital mammography images would be beneficial in terms of direct clinical application and retrospective analysis. Methods: This work proposes a new algorithm for fully automated quantification of breast PD% based on adaptive multiclass fuzzy c-means (FCM) clustering and support vector machine (SVM) classification, optimized for the imaging characteristics of both raw and processed digital mammography images as well as for individual patient and image characteristics. Our algorithm first delineates the breast region within the mammogram via an automated thresholding scheme to identify background air followed by a straight line Hough transform to extract the pectoral muscle region. The algorithm then applies adaptive FCM clustering based on an optimal number of clusters derived from image properties of the specific mammogram to subdivide the breast into regions of similar gray-level intensity. 
Finally, an SVM classifier is trained to identify which clusters within the breast tissue are likely fibroglandular, which are then aggregated into a final dense tissue segmentation that is used to compute breast PD%. Our method is validated on a group of 81 women for whom bilateral, mediolateral oblique, raw and processed screening digital mammograms were available, and agreement is assessed with both continuous and categorical density estimates made by a trained breast-imaging radiologist. Results: Strong association between algorithm-estimated and radiologist-provided breast PD% was detected for both raw (r = 0.82, p < 0.001) and processed (r = 0.85, p < 0.001) digital mammograms on a per-breast basis. Stronger agreement was found when overall breast density was assessed on a per-woman basis for both raw (r = 0.85, p < 0.001) and processed (r = 0.89, p < 0.001) mammograms. Strong agreement between categorical density estimates was also seen (weighted Cohen's κ ≥ 0.79). Repeated measures analysis of variance demonstrated no statistically significant differences between the PD% estimates (p > 0.1) due to either presentation of the image (raw vs processed) or method of PD% assessment (radiologist vs algorithm). Conclusions: The proposed fully automated algorithm was successful in estimating breast percent density from both raw and processed digital mammographic images. Accurate assessment of a woman's breast density is critical in order for the estimate to be incorporated into risk assessment models. These results show promise for the clinical application of the algorithm in quantifying breast density in a repeatable manner, both at time of imaging as well as in retrospective studies.

Keller, Brad M.; Nathan, Diane L.; Wang Yan; Zheng Yuanjie; Gee, James C.; Conant, Emily F.; Kontos, Despina [Department of Radiology, University of Pennsylvania, Philadelphia, Pennsylvania 19104 (United States); Applied Mathematics and Computational Science, University of Pennsylvania, Philadelphia, Pennsylvania 19104 (United States); Department of Radiology, University of Pennsylvania, Philadelphia, Pennsylvania 19104 (United States)

2012-08-15

34

Purpose: The amount of fibroglandular tissue content in the breast as estimated mammographically, commonly referred to as breast percent density (PD%), is one of the most significant risk factors for developing breast cancer. Approaches to quantify breast density commonly focus on either semiautomated methods or visual assessment, both of which are highly subjective. Furthermore, most studies published to date investigating computer-aided assessment of breast PD% have been performed using digitized screen-film mammograms, while digital mammography is increasingly replacing screen-film mammography in breast cancer screening protocols. Digital mammography imaging generates two types of images for analysis, raw (i.e., “FOR PROCESSING”) and vendor postprocessed (i.e., “FOR PRESENTATION”), of which postprocessed images are commonly used in clinical practice. Development of an algorithm which effectively estimates breast PD% in both raw and postprocessed digital mammography images would be beneficial in terms of direct clinical application and retrospective analysis. Methods: This work proposes a new algorithm for fully automated quantification of breast PD% based on adaptive multiclass fuzzy c-means (FCM) clustering and support vector machine (SVM) classification, optimized for the imaging characteristics of both raw and processed digital mammography images as well as for individual patient and image characteristics. Our algorithm first delineates the breast region within the mammogram via an automated thresholding scheme to identify background air followed by a straight line Hough transform to extract the pectoral muscle region. The algorithm then applies adaptive FCM clustering based on an optimal number of clusters derived from image properties of the specific mammogram to subdivide the breast into regions of similar gray-level intensity. 
Finally, an SVM classifier is trained to identify which clusters within the breast tissue are likely fibroglandular, which are then aggregated into a final dense tissue segmentation that is used to compute breast PD%. Our method is validated on a group of 81 women for whom bilateral, mediolateral oblique, raw and processed screening digital mammograms were available, and agreement is assessed with both continuous and categorical density estimates made by a trained breast-imaging radiologist. Results: Strong association between algorithm-estimated and radiologist-provided breast PD% was detected for both raw (r = 0.82, p < 0.001) and processed (r = 0.85, p < 0.001) digital mammograms on a per-breast basis. Stronger agreement was found when overall breast density was assessed on a per-woman basis for both raw (r = 0.85, p < 0.001) and processed (r = 0.89, p < 0.001) mammograms. Strong agreement between categorical density estimates was also seen (weighted Cohen's κ ≥ 0.79). Repeated measures analysis of variance demonstrated no statistically significant differences between the PD% estimates (p > 0.1) due to either presentation of the image (raw vs processed) or method of PD% assessment (radiologist vs algorithm). Conclusions: The proposed fully automated algorithm was successful in estimating breast percent density from both raw and processed digital mammographic images. Accurate assessment of a woman's breast density is critical in order for the estimate to be incorporated into risk assessment models. These results show promise for the clinical application of the algorithm in quantifying breast density in a repeatable manner, both at time of imaging as well as in retrospective studies. PMID:22894417

Keller, Brad M.; Nathan, Diane L.; Wang, Yan; Zheng, Yuanjie; Gee, James C.; Conant, Emily F.; Kontos, Despina

2012-01-01

35

, it is being investigated as a method for automatic early detection of pre-cancerous changes. In previous work Analysis (HCA) and Fuzzy C-Means (FCM) clustering, are used to classify sets of oral cancer cell data. This makes it a potentially powerful tool in cancer diagnosis, as it can help detecting abnormal cells

Garibaldi, Jon

36

Alpha-cut implemented fuzzy clustering algorithms and switching regressions.

In the fuzzy c-means (FCM) clustering algorithm, almost none of the data points have a membership value of 1. Moreover, noise and outliers may cause difficulties in obtaining appropriate clustering results from the FCM algorithm. The embedding of FCM into switching regressions, called the fuzzy c-regressions (FCRs), still has the same drawbacks as FCM. In this paper, we propose the alpha-cut implemented fuzzy clustering algorithms, referred to as FCMalpha, which allow data points to belong completely to one cluster. The proposed FCMalpha algorithms can form a cluster core for each cluster, where data points inside a cluster core have a membership value of 1, which resolves these drawbacks of FCM. On the other hand, the fuzziness index m plays different roles for FCM and FCMalpha. We find that the clustering results obtained by FCMalpha are more robust to noise and outliers than FCM when a larger m is used. Moreover, the cluster cores generated by FCMalpha are workable for various data shape clusters, so that FCMalpha is very suitable for embedding into switching regressions. The embedding of FCMalpha into switching regressions is called FCRalpha. The proposed FCRalpha provides better results than FCR for environments with noise or outliers. Numerical examples show the robustness and the superiority of our proposed methods. PMID:18558526

Yang, Miin-Shen; Wu, Kuo-Lung; Hsieh, June-Nan; Yu, Jian

2008-06-01
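The observation in the record above, that standard FCM almost never yields a membership of exactly 1, can be seen directly from the usual membership update. The sketch below is illustrative only: `fcm_memberships` implements the textbook FCM membership formula for a single point, while `alpha_cut` is a hypothetical reading of the alpha-cut idea (full assignment once the largest membership reaches a threshold `alpha`) and is not claimed to be the authors' exact FCMalpha rule.

```python
def fcm_memberships(distances, m=2.0):
    """Textbook FCM memberships of one point, given distances to each center.

    u_i = d_i^(-2/(m-1)) / sum_j d_j^(-2/(m-1)), with fuzziness index m > 1.
    """
    zero = [d == 0 for d in distances]
    if any(zero):
        # The point coincides with one or more centers: share membership there.
        hits = sum(zero)
        return [1.0 / hits if z else 0.0 for z in zero]
    power = 2.0 / (m - 1.0)
    inverse = [d ** -power for d in distances]
    total = sum(inverse)
    return [v / total for v in inverse]


def alpha_cut(memberships, alpha):
    """Hypothetical alpha-cut: assign the point fully to its best cluster
    when that cluster's membership is at least alpha; otherwise keep the
    fuzzy memberships unchanged. Assumes 0.5 < alpha <= 1."""
    best = max(range(len(memberships)), key=memberships.__getitem__)
    if memberships[best] >= alpha:
        return [1.0 if i == best else 0.0 for i in range(len(memberships))]
    return list(memberships)


if __name__ == "__main__":
    u = fcm_memberships([1.0, 3.0], m=2.0)
    print(u)                  # close to a center, yet membership stays below 1
    print(alpha_cut(u, 0.8))  # the cut places the point in a "cluster core"
```

Under this reading, a point three times closer to one center than to the other still has a fuzzy membership below 1, and only the cut makes the assignment crisp.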

37

Applying fuzzy and rough set theory, researching into the sample’s clustering analysis and each factor’s reasonable authorization with regard to evaluation and prediction, the thesis gives fuzzy clustering based on the primitive statistics without human prior knowledge. On this basis, the thesis mines each evaluation factor weight from primitive statistics and develops new method of comprehensive evaluation. In accordance with

Gu-xin Li; Ke-ying Jiao; Qi Niu

38

Bias Field Estimation and Adaptive Segmentation of MRI Data Using a Modified Fuzzy C-Means Algorithm

M. N. Ahmed, S. M. Yamany, A. A. Farag, and T. Moriarty+ Univ. of Louisville, E.E. Dept., Louisville, KY 40292 E-mail: {mohamed,farag}@cairo.spd.louisville.edu +Department of Neurological Surgery, University of Louisville, Louisville, KY 40292. Abstract In this paper, we present a novel algorithm

Farag, Aly A.

39

On Spectral Clustering: Analysis and an algorithm

Despite many empirical successes of spectral clustering methods (algorithms that cluster points using eigenvectors of matrices derived from the distances between the points), there are several unresolved issues. First, there is a wide variety of algorithms that use the eigenvectors in slightly different ways. Second, many of these algorithms have no proof that they will actually compute a reasonable clustering. In this paper, we present a simple

Andrew Y. Ng; Michael I. Jordan; Yair Weiss

2001-01-01

40

Genetic algorithm-based clustering technique

A genetic algorithm-based clustering technique, called GA-clustering, is proposed in this article. The searching capability of genetic algorithms is exploited in order to search for appropriate cluster centres in the feature space such that a similarity metric of the resulting clusters is optimized. The chromosomes, which are represented as strings of real numbers, encode the centres of a

Ujjwal Maulik; Sanghamitra Bandyopadhyay

2000-01-01

41

Multisolutional clustering and quantization algorithm (MCQ)

We have developed a novel clustering and quantization algorithm that allows the user to create multiple one-to-one correspondences between the actual data and its transformed (clustered and quantized) values, based on the user's hypothesis regarding the nature of the classification task. The types of problems for which the algorithm can be beneficial are discussed. We report experiments employing simulated and

I. Dvorchik; W. Marsh; V. Gurari; M. Subotin; H. R. Doyle

1996-01-01

42

A Survey of Evolutionary Algorithms for Clustering

This paper presents a survey of evolutionary algorithms designed for clustering tasks. It tries to reflect the profile of this area by focusing more on those subjects that have been given more importance in the literature. In this context, most of the paper is devoted to partitional algorithms that look for hard clusterings of data, though overlapping (i.e., soft and

Eduardo Raul Hruschka; Ricardo José Gabrielli Barreto Campello; Alex Alves Freitas; André Carlos Ponce Leon Ferreira de Carvalho

2009-01-01

43

NASA Astrophysics Data System (ADS)

Artificial neural networks (ANNs) are powerful mathematical models that are used to solve complex real world problems. Wavelet neural networks (WNNs), which were developed based on the wavelet theory, are a variant of ANNs. During the training phase of WNNs, several parameters need to be initialized; including the type of wavelet activation functions, translation vectors, and dilation parameter. The conventional k-means and fuzzy c-means clustering algorithms have been used to select the translation vectors. However, the solution vectors might get trapped at local minima. In this regard, the evolutionary harmony search algorithm, which is capable of searching for near-optimum solution vectors, both locally and globally, is introduced to circumvent this problem. In this paper, the conventional k-means and fuzzy c-means clustering algorithms were hybridized with the metaheuristic harmony search algorithm. In addition to obtaining the estimation of the global minima accurately, these hybridized algorithms also offer more than one solution to a particular problem, since many possible solution vectors can be generated and stored in the harmony memory. To validate the robustness of the proposed WNNs, the real world problem of epileptic seizure detection was presented. The overall classification accuracy from the simulation showed that the hybridized metaheuristic algorithms outperformed the standard k-means and fuzzy c-means clustering algorithms.

Zainuddin, Zarita; Lai, Kee Huong; Ong, Pauline

2013-04-01

44

Purpose: To develop a pharmacokinetic model-free framework to analyze the dynamic contrast enhanced magnetic resonance imaging (DCE-MRI) data for assessment of response of brain metastases to radiation therapy. Methods: Twenty patients with 45 analyzable brain metastases had MRI scans prior to whole brain radiation therapy (WBRT) and at the end of the 2-week therapy. The volumetric DCE images covering the whole brain were acquired on a 3T scanner with approximately 5 s temporal resolution and a total scan time of about 3 min. DCE curves from all voxels of the 45 brain metastases were normalized and then temporally aligned. A DCE matrix that is constructed from the aligned DCE curves of all voxels of the 45 lesions obtained prior to WBRT is processed by principal component analysis to generate the principal components (PCs). Then, the projection coefficient maps prior to and at the end of WBRT are created for each lesion. Next, a pattern recognition technique, based upon fuzzy-c-means clustering, is used to delineate the tumor subvolumes relating to the value of the significant projection coefficients. The relationship between changes in different tumor subvolumes and treatment response was evaluated to differentiate responsive from stable and progressive tumors. Performance of the PC-defined tumor subvolume was also evaluated by receiver operating characteristic (ROC) analysis in prediction of nonresponsive lesions and compared with physiological-defined tumor subvolumes. Results: The projection coefficient maps of the first three PCs contain almost all response-related information in DCE curves of brain metastases. The first projection coefficient, related to the area under DCE curves, is the major component to determine response while the third one has a complementary role. In ROC analysis, areas under the curve of 0.88 ± 0.05 and 0.86 ± 0.06 were achieved for the PC-defined and physiological-defined tumor subvolumes, respectively, in response assessment.
Conclusions: The PC-defined subvolume of a brain metastasis could predict tumor response to therapy similar to the physiological-defined one, while the former is determined more rapidly for clinical decision-making support.

Farjam, Reza; Tsien, Christina I.; Lawrence, Theodore S. [Department of Radiation Oncology, University of Michigan, 1500 East Medical Center Drive, SPC 5010, Ann Arbor, Michigan 48109-5010 (United States)]; Cao, Yue, E-mail: yuecao@umich.edu [Department of Radiation Oncology, University of Michigan, 1500 East Medical Center Drive, SPC 5010, Ann Arbor, Michigan 48109-5010 (United States); Department of Radiology, University of Michigan, 1500 East Medical Center Drive, Med Inn Building C478, Ann Arbor, Michigan 48109-5842 (United States); Department of Biomedical Engineering, University of Michigan, 2200 Bonisteel Boulevard, Ann Arbor, Michigan 48109-2099 (United States)]

2014-01-15

45

Approximation Algorithms for Hamming Clustering Problems

Approximation Algorithms for Hamming Clustering Problems Leszek Gąsieniec 1 , Jesper Jansson 2.Lingasg@cs.lth.se Abstract. We study Hamming versions of two classical clustering problems. The Hamming radius p that minimize the maximum Hamming distance between a string in S and the closest of the p strings; this minimum

Gasieniec, Leszek

46

An adaptive clustering algorithm for image segmentation

The problem of segmenting images of objects with smooth surfaces is considered. The algorithm that is presented is a generalization of the K-means clustering algorithm to include spatial constraints and to account for local intensity variations in the image. Spatial constraints are included by the use of a Gibbs random field model. Local intensity variations are accounted for in an

T. N. Pappas; N. Pappas

1992-01-01

47

This paper proposes two edge detection methods for medical images by integrating the advantages of Gabor wavelet transform (GWT) and unsupervised clustering algorithms. The GWT is used to enhance the edge information in an image while suppressing noise. Following this, the k-means and Fuzzy c-means (FCM) clustering algorithms are used to convert a gray level image into a binary image. The proposed methods are tested using medical images obtained through Computed Tomography (CT) and Magnetic Resonance Imaging (MRI) devices, and a phantom image. The results prove that the proposed methods are successful for edge detection, even in noisy cases. PMID:24790590

Ergen, Burhan

2014-01-01
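The binarization step described in the record above, where a clustering algorithm converts a gray-level image into a binary one, can be illustrated with a one-dimensional two-cluster k-means. This is a minimal sketch under stated assumptions: it operates on a flat list of gray levels rather than on Gabor-filtered output, and the function name `kmeans_binarize` and the sample values are hypothetical, not taken from the paper.

```python
def kmeans_binarize(pixels, iterations=50):
    """Binarize a flat list of gray levels with two-cluster 1-D k-means.

    In one dimension, assigning each value to the nearer of two centers is
    the same as thresholding at the midpoint between the centers.
    """
    low, high = float(min(pixels)), float(max(pixels))  # initial centers
    for _ in range(iterations):
        threshold = (low + high) / 2.0
        dark = [p for p in pixels if p <= threshold]
        bright = [p for p in pixels if p > threshold]
        if not dark or not bright:
            break  # degenerate case, e.g. a constant image
        new_low = sum(dark) / len(dark)
        new_high = sum(bright) / len(bright)
        if new_low == low and new_high == high:
            break  # centers stopped moving: converged
        low, high = new_low, new_high
    threshold = (low + high) / 2.0
    return [1 if p > threshold else 0 for p in pixels]


if __name__ == "__main__":
    sample = [10, 12, 11, 200, 210, 205]  # illustrative gray levels
    print(kmeans_binarize(sample))
```

An FCM-based variant would replace the hard assignment with fuzzy memberships and then label each pixel by its larger membership.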

48

An algorithm for spatial hierarchy clustering

NASA Technical Reports Server (NTRS)

A method for utilizing both spectral and spatial redundancy in compacting and preclassifying images is presented. In multispectral satellite images, a high correlation exists between neighboring image points which tend to occupy dense and restricted regions of the feature space. The image is divided into windows of the same size where the clustering is made. The classes obtained in several neighboring windows are clustered, and then again successively clustered until only one region corresponding to the whole image is obtained. By employing this algorithm only a few points are considered in each clustering, thus reducing computational effort. The method is illustrated as applied to LANDSAT images.

Dejesusparada, N. (principal investigator); Velasco, F. R. D.

1981-01-01

49

A drastic improvement in the analysis of gene expression has led to new discoveries in bioinformatics research. In order to analyse the gene expression data, fuzzy clustering algorithms are widely used. However, the resulting analyses from these specific types of algorithms may lead to confusion in hypotheses with regard to the suggestion of dominant function for genes of interest. Besides that, the current fuzzy clustering algorithms do not conduct a thorough analysis of genes with low membership values. Therefore, we present a novel computational framework called the "multi-stage filtering-Clustering Functional Annotation" (msf-CluFA) for clustering gene expression data. The framework consists of four components: fuzzy c-means clustering (msf-CluFA-0), achieving dominant cluster (msf-CluFA-1), improving confidence level (msf-CluFA-2) and combination of msf-CluFA-0, msf-CluFA-1 and msf-CluFA-2 (msf-CluFA-3). By employing double filtering in msf-CluFA-1 and apriori algorithms in msf-CluFA-2, our new framework is capable of determining the dominant clusters and improving the confidence level of genes with lower membership values by means of which the unknown genes can be predicted. PMID:23930805

Kasim, Shahreen; Deris, Safaai; Othman, Razib M

2013-09-01

50

Image Thresholding Using Cellular Neural Network Combined with Fuzzy C-Means

Thresholding is one of the oldest, simplest, and most popular techniques for image segmentation, and has been widely studied. In this paper, an approach for image thresholding based on a cellular neural network (CNN) combined with fuzzy c-means (FCM) is presented. The approach is realized by a threshold CNN (T-CNN), whose threshold is obtained automatically via the FCM clustering algorithm. Experimental results on real images

Jiayin Kang

2008-01-01

51

Performance Comparison Of Evolutionary Algorithms For Image Clustering

NASA Astrophysics Data System (ADS)

Evolutionary computation tools can process real-valued numerical sets to extract suboptimal solutions to a given problem. Data clustering algorithms have been intensively used for image segmentation in remote sensing applications. Despite the wide usage of evolutionary algorithms in data clustering, their clustering performance has scarcely been studied using clustering validation indexes. In this paper, recently proposed evolutionary algorithms (i.e., Artificial Bee Colony Algorithm (ABC), Gravitational Search Algorithm (GSA), Cuckoo Search Algorithm (CS), Adaptive Differential Evolution Algorithm (JADE), Differential Search Algorithm (DSA) and Backtracking Search Optimization Algorithm (BSA)) and some classical image clustering techniques (i.e., k-means, FCM, SOM networks) have been used to cluster images, and their performances have been compared using four clustering validation indexes. Experimental test results showed that evolutionary algorithms give more reliable cluster centers than classical clustering techniques, but their convergence time is quite long.

Civicioglu, P.; Atasever, U. H.; Ozkan, C.; Besdok, E.; Karkinli, A. E.; Kesikoglu, A.

2014-09-01

52

Fast and Robust General Purpose Clustering Algorithms

General purpose and highly applicable clustering methods are usually required during the early stages of knowledge discovery exercises. k-MEANS has been adopted as the prototype of iterative model-based clustering because of its speed, simplicity and capability to work within the format of very large databases. However, k-MEANS has several disadvantages derived from its statistical simplicity. We propose an algorithm that

Vladimir Estivill-castro; Jianhua Yang

2004-01-01

53

Fast and Robust General Purpose Clustering Algorithms

General purpose and highly applicable clustering methods are usually required during the early stages of knowledge discovery exercises. k-Means has been adopted as the prototype of iterative model-based clustering because of its speed, simplicity and capability to work within the format of very large databases. However, k-Means has several disadvantages derived from its statistical simplicity. We propose an algorithm

Vladimir Estivill-castro; Jianhua Yang

2000-01-01

54

Sparse Subspace Clustering: Algorithm, Theory, and Applications.

We propose and study an algorithm, called Sparse Subspace Clustering, to cluster high-dimensional data points that lie in a union of low-dimensional subspaces. The key idea is that, among infinitely many possible representations of a data point in terms of other points, a sparse representation corresponds to selecting a few points that come from the same subspace. This motivates solving a sparse optimization program whose solution is used in a spectral clustering framework to infer the clustering of the data. As solving the sparse optimization program is NP-hard, we consider its convex relaxation and show that, under appropriate conditions on the arrangement of the subspaces and the distribution of the data, the proposed minimization program succeeds in recovering the desired sparse representations. The proposed algorithm does not require initialization, can be solved efficiently, and can handle data points near the intersections of subspaces. In addition, our algorithm can deal with data nuisances, such as noise, sparse outlying entries, and missing entries, directly by modifying the optimization program to incorporate the model of the data. We verify the effectiveness of the proposed algorithm through experiments on synthetic data as well as two real-world problems of motion segmentation and face clustering. PMID:23509183

Elhamifar, Ehsan; Vidal, Rene

2013-03-14

55

Classification of posture maintenance data with fuzzy clustering algorithms

NASA Technical Reports Server (NTRS)

Sensory inputs from the visual, vestibular, and proprioceptive systems are integrated by the central nervous system to maintain postural equilibrium. Sustained exposure to microgravity causes neurosensory adaptation during spaceflight, which results in decreased postural stability until readaptation occurs upon return to the terrestrial environment. Data which simulate sensory inputs under various conditions were collected in conjunction with JSC postural control studies using a Tilt-Translation Device (TTD). The University of West Florida proposed applying the Fuzzy C-Means Clustering (FCM) Algorithms to this data with a view towards identifying various states and stages. Data supplied by NASA/JSC were submitted to the FCM algorithms in an attempt to identify and characterize cluster substructure in a mixed ensemble of pre- and post-adaptational TTD data. Following several unsuccessful trials with FCM using a full 11 dimensional data set, a set of two channels (features) was found to enable FCM to separate pre- from post-adaptational TTD data. The main conclusions are that: (1) FCM seems able to separate pre- from post-TTD subject no. 2 on the one trial that was used, but only in certain subintervals of time; and (2) Channels 2 (right rear transducer force) and 8 (hip sway bar) contain better discrimination information than other supersets and combinations of the data that were tried so far.

Bezdek, James C.

1991-01-01

56

CURE: An Efficient Clustering Algorithm for Large Databases

Clustering, in data mining, is useful for discovering groups and identifying interesting distributions in the underlying data. Traditional clustering algorithms either favor clusters with spherical shapes and similar sizes, or are very fragile in the presence of outliers. We propose a new clustering algorithm called CURE that is more robust to outliers, and identifies clusters having non-spherical shapes

Sudipto Guha; Rajeev Rastogi; Kyuseok Shim

1998-01-01

57

DEPARTAMENTO DE COMPUTACION Constrained Clustering Algorithms

DEPARTAMENTO DE COMPUTACIÓN Constrained Clustering Algorithms: Practical Issues and Applications PHD THESIS TESE DE DOUTORAMENTO Manuel Eduardo Ares Brea 2013 DEPARTAMENTO DE COMPUTACIÓN Computación e Intelixencia Artificial da Universidade da Coruña CERTIFICA Que a presente memoria

Barreiro, Alvaro

58

The Georgi Algorithms of Jet Clustering

We reveal the direct link between the jet clustering algorithms recently proposed by Howard Georgi and parton shower kinematics, providing sound support from the theoretical side. The kinematics of this class of elegant algorithms is explored systematically and the jet function is generalized to $J^{(n)}_\beta$ with a jet function index $n$. Based on three basic requirements that the result of jet clustering is process-independent, for softer subjets the inclusion cone is larger, and that the cone size cannot be too large in order to avoid mixing different jets, we derive constraints on the jet function index $n$ and the jet function parameter $\beta$ which are closely related to phase space boundaries. Finally, we demonstrate that the jet algorithm is boost invariant.

Shao-Feng Ge

2014-08-30

59

A practical clustering algorithm for static and dynamic information organization

We present and analyze the off-line star algorithm for clustering static information systems and the on-line star algorithm for clustering dynamic information systems. These algorithms organize a document collection into a number of clusters that is naturally induced by the collection via a computationally efficient cover by dense subgraphs. We further show a lower bound on the quality of the

Javed A. Aslam; Katya Pelekhov; Daniela Rus

1999-01-01
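
The "cover by dense subgraphs" mentioned in the abstract above can be illustrated with a small sketch. It assumes one common description of a star cover: threshold a pairwise similarity matrix to get a graph, then repeatedly take the highest-degree vertex that is not yet covered as a star center and attach all of its neighbours as satellites. The function name `star_cover`, the `similarity` matrix layout, and the `threshold` parameter are hypothetical and are not taken from the paper.

```python
def star_cover(similarity, threshold):
    """Greedy star cover of a thresholded similarity graph (illustrative only).

    similarity: square list of lists with pairwise similarities.
    Returns a list of (center, satellites) pairs; clusters may overlap.
    """
    n = len(similarity)
    # Keep an edge i-j whenever the similarity reaches the threshold.
    adj = {i: {j for j in range(n)
               if j != i and similarity[i][j] >= threshold}
           for i in range(n)}
    uncovered = set(range(n))
    stars = []
    # Visit vertices from highest to lowest degree (ties broken by index).
    for v in sorted(range(n), key=lambda i: (-len(adj[i]), i)):
        if v in uncovered:
            stars.append((v, sorted(adj[v])))
            uncovered.discard(v)
            uncovered -= adj[v]
    return stars
```

With this construction the number of clusters is not chosen in advance; it follows from the graph, which matches the abstract's remark that the number of clusters is induced by the collection.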

60

The Star Clustering Algorithm for Static and Dynamic Information Organization

Abstract: We present and analyze the off-line star algorithm for clustering static information systems and the on-line star algorithm for clustering dynamic information systems. These algorithms organize a document collection into a number of clusters that is naturally induced by the collection via a computationally efficient cover by dense subgraphs. We further show a lower

Javed A. Aslam; Ekaterina Pelekhov; Daniela Rus

2004-01-01

61

An evolutionary clustering algorithm for gene expression microarray data analysis

Clustering is concerned with the discovery of interesting groupings of records in a database. Many algorithms have been developed to tackle clustering problems in a variety of application domains. In particular, some of them have been used in bioinformatics research to uncover inherent clusters in gene expression microarray data. In this paper, we show how some popular clustering algorithms

Patrick C. H. Ma; Keith C. C. Chan; Xin Yao; David K. Y. Chiu

2006-01-01

62

Sparse subspace clustering: algorithm, theory, and applications.

Many real-world problems deal with collections of high-dimensional data, such as images, videos, text, and web documents, DNA microarray data, and more. Often, such high-dimensional data lie close to low-dimensional structures corresponding to several classes or categories to which the data belong. In this paper, we propose and study an algorithm, called sparse subspace clustering, to cluster data points that lie in a union of low-dimensional subspaces. The key idea is that, among the infinitely many possible representations of a data point in terms of other points, a sparse representation corresponds to selecting a few points from the same subspace. This motivates solving a sparse optimization program whose solution is used in a spectral clustering framework to infer the clustering of the data into subspaces. Since solving the sparse optimization program is in general NP-hard, we consider a convex relaxation and show that, under appropriate conditions on the arrangement of the subspaces and the distribution of the data, the proposed minimization program succeeds in recovering the desired sparse representations. The proposed algorithm is efficient and can handle data points near the intersections of subspaces. Another key advantage of the proposed algorithm with respect to the state of the art is that it can deal directly with data nuisances, such as noise, sparse outlying entries, and missing entries, by incorporating the model of the data into the sparse optimization program. We demonstrate the effectiveness of the proposed algorithm through experiments on synthetic data as well as the two real-world problems of motion segmentation and face clustering. PMID:24051734

Elhamifar, Ehsan; Vidal, René

2013-11-01

63

Cluster compression algorithm: A joint clustering/data compression concept

NASA Technical Reports Server (NTRS)

The Cluster Compression Algorithm (CCA), which was developed to reduce costs associated with transmitting, storing, distributing, and interpreting LANDSAT multispectral image data, is described. The CCA is a preprocessing algorithm that uses feature extraction and data compression to represent the information in the image data more efficiently. The format of the preprocessed data allows simple look-up table decoding and direct use of the extracted features, which reduces user computation for either image reconstruction or computer interpretation of the image data. Basically, the CCA uses spatially local clustering to extract features from the image data to describe spectral characteristics of the data set. In addition, the features may be used to form a sequence of scalar numbers that define each picture element in terms of the cluster features. This sequence, called the feature map, is then efficiently represented by using source encoding concepts. Various forms of the CCA are defined and experimental results are presented to show trade-offs and characteristics of the various implementations. Examples are provided that demonstrate the application of the cluster compression concept to multispectral images from LANDSAT and other sources.

Hilbert, E. E.

1977-01-01

64

A hybrid discrete Artificial Bee Colony - GRASP algorithm for clustering

This paper presents a new hybrid algorithm, which is based on the concepts of the artificial bee colony (ABC) and greedy randomized adaptive search procedure (GRASP), for optimally clustering N objects into K clusters. The proposed algorithm is a two phase algorithm which combines an artificial bee colony optimization algorithm for the solution of the feature selection problem and a

Y. Marinakis; M. Marinaki; N. Matsatsinis

2009-01-01

65

ROCK: A Robust Clustering Algorithm for Categorical Attributes

Clustering, in data mining, is useful to discover distribution patterns in the underlying data. Clustering algorithms usually employ a distance metric based (e.g., Euclidean) similarity measure in order to partition the database such that data points in the same partition are more similar than points in different partitions. In this paper, we study clustering algorithms for data with boolean and

Sudipto Guha; Rajeev Rastogi; Kyuseok Shim

2000-01-01

66

A Hybrid Monkey Search Algorithm for Clustering Analysis

Clustering is a popular data analysis and data mining technique. The k-means clustering algorithm is one of the most commonly used methods. However, it depends strongly on the initial solution and easily falls into a local optimum. In view of these disadvantages of the k-means method, this paper proposes a hybrid monkey algorithm based on the search operator of the artificial bee colony algorithm for clustering analysis. Experiments on synthetic and real-life datasets show that the algorithm performs better than the basic monkey algorithm for clustering analysis. PMID:24772039

Chen, Xin; Zhou, Yongquan; Luo, Qifang

2014-01-01

67

Energy Aware Clustering Algorithms for Wireless Sensor Networks

NASA Astrophysics Data System (ADS)

The sensor nodes deployed in wireless sensor networks (WSNs) are extremely power constrained, so maximizing the lifetime of the entire network is the main design consideration. In wireless sensor networks, hierarchical network structures have the advantage of providing scalable and energy-efficient solutions. In this paper, we investigate different clustering algorithms for WSNs and compare them based on metrics such as clustering distribution, cluster load balancing, Cluster Head (CH) selection strategy, CH role rotation, node mobility, cluster overlapping, intra-cluster communications, reliability, security and location awareness.

Rakhshan, Noushin; Rafsanjani, Marjan Kuchaki; Liu, Chenglian

2011-09-01

68

A Robust Competitive Clustering Algorithm With Applications in Computer Vision

This paper addresses three major issues associated with conventional partitional clustering, namely, sensitivity to initialization, difficulty in determining the number of clusters, and sensitivity to noise and outliers. The proposed robust competitive agglomeration (RCA) algorithm starts with a large number of clusters to reduce the sensitivity to initialization, and determines the actual number of clusters by a process of competitive

Hichem Frigui; Raghu Krishnapuram

1999-01-01

69

Intel Technical Report A Comparison of Three Document Clustering Algorithms: TreeCluster,

of the document collection. Prior Work Hierarchical Agglomerative Clustering A common technique to cluster Intel Technical Report A Comparison of Three Document Clustering Algorithms: TreeCluster, Word Applications Intel Architecture Labs Abstract This work investigated three techniques to automatically cluster

Mock, Kenrick

70

Algorithms of maximum likelihood data clustering with applications

NASA Astrophysics Data System (ADS)

We address the problem of data clustering by introducing an unsupervised, parameter-free approach based on the maximum likelihood principle. Starting from the observation that data sets belonging to the same cluster share common information, we construct an expression for the likelihood of any possible cluster structure. The likelihood in turn depends only on the Pearson's coefficient of the data. We discuss clustering algorithms that provide a fast and reliable approximation to maximum likelihood configurations. Compared to standard clustering methods, our approach has the advantages that (i) it is parameter free, (ii) the number of clusters need not be fixed in advance and (iii) the interpretation of the results is transparent. In order to test our approach and compare it with standard clustering algorithms, we analyze two very different data sets: time series of financial market returns and gene expression data. We find that different maximization algorithms produce similar cluster structures whereas the outcome of standard algorithms has a much wider variability.

Giada, Lorenzo; Marsili, Matteo

2002-12-01

71

CLUSTERING BASED REGION GROWING ALGORITHM FOR COLOR IMAGE SEGMENTATION

CLUSTERING BASED REGION GROWING ALGORITHM FOR COLOR IMAGE SEGMENTATION Bogdan Cramariuc, Moncef that is used. Key words: marker extraction, color image segmentation, clustering, region growing. 1. INTRODUCTION In image segmentation the problem of developing automatic segmentation procedures has always been

Gabbouj, Moncef

72

A two-leveled symbiotic evolutionary algorithm for clustering problems

Because of its unsupervised nature, clustering is one of the most challenging problems, considered as an NP-hard grouping problem. Recently, several evolutionary algorithms (EAs) for clustering problems have been presented because of their efficiency for solving NP-hard problems with a high degree of complexity. Most previous EA-based algorithms, however, have dealt with the clustering problems given the number of clusters

Kyoung Seok Shin; Young-Seon Jeong; Myong K. Jeong

73

PMAFC: A New Probabilistic Memetic Algorithm Based Fuzzy Clustering

In this article, a new stochastic approach in form of memetic algorithm for fuzzy clustering is presented. The proposed probabilistic memetic algorithm based fuzzy clustering technique uses real-coded encoding of the cluster centres and two fuzzy clustering validity measures to compute a priori probability for an objective function. Moreover, the adaptive arithmetic recombination and opposite based local search techniques are

Indrajit Saha; Ujjwal Maulik; Dariusz Plewczynski

74

An overview and new methods in fuzzy clustering

Principal methods in nonhierarchical and hierarchical fuzzy clustering are overviewed. In particular, the method of fuzzy c-means is focused upon and recent algorithms in fuzzy c-means are described. It is shown that the concept of regularization plays an important role in the fuzzy c-means. Classification functions induced from fuzzy clustering are discussed and variations of the standard fuzzy c-means are

Sadaaki Miyamoto

1998-01-01

75

Application of K- and fuzzy c-means for color segmentation of thermal infrared breast images.

Color segmentation of infrared thermal images is an important factor in detecting the tumor region. Cancerous tissue with angiogenesis and inflammation emits a temperature pattern different from that of healthy tissue. In this paper, two color segmentation techniques, K-means and fuzzy c-means, are modeled and compared for color segmentation of infrared (IR) breast images. Using the K-means algorithm in Matlab, some empty clusters may appear in the results. Fuzzy c-means is preferred because the fuzzy nature of IR breast images helps the fuzzy c-means segmentation to provide more accurate results with no empty cluster. Since breasts with malignant tumors have a higher temperature than healthy breasts and even breasts with benign tumors, in this study we aim to detect the hottest regions of abnormal breasts, which are the suspected regions. The effect of IR camera sensitivity on the number of clusters in segmentation is also investigated. When the camera is ultra-sensitive, the number of clusters being considered may be increased. PMID:20192053

EtehadTavakol, M; Sadri, S; Ng, E Y K

2010-02-01
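
To make the fuzzy c-means behaviour described in the abstract above concrete, here is a minimal pure-Python sketch of the standard algorithm: alternate between a weighted-mean update of the cluster centres and a distance-based update of the memberships. The function name `fuzzy_c_means`, the fuzzifier default `m=2.0`, and the stopping tolerance are illustrative assumptions, not the settings used in the paper.

```python
import random

def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means on a list of equal-length numeric tuples.

    Returns (centers, u), where u[i][k] is the membership of point i
    in cluster k and every row of u sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(iters):
        # Centre update: mean of all points weighted by u[i][k] ** m.
        centers = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            total = sum(w)
            centers.append(tuple(
                sum(w[i] * points[i][d] for i in range(n)) / total
                for d in range(dim)))
        # Membership update from the distances to every centre.
        shift = 0.0
        for i in range(n):
            dist = [max(1e-12, sum((points[i][d] - ck[d]) ** 2
                                   for d in range(dim)) ** 0.5)
                    for ck in centers]
            new_row = [1.0 / sum((dist[k] / dist[j]) ** (2.0 / (m - 1.0))
                                 for j in range(c))
                       for k in range(c)]
            shift = max(shift,
                        max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row
        if shift < tol:
            break
    return centers, u
```

Because every point keeps a strictly positive membership in every cluster, each centre is always a weighted mean over the whole data set, which is one way to see why the empty-cluster problem reported for K-means does not arise in this formulation.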

76

Cuckoo Search Clustering Algorithm: A novel strategy of biomimicry

A novel, nature inspired, unsupervised classification method, based on the most recent metaheuristic algorithm, stirred by the breeding strategy of the parasitic bird, the cuckoo, is introduced in this paper. The proposed Cuckoo Search Clustering Algorithm (CSCA) yields good results on benchmark dataset. Inspired by the results, the proposed algorithm is validated on two real time remote sensing satellite image

Samiksha Goel; Arpita Sharma; Punam Bedi

2011-01-01

77

A biased random-key genetic algorithm for data clustering.

Cluster analysis aims at finding subsets (clusters) of a given set of entities, which are homogeneous and/or well separated. Starting from the 1990s, cluster analysis has been applied to several domains with numerous applications. It has emerged as one of the most exciting interdisciplinary fields, having benefited from concepts and theoretical results obtained by different scientific research communities, including genetics, biology, biochemistry, mathematics, and computer science. The last decade has brought several new algorithms, which are able to solve larger sized and real-world instances. We will give an overview of the main types of clustering and criteria for homogeneity or separation. Solution techniques are discussed, with special emphasis on the combinatorial optimization perspective, with the goal of providing conceptual insights and literature references to the broad community of clustering practitioners. A new biased random-key genetic algorithm is also described and compared with several efficient hybrid GRASP algorithms recently proposed to cluster biological data. PMID:23896381

Festa, P

2013-09-01

78

LBG Algorithm LBG algorithm is like a K-means clustering algorithm which takes a set of input

LBG Algorithm LBG algorithm is like a K-means clustering algorithm which takes a set of input measure. For the application of Vector Quantization (VQ), d = 16, K = 256 or 512 are commonly used. LBG, · · · , K}, The convergence of LBG algorithm depends on the initial codebook C, the distortion Dk

Chen, Chaur-Chin

79

A fuzzy clustering algorithm to detect planar and quadric shapes

NASA Technical Reports Server (NTRS)

In this paper, we introduce a new fuzzy clustering algorithm to detect an unknown number of planar and quadric shapes in noisy data. The proposed algorithm is computationally and implementationally simple, and it overcomes many of the drawbacks of the existing algorithms that have been proposed for similar tasks. Since the clustering is performed in the original image space, and since no features need to be computed, this approach is particularly suited for sparse data. The algorithm may also be used in pattern recognition applications.

Krishnapuram, Raghu; Frigui, Hichem; Nasraoui, Olfa

1992-01-01

80

Parallel Clustering Algorithm for Large-Scale Biological Data Sets

Background: The recent explosion of biological data poses a great challenge for traditional clustering algorithms. With the increasing scale of data sets, much larger memory and longer runtime are required for cluster identification problems. The affinity propagation algorithm outperforms many other classical clustering algorithms and is widely applied in biological research. However, its time and space complexity become a great bottleneck when handling large-scale data sets. Moreover, the similarity matrix, whose construction takes a long runtime, is required before running the affinity propagation algorithm, since the algorithm clusters data sets based on the similarities between data pairs. Methods: Two types of parallel architectures are proposed in this paper to accelerate the construction of the similarity matrix and the affinity propagation algorithm. The memory-shared architecture is used to construct the similarity matrix, and the distributed system is used for the affinity propagation algorithm, because of its large memory size and great computing capacity. An appropriate way of data partition and reduction is designed in our method, in order to minimize the global communication cost among processes. Results: A speedup of 100 is gained with 128 cores. The runtime is reduced from several hours to a few seconds, which indicates that the parallel algorithm is capable of handling large-scale data sets effectively. The parallel affinity propagation also achieves a good performance when clustering large-scale gene data (microarray) and detecting families in large protein superfamilies. PMID:24705246

Wang, Minchao; Zhang, Wu; Ding, Wang; Dai, Dongbo; Zhang, Huiran; Xie, Hao; Chen, Luonan; Guo, Yike; Xie, Jiang

2014-01-01

81

The ordered clustered travelling salesman problem: a hybrid genetic algorithm.

The ordered clustered travelling salesman problem is a variation of the usual travelling salesman problem in which a set of vertices (except the starting vertex) of the network is divided into some prespecified clusters. The objective is to find the least cost Hamiltonian tour in which vertices of any cluster are visited contiguously and the clusters are visited in the prespecified order. The problem is NP-hard, and it arises in practical transportation and sequencing problems. This paper develops a hybrid genetic algorithm using sequential constructive crossover, 2-opt search, and a local search for obtaining a heuristic solution to the problem. The efficiency of the algorithm has been examined against two existing algorithms for some asymmetric and symmetric TSPLIB instances of various sizes. The computational results show that the proposed algorithm is very effective in terms of solution quality and computational time. Finally, we present solutions to some more symmetric TSPLIB instances. PMID:24701148

Ahmed, Zakir Hussain

2014-01-01

83

Performance Characterization of Clustering Algorithms for Colour Image Segmentation

Performance Characterization of Clustering Algorithms for Colour Image Segmentation Dana Elena Ilea to extract the colour information that is used in the image segmentation process. The aim of this paper on a large suite of mosaic images. Index terms: Colour image segmentation, Fuzzy clustering, Competitive

Whelan, Paul F.

84

Clustered memetic algorithm for protein structure prediction

Memetic algorithms (MAs) often perform better than other evolutionary algorithms because they combine local search with the process of global optimization. However, like any other evolutionary algorithm (EA), MAs often yield sub-optimal solutions because of genetic drift. The problem is aggravated when EAs are applied to search the complex landscape of an NP-complete problem

Madhu Chetty; Mohammad Kamrul ISLAM

2010-01-01

85

A New Clustering Algorithm Based Upon Flocking On Complex Network

We have proposed a model based upon flocking on a complex network, and then developed two clustering algorithms on the basis of it. In the algorithms, firstly a k-nearest neighbor (knn) graph as a weighted and directed graph is produced among all data points in a dataset each of which is regarded as an agent who can move in space,

Qiang Li; Yan He; Jing-ping Jiang

2008-01-01

86

Clustering of Hadronic Showers with a Structural Algorithm

The internal structure of hadronic showers can be resolved in a high-granularity calorimeter. This structure is described in terms of simple components and an algorithm for reconstruction of hadronic clusters using these components is presented. Results from applying this algorithm to simulated hadronic Z-pole events in the SiD concept are discussed.

Charles, M.J.; /SLAC

2005-12-13

87

Measuring Constraint-Set Utility for Partitional Clustering Algorithms

NASA Technical Reports Server (NTRS)

Clustering with constraints is an active area of machine learning and data mining research. Previous empirical work has convincingly shown that adding constraints to clustering improves the performance of a variety of algorithms. However, in most of these experiments, results are averaged over different randomly chosen constraint sets from a given set of labels, thereby masking interesting properties of individual sets. We demonstrate that constraint sets vary significantly in how useful they are for constrained clustering; some constraint sets can actually decrease algorithm performance. We create two quantitative measures, informativeness and coherence, that can be used to identify useful constraint sets. We show that these measures can also help explain differences in performance for four particular constrained clustering algorithms.

Davidson, Ian; Wagstaff, Kiri L.; Basu, Sugato

2006-01-01

88

Clustering Nominal and Numerical Data: A New Distance Concept for a Hybrid Genetic Algorithm

Because intrinsic structure, such as the number of clusters, is a major issue of the clustering problem for real data, we propose in this paper CHyGA (Clustering Hybrid Genetic Algorithm), a hybrid genetic algorithm for clustering. CHyGA treats the clustering problem as an optimization problem and searches for an optimal number of clusters characterized by an optimal distribution of instances into

Laetitia Vermeulen-jourdan; Clarisse Dhaenens; El-ghazali Talbi

2004-01-01

89

Parcellation, one of several brain analysis methods, is a procedure popular for subdividing the regions identified by segmentation into smaller topographically defined units. The fuzzy clustering algorithm is mainly used to preprocess parcellation into several segmentation methods, because it is very appropriate for the characteristics of magnetic resonance imaging (MRI), such as partial volume effect and intensity inhomogeneity. However, some gray matter, such as basal ganglia and thalamus, may be misclassified into the white matter class using the conventional fuzzy C-Means (FCM) algorithm. Parcellation has been nearly achieved through manual drawing, but it is a tedious and time-consuming process. We propose improved classification using successive fuzzy clustering and implementing the parcellation module with the modified graphic user interface (GUI) for the convenience of users. PMID:11442112

Yoon, U C; Kim, J S; Kim, J S; Kim, I Y; Kim, S I

2001-06-01

90

K-Distributions: A New Algorithm for Clustering Categorical Data

NASA Astrophysics Data System (ADS)

Clustering is one of the most important tasks in data mining. The K-means algorithm is the most popular one for achieving this task because of its efficiency. However, it works only on numeric values although data sets in data mining often contain categorical values. Responding to this fact, the K-modes algorithm is presented to extend the K-means algorithm to categorical domains. Unfortunately, it suffers from computing the dissimilarity between each pair of objects and the mode of each cluster. Aiming at addressing these problems confronting K-modes, we present a new algorithm called K-distributions in this paper. We experimentally tested K-distributions using the 36 well-known UCI data sets selected by Weka, and compared it to K-modes. The experimental results show that K-distributions significantly outperforms K-modes in terms of clustering accuracy and log likelihood.

Cai, Zhihua; Wang, Dianhong; Jiang, Liangxiao

91

Coupled-cluster response theory: Parallel algorithms and novel applications

NASA Astrophysics Data System (ADS)

The parallel implementation of coupled-cluster response theory within NWChem and its subsequent application to novel chemical problems is reported. Linear-response dipole polarizabilities of polyacenes, the 60-carbon buckyball, and larger water clusters were computed with coupled-cluster singles and doubles (CCSD) and compared to density-functional results. The complete treatment of coupled-cluster response theory including up to triples (CCSDT) was applied to diatomic molecules using large basis sets and this method was used to evaluate a newly-developed perturbative approximation for triples. Hyperpolarizabilities and Lennard-Jones coefficients were implemented at the CCSD level of theory by extending the linear response code in two different ways. Benchmark hyperpolarizabilities are reported for molecules as large as para-nitroaniline using large basis sets. Tensor transpose algorithms are shown to be an important component in a coupled-cluster property code and automatic code generation successfully identified faster algorithms for these.

Hammond, Jeffrey Richard

92

Sampling Within k-Means Algorithm to Cluster Large Datasets

Due to current data collection technology, our ability to gather data has surpassed our ability to analyze it. In particular, k-means, one of the simplest and fastest clustering algorithms, is ill-equipped to handle extremely large datasets on even the most powerful machines. Our new algorithm uses a sample from a dataset to decrease runtime by reducing the amount of data analyzed. We perform a simulation study to compare our sampling based k-means to the standard k-means algorithm by analyzing both the speed and accuracy of the two methods. Results show that our algorithm is significantly more efficient than the existing algorithm with comparable accuracy. Further work on this project might include a more comprehensive study both on more varied test datasets as well as on real weather datasets. This is especially important considering that this preliminary study was performed on rather tame datasets. Also, such studies should analyze the performance of the algorithm on varied values of k. Lastly, this paper showed that the algorithm was accurate for relatively low sample sizes. We would like to analyze this further to see how accurate the algorithm is for even lower sample sizes. We could find the lowest sample sizes, by manipulating width and confidence level, for which the algorithm would be acceptably accurate. In order for our algorithm to be a success, it needs to meet two benchmarks: match the accuracy of the standard k-means algorithm and significantly reduce runtime. Both goals are accomplished for all six datasets analyzed. However, on datasets of three and four dimensions, as the data becomes more difficult to cluster, both algorithms fail to obtain the correct classifications on some trials. Nevertheless, our algorithm consistently matches the performance of the standard algorithm while becoming remarkably more efficient with time. Therefore, we conclude that analysts can use our algorithm, expecting accurate results in considerably less time.

Bejarano, Jeremy [Brigham Young University; Bose, Koushiki [Brown University; Brannan, Tyler [North Carolina State University; Thomas, Anita [Illinois Institute of Technology; Adragni, Kofi [University of Maryland; Neerchal, Nagaraj [University of Maryland; Ostrouchov, George [ORNL

2011-08-01
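
The idea in the abstract above, running k-means on a sample and reusing the result for the full dataset, can be sketched as follows. This is only an illustration under stated assumptions: the sample is drawn uniformly at random, plain Lloyd iterations are used on the sample, and every point is then assigned to its nearest centre. The names `sampled_kmeans` and `sample_size` are hypothetical, and the report's rule for choosing the sample size from a width and confidence level is not reproduced here.

```python
import random

def nearest(p, centers):
    """Index of the centre closest to p (squared Euclidean distance)."""
    return min(range(len(centers)),
               key=lambda k: sum((a - b) ** 2
                                 for a, b in zip(p, centers[k])))

def kmeans(points, k, iters=50, seed=0):
    """Standard Lloyd iterations; returns the list of k centres."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # An empty group keeps its previous centre.
        new = [tuple(sum(col) / len(g) for col in zip(*g)) if g
               else centers[j]
               for j, g in enumerate(groups)]
        if new == centers:
            break
        centers = new
    return centers

def sampled_kmeans(points, k, sample_size, seed=0):
    """Fit centres on a random sample, then label the full dataset."""
    rng = random.Random(seed)
    sample = rng.sample(points, min(sample_size, len(points)))
    centers = kmeans(sample, k, seed=seed)
    labels = [nearest(p, centers) for p in points]
    return centers, labels
```

The iterative part touches only `sample_size` points per pass, while the full dataset is scanned once for the final assignment, which is where the runtime saving in this sketch comes from.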

93

Multilayer cellular neural network and fuzzy C-mean classifiers: comparison and performance analysis

NASA Astrophysics Data System (ADS)

Neural Networks and Fuzzy systems are considered two of the most important artificial intelligence algorithms which provide classification capabilities obtained through different learning schemas which capture knowledge and process it according to particular rule-based algorithms. These methods are especially suited to exploit the tolerance for uncertainty and vagueness in cognitive reasoning. By applying these methods with some relevant knowledge-based rules extracted using different data analysis tools, it is possible to obtain a robust classification performance for a wide range of applications. This paper will focus on non-destructive testing quality control systems, in particular, the study of metallic structures classification according to the corrosion time using a novel cellular neural network architecture, which will be explained in detail. Additionally, we will compare these results with the ones obtained using the Fuzzy C-means clustering algorithm and analyse both classifiers according to their classification capabilities.

Trujillo San-Martin, Maite; Hlebarov, Vejen; Sadki, Mustapha

2004-11-01

94

A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise

Clustering algorithms are attractive for the task of class identification in spatial databases. However, the application to large spatial databases raises the following requirements for clustering algorithms: minimal requirements of domain knowledge to determine the input parameters, discovery of clusters with arbitrary shape and good efficiency on large databases. The well-known clustering algorithms offer no solution to

Martin Ester; Hans-peter Kriegel; Jörg Sander; Xiaowei Xu

1996-01-01

95

Performance Evaluation of Some Clustering Algorithms and Validity Indices

In this article, we evaluate the performance of three clustering algorithms, hard K-Means, single linkage, and a simulated annealing (SA) based technique, in conjunction with four cluster validity indices, namely Davies-Bouldin index, Dunn's index, Calinski-Harabasz index, and a recently developed index I. Based on a relation between the index I and the Dunn's index, a lower bound of the value of the

Ujjwal Maulik; Sanghamitra Bandyopadhyay

2002-01-01
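
As an illustration of one of the validity indices named above, here is a small sketch of the Davies-Bouldin index in its common textbook form (average distance to the centroid as scatter, Euclidean distance between centroids). The function names are hypothetical and this is not taken from the article itself; lower values indicate more compact, better separated clusters.

```python
import math

def centroid(cluster):
    """Component-wise mean of a list of points (tuples)."""
    return tuple(sum(c) / len(cluster) for c in zip(*cluster))

def davies_bouldin(clusters):
    """Davies-Bouldin index for a list of clusters (each a list of points).

    Assumes at least two clusters with distinct centroids; coincident
    centroids would cause a division by zero.
    """
    cents = [centroid(c) for c in clusters]
    # Scatter S_i: mean distance of the members of cluster i to its centroid.
    scatter = [
        sum(math.dist(p, m) for p in c) / len(c)
        for c, m in zip(clusters, cents)
    ]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # Worst-case similarity of cluster i to any other cluster.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
            for j in range(k) if j != i
        )
    return total / k
```

For two clusters with unit scatter whose centroids are 10 apart, the index is (1 + 1) / 10 = 0.2; halving the scatter halves the index.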

96

This paper introduces a new hybrid algorithmic nature inspired approach based on the concepts of the Honey Bees Mating Optimization Algorithm (HBMO) and of the Greedy Randomized Adaptive Search Procedure (GRASP), for optimally clustering N objects into K clusters. The proposed algorithm for the Clustering Analysis, the Hybrid HBMO-GRASP, is a two phase algorithm which combines a HBMO

Yannis Marinakis; Magdalene Marinaki; Nikolaos F. Matsatsinis

2007-01-01

97

A Decentralized Fuzzy C-Means-Based Energy-Efficient Routing Protocol for Wireless Sensor Networks

Energy conservation in wireless sensor networks (WSNs) is a vital consideration when designing wireless networking protocols. In this paper, we propose a Decentralized Fuzzy Clustering Protocol, named DCFP, which minimizes total network energy dissipation to promote maximum network lifetime. The process of constructing the infrastructure for a given WSN is performed only once at the beginning of the protocol at a base station, which remains unchanged throughout the network's lifetime. In this initial construction step, a fuzzy C-means algorithm is adopted to allocate sensor nodes into their most appropriate clusters. Subsequently, the protocol runs its rounds where each round is divided into a CH-Election phase and a Data Transmission phase. In the CH-Election phase, the election of new cluster heads is done locally in each cluster where a new multicriteria objective function is proposed to enhance the quality of elected cluster heads. In the Data Transmission phase, the sensing and data transmission from each sensor node to their respective cluster head is performed and cluster heads in turn aggregate and send the sensed data to the base station. Simulation results demonstrate that the proposed protocol improves network lifetime, data delivery, and energy consumption compared to other well-known energy-efficient protocols. PMID:25162060

2014-01-01
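
The initial construction step described above allocates sensor nodes to clusters with a fuzzy C-means algorithm. The sketch below shows textbook fuzzy C-means only (alternating membership and center updates with fuzzifier `m`); it is not the DCFP protocol, and the function names and the explicit initial centers are assumptions made for the illustration.

```python
import math

def memberships(points, centers, m):
    """Membership matrix U, where U[j][i] is the degree of point j in cluster i."""
    exp = 2.0 / (m - 1.0)
    U = []
    for p in points:
        d = [math.dist(p, c) for c in centers]
        if 0.0 in d:
            # Point coincides with one or more centers: share membership among them.
            zeros = [1.0 if x == 0.0 else 0.0 for x in d]
            s = sum(zeros)
            U.append([z / s for z in zeros])
        else:
            inv = [x ** -exp for x in d]
            s = sum(inv)
            U.append([v / s for v in inv])
    return U

def fcm(points, centers, m=2.0, iters=100, tol=1e-9):
    """Fuzzy C-means from given initial centers; returns (centers, U)."""
    centers = [tuple(c) for c in centers]
    dim = len(points[0])
    for _ in range(iters):
        U = memberships(points, centers, m)
        new_centers = []
        for i in range(len(centers)):
            # Each point pulls the center with weight u^m.
            w = [U[j][i] ** m for j in range(len(points))]
            total = sum(w)
            new_centers.append(tuple(
                sum(wj * p[d] for wj, p in zip(w, points)) / total
                for d in range(dim)
            ))
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships(points, centers, m)
```

Each row of `U` sums to 1, so a node near a cluster boundary keeps partial membership in both neighbouring clusters; a hard allocation can be obtained by taking the largest entry per row.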

98

The C4 clustering algorithm: Clusters of galaxies in the Sloan Digital Sky Survey

We present the "C4 Cluster Catalog", a new sample of 748 clusters of galaxies identified in the spectroscopic sample of the Second Data Release (DR2) of the Sloan Digital Sky Survey (SDSS). The C4 cluster-finding algorithm identifies clusters as overdensities in a seven-dimensional position and color space, thus minimizing projection effects that have plagued previous optical cluster selection. The present C4 catalog covers $\approx 2600$ square degrees of sky and ranges in redshift from z = 0.02 to z = 0.17. The mean cluster membership is 36 galaxies (with redshifts) brighter than r = 17.7, but the catalog includes a range of systems, from groups containing 10 members to massive clusters with over 200 cluster members with redshifts. The catalog provides a large number of measured cluster properties including sky location, mean redshift, galaxy membership, summed r-band optical luminosity ($L_r$), velocity dispersion, as well as quantitative measures of substructure and the surrounding large-scale environment. We use new, multi-color mock SDSS galaxy catalogs, empirically constructed from the $\Lambda$CDM Hubble Volume (HV) Sky Survey output, to investigate the sensitivity of the C4 catalog to the various algorithm parameters (detection threshold, choice of passbands and search aperture), as well as to quantify the purity and completeness of the C4 cluster catalog. These mock catalogs indicate that the C4 catalog is $\simeq 90\%$ complete and 95% pure above $M_{200} = 1 \times 10^{14}\,h^{-1} M_\odot$ and within $0.03 \le z \le 0.12$. Using the SDSS DR2 data, we show that the C4 algorithm finds 98% of X-ray identified clusters and 90% of Abell clusters within $0.03 \le z \le 0.12$. Using the mock galaxy catalogs and the full HV dark matter simulations, we show that the $L_r$ of a cluster is a more robust estimator of the halo mass ($M_{200}$) than the galaxy line-of-sight velocity dispersion or the richness of the cluster. However, if we exclude clusters embedded in complex large-scale environments, we find that the velocity dispersion of the remaining clusters is as good an estimator of $M_{200}$ as $L_r$. The final C4 catalog will contain $\simeq 2500$ clusters using the full SDSS data set and will represent one of the largest and most homogeneous samples of local clusters.

Miller, Christopher J.; Nichol, Robert; Reichart, Dan; Wechsler, Risa H.; Evrard, August; Annis, James; McKay, Timothy; Bahcall, Neta; Bernardi, Mariangela; Boehringer,; Connolly, Andrew; Goto, Tomo; Kniazev, Alexie; Lamb, Donald; Postman, Marc; Schneider, Donald; Sheth, Ravi; Voges, Wolfgang; /Cerro-Tololo InterAmerican Obs. /Portsmouth U.,

2005-03-01

99

In search of optimal clusters using genetic algorithms

Genetic Algorithms (GAs) are generally portrayed as search procedures which can optimize functions based on a limited sample of function values. In this paper, GAs have been used in an attempt to optimize a specified objective function related to a clustering problem. Several experiments on synthetic and real life data sets show the utility of the proposed method. K-Means is

C. A. Murthy; Nirmalya Chowdhury

1996-01-01

100

High-dimensional cluster analysis with the Masked EM Algorithm

Cluster analysis faces two problems in high dimensions: first, the “curse of dimensionality” that can lead to overfitting and poor generalization performance; and second, the sheer time taken for conventional algorithms to process large amounts of high-dimensional data. We describe a solution to these problems, designed for the application of “spike sorting” for next-generation high channel-count neural probes. In this problem, only a small subset of features provide information about the cluster membership of any one data vector, but this informative feature subset is not the same for all data points, rendering classical feature selection ineffective. We introduce a “Masked EM” algorithm that allows accurate and time-efficient clustering of up to millions of points in thousands of dimensions. We demonstrate its applicability to synthetic data, and to real-world high-channel-count spike sorting data. PMID:25149694

Kadir, Shabnam N.; Goodman, Dan F. M.; Harris, Kenneth D.

2014-01-01

101

An Efficient Cluster Algorithm for CP(N-1) Models

We construct an efficient cluster algorithm for ferromagnetic SU(N)-symmetric quantum spin systems. Such systems provide a new regularization for CP(N-1) models in the framework of D-theory, which is an alternative non-perturbative approach to quantum field theory formulated in terms of discrete quantum variables instead of classical fields. Despite several attempts, no efficient cluster algorithm has been constructed for CP(N-1) models in the standard formulation of lattice field theory. In fact, there is even a no-go theorem that prevents the construction of an efficient Wolff-type embedding algorithm. We present various simulations for different correlation lengths, couplings and lattice sizes. We have simulated correlation lengths up to 250 lattice spacings on lattices as large as 640x640 and we detect no evidence for critical slowing down.

B. B. Beard; M. Pepe; S. Riederer; U. -J. Wiese

2005-10-06

102

Evaluation and Comparison of Clustering Algorithms in Analyzing ES Cell Gene Expression Data

Many clustering algorithms have been used to analyze microarray gene expression data. Given embryonic stem cell gene expression data, we applied several indices to evaluate the performance of clustering algorithms, including hierarchical clustering, k-means, PAM and SOM. The indices were homogeneity and separation scores, silhouette width, redundant score (based on redundant genes), and WADP (testing the robustness of clustering results

Gengxin Chen; Saied A. Jaradat; Nila Banerjee; Tetsuya S. Tanaka; Minoru S. H. Ko; Michael Q. Zhang

2002-01-01

103

An evolutionary technique based on K-Means algorithm for optimal clustering in R^N

A genetic algorithm-based efficient clustering technique that utilizes the principles of K-Means algorithm is described in this paper. The algorithm called KGA-clustering, while exploiting the searching capability of K-Means, avoids its major limitation of getting stuck at locally optimal values. Its superiority over the K-Means algorithm and another genetic algorithm-based clustering method, is extensively demonstrated for several artificial and real

Sanghamitra Bandyopadhyay; Ujjwal Maulik

2002-01-01

104

Partially supervised clustering for image segmentation

All clustering algorithms process unlabeled data and, consequently, suffer from two problems: (P1) choosing and validating the correct number of clusters and (P2) ensuring that algorithmic labels correspond to meaningful physical labels. Clustering algorithms such as hard and fuzzy c-means, based on optimizing sums of squared errors objective functions, suffer from a third problem: (P3) a tendency to recommend solutions

Amine M. Bensaid; Lawrence O. Hall; James C. Bezdek; Laurence P. Clarke

1996-01-01

105

An improved distance matrix computation algorithm for multicore clusters

The distance matrix has diverse uses in different research areas. Its computation is typically an essential task in most bioinformatics applications, especially in multiple sequence alignment. The explosive growth of biological sequence databases leads to an urgent need for accelerating these computations. The DistVect algorithm was introduced in the paper of Al-Neama et al. (in press) as a recent approach for vectorizing distance matrix computation. It performed efficiently in both sequential and parallel computing. However, the multicore cluster systems that are now available, with their scalability and performance/cost ratio, meet the need for more powerful and efficient performance. This paper proposes DistVect1, a highly efficient parallel vectorized algorithm for computing the distance matrix, targeted at multicore clusters. It reformulates the DistVect1 vectorized algorithm in terms of cluster primitives. It derives an efficient approach to partitioning and scheduling computations that suits this type of architecture. Implementations exploit both the MPI and OpenMP libraries. Experimental results show that the proposed method achieves around a 3-fold speedup over SSE2. Further, it also achieves speedups of more than 9 orders of magnitude compared to the publicly available parallel implementation utilized in ClustalW-MPI. PMID:25013779

Al-Neama, Mohammed W; Reda, Naglaa M; Ghaleb, Fayed F M

2014-01-01

107

H-VQ: Vector Quantization by LBG Algorithm. The LBG algorithm, like the K-means clustering algorithm, is commonly used. LBG Algorithm: 1. Input training vectors $S = \{x_i \in R^d \mid i = 1, 2, \ldots, n\}$. 2. Initiate a codebook $C = \{c_j \in R^d \mid j = 1, 2, \ldots, K\}$. The convergence of the LBG algorithm depends on the initial codebook C

Chen, Chaur-Chin
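
The excerpt above gives only the first two steps of the LBG algorithm. The sketch below fills in the remaining loop in its usual textbook form (nearest-codevector partition, centroid update, stop when the relative drop in distortion is small); those steps, the function names, and the `eps` threshold are assumptions rather than content of the excerpt. The second call in the usage lines illustrates the remark that convergence depends on the initial codebook C.

```python
def dist2(p, q):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def partition(S, C):
    """Assign each training vector to its nearest codevector.

    Returns the cells (one list per codevector) and the mean squared distortion.
    """
    cells = [[] for _ in C]
    total = 0.0
    for x in S:
        j = min(range(len(C)), key=lambda idx: dist2(x, C[idx]))
        cells[j].append(x)
        total += dist2(x, C[j])
    return cells, total / len(S)

def lbg(S, C, eps=1e-6, max_iter=100):
    """Iterate partition and centroid update from initial codebook C."""
    C = [tuple(c) for c in C]
    prev = float("inf")
    for _ in range(max_iter):
        cells, distortion = partition(S, C)
        # Stop when the relative improvement in distortion is negligible.
        if distortion == 0.0 or (prev - distortion) / distortion <= eps:
            break
        prev = distortion
        # Empty cells keep their previous codevector.
        C = [
            tuple(sum(v) / len(cell) for v in zip(*cell)) if cell else c
            for cell, c in zip(cells, C)
        ]
    return C, partition(S, C)[1]

S = [(0, 0), (0, 2), (10, 0), (10, 2)]
good_C, good_D = lbg(S, [(1, 1), (9, 1)])   # distortion 1.0
poor_C, poor_D = lbg(S, [(5, 0), (5, 2)])   # stuck at distortion 25.0
```

Both runs converge, but to different codebooks: the second initial codebook is already a fixed point of the iteration, with a much higher distortion.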

108

Multi-Objective Evolutionary Clustering using Variable-Length Real Jumping Genes Genetic Algorithm

In this paper, we present a novel multi-objective evolutionary clustering approach using variable-length real jumping genes genetic algorithms (VRJGGA). The proposed algorithm that extends jumping genes genetic algorithm (JGGA) (Man et al., 2004) evolves near-optimal clustering solutions using multiple clustering criteria, without a-priori knowledge of the actual number of clusters. Experimental results based on several artificial and real-world data show

Kazi Shah Nawaz Ripon; Chi-ho Tsang; Sam Kwong; Ip Man-ki

2006-01-01

109

Finding reproducible cluster partitions for the k-means algorithm

K-means clustering is widely used for exploratory data analysis. While its dependence on initialisation is well-known, it is common practice to assume that the partition with lowest sum-of-squares (SSQ) total i.e. within cluster variance, is both reproducible under repeated initialisations and also the closest that k-means can provide to true structure, when applied to synthetic data. We show that this is generally the case for small numbers of clusters, but for values of k that are still of theoretical and practical interest, similar values of SSQ can correspond to markedly different cluster partitions. This paper extends stability measures previously presented in the context of finding optimal values of cluster number, into a component of a 2-d map of the local minima found by the k-means algorithm, from which not only can values of k be identified for further analysis but, more importantly, it is made clear whether the best SSQ is a suitable solution or whether obtaining a consistently good partition requires further application of the stability index. The proposed method is illustrated by application to five synthetic datasets replicating a real world breast cancer dataset with varying data density, and a large bioinformatics dataset. PMID:23369085

2013-01-01

110

Maximum Class Separability for Rough-Fuzzy C-Means Based Brain MR Image Segmentation

Maximum Class Separability for Rough-Fuzzy C-Means Based Brain MR Image Segmentation Pradipta Maji, 203 B. T. Road, Kolkata, 700 108, India {pmaji,sankar}@isical.ac.in Abstract. Image segmentation resonance (MR) images. In this paper, the rough-fuzzy c-means (RFCM) algorithm is presented for segmentation

Pal, Sankar Kumar

111

Mammographic images segmentation based on chaotic map clustering algorithm

Background: This work investigates the applicability of a novel clustering approach to the segmentation of mammographic digital images. The chaotic map clustering algorithm is used to group together similar subsets of image pixels, resulting in a medically meaningful partition of the mammogram. Methods: The image is divided into pixel subsets characterized by a set of conveniently chosen features, and each of the corresponding points in the feature space is associated with a map. A mutual coupling strength between the maps, depending on the associated distance between feature space points, is subsequently introduced. On the system of maps, the simulated evolution through chaotic dynamics leads to its natural partitioning, which corresponds to a particular segmentation scheme of the initial mammographic image. Results: The system provides a high recognition rate for small mass lesions (about 94% correctly segmented inside the breast) and the reproduction of the shape of regions with denser micro-calcifications in about 2/3 of the cases, while being less effective on identification of larger mass lesions. Conclusions: We can summarize our analysis by asserting that, due to the particularities of the mammographic images, the chaotic map clustering algorithm should not be used as the sole method of segmentation. It is rather the joint use of this method along with other segmentation techniques that could be successfully used for increasing the segmentation performance and for providing extra information for the subsequent analysis stages such as the classification of the segmented ROI. PMID:24666766

2014-01-01

112

A new algorithm to track the identity of clusters in physical systems

NASA Astrophysics Data System (ADS)

In order to analyze the dynamic evolution of clusters in physical systems, we have come up with a new algorithm that tracks the identity of clusters as the system evolves. According to identities assigned with the help of this algorithm, some clusters evolve, some others are born and some others die all the time. The lifetime of a cluster calculated in this way can be of a different order of magnitude than the lifetime of a bond - the building block of a cluster, comprising a pair of molecules. This algorithm can be used in the general case of evolution of particle clusters in various physical systems where the building block of clusters is again a bond between a pair of particles. As an example of an application of this algorithm, we analyzed the dynamics of icelike clusters in supercooled water and investigated if large clusters having icelike low density are also more stable in nature.

De, Subhranil; Debenedetti, Pablo

2004-03-01

113

Novel similarity-based clustering algorithm for grouping broadcast news

NASA Astrophysics Data System (ADS)

The goal of the current paper is to introduce a novel clustering algorithm that has been designed for grouping transcribed textual documents obtained from audio and video segments. Since audio transcripts are normally highly erroneous documents, one of the major challenges at the text processing stage is to reduce the negative impact of errors introduced at the speech recognition stage. Other difficulties come from the nature of conversational speech. In the paper we describe the main difficulties of spoken documents and suggest an approach restricting their negative effects. We also present a clustering algorithm that groups transcripts on the basis of informative closeness of documents. To carry out such partitioning we give an intuitive definition of the informative field of a transcript and use it in our algorithm. To assess informative closeness of the transcripts, we apply a Chi-square similarity measure, which is also described in the paper. Our experiments with the Chi-square similarity measure showed its robustness and high efficacy. In particular, the performance analysis that has been carried out for Chi-square and three other similarity measures, namely Cosine, Dice, and Jaccard, showed that Chi-square is more robust to specific features of spoken documents.

Ibrahimov, Oktay V.; Sethi, Ishwar K.; Dimitrova, Nevenka

2002-03-01
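
For reference, the three baseline measures named above (Cosine, Dice, Jaccard) can be written in their common set-based forms, treating each transcript as a set of terms. This is a minimal sketch with hypothetical function names and made-up example terms; the paper's Chi-square measure is defined in the paper itself and is not reproduced here.

```python
import math

def jaccard(a, b):
    """|A ∩ B| / |A ∪ B| for two sets of terms."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def dice(a, b):
    """2 |A ∩ B| / (|A| + |B|)."""
    size = len(a) + len(b)
    return 2 * len(a & b) / size if size else 0.0

def cosine(a, b):
    """|A ∩ B| / sqrt(|A| |B|), the cosine of two binary term vectors."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))

# Illustrative term sets (invented for this example).
doc1 = {"storm", "coast", "wind", "rain"}
doc2 = {"storm", "rain", "flood"}
scores = (jaccard(doc1, doc2), dice(doc1, doc2), cosine(doc1, doc2))
```

With two shared terms out of five distinct ones, Jaccard is 2/5, Dice is 4/7, and Cosine is 2/sqrt(12).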

114

A Simple Alternative to Jet-Clustering Algorithms

I describe a class of iterative jet algorithms that are based on maximizing a fixed function of the total 4-momentum rather than clustering of pairs of jets. I describe some of the properties of the simplest examples of this class, appropriate for jets at an $e^+e^-$ machine. These examples are sufficiently simple that many features of the jets that they define can be determined analytically with ease. The jets constructed in this way have some potentially useful properties, including a strong form of infrared safety.

Howard Georgi

2014-08-31

115

Classification of posture maintenance data with fuzzy clustering algorithms

NASA Technical Reports Server (NTRS)

Sensory inputs from the visual, vestibular, and proprioceptive systems are integrated by the central nervous system to maintain postural equilibrium. Sustained exposure to microgravity causes neurosensory adaptation during spaceflight, which results in decreased postural stability until readaptation occurs upon return to the terrestrial environment. Data which simulate sensory inputs under various sensory organization test (SOT) conditions were collected in conjunction with Johnson Space Center postural control studies using a tilt-translation device (TTD). The University of West Florida applied the fuzzy c-means (FCM) clustering algorithms to this data with a view towards identifying various states and stages of subjects experiencing such changes. Feature analysis, time step analysis, pooling data, response of the subjects, and the algorithms used are discussed.

Bezdek, James C.

1992-01-01

116

New Clustering Algorithm for Vector Quantization using Rotation of Error Vector

The paper presents a new clustering algorithm. The proposed algorithm gives less distortion as compared to the well-known Linde Buzo Gray (LBG) algorithm and Kekre's Proportionate Error (KPE) algorithm. In LBG, a constant error is added every time to split the clusters, resulting in formation of clusters in one direction, which is 135° in the 2-dimensional case. For this reason clustering is inefficient, resulting in high MSE in LBG. To overcome this drawback of LBG, a proportionate error is added to change the cluster orientation in KPE. Though the cluster orientation in KPE is changed, its variation is limited to ±45° over 135°. The proposed algorithm takes care of this problem by introducing a new orientation every time to split the clusters. The proposed method reduces PSNR by 2 dB to 5 dB for codebook size 128 to 1024 with respect to LBG.

Kekre, H B

2010-01-01

117

Feature extraction and clustering for dynamic video summarisation

In this paper an effective dynamic video summarisation algorithm is presented using audio-visual features extracted from videos. Audio, colour and motion features are dynamically fused using an adaptive weighting mechanism. Dissimilarities of temporal video segments are formulated using the extracted features before these segments are clustered using a fuzzy c-means algorithm with an optimally determined cluster number. The experimental results

Huiyu Zhou; Abdul H. Sadka; Mohammad Rafiq Swash; Jawid Azizi; Umar A. Sadiq

2010-01-01

118

jClustering, an open framework for the development of 4D clustering algorithms

We present jClustering, an open framework for the design of clustering algorithms in dynamic medical imaging. We developed this tool because of the difficulty involved in manually segmenting dynamic PET images and the lack of availability of source code for published segmentation algorithms. Providing an easily extensible open tool encourages publication of source code to facilitate the process of comparing algorithms and provide interested third parties with the opportunity to review code. The internal structure of the framework allows an external developer to implement new algorithms easily and quickly, focusing only on the particulars of the method being implemented and not on image data handling and preprocessing. This tool has been coded in Java and is presented as an ImageJ plugin in order to take advantage of all the functionalities offered by this imaging analysis platform. Both binary packages and source code have been published, the latter under a free software license (GNU General Public License) to allow modification if necessary. PMID:23990913

Mateos-Pérez, José María; García-Villalba, Carmen; Pascau, Javier; Desco, Manuel; Vaquero, Juan J

2013-01-01

120

Extensions to the k-Means Algorithm for Clustering Large Data Sets with Categorical Values

The k-means algorithm is well known for its efficiency in clustering large data sets. However, working only on numeric values prohibits it from being used to cluster real world data containing categorical values. In this paper we present two algorithms which extend the k-means algorithm to categorical domains and domains with mixed numeric and categorical values. The k-modes algorithm uses

Zhexue Huang

1998-01-01
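
The abstract above is cut off before it describes k-modes. As commonly presented, k-modes replaces the Euclidean distance of k-means with a simple matching dissimilarity (the number of attributes on which two records differ) and replaces cluster means with per-attribute modes. The sketch below shows only those two building blocks; the function names and example records are hypothetical.

```python
from collections import Counter

def matching_dissimilarity(a, b):
    """Number of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))

def mode_of(records):
    """Cluster representative: the most frequent category of each attribute."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*records))

def assign(record, modes):
    """Index of the mode with the smallest matching dissimilarity."""
    return min(range(len(modes)),
               key=lambda j: matching_dissimilarity(record, modes[j]))

# Illustrative categorical records (invented for this example).
cluster = [("red", "small"), ("red", "large"), ("blue", "small")]
representative = mode_of(cluster)
```

Alternating `assign` over all records with `mode_of` over each resulting cluster gives the same two-step loop as k-means, but on categorical data.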

121

User-Based Document Clustering by Redescribing Subject Descriptions with a Genetic Algorithm.

ERIC Educational Resources Information Center

Discussion of clustering of documents and queries in information retrieval systems focuses on the use of a genetic algorithm to adapt subject descriptions so that documents become more effective in matching relevant queries. Various types of clustering are explained, and simulation experiments used to test the genetic algorithm are described. (27…

Gordon, Michael D.

1991-01-01

122

Applying niching genetic algorithms for multiple cluster discovery in spatial analysis

Traditional genetic algorithms with elitist selection are unable to locate more than one solution in a multimodal fitness landscape in a single run. This genetic drift is illustrated in Mapex, a smart spatial analysis technique, employing a genetic algorithm for spatial cluster discovery. However, for detecting multiple clusters Mapex provides a non-ideal approach. In this paper, we use a fitness

Ritvik Sahajpal; G. V. Ramaraju; V. Bhatt

2004-01-01

123

A Clustering Algorithm Based on Cell Combination for Wireless Sensor Networks

Network scalability and energy efficiency in wireless sensor networks should be supported by elaborate topology control. Clustering algorithms are a useful topology management approach to reduce communication overhead and exploit data aggregation in wireless sensor networks. A clustering algorithm based on cell combination is presented in the paper for the networks where sensor nodes are distributed densely and the energy of

Luo Chang-ri; Zhu Yun; Zhang Xin-hua; Zhou Zi-bo

2010-01-01

124

A Formal Algorithm for Verifying the Validity of Clustering Results Based on Model Checking

The limitations in general methods to evaluate clustering will remain difficult to overcome if verifying the clustering validity continues to be based on clustering results and evaluation index values. This study focuses on a clustering process to analyze crisp clustering validity. First, we define the properties that must be satisfied by valid clustering processes and model clustering processes based on program graphs and transition systems. We then recast the analysis of clustering validity as the problem of verifying whether the model of clustering processes satisfies the specified properties with model checking. That is, we try to build a bridge between clustering and model checking. Experiments on several datasets indicate the effectiveness and suitability of our algorithms. Compared with traditional evaluation indices, our formal method can not only indicate whether the clustering results are valid but, in the case the results are invalid, can also detect the objects that have led to the invalidity. PMID:24608823

Huang, Shaobin; Cheng, Yuan; Lang, Dapeng; Chi, Ronghua; Liu, Guofeng

2014-01-01

125

An Energy Efficient Clustering Algorithm in Large-Scale Mobile Sensor Networks

Clustering offers a kind of hierarchical organization to provide scalability and basic performance guarantee by partitioning the network into disjoint groups of nodes. In this paper an energy efficient clustering algorithm is proposed under large-scale mobile sensor networks scenario. In the initial cluster formation phase, our proposed scheme features a simple execution process, which has a time and message complexity

Jianbo Li; Shan Jiang

2010-01-01

126

The watershed-clustering algorithm was adapted for use in multi-dimensional spectral space and was used to define clusters in Hyperspectral Digital Imagery Collection Experiment (HYDICE) data. This algorithm identifies clusters as peaks in a B-dimensional topographic relief, where B is the number of wavelength bands. Image pixel spectra are represented as points in this multi-dimensional space. Analysis is done at increasing

Gerard P. Jellison; Terrence H. Hemmer; Darryl G. Wilson

2002-01-01

127

In this paper, we present a novel multi-objective evolutionary clustering approach using variable-length real jumping genes genetic algorithms (VRJGGA). The proposed algorithm that extends jumping genes genetic algorithm (JGGA) [1] evolves clustering solutions using multiple clustering criteria, without a-priori knowledge of the actual number of clusters. Some local search methods such as probabilistic cluster merging and splitting are introduced in

Kazi Shah Nawaz Ripon; Chi-ho Tsang; Sam Kwong

2006-01-01

128

NASA Astrophysics Data System (ADS)

In this paper we construct an efficient adaptive Mahalanobis k-means algorithm. In addition, we propose a new efficient algorithm to search for a globally optimal partition obtained by using the adaptive Mahalanobis distance-like function. The algorithm is a generalization of the previously proposed incremental algorithm (Scitovski and Scitovski, 2013). It successively finds optimal partitions with k = 2, 3, … clusters. Therefore, it can also be used for the estimation of the most appropriate number of clusters in a partition by using various validity indexes. The algorithm has been applied to the seismic catalogues of Croatia and the Iberian Peninsula. Both regions are characterized by a moderate seismic activity. One of the main advantages of the algorithm is its ability to discover not only circular but also elliptical shapes, whose geometry fits the faults better. Three seismogenic zonings are proposed for Croatia and two for the Iberian Peninsula and adjacent areas, according to the clusters discovered by the algorithm.

Morales-Esteban, Antonio; Martínez-Álvarez, Francisco; Scitovski, Sanja; Scitovski, Rudolf

2014-12-01
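
A minimal sketch of how a Mahalanobis distance-like function could assign epicentres to elliptical clusters, as the abstract above describes. The normalisation by det(S)^(1/n), the restriction to two dimensions, and the names `mahalanobis_like_2d` and `assign` are assumptions made for illustration; they are not taken from the paper.

```python
def mahalanobis_like_2d(point, center, cov):
    """Distance-like value between a 2-D point and a cluster center.

    cov is the cluster's 2x2 covariance matrix ((a, b), (c, d)).
    Assumed form: det(cov)**(1/2) * (p - c)^T cov^{-1} (p - c).
    """
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0:
        raise ValueError("covariance must be positive definite")
    dx, dy = point[0] - center[0], point[1] - center[1]
    # Quadratic form using the closed-form inverse of a 2x2 matrix.
    quad = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return det ** 0.5 * quad


def assign(points, centers, covs):
    """Index of the nearest cluster for each point under the distance above."""
    return [
        min(range(len(centers)),
            key=lambda j: mahalanobis_like_2d(p, centers[j], covs[j]))
        for p in points
    ]
```

With the identity covariance the function reduces to the squared Euclidean distance. An elongated covariance makes points along the long axis of the ellipse count as closer, which is how clusters aligned with a fault could be favoured.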

129

Improving the Initial Centroids of k-means Clustering Algorithm to Generalize its Applicability

NASA Astrophysics Data System (ADS)

k-means is one of the most widely used partition-based clustering algorithms. However, the randomly generated initial centroids can cause the algorithm to converge to a local optimum. To bring k-means closer to the global optimum, the initial centroids must be selected carefully rather than randomly. Although much research has already been carried out to enhance the k-means algorithm, the existing approaches have their own limitations. In this paper, a new method for choosing the initial centroids is proposed, which yields better clusters for both uniform and non-uniform data sets.

Goyal, M.; Kumar, S.

2014-12-01

130

Clustering Analysis for fMRI Dataset based on ISODATA Algorithm

In this paper, the modified fuzzy c-means (MFc) algorithm is first used to treat the ill-balanced fMRI dataset, improving efficiency, removing redundancy, and reducing the number of analyzed voxels. Then the iterative self-organizing data analysis technique algorithm (ISODATA), a development of data-driven methods, is used to find the activated region in the brain. Therefore a

Xi Zheng; Zhitong Cao; Bo Shao; Jiazhong Fang; Guoguang He

2005-01-01

131

Two generalizations of Kohonen clustering

NASA Technical Reports Server (NTRS)

The relationship between the sequential hard c-means (SHCM), learning vector quantization (LVQ), and fuzzy c-means (FCM) clustering algorithms is discussed. LVQ and SHCM suffer from several major problems. For example, they depend heavily on initialization. If the initial values of the cluster centers are outside the convex hull of the input data, such algorithms, even if they terminate, may not produce meaningful results in terms of prototypes for cluster representation. This is due in part to the fact that they update only the winning prototype for every input vector. The impact and interaction of these two families with Kohonen's self-organizing feature mapping (SOFM), which is not a clustering method but which often lends ideas to clustering algorithms, are discussed. Then two generalizations of LVQ that are explicitly designed as clustering algorithms are presented; these algorithms are referred to as generalized LVQ (GLVQ) and fuzzy LVQ (FLVQ). Learning rules are derived to optimize an objective function whose goal is to produce 'good clusters'. GLVQ/FLVQ (may) update every node in the clustering net for each input vector. Neither GLVQ nor FLVQ depends upon a choice for the update neighborhood or learning rate distribution; these are taken care of automatically. Segmentation of a gray tone image is used as a typical application of these algorithms to illustrate the performance of GLVQ/FLVQ.

Bezdek, James C.; Pal, Nikhil R.; Tsao, Eric C. K.

1993-01-01

132

An important component of a spatial clustering algorithm is the distance measure between sample points in object space. In this paper, the traditional Euclidean distance measure is replaced with innovative obstacle distance measure for spatial clustering under obstacle constraints. Firstly, we present a path searching algorithm to approximate the obstacle distance between two points for dealing with obstacles and facilitators. Taking obstacle distance as similarity metric, we subsequently propose the artificial immune clustering with obstacle entity (AICOE) algorithm for clustering spatial point data in the presence of obstacles and facilitators. Finally, the paper presents a comparative analysis of AICOE algorithm and the classical clustering algorithms. Our clustering model based on artificial immune system is also applied to the case of public facility location problem in order to establish the practical applicability of our approach. By using the clone selection principle and updating the cluster centers based on the elite antibodies, the AICOE algorithm is able to achieve the global optimum and better clustering effect. PMID:25435862

Sun, Liping; Luo, Yonglong; Ding, Xintao; Zhang, Ji

2014-01-01

133

C-element: A New Clustering Algorithm to Find High Quality Functional Modules in PPI Networks

Graph clustering algorithms are widely used in the analysis of biological networks. Extracting functional modules in protein-protein interaction (PPI) networks is one such use. Most clustering algorithms that focus on finding functional modules try either to find clique-like subnetworks or to grow clusters starting from high-degree vertices as seeds. These algorithms do not distinguish between a biological network and any other network. In the current research, we present a new procedure to find functional modules in PPI networks. Our main idea is to model a biological concept and to use this concept for finding good functional modules in PPI networks. In order to evaluate the quality of the obtained clusters, we compared the results of our algorithm with those of some other widely used clustering algorithms on three high-throughput PPI networks from Saccharomyces cerevisiae, Homo sapiens and Caenorhabditis elegans as well as on some tissue-specific networks. Gene Ontology (GO) analyses were used to compare the results of different algorithms. Each algorithm's result was then compared with GO-term derived functional modules. We also analyzed the effect of using tissue-specific networks on the quality of the obtained clusters. The experimental results indicate that the new algorithm outperforms most of the others, and this improvement is more significant when tissue-specific networks are used. PMID:24039752

Ghasemi, Mahdieh; Rahgozar, Maseud; Bidkhori, Gholamreza; Masoudi-Nejad, Ali

2013-01-01

134

A highly efficient multi-core algorithm for clustering extremely large datasets

Background In recent years, the demand for computational power in computational biology has increased due to rapidly growing data sets from microarray and other high-throughput technologies. This demand is likely to increase. Standard algorithms for analyzing data, such as cluster algorithms, need to be parallelized for fast processing. Unfortunately, most approaches for parallelizing algorithms largely rely on network communication protocols connecting and requiring multiple computers. One answer to this problem is to utilize the intrinsic capabilities in current multi-core hardware to distribute the tasks among the different cores of one computer. Results We introduce a multi-core parallelization of the k-means and k-modes cluster algorithms based on the design principles of transactional memory for clustering gene expression microarray type data and categorical SNP data. Our new shared memory parallel algorithms prove to be highly efficient. We demonstrate their computational power and show their utility in cluster stability and sensitivity analysis employing repeated runs with slightly changed parameters. Computation speed of our Java based algorithm was increased by a factor of 10 for large data sets while preserving computational accuracy compared to single-core implementations and a recently published network based parallelization. Conclusions Most desktop computers and even notebooks provide at least dual-core processors. Our multi-core algorithms show that using modern algorithmic concepts, parallelization makes it possible to perform even such laborious tasks as cluster sensitivity and cluster number estimation on the laboratory computer. PMID:20370922

2010-01-01

135

Traditional K-means clustering algorithms have the drawback of getting stuck at local optima that depend on the random values of initial centroids. Optimization algorithms have their advantages in guiding iterative computation to search for global optima while avoiding local optima. The algorithms help speed up the clustering process by converging into a global optimum early with multiple search agents in action. Inspired by nature, some contemporary optimization algorithms which include Ant, Bat, Cuckoo, Firefly, and Wolf search algorithms mimic the swarming behavior allowing them to cooperatively steer towards an optimal objective within a reasonable time. It is known that these so-called nature-inspired optimization algorithms have their own characteristics as well as pros and cons in different applications. When these algorithms are combined with K-means clustering mechanism for the sake of enhancing its clustering quality by avoiding local optima and finding global optima, the new hybrids are anticipated to produce unprecedented performance. In this paper, we report the results of our evaluation experiments on the integration of nature-inspired optimization methods into K-means algorithms. In addition to the standard evaluation metrics in evaluating clustering quality, the extended K-means algorithms that are empowered by nature-inspired optimization methods are applied on image segmentation as a case study of application scenario. PMID:25202730

Deb, Suash; Yang, Xin-She

2014-01-01

136

A heart disease recognition embedded system with fuzzy cluster algorithm.

This article presents the viability analysis and the development of a heart disease identification embedded system. It reduces electrocardiogram (ECG) signal processing time by reducing the number of data samples, without any significant loss. The goal of the developed system is the analysis of heart signals. The ECG signals are fed into the system, which performs an initial filtering and then uses a Gustafson-Kessel fuzzy clustering algorithm for signal classification and correlation. The classification indicated common heart diseases such as angina, myocardial infarction and coronary artery diseases. The system uses the European electrocardiogram ST-T Database (EDB) as a reference for tests and evaluation. The results show that the system can perform heart disease detection on a data set reduced from 213 to just 20 samples, a reduction to 9.4% of the original set, while maintaining the same effectiveness. This system is validated in a Xilinx Spartan®-3A FPGA. The field programmable gate array (FPGA) implemented a Xilinx MicroBlaze® Soft-Core Processor running at a 50 MHz clock rate. PMID:23394802

de Carvalho, Helton Hugo; Moreno, Robson Luiz; Pimenta, Tales Cleber; Crepaldi, Paulo C; Cintra, Evaldo

2013-06-01

137

NASA Technical Reports Server (NTRS)

We describe the clustering algorithm used by the Lightning Imaging Sensor (LIS) and the Optical Transient Detector (OTD) for combining the lightning pulse data into events, groups, flashes, and areas. Events are single pixels that exceed the LIS/OTD background level during a single frame (2 ms). Groups are clusters of events that occur within the same frame and in adjacent pixels. Flashes are clusters of groups that occur within 330 ms and either 5.5 km (for LIS) or 16.5 km (for OTD) of each other. Areas are clusters of flashes that occur within 16.5 km of each other. Many investigators are utilizing the LIS/OTD flash data; therefore, we test how variations in the algorithms for the event-group and group-flash clustering affect the flash count for a subset of the LIS data. We divided the subset into areas with low (1-3), medium (4-15), high (16-63), and very high (64+) flashes to see how changes in the clustering parameters affect the flash rates in these different sizes of areas. We found that as long as the cluster parameters are within about a factor of two of the current values, the flash counts do not change by more than about 20%. Therefore, the flash clustering algorithm used by the LIS and OTD sensors creates flash rates that are relatively insensitive to reasonable variations in the clustering algorithms.

Mach, Douglas M.; Christian, Hugh J.; Blakeslee, Richard; Boccippio, Dennis J.; Goodman, Steve J.; Boeck, William

2006-01-01
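
The group-to-flash rule in the abstract above (groups within 330 ms and 5.5 km of each other belong to the same flash) can be sketched as single-linkage clustering. This is an illustrative sketch only: the flat kilometre coordinates, the use of one point per group, and the function name `cluster_flashes` are assumptions, not the operational LIS/OTD code.

```python
from math import hypot


def cluster_flashes(groups, max_dt_ms=330.0, max_dist_km=5.5):
    """groups: list of (time_ms, x_km, y_km). Returns one flash label per group."""
    parent = list(range(len(groups)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    order = sorted(range(len(groups)), key=lambda i: groups[i][0])
    for pos, i in enumerate(order):
        ti, xi, yi = groups[i]
        for j in order[pos + 1:]:
            tj, xj, yj = groups[j]
            if tj - ti > max_dt_ms:
                break  # later groups are even further away in time
            if hypot(xj - xi, yj - yi) <= max_dist_km:
                parent[find(j)] = find(i)  # link the two groups into one flash
    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(len(groups))]
```

Changing `max_dt_ms` or `max_dist_km` and recounting the distinct labels is one simple way to reproduce the kind of sensitivity test the abstract reports.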

138

This paper presents a novel maximum margin clustering method with immune evolution (IEMMC) for automatic diagnosis of electrocardiogram (ECG) arrhythmias. This diagnostic system consists of signal processing, feature extraction, and the IEMMC algorithm for clustering of ECG arrhythmias. First, raw ECG signal is processed by an adaptive ECG filter based on wavelet transforms, and waveform of the ECG signal is detected; then, features are extracted from ECG signal to cluster different types of arrhythmias by the IEMMC algorithm. Three types of performance evaluation indicators are used to assess the effect of the IEMMC method for ECG arrhythmias, such as sensitivity, specificity, and accuracy. Compared with K-means and iterSVR algorithms, the IEMMC algorithm reflects better performance not only in clustering result but also in terms of global search ability and convergence ability, which proves its effectiveness for the detection of ECG arrhythmias. PMID:23690875

Zhu, Bohui; Ding, Yongsheng; Hao, Kuangrong

2013-01-01

140

The Effect of Clustering Algorithms on Aftershock Productivity and Foreshock Rates

NASA Astrophysics Data System (ADS)

The properties of earthquake clusters are important for the modeling of short-term hazard. In particular the forecasting of larger events is of societal importance. We apply common declustering algorithms, including Reasenberg, Gardner-Knopoff, and the model-independent method by Marsan, to the Southern California earthquake data to define earthquake clusters. We model the aftershock productivity as a function of mainshock magnitude M for the different clustering algorithms by N_ave = 10^(α(M - M1)), where α is the growth parameter and M1 corresponds to the magnitude that on average has one aftershock above the completeness magnitude. The number of aftershocks, hereafter called abundance, depends on the area and time in which to count aftershocks as well as the completeness magnitude. Spatial and temporal extent of aftershock sequences can vary significantly with clustering algorithm. By combining the abundance model with the Gutenberg-Richter equation for the distribution of earthquake magnitude, we can predict foreshock rates and compare them to observations. Depending on the clustering algorithm, foreshock rates can vary up to a factor of two. For some clustering algorithms, the foreshock rate is magnitude dependent, while for other algorithms it is not. However, we find that the foreshock rates predicted from aftershock abundance agree well with the observation for any consistent way of defining fore- and aftershocks. This confirms the common assumption that foreshocks trigger mainshocks in the same manner that mainshocks trigger aftershocks. Our results show that properties of earthquake sequences vary with clustering algorithm. Thus interpretations of fore- or aftershock characteristics need to give careful consideration to the clustering algorithm and data selection.

Christophersen, A.; Wiemer, S.; Smith, E. G.

2007-12-01
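
The aftershock productivity relation in the abstract above can be written as a one-line function. This sketch assumes the relation is exponential in magnitude, with the growth parameter named `alpha` here (the symbol is garbled in the source); any numeric values passed to it are illustrative and not results from the study.

```python
def mean_aftershocks(mainshock_magnitude, alpha, m1):
    """Average aftershock count, N_ave = 10 ** (alpha * (M - M1))."""
    return 10.0 ** (alpha * (mainshock_magnitude - m1))
```

By construction a mainshock of magnitude M1 has one aftershock on average, which matches the abstract's definition of M1. With `alpha = 1`, each additional magnitude unit multiplies the average count by ten.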

141

NASA Astrophysics Data System (ADS)

The label propagation algorithm (LPA) is a graph-based semi-supervised learning algorithm that can predict the labels of unlabeled nodes from a few labeled nodes. It is a community detection method in the field of complex networks. The algorithm is easy to implement, has low complexity, performs well, and is widely applied in various fields. However, the randomness of the label propagation makes the algorithm less robust, and the classification result is unstable. This paper proposes an LPA based on the edge clustering coefficient. To update its label, a node in the network selects the neighbor node whose edge clustering coefficient is the highest rather than a random neighbor node, which effectively restrains the random spread of labels. The experimental results show that the LPA based on the edge clustering coefficient improves the stability and accuracy of the algorithm.

Zhang, Xian-Kun; Tian, Xue; Li, Ya-Nan; Song, Chen

2014-08-01

142

Motivation: Clustering protein sequence data into functionally specific families is a difficult but important problem in biological research. One useful approach for tackling this problem involves representing the sequence dataset as a protein similarity network, and afterwards clustering the network using advanced graph analysis techniques. Although a multitude of such network clustering algorithms have been developed over the past few years, comparing algorithms is often difficult because performance is affected by the specifics of network construction. We investigate an important aspect of network construction used in analyzing protein superfamilies and present a heuristic approach for improving the performance of several algorithms. Results: We analyzed how the performance of network clustering algorithms relates to thresholding the network prior to clustering. Our results, over four different datasets, show that for each input dataset there exists an optimal threshold range over which an algorithm generates its most accurate clustering output. Our results further show that the optimal threshold range correlates with the shape of the edge weight distribution for the input similarity network. We used this correlation to develop an automated threshold selection heuristic in order to optimally filter a similarity network prior to clustering. This heuristic allows researchers to process their protein datasets with runtime-efficient network clustering algorithms without sacrificing the clustering accuracy of the final results. Availability: Python code for implementing the automated threshold selection heuristic, together with the datasets used in our analysis, is available at http://www.rbvi.ucsf.edu/Research/cytoscape/threshold_scripts.zip. Contact: tef@cgl.ucsf.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:21118823

Apeltsin, Leonard; Morris, John H.; Babbitt, Patricia C.; Ferrin, Thomas E.

2011-01-01
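
The idea of thresholding a similarity network before clustering, described in the abstract above, can be illustrated with a short sketch. It does not reproduce the authors' automated threshold selection heuristic; connected components stand in for a real clustering algorithm, and the node names and weights are invented for illustration.

```python
def threshold_edges(edges, threshold):
    """Keep only edges whose similarity weight is at least `threshold`."""
    return [(u, v, w) for u, v, w in edges if w >= threshold]


def connected_components(nodes, edges):
    """Group nodes into components; a stand-in for a real clustering step."""
    adj = {n: set() for n in nodes}
    for u, v, _ in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            n = stack.pop()
            if n in comp:
                continue
            comp.add(n)
            stack.extend(adj[n] - comp)
        seen |= comp
        components.append(comp)
    return components


# Hypothetical four-protein network with similarity weights in [0, 1].
nodes = ["p1", "p2", "p3", "p4"]
edges = [("p1", "p2", 0.9), ("p3", "p4", 0.8), ("p2", "p3", 0.2)]
for t in (0.1, 0.5, 0.95):
    print(t, len(connected_components(nodes, threshold_edges(edges, t))))
```

A threshold that is too low merges everything into one cluster and one that is too high leaves only singletons, which is why an intermediate range gives the most useful output.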

143

An Energy-Efficient Clustering Algorithm for Multihop Data Gathering in Wireless Sensor

An Energy-Efficient Clustering Algorithm for Multihop Data Gathering in Wireless Sensor Networks1 Algorithm for Optimized Data Dissemination in Wireless Sensor Networks", by S. Selvakennedy and S. Sinnappan which appeared in The IEEE Conference on Local Computer Networks, 2005. © 2005 IEEE. Abstract--Wireless

Selvadurai, Selvakennedy

144

We introduce Chinese Whispers, a randomized graph-clustering algorithm, which is time-linear in the number of edges. After a detailed definition of the algorithm and a discussion of its strengths and weaknesses, the performance of Chinese Whispers is measured on Natural Language Processing (NLP) problems as diverse as language separation, acquisition of syntactic word classes and word sense disambiguation. At this,

Chris Biemann

2006-01-01
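
A minimal sketch of the Chinese Whispers procedure named in the abstract above: every node starts in its own class, and nodes are then visited in random order, each adopting the class with the largest total edge weight among its neighbours. The function name, the fixed iteration cap, and the rule of keeping the current label on ties are assumptions made here for illustration.

```python
import random
from collections import defaultdict


def chinese_whispers(edges, iterations=20, seed=0):
    """Cluster an undirected weighted graph given as (u, v, weight) triples."""
    rng = random.Random(seed)
    adj = defaultdict(dict)
    for u, v, w in edges:
        adj[u][v] = w
        adj[v][u] = w
    labels = {node: node for node in adj}  # each node starts as its own class
    nodes = list(adj)
    for _ in range(iterations):
        rng.shuffle(nodes)  # randomized processing order
        changed = False
        for node in nodes:
            # Sum edge weights per class over the node's neighbourhood.
            score = defaultdict(float)
            for neighbour, w in adj[node].items():
                score[labels[neighbour]] += w
            best = max(score.values())
            winners = [lab for lab, s in score.items() if s == best]
            # Assumption: keep the current label on ties to aid convergence.
            new = labels[node] if labels[node] in winners else rng.choice(winners)
            if new != labels[node]:
                labels[node] = new
                changed = True
        if not changed:
            break
    return labels
```

Each sweep touches every edge twice, so the cost per iteration grows linearly with the number of edges, in line with the time-linear claim in the abstract.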

145

of hyperspectral imaging algorithms, this paper develops several fast parallel data processing techniquesCommodity cluster and hardware-based massively parallel implementations of hyperspectral imaging of the proposed algorithms. In order to explore the viability of developing onboard, real-time hyperspectral data

Plaza, Antonio J.

146

A vector reconstruction based clustering algorithm particularly for large-scale text collection.

With the fast evolution of internet technology, internet users face a large amount of textual data every day. Organizing texts into categories can help users extract useful information from large-scale text collections. Clustering is one of the most promising tools for categorizing texts because it is unsupervised. Unfortunately, most traditional clustering algorithms lose quality on large-scale text collections, mainly because of the high-dimensional vector space and the semantic similarity among texts. To cluster large-scale text collections effectively and efficiently, this paper puts forward a vector reconstruction based clustering algorithm. Only the features that can represent the cluster are preserved in the cluster's representative vector. The algorithm alternates between two sub-processes until it converges. One is the partial tuning sub-process, where each feature's weight is fine-tuned by an iterative process similar to the self-organizing map (SOM) algorithm. To speed up clustering, an intersection based similarity measurement and its corresponding neuron adjustment function are proposed and implemented in this sub-process. The other is the overall tuning sub-process, where the features are reallocated among different clusters. In this sub-process, the features that are useless for representing the cluster are removed from the cluster's representative vector. Experimental results on three text collections (two small-scale and one large-scale) demonstrate that our algorithm performs well on both small-scale and large-scale text collections. PMID:25539500

Liu, Ming; Wu, Chong; Chen, Lei

2015-03-01

147

An improved clustering algorithm of tunnel monitoring data for cloud computing.

With the rapid development of urban construction, the number of urban tunnels is increasing and the data they produce are becoming more and more complex. As a result, the traditional clustering algorithm cannot handle the mass data of the tunnel. To solve this problem, an improved parallel clustering algorithm based on k-means has been proposed. It is a clustering algorithm that uses MapReduce within cloud computing to process the data. It can handle mass data and is also more efficient. Moreover, it is able to compute the average dissimilarity degree of each cluster in order to clean the abnormal data. PMID:24982971

Zhong, Luo; Tang, KunHao; Li, Lin; Yang, Guang; Ye, JingJing

2014-01-01

148

Although there are many good collaborative recommendation methods, it is still a challenge to increase the accuracy and diversity of these methods to fulfill users' preferences. In this paper, we propose a novel collaborative filtering recommendation approach based on the K-means clustering algorithm. In the process of clustering, we use the artificial bee colony (ABC) algorithm to overcome the local optimum problem caused by K-means. After that we adopt the modified cosine similarity to compute the similarity between users in the same clusters. Finally, we generate recommendation results for the corresponding target users. Detailed numerical analysis on the benchmark dataset MovieLens and a real-world dataset indicates that our new collaborative filtering approach based on user clustering outperforms many other recommendation methods. PMID:24381525

Ju, Chunhua

2013-01-01

150

A Hybrid Algorithm for Clustering of Time Series Data Based on Affinity Search Technique

Time series clustering is an important solution to various problems in numerous fields of research, including business, medical science, and finance. However, conventional clustering algorithms are not practical for time series data because they are essentially designed for static data. This impracticality results in poor clustering accuracy in several systems. In this paper, a new hybrid clustering algorithm is proposed based on the similarity in shape of time series data. Time series data are first grouped as subclusters based on similarity in time. The subclusters are then merged using the k-Medoids algorithm based on similarity in shape. This model has two contributions: (1) it is more accurate than other conventional and hybrid approaches and (2) it determines the similarity in shape among time series data with a low complexity. To evaluate the accuracy of the proposed model, the model is tested extensively using synthetic and real-world time series datasets. PMID:24982966

Aghabozorgi, Saeed; Ying Wah, Teh; Herawan, Tutut; Jalab, Hamid A.; Shaygan, Mohammad Amin; Jalali, Alireza

2014-01-01

151

A fast density-based clustering algorithm for real-time Internet of Things stream.

Data streams are continuously generated over time from Internet of Things (IoT) devices. The faster all of this data is analyzed, its hidden trends and patterns discovered, and new strategies created, the faster action can be taken, creating greater value for organizations. Density-based methods are a prominent class in clustering data streams. They can detect arbitrarily shaped clusters, handle outliers, and do not need the number of clusters in advance. Therefore, a density-based clustering algorithm is a proper choice for clustering IoT streams. Recently, several density-based algorithms have been proposed for clustering data streams. However, density-based clustering in limited time is still a challenging issue. In this paper, we propose a density-based clustering algorithm for IoT streams. The method has a processing time fast enough for real-time applications of IoT devices. Experimental results show that the proposed approach obtains high-quality results with low computation time on real and synthetic datasets. PMID:25110753

Amini, Amineh; Saboohi, Hadi; Wah, Teh Ying; Herawan, Tutut

2014-01-01

152

Transaction clustering of web log data files using genetic algorithm

Web applications are increasingly found to have an impact on numerous environments. Web log data hold considerable promise, and the application of genetic algorithms is particularly significant because it represents the relations between different data components. We applied simple genetic algorithms to log files and found that the preliminary results are promising, thereby opening more avenues for future

Daisy Jacobs; S. Sarasvady; Pit. Pichappan

2007-01-01

153

A Community Detection Algorithm Based on Topology Potential and Spectral Clustering

Community detection is of great value for complex networks in understanding their inherent law and predicting their behavior. Spectral clustering algorithms have been successfully applied in community detection. Methods of this kind have two inadequacies: one is that the input matrices they use cannot provide sufficient structural information for community detection, and the other is that they cannot necessarily derive the proper community number from the ladder distribution of eigenvector elements. In order to solve these problems, this paper puts forward a novel community detection algorithm based on topology potential and spectral clustering. The new algorithm constructs the normalized Laplacian matrix with nodes' topology potential, which contains rich structural information of the network. In addition, the new algorithm can automatically get the optimal community number from the local maximum potential nodes. Experimental results show that the new algorithm performs well on artificial networks and real-world networks and outperforms other community detection methods. PMID:25147846

Wang, Zhixiao; Chen, Zhaotong; Zhao, Ya; Chen, Shaoda

2014-01-01

154

NASA Astrophysics Data System (ADS)

We present a comparison of three cluster-finding algorithms from imaging data using Monte Carlo simulations of clusters embedded in a 25 deg² region of Sloan Digital Sky Survey (SDSS) imaging data: the matched filter (MF; Postman et al., published in 1996), the adaptive matched filter (AMF; Kepner et al., published in 1999), and a color-magnitude filtered Voronoi tessellation technique (VTT). Among the two matched filters, we find that the MF is more efficient in detecting faint clusters, whereas the AMF evaluates the redshifts and richnesses more accurately, therefore suggesting a hybrid method (HMF) that combines the two. The HMF outperforms the VTT when using a background that is uniform, but it is more sensitive to the presence of a nonuniform galaxy background than is the VTT; this is due to the assumption of a uniform background in the HMF model. We thus find that for the detection thresholds we determine to be appropriate for the SDSS data, the performance of both algorithms are similar; we present the selection function for each method evaluated with these thresholds as a function of redshift and richness. For simulated clusters generated with a Schechter luminosity function (M*_r = -21.5 and α = -1.1), both algorithms are complete for Abell richness ≳ 1 clusters up to z ~ 0.4 for a sample magnitude limited to r = 21. While the cluster parameter evaluation shows a mild correlation with the local background density, the detection efficiency is not significantly affected by the background fluctuations, unlike previous shallower surveys.

Kim, Rita Seung Jung; Kepner, Jeremy V.; Postman, Marc; Strauss, Michael A.; Bahcall, Neta A.; Gunn, James E.; Lupton, Robert H.; Annis, James; Nichol, Robert C.; Castander, Francisco J.; Brinkmann, J.; Brunner, Robert J.; Connolly, Andrew; Csabai, Istvan; Hindsley, Robert B.; Ivezić, Željko; Vogeley, Michael S.; York, Donald G.

2002-01-01

155

We present a comparison of three cluster finding algorithms from imaging data using Monte Carlo simulations of clusters embedded in a 25 deg^2 region of Sloan Digital Sky Survey (SDSS) imaging data: the Matched Filter (MF; Postman et al. 1996), the Adaptive Matched Filter (AMF; Kepner et al. 1999) and a color-magnitude filtered Voronoi Tessellation Technique (VTT). Among the two matched filters, we find that the MF is more efficient in detecting faint clusters, whereas the AMF evaluates the redshifts and richnesses more accurately, therefore suggesting a hybrid method (HMF) that combines the two. The HMF outperforms the VTT when using a background that is uniform, but it is more sensitive to the presence of a non-uniform galaxy background than is the VTT; this is due to the assumption of a uniform background in the HMF model. We thus find that for the detection thresholds we determine to be appropriate for the SDSS data, the performance of both algorithms are similar; we present the selection function for each method evaluated with these thresholds as a function of redshift and richness. For simulated clusters generated with a Schechter luminosity function (M_r^* = -21.5 and alpha = -1.1) both algorithms are complete for Abell richness >= 1 clusters up to z ~ 0.4 for a sample magnitude limited to r = 21. While the cluster parameter evaluation shows a mild correlation with the local background density, the detection efficiency is not significantly affected by the background fluctuations, unlike previous shallower surveys.

Rita S. J. Kim; Jeremy V. Kepner; Marc Postman; Michael A. Strauss; Neta A. Bahcall; James E. Gunn; Robert H. Lupton; James Annis; Robert C. Nichol; Francisco J. Castander; J. Brinkmann; Robert J. Brunner; Andrew Connolly; Istvan Csabai; Robert B. Hindsley; Zeljko Ivezic; Michael S. Vogeley; Donald G. York

2001-10-10

156

Simulation of DNA damage clustering after proton irradiation using an adapted DBSCAN algorithm.

In this work the "Density Based Spatial Clustering of Applications with Noise" (DBSCAN) algorithm was adapted to early stage DNA damage clustering calculations. The resulting algorithm takes into account the distribution of energy deposit induced by ionising particles and a damage probability function that depends on the total energy deposit amount. Proton track simulations were carried out in small micrometric volumes representing small DNA containments. The algorithm was used to determine the damage concentration clusters and thus to deduce the DSB/SSB ratios created by protons between 500keV and 50MeV. The obtained results are compared to other calculations and to available experimental data of fibroblast and plasmid cells irradiations, both extracted from literature. PMID:21232812

Francis, Ziad; Villagrasa, Carmen; Clairand, Isabelle

2011-03-01
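
The record above builds on DBSCAN. As a point of reference, here is a minimal sketch of the plain, unadapted DBSCAN procedure in Python. It does not include the energy-deposit distribution or the damage probability function described in the abstract; the names `dbscan`, `region_query`, `eps` and `min_pts` are illustrative choices, and the coordinates are in arbitrary units.

```python
import math

def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Plain DBSCAN: one label per point, 0, 1, ... for clusters and -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE          # may later be relabelled as a border point
            continue
        cluster += 1                   # points[i] is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            reachable = region_query(points, j, eps)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is itself a core point: keep expanding
    return labels

# Hypothetical positions: two tight groups and one isolated point.
sites = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(sites, eps=1.5, min_pts=3)
```

In this toy input the two tight groups become clusters 0 and 1 and the isolated point is labelled as noise. An adaptation of the kind the abstract describes would presumably change how points are admitted or weighted, which this sketch leaves out.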

157

Approximation Algorithms for the Mobile Piercing Set Problem with Applications to Clustering in Ad-factor approximation algorithms for the mobile piercing set (MPS) problem on unit-disks for standard metrics in fixed factors of the respective centralized algorithms: Our algorithms take O(1) time to update the piercing set

Richa, Andrea Werneck

158

Performance analysis of clustered-OFDM system with bitloading algorithm for broadband PLC

This contribution presents performance analysis of clustered-OFDM system applied to power line communication (PLC) applications when bitloading algorithm is taken into account. The attained results verify that, in comparison with the standard OFDM system, the use of clustered-OFDM for downlink data transmission in a shared outdoor and low-voltage power line cables among several users, can provide improved performance. Computational results

F. P. V. de Campos; M. V. Ribeiro

2008-01-01

159

Contention-free Complete Exchange Algorithm on Clusters

To construct a large commodity cluster, a hierarchical network is generally adopted for connecting the host machines, where a Gigabit backbone switch connects a few commodity switches with uplinks to achieve scaled bisectional bandwidth. This type of interconnection usually results in link contention and has congestion developed at the uplink ports. Moreover, the non-deterministic delays on scheduling

Anthony T. C. Tam; Cho-li Wang

2000-01-01

160

The Gaussian Mixture MCMC Particle Algorithm for Dynamic Cluster Tracking

behavioral moves such as birth and death of clusters as well as merging and splitting. Following our approaches are now preferred in many cases. Owing to the complex nature of MTT problems, statistical to the mathematical modeling of complex interactions between entities. This consists mainly of birth and death

Boyer, Edmond

161

The Gaussian Mixture MCMC Particle Algorithm for Dynamic Cluster Tracking

behavioral moves such as birth and death of clusters as well as merging and splitting. For handling complex classes: non-statistical and statistical. Non-statistical methods typically rely on both image interactions between entities. This consists mainly of birth and death of targets as well as coordinated

Paris-Sud XI, Université de

162

An Efficient Method of Key-Frame Extraction Based on a Cluster Algorithm

This paper proposes a novel method of key-frame extraction for use with motion capture data. This method is based on an unsupervised cluster algorithm. First, the motion sequence is clustered into two classes by the similarity distance of the adjacent frames so that the thresholds needed in the next step can be determined adaptively. Second, a dynamic cluster algorithm called ISODATA is used to cluster all the frames and the frames nearest to the center of each class are automatically extracted as key-frames of the sequence. Unlike many other clustering techniques, the present improved cluster algorithm can automatically address different motion types without any need for specified parameters from users. The proposed method is capable of summarizing motion capture data reliably and efficiently. The present work also provides a meaningful comparison between the results of the proposed key-frame extraction technique and other previous methods. These results are evaluated in terms of metrics that measure reconstructed motion and the mean absolute error value, which are derived from the reconstructed data and the original data. PMID:24511336

Zhang, Qiang; Yu, Shao-Pei; Zhou, Dong-Sheng; Wei, Xiao-Peng

2013-01-01
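
The final step described in the record above, taking the frame nearest to each cluster center as the key-frame, can be sketched as follows. This is only an illustration under stated assumptions: the cluster labels are taken as given (the ISODATA clustering itself is not reproduced), each frame is assumed to be a fixed-length feature vector, and the function name `key_frames` is hypothetical.

```python
import math

def key_frames(frames, labels):
    """For each cluster label, return the index of the frame closest to the cluster centroid."""
    groups = {}
    for idx, lab in enumerate(labels):
        groups.setdefault(lab, []).append(idx)
    result = {}
    for lab, idxs in groups.items():
        dim = len(frames[idxs[0]])
        # Centroid = per-dimension mean of the frames assigned to this cluster.
        centroid = [sum(frames[i][d] for i in idxs) / len(idxs) for d in range(dim)]
        result[lab] = min(idxs, key=lambda i: math.dist(frames[i], centroid))
    return result

# Hypothetical one-dimensional "pose features" for six frames in two clusters.
frames = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,), (15.0,)]
picked = key_frames(frames, [0, 0, 0, 1, 1, 1])
```

Here the centroids are 1.0 and 12.0, so frames 1 and 4 are selected. Real motion capture frames would be high-dimensional joint vectors, and the distance measure used by the authors may differ from the Euclidean distance assumed here.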

163

Clustering is an important mechanism that efficiently provides information for mobile nodes and improves the processing capacity of routing, bandwidth allocation, and resource management and sharing. Clustering algorithms can be based on such criteria as the battery power of nodes, mobility, network size, distance, speed and direction. Above all, in order to achieve good clustering performance, overhead should be minimized, allowing mobile nodes to join and leave without perturbing the membership of the cluster while preserving current cluster structure as much as possible. This paper proposes a Fuzzy Relevance-based Cluster head selection Algorithm (FRCA) to solve problems found in existing wireless mobile ad hoc sensor networks, such as the node distribution found in dynamic properties due to mobility and flat structures and disturbance of the cluster formation. The proposed mechanism uses fuzzy relevance to select the cluster head for clustering in wireless mobile ad hoc sensor networks. In the simulation implemented on the NS-2 simulator, the proposed FRCA is compared with algorithms such as the Cluster-based Routing Protocol (CBRP), the Weighted-based Adaptive Clustering Algorithm (WACA), and the Scenario-based Clustering Algorithm for Mobile ad hoc networks (SCAM). The simulation results showed that the proposed FRCA achieves better performance than that of the other existing mechanisms. PMID:22163905

Lee, Chongdeuk; Jeong, Taegwon

2011-01-01

165

The e-commerce recommendation system is one of the most important and most successful application fields of data mining technology. The recommendation algorithm is the core of the recommendation system. In this paper, a neural network-based clustering collaborative filtering algorithm for e-commerce recommendation systems is designed, which tries to establish a classifier model based on a BP neural network for the pre-classification of items

Jianying Mai; Yongjian Fan; Yanguang Shen

2009-01-01

166

An automatic rule base generation method for fuzzy pattern recognition with multiphased clustering

Presents an approach for the automatic generation of fuzzy rule bases for pattern recognition from a given sample data. The general idea of the approach is to use and enhance the fuzzy c-means clustering algorithm. The rule base is generated through a modified iterative feature clustering method. A following cross-checking is used to separate the generated rules. Although the rule

Franjo Ivancic; Ashutosh Malaviya; Liliane Peters

1998-01-01

167

ELKI: A Software System for Evaluation of Subspace Clustering Algorithms

" open source machine learning framework Weka [1]. We consider Weka as the most prominent and popular environment for data mining algorithms. However, the focus and strength of Weka is mainly located in the area Weka. The main focus of YALE is in supporting "rapid prototyping", i.e. to ease the definition

Kriegel, Hans-Peter

168

A cluster finding algorithm based on the multiband identification of red sequence galaxies

NASA Astrophysics Data System (ADS)

We present a new algorithm, CAMIRA, to identify clusters of galaxies in wide-field imaging survey data. We base our algorithm on the stellar population synthesis model to predict colours of red sequence galaxies at a given redshift for an arbitrary set of bandpass filters, with additional calibration using a sample of spectroscopic galaxies to improve the accuracy of the model prediction. We run the algorithm on ~11 960 deg^2 of imaging data from the Sloan Digital Sky Survey (SDSS) Data Release 8 to construct a catalogue of 71 743 clusters in the redshift range 0.1 < z < 0.6 with richness after correcting for the incompleteness of the richness estimate greater than 20. We cross-match the cluster catalogue with external cluster catalogues to find that our photometric cluster redshift estimates are accurate with low bias and scatter, and that the corrected richness correlates well with X-ray luminosities and temperatures. We use the publicly available Canada-France-Hawaii Telescope Lensing Survey shear catalogue to calibrate the mass-richness relation from stacked weak lensing analysis. Stacked weak lensing signals are detected significantly for eight subsamples of the SDSS clusters divided by redshift and richness bins, which are then compared with model predictions including miscentring effects to constrain mean halo masses of individual bins. We find the richness correlates well with the halo mass, such that the corrected richness limit of 20 corresponds to the cluster virial mass limit of about 1 × 10^14 h^-1 M_⊙ for the SDSS DR8 cluster sample.

Oguri, Masamune

2014-10-01

169

Experimental realization of the Deutsch-Jozsa algorithm with a six-qubit cluster state

We describe an experimental realization of the Deutsch-Jozsa quantum algorithm to evaluate the properties of a two-bit Boolean function in the framework of one-way quantum computation. For this purpose, a two-photon six-qubit cluster state was engineered. Its peculiar topological structure is the basis of the original measurement pattern allowing the algorithm realization. The good agreement of the experimental results with the theoretical predictions, obtained at ~1 kHz success rate, demonstrates the correct implementation of the algorithm.

Vallone, Giuseppe [Museo Storico della Fisica e Centro Studi e Ricerche Enrico Fermi, Via Panisperna 89/A, Compendio del Viminale, IT-00184 Roma (Italy); Dipartimento di Fisica, Universita Sapienza di Roma, IT-00185 Roma (Italy); Donati, Gaia; Bruno, Natalia; Chiuri, Andrea [Dipartimento di Fisica, Universita Sapienza di Roma, IT-00185 Roma (Italy); Mataloni, Paolo [Dipartimento di Fisica, Universita Sapienza di Roma, IT-00185 Roma (Italy); Istituto Nazionale di Ottica (INO-CNR), L.go E. Fermi 6, IT-50125 Florence (Italy)

2010-05-15

170

A contiguity-enhanced k-means clustering algorithm for unsupervised multispectral image segmentation

-bound processing. A segmented image (in which the multispectral data in each pixel is classified into one of a small A contiguity-enhanced k-means clustering algorithm for unsupervised multispectral image USA ABSTRACT The recent and continuing construction of multi- and hyper-spectral imagers will provide

Theiler, James

171

A Cluster Warhead Projection-Time Self-Adaptive Algorithm and Simulation

This paper proposes a self-adaptive algorithm for the projection-time of the cluster warhead, which is applicable to the realization of the engineering project with low cost. On the basis of keeping the simplified correction strategy that keeps the constant distance of the projection points and through detecting the deviation of the actual trajectory data at a specific time with respect

Qiang Shen; Mian Ge; Jie Li

2009-01-01

172

Preserving Class Discriminatory Information by Context-sensitive Intra-class Clustering Algorithm

Preserving Class Discriminatory Information by Context-sensitive Intra-class Clustering Algorithm classifier) assume that data in each class have a single Gaussian distribution. In reality, data in the class of interest, i.e., the object class, could have non-Gaussian distributions and could be isolated

Choe, Yoonsuck

173

Color Image Segmentation Using a Spatial K-Means Clustering Algorithm

Color Image Segmentation Using a Spatial K-Means Clustering Algorithm Dana Elena Ilea and Paul F produces accurate segmentation results only when applied to images defined by homogenous regions segmentation scheme has been applied to a large number of natural images and the experimental data indicates

Whelan, Paul F.

174

Parallel Particle Swarm Optimization Clustering Algorithm based on MapReduce Methodology

areas such as bioinformatics and social networking are in urgent need of scalable approaches. The new nowadays. Examples are the clustering of profile pages in social networks, bioinformatics applications big data into consideration, the algorithm should be efficient, scalable and obtain high quality

Ludwig, Simone

175

When a penetration cluster warhead is used to attack runways, the selection of aim points directly influences the attack effect. By analyzing the spread characteristics of the penetration cluster warhead and the characteristics of the runway, this paper builds a model for selecting aim points for the penetration cluster warhead based on genetic algorithms (GA) and Monte Carlo (MC) simulation, and simulates the model. The simulation result indicates

Wang Jian; Wang Minghai

2010-01-01

176

The construction of stable and adaptive clusters providing good performance and faster convergence rate with minimal overhead is a challenging task in Mobile Ad hoc Networks (MANETs). This paper proposes a clustering technique for MANETs, which is distributed, dominating set based, weighted and adaptive to changes in the topology called Distributed Scenario-based Clustering Algorithm for Mobile ad hoc networks (DSCAM).

V. S. Anitha; M. P. Sebastian

2009-01-01

177

A New Monte Carlo Method and Its Implications for Generalized Cluster Algorithms

We describe a novel switching algorithm based on a "reverse" Monte Carlo method, in which the potential is stochastically modified before the system configuration is moved. This new algorithm facilitates a generalized formulation of cluster-type Monte Carlo methods, and the generalization makes it possible to derive cluster algorithms for systems with both discrete and continuous degrees of freedom. The roughening transition in the sine-Gordon model has been studied with this method, and high-accuracy simulations for system sizes up to $1024^2$ were carried out to examine the logarithmic divergence of the surface roughness above the transition temperature, revealing clear evidence for universal scaling of the Kosterlitz-Thouless type.

C. H. Mak; Arun K. Sharma

2007-04-12

178

A comparison of algorithms for the construction of SZ cluster catalogues

NASA Astrophysics Data System (ADS)

We evaluate the construction methodology of an all-sky catalogue of galaxy clusters detected through the Sunyaev-Zel'dovich (SZ) effect. We perform an extensive comparison of twelve algorithms applied to the same detailed simulations of the millimeter and submillimeter sky based on a Planck-like case. We present the results of this "SZ Challenge" in terms of catalogue completeness, purity, astrometric and photometric reconstruction. Our results provide a comparison of a representative sample of SZ detection algorithms and highlight important issues in their application. In our study case, we show that the exact expected number of clusters remains uncertain (about a thousand cluster candidates at |b| > 20 deg with 90% purity) and that it depends on the SZ model and on the detailed sky simulations, and on algorithmic implementation of the detection methods. We also estimate the astrometric precision of the cluster candidates which is found of the order of ~2 arcmin on average, and the photometric uncertainty of about 30%, depending on flux.

Melin, J.-B.; Aghanim, N.; Bartelmann, M.; Bartlett, J. G.; Betoule, M.; Bobin, J.; Carvalho, P.; Chon, G.; Delabrouille, J.; Diego, J. M.; Harrison, D. L.; Herranz, D.; Hobson, M.; Kneissl, R.; Lasenby, A. N.; Le Jeune, M.; Lopez-Caniego, M.; Mazzotta, P.; Rocha, G. M.; Schaefer, B. M.; Starck, J.-L.; Waizmann, J. C.; Yvon, D.

2012-12-01

179

The fuzzy clustering analysis based on AFS theory.

In the framework of axiomatic fuzzy sets theory, we first study how to impersonally and automatically determine the membership functions for fuzzy sets according to original data and facts, and a new algorithmic framework of determining membership functions and their logic operations for fuzzy sets has been proposed. Then, we apply the proposed algorithmic framework to give a new clustering algorithm and show that the algorithm is feasible. A number of illustrative examples show that this approach offers a far more flexible and effective means for the intelligent systems in real-world applications. Compared with popular fuzzy clustering algorithms, such as c-means fuzzy algorithm and k-nearest-neighbor fuzzy algorithm, the new fuzzy clustering algorithm is more simple and understandable, the data types of the attributes can be various data types or subpreference relations, even descriptions of human intuition, and the distance function and the class number need not be given beforehand. PMID:16240775

Liu, Xiaodong; Wang, Wei; Chai, Tianyou

2005-10-01

180

A genetic algorithmic approach to antenna null-steering using a cluster computer.

NASA Astrophysics Data System (ADS)

We apply a genetic algorithm (GA) to the problem of electronically steering the maximums and nulls of an antenna array to desired positions (null toward enemy listener/jammer, max toward friendly listener/transmitter). The antenna pattern itself is computed using NEC2 which is called by the main GA program. Since a GA naturally lends itself to parallelization, this simulation was applied to our new twin 64-node cluster computers (Gemini). Design issues and uses of the Gemini cluster in our group are also discussed.

Recine, Greg; Cui, Hong-Liang

2001-06-01

181

Collaborative Fuzzy Clustering From Multiple Weighted Views.

Clustering with multiview data is becoming a hot topic in data mining, pattern recognition, and machine learning. In order to realize an effective multiview clustering, two issues must be addressed, namely, how to combine the clustering result from each view and how to identify the importance of each view. In this paper, based on a newly proposed objective function which explicitly incorporates two penalty terms, a basic multiview fuzzy clustering algorithm, called collaborative fuzzy c-means (Co-FCM), is firstly proposed. It is then extended into its weighted view version, called weighted view collaborative fuzzy c-means (WV-Co-FCM), by identifying the importance of each view. The WV-Co-FCM algorithm indeed tackles the above two issues simultaneously. Its relationship with the latest multiview fuzzy clustering algorithm Collaborative Fuzzy K-Means (Co-FKM) is also revealed. Extensive experimental results on various multiview datasets indicate that the proposed WV-Co-FCM algorithm outperforms or is at least comparable to the existing state-of-the-art multitask and multiview clustering algorithms and the importance of different views of the datasets can be effectively identified. PMID:25069132

Jiang, Yizhang; Chung, Fu-Lai; Wang, Shitong; Deng, Zhaohong; Wang, Jun; Qian, Pengjiang

2014-07-23

182

TermitAnt: An Ant Clustering Algorithm Improved by Ideas from Termite Colonies

This paper proposes a heuristic to improve the convergence speed of the standard ant clustering algorithm. The heuristic is based on the behavior of termites that, when building their nests, add some pheromone to the objects they carry. In this context, pheromone allows artificial ants to get more information, at the local level, about the work in progress at the

Vahid Sherafat; Leandro Nunes De Castro; Eduardo R. Hruschka

2004-01-01

183

CTSC: Core-Tag Oriented Spectral Clustering Algorithm on Web2.0 Tags

With the rapid development of the Web2.0 communities, many researchers have been attracted by the concept of folksonomy from the field of data mining and information retrieval. Finding out semantic correlation of tags is avid requirement for Web2.0 application. However, no proper algorithm can tackle this task very well. This paper proposes a core-tag oriented clustering method to handle the

Yexi Jiang; Changjie Tang; Kaikuo Xu; Yu Chen; Jie Gong; Liang Tang

2009-01-01

184

A Hierarchical Cluster Algorithm for Dynamic, Centralized Timestamps Paul A.S. Ward and David J. Taylor Shoshin Distributed Systems Group Department of Computer Science University of Waterloo fpasward

Ward, Paul A.S.

185

NASA Astrophysics Data System (ADS)

The incorporation of hyperspectral sensors aboard airborne/satellite platforms is currently producing a nearly continual stream of multidimensional image data, and this high data volume has quickly introduced new processing challenges. The price paid for the wealth of spatial and spectral information available from hyperspectral sensors is the enormous amount of data that they generate. Several applications exist, however, where having the desired information calculated quickly enough for practical use is highly desirable. High computing performance of algorithm analysis is particularly important in homeland defense and security applications, in which swift decisions often involve detection of (sub-pixel) military targets (including hostile weaponry, camouflage, concealment, and decoys) or chemical/biological agents. In order to speed up computational performance of hyperspectral imaging algorithms, this paper develops several fast parallel data processing techniques. Techniques include four classes of algorithms: (1) unsupervised classification, (2) spectral unmixing, (3) automatic target recognition, and (4) onboard data compression. A massively parallel Beowulf cluster (Thunderhead) at NASA's Goddard Space Flight Center in Maryland is used to measure parallel performance of the proposed algorithms. In order to explore the viability of developing onboard, real-time hyperspectral data compression algorithms, a Xilinx Virtex-II field programmable gate array (FPGA) is also used in experiments. Our quantitative and comparative assessment of parallel techniques and strategies may help image analysts in selection of parallel hyperspectral algorithms for specific applications.

Plaza, Antonio; Chang, Chein-I.; Plaza, Javier; Valencia, David

2006-05-01

186

Synthetic yet realistic images are valuable for many applications in visual sciences and medical imaging. Typically, investigators develop algorithms and adjust their parameters to generate images that are visually similar to real images. In this study, we used a genetic algorithm and an objective, statistical similarity measure to optimize a particular texture generation algorithm, the clustered lumpy backgrounds (CLB) technique, and synthesize images mimicking real mammograms textures. We combined this approach with psychophysical experiments involving the judgment of radiologists, who were asked to qualify the visual realism of the images. Both objective and psychophysical approaches show that the optimized versions are significantly more realistic than the previous CLB model. Anatomical structures are well reproduced, and arbitrary large databases of mammographic texture with visual and statistical realism can be generated. Potential applications include detection experiments, where large amounts of statistically traceable yet realistic images are needed. PMID:18545466

Castella, Cyril; Kinkel, Karen; Descombes, François; Eckstein, Miguel P; Sottas, Pierre-Edouard; Verdun, Francis R; Bochud, François O

2008-05-26

187

A heuristic method for finding the optimal number of clusters with application in medical data.

In this paper, a heuristic method for determining the optimal number of clusters is proposed. Four clustering algorithms, namely K-means, Growing Neural Gas, Simulated Annealing based technique, and Fuzzy C-means in conjunction with three well known cluster validity indices, namely Davies-Bouldin index, Calinski-Harabasz index, Maulik-Bandyopadhyay index, in addition to the proposed index are used. Our simulations evaluate capability of mentioned indices in some artificial and medical datasets. PMID:19163761

Bayati, Hamidreza; Davoudi, Heydar; Fatemizadeh, Emad

2008-01-01
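
Of the validity indices named in the record above, the Davies-Bouldin index is simple enough to sketch directly. The code below is a minimal pure-Python illustration of the standard definition (mean, over clusters, of the worst ratio of summed within-cluster scatter to centroid separation); it is not the index proposed in the paper, and it assumes at least two clusters with distinct centroids. The function name `davies_bouldin` is an illustrative choice.

```python
import math

def davies_bouldin(points, labels):
    """Davies-Bouldin index: lower values indicate compact, well separated clusters."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    centroids, scatters = [], []
    for members in groups.values():
        dim = len(members[0])
        c = tuple(sum(p[d] for p in members) / len(members) for d in range(dim))
        centroids.append(c)
        # Scatter = mean distance of the cluster's points to its centroid.
        scatters.append(sum(math.dist(p, c) for p in members) / len(members))
    k = len(centroids)
    total = 0.0
    for i in range(k):
        total += max(
            (scatters[i] + scatters[j]) / math.dist(centroids[i], centroids[j])
            for j in range(k) if j != i
        )
    return total / k

# Hypothetical data: the same four points under a sensible and a poor labelling.
pts = [(0, 0), (0, 2), (10, 0), (10, 2)]
good = davies_bouldin(pts, [0, 0, 1, 1])
bad = davies_bouldin(pts, [0, 1, 0, 1])
```

A heuristic for choosing the number of clusters could run a clustering algorithm for several candidate values of k and keep the one with the lowest index, which is one common way such indices are used.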

188

A new concept of wildland-urban interface based on city clustering algorithm

NASA Astrophysics Data System (ADS)

Wildland-Urban-Interface (WUI) is a widely used term in the context of wild and forest fires to indicate areas where human infrastructures interact with wildland/forest areas. Many complex problems are associated with the WUI, but the most relevant ones are those related to forest fire hazard and management in densely populated areas where the fire regime is dominated by anthropogenic-induced ignition fires. This coexistence enhances both anthropogenic-ignition sources and flammable fuels. Furthermore, the growing trend of the WUI and global change effects may even worsen the situation in the near future. Therefore, many studies are dedicated to the WUI problem, focusing on refinement of its definition, development of mapping methods, implementation of measures into specific fire management plans and the validation of the proposed approaches. The present study introduces a new concept of WUI based on the city clustering algorithm (CCA) introduced in Rosenfeld et al., 2008. CCA was proposed as an automatic tool for studying the definition of cities and their distribution. The algorithm uses demographic data, either on a regular or non-regular grid in space, where a city (urban zone) is detected as a cluster of connected populated cells with maximal size. In the present study the CCA is proposed as a tool to develop a new concept of population dynamic analysis crucial to define and to localise WUI. The real case study is based on demographic/census data, organised in a regular grid with a resolution of 100 m, and the forest fire ignition points database from canton Ticino, Switzerland. By changing spatial scales of demographic cells the relationships between urban zones (demographic clusters) and forest fire events were statistically analyzed. Corresponding scaling laws were used to understand the interaction between urban zones and forest fires. The first results are good and indicate that the method can be applied to define WUI in an innovative way.
Keywords: forest fires, wildland-urban interface, city clustering algorithms.

Kanevski, M.; Champendal, A.; Vega Orozco, C.; Tonini, M.; Conedera, M.

2012-04-01
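
The core idea of the city clustering algorithm in the record above, detecting an urban zone as a maximal cluster of connected populated cells on a regular grid, can be sketched as a connected-component search. This is a simplified illustration, not the authors' implementation: 4-neighbour connectivity and a population threshold of zero are assumptions, and the name `city_clusters` is hypothetical.

```python
from collections import deque

def city_clusters(grid, threshold=0):
    """Group populated cells (value > threshold) into maximal 4-connected clusters."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    clusters = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or grid[r][c] <= threshold:
                continue
            seen[r][c] = True
            queue = deque([(r, c)])
            cells = []
            while queue:                       # breadth-first growth of one cluster
                y, x = queue.popleft()
                cells.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and grid[ny][nx] > threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            clusters.append(cells)
    return clusters

# Hypothetical population counts per grid cell.
population = [
    [5, 3, 0, 0],
    [0, 2, 0, 7],
    [0, 0, 0, 4],
]
zones = city_clusters(population)
totals = [sum(population[y][x] for y, x in cells) for cells in zones]
```

Repeating the search after aggregating the grid to coarser cell sizes would be one way to study how the detected urban zones change with spatial scale, as the abstract describes.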

189

Possibilistic clustering for shape recognition

NASA Technical Reports Server (NTRS)

Clustering methods have been used extensively in computer vision and pattern recognition. Fuzzy clustering has been shown to be advantageous over crisp (or traditional) clustering in that total commitment of a vector to a given class is not required at each iteration. Recently fuzzy clustering methods have shown spectacular ability to detect not only hypervolume clusters, but also clusters which are actually 'thin shells', i.e., curves and surfaces. Most analytic fuzzy clustering approaches are derived from Bezdek's Fuzzy C-Means (FCM) algorithm. The FCM uses the probabilistic constraint that the memberships of a data point across classes sum to one. This constraint was used to generate the membership update equations for an iterative algorithm. Unfortunately, the memberships resulting from FCM and its derivatives do not correspond to the intuitive concept of degree of belonging, and moreover, the algorithms have considerable trouble in noisy environments. Recently, the clustering problem was cast into the framework of possibility theory. Our approach was radically different from the existing clustering methods in that the resulting partition of the data can be interpreted as a possibilistic partition, and the membership values may be interpreted as degrees of possibility of the points belonging to the classes. An appropriate objective function whose minimum will characterize a good possibilistic partition of the data was constructed, and the membership and prototype update equations from necessary conditions for minimization of our criterion function were derived. The ability of this approach to detect linear and quartic curves in the presence of considerable noise is shown.

Keller, James M.; Krishnapuram, Raghu

1993-01-01
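
The contrast drawn in the record above between the probabilistic constraint of FCM and possibilistic memberships can be illustrated numerically. The sketch below uses the standard FCM membership update and the commonly cited possibilistic form u = 1 / (1 + (d^2 / eta)^(1/(m-1))); the abstract itself does not state these equations, so treat them as background knowledge rather than a quotation. The function names and the scale parameters `etas` are illustrative.

```python
def fcm_memberships(sq_dists, m=2.0):
    """FCM memberships of one point from its squared distances (all > 0) to each prototype."""
    exp = 1.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** exp for d_k in sq_dists) for d_i in sq_dists]

def pcm_memberships(sq_dists, etas, m=2.0):
    """Possibilistic memberships: each depends only on the distance to its own prototype."""
    exp = 1.0 / (m - 1.0)
    return [1.0 / (1.0 + (d / eta) ** exp) for d, eta in zip(sq_dists, etas)]

# A hypothetical noise point, equally far (squared distance 100) from two prototypes.
outlier = [100.0, 100.0]
u_fcm = fcm_memberships(outlier)
u_pcm = pcm_memberships(outlier, etas=[1.0, 1.0])
```

Because FCM memberships must sum to one, the distant outlier still receives 0.5 in each class, whereas the possibilistic memberships are both close to zero, which matches the "degree of belonging" interpretation and the noise robustness discussed in the abstract.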

190

Clustering of tethered satellite system simulation data by an adaptive neuro-fuzzy algorithm

NASA Technical Reports Server (NTRS)

Recent developments in neuro-fuzzy systems indicate that the concepts of adaptive pattern recognition, when used to identify appropriate control actions corresponding to clusters of patterns representing system states in dynamic nonlinear control systems, may result in innovative designs. A modular, unsupervised neural network architecture, in which fuzzy learning rules have been embedded, is used for on-line identification of similar states. The architecture and control rules involved in Adaptive Fuzzy Leader Clustering (AFLC) allow this system to be incorporated in control systems for identification of system states corresponding to specific control actions. We have used this algorithm to cluster the simulation data of the Tethered Satellite System (TSS) to estimate the range of delta voltages necessary to maintain the desired length rate of the tether. The AFLC algorithm is capable of on-line estimation of the appropriate control voltages from the corresponding length error and length rate error without a priori knowledge of their membership functions and familiarity with the behavior of the Tethered Satellite System.

Mitra, Sunanda; Pemmaraju, Surya

1992-01-01

191

Background Accurate genotype calling is a pre-requisite of a successful Genome-Wide Association Study (GWAS). Although most genotyping algorithms can achieve an accuracy rate greater than 99% for genotyping DNA samples without copy number alterations (CNAs), almost all of these algorithms are not designed for genotyping tumor samples that are known to have large regions of CNAs. Results This study aims to develop a statistical method that can accurately genotype tumor samples with CNAs. The proposed method adds a Bayesian layer to a cluster regression model and is termed a Bayesian Cluster Regression-based genotyping algorithm (BCRgt). We demonstrate that high concordance rates with HapMap calls can be achieved without using reference/training samples, when CNAs do not exist. By adding a training step, we have obtained higher genotyping concordance rates, without requiring large sample sizes. When CNAs exist in the samples, accuracy can be dramatically improved in regions with DNA copy loss and slightly improved in regions with copy number gain, comparing with the Bayesian Robust Linear Model with Mahalanobis distance classifier (BRLMM). Conclusions In conclusion, we have demonstrated that BCRgt can provide accurate genotyping calls for tumor samples with CNAs. PMID:24629125

2014-01-01

192

An improved scheduling algorithm for 3D cluster rendering with platform LSF

NASA Astrophysics Data System (ADS)

High-quality photorealistic rendering of 3D models needs powerful computing systems. To meet this demand, highly efficient management of cluster resources has developed rapidly. This paper aims to improve the efficiency of 3D rendering tasks in a cluster. It focuses on a dynamic feedback load balance (DFLB) algorithm, the working principle of the load sharing facility (LSF), and optimization of the external scheduler plug-in. The algorithm can be applied in the match and allocation phases of a scheduling cycle. Candidate hosts are prepared in sequence in the match phase, and the scheduler makes allocation decisions for each job in the allocation phase. With the dynamic mechanism, a new weight is assigned to each candidate host for rearrangement, and the most suitable host is dispatched for rendering. A new plug-in module for this algorithm has been designed and integrated into the internal scheduler. Simulation experiments demonstrate that the improved plug-in module outperforms the default one for rendering tasks. It can help avoid load imbalance among servers, increase system throughput and improve system utilization.

Xu, Wenli; Zhu, Yi; Zhang, Liping

2013-10-01

193

A Multi-Core FPGA-Based 2D-Clustering Algorithm for High-Throughput Data Intensive Applications

NASA Astrophysics Data System (ADS)

A multi-core FPGA-based clustering algorithm for high-throughput data intensive applications is presented. The algorithm is optimized for data with two dimensional organization (e.g. image processing, pixel detectors for high energy physics experiments etc.). It uses a moving window of generic size to adjust to the application's processing requirements (the cluster sizes and shapes that appear in the input data sets). One or more windows (cores) can be used to identify clusters in parallel, allowing for versatility to increase performance or reduce the amount of used resources. In addition to the inherent parallelism the algorithm is executed in a pipeline, thus allowing for readout to be performed in parallel with the cluster identification.

Sotiropoulou, Calliope-Louisa; Nikolaidis, Spyridon; Annovi, Alberto; Beretta, Matteo; Volpi, Guido; Giannetti, Paola; Luciano, Pierluigi

2014-06-01

194

NASA Astrophysics Data System (ADS)

Group Technology is a method of increasing productivity in manufacturing high-quality products and improving the flexibility of manufacturing systems in multi-variety, small-batch production. The parts with the same or similar process route are gathered into a group through production flow analysis. The processing route of each part was analyzed. The production flow analysis figure of the parts was drawn. The fuzzy clustering algorithm was used to classify the machine tools and the parts. The fuzzy similar matrices and the relay closure matrices were obtained based on MATLAB. Then the corresponding relations between each group of the parts and each group of the machine tools were found. The similar parts were gathered and the workshop machine layout was reconstructed. The steps of production flow analysis and the fuzzy clustering analysis were introduced. The effectiveness of the method was proved by an example.

Du, Yanwei

2011-12-01

195

NASA Astrophysics Data System (ADS)

Group Technology is a method of increasing productivity in manufacturing high-quality products and improving the flexibility of manufacturing systems in multi-variety, small-batch production. The parts with the same or similar process route are gathered into a group through production flow analysis. The processing route of each part was analyzed. The production flow analysis figure of the parts was drawn. The fuzzy clustering algorithm was used to classify the machine tools and the parts. The fuzzy similar matrices and the relay closure matrices were obtained based on MATLAB. Then the corresponding relations between each group of the parts and each group of the machine tools were found. The similar parts were gathered and the workshop machine layout was reconstructed. The steps of production flow analysis and the fuzzy clustering analysis were introduced. The effectiveness of the method was proved by an example.

Du, Yanwei

2012-01-01

196

Development of a Genetic Algorithm to Automate Clustering of a Dependency Structure Matrix

NASA Technical Reports Server (NTRS)

Much technology assessment and organization design data exists in Microsoft Excel spreadsheets. Tools are needed to put this data into a form that can be used by design managers to make design decisions. One need is to cluster data that is highly coupled. Tools such as the Dependency Structure Matrix (DSM) and a Genetic Algorithm (GA) can be of great benefit. However, no tool currently combines the DSM and a GA to solve the clustering problem. This paper describes a new software tool that interfaces a GA written as an Excel macro with a DSM in spreadsheet format. The results of several test cases are included to demonstrate how well this new tool works.

Rogers, James L.; Korte, John J.; Bilardo, Vincent J.

2006-01-01

197

NASA Astrophysics Data System (ADS)

It is shown that the non-terminating expansions of the wave function within the variational coupled cluster singles (VCCS) can be exactly treated by summing up the one-particle density matrix elements in the occupied block using a simple recurrence relation. At the same time, this leads to an extremely simple 'a priori' diagonalization-free algorithm for the solution of the Hartree-Fock equations. This treatment corresponds to a non-unitary transformation of orbitals that nevertheless preserves the norm and idempotency of the density matrix. The resulting algorithm enables a Hartree-Fock solution with 'a priori' localized orbitals. A similar approach can be applied within the Kohn-Sham theory. Analysis of the VCCS expansion in terms of the generalized perturbation theory is also presented. Numerical results are presented for the model systems N2, F2, H2O, NH3, and also for a larger uracil molecule and an interaction of four guanine molecules.

Šimunek, Ján; Noga, Jozef

2012-12-01

198

Cloud classification from satellite data using a fuzzy sets algorithm: A polar example

NASA Technical Reports Server (NTRS)

Where spatial boundaries between phenomena are diffuse, classification methods which construct mutually exclusive clusters seem inappropriate. The Fuzzy c-means (FCM) algorithm assigns each observation to all clusters, with membership values as a function of distance to the cluster center. The FCM algorithm is applied to AVHRR data for the purpose of classifying polar clouds and surfaces. Careful analysis of the fuzzy sets can provide information on which spectral channels are best suited to the classification of particular features, and can help determine likely areas of misclassification. General agreement in the resulting classes and cloud fraction was found between the FCM algorithm, a manual classification, and an unsupervised maximum likelihood classifier.

Key, J. R.; Maslanik, J. A.; Barry, R. G.

1988-01-01
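
The membership rule summarized in the abstract above (every observation belongs to all clusters, with membership depending on distance to each cluster center) can be illustrated with a minimal sketch. It uses the standard fuzzy c-means membership formula with fuzzifier `m`; the function name, the choice `m=2.0`, and the toy coordinates are illustrative assumptions and are not taken from the cited report.

```python
import math

def fcm_memberships(point, centers, m=2.0):
    """Fuzzy c-means memberships of one point in every cluster (they sum to 1)."""
    dists = [math.dist(point, c) for c in centers]
    # A point that coincides with one or more centers belongs fully to them.
    zero = [i for i, d in enumerate(dists) if d == 0.0]
    if zero:
        return [1.0 / len(zero) if i in zero else 0.0 for i in range(len(centers))]
    power = 2.0 / (m - 1.0)
    inv = [d ** -power for d in dists]  # closer centers get larger weights
    total = sum(inv)
    return [v / total for v in inv]

# Hypothetical toy data: one observation and two cluster centers.
u = fcm_memberships((1.0, 0.0), [(0.0, 0.0), (4.0, 0.0)])
print(u)
```

In this toy case the observation is three times closer to the first center, so most (but not all) of its membership goes to that cluster, which is the property that makes the method suitable for diffuse boundaries.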

199

Meanie3D - a mean-shift based, multivariate, multi-scale clustering and tracking algorithm

NASA Astrophysics Data System (ADS)

Project OASE is one of five work groups at the HErZ (Hans Ertel Centre for Weather Research), an ongoing effort by the German weather service (DWD) to further research at universities concerning weather prediction. The goal of project OASE is to gain an object-based perspective on convective events by identifying them early in the onset of convective initiation and following them through the entire lifecycle. The ability to follow objects in this fashion requires new ways of object definition and tracking, which incorporate all the available data sets of interest, such as satellite imagery, weather radar or lightning counts. The Meanie3D algorithm provides the necessary tool for this purpose. Core features of this new approach to clustering (object identification) and tracking are the ability to identify objects using the mean-shift algorithm applied to a multitude of variables (multivariate), as well as the ability to detect objects on various scales (multi-scale) using elements of Scale-Space theory. The algorithm works in 2D as well as 3D without modifications. It is an extension of a method well known from the field of computer vision and image processing, which has been tailored to serve the needs of the meteorological community. In spite of the special application to be demonstrated here (convective initiation), the algorithm is easily tailored to provide clustering and tracking for a wide class of data sets and problems. In this talk, the demonstration is carried out on two of the OASE group's own composite sets. One is a 2D nationwide composite of Germany including C-Band radar (2D) and satellite information, the other a 3D local composite of the Bonn/Jülich area containing a high-resolution 3D X-Band radar composite.

Simon, Jürgen-Lorenz; Malte, Diederich; Silke, Troemel

2014-05-01

200

Voice biometrics has a long history in biosecurity applications such as verification and identification based on characteristics of the human voice. Another application, voice classification, plays an important role in grouping unlabelled voice samples but has not been widely studied. Lately, voice classification has been found useful in phone monitoring, classifying speakers' gender, ethnicity and emotional states, and so forth. In this paper, a collection of computational algorithms is proposed to support voice classification; the algorithms are a combination of hierarchical clustering, dynamic time warping, discrete wavelet transform, and decision tree. The proposed algorithms are relatively more transparent and interpretable than the existing ones, though many techniques such as Artificial Neural Networks, Support Vector Machine, and Hidden Markov Model (which inherently function like a black box) have been applied for voice verification and voice identification. Two datasets, one generated synthetically and the other collected empirically from a past voice recognition experiment, are used to verify and demonstrate the effectiveness of our proposed voice classification algorithm. PMID:22619492

Fong, Simon

2012-01-01

201

A computational algorithm for functional clustering of proteome dynamics during development.

Phenotypic traits, such as seed development, are a consequence of complex biochemical interactions among genes, proteins and metabolites, but the underlying mechanisms that operate in a coordinated and sequential manner remain elusive. Here, we address this issue by developing a computational algorithm to monitor proteome changes during the course of trait development. The algorithm is built within the mixture-model framework in which each mixture component is modeled by a specific group of proteins that display a similar temporal pattern of expression in trait development. A nonparametric approach based on Legendre orthogonal polynomials was used to fit dynamic changes of protein expression, increasing the power and flexibility of protein clustering. By analyzing a dataset of proteomic dynamics during early embryogenesis of the Chinese fir, the algorithm has successfully identified several distinct types of proteins that coordinate with each other to determine seed development in this forest tree commercially and environmentally important to China. The algorithm will find its immediate applications for the characterization of mechanistic underpinnings for any other biological processes in which protein abundance plays a key role. PMID:24955031

Wang, Yaqun; Wang, Ningtao; Hao, Han; Guo, Yunqian; Zhen, Yan; Shi, Jisen; Wu, Rongling

2014-06-01

202

A Computational Algorithm for Functional Clustering of Proteome Dynamics During Development

Phenotypic traits, such as seed development, are a consequence of complex biochemical interactions among genes, proteins and metabolites, but the underlying mechanisms that operate in a coordinated and sequential manner remain elusive. Here, we address this issue by developing a computational algorithm to monitor proteome changes during the course of trait development. The algorithm is built within the mixture-model framework in which each mixture component is modeled by a specific group of proteins that display a similar temporal pattern of expression in trait development. A nonparametric approach based on Legendre orthogonal polynomials was used to fit dynamic changes of protein expression, increasing the power and flexibility of protein clustering. By analyzing a dataset of proteomic dynamics during early embryogenesis of the Chinese fir, the algorithm has successfully identified several distinct types of proteins that coordinate with each other to determine seed development in this forest tree commercially and environmentally important to China. The algorithm will find its immediate applications for the characterization of mechanistic underpinnings for any other biological processes in which protein abundance plays a key role. PMID:24955031

Wang, Yaqun; Wang, Ningtao; Hao, Han; Guo, Yunqian; Zhen, Yan; Shi, Jisen; Wu, Rongling

2014-01-01

203

KANTS: a stigmergic ant algorithm for cluster analysis and swarm art.

KANTS is a swarm intelligence clustering algorithm inspired by the behavior of social insects. It uses stigmergy as a strategy for clustering large datasets and, as a result, displays a typical behavior of complex systems: self-organization and global patterns emerging from the local interaction of simple units. This paper introduces a simplified version of KANTS and describes recent experiments with the algorithm in the context of a contemporary artistic and scientific trend called swarm art, a type of generative art in which swarm intelligence systems are used to create artwork or ornamental objects. KANTS is used here for generating color drawings from the input data that represent real-world phenomena, such as electroencephalogram sleep data. However, the main proposal of this paper is an art project based on well-known abstract paintings, from which the chromatic values are extracted and used as input. Colors and shapes are therefore reorganized by KANTS, which generates its own interpretation of the original artworks. The project won the 2012 Evolutionary Art, Design, and Creativity Competition. PMID:23912505

Fernandes, Carlos M; Mora, Antonio M; Merelo, Juan J; Rosa, Agostinho C

2014-06-01

204

A contiguity-enhanced k-means clustering algorithm for unsupervised multispectral image segmentation

The recent and continuing construction of multi and hyper spectral imagers will provide detailed data cubes with information in both the spatial and spectral domain. This data shows great promise for remote sensing applications ranging from environmental and agricultural to national security interests. The reduction of this voluminous data to useful intermediate forms is necessary both for downlinking all those bits and for interpreting them. Smart onboard hardware is required, as well as sophisticated earth bound processing. A segmented image (in which the multispectral data in each pixel is classified into one of a small number of categories) is one kind of intermediate form which provides some measure of data compression. Traditional image segmentation algorithms treat pixels independently and cluster the pixels according only to their spectral information. This neglects the implicit spatial information that is available in the image. We will suggest a simple approach; a variant of the standard k-means algorithm which uses both spatial and spectral properties of the image. The segmented image has the property that pixels which are spatially contiguous are more likely to be in the same class than are random pairs of pixels. This property naturally comes at some cost in terms of the compactness of the clusters in the spectral domain, but we have found that the spatial contiguity and spectral compactness properties are nearly orthogonal, which means that we can make considerable improvements in the one with minimal loss in the other.

Theiler, J.; Gisler, G.

1997-07-01
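
One simple way to make k-means use both spatial and spectral properties, as the abstract above describes, is to append weighted pixel coordinates to each pixel's spectral vector before clustering. The sketch below does exactly that; it is an illustrative assumption and not necessarily the variant the authors propose. The names `segment`, `seeds`, and `spatial_weight`, and the toy one-band image, are hypothetical.

```python
import math

def kmeans(points, centers, iters=20):
    """Plain Lloyd k-means on tuples; returns (labels, centers)."""
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
                  for p in points]
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old center if a cluster becomes empty
                centers[k] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centers

def segment(image, seeds, spatial_weight):
    """Cluster pixels on (bands..., weight*row, weight*col) feature vectors."""
    rows, cols = len(image), len(image[0])
    feats = [tuple(image[r][c]) + (spatial_weight * r, spatial_weight * c)
             for r in range(rows) for c in range(cols)]
    centers = [feats[r * cols + c] for r, c in seeds]  # seed pixels as initial centers
    labels, _ = kmeans(feats, centers)
    return [labels[r * cols:(r + 1) * cols] for r in range(rows)]

# Hypothetical one-band, one-row image with a noisy pixel (0.6) on the dark side.
image = [[(0.0,), (0.6,), (0.0,), (1.0,), (1.0,), (1.0,)]]
spectral_only = segment(image, seeds=[(0, 0), (0, 5)], spatial_weight=0.0)
with_contiguity = segment(image, seeds=[(0, 0), (0, 5)], spatial_weight=0.2)
print(spectral_only, with_contiguity)
```

With `spatial_weight=0.0` the noisy pixel is assigned by spectrum alone and breaks up the left region; with a modest weight it joins its spatial neighbours, at the cost of a less compact spectral cluster, which is the trade-off the abstract discusses.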

205

'Inter-Arrival Time' Inspired Algorithm and its Application in Clustering and Molecular Phylogeny

NASA Astrophysics Data System (ADS)

Bioinformatics, being a multidisciplinary field, involves applications of various methods from allied areas of science for data mining using computational approaches. Clustering and molecular phylogeny are among the key areas in bioinformatics, which help in the study of classification and evolution of organisms. Molecular phylogeny algorithms can be divided into distance-based and character-based methods. But most of these methods are dependent on pre-alignment of sequences and become computationally intensive with increase in size of data, and hence demand alternative efficient approaches. 'Inter-arrival time distribution' (IATD) is a popular concept in the theory of stochastic system modeling, but its potential in molecular data analysis has not been fully explored. The present study reports application of IATD in bioinformatics for clustering and molecular phylogeny. The proposed method provides IATDs of nucleotides in genomic sequences. A distance function based on statistical parameters of IATDs is proposed, and the distance matrix thus obtained is used for the purpose of clustering and molecular phylogeny. The method is applied on a dataset of 3' non-coding region sequences (NCR) of Dengue virus type 3 (DENV-3), subtype III, reported in 2008. The phylogram thus obtained revealed the geographical distribution of DENV-3 isolates. Sri Lankan DENV-3 isolates were further observed to be clustered in two sub-clades corresponding to pre and post Dengue hemorrhagic fever emergence groups. These results are consistent with those reported earlier, which were obtained using pre-aligned sequence data as an input. These findings encourage applications of the IATD-based method in molecular phylogenetic analysis in particular and data mining in general.

Kolekar, Pandurang S.; Kale, Mohan M.; Kulkarni-Kale, Urmila

2010-10-01
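
The alignment-free idea in the abstract above can be sketched in a few lines: for each nucleotide, record the gaps between its successive occurrences, summarize each gap distribution, and compare sequences through those summaries. The abstract does not say which statistical parameters or which distance the authors use, so the mean and standard deviation and the Euclidean distance below are assumptions, as are all function names and the toy sequences.

```python
import math
from statistics import mean, pstdev

def inter_arrival_times(seq, base):
    """Gaps between successive positions at which `base` occurs in `seq`."""
    pos = [i for i, ch in enumerate(seq) if ch == base]
    return [b - a for a, b in zip(pos, pos[1:])]

def iatd_features(seq, alphabet="ACGT"):
    """Assumed summary: mean and standard deviation of the gaps for each base."""
    feats = []
    for base in alphabet:
        gaps = inter_arrival_times(seq, base)
        feats += [mean(gaps), pstdev(gaps)] if gaps else [0.0, 0.0]
    return feats

def iatd_distance(a, b):
    """Assumed distance: Euclidean distance between the two feature vectors."""
    return math.dist(iatd_features(a), iatd_features(b))

# Hypothetical toy sequences; no alignment step is needed.
gaps_a = inter_arrival_times("ACGTACCA", "A")
d_same = iatd_distance("ACGTACCA", "ACGTACCA")
d_diff = iatd_distance("ACGTACCA", "AACCGGTT")
print(gaps_a, d_same, d_diff)
```

A matrix of such pairwise distances could then be passed to any distance-based clustering or tree-building routine.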

206

NASA Astrophysics Data System (ADS)

This paper presents a new algorithm, called B-ANFIS, for building an adaptive neuro-fuzzy inference system (ANFIS) from a training data set. In order to increase the accuracy of the model, the following steps are carried out. Firstly, a data merging rule is proposed to build and perform a data-clustering strategy. Subsequently, a combination of clustering processes in the input data space and in the joint input-output data space is presented. The crucial reason for this step is to overcome problems related to initialization and contradictory fuzzy rules, which usually occur when building ANFIS. The clustering process in the input data space is accomplished based on a proposed merging-possibilistic clustering (MPC) algorithm. The effectiveness of this process is evaluated to resume a clustering process in the joint input-output data space. The optimal parameters obtained after completion of the clustering process are used to build ANFIS. Simulations based on numerical data, 'Daily Data of Stock A', and measured data sets of a smart damper are performed to analyze and estimate accuracy. In addition, convergence and robustness of the proposed algorithm are investigated based on both theoretical and testing approaches.

Nguyen, Sy Dzung; Nguyen, Quoc Hung; Choi, Seung-Bok

2015-01-01

207

We present a new algorithm to search for distant clusters of galaxies on catalogues deriving from imaging data, as those of the ESO Imaging Survey. Our algorithm is a matched filter one, similar to that adopted by Postman et al. (1996), aiming at identifying cluster candidates by using positional and photometric data simultaneously. The main novelty of our approach is that spatial and luminosity filter are run separately on the catalogue and no assumption is made on the typical size nor on the typical M* for clusters, as these parameters intervene in our algorithm as typical angular scale (sigma) and typical apparent magnitude m*. Moreover we estimate the background locally for each candidate, allowing us to overcome the hazards of inhomogeneous datasets. As a consequence our algorithm has a lower contamination rate - without loss of completeness - in comparison to other techniques, as tested through extensive simulations. We provide catalogues of galaxy cluster candidates as the result of applying our algorithm to the I-band data of the EIS-wide patches A and B.

C. Lobo; A. Iovino; D. Lazzati; G. Chincarini

2000-06-29

208

NASA Astrophysics Data System (ADS)

Land use/cover (LUC) classification plays an important role in remote sensing and land change science. Because of the complexity of ground covers, LUC classification is still regarded as a difficult task. This study proposed a fusion algorithm, which uses support vector machines (SVM) and fuzzy k-means (FKM) clustering algorithms. The main scheme was divided into two steps. First, a clustering map was obtained from the original remote sensing image using FKM; simultaneously, a normalized difference vegetation index layer was extracted from the original image. Then, the classification map was generated by using an SVM classifier. Three different classification algorithms were compared, tested, and verified: parametric (maximum likelihood), nonparametric (SVM), and hybrid (unsupervised-supervised, fusion of SVM and FKM) classifiers, respectively. The proposed algorithm obtained the highest overall accuracy in our experiments.

He, Tao; Sun, Yu-Jun; Xu, Ji-De; Wang, Xue-Jun; Hu, Chang-Ru

2014-01-01

209

Purpose The objective of our study was to analyze the differences between apparent diffusion coefficient (ADC) partitions (created using the K-Means algorithm) between benign and malignant neck lesions and evaluate its benefit in distinguishing these entities. Material and methods MRI studies of 10 benign and 10 malignant proven neck pathologies were post-processed on a PC using in-house software developed in MATLAB (The MathWorks, Inc., Natick, MA). Lesions were manually contoured by two neuroradiologists with the ADC values within each lesion clustered into two (low ADC-ADCL, high ADC-ADCH) and three partitions (ADCL, intermediate ADC-ADCI, ADCH) using the K-Means clustering algorithm. An unpaired two-tailed Student’s t-test was performed for all metrics to determine statistical differences in the means between the benign and malignant pathologies. Results Statistically significant difference between the mean ADCL clusters in benign and malignant pathologies was seen in the 3 cluster models of both readers (p=0.03, 0.022 respectively) and the 2 cluster model of reader 2 (p=0.04) with the other metrics (ADCH, ADCI, whole lesion mean ADC) not revealing any significant differences. Receiver operating characteristics curves demonstrated the quantitative difference in mean ADCH and ADCL in both the 2 and 3 cluster models to be predictive of malignancy (2 clusters: p=0.008, area under curve=0.850, 3 clusters: p=0.01, area under curve=0.825). Conclusion The K-Means clustering algorithm that generates partitions of large datasets may provide a better characterization of neck pathologies and may be of additional benefit in distinguishing benign and malignant neck pathologies compared to whole lesion mean ADC alone. PMID:20007723

Srinivasan, A.; Galbán, C.J.; Johnson, T.D.; Chenevert, T.L.; Ross, B.D.; Mukherji, S.K.

2014-01-01

210

Using the data mining methods of association rules and clustering, 188 prescriptions for cough written by Yan Zhenghua were collected and analyzed to obtain the frequency of drug usage and the relationships between drugs, from which Yan Zhenghua's experience in treating cough could be summarized. The analysis mined 20 core combinations, such as Bambusae Caulis in Taenias-Almond-Sactmarsh Aster, and found 10 new prescriptions, such as Sactmarsh Aster-Scutellariae Radix-Album Viscum-Bambusae Caulis in Taenian-Eriobotryae Folium. The results showed that Yan Zhenghua was good at curing cough with traditional Chinese medicines that dispel wind and heat from the body and remove heat from the lung to relieve cough. PMID:25204134

Wu, Jia-Rui; Guo, Wei-Xian; Zhang, Xiao-Meng; Yang, Bing; Zhang, Bing

2014-02-01

211

Modified Fuzzy C-Mean in Medical Image Segmentation

research requires quantitative information, such as the size of the brain ventricles after a traumatic brain injury or the relative volume of ventricles to brain. It is important to have a faithful tool of brain images. We propose a fully automatic technique to obtain image clusters. A modified fuzzy c

Louisville, University of

212

Modified Fuzzy C-Mean in Medical Image Segmentation

after a traumatic brain injury or the relative volume of ventricles to brain. It is important to have- tion of brain images. We propose a fully automatic technique to obtain image clusters. A modified fuzzy by radiologists. Advanced research requires quantitative information, such as the size of the brain ventricles

Farag, Aly A.

213

NASA Astrophysics Data System (ADS)

Porosity, the void portion of reservoir rocks, determines the volume of hydrocarbon accumulation and has a great control on assessment and development of hydrocarbon reservoirs. Accurate determination of porosity from core analysis is highly cost, time, and labor intensive. Therefore, finding an accurate, fast and cheap way of determining porosity is essential. On the other hand, conventional well log data, available in almost all wells, contain invaluable implicit information about the porosity. Therefore, an intelligent system can explicate this information. Fuzzy logic is a powerful tool for handling geoscience problems that are associated with uncertainty. However, determination of the best fuzzy formulation is still an issue. This study proposes an improved strategy, called the hybrid genetic algorithm-pattern search (GA-PS) technique, as an alternative to the widely used subtractive clustering (SC) method for setting up fuzzy rules between core porosity and petrophysical logs. The hybrid GA-PS technique is capable of extracting optimal parameters for fuzzy clusters (membership functions), which consequently results in the best fuzzy formulation. Results indicate that the GA-PS technique manipulates both the mean and variance of Gaussian membership functions, contrary to SC, which only controls the mean of Gaussian membership functions. A comparison between the hybrid GA-PS technique and the SC method confirmed the superiority of the GA-PS technique in setting up fuzzy rules. The proposed strategy was successfully applied to one of the Iranian carbonate reservoir rocks.

Bagheripour, Parisa; Asoodeh, Mojtaba

2013-12-01

214

NASA Technical Reports Server (NTRS)

An algorithm is proposed for dimensionality reduction in the context of clustering techniques based on histogram analysis. The approach is based on an evaluation of the hills and valleys in the unidimensional histograms along the different features and provides an economical means of assessing the significance of the features in a nonparametric unsupervised data environment. The method has relevance to remote sensing applications.

Dasarathy, B. V.

1976-01-01

215

NASA Astrophysics Data System (ADS)

The study of atomic clusters has become an increasingly active area of research in recent years because of the fundamental interest in studying a completely new area that can bridge the gap between atomic and solid state physics. Due to their specific properties, such compounds are of great interest in the field of nanotechnology [1,2]. Here, we present our GSAM algorithm, based on a DFT exploration of the PES, to find the low-lying isomers of such compounds. This algorithm includes the generation of an initial set of structures from which the most relevant are selected. Moreover, an optimization process, called raking optimization, able to discard step by step all the physically unreasonable configurations, has been implemented to reduce the computational cost of this algorithm. Structural properties of Ga_nAs_m clusters will be presented as an illustration of the method.

Marchal, Rémi; Carbonnière, Philippe; Pouchan, Claude

2015-01-01

216

NASA Technical Reports Server (NTRS)

Learning of discriminant hyperplanes in imperfectly supervised or unsupervised training sample sets with unreliably labeled samples along the fuzzy joint boundaries between sample clusters is discussed, with the discriminant hyperplane designed to be a least-squares fit to the unreliably labeled data points. (Samples along the fuzzy boundary jump back and forth from one cluster to the other in recursive cluster stabilization and are considered unreliably labeled.) Minimization of the distances of these unreliably labeled samples from the hyperplanes does not sacrifice the ability to discriminate between classes represented by reliably labeled subsets of samples. An equivalent unconstrained linear inequality problem is formulated and algorithms for its solution are indicated. Landsat earth sensing data were used in confirming the validity and computational feasibility of the approach, which should be useful in deriving discriminant hyperplanes separating clusters with fuzzy boundaries, given supervised training sample sets with unreliably labeled boundary samples.

Dasarathy, B. V.

1976-01-01

217

NASA Astrophysics Data System (ADS)

Accurate measurements of human body fat distribution are desirable because excessive body fat is associated with impaired insulin sensitivity, type 2 diabetes mellitus (T2DM) and cardiovascular disease. In this study, we hypothesized that the performance of water suppressed (WS) MRI is superior to non-water suppressed (NWS) MRI for volumetric assessment of abdominal subcutaneous (SAT), intramuscular (IMAT), visceral (VAT), and total (TAT) adipose tissues. We acquired T1-weighted images on a 3T MRI system (TIM Trio, Siemens), which were analyzed using semi-automated segmentation software that employs a fuzzy c-means (FCM) clustering algorithm. Sixteen contiguous axial slices, centered at the L4-L5 level of the abdomen, were acquired in eight T2DM subjects with water suppression (WS) and without (NWS). Histograms from WS images show improved separation of non-fatty tissue pixels from fatty tissue pixels, compared to NWS images. Paired t-tests of WS versus NWS showed a statistically significant lower volume of lipid in the WS images for VAT (145.3 cc less, p=0.006) and IMAT (305 cc less, p<0.001), but not SAT (14.1 cc more, NS). WS measurements of TAT also resulted in lower fat volumes (436.1 cc less, p=0.002). There is strong correlation between WS and NWS quantification methods for SAT measurements (r=0.999), but poorer correlation for VAT studies (r=0.845). These results suggest that NWS pulse sequences may overestimate adipose tissue volumes and that WS pulse sequences are more desirable due to the higher contrast generated between fatty and non-fatty tissues.

Valaparla, Sunil K.; Peng, Qi; Gao, Feng; Clarke, Geoffrey D.

2014-03-01

218

Possibilistic clustering for shape recognition

NASA Technical Reports Server (NTRS)

Clustering methods have been used extensively in computer vision and pattern recognition. Fuzzy clustering has been shown to be advantageous over crisp (or traditional) clustering in that total commitment of a vector to a given class is not required at each iteration. Recently fuzzy clustering methods have shown spectacular ability to detect not only hypervolume clusters, but also clusters which are actually 'thin shells', i.e., curves and surfaces. Most analytic fuzzy clustering approaches are derived from Bezdek's Fuzzy C-Means (FCM) algorithm. The FCM uses the probabilistic constraint that the memberships of a data point across classes sum to one. This constraint was used to generate the membership update equations for an iterative algorithm. Unfortunately, the memberships resulting from FCM and its derivatives do not correspond to the intuitive concept of degree of belonging, and moreover, the algorithms have considerable trouble in noisy environments. Recently, we cast the clustering problem into the framework of possibility theory. Our approach was radically different from the existing clustering methods in that the resulting partition of the data can be interpreted as a possibilistic partition, and the membership values may be interpreted as degrees of possibility of the points belonging to the classes. We constructed an appropriate objective function whose minimum will characterize a good possibilistic partition of the data, and we derived the membership and prototype update equations from necessary conditions for minimization of our criterion function. In this paper, we show the ability of this approach to detect linear and quartic curves in the presence of considerable noise.

Keller, James M.; Krishnapuram, Raghu

1992-01-01
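
As a rough illustration of the contrast drawn in the abstract above, the sketch below compares FCM memberships, which must sum to one across classes, with possibilistic memberships for a single data point. The function names, the `etas` scale parameters, and the particular possibilistic formula (the commonly cited form 1 / (1 + (d^2 / eta)^(1/(m-1)))) are assumptions made for illustration; the abstract itself does not state the update equations.

```python
def fcm_memberships(sq_dists, m=2.0):
    """FCM memberships of one point, given its squared distances to each prototype.

    Assumes every squared distance is strictly positive and m > 1.
    """
    exponent = 1.0 / (m - 1.0)
    inverse = [(1.0 / d) ** exponent for d in sq_dists]
    total = sum(inverse)
    # Normalising enforces the probabilistic constraint: memberships sum to one.
    return [v / total for v in inverse]


def possibilistic_memberships(sq_dists, etas, m=2.0):
    """Possibilistic memberships (assumed form); each class is scored independently.

    etas are hypothetical per-class scale parameters, not values from the source.
    """
    exponent = 1.0 / (m - 1.0)
    return [1.0 / (1.0 + (d / eta) ** exponent) for d, eta in zip(sq_dists, etas)]


# A point that is far from both prototypes, as a noise point would be.
far_point = [100.0, 100.0]
fcm = fcm_memberships(far_point)
poss = possibilistic_memberships(far_point, etas=[1.0, 1.0])
```

For a point equally far from both prototypes, the sum-to-one constraint forces the FCM memberships to 0.5 each, whereas the possibilistic values both stay close to zero, which is closer to the intuitive notion of degree of belonging.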

219

This work presents a detailed framework to detect the location of heart sound within the respiratory sound based on a temporal fuzzy c-means (TFCM) algorithm. In the proposed method, respiratory sound is first divided into frames and for each frame, the logarithmic energy features are calculated. Then, these features are used to classify the respiratory sound as heart sound (HS containing lung sound) and non-HS (only lung sound) by the TFCM algorithm. The TFCM is a modified version of the fuzzy c-means (FCM) algorithm. While the FCM algorithm uses only the local information about the current frame, the TFCM algorithm uses the temporal information from both the current and the neighboring frames in decision making. To measure the detection performance of the proposed method, several experiments have been conducted on a database of 24 healthy subjects. The experimental results show that the average false-negative rate values are 0.8 ± 1.1 and 1.5 ± 1.4 %, and the normalized area under detection error curves are [Formula: see text] and [Formula: see text] for the TFCM method in the low and medium respiratory flow rates, respectively. These average values are significantly lower than those obtained by the FCM algorithm and by the other compared methods in the literature, which demonstrates the efficiency of the proposed TFCM algorithm. On the other hand, the average elapsed time of the TFCM for data with a length of [Formula: see text] s is 0.2 ± 0.05 s, which is slightly higher than that of the FCM and lower than those of the other compared methods. PMID:25326867

Shamsi, Hamed; Ozbek, I Yucel

2015-01-01

220

Introducing genetic algorithms as a reliable and efficient tool to find ordered equilibrium structures, we predict minimum energy configurations of the square shoulder system for different values of corona width $\lambda$. Varying systematically the pressure for different values of $\lambda$ we obtain complete sequences of minimum energy configurations which provide a deeper understanding of the system's strategies to arrange particles in an energetically optimized fashion, leading to the competing self-assembly scenarios of cluster-formation vs. lane-formation.

Julia Fornleitner; Gerhard Kahl

2007-09-03

221

Performance evaluation of Allgather algorithms on terascale Linux cluster with fast Ethernet

We report our work on evaluating performance of several MPI Allgather algorithms on fast Ethernet. These algorithms are ring, recursive doubling, Bruck, and neighbor exchange. The first three algorithms are widely used today. The neighbor exchange algorithm, which was proposed by the authors, incorporates pair-wise exchange, and is expected to perform better with certain configurations, mainly when using TCP/IP over

Jing Chen; Linbo Zhang; Yunquan Zhang; Wei Yuan

2005-01-01
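
The ring variant named in the abstract above can be sketched as a small single-process simulation. This is not MPI code and not the authors' implementation; `ring_allgather` and its buffer layout are hypothetical names chosen for illustration. Each simulated rank forwards, at every step, the block it received in the previous step to its right-hand neighbor, so all blocks reach all ranks after p - 1 steps.

```python
def ring_allgather(local_blocks):
    """Simulate a ring allgather over p ranks; returns each rank's final buffer."""
    p = len(local_blocks)
    # buffers[r][i] holds the block that originated on rank i (None until received).
    buffers = [[None] * p for _ in range(p)]
    for r in range(p):
        buffers[r][r] = local_blocks[r]

    for step in range(p - 1):
        # Collect all messages first so that the sends of one step are simultaneous.
        messages = []
        for r in range(p):
            idx = (r - step) % p  # block received in the previous step (own block at step 0)
            messages.append(((r + 1) % p, idx, buffers[r][idx]))
        for dest, idx, block in messages:
            buffers[dest][idx] = block
    return buffers


result = ring_allgather(["a", "b", "c", "d"])
```

Every rank sends and receives exactly one block per step, which is why the ring needs p - 1 steps regardless of message size.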

222

San Diego Supercomputer Center's Weizhong Li on "Effective Analysis of NGS Metagenomic Data with Ultra-fast Clustering Algorithms" at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.

Li, Weizhong [San Diego Supercomputer Center]

2011-10-12

223

MR image denoising using nonlinear regression and Fuzzy C-Means clustering

Magnetic Resonance (MR) imaging is useful for medical diagnosis. However, MR images are often corrupted by Rician noise, leading to undesirable visual quality. Based on the fact that many images can be acquired at nearly the same location, this paper proposes a novel learning method for the reduction of Rician noise using nonlinear ridge regression with a training set established

Dinh Hoan Trinh; Marie Luong; Jean-Marie Rocchisani; Canh Duong Pham; Francoise Dibos; Linh-Trung Nguyen

2011-01-01

224

BP network identification technology of infrared polarization based on fuzzy c-means clustering

Infrared detection system is frequently employed on surveillance operations and reconnaissance mission to detect particular targets of interest in both civilian and military communities. By incorporating the polarization of light as supplementary information, the target discrimination performance could be enhanced. So this paper proposed an infrared target identification method which is based on fuzzy theory and neural network with polarization

Haifang Zeng; Guohua Gu; Weiji He; Qian Chen; Wei Yang

2011-01-01

225

Understanding the compressive strength of concrete is important for activities like construction arrangement, prestressing operations, proportioning new mixtures, and quality assurance. Regression techniques are most widely used for prediction tasks in which the relationship between the independent variables and the dependent (prediction) variable is identified. The accuracy of regression techniques for prediction can be improved if clustering is used along with regression, because clustering ensures more accurate curve fitting between the dependent and independent variables. In this work, a cluster regression technique is applied to estimate the compressive strength of concrete, and a novel approach is proposed for predicting the concrete compressive strength. The objective of this work is to demonstrate that clustering along with regression yields fewer prediction errors when estimating the concrete compressive strength. The proposed technique consists of two major stages: in the first stage, clustering is used to group concrete data with similar characteristics, and in the second stage, regression techniques are applied over these clusters (groups) to predict the compressive strength from individual clusters. Experiments show that clustering along with regression techniques gives minimum errors for predicting the compressive strength of concrete; in addition, the fuzzy C-means clustering algorithm performs better than the K-means algorithm. PMID:25374939

Nagwani, Naresh Kumar; Deo, Shirish V.

2014-01-01
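
The two-stage idea described in the abstract above (cluster first, then fit a regression per cluster) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' method: it uses plain K-means on a single scalar feature for brevity rather than fuzzy C-means, and the function names, toy data, and initial centers are all hypothetical.

```python
def nearest(x, centers):
    """Index of the center closest to x."""
    return min(range(len(centers)), key=lambda i: abs(x - centers[i]))


def kmeans_1d(xs, centers, iters=20):
    """Plain k-means on scalar features, starting from the given centers."""
    centers = list(centers)
    for _ in range(iters):
        groups = [[] for _ in centers]
        for x in xs:
            groups[nearest(x, centers)].append(x)
        # An empty cluster keeps its previous center.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers


def fit_line(points):
    """Least-squares (slope, intercept); assumes at least two distinct x values."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def cluster_then_regress(data, init_centers):
    """Stage 1: cluster on x. Stage 2: fit one line per cluster."""
    centers = kmeans_1d([x for x, _ in data], init_centers)
    models = [fit_line([(x, y) for x, y in data if nearest(x, centers) == i])
              for i in range(len(centers))]
    return centers, models


def predict(x, centers, models):
    """Predict with the line belonging to the cluster nearest to x."""
    slope, intercept = models[nearest(x, centers)]
    return slope * x + intercept


# Hypothetical toy data with two regimes: y = 2x for small x, y = 50 - x for large x.
data = [(1, 2), (2, 4), (3, 6), (4, 8), (10, 40), (11, 39), (12, 38), (13, 37)]
centers, models = cluster_then_regress(data, init_centers=[0.0, 15.0])
```

On this toy data a single global line would fit neither regime well, whereas each per-cluster line recovers its regime, which is the effect the abstract attributes to combining clustering with regression.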

227

We report our application of a recently published simulated annealing algorithm which we call “Boltzmann simplex”-simulated annealing (BSSA) to global optimizations of argon and water clusters. The Lennard–Jones model of argon clusters serves as a challenging benchmark for global optimization methods, and we use it as a test case. We find that the BSSA method is most useful when followed

Francis M. Torres; Eugene Agichtein; Leonid Grinberg; Guowei Yu; Robert Q. Topper

1997-01-01

228

During dynamic susceptibility contrast-magnetic resonance imaging (DSC-MRI), it has been demonstrated that the arterial input function (AIF) can be obtained using fuzzy c-means (FCM) and k-means clustering methods. However, due to the dependence on the initial centers of clusters, both clustering methods have poor reproducibility between the calculation and recalculation steps. To address this problem, the present study developed an alternative clustering technique based on the agglomerative hierarchy (AH) method for AIF determination. The performance of AH method was evaluated using simulated data and clinical data based on comparisons with the two previously demonstrated clustering-based methods in terms of the detection accuracy, calculation reproducibility, and computational complexity. The statistical analysis demonstrated that, at the cost of a significantly longer execution time, AH method obtained AIFs more in line with the expected AIF, and it was perfectly reproducible at different time points. In our opinion, the disadvantage of AH method in terms of the execution time can be alleviated by introducing a professional high-performance workstation. The findings of this study support the feasibility of using AH clustering method for detecting the AIF automatically. PMID:24932638

Yin, Jiandong; Yang, Jiawen; Guo, Qiyong

2014-01-01
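
The reproducibility argument in the abstract above rests on agglomerative clustering having no random or user-chosen initial centers. The sketch below illustrates this with average-linkage agglomeration on scalar values; the linkage choice, the function names, and the toy values are assumptions for illustration (the study clusters concentration-time curves, and its linkage criterion is not stated in the abstract).

```python
from itertools import combinations


def average_linkage(a, b):
    """Mean pairwise distance between two clusters of scalars."""
    return sum(abs(x - y) for x in a for y in b) / (len(a) * len(b))


def agglomerate(values, n_clusters):
    """Merge the two closest clusters until n_clusters remain."""
    clusters = [[v] for v in values]
    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: average_linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    # Sort so that partitions can be compared independently of merge order.
    return sorted(sorted(c) for c in clusters)


first_run = agglomerate([0, 1, 10, 11, 22, 23], n_clusters=2)
second_run = agglomerate([23, 10, 0, 22, 11, 1], n_clusters=2)  # same data, different order
```

Both runs produce the same partition because the merge sequence is determined by the data alone. The nested pairwise search also hints at the cost noted in the abstract: each merge step scans all cluster pairs, which is slower than one k-means or FCM update.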

229

We propose a model-based approach for clustering time series regression data in an unsupervised machine learning framework to identify groups under the assumption that each mixture component follows a Gaussian autoregressive regression model of order p. Given the number of groups, the traditional maximum likelihood approach of estimating the parameters using the expectation-maximization (EM) algorithm can be employed, although it is computationally demanding. The somewhat fast tune to the EM folk song provided by the Alternating Expectation Conditional Maximization (AECM) algorithm can alleviate the problem to some extent. In this article, we develop an alternative partial expectation conditional maximization algorithm (APECM) that uses an additional data augmentation storage step to efficiently implement AECM for finite mixture models. Results on our simulation experiments show improved performance in both fewer numbers of iterations and computation time. The methodology is applied to the problem of clustering mutual funds data on the basis of their average annual per cent returns and in the presence of economic indicators.

Chen, Wei-Chen [ORNL; Maitra, Ranjan [Iowa State University

2011-01-01

230

Clustering of sensor nodes has been shown to be an effective approach for distributed collaborative information processing in resource constrained wireless sensor networks to keep network traffic local in order to reduce energy dissipation of long-distance transmissions. Defining the range and topology of clusters to reduce energy consumption and retransmissions due to collisions on shared radio channels is an ongoing

Chia-Yen Shih; Stephen F. Jenks

2007-01-01

231

CHEATS: A cluster-head election algorithm for WSN using a Takagi-Sugeno fuzzy system

Energy conservation is a very important task for wireless sensor networks (WSN), since external power sources are typically unavailable and the replacement of batteries is clearly impractical for large networks. A classical protocol widely used in WSN, called LEACH (low-energy adaptive clustering hierarchy), is a cluster-based protocol which aims at reducing energy consumption in the network. However, it has some drawbacks.

Adonias Pires; Claudio Silva; Eduardo Cerqueira; Dionne Monteiro; Raimundo Viegas

2011-01-01

232

Evolutionary and Gradient-Based Algorithms for Lennard-Jones Cluster Optimization

optimization strategies as the number of local minima grows exponentially with the number of atoms N in the cluster [14]. For N = 98, there exist an estimated number of local minima ... configuration of atomic clusters modeled by the Lennard-Jones potential poses a challenging task to numerical

Schraudolph, Nicol N.

233

A neural network clustering algorithm for the ATLAS silicon pixel detector

A novel technique to identify and split clusters created by multiple charged particles in the ATLAS pixel detector using a set of artificial neural networks is presented. Such merged clusters are a common feature of tracks originating from highly energetic objects, such as jets. Neural networks are trained using Monte Carlo samples produced with a detailed detector simulation. This technique replaces the former clustering approach based on a connected component analysis and charge interpolation. The performance of the neural network splitting technique is quantified using data from proton--proton collisions at the LHC collected by the ATLAS detector in 2011 and from Monte Carlo simulations. This technique reduces the number of clusters shared between tracks in highly energetic jets by up to a factor of three. It also provides more precise position and error estimates of the clusters in both the transverse and longitudinal impact parameter resolution.

ATLAS collaboration

2014-06-30

234

NASA Astrophysics Data System (ADS)

Using a genetic algorithm followed by local optimization with density functional theory, the lowest-energy structures of Ag clusters in a size range of n=3-22 were studied. The Ag (n=9-16) clusters prefer compact structures of flat shape, while the Ag (n=19,21,22) clusters adopt amorphous packing based on a 13-atom icosahedral core. For Ag, two competitive candidates for the lowest-energy structures, namely a hollow-cage structure and close-packed structures of flat shape, were found. Two competing candidates were found for Ag and Ag: hollow-cage structures versus icosahedron-based compact structures. The lowest-energy structure of Ag is a highly symmetric tetrahedron with Td symmetry. These results are significantly different from those predicted in earlier works using empirical methods. The ionization potentials and electron affinities for the lowest-energy structures of Ag (n=3-22) clusters were computed and compared with experimental values.

Tian, Dongxu; Zhang, Hualei; Zhao, Jijun

2007-10-01

235

Wireless sensor networks (WSNs) have emerged as a promising solution for various applications due to their low cost and easy deployment. Because they are typically battery powered, their limited power capacity makes extending network lifetime a challenge. Many hierarchical protocols in the literature show better energy efficiency. In addition, data reduction based on the correlation of sensed readings can efficiently reduce the number of required transmissions. Therefore, we use a sub-clustering procedure based on spatial data correlation to further divide the hierarchical (clustered) architecture of a WSN. The proposed algorithm (2TC-cor) is composed of two procedures: the prediction model construction procedure and the sub-clustering procedure. Energy is conserved through the reduced transmissions, which depend on the prediction model. Energy can be further conserved through the representative mechanism of sub-clustering. Simulation results show that 2TC-cor can effectively conserve energy while monitoring the environment with an acceptable level of accuracy. PMID:25412220

Tsai, Ming-Hui; Huang, Yueh-Min

2014-01-01

236

of hyperspectral images [2, 4], often used by agencies such as NASA to analyze the surface of the Earth (see Fig. 1-formation to multidimensional images. The proposed technique has been implemented on Thunderhead, a Beowulf cluster at NASA

Plaza, Antonio J.

237

FAST AND EFFICIENT MODEL-BASED CLUSTERING WITH THE ASCENT-EM ALGORITHM

iterations for computational efficiency, the algorithm increases the sample size intelligently towards the end of the algorithm to assure maximum accuracy of the results. The intelligent sample size updating the sample size is increased successively. However, determining the exact amount by which the sample size

Jank, Wolfgang

238

The most important factors that prevent pattern recognition from functioning rapidly and effectively are the noisy and inconsistent data in databases. This article presents a new data preparation method based on clustering algorithms for diagnosis of heart and diabetes diseases. In this method, a new modified K-means algorithm is used in a clustering-based data preparation system to eliminate noisy and inconsistent data, and Support Vector Machines are used for classification. This newly developed approach was tested in the diagnosis of heart diseases and diabetes, which are prevalent within society and figure among the leading causes of death. The data sets used in the diagnosis of these diseases are the Statlog (Heart), the SPECT images and the Pima Indians Diabetes data sets obtained from the UCI database. The proposed system achieved 97.87 %, 98.18 %, 96.71 % classification success rates from these data sets. Classification accuracies for these data sets were obtained using the 10-fold cross-validation method. According to the results, the performance of the proposed method is highly successful compared to other results attained, and seems very promising for pattern recognition applications. PMID:24737307

Yilmaz, Nihat; Inan, Onur; Uzer, Mustafa Serter

2014-05-01

239

NASA Astrophysics Data System (ADS)

This study explores the applicability of data-driven clustering analysis in predicting vegetation distribution over two continents where water is an important controlling factor for vegetation growth, South America and Africa, and compares the ability of clustering analysis with that of a physically based dynamic vegetation model to predict vegetation distribution. A clustering analysis algorithm based on the genetic-algorithm-based K-means is tested, with the number of clusters determined a priori according to the primary plant functional types observed to exist in the study domain. The most important variables upon which the clustering analysis is based include available water, its seasonality, and evaporative demand. The dynamic vegetation model used is the Community Land Model version 3 coupled with a Dynamic Global Vegetation Model (CLM3-DGVM) with modifications targeted to address some known biases of the model. Results from both the clustering analysis and the modified CLM3-DGVM are compared against observations derived from the Moderate Resolution Imaging Spectroradiometer (MODIS). Both methods reasonably reproduced the general pattern of dominant plant functional type distribution. There is no clear winner between the two methods, as the DGVM outperforms the clustering analysis approach in some aspects and is outperformed in others. It is therefore suggested that clustering analysis can be a useful tool in biogeography estimation, although it cannot be used in mechanistic studies as the process-based DGVMs are.

Sun, Xiaoming; Wang, Guiling

2008-09-01

240

The ligand migration network for O2-diffusion in truncated Hemoglobin N is analyzed based on three different clustering schemes. For coordinate-based clustering, the conventional k-means and the kinetics-based Markov Clustering (MCL) methods are employed, whereas the locally scaled diffusion map (LSDMap) method is a collective-variable-based approach. It is found that all three methods agree well in their geometrical definition of the most important docking site, and all experimentally known docking sites are recovered by all three methods. Also, for most of the states, their population coincides quite favourably, whereas the kinetics of and between the states differs. One of the major differences between k-means and MCL clustering on the one hand and LSDMap on the other is that the latter finds one large primary cluster containing the Xe1a, IS1, and ENT states. This is related to the fact that the motion within the state occurs on similar time scales, whereas structurally the state is found to be quite diverse. In agreement with previous explicit atomistic simulations, the Xe3 pocket is found to be a highly dynamical site which points to its potential role as a hub in the network. This is also highlighted in the fact that LSDMap cannot identify this state. First passage time distributions from MCL clusterings using a one- (ligand-position) and two-dimensional (ligand-position and protein-structure) descriptor suggest that ligand- and protein-motions are coupled. The benefits and drawbacks of the three methods are discussed in a comparative fashion and highlight that depending on the questions at hand the best-performing method for a particular data set may differ. PMID:25591387

Cazade, Pierre-André; Zheng, Wenwei; Prada-Gracia, Diego; Berezovska, Ganna; Rao, Francesco; Clementi, Cecilia; Meuwly, Markus

2015-01-14

241

Risk Mapping of Cutaneous Leishmaniasis via a Fuzzy C Means-based Neuro-Fuzzy Inference System

NASA Astrophysics Data System (ADS)

Identifying pathogenic factors and how they spread in the environment has recently become a global need. Cutaneous Leishmaniasis (CL), caused by Leishmania, is a vector-borne parasitic disease that can be transmitted to humans by the Phlebotomus vector. Studies show that economic conditions, cultural issues, and environmental and ecological conditions can affect the prevalence of this disease. In this study, data mining is used to predict the CL prevalence rate and to obtain a risk map. The prediction is based on the environmental parameters that affect CL and uses a Neuro-Fuzzy system. The learning capacity of neural networks on the one hand and the reasoning power of fuzzy systems on the other make Neuro-Fuzzy systems very efficient to use. To predict the CL prevalence rate, an adaptive Neuro-Fuzzy inference system was applied, with fuzzy C Means clustering used to determine the initial membership functions. Because of the high incidence of CL in Ilam province, the counties of Ilam, Mehran, and Dehloran were examined and evaluated. The CL prevalence rate was predicted in 2012 from maps of the relevant environmental and topographic properties, including temperature, moisture, annual rainfall, vegetation and elevation. Results indicate that the model with the fuzzy C Means clustering structure yields acceptable RMSE values for both training and checking data, which supports our analyses. Using the proposed data mining approach, the spatial distribution pattern of the disease and the vulnerable areas become identifiable, and the map can be used by public health experts and decision makers as a useful tool in management and optimal decision-making.

Akhavan, P.; Karimi, M.; Pahlavani, P.

2014-10-01

242

An Event-Driven Algorithm for Fractal Cluster S. Gonzalez, A. R. Thornton, S. Luding

. In contrast to the case of diffusion-limited aggregation (DLA), where df = 1.71 is found [7], we keep track dimensional gas: particles move freely until they collide and "stick" together irreversibly. These clusters aggregate into bigger structures in an isotropic way, forming fractal structures whose fractal dimension

Luding, Stefan

243

Information Searching and Exploring on WWW: Applying Clustering and Genetic Algorithm

features like Boolean controls or structure queries (which are advanced searching tips or options word-based search engines are processed using clustering techniques and are summarized to extract information that has not yet been discovered. References [1] Chang, C.H. and Hsu, C.C. ``Multi-Engine Search

Chang, Chia-Hui

244

Sensitivity evaluation of dynamic speckle activity measurements using clustering methods.

We evaluate and compare the use of competitive neural networks, self-organizing maps, the expectation-maximization algorithm, K-means, and fuzzy C-means techniques as partitional clustering methods, when the sensitivity of the activity measurement of dynamic speckle images needs to be improved. The temporal history of the acquired intensity generated by each pixel is analyzed in a wavelet decomposition framework, and it is shown that the mean energy of its corresponding wavelet coefficients provides a suitable feature space for clustering purposes. The sensitivity obtained by using the evaluated clustering techniques is also compared with the well-known methods of Konishi-Fujii, weighted generalized differences, and wavelet entropy. The performance of the partitional clustering approach is evaluated using simulated dynamic speckle patterns and also experimental data. PMID:20648142

Etchepareborda, Pablo; Federico, Alejandro; Kaufmann, Guillermo H

2010-07-01

245

The Implementation of Regional Atmospheric Model Numerical Algorithms for CBEA-Based Clusters

Regional atmospheric models are important tools for short-range weather predictions and future climate change assessment. The further enhancement of spatial resolution and development of physical parameterizations in these models need the effective implementation of the program code on multiprocessor systems. However, nowadays typical cluster systems tend to grow into very huge machines with over petaflop performance, while individual computing node

Dmitry Mikushin; Victor Stepanenko

2009-01-01

246

DWT-CEM: an algorithm for scale-temporal clustering in fMRI

The number of studies using functional magnetic resonance imaging (fMRI) has grown very rapidly since the first description of the technique in the early 1990s. Most published studies have utilized data analysis methods based on voxel-wise application of general linear models (GLM). On the other hand, temporal clustering analysis (TCA) focuses on the identification of relationships between cortical areas by

João Ricardo Sato; André Fujita; Edson Amaro Jr.; Janaina Mourão Miranda; Pedro Alberto Morettin; Michal John Brammer

2007-01-01

247

Clustering algorithm using space filling curves for the classification of high energy physics data

According to a space filling curve, distances between points in a multidimensional space are replaced by distances along a Lebesgue measure-preserving curve. By using a neighbouring approach on the space filling curve, several clusters may emerge from data and configurations may be associated to the considered k-jets events classes and corresponding to the processes e+e− → Z/γ, W+W−, ZH, ZZ → k

Mostafa Mjahed

2003-01-01

248

An algorithm for identifying clusters of functionally related genes in genomes

to occur in Caenorhabditis elegans and share many similarities with their prokaryotic counterparts. Fungi also contain metabolic pathway clusters though their structure differs considerably from operons in C. elegans (Blumenthal [3], Zorio [4] and Spieth [5]). Some.... elegans, Thomas [16] showed that clusters of homologous genes tend to be formed of species specific gene families that play roles in detoxification and immunity, and are found in chromosomal regions that undergo rapid evolution and reorganization. Further...

Yi, Gang Man

2009-05-15

249

The prescriptions for phlegm retention syndrome formulated by Ma Peizhi were analyzed using association rules and a clustering algorithm to obtain the frequency of drug usage and the relationships between drugs, and from these to summarize the experience of Ma Peizhi of the Menghe medical genre in treating phlegm retention syndrome. The analysis identified 18 core combinations, such as Citri Exocarpium Rubrum-Eriobotryae Folium-Citri Reticulatae Pericarpium, and 9 new prescriptions, such as Aurantii Fructus-Citri Exocarpium Rubrum-Eriobotryae Folium-Citri Reticulatae Pericarpium. The results indicate that Ma Peizhi of the Menghe medical genre was skilled at treating phlegm retention syndrome with mild and light traditional Chinese medicines, such as medicines for mild tonification and for clearing damp and promoting diuresis. PMID:25204136

Wu, Jia-Rui; Guo, Wei-Xian; Zhang, Xiao-Meng; Huang, Xiu-Qin; Yang, Bing

2014-02-01

250

Purpose: To prevent low bone mineral density (BMD), that is, osteoporosis, in postmenopausal women, it is essential to diagnose osteoporosis more precisely. This study presented an automatic approach utilizing a histogram-based automatic clustering (HAC) algorithm with a support vector machine (SVM) to analyse dental panoramic radiographs (DPRs) and thus improve diagnostic accuracy by identifying postmenopausal women with low BMD or osteoporosis. Materials and Methods: We integrated our newly-proposed histogram-based automatic clustering (HAC) algorithm with our previously-designed computer-aided diagnosis system. The extracted moment-based features (mean, variance, skewness, and kurtosis) of the mandibular cortical width for the radial basis function (RBF) SVM classifier were employed. We also compared the diagnostic efficacy of the SVM model with the back propagation (BP) neural network model. In this study, DPRs and BMD measurements of 100 postmenopausal women patients (aged >50 years), with no previous record of osteoporosis, were randomly selected for inclusion. Results: The accuracy, sensitivity, and specificity of the BMD measurements using our HAC-SVM model to identify women with low BMD were 93.0% (88.0%-98.0%), 95.8% (91.9%-99.7%) and 86.6% (79.9%-93.3%), respectively, at the lumbar spine; and 89.0% (82.9%-95.1%), 96.0% (92.2%-99.8%) and 84.0% (76.8%-91.2%), respectively, at the femoral neck. Conclusion: Our experimental results predict that the proposed HAC-SVM model combination applied on DPRs could be useful to assist dentists in early diagnosis and help to reduce the morbidity and mortality associated with low BMD and osteoporosis. PMID:24083208

Kavitha, Muthu Subash; Asano, Akira; Taguchi, Akira

2013-01-01
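
The four moment-based features named in the abstract above (mean, variance, skewness, kurtosis) can be computed as in the sketch below. It assumes population (divide-by-n) moments and non-excess kurtosis; the abstract does not say which normalisation the authors used, and the function name and sample values are hypothetical.

```python
def moment_features(samples):
    """Return (mean, variance, skewness, kurtosis) using population moments.

    Assumes the samples are not all identical, so the variance is non-zero.
    """
    n = len(samples)
    mean = sum(samples) / n

    def central(k):
        # k-th central moment, normalised by n.
        return sum((x - mean) ** k for x in samples) / n

    variance = central(2)
    skewness = central(3) / variance ** 1.5
    kurtosis = central(4) / variance ** 2  # not excess kurtosis (no "- 3")
    return mean, variance, skewness, kurtosis


# Hypothetical width measurements, purely for illustration.
features = moment_features([1.0, 2.0, 3.0, 4.0, 5.0])
```

A feature vector of this kind is what would be passed to a classifier such as the RBF SVM mentioned in the abstract.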

251

Defining genetic networks underlying animal behavior in a high throughput manner is an important but challenging task that has not yet been achieved for any organism. Using Caenorhabditis elegans, we collected quantitative parametric data related to various aspects of locomotion from wild type and thirty-one mutant worm strains with single mutations in genes functioning in sensory reception, neurotransmission, G-protein signaling, neuromuscular control or other facets of motor regulation. We applied unsupervised and constrained K-means clustering algorithms to the data and found that the genes that clustered together due to the behavioral similarity of their mutants encoded proteins in the same signaling networks. This approach provides a framework to identify genes and genetic networks underlying worm neuromotor function in a high-throughput manner. A publicly accessible database harboring the visual and quantitative behavioral data collected in this study adds valuable information to the rapidly growing C. elegans databanks that can be employed in a similar context. PMID:21376755

Zhang, Shijie; Jin, Wei; Huang, Ying; Su, Wei; Yang, Jiong; Feng, Zhaoyang

2011-01-01

252

NASA Astrophysics Data System (ADS)

A set of analytical and computational tools based on transition path theory (TPT) is proposed to analyze flows in complex networks. Specifically, TPT is used to study the statistical properties of the reactive trajectories by which transitions occur between specific groups of nodes on the network. Sampling tools are built upon the outputs of TPT that make it possible to generate these reactive trajectories directly, or even transition paths that travel from one group of nodes to the other without making any detour and carry the same probability current as the reactive trajectories. These objects make it possible to characterize the mechanism of the transitions, for example by quantifying the width of the tubes by which these transitions occur, the location and distribution of their dynamical bottlenecks, etc. These tools are applied to a network modeling the dynamics of the Lennard-Jones cluster with 38 atoms and used to understand the mechanism by which this cluster rearranges itself between its two most likely states at various temperatures.

Cameron, Maria; Vanden-Eijnden, Eric

2014-08-01

253

A modified fuzzy C-means classification method using a multiscale diffusion filtering scheme.

A fully automatic, multiscale fuzzy C-means (MsFCM) classification method for MR images is presented in this paper. We use a diffusion filter to process MR images and to construct a multiscale image series. A multiscale fuzzy C-means classification method is applied along the scales from the coarse to fine levels. The objective function of the conventional fuzzy C-means (FCM) method is modified to allow multiscale classification processing where the result from a coarse scale supervises the classification in the next fine scale. The method is robust to noise and low contrast in MR images because of its multiscale diffusion filtering scheme. The new method was compared with the conventional FCM method and a modified FCM (MFCM) method. Validation studies were performed on synthesized images with various contrasts and on the McGill brain MR image database. Our MsFCM method consistently performed better than the conventional FCM and MFCM methods. The MsFCM method achieved an overlap ratio of greater than 90% as validated by the ground truth. Experimental results on real MR images were given to demonstrate the effectiveness of the proposed method. Our multiscale fuzzy C-means classification method is accurate and robust for various MR images. It can provide a quantitative tool for neuroimaging and other applications. PMID:18684658

Wang, Hesheng; Fei, Baowei

2009-04-01
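
The conventional FCM method that the abstract above uses as its baseline can be illustrated with a minimal sketch. This is the textbook fuzzy c-means iteration on 1-D intensities, not the authors' multiscale MsFCM variant; the function name `fuzzy_c_means`, the fuzzifier `m`, the iteration count, and the small constant `eps` are all assumptions made for illustration.

```python
def fuzzy_c_means(values, centers, m=2.0, iters=50, eps=1e-12):
    """Textbook fuzzy c-means on 1-D data; returns (centers, memberships)."""
    centers = list(centers)
    power = 2.0 / (m - 1.0)
    u = []
    for _ in range(iters):
        # Membership update: u[i][k] = 1 / sum_j (d_ik / d_ij)^(2/(m-1))
        u = []
        for x in values:
            d = [abs(x - c) + eps for c in centers]  # eps avoids division by zero
            u.append([1.0 / sum((dk / dj) ** power for dj in d) for dk in d])
        # Center update: mean of the data weighted by u^m
        for k in range(len(centers)):
            w = [row[k] ** m for row in u]
            centers[k] = sum(wi * x for wi, x in zip(w, values)) / sum(w)
    return centers, u


intensities = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
centers, memberships = fuzzy_c_means(intensities, centers=[1.0, 4.0])
```

Each row of `memberships` sums to 1, so a pixel near a tissue boundary can belong partly to both classes rather than being forced into one.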

254

A binned clustering algorithm to detect high-Z material using cosmic muons

NASA Astrophysics Data System (ADS)

We present a novel approach to the detection of special nuclear material using cosmic rays. Muon Scattering Tomography (MST) is a method for using cosmic muons to scan cargo containers and vehicles for special nuclear material. Cosmic muons are abundant, highly penetrating, not harmful for organic tissue, cannot be screened against, and can easily be detected, which makes them highly suited to the use of cargo scanning. Muons undergo multiple Coulomb scattering when passing through material, and the amount of scattering is roughly proportional to the square of the atomic number Z of the material. By reconstructing incoming and outgoing tracks, we can obtain variables to identify high-Z material. In a real life application, this has to happen on a timescale of 1 min and thus with small numbers of muons. We have built a detector system using resistive plate chambers (RPCs): 12 layers of RPCs allow for the readout of 6 x and 6 y positions, by which we can reconstruct incoming and outgoing tracks. In this work we detail the performance of an algorithm by which we separate high-Z targets from low-Z background, both for real data from our prototype setup and for MC simulation of a cargo container-sized setup. (c) British Crown Owned Copyright 2013/AWE

Thomay, C.; Velthuis, J. J.; Baesso, P.; Cussans, D.; Morris, P. A. W.; Steer, C.; Burns, J.; Quillin, S.; Stapleton, M.

2013-10-01

255

A Neural-Network Clustering-Based Algorithm for Privacy Preserving Data Mining

NASA Astrophysics Data System (ADS)

The increasing use of fast and efficient data mining algorithms in huge collections of personal data, facilitated through the exponential growth of technology, in particular in the field of electronic data storage media and processing power, has raised serious ethical, philosophical and legal issues related to privacy protection. To cope with these concerns, several privacy preserving methodologies have been proposed, classified in two categories: methodologies that aim at protecting the sensitive data and those that aim at protecting the mining results. In our work, we focus on sensitive data protection and compare existing techniques according to the anonymity degree achieved, the information loss suffered and their performance characteristics. The ℓ-diversity principle is combined with k-anonymity concepts, so that background information cannot be exploited to successfully attack the privacy of the data subjects to whom the data refer. Based on Kohonen Self Organizing Feature Maps (SOMs), we first organize data sets in subspaces according to their information theoretical distance to each other, then create the most relevant classes paying special attention to rare sensitive attribute values, and finally generalize attribute values to the minimum extent required so that both the data disclosure probability and the information loss are kept negligible where possible. Furthermore, we propose information theoretical measures for assessing the anonymity degree achieved and empirical tests to demonstrate it.

Tsiafoulis, S.; Zorkadis, V. C.; Karras, D. A.

256

With progress toward inexpensive, large-scale DNA assembly, the demand for simulation tools that allow the rapid construction of synthetic biological devices with predictable behaviors continues to increase. By combining engineered transcript components, such as ribosome binding sites, transcriptional terminators, ligand-binding aptamers, catalytic ribozymes, and aptamer-controlled ribozymes (aptazymes), gene expression in bacteria can be fine-tuned, with many corollaries and applications in yeast and mammalian cells. The successful design of genetic constructs that implement these kinds of RNA-based control mechanisms requires modeling and analyzing kinetically determined co-transcriptional folding pathways. Transcript design methods using stochastic kinetic folding simulations to search spacer sequence libraries for motifs enabling the assembly of RNA component parts into static ribozyme- and dynamic aptazyme-regulated expression devices with quantitatively predictable functions (rREDs and aREDs, respectively) have been described (Carothers et al., Science 334:1716-1719, 2011). Here, we provide a detailed practical procedure for computational transcript design by illustrating a high throughput, multiprocessor approach for evaluating spacer sequences and generating functional rREDs. This chapter is written as a tutorial, complete with pseudo-code and step-by-step instructions for setting up a computational cluster with an Amazon, Inc. web server and performing the large numbers of kinefold-based stochastic kinetic co-transcriptional folding simulations needed to design functional rREDs and aREDs. The method described here should be broadly applicable for designing and analyzing a variety of synthetic RNA parts, devices and transcripts. PMID:25487092

Thimmaiah, Tim; Voje, William E; Carothers, James M

2015-01-01

257

Purpose: Breast magnetic resonance imaging (MRI) plays an important role in the clinical management of breast cancer. Studies suggest that the relative amount of fibroglandular (i.e., dense) tissue in the breast as quantified in MR images can be predictive of the risk for developing breast cancer, especially for high-risk women. Automated segmentation of the fibroglandular tissue and volumetric density estimation in breast MRI could therefore be useful for breast cancer risk assessment. Methods: In this work the authors develop and validate a fully automated segmentation algorithm, namely, an atlas-aided fuzzy C-means (FCM-Atlas) method, to estimate the volumetric amount of fibroglandular tissue in breast MRI. The FCM-Atlas is a 2D segmentation method working on a slice-by-slice basis. FCM clustering is first applied to the intensity space of each 2D MR slice to produce an initial voxelwise likelihood map of fibroglandular tissue. Then a prior learned fibroglandular tissue likelihood atlas is incorporated to refine the initial FCM likelihood map to achieve enhanced segmentation, from which the absolute volume of the fibroglandular tissue (|FGT|) and the relative amount (i.e., percentage) of the |FGT| relative to the whole breast volume (FGT%) are computed. The authors' method is evaluated by a representative dataset of 60 3D bilateral breast MRI scans (120 breasts) that span the full breast density range of the American College of Radiology Breast Imaging Reporting and Data System. The automated segmentation is compared to manual segmentation obtained by two experienced breast imaging radiologists. Segmentation performance is assessed by linear regression, Pearson's correlation coefficients, Student's paired t-test, and Dice's similarity coefficients (DSC). Results: The inter-reader correlation is 0.97 for FGT% and 0.95 for |FGT|. 
When compared to the average of the two readers’ manual segmentation, the proposed FCM-Atlas method achieves a correlation of r = 0.92 for FGT% and r = 0.93 for |FGT|, and the automated segmentation is not statistically significantly different (p = 0.46 for FGT% and p = 0.55 for |FGT|). The bilateral correlation between left breasts and right breasts for the FGT% is 0.94, 0.92, and 0.95 for reader 1, reader 2, and the FCM-Atlas, respectively; likewise, for the |FGT|, it is 0.92, 0.92, and 0.93, respectively. For the spatial segmentation agreement, the automated algorithm achieves a DSC of 0.69 ± 0.1 when compared to reader 1 and 0.61 ± 0.1 for reader 2, respectively, while the DSC between the two readers’ manual segmentation is 0.67 ± 0.15. Additional robustness analysis shows that the segmentation performance of the authors' method is stable both with respect to selecting different cases and to varying the number of cases needed to construct the prior probability atlas. The authors' results also show that the proposed FCM-Atlas method outperforms the commonly used two-cluster FCM-alone method. The authors' method runs at ~5 min for each 3D bilateral MR scan (56 slices) for computing the FGT% and |FGT|, compared to ~55 min needed for manual segmentation for the same purpose. Conclusions: The authors' method achieves robust segmentation and can serve as an efficient tool for processing large clinical datasets for quantifying the fibroglandular tissue content in breast MRI. It holds a great potential to support clinical applications in the future including breast cancer risk assessment.

Wu, Shandong; Weinstein, Susan P.; Conant, Emily F.; Kontos, Despina, E-mail: despina.kontos@uphs.upenn.edu [Department of Radiology, University of Pennsylvania, Philadelphia, Pennsylvania 19104 (United States)]

2013-12-15
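
The Dice similarity coefficient (DSC) reported in the abstract above measures spatial overlap between two segmentations. A minimal sketch follows, assuming each segmentation is given as a collection of voxel indices; the function name `dice_coefficient` and the convention of returning 1.0 for two empty masks are illustrative choices, not taken from the paper.

```python
def dice_coefficient(seg_a, seg_b):
    """Dice similarity: 2 * |A ∩ B| / (|A| + |B|) for two sets of voxel indices."""
    a, b = set(seg_a), set(seg_b)
    if not a and not b:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * len(a & b) / (len(a) + len(b))


# Hypothetical voxel indices labelled as fibroglandular tissue by two raters
automated = {(0, 1), (0, 2), (1, 1)}
manual = {(0, 2), (1, 1), (1, 2)}
overlap = dice_coefficient(automated, manual)
```

A DSC of 1 means identical masks and 0 means no shared voxels, which is why the abstract reports the inter-reader DSC alongside the automated one as a reference level.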

258

Purpose: Breast magnetic resonance imaging (MRI) plays an important role in the clinical management of breast cancer. Studies suggest that the relative amount of fibroglandular (i.e., dense) tissue in the breast as quantified in MR images can be predictive of the risk for developing breast cancer, especially for high-risk women. Automated segmentation of the fibroglandular tissue and volumetric density estimation in breast MRI could therefore be useful for breast cancer risk assessment. Methods: In this work the authors develop and validate a fully automated segmentation algorithm, namely, an atlas-aided fuzzy C-means (FCM-Atlas) method, to estimate the volumetric amount of fibroglandular tissue in breast MRI. The FCM-Atlas is a 2D segmentation method working on a slice-by-slice basis. FCM clustering is first applied to the intensity space of each 2D MR slice to produce an initial voxelwise likelihood map of fibroglandular tissue. Then a prior learned fibroglandular tissue likelihood atlas is incorporated to refine the initial FCM likelihood map to achieve enhanced segmentation, from which the absolute volume of the fibroglandular tissue (|FGT|) and the relative amount (i.e., percentage) of the |FGT| relative to the whole breast volume (FGT%) are computed. The authors' method is evaluated by a representative dataset of 60 3D bilateral breast MRI scans (120 breasts) that span the full breast density range of the American College of Radiology Breast Imaging Reporting and Data System. The automated segmentation is compared to manual segmentation obtained by two experienced breast imaging radiologists. Segmentation performance is assessed by linear regression, Pearson's correlation coefficients, Student's paired t-test, and Dice's similarity coefficients (DSC). Results: The inter-reader correlation is 0.97 for FGT% and 0.95 for |FGT|. 
When compared to the average of the two readers’ manual segmentation, the proposed FCM-Atlas method achieves a correlation of r = 0.92 for FGT% and r = 0.93 for |FGT|, and the automated segmentation is not statistically significantly different (p = 0.46 for FGT% and p = 0.55 for |FGT|). The bilateral correlation between left breasts and right breasts for the FGT% is 0.94, 0.92, and 0.95 for reader 1, reader 2, and the FCM-Atlas, respectively; likewise, for the |FGT|, it is 0.92, 0.92, and 0.93, respectively. For the spatial segmentation agreement, the automated algorithm achieves a DSC of 0.69 ± 0.1 when compared to reader 1 and 0.61 ± 0.1 for reader 2, respectively, while the DSC between the two readers’ manual segmentation is 0.67 ± 0.15. Additional robustness analysis shows that the segmentation performance of the authors' method is stable both with respect to selecting different cases and to varying the number of cases needed to construct the prior probability atlas. The authors' results also show that the proposed FCM-Atlas method outperforms the commonly used two-cluster FCM-alone method. The authors' method runs at ~5 min for each 3D bilateral MR scan (56 slices) for computing the FGT% and |FGT|, compared to ~55 min needed for manual segmentation for the same purpose. Conclusions: The authors' method achieves robust segmentation and can serve as an efficient tool for processing large clinical datasets for quantifying the fibroglandular tissue content in breast MRI. It holds a great potential to support clinical applications in the future including breast cancer risk assessment. PMID:24320533

Wu, Shandong; Weinstein, Susan P.; Conant, Emily F.; Kontos, Despina

2013-01-01

259

A continuous time cluster algorithm for two-level systems coupled to a dissipative bosonic bath is presented and applied to the sub-Ohmic spin-boson model. When the power $s$ of the spectral function $J(\omega) \propto \omega^s$ is smaller than 1/2, the critical exponents are found to be classical, mean-field like. Potential sources for the discrepancy with recent renormalization group predictions are traced back to the effect of a dangerously irrelevant variable. PMID:19257337

Winter, André; Rieger, Heiko; Vojta, Matthias; Bulla, Ralf

2009-01-23

260

Use of solvent-mapping, based on multiple-copy minimization (MCM) techniques, is common in structure-based drug discovery. The minima of small-molecule probes define locations for complementary interactions within a binding pocket. Here, we present improved methods for MCM. In particular, a Jarvis-Patrick method is outlined for grouping the final locations of minimized probes into physical clusters. This algorithm has been tested through a study of protein-protein interfaces, showing the process to be robust, deterministic, and fast in the mapping of protein “hot spots”. Improvements in the initial placement of probe molecules are also described. A final application to HIV-1 protease shows how our automated technique can be used to partition data too complicated to analyze by hand. These new automated methods may be easily and quickly extended to other protein systems, and our clustering methodology may be readily incorporated into other clustering packages. PMID:18679808

Lerner, Michael G.; Meagher, Kristin L.; Carlson, Heather A.

2010-01-01
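
The Jarvis-Patrick grouping mentioned in the abstract above can be sketched in its generic shared-nearest-neighbour form. This is not the authors' implementation: the function name `jarvis_patrick`, the parameters `k` and `k_min`, and the probe coordinates are hypothetical. Two points are merged when each appears in the other's list of `k` nearest neighbours and the two lists share at least `k_min` members.

```python
import math


def jarvis_patrick(points, k=3, k_min=2):
    """Generic Jarvis-Patrick clustering; returns one cluster label per point."""
    n = len(points)
    # k nearest neighbours of each point, excluding the point itself
    neighbours = []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: math.dist(points[i], points[j]))
        neighbours.append(set(order[:k]))

    parent = list(range(n))  # union-find forest over point indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            mutual = i in neighbours[j] and j in neighbours[i]
            shared = len(neighbours[i] & neighbours[j])
            if mutual and shared >= k_min:
                parent[find(i)] = find(j)
    return [find(i) for i in range(n)]


# Hypothetical final positions of minimized probes in two separate pockets
probes = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0),
          (10, 10, 10), (11, 10, 10), (10, 11, 10), (11, 11, 10)]
labels = jarvis_patrick(probes)
```

Because the result depends only on neighbour lists and not on random initialization, the same input always yields the same clusters, consistent with the abstract's description of the process as deterministic.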

261

NASA Technical Reports Server (NTRS)

Both the iterative self-organizing clustering system (ISOCLS) and the CLASSY algorithms were applied to forest and nonforest classes for one 1:24,000 quadrangle map of northern Idaho and the classification and mapping accuracies were evaluated with 1:30,000 color infrared aerial photography. Confusion matrices for the two clustering algorithms were generated and studied to determine which is most applicable to forest and rangeland inventories in future projects. In an unsupervised mode, ISOCLS requires many trial-and-error runs to find the proper parameters to separate desired information classes. CLASSY tells more in a single run concerning the classes that can be separated, shows more promise for forest stratification than ISOCLS, and shows more promise for consistency. One major drawback to CLASSY is that important forest and range classes that are smaller than a minimum cluster size will be combined with other classes. The algorithm requires so much computer storage that only data sets as small as a quadrangle can be used at one time.

Werth, L. F. (principal investigator)

1981-01-01

262

NASA Astrophysics Data System (ADS)

In 2D non-contact body measurement, a transform model that converts 2D human body girth data to 3D girth data is required. However, an integrated model is hard to obtain because different human body type categories determine different model parameters. Accurate classification of human body types from the measured data is therefore very important. The canonical transformation method is used to strengthen the similarity of data features within the same type and to broaden the differences between data features of different types. The "accumulating dead bodies" ant colony algorithm is improved in the paper by employing road information densities to help the ant select the probable path leading to the site of the accumulated dead bodies when it moves the data. In this way, the randomness and blindness of the ants' walking are eliminated, and the convergence speed of the algorithm is improved. To avoid unevenness in the number of times each data unit is visited, an access mechanism for the union data is employed, which keeps the algorithm from falling into a local trap. A clustering validity function is selected to verify the clustering result of the human measurement data. The experimental results indicate the effectiveness and efficiency of human body clustering based on the improved ant colony algorithm. Based on the sorting result, an accurate 3D body data transformation model can be established, which should improve the accuracy of non-contact body measurement.

Zhan, Qun; Zhao, Nanxiang

2011-08-01

263

An improved FCM medical image segmentation algorithm based on MMTD.

Image segmentation plays an important role in medical image processing. Fuzzy c-means (FCM) is one of the popular clustering algorithms for medical image segmentation. But FCM is highly vulnerable to noise due to not considering the spatial information in image segmentation. This paper introduces medium mathematics system which is employed to process fuzzy information for image segmentation. It establishes the medium similarity measure based on the measure of medium truth degree (MMTD) and uses the correlation of the pixel and its neighbors to define the medium membership function. An improved FCM medical image segmentation algorithm based on MMTD which takes some spatial features into account is proposed in this paper. The experimental results show that the proposed algorithm is more antinoise than the standard FCM, with more certainty and less fuzziness. This will lead to its practicable and effective applications in medical image segmentation. PMID:24648852

Zhou, Ningning; Yang, Tingting; Zhang, Shaobai

2014-01-01

264

Soft Clustering Criterion Functions for Partitional Document Clustering

their soft-clustering extensions, present a comprehensive experimental evaluation involving twelve different], or on the nature of the membership function, leading to hard (crisp) or soft (fuzzy) solutions. In recent years" the results obtained by fuzzy C-means produces better hard clustering solutions than direct K-means [17

Karypis, George

265

Magnetic resonance (MR) brain section images are segmented and then synthetically colored to give visual representations of the original data with three approaches: the literal and approximate fuzzy c-means unsupervised clustering algorithms, and a supervised computational neural network. Initial clinical results are presented on normal volunteers and selected patients with brain tumors surrounded by edema. Supervised and unsupervised segmentation techniques provide broadly similar results. Unsupervised fuzzy algorithms were visually observed to show better segmentation when compared with raw image data for volunteer studies. For a more complex segmentation problem with tumor/edema or cerebrospinal fluid boundary, where the tissues have similar MR relaxation behavior, inconsistency in rating among experts was observed, with fuzzy c-means approaches being slightly preferred over feedforward cascade correlation results. Various facets of both approaches, such as supervised versus unsupervised learning, time complexity, and utility for the diagnostic process, are compared. PMID:18276467

Hall, L O; Bensaid, A M; Clarke, L P; Velthuizen, R P; Silbiger, M S; Bezdek, J C

1992-01-01

266

NASA Astrophysics Data System (ADS)

We present a genetic algorithm based investigation of structural fragmentation in dicationic noble gas clusters, $\mathrm{Ar}_n^{2+}$, $\mathrm{Kr}_n^{2+}$, and $\mathrm{Xe}_n^{2+}$, where $n$ denotes the size of the cluster. Dications are predicted to be stable above a threshold size of the cluster when positive charges are assumed to remain localized on two noble gas atoms and the Lennard-Jones potential along with bare Coulomb and ion-induced dipole interactions are taken into account for describing the potential energy surface. Our cutoff values are close to those obtained experimentally [P. Scheier and T. D. Mark, J. Chem. Phys. 11, 3056 (1987)] and theoretically [J. G. Gay and B. J. Berne, Phys. Rev. Lett. 49, 194 (1982)]. When the charges are allowed to be equally distributed over four noble gas atoms in the cluster and the nonpolarization interaction terms are allowed to remain unchanged, our method successfully identifies the size threshold for stability as well as the nature of the channels of dissociation as function of cluster size. In $\mathrm{Ar}_n^{2+}$, for example, fissionlike fragmentation is predicted for $n = 55$ while for $n = 43$, the predicted outcome is nonfission fragmentation in complete agreement with earlier work [Golberg et al., J. Chem. Phys. 100, 8277 (1994)].

Nandy, Subhajit; Chaudhury, Pinaki; Bhattacharyya, S. P.

2010-06-01

267

Clustering with Bregman Divergences

A wide variety of distortion functions, such as squared Euclidean distance, Mahalanobis distance, Itakura-Saito distance and relative entropy, have been used for clustering. In this paper, we propose and analyze parametric hard and soft clustering algorithms based on a large class of distortion functions known as Bregman divergences. The proposed algorithms unify centroid-based parametric clustering approaches, such as

Arindam Banerjee; Srujana Merugu; Inderjit S. Dhillon; Joydeep Ghosh

2004-01-01
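
To make concrete how one definition covers several of the distortion functions listed in the abstract above, the sketch below evaluates the standard Bregman divergence $d_\phi(x, y) = \phi(x) - \phi(y) - \langle \nabla\phi(y), x - y \rangle$ for two choices of the convex function $\phi$. The helper names are illustrative assumptions and are not taken from the paper.

```python
import math


def bregman(phi, grad_phi, x, y):
    """Bregman divergence d_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>."""
    inner = sum(g * (xi - yi) for g, xi, yi in zip(grad_phi(y), x, y))
    return phi(x) - phi(y) - inner


def sq_norm(v):
    """phi(v) = ||v||^2, which generates the squared Euclidean distance."""
    return sum(t * t for t in v)


def sq_norm_grad(v):
    return [2.0 * t for t in v]


def neg_entropy(v):
    """phi(v) = sum v_i log v_i, which generates relative entropy (KL)."""
    return sum(t * math.log(t) for t in v)


def neg_entropy_grad(v):
    return [math.log(t) + 1.0 for t in v]


squared_euclidean = bregman(sq_norm, sq_norm_grad, [1.0, 2.0], [3.0, 5.0])
p, q = [0.5, 0.5], [0.25, 0.75]
relative_entropy = bregman(neg_entropy, neg_entropy_grad, p, q)
```

For probability vectors the second value coincides with the Kullback-Leibler divergence, because the extra term in the generalized form, the difference of the two vector sums, is zero.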

268

Clustering method for estimating principal diffusion directions

Diffusion tensor magnetic resonance imaging (DTMRI) is a non-invasive tool for the investigation of white matter structure within the brain. However, the traditional tensor model is unable to characterize anisotropies of orders higher than two in heterogeneous areas containing more than one fiber population. To resolve this issue, high angular resolution diffusion imaging (HARDI) with a large number of diffusion encoding gradients is used along with reconstruction methods such as Q-ball. Using HARDI data, the fiber orientation distribution function (ODF) on the unit sphere is calculated and used to extract the principal diffusion directions (PDDs). Fast and accurate estimation of PDDs is a prerequisite for tracking algorithms that deal with fiber crossings. In this paper, the PDDs are defined as the directions around which the ODF data is concentrated. Estimates of the PDDs based on this definition are less sensitive to noise in comparison with the previous approaches. A clustering approach to estimate the PDDs is proposed which is an extension of fuzzy c-means clustering developed for orientation of points on a sphere. The minimum description length (MDL) principle is proposed to estimate the number of PDDs. Using both simulated and real diffusion data, the proposed method has been evaluated and compared with some previous protocols. Experimental results show that the proposed clustering algorithm is more accurate, more resistant to noise, and faster than some of the techniques currently being utilized. PMID:21642005

Nazem-Zadeh, Mohammad-Reza; Jafari-Khouzani, Kourosh; Davoodi-Bojd, Esmaeil; Jiang, Quan; Soltanian-Zadeh, Hamid

2012-01-01

269

In an IaaS (infrastructure as a service) cloud environment, users are provisioned with virtual machines (VMs). To allocate resources for users dynamically and effectively, accurate prediction of resource demands is essential. For this purpose, this paper proposes a self-adaptive prediction method using an ensemble model and a subtractive-fuzzy clustering based fuzzy neural network (ESFCFNN). We analyze the characteristics of user preferences and demands. Then the architecture of the prediction model is constructed. We adopt some base predictors to compose the ensemble model. Then the structure and learning algorithm of the fuzzy neural network are studied. To obtain the number of fuzzy rules and the initial values of the premise and consequent parameters, this paper proposes fuzzy c-means combined with the subtractive clustering algorithm, that is, subtractive-fuzzy clustering. Finally, we adopt different criteria to evaluate the proposed method. The experimental results show that the method is accurate and effective in predicting the resource demands. PMID:25691896

Chen, Zhijia; Zhu, Yuanchang; Di, Yanqiang; Feng, Shaochong

2015-01-01

270

Efficient Image Compression of Medical Images Using the Wavelet Transform and Fuzzy c sophisticated scheme. That is, different compression ratios are applied to the wavelet coefficients belonging. The goal of such a compression methodology that aims at maximization of the compression ratio should

Athens, University of

271

BACKGROUND: Computer-aided segmentation and border detection in dermoscopic images is one of the core components of diagnostic procedures and therapeutic interventions for skin cancer. Automated assessment tools for dermoscopy images have become an important research field mainly because of inter- and intra-observer variations in human interpretation. In this study, we compare two approaches for automatic border detection in dermoscopy images:

Sinan Kockara; Mutlu Mete; Bernard Chen; Kemal Aydin

2010-01-01

272

New Cluster Validity Index with Fuzzy Functions

A new cluster validity index is introduced to validate the results obtained by the recent Improved Fuzzy Clustering (IFC), which combines two different methods, i.e., fuzzy c-means clustering and fuzzy c-regression, in a novel way. Proposed validity measure determines the optimum number of clusters of the IFC based on a ratio of the compactness to separability of the clusters. The

Asli Çelikyilmaz; I. Burhan Türksen

2007-01-01

273

In this paper, a combined approach for enhancement and segmentation of mammograms is proposed. In the preprocessing stage, a contrast limited adaptive histogram equalization (CLAHE) method is applied to obtain better-contrast mammograms. After this, the proposed combined methods are applied. In the first step of the proposed approach, a two dimensional (2D) discrete wavelet transform (DWT) is applied to all the input images. In the second step, a proposed nonlinear complex diffusion based unsharp masking and crispening method is applied on the approximation coefficients of the wavelet transformed images to further highlight the abnormalities such as micro-calcifications, tumours, etc., to reduce the false positives (FPs). Thirdly, a modified fuzzy c-means (FCM) segmentation method is applied on the output of the second step. In the modified FCM method, the mutual information is proposed as a similarity measure in place of the conventional Euclidean distance based dissimilarity measure for FCM segmentation. Finally, the inverse 2D-DWT is applied. The efficacy of the proposed unsharp masking and crispening method for image enhancement is evaluated in terms of signal-to-noise ratio (SNR) and that of the proposed segmentation method is evaluated in terms of random index (RI), global consistency error (GCE), and variation of information (VoI). The performance of the proposed segmentation approach is compared with the other commonly used segmentation approaches such as Otsu's thresholding, texture based, k-means, and FCM clustering as well as thresholding. From the obtained results, it is observed that the proposed segmentation approach performs better and takes less processing time in comparison to the standard FCM and other segmentation methods in consideration. PMID:25190996

Srivastava, Subodh; Sharma, Neeraj; Singh, S. K.; Srivastava, R.

2014-01-01

275

, the setting of its parameters and the image resolution, the numbers of streamlines obtained (called fibers tractography limitations, such as low resolution and crossing fibers issues, white matter analysis, and segmentation mistakes are carried on to the clustering. Unsupervised fiber clustering suffers from high computa

Paris-Sud XI, Université de

276

This report consists of three separate but related reports. They are (1) Human Resource Development, (2) Carbon-based Structural Materials Research Cluster, and (3) Data Parallel Algorithms for Scientific Computing. To meet the objectives of the Human Resource Development plan, the plan includes K-12 enrichment activities, undergraduate research opportunities for students at the state's two Historically Black Colleges and Universities, graduate research through cluster assistantships and through a traineeship program targeted specifically to minorities, women and the disabled, and faculty development through participation in research clusters. One research cluster is the chemistry and physics of carbon-based materials. The objective of this cluster is to develop a self-sustaining group of researchers in carbon-based materials research within the institutions of higher education in the state of West Virginia. The projects will involve analysis of cokes, graphites and other carbons in order to understand the properties that provide desirable structural characteristics including resistance to oxidation, levels of anisotropy and structural characteristics of the carbons themselves. In the proposed cluster on parallel algorithms, the research topics pursued by four WVU faculty and three state liberal arts college faculty are: (1) modeling of self-organized critical systems by cellular automata; (2) multiprefix algorithms and fat-free embeddings; (3) offline and online partitioning of data computation; and (4) manipulating and rendering three dimensional objects. This cluster furthers the state Experimental Program to Stimulate Competitive Research plan by building on existing strengths at WVU in parallel algorithms.

Not Available

1994-02-02

277

We consider the problem of clustering two-dimensional association rules in large databases. We present a geometric-based algorithm, BitOp, for performing the clustering, embedded within an association rule clustering system, ARCS. Association rule clustering is useful when the user desires to segment the data. We measure the quality of the segmentation generated by ARCS using the Minimum

Brian Lent; Arun N. Swami; Jennifer Widom

1997-01-01

278

Clustering with Bregman Divergences

A wide variety of distortion functions are used for clustering, e.g., squared Euclidean distance, Mahalanobis distance and relative entropy. In this paper, we propose and analyze parametric hard and soft clustering algorithms based on a large class of distortion functions known as Bregman divergences. The proposed algorithms unify centroid-based parametric clustering approaches, such as classical k-means and information-theoretic clustering,

Arindam Banerjee; Srujana Merugu; Inderjit S. Dhillon; Joydeep Ghosh

2005-01-01
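The hard-clustering idea summarized in this record can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: it assumes NumPy, and the function names (`squared_euclidean`, `generalized_kl`, `bregman_hard_cluster`) and the explicit `init_centers` argument are hypothetical choices made here. It relies on the known property that the arithmetic mean is the optimal cluster representative for any Bregman divergence, so only the assignment step changes between divergences.

```python
import numpy as np

def squared_euclidean(x, mu):
    # Bregman divergence generated by phi(x) = ||x||^2.
    return np.sum((x - mu) ** 2, axis=-1)

def generalized_kl(x, mu):
    # Generalized I-divergence (relative entropy); inputs must be strictly positive.
    return np.sum(x * np.log(x / mu) - x + mu, axis=-1)

def bregman_hard_cluster(X, init_centers, divergence, n_iter=50):
    """k-means-style loop: assign by divergence, update centers by the mean."""
    X = np.asarray(X, dtype=float)
    centers = np.array(init_centers, dtype=float)
    for _ in range(n_iter):
        # d[i, j] = divergence between point i and center j (via broadcasting).
        d = divergence(X[:, None, :], centers[None, :, :])
        labels = d.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(len(centers)):
            members = X[labels == j]
            if len(members) > 0:          # keep the old center if a cluster is empty
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Recompute labels so they match the returned centers.
    labels = divergence(X[:, None, :], centers[None, :, :]).argmin(axis=1)
    return labels, centers
```

Passing `squared_euclidean` reproduces ordinary k-means, while `generalized_kl` gives an information-theoretic variant. Like k-means, the loop only finds a local optimum, so the result depends on the initial centers.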

279

NASA Astrophysics Data System (ADS)

A novel method for the effective identification of bedrock subsurface elevation from electrical resistivity tomography images is described. Identifying subsurface boundaries in the topographic data can be difficult due to smoothness constraints used in inversion, so a statistical population-based approach is used that extends previous work in calculating isoresistivity surfaces. The analysis framework involves a procedure for guiding a clustering approach based on the fuzzy c-means algorithm. An approximation of resistivity distributions, found using kernel density estimation, was utilized as a means of guiding the cluster centroids used to classify data. A fuzzy method was chosen over hard clustering due to uncertainty in hard edges in the topography data, and a measure of clustering uncertainty was identified based on the reciprocal of cluster membership. The algorithm was validated using a direct comparison of known observed bedrock depths at two 3-D survey sites, using real-time GPS information of exposed bedrock by quarrying on one site, and borehole logs at the other. Results show similarly accurate detection as a leading isosurface estimation method, and the proposed algorithm requires significantly less user input and prior site knowledge. Furthermore, the method is effectively dimension-independent and will scale to data of increased spatial dimensions without a significant effect on the runtime. A discussion on the results by automated versus supervised analysis is also presented.

Ward, W. O. C.; Wilkinson, P. B.; Chambers, J. E.; Oxby, L. S.; Bai, L.

2014-04-01

280

Tensor hypercontraction is a method that allows the representation of a high-rank tensor as a product of lower-rank tensors. In this paper, we show how tensor hypercontraction can be applied to both the electron repulsion integral (ERI) tensor and the two-particle excitation amplitudes used in the parametric reduced density matrix (pRDM) algorithm. Because only O(r) auxiliary functions are needed in both of these approximations, our overall algorithm can be shown to scale as O(r^4), where r is the number of single-particle basis functions. We apply our algorithm to several small molecules, hydrogen chains, and alkanes to demonstrate its low formal scaling and practical utility. Provided we use enough auxiliary functions, we obtain accuracy similar to that of the traditional pRDM algorithm, somewhere between that of CCSD and CCSD(T).

Shenvi, Neil; Yang, Yang; Yang, Weitao; Schwerdtfeger, Christine; Mazziotti, David

2013-01-01

281

Tensor hypercontraction is a method that allows the representation of a high-rank tensor as a product of lower-rank tensors. In this paper, we show how tensor hypercontraction can be applied to both the electron repulsion integral tensor and the two-particle excitation amplitudes used in the parametric 2-electron reduced density matrix (p2RDM) algorithm. Because only O(r) auxiliary functions are needed in both of these approximations, our overall algorithm can be shown to scale as O(r^4), where r is the number of single-particle basis functions. We apply our algorithm to several small molecules, hydrogen chains, and alkanes to demonstrate its low formal scaling and practical utility. Provided we use enough auxiliary functions, we obtain accuracy similar to that of the standard p2RDM algorithm, somewhere between that of CCSD and CCSD(T). PMID:23927246

Shenvi, Neil; van Aggelen, Helen; Yang, Yang; Yang, Weitao; Schwerdtfeger, Christine; Mazziotti, David

2013-08-01

282

We have developed a new method (Independent Cluster Decomposition Algorithm, ICDA) for creating all-atom models of proteins given the heavy-atom coordinates, provided by X-ray crystallography, and the pH. In our method the ionization states of titratable residues, the crystallographic mis-assignment of amide orientations in Asn/Gln, and the orientations of OH/SH groups are addressed under the unified framework of polar states assignment. To address the large number of combinatorial possibilities for the polar hydrogen states of the protein, we have devised a novel algorithm to decompose the system into independent interacting clusters, based on the observation of the crucial interdependence between the short range hydrogen bonding network and polar residue states, thus significantly reducing the computational complexity of the problem and making our algorithm tractable using relatively modest computational resources. We utilize an all atom protein force field (OPLS) and a Generalized Born continuum solvation model, in contrast to the various empirical force fields adopted in most previous studies. We have compared our prediction results with a few well-documented methods in the literature (WHATIF, REDUCE). In addition, as a preliminary attempt to couple our polar state assignment method with real structure predictions, we further validate our method using single side chain prediction, which has been demonstrated to be an effective way of validating structure prediction methods without incurring sampling problems. Comparisons of single side chain prediction results after the application of our polar state prediction method with previous results with default polar state assignments indicate a significant improvement in the single side chain predictions for polar residues. PMID:17154422

Li, Xin; Jacobson, Matthew P; Zhu, Kai; Zhao, Suwen; Friesner, Richard A

2007-03-01

283

Fuzzy technique for microcalcifications clustering in digital mammograms

Background Mammography has established itself as the most efficient technique for the identification of pathological breast lesions. Among the various types of lesions, microcalcifications are the most difficult to identify since they are quite small (0.1-1.0 mm) and often poorly contrasted against the image background. Within this context, Computer Aided Detection (CAD) systems could turn out to be very useful in breast cancer control. Methods In this paper we present a potentially powerful microcalcification cluster enhancement method applicable to digital mammograms. The segmentation phase employs a form filter, obtained from a LoG filter, to overcome the dependence on target dimensions and to optimize the recognition efficiency. A clustering method based on Fuzzy C-means (FCM) has been developed. The described method, Fuzzy C-means with Features (FCM-WF), was tested on simulated clusters of microcalcifications, implying that the location of the cluster within the breast and the exact number of microcalcifications are known. The proposed method has also been tested on a set of images from the publicly available mini-Mammographic database provided by the Mammographic Image Analysis Society (MIAS). Results The comparison between the FCM-WF and standard FCM algorithms, applied to both databases, shows that the former produces better microcalcification associations for clustering than the latter: for the private and the public database, respectively, we had a performance improvement of 10% and 5% with regard to the Merit Figure and a 22% and 10% reduction in false positives potentially identified in the images, both to the benefit of FCM-WF. The method was also evaluated in terms of Sensitivity (93% and 82%), Accuracy (95% and 94%), FP/image (4% for both databases) and Precision (62% and 65%).
Conclusions Thanks to the private database and to the information it contains regarding every single microcalcification, we tested the developed clustering method with great accuracy. In particular we verified that 70% of the injected clusters of the private database remained unaffected if the reconstruction is performed with FCM-WF. Testing the method on the MIAS database also allowed us to verify the segmentation properties of the algorithm, showing that 80% of pathological clusters remained unaffected. PMID:24961885

2014-01-01

284

The first application of the ab initio genetic algorithm with an embedded gradient has been carried out for the elucidation of global minimum structures of a series of anionic sodium chloride clusters, Na_xCl_{x+1}^- (x = 1-4), produced in the gas phase using electrospray ionization and studied by photoelectron spectroscopy. These are all superhalogen species with extremely high electron binding energies. The adiabatic electron binding energies for Na_xCl_{x+1}^- were measured to be 5.64, 6.22, 6.3, and 6.9 eV, for x = 1-4, respectively. Our genetic algorithm program detected the linear global minima of NaCl_2^- and Na_2Cl_3^- and three-dimensional structures for the larger species. Na_3Cl_4^- was found to have C3v symmetry, which can be viewed as a Na_4Cl_4 cube missing a corner Na+ cation, whereas Na_4Cl_5^- was found to have C4v symmetry, close to a 3x3 planar structure. Further accurate ab initio calculations were carried out for the elucidated global minimum structures. Excellent agreement between the theoretically calculated and the experimental spectra was observed, confirming the obtained structures and demonstrating the power of the developed genetic algorithm technique.

Alexandrova, Anastassia N.; Boldyrev, Alexander I.; Fu, Youjun; Yang, Xin; Wang, Xue B.; Wang, Lai S.

2004-09-22

285

NASA Astrophysics Data System (ADS)

The clustering of matter on cosmological scales is an essential probe for studying the physical origin and composition of our Universe. To date, most of the direct studies have focused on shear-shear weak lensing correlations, but it is also possible to extract the dark matter clustering by combining galaxy-clustering and galaxy-galaxy-lensing measurements. In order to extract the required information, one must relate the observable galaxy distribution to the underlying dark matter distribution. In this study we develop in detail a method that can constrain the dark matter correlation function from galaxy clustering and galaxy-galaxy-lensing measurements, by focusing on the correlation coefficient between the galaxy and matter overdensity fields. Our goal is to develop an estimator that maximally correlates the two. To generate a mock galaxy catalogue for testing purposes, we use the halo occupation distribution approach applied to a large ensemble of N-body simulations to model preexisting SDSS luminous red galaxy sample observations. Using this mock catalogue, we show that a direct comparison between the excess surface mass density measured by lensing and its corresponding galaxy clustering quantity is not optimal. We develop a new statistic that suppresses the small-scale contributions to these observations and show that this new statistic leads to a cross-correlation coefficient that is within a few percent of unity down to 5 h^-1 Mpc. Furthermore, the residual incoherence between the galaxy and matter fields can be explained using a theoretical model for scale-dependent galaxy bias, giving us a final estimator that is unbiased to within 1%, so that we can reconstruct the dark matter clustering power spectrum at this accuracy up to k ~ 1 h Mpc^-1. We also perform a comprehensive study of other physical effects that can affect the analysis, such as redshift space distortions and differences in radial windows between galaxy clustering and weak lensing observations. 
We apply the method to a range of cosmological models and explicitly show the viability of our new statistic to distinguish between cosmological models.

Baldauf, Tobias; Smith, Robert E.; Seljak, Uroš; Mandelbaum, Rachel

2010-03-01

286

Evolutionary multi-objective clustering for overlapping clusters detection

Evolutionary algorithms have a history of being applied to clustering analysis. However, most of the existing evolutionary clustering techniques fail to detect complex/spiral-shaped clusters. In our previous works, we proposed several evolutionary multi-objective clustering algorithms and achieved promising results. Still, they suffer from this usual problem exhibited by evolutionary and unsupervised clustering approaches. In this paper, we proposed an improved

Kazi Shah Nawaz Ripon; Mia Nazmul Haque Siddique

2009-01-01

287

NASA Astrophysics Data System (ADS)

On obtaining a new data set, the researcher is immediately faced with the challenge of obtaining a high-level understanding from the observations. What does a typical item look like? What are the dominant trends? How many distinct groups are included in the data set, and how is each one characterized? Which observable values are common, and which rarely occur? Which items stand out as anomalies or outliers from the rest of the data? This challenge is exacerbated by the steady growth in data set size [11] as new instruments push into new frontiers of parameter space, via improvements in temporal, spatial, and spectral resolution, or by the desire to "fuse" observations from different modalities and instruments into a larger-picture understanding of the same underlying phenomenon. Data clustering algorithms provide a variety of solutions for this task. They can generate summaries, locate outliers, compress data, identify dense or sparse regions of feature space, and build data models. It is useful to note up front that "clusters" in this context refer to groups of items within some descriptive feature space, not (necessarily) to "galaxy clusters" which are dense regions in physical space. The goal of this chapter is to survey a variety of data clustering methods, with an eye toward their applicability to astronomical data analysis. In addition to improving the individual researcher’s understanding of a given data set, clustering has led directly to scientific advances, such as the discovery of new subclasses of stars [14] and gamma-ray bursts (GRBs) [38]. All clustering algorithms seek to identify groups within a data set that reflect some observed, quantifiable structure. Clustering is traditionally an unsupervised approach to data analysis, in the sense that it operates without any direct guidance about which items should be assigned to which clusters. 
There has been a recent trend in the clustering literature toward supporting semisupervised or constrained clustering, in which some partial information about item assignments or other components of the resulting output is already known and must be accommodated by the solution. Some algorithms seek a partition of the data set into distinct clusters, while others build a hierarchy of nested clusters that can capture taxonomic relationships. Some produce a single optimal solution, while others construct a probabilistic model of cluster membership. More formally, clustering algorithms operate on a data set X composed of items represented by one or more features (dimensions). These could include physical location, such as right ascension and declination, as well as other properties such as brightness, color, temporal change, size, texture, and so on. Let D be the number of dimensions used to represent each item, x_i ∈ R^D. The clustering goal is to produce an organization P of the items in X that optimizes an objective function f : P -> R, which quantifies the quality of solution P. Often f is defined so as to maximize similarity within a cluster and minimize similarity between clusters. To that end, many algorithms make use of a measure d : X x X -> R of the distance between two items. A partitioning algorithm produces a set of clusters P = {c_1, ..., c_k} such that the clusters are nonoverlapping (c_i ∩ c_j = ∅ for i ≠ j) subsets of the data set (∪_i c_i = X). Hierarchical algorithms produce a series of partitions P = {p_1, ..., p_n}. For a complete hierarchy, the number of partitions n' = n, the number of items in the data set; the top partition is a single cluster containing all items, and the bottom partition contains n clusters, each containing a single item. For model-based clustering, each cluster c_j is represented by a model m_j, such as the cluster center or a Gaussian distribution. 
The wide array of available clustering algorithms may seem bewildering, and covering all of them is beyond the scope of this chapter. Choosing among them for a particular application involves considerations of the kind

Wagstaff, Kiri L.

2012-03-01
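The formal definitions in this record (a partition as nonoverlapping clusters whose union is X, and a complete hierarchy as n nested partitions) can be made concrete with a short sketch. The code below is only an illustration and is not taken from the chapter: single-linkage merging is one assumed choice of hierarchical method, and the names `is_partition` and `single_linkage_hierarchy` are hypothetical.

```python
import itertools
import math

def is_partition(clusters, items):
    """True if the clusters are pairwise disjoint and their union is `items`."""
    seen = [x for c in clusters for x in c]
    return len(seen) == len(set(seen)) and set(seen) == set(items)

def single_linkage_hierarchy(points):
    """Return the series of partitions p_1..p_n, from n singletons to one cluster."""
    clusters = [frozenset([i]) for i in range(len(points))]
    hierarchy = [list(clusters)]
    while len(clusters) > 1:
        # Merge the two clusters whose closest members are nearest (single linkage).
        a, b = min(
            itertools.combinations(clusters, 2),
            key=lambda ab: min(math.dist(points[i], points[j])
                               for i in ab[0] for j in ab[1]),
        )
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        hierarchy.append(list(clusters))
    return hierarchy
```

For n items the returned list holds n partitions: the first contains n single-item clusters and the last a single cluster containing everything, matching the description of a complete hierarchy. The brute-force pair search is meant for small examples only.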

288

Adaptive Clustering of Hypermedia Documents.

ERIC Educational Resources Information Center

Discussion of hypermedia systems focuses on a comparison of two types of adaptive algorithm (genetic algorithm and neural network) in clustering hypermedia documents. These clusters allow the user to index into the nodes to find needed information more quickly, since clustering is "personalized" based on the user's paths rather than representing…

Johnson, Andrew; Fotouhi, Farshad

1996-01-01

289

Adaptive Fuzzy Clustering Nicolas Cebron and Michael R. Berthold

propose a new adaptive active clustering scheme, based on an initial Fuzzy c-means clustering and LearningAdaptive Fuzzy Clustering Nicolas Cebron and Michael R. Berthold Department of Computer for adjustment of the classification by the user. Motivated by the concept of active learning, the learner tries

Berthold, Michael R.

290

Regularized color clustering for medical image database

A regularized color clustering algorithm is proposed to solve the color clustering problem in medical image database. By incorporating both measures of cluster separability and cluster compactness, regularized color clustering allows the automatic extraction of significant color groups with varying populations. Experimental results in different color spaces show that the regularized color clustering gives superior results in extracting significant distinct/abnormal

Chun Hung Li; Pong Chi Yuen

2000-01-01

291

Definition of Management Zones of Soil Nutrients Based on FCM Algorithm in Oasis Field

NASA Astrophysics Data System (ADS)

The objective of this research was to define management zones of an oasis cotton field. Organic matter, available N, available P and available K, determined in 193 topsoil (0-30 cm) samples, were selected as data sources. The fuzzy c-means clustering algorithm was used to delineate management zones. In order to determine the optimum fuzzy control parameters, the fuzziness performance index (FPI), c-? combinations and multiple regression based on an external variable were used in this study. The cotton yield was chosen as the external variable. The whole field was divided into four management zones, and the fuzziness exponent was 1.6. The zoning statistics showed that the coefficient of variation of soil nutrients decreased within zones, while the means of the soil nutrients differed sharply between management zones. The average confusion index was 0.19 over all management zones. The overlapping of fuzzy classes at points was low and the spatial distribution of membership grades was unambiguous. The results indicated that the fuzzy c-means clustering algorithm could be used to delineate management zones by selecting appropriate external variables. The defined management zones can be used for fertilizer recommendation to manage soil nutrients more efficiently.

Lu, Xin; Chen, Yan
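As an illustration of the procedure this record describes, the sketch below runs standard fuzzy c-means and computes a fuzziness performance index from the membership matrix. It is a minimal sketch, not the authors' code: NumPy and the function names are assumptions, the default exponent `m=1.6` simply mirrors the value reported in the abstract, and the FPI is computed with the commonly used definition FPI = 1 - (c·F - 1)/(c - 1), where F is the partition coefficient. The abstract does not state which formula was used.

```python
import numpy as np

def fuzzy_c_means(X, c, m=1.6, n_iter=100, tol=1e-6, seed=0):
    """Standard FCM: returns memberships U (n x c) and cluster centers (c x d)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)            # each row sums to 1
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)              # avoid division by zero
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, centers

def fuzziness_performance_index(U):
    """0 for a crisp partition, 1 when every membership equals 1/c."""
    n, c = U.shape
    F = np.sum(U ** 2) / n                       # partition coefficient
    return 1.0 - (c * F - 1.0) / (c - 1.0)
```

A typical use would be to run `fuzzy_c_means` over a range of cluster counts and exponents and prefer the combination with the lowest FPI. A call such as `fuzzy_c_means(samples, c=4, m=1.6)` would correspond to the four zones and exponent reported above, where `samples` is a hypothetical array of the four soil variables.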

292

Watershed-Based Unsupervised Clustering Manuele Bicego, Marco Cristani, Andrea Fusiello purpose clustering algorithm is presented, based on the watershed algorithm. The proposed approach defines. The clustering is then performed using the well-known watershed algorithm, paying particular attention

Bicego, Manuele

293

Watershed-Based Unsupervised Clustering

In this paper, a novel general purpose clustering algorithm is presented, based on the watershed algorithm. The proposed approach defines a density function on a suitable lattice, whose cell dimension is carefully estimated from the data. The clustering is then performed using the well-known watershed algorithm, paying particular attention to the boundary situations. The main characteristic of this method is

Manuele Bicego; Marco Cristani; Andrea Fusiello; Vittorio Murino

2003-01-01

294

NASA Technical Reports Server (NTRS)

Magnetic resonance (MR) brain section images are segmented and then synthetically colored to give visual representations of the original data with three approaches: the literal and approximate fuzzy c-means unsupervised clustering algorithms and a supervised computational neural network, a dynamic multilayered perceptron trained with the cascade correlation learning algorithm. Initial clinical results are presented on both normal volunteers and selected patients with brain tumors surrounded by edema. Supervised and unsupervised segmentation techniques provide broadly similar results. Unsupervised fuzzy algorithms were visually observed to show better segmentation when compared with raw image data for volunteer studies. However, for a more complex segmentation problem with tumor/edema or cerebrospinal fluid boundary, where the tissues have similar MR relaxation behavior, inconsistency in rating among experts was observed.

Hall, Lawrence O.; Bensaid, Amine M.; Clarke, Laurence P.; Velthuizen, Robert P.; Silbiger, Martin S.; Bezdek, James C.

1992-01-01

295

NASA Astrophysics Data System (ADS)

The genetic algorithms in the variant developed earlier and in that improved by the mechanism of niches have been exploited in the analysis of global and local minima of the potential energy depending either on the inverse distance between particles (as in Coulomb interactions) or on the logarithm of this distance. The number N of point-charge particles is finite and they are confined by the parabolic potential. Solutions of the optimization problems yield for 9 ≤ N ≤ 30 the ground-state configurations and a number of the metastable configurations as well as some saddle points. For the model with the logarithmic interactions, the new ground-state configuration is found for N=20, whereas for the Coulomb model, some new configurations, observed earlier in the molecular dynamics simulations, are determined.

Sobczak, Pawel; Kucharski, Lukasz; Kamieniarz, Grzegorz

2011-09-01

296

NASA Astrophysics Data System (ADS)

Quantitative PET image reconstruction requires an accurate map of attenuation coefficients of the tissue under investigation at 511 keV (μ-map) in order to correct the emission data for attenuation. The use of MRI-based attenuation correction (MRAC) has recently received a lot of attention in the scientific literature. One of the major difficulties facing MRAC has been observed in the areas where bone and air collide, e.g. ethmoidal sinuses in the head area. Bone is intrinsically not detectable by conventional MRI, making it difficult to distinguish air from bone. Therefore, development of more versatile MR sequences to label the bone structure, e.g. ultra-short echo-time (UTE) sequences, certainly plays a significant role in novel methodological developments. However, the long acquisition time and complexity of UTE sequences limit their clinical applications. To overcome this problem, we developed a novel combination of a Short-TE (ShTE) pulse sequence to detect bone signal with a 2-point Dixon technique for water-fat discrimination, along with a robust image segmentation method based on fuzzy clustering C-means (FCM) to segment the head area into four classes of air, bone, soft tissue and adipose tissue. The imaging protocol was set on a clinical 3 T Tim Trio and also 1.5 T Avanto (Siemens Medical Solution, Erlangen, Germany) employing a triple echo time pulse sequence in the head area. The acquisition parameters were as follows: TE1/TE2/TE3=0.98/4.925/6.155 ms, TR=8 ms, FA=25 on the 3 T system, and TE1/TE2/TE3=1.1/2.38/4.76 ms, TR=16 ms, FA=18 for the 1.5 T system. The second and third echo-times belonged to the Dixon decomposition to distinguish soft and adipose tissues. To quantify accuracy, sensitivity and specificity of the bone segmentation algorithm, resulting classes of MR-based segmented bone were compared with the manually segmented one by our expert neuro-radiologist. 
Results for both 3 T and 1.5 T systems show that bone segmentation applied in several slices yields average accuracy, sensitivity and specificity higher than 90%. Results indicate that FCM is an appropriate technique for tissue classification in the sinusoidal area where there is air-bone interface. Furthermore, using the Dixon method, fat and brain tissues were successfully separated.

Khateri, Parisa; Rad, Hamidreza Saligheh; Jafari, Amir Homayoun; Ay, Mohammad Reza

2014-01-01

297

NASA Astrophysics Data System (ADS)

Camouflaged target detection in complex backgrounds is a challenging problem. Spectral-polarimetric imaging can offer spectral information and polarization information from the objects in the scene. Fusion of the spectral and polarization information in the images will result in better camouflaged target identification and recognition. In this paper a novel spectral-polarimetric image fusion algorithm based on the Shearlet transform is proposed. Firstly, every polarimetric image in each wave band is decomposed into images of low frequency components and high frequency components by the Shearlet transform. Then, the fused low frequency approximate coefficients are obtained with a weighted average method, and the fused high frequency coefficients are obtained with an area-based feature selection method, so features and details from different spectral-polarimetric images are fused successfully. After that, the kernel fuzzy c-means clustering algorithm is used to separate the camouflaged object from its background. Experimental results have shown that better identification performance was achieved.

Zhou, Pu-cheng; Liu, Cun-chao

2013-08-01

298

NASA Astrophysics Data System (ADS)

We present sample CUDA programs for the GPU computing of the Swendsen-Wang multi-cluster spin flip algorithm. We deal with the classical spin models; the Ising model, the q-state Potts model, and the classical XY model. As for the lattice, both the 2D (square) lattice and the 3D (simple cubic) lattice are treated. We already reported the idea of the GPU implementation for 2D models (Komura and Okabe, 2012). We here explain the details of sample programs, and discuss the performance of the present GPU implementation for the 3D Ising and XY models. We also show the calculated results of the moment ratio for these models, and discuss phase transitions. Catalogue identifier: AERM_v1_0 Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AERM_v1_0.html Program obtainable from: CPC Program Library, Queen’s University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 5632 No. of bytes in distributed program, including test data, etc.: 14688 Distribution format: tar.gz Programming language: C, CUDA. Computer: System with an NVIDIA CUDA enabled GPU. Operating system: System with an NVIDIA CUDA enabled GPU. Classification: 23. External routines: NVIDIA CUDA Toolkit 3.0 or newer Nature of problem: Monte Carlo simulation of classical spin systems. Ising, q-state Potts model, and the classical XY model are treated for both two-dimensional and three-dimensional lattices. Solution method: GPU-based Swendsen-Wang multi-cluster spin flip Monte Carlo method. The CUDA implementation for the cluster-labeling is based on the work by Hawick et al. [1] and that by Kalentev et al. [2]. Restrictions: The system size is limited depending on the memory of a GPU. Running time: For the parameters used in the sample programs, it takes about a minute for each program. Of course, it depends on the system size, the number of Monte Carlo steps, etc. References: [1] K.A. Hawick, A. Leist, and D. P. Playne, Parallel Computing 36 (2010) 655-678 [2] O. Kalentev, A. Rai, S. Kemnitzb, and R. Schneider, J. Parallel Distrib. Comput. 71 (2011) 615-620

Komura, Yukihiro; Okabe, Yutaka

2014-03-01

299

Molecular Image Segmentation Based on Improved Fuzzy Clustering

Segmentation of molecular images is a difficult task due to the low signal-to-noise ratio of images. A novel two-dimensional fuzzy C-means (2DFCM) algorithm is proposed for the molecular image segmentation. The 2DFCM algorithm is composed of three stages. The first stage is the noise suppression by utilizing a method combining a Gaussian noise filter and anisotropic diffusion techniques. The second stage is the texture energy characterization using a Gabor wavelet method. The third stage is introducing spatial constraints provided by the denoising data and the textural information into the two-dimensional fuzzy clustering. The incorporation of intensity and textural information allows the 2DFCM algorithm to produce satisfactory segmentation results for images corrupted by noise (outliers) and intensity variations. The 2DFCM can achieve 0.96 ± 0.03 segmentation accuracy for synthetic images under different imaging conditions. Experimental results on a real molecular image also show the effectiveness of the proposed algorithm. PMID:18368139

Yu, Jinhua; Wang, Yuanyuan

2007-01-01

300

NASA Astrophysics Data System (ADS)

Terrestrial Laser Scanners (TLS) are used frequently in three-dimensional documentation studies and present an alternative method for three-dimensional modeling without any deformation of scale. In this study, point cloud data segmentation is used for photogrammetric image data production from laser scanner data. Segmentation studies suggest several methods for automating curved surface determination for digital terrain modeling. In this study, a fuzzy logic approach has been used for the automatic segmentation of regular curved surfaces which differ in their depths to the instrument. Shapes of this type are usually observed in dome surfaces in close-range architectural documentation. The model of the C-means integrated fuzzy logic approach has been developed with MATLAB 7.0 software. The Gauss2mf membership function algorithm has been tested with the original data set. These results were used in the photogrammetric 3D modeling process. As the result of the study, the results of testing on the point cloud data set are discussed and interpreted with all of their advantages and disadvantages in Section 5.

Ergun, Bahadir; Sahin, Cumhur; Ustuntas, Taner

2014-01-01

301

Haplotyping Problem, A Clustering Approach

Construction of two haplotypes from a set of Single Nucleotide Polymorphism (SNP) fragments is called the haplotype reconstruction problem. One of the most popular computational models for this problem is Minimum Error Correction (MEC). Since MEC is an NP-hard problem, here we propose a novel heuristic algorithm based on clustering analysis in data mining for the haplotype reconstruction problem. Based on Hamming distance and similarity between two fragments, our iterative algorithm produces two clusters of fragments; then, in each iteration, the algorithm assigns a fragment to one of the clusters. Our results suggest that the algorithm has a lower reconstruction error rate in comparison with other algorithms.

Eslahchi, Changiz [Faculty of Mathematics, Shahid-Beheshti University, Tehran (Iran, Islamic Republic of); Bioinformatics Group, School of Computer Science, Institute for Studies in Theoretical Physics and Mathematics (IPM), Tehran (Iran, Islamic Republic of); Sadeghi, Mehdi [National Institute for Genetic Engineering and Biotechnology, Tehran-Karaj Highway, Tehran (Iran, Islamic Republic of); Department of Bioinformatics, Institute of Biochemistry and Biophysics, University of Tehran, Tehran (Iran, Islamic Republic of); Pezeshk, Hamid [Center of Excellence in Biomathematics, School of Mathematics, Statistics and Computer Science, University College of Science, University of Tehran, Tehran (Iran, Islamic Republic of); Kargar, Mehdi [Department of Computer Engineering, Sharif University of Technology, Tehran (Iran, Islamic Republic of); Poormohammadi, Hadi [Faculty of Mathematics, Shahid-Beheshti University, Tehran (Iran, Islamic Republic of)

2007-09-06
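
Illustrative sketch for the record above (not the authors' algorithm): one simple way to split SNP fragments into two clusters by Hamming distance and read a haplotype off each cluster by majority vote. The fragment encoding (`0`/`1` for alleles, `-` for uncovered sites), the seeding rule, and all function names are assumptions made for this example.

```python
from itertools import combinations

GAP = "-"  # assumed marker for a site the fragment does not cover


def hamming(a, b):
    """Count sites where both fragments are called and disagree."""
    return sum(1 for x, y in zip(a, b) if x != GAP and y != GAP and x != y)


def consensus(fragments, length):
    """Majority vote per SNP site; ties and uncovered sites stay as GAP."""
    out = []
    for j in range(length):
        ones = sum(1 for f in fragments if f[j] == "1")
        zeros = sum(1 for f in fragments if f[j] == "0")
        out.append("1" if ones > zeros else "0" if zeros > ones else GAP)
    return "".join(out)


def two_cluster_haplotypes(fragments):
    """Greedy two-way clustering of fragments; returns two consensus strings."""
    length = len(fragments[0])
    # Seed the two clusters with the pair of fragments that disagree the most.
    i, j = max(combinations(range(len(fragments)), 2),
               key=lambda p: hamming(fragments[p[0]], fragments[p[1]]))
    clusters = [[fragments[i]], [fragments[j]]]
    # Assign each remaining fragment to the cluster with the closer consensus.
    for k, frag in enumerate(fragments):
        if k in (i, j):
            continue
        d0 = hamming(frag, consensus(clusters[0], length))
        d1 = hamming(frag, consensus(clusters[1], length))
        clusters[0 if d0 <= d1 else 1].append(frag)
    return consensus(clusters[0], length), consensus(clusters[1], length)


fragments = ["0101--", "--0101", "1010--", "--1010", "-1010-", "-0101-"]
print(two_cluster_haplotypes(fragments))  # ('010101', '101010')
```

The greedy single pass is the simplest choice; an MEC-oriented method would typically revisit assignments until the number of corrected entries stops decreasing.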

302

A diabetic retinopathy detection method using an improved pillar K-means algorithm.

The paper presents a new approach for medical image segmentation. Exudates are a visible sign of diabetic retinopathy, which is the major cause of vision loss in patients with diabetes. If the exudates extend into the macular area, blindness may occur. Automated detection of exudates will assist ophthalmologists in early diagnosis. The segmentation process includes a new mechanism for clustering the elements of high-resolution images in order to improve precision and reduce computation time. The system applies K-means clustering to the image segmentation after optimization by the Pillar algorithm; pillars are constructed in such a way that they can withstand the pressure. The improved Pillar algorithm can optimize K-means clustering for image segmentation in terms of precision and computation time. The proposed approach is evaluated by comparing it with K-means and Fuzzy C-means on a medical image. Using this method, identification of dark spots in the retina becomes easier. The proposed algorithm is applied to diabetic retinal images of all stages to identify hard and soft exudates, whereas the existing pillar K-means is more appropriate for brain MRI images. The proposed system helps doctors identify the problem at an early stage and can suggest a better drug for preventing further retinal damage. PMID:24516323

Gogula, Susmitha Valli; Divakar, Ch; Satyanarayana, Ch; Rao, Allam Appa

2014-01-01

304

PGK algorithm in hyperspectral image compression

This paper presents a compression algorithm for multispectral images. First, the multispectral image is converted to vectors in N-dimensional space, and then the PGK clustering algorithm is used for cluster compression. Experiments show that this algorithm with the detection vegetation can achieve the purpose of high compression ratio, low algorithm complexity, restored

Xu Su; Tan Xue; Chen Shanxue

2011-01-01

305

In order to deal with the sequential decision problems with large or continuous state spaces, feature representation and function approximation have been a major research topic in reinforcement learning (RL). In this paper, a clustering-based graph Laplacian framework is presented for feature representation and value function approximation (VFA) in RL. By making use of clustering-based techniques, that is, K-means clustering or fuzzy C-means clustering, a graph Laplacian is constructed by subsampling in Markov decision processes (MDPs) with continuous state spaces. The basis functions for VFA can be automatically generated from spectral analysis of the graph Laplacian. The clustering-based graph Laplacian is integrated with a class of approximation policy iteration algorithms called representation policy iteration (RPI) for RL in MDPs with continuous state spaces. Simulation and experimental results show that, compared with previous RPI methods, the proposed approach needs fewer sample points to compute an efficient set of basis functions and the learning control performance can be improved for a variety of parameter settings. PMID:24802018

Xu, Xin; Huang, Zhenhua; Graves, Daniel; Pedrycz, Witold

2014-12-01

306

CLEAN: CLustering Enrichment ANalysis

Background Integration of biological knowledge encoded in various lists of functionally related genes has become one of the most important aspects of analyzing genome-wide functional genomics data. In the context of cluster analysis, functional coherence of clusters established through such analyses has been used to identify biologically meaningful clusters, compare clustering algorithms and identify biological pathways associated with the biological process under investigation. Results We developed a computational framework for analytically and visually integrating knowledge-based functional categories with the cluster analysis of genomics data. The framework is based on the simple, conceptually appealing, and biologically interpretable gene-specific functional coherence score (CLEAN score). The score is derived by correlating the clustering structure as a whole with functional categories of interest. We directly demonstrate that integrating biological knowledge in this way improves the reproducibility of conclusions derived from cluster analysis. The CLEAN score differentiates between the levels of functional coherence for genes within the same cluster based on their membership in enriched functional categories. We show that this aspect results in higher reproducibility across independent datasets and produces more informative genes for distinguishing different sample types than the scores based on the traditional cluster-wide analysis. We also demonstrate the utility of the CLEAN framework in comparing clusterings produced by different algorithms. CLEAN was implemented as an add-on R package and can be downloaded at . The package integrates routines for calculating gene specific functional coherence scores and the open source interactive Java-based viewer Functional TreeView (FTreeView).
Conclusion Our results indicate that using the gene-specific functional coherence score improves the reproducibility of the conclusions made about clusters of co-expressed genes over using the traditional cluster-wide scores. Using gene-specific coherence scores also simplifies the comparisons of clusterings produced by different clustering algorithms and provides a simple tool for selecting genes with a "functionally coherent" expression profile. PMID:19640299

Freudenberg, Johannes M; Joshi, Vineet K; Hu, Zhen; Medvedovic, Mario

2009-01-01

307

Relation chain based clustering analysis

NASA Astrophysics Data System (ADS)

Clustering analysis is currently one of the well-developed branches of data mining technology; it aims to find hidden structures in a multidimensional space called the feature or pattern space. A datum in this space usually takes the form of a vector whose elements represent several specifically selected features, chosen for their relevance to the problem at hand. Generally, clustering analysis falls into two divisions: one based on agglomerative clustering and the other based on divisive clustering. The former is a bottom-up process that initially regards each datum as a singleton cluster, while the latter is a top-down process that initially regards the entire data set as one cluster. According to the collected literature, divisive clustering currently dominates in both application and research. Although some well-known divisive clustering methods have been designed and are well developed, clustering problems are still far from solved. The k-means algorithm is the original divisive clustering method; it requires some important values to be assigned initially, such as the number of clusters and the initial positions of the cluster prototypes, and these assignments may not be reasonable in certain situations. Beyond the initialization problem, the k-means algorithm may also fall into a local optimum, clusters in a rigid way, and is not suitable for non-Gaussian distributions. The search for a good or natural clustering result in fact originates from one's understanding of the concept of clustering. Thus, confusion about or misunderstanding of the definition of clustering leads to unsatisfactory clustering results, and the definition deserves deep and serious consideration. This paper demonstrates the nature of clustering, gives a way of understanding clustering, discusses the methodology of designing a clustering algorithm, and proposes a new clustering method based on relation chains among 2D patterns.
In this paper, a new method called relation chain based clustering is presented. The method demonstrates that arbitrary distribution shape and density are not the essential factors for clustering research; in other words, clusters described by particular expressions should be given a uniform mathematical description, which is the "relation chain" emphasized in this paper. The relation chain indicates the relation between each pair of spatial points and gives an evaluation of the connection between the pair. The relation chain based clustering algorithm initially assigns the neighborhood evaluation radius of the points, then assesses the clustering result based on the inner-cluster variance of each cluster while increasing the radius, adjusts the radius accordingly, and finally gives the clustering result. Experiments conducted with the proposed method show that the hidden data structure is well explored.

Zhang, Cheng-ning; Zhao, Ming-yang; Luo, Hai-bo

2011-08-01

308

Clustering is a technique adopted in several application fields, for example artificial neural networks, data compression, pattern recognition, etc. This paper presents the Enhanced LBG (ELBG), a new clustering algorithm deriving directly from the well-known classical LBG algorithm. It belongs to the hard and K-means vector quantization groups. We started from the definition of a new mathematical concept we

Marco Russo; Giuseppe Patanè

1999-01-01

309

Metamodel-based global optimization using fuzzy clustering for design space reduction

NASA Astrophysics Data System (ADS)

High-fidelity analyses are utilized in modern engineering design optimization problems that involve expensive black-box models. For computation-intensive engineering design problems, efficient global optimization methods must be developed to relieve the computational burden. A new metamodel-based global optimization method using fuzzy clustering for design space reduction (MGO-FCR) is presented. The uniformly distributed initial sample points are generated by Latin hypercube design to construct the radial basis function metamodel, whose accuracy improves gradually as the number of sample points increases. The fuzzy c-means method and the Gath-Geva clustering method are applied to divide the design space into several small interesting cluster spaces for low- and high-dimensional problems, respectively. Modeling efficiency and accuracy are directly related to the design space, so unconcerned spaces are eliminated by the proposed reduction principle and two pseudo reduction algorithms. The reduction principle is developed to determine whether the current design space should be reduced and which space is eliminated. The first pseudo reduction algorithm improves the speed of clustering, while the second ensures that the design space is reduced. Using several numerical benchmark functions, comparative studies with the adaptive response surface method, the approximated unimodal region elimination method, and mode-pursuing sampling are carried out. The optimization results reveal that this method captures the real global optimum for all the numerical benchmark functions, and the number of function evaluations shows that the efficiency of this method is favorable, especially for high-dimensional problems. Based on this global design optimization method, a design optimization of a lifting surface in high-speed flow is carried out, and this method saves about 10 h compared with genetic algorithms. This method performs favorably in efficiency, robustness, and capability of global convergence, and gives a new optimization strategy for engineering design optimization problems involving expensive black-box models.

Li, Yulin; Liu, Li; Long, Teng; Dong, Weili

2013-09-01
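
Illustrative sketch for the record above (not the MGO-FCR implementation): the abstract's first step, generating uniformly spread initial samples by Latin hypercube design, can be written in a few lines. The function name `latin_hypercube`, the unit-cube domain, and the seeding are assumptions made for this example; the metamodel and space-reduction steps are not shown.

```python
import random


def latin_hypercube(n, dim, seed=0):
    """Return n points in [0, 1)^dim with exactly one point per stratum per axis."""
    rng = random.Random(seed)
    columns = []
    for _ in range(dim):
        # One draw inside each of the n equal-width strata [k/n, (k+1)/n).
        col = [(k + rng.random()) / n for k in range(n)]
        # Shuffle so strata are paired at random across dimensions.
        rng.shuffle(col)
        columns.append(col)
    return [tuple(columns[d][i] for d in range(dim)) for i in range(n)]


samples = latin_hypercube(8, 3)
for point in samples:
    print(point)
```

Each coordinate axis is covered by all n strata, which is what makes the design space-filling with few expensive model evaluations. To sample a real design space, each coordinate would be rescaled to that variable's bounds.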

310

This paper presents the design and implementation of an embedded soft sensor, i.e., a generic and autonomous hardware module, which can be applied to many complex plants, wherein a certain variable cannot be directly measured. It is implemented based on a fuzzy identification algorithm called "Limited Rules", employed to model continuous nonlinear processes. The fuzzy model has a Takagi-Sugeno-Kang structure and the premise parameters are defined based on the Fuzzy C-Means (FCM) clustering algorithm. The firmware contains the soft sensor and it runs online, estimating the target variable from other available variables. Tests have been performed using a simulated pH neutralization plant. The results of the embedded soft sensor have been considered satisfactory. A complete embedded inferential control system is also presented, including a soft sensor and a PID controller. PMID:17981281

Garcia, Claudio; de Carvalho Berni, Cássio; de Oliveira, Carlos Eduardo Neri

2008-04-01
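
Illustrative sketch for the record above (not the authors' firmware): the Fuzzy C-Means (FCM) step named in the abstract, written as the standard alternating update of cluster centers and memberships. The function name `fuzzy_c_means`, the fuzzifier default `m = 2`, the random initialization, and the stopping tolerance are assumptions made for this example.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Return (centers, memberships) for a list of equal-length tuples."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, normalized so each point's row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([r / total for r in row])
    centers = []
    for _ in range(iters):
        # Center update: mean of the points weighted by membership^m.
        centers = []
        for i in range(c):
            w = [u[k][i] ** m for k in range(n)]
            total = sum(w)
            centers.append(tuple(
                sum(w[k] * points[k][d] for k in range(n)) / total
                for d in range(dim)))
        # Membership update: u_ik is proportional to d_ik^(-2/(m-1)).
        shift = 0.0
        for k in range(n):
            dist = [math.dist(points[k], v) for v in centers]
            if min(dist) == 0.0:
                # Point coincides with a center: split membership among those.
                zeros = [d == 0.0 for d in dist]
                new = [1.0 / sum(zeros) if z else 0.0 for z in zeros]
            else:
                inv = [d ** (-2.0 / (m - 1.0)) for d in dist]
                total = sum(inv)
                new = [x / total for x in inv]
            shift = max(shift, max(abs(a - b) for a, b in zip(new, u[k])))
            u[k] = new
        if shift < tol:
            break
    return centers, u


data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
        (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, memberships = fuzzy_c_means(data, c=2)
print(centers)
```

In a Takagi-Sugeno-Kang model of the kind described, the resulting centers would typically place the premise membership functions; how the paper maps them is not stated in the abstract.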

311

Clustering applications cover several fields such as audio and video data compression, pattern recognition, computer vision, medical image recognition, etc. In this paper we present a new clustering algorithm called Enhanced LBG (ELBG). It belongs to the hard and K-means vector quantization groups and derives directly from the simpler LBG. The basic idea we have developed is the concept of utility of a codeword, a powerful

Giuseppe Patanè; Marco Russo

2001-01-01

312

Robust Clustering with Applications in Computer Vision

A clustering algorithm based on the minimum volume ellipsoid (MVE) robust estimator is proposed. The MVE estimator identifies the least volume region containing h percent of the data points. The clustering algorithm iteratively partitions the space into clusters without prior information about their number. At each iteration, the MVE estimator is applied several times with values of h decreasing from

Jean-michel Jolion; Peter Meer; Samira Bataouche

1991-01-01

313

Evolutionary multi-objective clustering with adaptive local search

In many real-world applications, the accurate number of clusters in the data set may be unknown in advance. In addition, clustering criteria are usually high dimensional, nonlinear and multi-model functions and most existing clustering algorithms are only able to achieve a clustering solution that locally optimizes them. Therefore, a single clustering criterion sometimes fails to identify all clusters in a

Kazi Shah Nawaz Ripon; Kyrre Glette; Mats Hovin; Jim Torresen

2010-01-01

314

NASA Astrophysics Data System (ADS)

Energy surfaces of metal clusters usually show a large variety of local minima. For homo-metallic species the energetically lowest can be found reliably with genetic algorithms, in combination with density functional theory without system-specific parameters. For mixed-metallic clusters this is much more difficult, as for a given arrangement of nuclei one has to find additionally the best of many possibilities of assigning different metal types to the individual positions. In the framework of electronic structure methods this second issue is treatable at comparably low cost at least for elements with similar atomic number by means of first-order perturbation theory, as shown previously [F. Weigend, C. Schrodt, and R. Ahlrichs, J. Chem. Phys. 121, 10380 (2004)]. In the present contribution the extension of a genetic algorithm with the re-assignment of atom types to atom sites is proposed and tested for the search of the global minima of PtHf12 and [LaPb7Bi7]4-. For both cases the (putative) global minimum is reliably found with the extended technique, which is not the case for the "pure" genetic algorithm.

Weigend, Florian

2014-10-01

315

NASA Astrophysics Data System (ADS)

Digital image segmentation is the process of assigning distinct labels to different objects in a digital image, and clustering techniques can be used to achieve such segmentations. However, many traditional segmentation algorithms fail to segment objects that are characterized by textures whose patterns cannot be successfully described by simple statistics computed over a very restricted area. In this paper we present a fuzzy clustering algorithm that achieves the segmentation of images with color textures by employing a distance function based on the Skew Divergence, which is in turn based on the well-known Kullback-Leibler Divergence. In order for such a distance to produce good results when applied to color images, we reduced the dimensionality of the image's histogram, thus eliminating the sparsity of the color histogram and speeding up the execution of the algorithm. We performed experiments on thin rock sections and compared our results to the segmentations obtained by the Fuzzy C-Means and by another fuzzy segmentation technique, showing the superiority of our approach.

Siebra, Hélio; Carvalho, Bruno M.; Garduño, Edgar

2015-01-01
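
Illustrative sketch for the record above (not the authors' code): one common form of the skew divergence between two histograms, defined through the Kullback-Leibler divergence as KL(p || alpha*q + (1 - alpha)*p). The argument order, the default `alpha = 0.99`, and the function names are assumptions; the abstract does not say which variant the paper uses.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats; bins with p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def skew_divergence(p, q, alpha=0.99):
    """KL(p || alpha*q + (1 - alpha)*p), for 0 <= alpha < 1."""
    # Mixing a little of p into q keeps every bin that p uses non-empty,
    # so the result stays finite even when q has zero-count bins.
    mix = [alpha * qi + (1.0 - alpha) * pi for pi, qi in zip(p, q)]
    return kl_divergence(p, mix)


p = [0.5, 0.5, 0.0]   # normalized color histogram of one region
q = [0.0, 0.5, 0.5]   # histogram with an empty bin where p has mass
print(skew_divergence(p, q))
```

Plain KL divergence would be infinite for this pair because `q` has an empty bin where `p` does not, which is the usual motivation for the skewed form on sparse histograms.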

316

Brightest Cluster Galaxy Identification

NASA Astrophysics Data System (ADS)

Brightest cluster galaxies (BCGs) play an important role in several fields of astronomical research. The literature includes many different methods and criteria for identifying the BCG in the cluster, such as choosing the brightest galaxy, the galaxy nearest the X-ray peak, or the galaxy with the most extended profile. Here we examine a sample of 75 clusters from the Archive of Chandra Cluster Entropy Profile Tables (ACCEPT) and the Sloan Digital Sky Survey (SDSS), measuring masked magnitudes and profiles for BCG candidates in each cluster. We first identified galaxies by hand; in 15% of clusters at least one team member selected a different galaxy than the others. We also applied 6 other identification methods to the ACCEPT sample; in 30% of clusters at least one of these methods selected a different galaxy than the other methods. We then developed an algorithm that weighs brightness, profile, and proximity to the X-ray peak and centroid. This algorithm incorporates the advantages of by-hand identification (weighing multiple properties) and automated selection (repeatable and consistent). The BCG population chosen by the algorithm is more uniform in its properties than populations selected by other methods, particularly in the relation between absolute magnitude (a proxy for galaxy mass) and average gas temperature (a proxy for cluster mass). This work was supported by a Barry M. Goldwater Scholarship and a Sid Jansma Summer Research Fellowship.

Leisman, Luke; Haarsma, D. B.; Sebald, D. A.; ACCEPT Team

2011-01-01

317

City block distance and rough-fuzzy clustering for identification of co-expressed microRNAs.

The microRNAs or miRNAs are short, endogenous RNAs having ability to regulate mRNA expression at the post-transcriptional level. Various studies have revealed that miRNAs tend to cluster on chromosomes. The members of a cluster that are in close proximity on chromosomes are highly likely to be processed as co-transcribed units. Therefore, a large proportion of miRNAs are co-expressed. Expression profiling of miRNAs generates a huge volume of data. Complicated networks of miRNA-mRNA interaction increase the challenges of comprehending and interpreting the resulting mass of data. In this regard, this paper presents a clustering algorithm in order to extract meaningful information from miRNA expression data. It judiciously integrates the merits of rough sets, fuzzy sets, the c-means algorithm, and the normalized range-normalized city block distance to discover co-expressed miRNA clusters. While the membership functions of fuzzy sets enable efficient handling of overlapping partitions in a noisy environment, the concept of lower and upper approximations of rough sets deals with uncertainty, vagueness, and incompleteness in cluster definition. The city block distance is used to compute the membership functions of fuzzy sets and to find initial partition of a data set, and therefore helps to handle minute differences between two miRNA expression profiles. The effectiveness of the proposed approach, along with a comparison with other related methods, is demonstrated for several miRNA expression data sets using different cluster validity indices. Moreover, the gene ontology is used to analyze the functional consistency and biological significance of generated miRNA clusters. PMID:24682049

Paul, Sushmita; Maji, Pradipta

2014-06-01

318

CACTUS—clustering categorical data using summaries

Clustering is an important data mining problem. Most of the earlier work on clustering focussed on numeric attributes which have a natural ordering on their attribute values. Recently, clustering data with categorical attributes, whose attribute values do not have a natural ordering, has received some attention. However, previous algorithms do not give a formal description of the clusters they discover

Venkatesh Ganti; Johannes Gehrke; Raghu Ramakrishnan

1999-01-01

319

Texture image segmentation on improved watershed and multiway spectral clustering

Spectral clustering is a new graph and similarity based clustering algorithm. When the image is too big, it will take a long time to compute affinity matrix and its eigenvalues and eigenvectors. In order to improve the convergent speed of spectral clustering, a two-stage texture segmentation algorithm is proposed in this paper. First, an improved watershed algorithm is used to

Xiuli Ma; Wanggen Wan; Jincao Yao

2008-01-01

320

Static and Dynamic Information Organization with Star Clusters

Static and Dynamic Information Organization with Star Clusters Javed Aslam Katya Pelekhov Daniela on TREC data. We introduce the off-line and on-line star clustering algorithms for information or and average link clustering algorithms. Since the star algorithm is also highly efficient and simple

Aslam, Javed

321

Text document clustering based on frequent word meaning sequences

Most existing text clustering algorithms use the vector space model, which treats documents as bags of words. Thus, word sequences in the documents are ignored, while the meaning of natural languages strongly depends on them. In this paper, we propose two new text clustering algorithms, named Clustering based on Frequent Word Sequences (CFWS) and Clustering based on Frequent Word

Yanjun Li; Soon M. Chung; John D. Holt

2008-01-01

322

In an age of increasingly large data sets, investigators in many different disciplines have turned to clustering as a tool for data analysis and exploration. Existing clustering methods, however, typically depend on several nontrivial assumptions about the structure of data. Here, we reformulate the clustering problem from an information theoretic perspective that avoids many of these assumptions. In particular, our formulation obviates the need for defining a cluster “prototype,” does not require an a priori similarity metric, is invariant to changes in the representation of the data, and naturally captures nonlinear relations. We apply this approach to different domains and find that it consistently produces clusters that are more coherent than those extracted by existing algorithms. Finally, our approach provides a way of clustering based on collective notions of similarity rather than the traditional pairwise measures. PMID:16352721

Slonim, Noam; Atwal, Gurinder Singh; Tkačik, Gašper; Bialek, William

2005-01-01

323

Time series clustering analysis of health-promoting behavior

NASA Astrophysics Data System (ADS)

Health promotion must be emphasized to achieve the World Health Organization goal of health for all. Since the global population is aging rapidly, the ComCare elder health-promoting service was developed by the Taiwan Institute for Information Industry in 2011. Based on the Pender health promotion model, the ComCare service offers five categories of health-promoting functions to address the everyday needs of seniors: nutrition management, social support, exercise management, health responsibility, and stress management. To assess the overall ComCare service and to improve understanding of the health-promoting behavior of elders, this study analyzed health-promoting behavioral data automatically collected by the ComCare monitoring system. In the 30638 session records collected for 249 elders from January 2012 to March 2013, behavior patterns were identified by a fuzzy c-means time series clustering algorithm combined with autocorrelation-based representation schemes. The analysis showed that time series data for elder health-promoting behavior can be classified into four different clusters. Each type reveals different health-promoting needs, frequencies, function numbers, and behaviors. The data analysis results can assist policymakers, health-care providers, and experts in medicine, public health, nursing, and psychology, and have been provided to the Taiwan National Health Insurance Administration to assess elder health-promoting behavior.

Yang, Chi-Ta; Hung, Yu-Shiang; Deng, Guang-Feng

2013-10-01

325

Color segmentation using MDL clustering

NASA Astrophysics Data System (ADS)

This paper describes a procedure for segmentation of color face images. A cluster analysis algorithm uses a subsample of the input image color pixels to detect clusters in color space. The clustering program consists of two parts. The first part searches for a hierarchical clustering using the NIHC algorithm. The second part searches the resultant cluster tree for a level clustering having minimum description length (MDL). One of the primary advantages of the MDL paradigm is that it enables writing robust vision algorithms that do not depend on user-specified threshold parameters or other "magic numbers." This technical note describes an application of minimal length encoding in the analysis of digitized human face images at the NTT Human Interface Laboratories. We use MDL clustering to segment color images of human faces. For color segmentation we search for clusters in color space. Using only a subsample of points from the original face image, our clustering program detects color clusters corresponding to the hair, skin, and background regions in the image. Then a maximum likelihood classifier assigns the remaining pixels to each class. The clustering program tends to group small facial features such as the nostrils, mouth, and eyes together, but they can be separated from the larger classes through connected components analysis.

Wallace, Richard S.; Suenaga, Yasuhito

1991-02-01

326

Static and dynamic information organization with star clusters

In this paper we present a system for static and dynamic information organization and show our evaluations of this system on TREC data. We introduce the off-line and on-line star clustering algorithms for information organization. Our evaluation experiments show that the off-line star algorithm outperforms the single link and average link clustering algorithms. Since the star algorithm

Javed A. Aslam; Katya Pelekhov; Daniela Rus

1998-01-01
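
Illustrative sketch for the record above (details assumed, since the abstract is truncated): the off-line star algorithm is commonly described as a greedy cover of a thresholded similarity graph, where the highest-degree uncovered document becomes a star center and its neighbors become satellites. The cosine measure, the threshold name `sigma`, the tie-breaking rule, and the function names are assumptions made for this example.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length term-weight vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def star_clusters(vectors, sigma):
    """Return a list of (center_index, sorted_satellite_indices) pairs."""
    n = len(vectors)
    # Thresholded similarity graph: edge when similarity is at least sigma.
    nbrs = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if cosine(vectors[i], vectors[j]) >= sigma:
                nbrs[i].add(j)
                nbrs[j].add(i)
    marked = set()
    clusters = []
    # Visit vertices by decreasing degree (ties broken by index).
    for v in sorted(range(n), key=lambda i: (-len(nbrs[i]), i)):
        if v in marked:
            continue
        # An unmarked vertex becomes a star center; all neighbors join it.
        clusters.append((v, sorted(nbrs[v])))
        marked.add(v)
        marked |= nbrs[v]
    return clusters


docs = [(1.0, 0.0), (0.9, 0.1), (0.8, 0.2), (0.0, 1.0), (0.1, 0.9)]
print(star_clusters(docs, sigma=0.95))  # [(0, [1, 2]), (3, [4])]
```

Unlike single link or average link, this procedure does not take the number of clusters as input; it follows from the threshold, and a satellite may belong to more than one star.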

327

Model-based clustering and data transformations for gene expression data

Motivation: Clustering is a useful exploratory technique for the analysis of gene expression data. Many different heuristic clustering algorithms have been proposed in this context. Clustering algorithms based on probability models offer a principled alternative to heuristic algorithms. In particular, model-based clustering assumes that the data is generated by a finite mixture of underlying probability distributions such as multivariate normal

Ka Yee Yeung; Chris Fraley; A. Murua; Adrian E. Raftery; Walter L. Ruzzo

2001-01-01

328

Detecting Galaxy Clusters in the DLS and CARS: a Bayesian Cluster Finder

The detection of galaxy clusters in present and future surveys enables measuring mass-to-light ratios, clustering properties or galaxy cluster abundances and therefore, constraining cosmological parameters. We present a new technique for detecting galaxy clusters, which is based on the Matched Filter Algorithm from a Bayesian point of view. The method is able to determine the position, redshift and richness of

Begoña Ascaso; David Wittman; Narciso Benítez

2010-01-01

329

Swarm Intelligence in Text Document Clustering

Social animals or insects in nature often exhibit a form of emergent collective behavior. The research field that attempts to design algorithms or distributed problem-solving devices inspired by the collective behavior of social insect colonies is called Swarm Intelligence. Compared to traditional algorithms, swarm algorithms are usually flexible, robust, decentralized, and self-organized. These characteristics make swarm algorithms suitable for solving complex problems, such as document collection clustering. A major challenge in today's information society is that users are overwhelmed with information on any topic they search for. Fast and high-quality document clustering algorithms play an important role in helping users effectively navigate, summarize, and organize this overwhelming amount of information. In this chapter, we introduce three nature-inspired swarm intelligence clustering approaches for document clustering analysis. These clustering algorithms use stochastic and heuristic principles discovered from observing bird flocks, fish schools, and ant food foraging.

Cui, Xiaohui [ORNL; Potok, Thomas E [ORNL

2008-01-01

330

Segmentation of MR brain images using FCM improved by artificial bee colony (ABC) algorithm

Segmentation of medical images, particularly magnetic resonance images of the brain, is complex and is considered a huge challenge in image processing. Among the numerous algorithms presented in this context, the fuzzy C-mean (FCM) algorithm is widely used in MR image segmentation. Recently, researchers have introduced two new parameters in order to improve the performance of the FCM algorithm, which

M. Taherdangkoo; M. Yazdi; M. H. Rezvani

2010-01-01

331

IGroup: a web image search engine with semantic clustering of search results

In this demo, we present IGroup, a Web image search engine that organizes the search results into semantic clusters. Different from all existing Web image search results clustering algorithms that only cluster the top few images using visual or textual features, IGroup first identifies several query-related semantic clusters based on a key phrases extraction algorithm originally proposed for clustering general

Feng Jing; Changhu Wang; Yuhuan Yao; Kefeng Deng; Lei Zhang; Wei-ying Ma

2006-01-01

332

An Efficient Approach to Clustering in Large Multimedia Databases with Noise

Abstract Several clustering algorithms can be applied to clustering in large multimedia databases. The effectiveness and efficiency of the existing algorithms, however, is somewhat limited, since clustering in multimedia databases requires clustering high-dimensional feature vectors and since multimedia databases often contain large amounts of noise. In this paper, we therefore introduce a new algorithm to clustering in large

Alexander Hinneburg; Daniel A. Keim

1998-01-01

333

The applicability and effectiveness of cluster analysis

NASA Technical Reports Server (NTRS)

An insight into the characteristics which determine the performance of a clustering algorithm is presented. In order for the techniques which are examined to accurately cluster data, two conditions must be simultaneously satisfied. First, the data must have a particular structure, and second, the parameters chosen for the clustering algorithm must be correct. By examining the structure of the data from the Cl flight line, it is clear that no single set of parameters can be used to accurately cluster all the different crops. The effectiveness of either a noniterative or iterative clustering algorithm to accurately cluster data representative of the Cl flight line is questionable. Thus extensive a priori knowledge is required in order to use cluster analysis in its present form for applications like assisting in the definition of field boundaries and evaluating the homogeneity of a field. New or modified techniques are necessary for clustering to be a reliable tool.

Ingram, D. S.; Actkinson, A. L.

1973-01-01

334

Online Software for Clustering

NSDL National Science Digital Library

This metasite provides informal reviews and links (mainly taken from electronic mailing lists and newsgroups) to clustering software that is free on the Internet. The software is accessible by anonymous FTP, Gopher, or World Wide Web. Examples of links annotated here include LVQ_PAK for Learning Vector Quantization algorithms, Tooldiag for the analysis and visualization of sensorial data, and Fixed Point Cluster Analysis. The site is maintained by Fionn Murtagh, Associate Professor of Astronomy at Louis Pasteur University's Strasbourg Observatory, France. This site is worth browsing by scientists interested in cluster analysis techniques for a variety of disciplines.

Murtagh, Fionn.

335

NASA Astrophysics Data System (ADS)

Infrared thermography has been used increasingly as an effective non-destructive technique to detect cracks on metal surfaces. Due to many factors, an infrared thermal image has low definition compared to a visible image. The contrast between cracks and sound areas in different thermal image frames of a specimen varies greatly with the recording time. An accurate detection can only be obtained by looking through the whole thermal video, which is laborious. Moreover, the experience of the operator has a great influence on the accuracy of the detection result. In this paper, an infrared thermal image processing framework based on a superpixel algorithm is proposed to accomplish crack detection automatically. Two popular superpixel algorithms are compared and one of them is selected to generate superpixels in this application. Combined features of superpixels were selected from both the raw gray level image and the high-pass filtered image. Fuzzy c-means clustering is used to cluster superpixels in order to segment the infrared thermal image. Experimental results show that the proposed framework can automatically recognize cracks on metal surfaces in infrared thermal images.

Xu, Changhang; Xie, Jing; Chen, Guoming; Huang, Weiping

2014-11-01

336

Diametrical clustering for identifying anti-correlated gene clusters

Motivation: Clustering genes based upon their expression patterns allows us to predict gene function. Most existing clustering algorithms cluster genes together when their expression patterns show high positive correlation. However, it has been observed that genes whose expression patterns are strongly anti-correlated can also be functionally similar. Biologically, this is not unintuitive: genes responding to the

Inderjit S. Dhillon; Edward M. Marcotte; Usman Roshan

2003-01-01

337

Clustering Web Search Results Using Fuzzy Ants

Clustering Web Search Results Using Fuzzy Ants Steven Schockaert,* Martine De Cock, Chris Cornelis and Uncertainty Modelling Research Unit, Krijgslaan 281 (S9), B-9000 Gent, Belgium Algorithms for clustering Web existing approaches and illustrates how our algorithm can be applied to the problem of Web search results

Gent, Universiteit

338

A GMBCG Galaxy Cluster Catalog of 55,424 Rich Clusters from SDSS DR7

We present a large catalog of optically selected galaxy clusters from the application of a new Gaussian Mixture Brightest Cluster Galaxy (GMBCG) algorithm to SDSS Data Release 7 data. The algorithm detects clusters by identifying the red sequence plus Brightest Cluster Galaxy (BCG) feature, which is unique for galaxy clusters and does not exist among field galaxies. Red sequence clustering in color space is detected using an Error Corrected Gaussian Mixture Model. We run GMBCG on 8240 square degrees of photometric data from SDSS DR7 to assemble the largest ever optical galaxy cluster catalog, consisting of over 55,000 rich clusters across the redshift range 0.1 < z < 0.55. We present Monte Carlo tests of completeness and purity and perform cross-matching with X-ray clusters and with the maxBCG sample at low redshift. These tests indicate high completeness and purity across the full redshift range for clusters with 15 or more members.

Hao, Jiangang; /Fermilab; McKay, Timothy A.; /Michigan U.; Koester, Benjamin P.; /Chicago U.; Rykoff, Eli S.; /UC, Santa Barbara /LBL, Berkeley; Rozo, Eduardo; /Chicago U.; Annis, James; /Fermilab; Wechsler, Risa H.; /SLAC; Evrard, August; /Michigan U.; Siegel, Seth R.; /Michigan U.; Becker, Matthew; /Chicago U.; Busha, Michael; /SLAC; Gerdes, David; /Michigan U.; Johnston, David E.; /Fermilab; Sheldon, Erin; /Brookhaven

2011-08-22

339

Background Potentially inappropriate prescribing in older people is common in primary care and can result in increased morbidity, adverse drug events, hospitalizations and mortality. In Ireland, 36% of those aged 70 years or over received at least one potentially inappropriate medication, with an associated expenditure of over €45 million. The main objective of this study is to determine the effectiveness and acceptability of a complex, multifaceted intervention in reducing the level of potentially inappropriate prescribing in primary care. Methods/design This study is a pragmatic cluster randomized controlled trial, conducted in primary care (OPTI-SCRIPT trial), involving 22 practices (clusters) and 220 patients. Practices will be allocated to intervention or control arms using minimization, with intervention participants receiving a complex multifaceted intervention incorporating academic detailing, medicines review with web-based pharmaceutical treatment algorithms that provide recommended alternative treatment options, and tailored patient information leaflets. Control practices will deliver usual care and receive simple patient-level feedback on potentially inappropriate prescribing. Routinely collected national prescribing data will also be analyzed for nonparticipating practices, acting as a contemporary national control. The primary outcomes are the proportion of participant patients with potentially inappropriate prescribing and the mean number of potentially inappropriate prescriptions per patient. In addition, economic and qualitative evaluations will be conducted. Discussion This study will establish the effectiveness of a multifaceted intervention in reducing potentially inappropriate prescribing in older people in Irish primary care that is generalizable to countries with similar prescribing challenges. Trial registration Current controlled trials ISRCTN41694007 PMID:23497575

2013-01-01

340

Adding Semantics to Email Clustering

This paper presents a novel algorithm to cluster emails according to their contents and the sentence styles of their subject lines. In our algorithm, natural language processing techniques and frequent itemset mining techniques are utilized to automatically generate meaningful generalized sentence patterns (GSPs) from subjects of emails. Then we put forward a novel unsupervised approach which treats GSPs as pseudo

Hua Li; Dou Shen; Benyu Zhang; Zheng Chen; Qiang Yang

2006-01-01

341

Dynamic Cluster Management In Ad hoc Networks

Clustering in ad hoc networks provides significant support for implementation of QoS and security, by overcoming inherent network deficiencies (like lack of infrastructure etc.). In a clustered network architecture, the ad hoc network is divided into groups of nodes called clusters. The clusters are dynamically maintained and reconfigured using specific protocols and algorithms. In this paper, we describe

Puneet Sethi; Gautam Barua

342

Approximate clustering via the mountain method

We develop a simple and effective approach for approximate estimation of the cluster centers on the basis of the concept of a mountain function. We call the procedure the mountain method. It can be useful for obtaining the initial values of the clusters that are required by more complex cluster algorithms. It also can be used as a stand alone

R. R. Yager; D. P. Filev

1994-01-01

343

We study a system consisting of two different types of particles, with charge equal to 1 or q, interacting through a pure Coulomb potential and confined in a parabolic trap. The ground-state and metastable-state configurations of these classical point-charge particles with non-uniform charge have been calculated using a new genetic algorithm-based approach. The geometrical structures and structural phase transitions found by

G. Kamieniarz; P. Sobczak

2009-01-01

344

Regionalization of watersheds by hybrid-cluster analysis

NASA Astrophysics Data System (ADS)

Regionalization methods are often used in hydrology for regional trend analysis and frequency analysis of floods, low flows and other variables. During the last two decades considerable effort has gone into analysis and development of regionalization procedures. However, as no single procedure has been demonstrated to yield universally acceptable results, several methods of regionalization are in use. In this paper, three hybrid-clustering algorithms, which use partitional clustering procedure to identify groups of similar catchments by refining the clusters derived from agglomerative hierarchical clustering algorithms, are investigated to determine their effectiveness in regionalization. The hierarchical clustering algorithms used are single linkage, complete linkage and Ward's algorithms, while the partitional clustering algorithm used is the K-means algorithm. The effectiveness of the hybrid-cluster analysis in regionalization is investigated by using data from watersheds in Indiana, USA. Further, four cluster validity indices, namely cophenetic correlation coefficient, average silhouette width, Dunn's index and Davies-Bouldin index are tested to determine their effectiveness in identifying optimal partition provided by the clustering algorithms. The regions given by the clustering algorithms are, in general, not statistically homogeneous. The hybrid-cluster analysis is found to be useful in minimizing the effort needed to identify homogeneous regions. The hybrid of Ward's and K-means algorithms is recommended for use. The hybrid method provides enough flexibility and it offers prospects for improvement in regionalization studies.

Ramachandra Rao, A.; Srinivas, V. V.

2006-03-01
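
The hybrid idea described in this record (an agglomerative partition refined by K-means) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`ward_clusters`, `kmeans_refine`, `hybrid_cluster`) are hypothetical, the data are plain tuples of floats, and the Ward step uses the textbook merge cost n_a·n_b/(n_a+n_b)·||c_a − c_b||².

```python
def sqdist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    """Component-wise mean of a non-empty list of tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def ward_clusters(points, k):
    """Agglomerative clustering with Ward's criterion, stopped at k clusters."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                na, nb = len(clusters[i]), len(clusters[j])
                # Increase in within-cluster sum of squares if i and j merge.
                cost = na * nb / (na + nb) * sqdist(
                    centroid(clusters[i]), centroid(clusters[j]))
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

def kmeans_refine(points, centers, max_iter=100):
    """Lloyd-style K-means iterations started from the given centers."""
    centers = list(centers)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centers)), key=lambda c: sqdist(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster empties
                centers[c] = centroid(members)
    return labels, centers

def hybrid_cluster(points, k):
    """Seed K-means with the centroids of a k-cluster Ward partition."""
    seeds = [centroid(c) for c in ward_clusters(points, k)]
    return kmeans_refine(points, seeds)
```

Seeding K-means from the hierarchical partition removes the dependence on random initial centers, which is one plausible reading of why a hybrid would reduce the effort needed to reach usable regions; the abstract itself does not give this level of detail.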

345

Image segmentation using fuzzy LVQ clustering networks

NASA Technical Reports Server (NTRS)

In this note we formulate image segmentation as a clustering problem. Feature vectors extracted from a raw image are clustered into subregions, thereby segmenting the image. A fuzzy generalization of a Kohonen learning vector quantization (LVQ) which integrates the Fuzzy c-Means (FCM) model with the learning rate and updating strategies of the LVQ is used for this task. This network, which segments images in an unsupervised manner, is thus related to the FCM optimization problem. Numerical examples on photographic and magnetic resonance images are given to illustrate this approach to image segmentation.

Tsao, Eric Chen-Kuo; Bezdek, James C.; Pal, Nikhil R.

1992-01-01
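
For readers unfamiliar with the Fuzzy c-Means (FCM) model that this record and several others in the list build on, the following is a minimal sketch of plain FCM on one-dimensional data. It is not the fuzzy LVQ network of the paper; the function names, the fuzzifier default `m=2.0`, and the fixed iteration count are assumptions made for illustration.

```python
def memberships(points, centers, m):
    """FCM membership matrix: u[k][i] is the degree of point k in cluster i."""
    u = []
    for x in points:
        d = [abs(x - c) for c in centers]
        if 0.0 in d:
            # Point coincides with one or more centers: share membership among them.
            row = [1.0 if di == 0.0 else 0.0 for di in d]
        else:
            # u_i is proportional to d_i^(-2/(m-1)).
            row = [di ** (-2.0 / (m - 1.0)) for di in d]
        s = sum(row)
        u.append([v / s for v in row])
    return u

def fcm(points, centers, m=2.0, n_iter=50):
    """Alternate membership and center updates; returns (centers, memberships)."""
    centers = list(centers)
    for _ in range(n_iter):
        u = memberships(points, centers, m)
        # Each center is the mean of the data weighted by u^m.
        centers = [
            sum(u[k][i] ** m * points[k] for k in range(len(points)))
            / sum(u[k][i] ** m for k in range(len(points)))
            for i in range(len(centers))
        ]
    return centers, memberships(points, centers, m)
```

Unlike hard K-means, every point keeps a graded membership in every cluster, which is what makes the model attractive for pixels on region boundaries in segmentation tasks.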

346

, the Linde-Buzo-Gray (LBG) algorithm and information-theoretic clustering, which arise by special choices). Another widely used clustering algorithm with a similar scheme is the Linde-Buzo-Gray (LBG) algorithm

Ghosh, Joydeep

347

Characterizing cytoarchitecture is crucial for understanding brain functions and neural diseases. In neuroanatomy, it is an important task to accurately extract cell populations' centroids and contours. Recent advances have permitted imaging at single cell resolution for an entire mouse brain using the Nissl staining method. However, it is difficult to precisely segment numerous cells, especially those cells touching each other. As presented herein, we have developed an automated three-dimensional detection and segmentation method applied to the Nissl staining data, with the following two key steps: 1) concave points clustering to determine the seed points of touching cells; and 2) random walker segmentation to obtain cell contours. Also, we have evaluated the performance of our proposed method with several mouse brain datasets, which were captured with the micro-optical sectioning tomography imaging system, and the datasets include closely touching cells. Compared with traditional detection and segmentation methods, our approach shows promising detection accuracy and high robustness. PMID:25111442

Gong, Hui; Chen, Shangbin; Zhang, Bin; Ding, Wenxiang; Luo, Qingming; Li, Anan

2014-01-01

348

When children walk on their toes for no known reason, the condition is called Idiopathic Toe Walking (ITW). Assessing the true severity of ITW can be difficult because children can alter their gait while under observation in clinic. The ability to monitor the foot angle during daily life outside of clinic may improve the assessment of ITW. A foot-worn, battery-powered inertial sensing device has been designed to monitor patients' foot angle during daily activities. The monitor includes a 3-axis accelerometer, 2-axis gyroscope, and a low-power microcontroller. The device is necessarily small, with limited battery capacity and processing power. Therefore a high-accuracy but low-complexity inertial sensing algorithm is needed. This paper compares several low-complexity algorithms' aptitude for foot-angle measurement: accelerometer-only measurement, finite impulse response (FIR) and infinite impulse response (IIR) complementary filtering, and a new dynamic predict-correct style algorithm developed using fuzzy c-means clustering. A total of 11 subjects each walked 20 m with the inertial sensing device fixed to one foot: 10 m with normal gait and 10 m simulating toe walking. A cross-validation scheme was used to obtain a low-bias estimate of each algorithm's angle measurement accuracy. The new predict-correct algorithm achieved the lowest angle measurement error: <5° mean error during normal and toe walking. The IIR complementary filtering algorithm achieved almost as good accuracy with less computational complexity. These two algorithms seem to have good aptitude for the foot-angle measurement problem, and would be good candidates for use in a long-term monitoring device for toe-walking assessment. PMID:24050952

Chalmers, Eric; Le, Jonathan; Sukhdeep, Dulai; Watt, Joe; Andersen, John; Lou, Edmond

2014-01-01
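
The IIR complementary filtering mentioned in this record can be sketched generically as follows. This is a textbook first-order complementary filter, not the device firmware or the authors' code: the function names, the axis convention in `accel_angle`, and the blend weight `alpha=0.98` are assumptions chosen for illustration.

```python
import math

def accel_angle(ax, az):
    """Tilt angle in degrees implied by gravity on two accelerometer axes."""
    return math.degrees(math.atan2(ax, az))

def complementary_filter(gyro_rates, accels, dt, alpha=0.98, angle0=0.0):
    """First-order IIR complementary filter.

    gyro_rates: angular rates in deg/s; accels: (ax, az) pairs in g;
    dt: sample period in seconds; alpha: weight on the integrated gyro path.
    """
    angle = angle0
    out = []
    for rate, (ax, az) in zip(gyro_rates, accels):
        gyro_estimate = angle + rate * dt      # short-term: integrate the gyro
        accel_estimate = accel_angle(ax, az)   # long-term: direction of gravity
        angle = alpha * gyro_estimate + (1.0 - alpha) * accel_estimate
        out.append(angle)
    return out
```

The filter needs one multiply-add per sample and a single state variable, which is consistent with the abstract's point that it offers good accuracy at low computational cost on a small microcontroller.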

349

Nonparametric genetic clustering: comparison of validity indices

A variable-string-length genetic algorithm (GA) is used for developing a novel nonparametric clustering technique when the number of clusters is not fixed a-priori. Chromosomes in the same population may now have different lengths since they encode different number of clusters. The crossover operator is redefined to tackle the concept of variable string length. A cluster validity index is used as

Sanghamitra Bandyopadhyay; Ujjwal Maulik

2001-01-01

350

Using Star Clusters for Filtering Javed Aslam Katya Pelekhov Daniela Rus

Using Star Clusters for Filtering Javed Aslam Katya Pelekhov Daniela Rus Department of Computer to the filtering task. We use the on-line version of the star algorithm [JPR98, JPR99] as the clustering tool ... efficient algorithm for organizing static and dynamic information by topic using the star cluster algorithm. We do

Aslam, Javed

351

Using Star Clusters for Filtering Javed Aslam Katya Pelekhov Daniela Rus

Using Star Clusters for Filtering Javed Aslam Katya Pelekhov Daniela Rus Department of Computer to the filtering task. We use the on-line version of the star algorithm [JPR98, JPR99] as the clustering tool algorithm for organizing static and dynamic information by topic using the star cluster algorithm. We do

Aslam, Javed

352

Determining the volume of acute cerebral infarcts on magnetic resonance imaging has prognostic value. However, the semiautomatic method of segmentation is time-consuming and has high interrater variability. Using diffusion weighted imaging and apparent diffusion coefficient maps from patients with acute infarction within 10 days, we aimed to develop a fully automatic algorithm to measure infarct volume. It includes an unsupervised classification with fuzzy C-means clustering determination of the histographic distribution, defining self-adjusted intensity thresholds. The proposed method attained high agreement with the semiautomatic method, with similarity index 89.9 ± 6.5%, in detecting cerebral infarct lesions from 22 acute stroke patients. We demonstrated the accuracy of the proposed computer-assisted prompt segmentation method, which appeared promising to replace the laborious, time-consuming, and operator-dependent semiautomatic segmentation. PMID:24738080

Tsai, Jang-Zern; Chen, Yu-Wei; Wang, Kuo-Wei; Wu, Hsiao-Kuang; Lin, Yun-Yu; Lee, Ying-Ying; Chen, Chi-Jen; Lin, Huey-Juan; Smith, Eric Edward; Hsin, Yue-Loong

2014-01-01

353

SUBTRACTION-FREE COMPLEXITY, CLUSTER TRANSFORMATIONS, AND SPANNING TREES

SUBTRACTION-FREE COMPLEXITY, CLUSTER TRANSFORMATIONS, AND SPANNING TREES SERGEY FOMIN, DIMA GRIGORIEV, AND GLEB KOSHEVOY Abstract. Subtraction-free computational complexity is the version use cluster transformations to design efficient subtraction-free algorithms for computing Schur

Grigoriev, Dima

354

Automatic subspace clustering of high dimensional data for data mining applications

Data mining applications place special requirements on clustering algorithms including: the ability to find clusters embedded in subspaces of high dimensional data, scalability, end-user comprehensibility of the results, non-presumption of any canonical data distribution, and insensitivity to the order of input records. We present CLIQUE, a clustering algorithm that satisfies each of these requirements. CLIQUE identifies dense clusters in subspaces

Rakesh Agrawal; Johannes E. Gehrke; Dimitrios Gunopulos; Prabhakar Raghavan

1998-01-01

355

Genetic clustering for automatic evolution of clusters and application to image classification

In this article the searching capability of genetic algorithms has been exploited for automatically evolving the number of clusters as well as proper clustering of any data set. A new string representation, comprising both real numbers and the do not care symbol, is used in order to encode a variable number of clusters. The Davies–Bouldin index is used as a

Sanghamitra Bandyopadhyay; Ujjwal Maulik

2002-01-01
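
The Davies-Bouldin index named in this record (and in the watershed regionalization record above) can be computed as in the sketch below. It follows the common definition with Euclidean distance and mean point-to-centroid scatter; whether the paper uses exactly this variant is not stated in the truncated abstract, and the function name is hypothetical.

```python
import math

def davies_bouldin(clusters):
    """Davies-Bouldin index for a partition; lower values mean better separation.

    clusters: list of clusters, each a non-empty list of equal-length tuples.
    """
    cents = [
        tuple(sum(p[d] for p in c) / len(c) for d in range(len(c[0])))
        for c in clusters
    ]
    # Scatter: mean distance of a cluster's points to its centroid.
    scatter = [
        sum(math.dist(p, cen) for p in c) / len(c)
        for c, cen in zip(clusters, cents)
    ]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # Worst-case similarity of cluster i to any other cluster.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
            for j in range(k) if j != i
        )
    return total / k
```

Because the index needs only centroids and scatters, it can be evaluated for partitions with different numbers of clusters, which is what makes it usable as a selection criterion when the cluster count is itself being searched.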

356

Cosmography with Galaxy Clusters

In the present work we focus on future experiments using cluster abundance observations to constrain the Dark Energy equation of state parameter, w. To obtain tight constraints from this kind of experiment, a reliable sample of galaxy clusters must be obtained from deep and wide-field images. We therefore present the computational environment (2DPHOT) that allows us to build the galaxy catalog from the images and the Voronoi Tessellation cluster finding algorithm that we use to identify the galaxy clusters in those catalogs. To test our pipeline with data similar in quality to what will be gathered by future wide field surveys, we process images from the Deep fields obtained as part of the LEGACY Survey (four fields of one square degree each, in five bands, with depth up to r'=25). We test our cluster finder by determining the completeness and purity of the finder when applied to mock galaxy catalogs made for the Dark Energy Survey cluster finder comparison project by Risa Wechsler and Michael Busha. This procedure aims to understand the selection function of the underlying dark matter halos.

M. Soares-Santos; R. R. de Carvalho; F. La Barbera; P. A. A. Lopes; J. Annis

2008-10-21

357

NASA Astrophysics Data System (ADS)

In typical case 2 waters an accurate remote sensing retrieval of chlorophyll a (chla) is still challenging. There is a widespread understanding that universally applicable water constituent retrieval algorithms are currently not feasible, shifting the research focus to regionally specific implementations of powerful inversion methods. This study takes advantage of regionally specific chla algorithms, which were developed by the authors of this abstract in previous works, and the characteristics of the Medium Resolution Imaging Spectrometer (MERIS) in order to study harmful algal events in the optically complex waters of the Galician Rias (NW). Harmful algal events are a frequent phenomenon in this area, with direct and indirect impacts on the mussel production that constitutes a very important economic activity for the local community. More than 240 × 10^6 kg of mussels per year are produced in these highly productive upwelling systems. A MERIS archive covering nine years (2003-2012) was analysed using regionally specific chla algorithms. The latter were developed based on multilayer perceptron (MLP) artificial neural networks and fuzzy c-means (FCM) clustering techniques. FCM specifies zones (based on water-leaving reflectances) where the retrieval algorithms normally provide more reliable results. Monthly chla anomalies and other statistics were calculated for the nine-year MERIS archive. These results were then related to upwelling indices and other associated measurements to determine the driving forces for specific phytoplankton blooms. The distribution and changes of chla are also discussed.

Gonzalez Vilas, L.; Castro Fernandez, M.; Spyrakos, E.; Torres Palenzuela, J.

2013-08-01

358

Clustering of financial time series

NASA Astrophysics Data System (ADS)

This paper addresses the topic of classifying financial time series in a fuzzy framework, proposing two fuzzy clustering models both based on GARCH models. In general, clustering of financial time series, due to their peculiar features, needs the definition of suitable distance measures. To this end, the first fuzzy clustering model exploits the autoregressive representation of GARCH models and employs, in the framework of a partitioning around medoids algorithm, the classical autoregressive metric. The second fuzzy clustering model, also based on the partitioning around medoids algorithm, uses the Caiado distance, a Mahalanobis-like distance based on estimated GARCH parameters and covariances that takes into account the information about the volatility structure of time series. In order to illustrate the merits of the proposed fuzzy approaches, an application to the problem of classifying 29 time series of Euro exchange rates against international currencies is presented and discussed, also comparing the fuzzy models with their crisp version.

D'Urso, Pierpaolo; Cappelli, Carmela; Di Lallo, Dario; Massari, Riccardo

2013-05-01

359

We present an approach to the disambiguation of cluster labels that capitalizes on the notion of semantic similarity to assign WordNet senses to cluster labels. The approach provides interesting insights on how document clustering can provide the basis for developing a novel approach to word sense disambiguation.

Sanfilippo, Antonio P.; Calapristi, Augustin J.; Crow, Vernon L.; Hetzler, Elizabeth G.; Turner, Alan E.

2004-05-26

360

AMIC@: All MIcroarray Clusterings @ once.

The AMIC@ Web Server offers a light-weight multi-method clustering engine for microarray gene-expression data. AMIC@ is a highly interactive tool that stresses user-friendliness and robustness by adopting AJAX technology, thus allowing an effective interleaved execution of different clustering algorithms and inspection of results. Among the salient features AMIC@ offers are: (i) automatic file format detection, (ii) suggestions on the number of clusters using a variant of the stability-based method of Tibshirani et al., (iii) intuitive visual inspection of the data via heatmaps, and (iv) measurements of the clustering quality using cluster homogeneity. Large data sets can be processed efficiently by selecting algorithms (such as FPF-SB and k-Boost) specifically designed for this purpose. In the case of very large data sets, the user can opt for a batch-mode use of the system by means of the Clustering wizard, which runs all algorithms at once and delivers the results via email. AMIC@ is freely available and open to all users with no login requirement at the following URL http://bioalgo.iit.cnr.it/amica. PMID:18477631

Geraci, Filippo; Pellegrini, Marco; Renda, M Elena

2008-07-01

361

Performance of Energy Efficient Relaying for Cluster-Based Wireless Sensor Networks

This paper proposes a novel energy efficient data relaying scheme to improve energy efficiency for cluster-based wireless sensor networks (WSNs). In order to reduce the energy dissipation of transmitting sensing data at each sensor, the fixed clustering algorithm uniformly divides the sensing area into clusters where the cluster head is deployed at the center of the cluster area. Moreover, to

Yung-Fa Huang; Ching-Mu Chen; Tsair-Rong Chen; Jong-Shin Chen; Neng-Chung Wang

2007-01-01

362

SMART: Unique Splitting-While-Merging Framework for Gene Clustering

Successful clustering algorithms are highly dependent on parameter settings. The clustering performance degrades significantly unless parameters are properly set, and yet, it is difficult to set these parameters a priori. To address this issue, in this paper, we propose a unique splitting-while-merging clustering framework, named “splitting merging awareness tactics” (SMART), which does not require any a priori knowledge of either the number of clusters or even the possible range of this number. Unlike existing self-splitting algorithms, which over-cluster the dataset to a large number of clusters and then merge some similar clusters, our framework has the ability to split and merge clusters automatically during the process and produces the most reliable clustering results, by intrinsically integrating many clustering techniques and tasks. The SMART framework is implemented with two distinct clustering paradigms in two algorithms: competitive learning and finite mixture model. Nevertheless, within the proposed SMART framework, many other algorithms can be derived for different clustering paradigms. The minimum message length algorithm is integrated into the framework as the clustering selection criterion. The usefulness of the SMART framework and its algorithms is tested on demonstration datasets and simulated gene expression datasets. Moreover, two real microarray gene expression datasets are studied using this approach. Based on the performance of many metrics, all numerical results show that SMART is superior to the existing self-splitting algorithms and traditional algorithms it was compared with. Three main properties of the proposed SMART framework are summarized as: (1) needing no parameters dependent on the respective dataset or a priori knowledge about the datasets, (2) extendible to many different applications, (3) offering superior performance compared with counterpart algorithms. PMID:24714159

Fa, Rui; Roberts, David J.; Nandi, Asoke K.

2014-01-01

363

SPCA Assisted Correlation Clustering of Hyperspectral Imagery

NASA Astrophysics Data System (ADS)

In this study, correlation clustering is introduced to hyperspectral imagery for unsupervised classification. The main advantage of correlation clustering lies in its ability to simultaneously perform feature reduction and clustering. This algorithm also allows selection of different sets of features for different clusters. This framework provides an effective way to address the issues associated with the high dimensionality of the data. ORCLUS, a correlation clustering algorithm, is implemented and enhanced by making use of segmented principal component analysis (SPCA) instead of principal component analysis (PCA). Further, the original implementation of ORCLUS makes use of eigenvectors corresponding to the smallest eigenvalues, whereas in this study eigenvectors corresponding to the largest eigenvalues are used, as is traditionally done when PCA is used as a feature reduction tool. Experiments are conducted on three real hyperspectral images. Preliminary analysis of the algorithms on real hyperspectral imagery shows that ORCLUS is able to produce acceptable results.

Mehta, A.; Dikshit, O.

2014-11-01

364

Feature Clustering for Accelerating Parallel Coordinate Descent

We demonstrate an approach for accelerating calculation of the regularization path for L1 sparse logistic regression problems. We show the benefit of feature clustering as a preconditioning step for parallel block-greedy coordinate descent algorithms.

Scherrer, Chad; Tewari, Ambuj; Halappanavar, Mahantesh; Haglin, David J.

2012-12-06

365

Clustering Methods for Real Estate Portfolios

A clustering algorithm is applied to effective rents for twenty-one metropolitan U.S. office markets, and to twenty-two metropolitan markets using vacancy data. It provides support for the conjecture that there exist a few major

William N. Goetzmann; Susan M. Wachter

1995-01-01

366

On evaluating clustering procedures for use in classification

NASA Technical Reports Server (NTRS)

The problem of evaluating clustering algorithms and their respective computer programs for use in a preprocessing step for classification is addressed. In clustering for classification the probability of correct classification is suggested as the ultimate measure of accuracy on training data. A means of implementing this criterion and a measure of cluster purity are discussed. Examples are given. A procedure for cluster labeling that is based on cluster purity and sample size is presented.

Pore, M. D.; Moritz, T. E.; Register, D. T.; Yao, S. S.; Eppler, W. G. (principal investigators)

1979-01-01

367

Clustering with Transitive Distance and K-Means Duality

Recent spectral clustering methods are a popular and powerful technique for data clustering. These methods need to solve the eigenproblem whose computational complexity is O(n^3), where n is the number of data samples. In this paper, a non-eigenproblem based clustering method is proposed to deal with the clustering problem. Its performance is comparable to the spectral clustering algorithms but it

Chunjing Xu; Jianzhuang Liu; Xiaoou Tang

2007-01-01

368

Analyzing geographic clustered response

In the study of geographic disease clusters, an alternative to traditional methods based on rates is to analyze case locations on a transformed map in which population density is everywhere equal. Although the analyst's task is thereby simplified, the specification of the density equalizing map projection (DEMP) itself is not simple and continues to be the subject of considerable research. Here a new DEMP algorithm is described, which avoids some of the difficulties of earlier approaches. The new algorithm (a) avoids illegal overlapping of transformed polygons; (b) finds the unique solution that minimizes map distortion; (c) provides constant magnification over each map polygon; (d) defines a continuous transformation over the entire map domain; (e) defines an inverse transformation; (f) can accept optional constraints such as fixed boundaries; and (g) can use commercially supported minimization software. Work is continuing to improve computing efficiency and improve the algorithm. 21 refs., 15 figs., 2 tabs.

Merrill, D.W.; Selvin, S.; Mohr, M.S.

1991-08-01

369

NASA Technical Reports Server (NTRS)

Genetic algorithms are mathematical, highly parallel, adaptive search procedures (i.e., problem solving methods) based loosely on the processes of natural genetics and Darwinian survival of the fittest. Basic genetic algorithm concepts and applications are introduced, and results are presented from a project to develop a software tool that will enable the widespread use of genetic algorithm technology.

Wang, Lui; Bayer, Steven E.

1991-01-01

370

Using a Lagrangian-based approach, we present a more elegant derivation of the equations necessary for the variational optimization of the molecular orbitals (MOs) for the coupled-cluster doubles (CCD) method and second-order Møller-Plesset perturbation theory (MP2). These orbital-optimized theories are referred to as OO-CCD and OO-MP2 (or simply "OD" and "OMP2" for short), respectively. We also present an improved algorithm for orbital optimization in these methods. Explicit equations for response density matrices, the MO gradient, and the MO Hessian are reported both in spin-orbital and closed-shell spin-adapted forms. The Newton-Raphson algorithm is used for the optimization procedure using the MO gradient and Hessian. Further, orbital stability analyses are also carried out at correlated levels. The OD and OMP2 approaches are compared with the standard MP2, CCD, CCSD, and CCSD(T) methods. All these methods are applied to H2O, three diatomics, and the O4+ molecule. Results demonstrate that the CCSD and OD methods give nearly identical results for H2O and diatomics; however, in symmetry-breaking problems as exemplified by O4+, the OD method provides better results for vibrational frequencies. The OD method has further advantages over CCSD: its analytic gradients are easier to compute since there is no need to solve the coupled-perturbed equations for the orbital response, the computation of one-electron properties is easier because there is no response contribution to the particle density matrices, the variationally optimized orbitals can be readily extended to allow inactive orbitals, it avoids spurious second-order poles in its response function, and its transition dipole moments are gauge invariant. The OMP2 has these same advantages over canonical MP2, making it promising for excited state properties via linear response theory. The quadratically convergent orbital-optimization procedure converges quickly for OMP2, and provides molecular properties that are somewhat different from those of MP2 for most of the test cases considered (although they are similar for H2O). Bond lengths are somewhat longer, and vibrational frequencies somewhat smaller, for OMP2 compared to MP2. In the difficult case of O4+, results for several vibrational frequencies are significantly improved in going from MP2 to OMP2. PMID:21932872

Bozkaya, Uğur; Turney, Justin M; Yamaguchi, Yukio; Schaefer, Henry F; Sherrill, C David

2011-09-14

371

The Sloan Nearby Cluster Weak Lensing Survey

We describe and present initial results of a weak lensing survey of nearby (z ≲ 0.1) galaxy clusters in the Sloan Digital Sky Survey (SDSS). In this first study, galaxy clusters are selected from the SDSS spectroscopic galaxy cluster catalogs of Miller et al. and Berlind et al. We report a total of seven individual low-redshift cluster weak lensing measurements that include A2048, A1767, A2244, A1066, A2199, and two clusters specifically identified with the C4 algorithm. Our program of weak lensing of nearby galaxy clusters in the SDSS will eventually reach ~200 clusters, making it the largest weak lensing survey of individual galaxy clusters to date.

Kubo, Jeffrey M.; /Fermilab; Annis, James T.; /Fermilab; Hardin, Frances Mei; /Illinois Math. Sci. Acad.; Kubik, Donna; /Fermilab; Lawhorn, Kelsey; /Illinois Math. Sci. Acad.; Lin, Huan; /Fermilab; Nicklaus, Liana; /Illinois Math. Sci. Acad.; Nelson, Dylan; /UC, Berkeley; Reis, Ribamar Rondon de Rezende; /Fermilab; Seo, Hee-Jong; /Fermilab; Soares-Santos, Marcelle; /Fermilab /Inst. Geo. Astron., Havana /Sao Paulo U. /Fermilab

2009-08-01

372

Pattern Clustering Using a Swarm Intelligence Approach

NASA Astrophysics Data System (ADS)

Clustering aims at representing large datasets by a fewer number of prototypes or clusters. It brings simplicity in modeling data and thus plays a central role in the process of knowledge discovery and data mining. Data mining tasks, in these days, require fast and accurate partitioning of huge datasets, which may come with a variety of attributes or features. This, in turn, imposes severe computational requirements on the relevant clustering techniques. A family of bio-inspired algorithms, well-known as Swarm Intelligence (SI) has recently emerged that meets these requirements and has successfully been applied to a number of real world clustering problems. This chapter explores the role of SI in clustering different kinds of datasets. It finally describes a new SI technique for partitioning a linearly non-separable dataset into an optimal number of clusters in the kernel- induced feature space. Computer simulations undertaken in this research have also been provided to demonstrate the effectiveness of the proposed algorithm.

Das, Swagatam; Abraham, Ajith

373

M-cluster and X-ray: Two Methods for Multi-Jammer Localization in Wireless Sensor Networks

... algorithms: a multi-cluster localization (M-cluster) algorithm and an X-rayed jammed-area localization (X-ray) ... M-cluster and X-ray are efficient in localizing multiple jammers in a wireless sensor network with small errors ...

Zhu, Sencun

374

Acceleration of the LBG algorithm

A concentric spherical search technique is proposed to speed up the clustering process in VQ design. A linear data structure is incorporated into the LBG algorithm to keep and update the information about the proximity among the codewords. This proximity information can significantly reduce the number of candidate codewords to be the closest to a given training vector. An improved k-means type VQ design algorithm is proposed based on the new search technique and the supporting data structure. 10 refs.

Wu, Xiaolin; Guan, Lian [Univ. of Western Ontario, London, Ontario (Canada)]

1994-02-01

375

A K-Hop Cluster Maintaining Mechanism for Mobile Ad Hoc Networks

Multi-hop clustering algorithms such as the Max-Min heuristic improve the scalability of mobile ad hoc networks compared to single-hop clustering algorithms. However, few papers focus on maintaining the stability of multi-hop clusters, and without maintenance such clusters are prone to disruption from node mobility and large cluster size, which degrades routing performance. We propose a k-hop cluster maintaining mechanism for mobile ad

Xufeng Ma

2011-01-01

376

An effective particle swarm optimization method for data clustering

Data clustering analysis is generally applied to image processing, customer relationship management and product family construction. This paper applies the particle swarm optimization (PSO) algorithm to data clustering problems. Two reflex schemes are implemented in the PSO algorithm to improve its efficiency. The proposed methods were tested on seven datasets, and their performance is compared with those of PSO, K-means and two

I. W. Kao; C. Y. Tsai; Y. C. Wang

2007-01-01

377

An automatic method for estimating the content of intramuscular fat (IMF) in beef M. longissimus dorsi (LD) was developed using a sequence of image processing algorithms. To separate IMF particles within the LD muscle from the structural features of intermuscular fat surrounding the muscle, three image processing steps were developed, i.e. a bilateral filter for noise removal, kernel fuzzy c-means

Cheng-Jin Du; Da-Wen Sun; Patrick Jackman; Paul Allen

2008-01-01

378

Histamine headache; Headache - histamine; Migrainous neuralgia; Headache - cluster; Horton's headache ... be related to the body's sudden release of histamine (chemical in the body released during an allergy ...

379

LEARNING DECISION RULES USING A DISTRIBUTED EVOLUTIONARY ALGORITHM

A new parallel method for learning decision rules from databases by using an evolutionary algorithm is proposed. We describe an implementation of the EDRL-MD system on a cluster of multiprocessor machines connected by Fast Ethernet. Our approach consists of distributing the learning set among the processors of the cluster. The evolutionary algorithm uses a master-slave model to compute the fitness

WOJCIECH KWEDLO; MAREK KR

380

Temporal event clustering for digital photo collections

We present similarity-based methods to cluster digital photos by time and image content. The approach is general, unsupervised, and makes minimal assumptions regarding the structure or statistics of the photo collection. We present results for the algorithm based solely on temporal similarity, and jointly on temporal and content-based similarity. We also describe a supervised algorithm based on learning vector quantization.

Matthew L. Cooper; Jonathan Foote; Andreas Girgensohn; Lynn Wilcox

2003-01-01

381

MSEEC - A Multi Search Engine with Multiple Clustering

This paper presents a scalable architecture for a multi search engine for web documents with multiple cluster algorithms (MSEEC(12)). Querying search engines in the web may result in an overwhelming amount of matching documents. Clustering techniques are used to find a set of similar documents which are presented using a suitable cluster title. The scalable and modular architecture

Peter Hannappel; Reinhold Klapsing; Gustaf Neumann; Adrian Krug

1999-01-01

382

Constrained spectral clustering under a local proximity structure assumption

NASA Technical Reports Server (NTRS)

This work focuses on incorporating pairwise constraints into a spectral clustering algorithm. A new constrained spectral clustering method is proposed, as well as an active constraint acquisition technique and a heuristic for parameter selection. We demonstrate that our constrained spectral clustering method, CSC, works well when the data exhibits what we term local proximity structure.

Wagstaff, Kiri; Xu, Qianjun; des Jardins, Marie

2005-01-01

383

Clustering Very Large Data Sets with Principal Direction Divisive Partitioning

David Littau ... of very large data sets. We define a very large data set as a data set which will not fit into memory at once. Many clustering algorithms require that the data set be scanned many times during the clustering ...

Boley, Daniel

384

Multiscale iterative LBG clustering for SIMO channel identification

This paper deals with the problem of channel identification for single input multiple output (SIMO) slow fading channels using clustering algorithms. The received data vectors of the SIMO model are spread in clusters because of the AWGN. Each cluster is centered around the ideal channel output labels without noise. Starting from the Markov SIMO channel model, simultaneous maximum-likelihood estimation of

Fred Daneshgaran; Massimiliano Laddomada

2002-01-01

385

R/BHC: fast Bayesian hierarchical clustering for microarray data

BACKGROUND: Although the use of clustering methods has rapidly become one of the standard computational approaches in the literature of microarray gene expression data analysis, little attention has been paid to uncertainty in the results obtained. RESULTS: We present an R/Bioconductor port of a fast novel algorithm for Bayesian agglomerative hierarchical clustering and demonstrate its use in clustering gene expression

Richard S. Savage; Katherine A. Heller; Yang Xu; Zoubin Ghahramani; William M. Truman; Murray Grant; Katherine J. Denby; David L. Wild

2009-01-01

386

Gradient-based SOM clustering and visualisation methods

Data clustering has been a major research and application topic in data mining. The self-organizing map (SOM) has been widely applied to tasks including multivariate data visualization and clustering. SOM not only quantizes the input data but also enables visual display of data, a property that does not exist in most clustering algorithms. In the past decade many developments have

J. A. F. Costa; Hujun Yin

2010-01-01

387

When is Constrained Clustering Beneficial, and Why?

NASA Technical Reports Server (NTRS)

Several researchers have shown that constraints can improve the results of a variety of clustering algorithms. However, there can be a large variation in this improvement, even for a fixed number of constraints for a given data set. We present the first attempt to provide insight into this phenomenon by characterizing two constraint set properties: informativeness and coherence. We show that these measures can help explain why some constraint sets are more beneficial to clustering algorithms than others. Since they can be computed prior to clustering, these measures can aid in deciding which constraints to use in practice.

Wagstaff, Kiri L.; Basu, Sugato; Davidson, Ian

2006-01-01

388

Algorithms for Gene Clustering Analysis on Genomes

The increased availability of data in biological databases provides many opportunities for understanding biological processes through these data. As recent attention has shifted from sequence analysis to higher-level analysis of genes across...

Yi, Gang Man

2012-07-16

389

Learner Typologies Development Using OIndex and Data Mining Based Clustering Techniques

ERIC Educational Resources Information Center

This explorative data mining project used distance based clustering algorithm to study 3 indicators, called OIndex, of student behavioral data and stabilized at a 6-cluster scenario following an exhaustive explorative study of 4, 5, and 6 cluster scenarios produced by K-Means and TwoStep algorithms. Using principles in data mining, the study…

Luan, Jing

2004-01-01

390

Gene teams: a new formalization of gene clusters for comparative genomics

This paper describes an efficient algorithm based on a new concept called gene team for detecting conserved gene clusters among an arbitrary number of chromosomes. Within the clusters, neither the order of the genes nor their orientation need be conserved. In addition, insertion of foreign genes within the clusters is permitted to a user-defined extent. This algorithm has been implemented

Nicolas Luc; Jean-loup Risler; Anne Bergeron; Mathieu Raffinot

2003-01-01

391

Genetic algorithms (GAs) are search methods based on principles of natural selection and genetics (Fraser, 1957;Bremermann, 1958;Holland, 1975). We start with a brief introduction to simple genetic algorithms and associated terminology.

Kumara Sastry; David Goldberg; Graham Kendall

392

Acceleration of the LBG algorithm

A concentric spherical search technique is proposed to speed up the clustering process in VQ design. A linear data structure is incorporated into the LBG algorithm to keep and update the information about the proximity among the codewords. This proximity information can significantly reduce the number of candidate codewords to be the closest to a given training vector. An improved

Xiaolin Wu; Lian Guan

1994-01-01

393

Algorithm Engineering is concerned with the design, analysis, implementation, tuning, debugging and experimental evaluation of computer programs for solving algorithmic problems. It provides methodologies and tools for developing and engineering efficient algorithmic codes and aims at integrating and reinforcing traditional theoretical approaches for the design and analysis of algorithms and data structures.

Camil Demetrescu; Irene Finocchi; Giuseppe F. Italiano

394

NSDL National Science Digital Library

CSC 325. (MAT 325) Numerical Algorithms (3) Prerequisite: CSC 112 or 121, MAT 162. An introduction to the numerical algorithms fundamental to scientific computer work. Includes elementary discussion of error, polynomial interpolation, quadrature, linear systems of equations, solution of nonlinear equations and numerical solution of ordinary differential equations. The algorithmic approach and the efficient use of the computer are emphasized.

Tagliarini, Gene

2003-04-21

395

... clustering. (a) is a Gantt chart; (b) is the scheduled graph derived from the Gantt chart. ... The Gantt chart and scheduled DAG for the optimum clustering algorithm of the join set ...

Yang, Tao

396

Watershed-based unsupervised clustering M. Bicego, M. Cristani, A. Fusiello, and V. Murino

... clustering algorithm is presented, based on the watershed algorithm. The proposed approach defines a density ... is then performed using the well-known watershed algorithm, paying particular attention to the boundary situations ...

Cristani, Marco

397

Open Clusters versus Globular Clusters

NSDL National Science Digital Library

In this activity, students will describe similarities and differences between galactic star clusters and globular clusters. This is activity five in "The Hidden Lives of Galaxies" information and activity booklet. The booklet includes student worksheets and background information for the teacher.

398

Bipartite graph partitioning and data clustering

Many data types arising from data mining applications can be modeled as bipartite graphs; examples include terms and documents in a text corpus, customers and purchased items in market basket analysis, and reviewers and movies in a movie recommender system. In this paper, the authors propose a new data clustering method based on partitioning the underlying bipartite graph. The partition is constructed by minimizing a normalized sum of edge weights between unmatched pairs of vertices of the bipartite graph. They show that an approximate solution to the minimization problem can be obtained by computing a partial singular value decomposition (SVD) of the associated edge weight matrix of the bipartite graph. They point out the connection of their clustering algorithm to correspondence analysis used in multivariate analysis. They also briefly discuss the issue of assigning data objects to multiple clusters. In the experimental results, they apply their clustering algorithm to the problem of document clustering to illustrate its effectiveness and efficiency.

Zha, Hongyuan; He, Xiaofeng; Ding, Chris; Gu, Ming; Simon, Horst D.

2001-05-07
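
The SVD-based partitioning described in record 398 can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the degree scaling of the weight matrix, the use of the second singular vector pair, and the split at zero are assumptions borrowed from standard spectral bipartitioning, and the function name `bipartition` and the toy matrix are hypothetical. It relies on NumPy's `numpy.linalg.svd`.

```python
import numpy as np

def bipartition(W):
    """Split the row and column vertices of a bipartite graph into two groups.

    W is a nonnegative edge-weight matrix (e.g. documents x terms).
    Every row and column is assumed to have a positive weight sum.
    """
    W = np.asarray(W, dtype=float)
    d_rows = W.sum(axis=1)  # weighted degree of each row vertex
    d_cols = W.sum(axis=0)  # weighted degree of each column vertex

    # Assumed normalization: D_rows^{-1/2} W D_cols^{-1/2}.
    Wn = W / np.sqrt(np.outer(d_rows, d_cols))

    # A full SVD is used here for brevity; a partial SVD would suffice.
    U, s, Vt = np.linalg.svd(Wn, full_matrices=False)

    # The leading singular pair is trivial; the second pair carries the split.
    x = U[:, 1] / np.sqrt(d_rows)
    y = Vt[1, :] / np.sqrt(d_cols)
    return (x >= 0).astype(int), (y >= 0).astype(int)

# Hypothetical 4 documents x 4 terms matrix with two weakly linked blocks.
W = [[2.0, 1.0, 0.1, 0.1],
     [1.0, 2.0, 0.1, 0.1],
     [0.1, 0.1, 2.0, 1.0],
     [0.1, 0.1, 1.0, 2.0]]
row_labels, col_labels = bipartition(W)
```

The sign of a singular vector is arbitrary, so only the grouping matters, not which group is called 0 or 1. Splitting at zero is the simplest choice; other cut points along the vector could be searched instead.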

399

An evolutionary and visual framework for clustering of DNA microarray data.

This paper presents a case study to show the competence of our evolutionary and visual framework for cluster analysis of DNA microarray data. The proposed framework joins a genetic algorithm for hierarchical clustering with a set of visual components of cluster tasks given by a tool. The cluster visualization tool allows us to display different views of clustering results as a means of cluster visual validation. The results of the genetic algorithm for clustering have shown that it can find better solutions than the other methods for the selected data set. Thus, this shows the reliability of the proposed framework. PMID:24231146

Castellanos-Garzón, José A; Díaz, Fernando

2013-01-01

400

NASA Astrophysics Data System (ADS)

There are many examples of clustering in astronomy. Stars in our own galaxy are often seen as being gravitationally bound into tight globular or open clusters. The Solar System's Trojan asteroids cluster at the gravitational Lagrangian point in front of Jupiter’s orbit. On the largest of scales, we find gravitationally bound clusters of galaxies, the Virgo cluster (in the constellation of Virgo at a distance of ~50 million light years) being a prime nearby example. The Virgo cluster subtends an angle of nearly 8° on the sky and is known to contain over a thousand member galaxies. Galaxy clusters play an important role in our understanding of the Universe. Clusters exist at peaks in the three-dimensional large-scale matter density field. Their sky (2D) locations are easy to detect in astronomical imaging data and their mean galaxy redshifts (redshift is related to the third spatial dimension: distance) are often better (spectroscopically) and cheaper (photometrically) when compared with the entire galaxy population in large sky surveys. Photometric redshift (z) [Photometric techniques use the broad band filter magnitudes of a galaxy to estimate the redshift. Spectroscopic techniques use the galaxy spectra and emission/absorption line features to measure the redshift] determinations of galaxies within clusters are accurate to better than delta_z = 0.05 [7] and when studied as a cluster population, the central galaxies form a line in color-magnitude space (called the E/S0 ridgeline and visible in Figure 16.3) that contains galaxies with similar stellar populations [15]. The shape of this E/S0 ridgeline enables astronomers to measure the cluster redshift to within delta_z = 0.01 [23]. The most accurate cluster redshift determinations come from spectroscopy of the member galaxies, where only a fraction of the members need to be spectroscopically observed [25,42] to get an accurate redshift for the whole system.
If light traces mass in the Universe, then the locations of galaxy clusters will be at locations of the peaks in the true underlying (mostly) dark matter density field. Kaiser (1984) [19] called this the high-peak model, which we demonstrate in Figure 16.1. We show a two-dimensional representation of a density field created by summing plane-waves with a predetermined power and with random wave-vector directions. In the left panel, we plot only the largest modes, where we see the density peaks (black) and valleys (white) in the combined field. In the right panel, we allow for smaller modes. You can see that the highest density peaks in the left panel contain smaller-scale, but still high-density peaks. These are the locations of future galaxy clusters. The bottom panel shows just these cluster-scale peaks. As you can see, the peaks themselves are clustered, and instead of just one large high-density peak in the original density field (see the left panel), the smaller modes show that six peaks are "born" within the broader, underlying large-scale density modes. This exemplifies the "bias" or amplified structure that is traced by galaxy clusters [19]. Clusters are rare, easy to find, and their member galaxies provide good distance estimates. In combination with their amplified clustering signal described above, galaxy clusters are considered an efficient and precise tracer of the large-scale matter density field in the Universe. Galaxy clusters can also be used to measure the baryon content of the Universe [43]. They can be used to identify gravitational lenses [38] and map the distribution of matter in clusters. The number and spatial distribution of galaxy clusters can be used to constrain cosmological parameters, like the fraction of the energy density in the Universe due to matter (Omega_matter) or the variation in the density field on fixed physical scales (sigma_8) [26,33]. 
The individual clusters act as “Island Universes” and as such are laboratories where we can study the evolution of the properties of the cluster, like the hot, gaseous intra-cluster medium or shapes, colors, and star-

Miller, Christopher J.

2012-03-01

401

Dynamics of Clusters in Two-dimensional Potts Model

The dynamical behavior of clusters during relaxation is studied in the two-dimensional Potts model using a cluster algorithm. Average cluster size and cluster formation velocity are calculated on two different lattice sizes for different numbers of states during the initial stages of the Monte Carlo simulation. The dependence of these quantities on the order of the transition provides an efficient method to study the nature of the phase transitions occurring in similar models.

Yigit Gunduc; Meral Aydin

1996-05-11

402

Described herein is an apparatus and a method for producing atom clusters based on a gas discharge within a hollow cathode. The hollow cathode includes one or more walls. The one or more walls define a sputtering chamber within the hollow cathode and include a material to be sputtered. A hollow anode is positioned at an end of the sputtering chamber, and atom clusters are formed when a gas discharge is generated between the hollow anode and the hollow cathode.

Donchev, Todor I. (Urbana, IL); Petrov, Ivan G. (Champaign, IL)

2011-05-31

403

DNA Microarray Data Clustering Based on Temporal Variation: FCV with TSD Preclustering

... such as k-means or hierarchical clustering. The algorithm called fuzzy c-varieties clustering ... fuzzy c-varieties clustering, Saccharomyces cerevisiae microarray data ... and economics (Mitchell and Mulherin 1996), speech recognition (Tran and Wagner 2002, Oates 1999) and medicine ...

Rostock, Universität

404

Discovery of alternative clusterings is an important method for exploring complex datasets. It provides the capability for the user to view clustering behaviour from different perspectives and thus explore new hypotheses. However, current algorithms for alternative clustering have focused mainly on linear scenarios and may not perform as desired for datasets containing clusters with non linear shapes. Our goal in

Xuan-Hong Dang; James Bailey

2010-01-01

405

Previous studies have been conducted in gene expression profiling to identify groups of genes that characterize the colorectal carcinoma disease. Despite the success of previous attempts to identify groups of genes in the progression of the colorectal carcinoma disease, their methods either require subjective interpretation of the number of clusters or lack stability across different runs of the algorithms, all of which limits the usefulness of these methods. In this study, we propose an enhanced algorithm that provides stability and robustness in identifying differentially expressed genes in an expression profile analysis. Our proposed algorithm uses multiple clustering algorithms under the consensus clustering framework. The results of the experiment show that the robustness of our method provides a consistent structure of clusters, similar to the structure found in the previous study. Furthermore, our algorithm outperforms any single clustering algorithm in terms of the cluster quality score. PMID:21738330

Wahyudi, Gatot; Wasito, Ito; Melia, Tisha; Budi, Indra

2011-01-01

407

Complementary ensemble clustering of biomedical data

The rapidly growing availability of electronic biomedical data has increased the need for innovative data mining methods. Clustering in particular has been an active area of research in many different application areas, with existing clustering algorithms mostly focusing on one modality or representation of the data. Complementary ensemble clustering (CEC) is a recently introduced framework in which K-means is applied to a weighted, linear combination of the coassociation matrices obtained from separate ensemble clustering of different data modalities. The strength of CEC is its extraction of information from multiple aspects of the data when forming the final clusters. This study assesses the utility of CEC in biomedical data, which often have multiple data modalities, e.g., text and images, by applying CEC to two distinct biomedical datasets (PubMed images and radiology reports) that each have two modalities. Relative to five different clustering approaches based on the K-means algorithm, CEC exhibited equal or better performance in the metrics of micro-averaged precision and Normalized Mutual Information across both datasets. The reference methods included clustering of single modalities as well as ensemble clustering of separate and merged data modalities. Our experimental results suggest that CEC is equivalent to or more efficient than comparable K-means based clustering methods using either single or merged data modalities. PMID:23454721

Fodeh, Samah Jamal; Brandt, Cynthia; Luong, Thai Binh; Haddad, Ali; Schultz, Martin; Murphy, Terrence; Krauthammer, Michael

2013-01-01
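
The combination step described in record 407 can be illustrated with a small sketch. It builds one coassociation matrix per modality from several base clusterings and mixes the two with a single weight. The convex form `w * A + (1 - w) * B`, the names `coassociation` and `combine`, and the toy labelings are assumptions for illustration; the final K-means step on the combined matrix is omitted.

```python
def coassociation(labelings):
    """Fraction of base clusterings in which each pair of items shares a cluster."""
    n = len(labelings[0])
    runs = len(labelings)
    M = [[0.0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    M[i][j] += 1.0 / runs
    return M

def combine(A, B, w):
    """Weighted linear combination of two equally sized matrices (assumed convex)."""
    return [[w * a + (1.0 - w) * b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(A, B)]

# Hypothetical base clusterings of four items, two runs per modality.
text_runs = [[0, 0, 1, 1], [0, 0, 0, 1]]
image_runs = [[0, 1, 1, 1], [0, 0, 1, 1]]
C = combine(coassociation(text_runs), coassociation(image_runs), w=0.5)
```

Each row of `C` can be read as a feature vector describing how often an item co-clusters with every other item, which is what a subsequent K-means step would group.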

408

Automated variable weighting in k-means type clustering.

This paper proposes a k-means type clustering algorithm that can automatically calculate variable weights. A new step is introduced to the k-means clustering process to iteratively update variable weights based on the current partition of data, and a formula for weight calculation is proposed. The convergence theorem of the new clustering process is given. The variable weights produced by the algorithm measure the importance of variables in clustering and can be used in variable selection in data mining applications where large and complex real data are often involved. Experimental results on both synthetic and real data have shown that the new algorithm outperformed the standard k-means type algorithms in recovering clusters in data. PMID:15875789

Huang, Joshua Zhexue; Ng, Michael K; Rong, Hongqiang; Li, Zichen

2005-05-01

409

Hyperspectral image lossless compression algorithm based on adaptive band regrouping

Hyperspectral images have weak spatial correlation and strong spectral correlation. To exploit spectral redundancy sufficiently, they must be pre-processed. In this paper, a new algorithm for lossless compression of hyperspectral images based on adaptive band regrouping is proposed. First, the affinity propagation (AP) clustering algorithm is chosen for band regrouping according to interband correlation. Then a linear prediction algorithm

Mingyi He; Lin Bai; Yuchao Dai; Jing Zhang

2009-01-01

410

MODEL-BASED CLUSTERING FOR CLASSIFICATION OF AQUATIC SYSTEMS AND DIAGNOSIS OF ECOLOGICAL STRESS

Clustering approaches were developed using the classification likelihood, the mixture likelihood, and also using a randomization approach with a model index. Using a clustering approach based on the mixture and classification likelihoods, we have developed an algorithm that...

411

Pipelining Architecture of Indexing Using Agglomerative Clustering

NASA Astrophysics Data System (ADS)

The World Wide Web is an interlinked collection of billions of documents. Ironically, the huge size of this collection has become an obstacle to information retrieval. Users access information on the Internet through search engines, which retrieve pages from an indexer. This paper introduces a novel pipelining technique for structuring the core index-building system that substantially reduces index construction time, together with a clustering algorithm that partitions the set of documents into ordered clusters, so that documents within the same cluster are similar and are assigned close document identifiers. After documents are assigned to clusters, the method builds a hierarchy of indexes so that searching is efficient: it forms super clusters and then mega clusters automatically. The pipeline architecture builds the index in a space- and time-efficient manner. Search is directed from the higher levels of the index (higher-level clusters) to the lower levels, so that the user gets matching results quickly. Because each cluster is formed by merging only two clusters, the search at each lower level of the index is limited to two clusters, which keeps search time low.

Goyal, Deepika; Goyal, Deepti; Gupta, Parul

2010-11-01

412

Fast and effective text mining using linear-time document clustering

Clustering is a powerful technique for large-scale topic discovery from text. It involves two phases: first, feature extraction maps each document or record to a point in high-dimensional space, then clustering algorithms automatically group the points into a hierarchy of clusters. We describe an unsupervised, near-linear time text clustering system that offers a number of algorithm choices for each phase.

Bjornar Larsen; Chinatsu Aone

1999-01-01

413

In this study, we propose Hybrid Radial Basis Function Neural Networks (HRBFNNs) realized with the aid of fuzzy clustering method (Fuzzy C-Means, FCM) and polynomial neural networks. Fuzzy clustering used to form information granulation is employed to overcome a possible curse of dimensionality, while the polynomial neural network is utilized to build local models. Furthermore, genetic algorithm (GA) is exploited here to optimize the essential design parameters of the model (including fuzzification coefficient, the number of input polynomial fuzzy neurons (PFNs), and a collection of the specific subset of input PFNs) of the network. To reduce dimensionality of the input space, principal component analysis (PCA) is considered as a sound preprocessing vehicle. The performance of the HRBFNNs is quantified through a series of experiments, in which we use several modeling benchmarks of different levels of complexity (different number of input variables and the number of available data). A comparative analysis reveals that the proposed HRBFNNs exhibit higher accuracy in comparison to the accuracy produced by some models reported previously in the literature. PMID:25233483

Huang, Wei; Oh, Sung-Kwun; Pedrycz, Witold

2014-12-01

414

Star Clusters Sterrenstelsels & Kosmos

Star Clusters. Sterrenstelsels & Kosmos, deel 2. Types of star clusters. Open or Galactic Clusters: "Open" or Galactic clusters are low mass, relatively small (~10 pc diameter) clusters of stars in the Galactic disk containing stars. The Pleiades cluster is a good example of an open cluster

Weijgaert, Rien van de

415

The Fate of Dwarf Galaxies in Clusters and the Origin of Intracluster Stars. I. Isolated Clusters

The main goal of this paper is to compare the relative importance of destruction by tides, vs. destruction by mergers, in order to assess if tidal destruction of dwarf galaxies in clusters is a viable scenario for explaining the origin of intracluster stars. We have designed a simple algorithm for simulating the evolution of isolated clusters. The distribution of galaxies in the cluster is evolved using a direct gravitational N-body algorithm combined with a subgrid treatment of physical processes such as mergers, tidal disruption, and galaxy harassment. Using this algorithm, we have performed a total of 227 simulations. Our main results are (1) destruction of dwarf galaxies by mergers dominates over destruction by tides, and (2) the destruction of dwarf galaxies by tides is sufficient to explain the observed intracluster light in clusters.

Paramita Barai; William Brito; Hugo Martel

2008-07-02

416

An approach for improving K-means algorithm on market segmentation

The K-means algorithm is among the most popular clustering methods that group observations with similar characteristics or features together. It is widely used in many marketing applications, especially in cluster-based market segmentation. The K-means algorithm is implemented in various commercial software packages, such as SAS, SPSS and MATLAB, as a standard clustering function/tool. This note compares the performances of K-means algorithm

Haibo Wang; Da Huo; Jun Huang; Yaquan Xu; Lixia Yan; Wei Sun; Xianglu Li

2010-01-01

417

NASA Astrophysics Data System (ADS)

We report on results of recent, high resolution hydrodynamic simulations of the formation and evolution of X-ray clusters of galaxies carried out within a cosmological framework. We employ the highly accurate piecewise parabolic method (PPM) on fixed and adaptive meshes, which allows us to resolve the flow field in the intracluster gas. The excellent shock capturing and low numerical viscosity of PPM represent a substantial advance over previous studies using SPH. We find that in flat, hierarchical cosmological models, the ICM is in a turbulent state long after turbulence generated by the last major merger should have decayed away. Turbulent velocities are found to vary slowly with cluster radius, being $\sim 25\%$ of $\sigma_{vir}$ in the core, increasing to $\sim 60\%$ at the virial radius. We argue that more frequent minor mergers maintain the high level of turbulence found in the core, where dynamical times are short. Turbulent pressure support is thus significant throughout the cluster, and results in a somewhat cooler cluster ($T/T_{vir} \sim 0.8$) for its mass. Some implications of cluster turbulence are discussed.

Norman, Michael L.; Bryan, Greg L.

418

Auto-Clustering Using Particle Swarm Optimization and Bacterial Foraging

NASA Astrophysics Data System (ADS)

This paper presents a hybrid approach for clustering based on particle swarm optimization (PSO) and bacteria foraging algorithms (BFA). The new method AutoCPB (Auto-Clustering based on particle bacterial foraging) makes use of autonomous agents whose primary objective is to cluster chunks of data by using simplistic collaboration. Inspired by the advances in clustering using particle swarm optimization, we suggest further improvements. Moreover, we gathered standard benchmark datasets and compared our new approach against the standard K-means algorithm, obtaining promising results. Our hybrid mechanism outperforms earlier PSO-based approaches by using simplistic communication between agents.

Olesen, Jakob R.; Cordero H., Jorge; Zeng, Yifeng

419

Global optimization method using SLE and adaptive RBF based on fuzzy clustering

NASA Astrophysics Data System (ADS)

High-fidelity analysis models, which help improve design quality, are increasingly used in modern engineering design optimization problems. However, these models are so computationally expensive that the time required for design optimization is usually unacceptable. The efficiency of optimization involving high-fidelity analysis models can be improved by using surrogates to approximate the computationally expensive models, which can greatly reduce computation time. An efficient heuristic global optimization method using adaptive radial basis functions (RBF) based on fuzzy clustering (ARFC) is proposed. In this method, a novel algorithm for maximin Latin hypercube design using successive local enumeration (SLE) is employed to obtain sample points with good space-filling and projective uniformity properties, which substantially benefits metamodel accuracy. The RBF method is adopted for constructing the metamodels, and as the number of sample points increases, the approximation accuracy of the RBF is gradually enhanced. The fuzzy c-means clustering method is applied to identify the reduced attractive regions in the original design space. Numerical benchmark examples are used to validate the performance of ARFC. The results demonstrate that for most application examples the global optima are effectively obtained, and comparison with the adaptive response surface method (ARSM) shows that the proposed method can intuitively capture promising design regions and can efficiently identify the global or near-global design optimum. This method improves the efficiency and global convergence of the optimization, and provides a new optimization strategy for engineering design problems involving computationally expensive models.

Zhu, Huaguang; Liu, Li; Long, Teng; Zhao, Junfeng

2012-07-01

420

Reactive Collision Avoidance Algorithm

NASA Technical Reports Server (NTRS)

The reactive collision avoidance (RCA) algorithm allows a spacecraft to find a fuel-optimal trajectory for avoiding an arbitrary number of colliding spacecraft in real time while accounting for acceleration limits. In addition to spacecraft, the technology can be used for vehicles that can accelerate in any direction, such as helicopters and submersibles. In contrast to existing, passive algorithms that simultaneously design trajectories for a cluster of vehicles working to achieve a common goal, RCA is implemented onboard spacecraft only when an imminent collision is detected, and then plans a collision avoidance maneuver for only that host vehicle, thus preventing a collision in an off-nominal situation for which passive algorithms cannot. An example scenario for such a situation might be when a spacecraft in the cluster is approaching another one, but enters safe mode and begins to drift. Functionally, the RCA detects colliding spacecraft, plans an evasion trajectory by solving the Evasion Trajectory Problem (ETP), and then recovers after the collision is avoided. A direct optimization approach was used to develop the algorithm so it can run in real time. In this innovation, a parameterized class of avoidance trajectories is specified, and then the optimal trajectory is found by searching over the parameters. The class of trajectories is selected as bang-off-bang as motivated by optimal control theory. That is, an avoiding spacecraft first applies full acceleration in a constant direction, then coasts, and finally applies full acceleration to stop. The parameter optimization problem can be solved offline and stored as a look-up table of values. Using a look-up table allows the algorithm to run in real time. Given a colliding spacecraft, the properties of the collision geometry serve as indices of the look-up table that gives the optimal trajectory. For multiple colliding spacecraft, the set of trajectories that avoid all spacecraft is rapidly searched on-line. 
The optimal avoidance trajectory is implemented as a receding-horizon model predictive control law. Therefore, at each time step, the optimal avoidance trajectory is found and the first time step of its acceleration is applied. At the next time step of the control computer, the problem is re-solved and the new first time step is again applied. This continual updating allows the RCA algorithm to adapt to a colliding spacecraft that is making erratic course changes.

Scharf, Daniel; Acikmese, Behcet; Ploen, Scott; Hadaegh, Fred

2010-01-01
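
The bang-off-bang profile described in this abstract (full acceleration, coast, full acceleration to stop) can be sketched in one dimension. This is an illustrative simplification, not the RCA algorithm itself: the real method works with three-dimensional acceleration directions and a look-up table, and the names `bang_off_bang`, `a_max`, `t_burn` and `t_coast` are assumptions made for the example.

```python
def bang_off_bang(a_max, t_burn, t_coast, dt=0.01):
    """Integrate a 1-D bang-off-bang profile and return (position, velocity).

    Phases: accelerate at +a_max for t_burn, coast for t_coast,
    then accelerate at -a_max for t_burn to cancel the velocity.
    """
    steps_burn = round(t_burn / dt)
    steps_coast = round(t_coast / dt)
    profile = [a_max] * steps_burn + [0.0] * steps_coast + [-a_max] * steps_burn
    pos, vel = 0.0, 0.0
    for acc in profile:
        # Exact update for constant acceleration over one time step.
        pos += vel * dt + 0.5 * acc * dt * dt
        vel += acc * dt
    return pos, vel


def displacement_closed_form(a_max, t_burn, t_coast):
    """Closed-form displacement of the same profile: a * t_b * (t_b + t_c)."""
    return a_max * t_burn * (t_burn + t_coast)


final_pos, final_vel = bang_off_bang(a_max=2.0, t_burn=1.0, t_coast=3.0)
```

Because the displacement depends on only a few parameters, a table of precomputed optimal parameter values indexed by collision geometry, as the abstract describes, is a natural way to keep the online computation cheap.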

421

Improving clustering with metabolic pathway data

Background It is a common practice in bioinformatics to validate each group returned by a clustering algorithm through manual analysis, according to a-priori biological knowledge. This procedure helps in finding functionally related patterns and in proposing hypotheses for their behavior and the biological processes involved. This knowledge is therefore used only as a second step, after the data have been clustered according to their expression patterns. Thus, it could be very useful to improve the clustering of biological data by incorporating prior knowledge into the cluster formation itself, in order to enhance the biological value of the clusters. Results A novel training algorithm for clustering is presented, which evaluates the biological internal connections of the data points while the clusters are being formed. Within this training algorithm, the calculation of distances between data points and neuron centroids includes a new term based on information from well-known metabolic pathways. Standard self-organizing map (SOM) training and biologically-inspired SOM (bSOM) training were tested with two real data sets of transcripts and metabolites from the Solanum lycopersicum and Arabidopsis thaliana species. Classical data mining validation measures were used to evaluate the clustering solutions obtained by both algorithms. Moreover, a new measure that takes into account the biological connectivity of the clusters was applied. The results of bSOM show important improvements in convergence and performance for the proposed clustering method in comparison to standard SOM training, in particular from the application point of view. Conclusions Analyses of the clusters obtained with bSOM indicate that including biological information during training can increase the biological value of the clusters found with the proposed method. It is worth highlighting that this has effectively improved the results, which can simplify their further analysis. 
The algorithm is available as a web-demo at http://fich.unl.edu.ar/sinc/web-demo/bsom-lite/. The source code and the data sets supporting the results of this article are available at http://sourceforge.net/projects/sourcesinc/files/bsom. PMID:24717120

2014-01-01

422

Structural and energetic properties of sodium clusters

NASA Astrophysics Data System (ADS)

In this work we present results from a theoretical study on the properties of sodium clusters. The structures of the global total-energy minima have been determined using two different methods. With the parameterized density-functional tight-binding method (DFTB) combined with a genetic algorithm we investigated the properties of Na_N clusters with cluster size up to 20 atoms, and with our own Aufbau/Abbau algorithm together with the embedded-atom method (EAM) up to 60 atoms. The two sets of results from the independent calculations are compared and a stability function is studied as a function of the cluster size. Due to the electronic effects included in the DFTB method and the packing effects included in the EAM we have obtained different global-minima structures and different stability functions.

Tevekeliyska, V.; Dong, Y.; Springborg, M.; Grigoryan, V. G.

2007-07-01

423

Clustering with Missing Values: No Imputation Required

NASA Technical Reports Server (NTRS)

Clustering algorithms can identify groups in large data sets, such as star catalogs and hyperspectral images. In general, clustering methods cannot analyze items that have missing data values. Common solutions either fill in the missing values (imputation) or ignore the missing data (marginalization). Imputed values are treated as just as reliable as the truly observed data, but they are only as good as the assumptions used to create them. In contrast, we present a method for encoding partially observed features as a set of supplemental soft constraints and introduce the KSC algorithm, which incorporates constraints into the clustering process. In experiments on artificial data and data from the Sloan Digital Sky Survey, we show that soft constraints are an effective way to enable clustering with missing values.

Wagstaff, Kiri

2004-01-01

424

Dynamically weighted clustering with noise set

Motivation: Various clustering methods have been applied to microarray gene expression data for identifying genes with similar expression profiles. As the biological annotation data accumulated, more and more genes have been organized into functional categories. Functionally related genes may be regulated by common cellular signals, thus likely to be co-expressed. Consequently, utilizing the rapidly increasing functional annotation resources such as Gene Ontology (GO) to improve the performance of clustering methods is of great interest. On the opposite side of clustering, there are genes that have distinct expression profiles and do not co-express with other genes. Identification of these scattered genes could enhance the performance of clustering methods. Results: We developed a new clustering algorithm, Dynamically Weighted Clustering with Noise set (DWCN), which makes use of gene annotation information and allows for a set of scattered genes, the noise set, to be left out of the main clusters. We tested the DWCN method and contrasted its results with those obtained using several common clustering techniques on a simulated dataset as well as on two public datasets: the Stanford yeast cell-cycle gene expression data, and a gene expression dataset for a group of genetically different yeast segregants. Conclusion: Our method produces clusters with more consistent functional annotations and more coherent expression patterns than existing clustering techniques. Contact: yshen@stat.ucla.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:20007256

Shen, Yijing; Sun, Wei; Li, Ker-Chau

2010-01-01

425

Segmentation and enhancement of digital copies using a new fuzzy clustering method

NASA Astrophysics Data System (ADS)

In this paper, we introduce a new system to segment and label document images into text, halftoned images, and background using a modified fuzzy c-means (FCM) algorithm. Each pixel is assigned a feature vector, extracted from edge information and gray level distribution. The feature pattern is then assigned to a specific region using the modified fuzzy c-means approach. In the process of minimizing the new objective function, the neighborhood effect acts as a regularizer and biases the solution towards piecewise-homogeneous labelings. Such a regularization is useful in segmenting scans corrupted by scanner noise.

Ahmed, Mohamed Nooman; Cooper, Brian E.

2006-02-01
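
For orientation, the membership update of standard fuzzy c-means is sketched below on scalar gray levels. The neighborhood regularizer that this record's modified FCM adds is not reproduced, since the abstract does not give its form; the function name `fcm_memberships`, the fuzzifier `m=2.0` and the example centers are assumptions.

```python
def fcm_memberships(value, centers, m=2.0):
    """Standard FCM update: u_i = 1 / sum_j (d_i / d_j) ** (2 / (m - 1))."""
    dists = [abs(value - c) for c in centers]
    zeros = dists.count(0.0)
    if zeros:
        # The value coincides with one or more centers: share membership among them.
        return [1.0 / zeros if d == 0.0 else 0.0 for d in dists]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_j) ** exponent for d_j in dists) for d_i in dists]


# A pixel with gray level 0.25 between a dark (0.0) and a bright (1.0) class center.
memberships = fcm_memberships(0.25, [0.0, 1.0])
```

The memberships sum to one for every pixel; a regularized variant would additionally pull each pixel's memberships towards those of its neighbors.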

426

Collaborative Clustering for Sensor Networks

NASA Technical Reports Server (NTRS)

Traditionally, nodes in a sensor network simply collect data and then pass it on to a centralized node that archives, distributes, and possibly analyzes the data. However, analysis at the individual nodes could enable faster detection of anomalies or other interesting events, as well as faster responses such as sending out alerts or increasing the data collection rate. There is an additional opportunity for increased performance if individual nodes can communicate directly with their neighbors. Previously, a method was developed by which machine learning classification algorithms could collaborate to achieve high performance autonomously (without requiring human intervention). This method worked for supervised learning algorithms, in which labeled data is used to train models. The learners collaborated by exchanging labels describing the data. The new advance enables clustering algorithms, which do not use labeled data, to also collaborate. This is achieved by defining a new language for collaboration that uses pair-wise constraints to encode useful information for other learners. These constraints specify that two items must, or cannot, be placed into the same cluster. Previous work has shown that clustering with these constraints (in isolation) already improves performance. In the problem formulation, each learner resides at a different node in the sensor network and makes observations (collects data) independently of the other learners. Each learner clusters its data and then selects a pair of items about which it is uncertain and uses them to query its neighbors. The resulting feedback (a must and cannot constraint from each neighbor) is combined by the learner into a consensus constraint, and it then reclusters its data while incorporating the new constraint. A strategy was also proposed for cleaning the resulting constraint sets, which may contain conflicting constraints; this improves performance significantly. 
This approach has been applied to collaborative clustering of seismic and infrasonic data collected by the Mount Erebus Volcano Observatory in Antarctica. Previous approaches to distributed clustering cannot readily be applied in a sensor network setting, because they assume that each node has the same view of the data set. A view is the set of features used to represent each object. When a single data set is partitioned across several computational nodes, distributed clustering works; all objects have the same view. But when the data is collected from different locations, using different sensors, a more flexible approach is needed. This approach instead operates in situations where the data collected at each node has a different view (e.g., seismic vs. infrasonic sensors), but they observe the same events. This enables them to exchange information about the likely cluster membership relations between objects, even if they do not use the same features to represent the objects.

Wagstaff, Kiri L.; Green, Jillian; Lane, Terran

2011-01-01
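
The query-and-consensus step in this abstract can be sketched as follows. The abstract does not say how neighbor feedback is combined, so majority voting is an assumption here, as are the names `consensus_constraint` and `violates` and the toy data.

```python
def consensus_constraint(votes):
    """Combine neighbor feedback ('must' or 'cannot') by majority vote (assumed rule)."""
    must = votes.count("must")
    cannot = votes.count("cannot")
    if must > cannot:
        return "must"
    if cannot > must:
        return "cannot"
    return None  # tie: no constraint is added


def violates(labels, i, j, constraint):
    """True if the current cluster labels break the pairwise constraint on (i, j)."""
    same = labels[i] == labels[j]
    return (constraint == "must" and not same) or (constraint == "cannot" and same)


# A learner asks three neighbors about items 0 and 1 (toy feedback).
neighbor_votes = ["must", "must", "cannot"]
decision = consensus_constraint(neighbor_votes)

labels = [0, 1, 1, 0]  # the learner's current clustering
needs_recluster = violates(labels, 0, 1, decision)
```

Because only item pairs and constraint labels are exchanged, nodes with different feature sets can still share information, which is the point the abstract makes about differing views.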

427

NSDL National Science Digital Library

In this 4-minute video, educators can watch a teacher deliver a fourth grade lesson on using number sense, arrays, and simpler calculations to solve a more complex problem, called a cluster problem. Watch this teacher check for understanding of multiplicative distribution and the use of arrays.

TeacherLine

2012-01-01

428

Online Spectral Clustering on Network Streams

[Table of contents excerpt: Dirichlet Process Mixture Model; Bayesian Network Structure Inference with Text Priors; Sampling of Topic Trees; Sampling of Bayesian Network...] The typical agglomerative algorithms are the modularity measure [199] based approaches [71, 56, 73]. It was shown in the work of Frivanek et al. [159] that if the dendrogram (the hierarchical cluster tree) has more than 3 levels, all these clustering...

Jia, Yi

2012-12-31

429

Finding gene clusters for a replicated time course study

Background Finding genes that share similar expression patterns across samples is an important question that is frequently asked in high-throughput microarray studies. Traditional clustering algorithms such as K-means clustering and hierarchical clustering base gene clustering directly on the observed measurements and do not take into account the specific experimental design under which the microarray data were collected. A new model-based clustering method, the clustering of regression models method, takes into account the specific design of the microarray study and bases the clustering on how genes are related to sample covariates. It can find useful gene clusters for studies from complicated study designs such as replicated time course studies. Findings In this paper, we applied the clustering of regression models method to data from a time course study of yeast on two genotypes, wild type and YOX1 mutant, each with two technical replicates, and compared the clustering results with K-means clustering. We identified gene clusters that have similar expression patterns in wild type yeast, two of which were missed by K-means clustering. We further identified gene clusters whose expression patterns were changed in YOX1 mutant yeast compared to wild type yeast. Conclusions The clustering of regression models method can be a valuable tool for identifying genes that are coordinately transcribed by a common mechanism. PMID:24460656

2014-01-01

430

Fast approximate hierarchical clustering using similarity heuristics

Background Agglomerative hierarchical clustering (AHC) is a common unsupervised data analysis technique used in several biological applications. Standard AHC methods require that all pairwise distances between data objects must be known. With ever-increasing data sizes this quadratic complexity poses problems that cannot be overcome by simply waiting for faster computers. Results We propose an approximate AHC algorithm HappieClust which can output a biologically meaningful clustering of a large dataset more than an order of magnitude faster than full AHC algorithms. The key to the algorithm is to limit the number of calculated pairwise distances to a carefully chosen subset of all possible distances. We choose distances using a similarity heuristic based on a small set of pivot objects. The heuristic efficiently finds pairs of similar objects and these help to mimic the greedy choices of full AHC. Quality of approximate AHC as compared to full AHC is studied with three measures. The first measure evaluates the global quality of the achieved clustering, while the second compares biological relevance using enrichment of biological functions in every subtree of the clusterings. The third measure studies how well the contents of subtrees are conserved between the clusterings. Conclusion The HappieClust algorithm is well suited for large-scale gene expression visualization and analysis both on personal computers as well as public online web applications. The software is available from the URL PMID:18822115

Kull, Meelis; Vilo, Jaak

2008-01-01
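
A pivot-based similarity heuristic of the kind this abstract mentions can be sketched with the triangle inequality: for any pivot p, |d(x, p) - d(y, p)| is a lower bound on d(x, y), so pairs whose pivot distances differ strongly cannot be close. This is a simplified illustration on 1-D data, not the published HappieClust procedure; `candidate_pairs`, `eps` and the data are hypothetical.

```python
from itertools import combinations


def candidate_pairs(points, pivots, eps):
    """Return index pairs that the pivot lower bound cannot rule out as similar."""
    # Distances to pivots: len(points) * len(pivots) computations instead of all pairs.
    to_pivots = [[abs(x - p) for p in pivots] for x in points]
    pairs = []
    for i, j in combinations(range(len(points)), 2):
        lower_bound = max(abs(a - b) for a, b in zip(to_pivots[i], to_pivots[j]))
        if lower_bound <= eps:
            pairs.append((i, j))
    return pairs


points = [0.0, 0.1, 5.0, 5.2]
pairs = candidate_pairs(points, pivots=[0.0, 10.0], eps=0.5)
```

Exact distances then need to be computed only for the surviving pairs, which is how such a heuristic avoids the full quadratic distance matrix.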

431

FEMA: A Fast Expectation Maximization Algorithm based on Grid and PCA

The EM algorithm is an important unsupervised clustering algorithm, but it has several limitations. In this paper, we propose a fast EM algorithm (FEMA) to address the limitations of EM and enhance its efficiency. FEMA achieves low running time by combining principal component analysis (PCA), a grid cell expansion algorithm (GCEA) and a hierarchical cluster tree. PCA and multi-dimensional grid

Zhiwen Yu; Hau-san Wong

2006-01-01

432

K-Means-Type Algorithms: A Generalized Convergence Theorem and Characterization of Local Optimality

The K-means algorithm is a commonly used technique in cluster analysis. In this paper, several questions about the algorithm are addressed. The clustering problem is first cast as a nonconvex mathematical program. Then, a rigorous proof of the finite convergence of the K-means-type algorithm is given for any metric. It is shown that under certain conditions the algorithm may fail

Shokri Z. Selim; M. A. Ismail

1984-01-01
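
The finite-convergence behavior discussed in this abstract can be observed on a small example. The sketch below runs plain K-means with squared Euclidean distance on 1-D data and records the objective after each assignment step; it illustrates the monotone decrease only and is not the generalized K-means-type algorithm of the paper. The name `kmeans_1d` and the data are assumptions.

```python
def kmeans_1d(points, centers, max_iter=100):
    """Plain K-means on scalars; returns (centers, labels, objective history)."""
    history = []
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(len(centers)), key=lambda k: (p - centers[k]) ** 2)
                  for p in points]
        history.append(sum((p - centers[l]) ** 2 for p, l in zip(points, labels)))
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for k in range(len(centers)):
            members = [p for p, l in zip(points, labels) if l == k]
            new_centers.append(sum(members) / len(members) if members else centers[k])
        if new_centers == centers:
            break  # no change: a local optimum has been reached
        centers = new_centers
    return centers, labels, history


centers, labels, history = kmeans_1d([1.0, 2.0, 3.0, 10.0, 11.0, 12.0], [1.0, 2.0])
```

The recorded objective never increases from one iteration to the next, and because there are only finitely many partitions the loop must stop, although possibly at a local rather than global optimum.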

433

Classical algorithms from theoretical computer science arise time and again in practice. However, practical situations typically do not fit precisely into the traditional theoretical models. Additional necessary components ...

Nikolova, Evdokia Velinova

2009-01-01

434

Functionally related groups of neurons spatially cluster together in the brain. To detect groups of functionally related neurons from 3D histological data, we developed an objective clustering method that provides a description of detected cell clusters that is quantitative and amenable to visual exploration. This method is based on bubble clustering (Gupta and Gosh, 2008). Our implementation consists of three steps: (i) an initial data exploration for scanning the clustering parameter space; (ii) determination of the optimal clustering parameters; (iii) final clustering. We designed this algorithm to flexibly detect clusters without assumptions about the underlying cell distribution within a cluster or the number and sizes of clusters. We implemented the clustering function as an integral part of the neuroanatomical data visualization software Virtual RatBrain (http://www.virtualratbrain.org). We applied this algorithm to the basal forebrain cholinergic system, which consists of a diffuse but inhomogeneous population of neurons (Zaborszky, 1992). With this clustering method, we confirmed the inhomogeneity in this system, defined cell clusters, quantified and localized them, and determined the cell density within clusters. Furthermore, by applying the clustering method to multiple specimens from both rat and monkey, we found that cholinergic clusters display remarkable cross-species preservation of cell density within clusters. This method is efficient not only for clustering cell body distributions but may also be used to study other distributed neuronal structural elements, including synapses, receptors, dendritic spines and molecular markers. PMID:20398701

Nadasdy, Zoltan; Varsanyi, Peter; Zaborszky, Laszlo

2010-01-01

435

This paper discusses automated scheduling as it applies to complex domains such as factories, transportation, and communications systems. The window-constrained-packing problem is introduced as an ideal model of the scheduling trade offs. Specific algorithms are compared in terms of simplicity, speed, and accuracy. In particular, dispatch, look-ahead, and genetic algorithms are statistically compared on randomly generated job sets. The conclusion

William J. Wolfe; David Wood; Steve Sorensen

1996-01-01

436

Spatio-Temporal Clustering of Monitoring Network

NASA Astrophysics Data System (ADS)

Pakistan shows great diversity in seasonal variation across locations. Some areas are in deserts and remain very hot and dry; for example, the coastal areas along the Arabian Sea have very warm seasons and little rainfall. Other areas are mountainous, with very low temperatures and heavy rainfall, for instance the Karakoram ranges. The most important variables that have an impact on the climate are temperature, precipitation, humidity, wind speed and elevation. Furthermore, it is hard to find homogeneous regions in Pakistan with respect to climate variation. Identifying homogeneous regions in Pakistan can be useful in many respects: it can help in predicting the climate in the sub-regions and in optimizing the number of monitoring sites. In the earlier literature, no one has tried to identify homogeneous regions of Pakistan with respect to climate variation, and there are only a few papers on spatio-temporal clustering of monitoring networks. Steinhaus (1956) presented the well-known K-means clustering method, which identifies a predefined number of clusters by iteratively assigning observations to the nearest cluster centroid. Castro et al. (1997) developed a genetic heuristic algorithm to solve medoid-based clustering. Their method is based on genetic recombination upon random assorting recombination. The suggested method is appropriate for clustering attributes that have genetic characteristics. Sap and Awan (2005) presented a robust weighted kernel K-means algorithm incorporating spatial constraints for clustering climate data. The proposed algorithm can effectively handle noise, outliers and auto-correlation in the spatial data, for effective and efficient data analysis by exploring patterns and structures in the data. Soltani and Modarres (2006) used hierarchical and divisive cluster analysis to categorize patterns of rainfall in Iran. 
They only considered rainfall at twenty-eight monitoring sites and concluded that eight clusters existed. Soltani and Modarres (2006) classified the sites using only the average rainfall of the sites; they did not consider time replications or spatial coordinates. Kerby et al. (2007) proposed a spatial clustering method based on likelihood. They took account of the geographic locations through the variance-covariance matrix. Their proposed method works like hierarchical clustering methods. Moreover, it is inappropriate for time-replicated data and does not perform well for a large number of sites. Tuia et al. (2008) used scan statistics for identifying spatio-temporal clusters of fire sequences in the Tuscany region of Italy. The scan statistics clustering method was developed by Kulldorff et al. (1997) to detect spatio-temporal clusters in epidemiology and assess their significance. The proposed scan statistics method is used only for univariate discrete stochastic random variables. In this paper we make use of a very simple approach for spatio-temporal clustering which can create separable and homogeneous clusters. Most clustering methods are based on Euclidean distances. It is well known that geographic coordinates are spherical coordinates and that estimating Euclidean distances from spherical coordinates is inappropriate. As a transformation from geographic coordinates to rectangular (D-plane) coordinates we use the Lambert projection method. The partition around medoids clustering method is applied to the data including D-plane coordinates. Ordinary kriging is taken as a validity measure for the precipitation data. The kriging results for clusters are more accurate and have less variation compared to the precipitation data of the complete monitoring network. References Castro, V.E. and Murray, A.T. (1997). Spatial Clustering with Data Mining with Genetic Algorithms. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.56.8573 Kaufman, L. and Rousseeuw, P.J. (1990).
Finding Groups in Data: An Introduction to Cluster Analysis. Wiley Series in Probability and Mathematical Statistics, New York. Kulldorff, M. (1997). A spatial scan statistic. Commun. Stat.-Theor. Meth. 26(6)
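
The projection and clustering steps described in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract only says "Lambert projection", so the spherical Lambert azimuthal equal-area form is assumed here; the projection center (30 N, 70 E), the approximate site coordinates, and the function names `lambert_azimuthal` and `pam_swap` are all hypothetical. The clustering uses only the swap phase of partitioning around medoids with a naive start, and it ignores the climate variables and time replications that the paper includes.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical Earth is assumed


def lambert_azimuthal(lat_deg, lon_deg, lat0_deg, lon0_deg, radius=EARTH_RADIUS_KM):
    """Project (lat, lon) to planar (x, y) in km around the center (lat0, lon0)."""
    phi, lam = math.radians(lat_deg), math.radians(lon_deg)
    phi0, lam0 = math.radians(lat0_deg), math.radians(lon0_deg)
    cos_c = (math.sin(phi0) * math.sin(phi)
             + math.cos(phi0) * math.cos(phi) * math.cos(lam - lam0))
    k = math.sqrt(2.0 / (1.0 + cos_c))
    x = radius * k * math.cos(phi) * math.sin(lam - lam0)
    y = radius * k * (math.cos(phi0) * math.sin(phi)
                      - math.sin(phi0) * math.cos(phi) * math.cos(lam - lam0))
    return x, y


def total_cost(points, medoids):
    """Sum of distances from every point to its nearest medoid."""
    return sum(min(math.dist(p, points[m]) for m in medoids) for p in points)


def pam_swap(points, k):
    """Swap phase of PAM: start from the first k points, accept improving swaps."""
    medoids = list(range(k))
    best = total_cost(points, medoids)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for cand in range(len(points)):
                if cand in medoids:
                    continue
                trial = medoids[:i] + [cand] + medoids[i + 1:]
                cost = total_cost(points, trial)
                if cost < best - 1e-12:
                    medoids, best, improved = trial, cost, True
    return medoids, best


# Approximate, illustrative site coordinates (latitude, longitude in degrees).
sites = {
    "Karachi": (24.86, 67.01),
    "Hyderabad": (25.39, 68.37),
    "Gilgit": (35.92, 74.31),
    "Skardu": (35.30, 75.63),
}
xy = [lambert_azimuthal(lat, lon, 30.0, 70.0) for lat, lon in sites.values()]
medoids, cost = pam_swap(xy, k=2)
# Label each site with the index of its nearest medoid.
labels = [min(medoids, key=lambda m: math.dist(p, xy[m])) for p in xy]
```

With these four sites the two southern stations and the two northern stations end up in separate clusters. In a fuller analysis the feature vector for each site would also carry the climate variables, suitably scaled, alongside the projected coordinates.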

Hussain, I.; Pilz, J.

2009-04-01

437

NASA Astrophysics Data System (ADS)

The next generation of telescopes will acquire terabytes of image data on a nightly basis. Collectively, these large images will contain billions of interesting objects, which astronomers call sources. The astronomers' task is to construct a catalog detailing the coordinates and other properties of the sources. The source catalog is the primary data product for most telescopes and is an important input for testing new astrophysical theories, but to construct the catalog one must first detect the sources. Existing algorithms for catalog creation are effective at detecting sources, but do not have rigorous statistical error control. At the same time, there are several multiple testing procedures that provide rigorous error control, but they are not designed to detect sources that are aggregated over several pixels. We propose a family of techniques that do both, by providing rigorous statistical error control on the aggregate objects themselves rather than the pixels. We demonstrate the effectiveness of this approach on data from the Chandra X-ray Observatory Satellite. Our techniques effectively control the rate of false sources, yet still detect almost all of the sources detected by procedures that do not have such rigorous error control and that have the advantage of additional data in the form of follow-up observations, which may not be available for upcoming large telescopes. In fact, we even detect two new sources that were missed by previous studies. The statistical methods we develop can be extended to problems beyond Astronomy, as we will illustrate with examples from Neuroimaging. We examine a series of high-resolution functional Magnetic Resonance Imaging (fMRI) experiments in which the goal is to detect bands of neural activity in response to visual stimuli presented to subjects in an fMRI scanner.
We extend the methods developed for Astronomy problems so that we can detect two distinct types of activation regions in the brain with a probabilistic guarantee on the rate of falsely detected active regions. Additionally, we examine the more general field of clustering and develop a framework for clustering algorithms based around diffusion maps. Diffusion maps can be used to project high-dimensional data into a lower-dimensional space while preserving much of the structure in the data. We demonstrate how diffusion maps can be used to solve clustering problems and examine the influence of tuning parameters on the results. We introduce two novel methods: the self-tuning diffusion map, which replaces the global scaling parameter in the typical diffusion map framework with a local scaling parameter, and an algorithm for automatically selecting tuning parameters based on a cross-validation style score called prediction strength. The methods are tested on several example datasets.
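
As background for the "multiple testing procedures that provide rigorous error control" mentioned in the abstract above, the following is a minimal sketch of one standard pixel-level procedure, the Benjamini-Hochberg step-up rule for controlling the false discovery rate. It is offered only as an example of that general class; the abstract does not say which procedures the author builds on, and the aggregate-object methods proposed in the work are not reproduced here. The function name `benjamini_hochberg` and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return sorted indices of hypotheses rejected at false discovery rate q."""
    m = len(p_values)
    # Indices of the p-values in ascending order of p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Step-up rule: remember the largest rank whose p-value is under its threshold.
        if p_values[idx] <= rank / m * q:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])


# Hypothetical per-pixel p-values for the null hypothesis "background only".
pixel_p = [0.001, 0.20, 0.008, 0.74, 0.039, 0.60]
detected = benjamini_hochberg(pixel_p, q=0.05)
```

A rule of this kind treats each pixel as a separate test, which is the limitation the abstract points to: a source spread over several pixels is not handled as a single object.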

Friedenberg, David

2010-10-01

438

Knowledge Driven Dimension Reduction For Clustering

Ian Davidson, Department of Computer Science, University of California - Davis, davidson@cs.ucdavis.edu. Abstract: As A.I. algorithms are applied to more

Davidson, Ian

439

Segmentation of clustered nuclei with shape markers and marking function

We present a method to separate clustered nuclei from fluorescence microscopy cellular images, using shape markers and marking function in a watershed-like algorithm. Shape markers are extracted using an adaptive H-minima ...

Rajapakse, Jagath