An Improved Fuzzy c-Means Clustering Algorithm Based on Shadowed Sets and PSO
Zhang, Jian; Shen, Ling
2014-01-01
To organize the wide variety of data sets automatically and acquire accurate classification, this paper presents a modified fuzzy c-means algorithm (SP-FCM) based on particle swarm optimization (PSO) and shadowed sets to perform feature clustering. SP-FCM introduces the global search property of PSO to deal with the problem of premature convergence of conventional fuzzy clustering, utilizes vagueness balance property of shadowed sets to handle overlapping among clusters, and models uncertainty in class boundaries. This new method uses Xie-Beni index as cluster validity and automatically finds the optimal cluster number within a specific range with cluster partitions that provide compact and well-separated clusters. Experiments show that the proposed approach significantly improves the clustering effect. PMID:25477953
NASA Astrophysics Data System (ADS)
Wang, Deguang; Han, Baochang; Huang, Ming
Computer forensics is the technology of applying computer technology to access, investigate and analysis the evidence of computer crime. It mainly include the process of determine and obtain digital evidence, analyze and take data, file and submit result. And the data analysis is the key link of computer forensics. As the complexity of real data and the characteristics of fuzzy, evidence analysis has been difficult to obtain the desired results. This paper applies fuzzy c-means clustering algorithm based on particle swarm optimization (FCMP) in computer forensics, and it can be more satisfactory results.
A Genetic Algorithm That Exchanges Neighboring Centers for Fuzzy c-Means Clustering
ERIC Educational Resources Information Center
Chahine, Firas Safwan
2012-01-01
Clustering algorithms are widely used in pattern recognition and data mining applications. Due to their computational efficiency, partitional clustering algorithms are better suited for applications with large datasets than hierarchical clustering algorithms. K-means is among the most popular partitional clustering algorithm, but has a major…
Sequential Competitive Learning and the Fuzzy c-Means Clustering Algorithms.
Hathaway, Richard J.; Bezdek, James C.; Pal, Nikhil R.
1996-07-01
Several recent papers have described sequential competitive learning algorithms that are curious hybrids of algorithms used to optimize the fuzzy c-means (FCM) and learning vector quantization (LVQ) models. First, we show that these hybrids do not optimize the FCM functional. Then we show that the gradient descent conditions they use are not necessary conditions for optimization of a sequential version of the FCM functional. We give a numerical example that demonstrates some weaknesses of the sequential scheme proposed by Chung and Lee. And finally, we explain why these algorithms may work at times, by exhibiting the stochastic approximation problem that they unknowingly attempt to solve. Copyright 1996 Published by Elsevier Science Ltd PMID:12662563
NASA Astrophysics Data System (ADS)
Abdul-Nasir, Aimi Salihah; Mashor, Mohd Yusoff; Halim, Nurul Hazwani Abd; Mohamed, Zeehaida
2015-05-01
Malaria is a life-threatening parasitic infectious disease that corresponds for nearly one million deaths each year. Due to the requirement of prompt and accurate diagnosis of malaria, the current study has proposed an unsupervised pixel segmentation based on clustering algorithm in order to obtain the fully segmented red blood cells (RBCs) infected with malaria parasites based on the thin blood smear images of P. vivax species. In order to obtain the segmented infected cell, the malaria images are first enhanced by using modified global contrast stretching technique. Then, an unsupervised segmentation technique based on clustering algorithm has been applied on the intensity component of malaria image in order to segment the infected cell from its blood cells background. In this study, cascaded moving k-means (MKM) and fuzzy c-means (FCM) clustering algorithms has been proposed for malaria slide image segmentation. After that, median filter algorithm has been applied to smooth the image as well as to remove any unwanted regions such as small background pixels from the image. Finally, seeded region growing area extraction algorithm has been applied in order to remove large unwanted regions that are still appeared on the image due to their size in which cannot be cleaned by using median filter. The effectiveness of the proposed cascaded MKM and FCM clustering algorithms has been analyzed qualitatively and quantitatively by comparing the proposed cascaded clustering algorithm with MKM and FCM clustering algorithms. Overall, the results indicate that segmentation using the proposed cascaded clustering algorithm has produced the best segmentation performances by achieving acceptable sensitivity as well as high specificity and accuracy values compared to the segmentation results provided by MKM and FCM algorithms.
Ma, Li; Li, Yang; Fan, Suohai; Fan, Runzhu
2015-01-01
Image segmentation plays an important role in medical image processing. Fuzzy c-means (FCM) clustering is one of the popular clustering algorithms for medical image segmentation. However, FCM has the problems of depending on initial clustering centers, falling into local optimal solution easily, and sensitivity to noise disturbance. To solve these problems, this paper proposes a hybrid artificial fish swarm algorithm (HAFSA). The proposed algorithm combines artificial fish swarm algorithm (AFSA) with FCM whose advantages of global optimization searching and parallel computing ability of AFSA are utilized to find a superior result. Meanwhile, Metropolis criterion and noise reduction mechanism are introduced to AFSA for enhancing the convergence rate and antinoise ability. The artificial grid graph and Magnetic Resonance Imaging (MRI) are used in the experiments, and the experimental results show that the proposed algorithm has stronger antinoise ability and higher precision. A number of evaluation indicators also demonstrate that the effect of HAFSA is more excellent than FCM and suppressed FCM (SFCM). PMID:26649068
NASA Astrophysics Data System (ADS)
Schröter, Ingmar; Paasche, Hendik; Dietrich, Peter; Wollschläger, Ute
2014-05-01
Soil moisture is a key variable of the hydrological cycle. For example, it controls partitioning of rainfall into a runoff and an infiltration component and modulating physical, chemical and biological processes within the soil. For a better understanding of these processes, knowledge about the spatio-temporal distribution of soil moisture is indispensable. For the field to the small catchment scale with survey areas up to a few square kilometres, there are numerous new and innovative ground-based and remote sensing technologies available which have great potential to provide temporal information about soil moisture patterns. The aim of this work is to design an optimal soil moisture monitoring program for a low-mountain catchment in central Germany. In a first step, the fuzzy c-means clustering technique (Paasche et al., 2006) was used to identify structure-relevant patterns in a set of different terrain attributes derived from a DEM. Based on these patterns optimal measurement locations were identified to conduct in-situ soil moisture measurements. To consider different wetting and drying states in the catchment, several TDR measurement campaigns were conducted from April to October 2013. The TDR measurements have been integrated with the structure-relevant patterns obtained by the fuzzy cluster analysis to regionally predict soil moisture. In this study, we outline the conceptual framework of this integrative approach and present first results from field measurements. The results of the project are expected to improve the monitoring and understanding of small catchment-scale hydrological processes and to contribute to a better representation of soil moisture dynamics in physically-based, hydrological models operating at the field to the small catchment scale. Reference: Paasche, H., J. Tronicke, K. Holliger, A.G. Green, and H. Maurer (2006): Integration of diverse physical-property models: Subsurface zonation and petrophysical parameter estimation based on fuzzy c-means cluster analyses. Geophysics 71(3), H33-H44, doi:10.1190/1.2192927.
Generalized rough fuzzy c-means algorithm for brain MR image segmentation.
Ji, Zexuan; Sun, Quansen; Xia, Yong; Chen, Qiang; Xia, Deshen; Feng, Dagan
2012-11-01
Fuzzy sets and rough sets have been widely used in many clustering algorithms for medical image segmentation, and have recently been combined together to better deal with the uncertainty implied in observed image data. Despite of their wide spread applications, traditional hybrid approaches are sensitive to the empirical weighting parameters and random initialization, and hence may produce less accurate results. In this paper, a novel hybrid clustering approach, namely the generalized rough fuzzy c-means (GRFCM) algorithm is proposed for brain MR image segmentation. In this algorithm, each cluster is characterized by three automatically determined rough-fuzzy regions, and accordingly the membership of each pixel is estimated with respect to the region it locates. The importance of each region is balanced by a weighting parameter, and the bias field in MR images is modeled by a linear combination of orthogonal polynomials. The weighting parameter estimation and bias field correction have been incorporated into the iterative clustering process. Our algorithm has been compared to the existing rough c-means and hybrid clustering algorithms in both synthetic and clinical brain MR images. Experimental results demonstrate that the proposed algorithm is more robust to the initialization, noise, and bias field, and can produce more accurate and reliable segmentations. PMID:22088865
NASA Astrophysics Data System (ADS)
Akinin, M. V.; Akinina, N. V.; Klochkov, A. Y.; Nikiforov, M. B.; Sokolova, A. V.
2015-05-01
The report reviewed the algorithm fuzzy c-means, performs image segmentation, give an estimate of the quality of his work on the criterion of Xie-Beni, contain the results of experimental studies of the algorithm in the context of solving the problem of drawing up detailed two-dimensional maps with the use of unmanned aerial vehicles. According to the results of the experiment concluded that the possibility of applying the algorithm in problems of decoding images obtained as a result of aerial photography. The considered algorithm can significantly break the original image into a plurality of segments (clusters) in a relatively short period of time, which is achieved by modification of the original k-means algorithm to work in a fuzzy task.
Mandarin Digital Speech Recognition Based on a Chaotic Neural Network and Fuzzy C-means Clustering
Freeman, Walter J.
Mandarin Digital Speech Recognition Based on a Chaotic Neural Network and Fuzzy C-means Clustering model can perform digital speech recognition efficiently and the fuzzy c-means clustering has better performance than the hard k-means clustering. I. INTRODUCTION Digital speech recognition can be widely used
NASA Astrophysics Data System (ADS)
Kentel, Elcin
2010-05-01
Estimating future river flows is essential in water resources planning and management. Artificial neural network (ANN) models have been extensively utilized for rainfall-runoff modeling in the last decade. One of the major weaknesses of artificial neural network models is that they may fail to generate good estimates for extreme events, i.e. events that do not occur at all or often enough in the training set. If reliable estimates can be distinguished from unreliable ones, the former can be used with greater confidence in planning and management of the water resources. A fuzzy c-means algorithm is developed to cluster the estimates of the artificial neural networks into reliable and less-reliable river flow values (Kentel, 2009). The proposed algorithm is only tested for a single case (i.e. Güvenç River, Turkey) and produced promising results. In this study, applicability of the fuzzy c-means algorithm for different catchments in Turkey is tested. Three flow gauging stations are selected at four different catchments in mid and south Turkey. First, an ANN is developed for each gauging station; then fuzzy c-means algorithm is used together with the outputs of ANN models to test the success of the clustering algorithm in identifying input vectors that are susceptible to produce unreliable estimates. Results obtained for 12 gauging stations are used to identify the drawbacks of fuzzy c-means algorithm and to suggest modifications to improve the algorithm. Key words: Future river flow estimation; Artificial Neural Network; fuzzy c-means clustering Kentel, E. (2009) "Estimation of River Flow by Artificial Neural Networks and Identification of Input Vectors Susceptible to Producing Unreliable Flow Estimates," Journal of Hydrology, 375, 481-488.
A hybrid model for bankruptcy prediction using genetic algorithm, fuzzy c-means and mars
Martin, A; Saranya, G; Gayathri, P; Venkatesan, Prasanna
2011-01-01
Bankruptcy prediction is very important for all the organization since it affects the economy and rise many social problems with high costs. There are large number of techniques have been developed to predict the bankruptcy, which helps the decision makers such as investors and financial analysts. One of the bankruptcy prediction models is the hybrid model using Fuzzy C-means clustering and MARS, which uses static ratios taken from the bank financial statements for prediction, which has its own theoretical advantages. The performance of existing bankruptcy model can be improved by selecting the best features dynamically depend on the nature of the firm. This dynamic selection can be accomplished by Genetic Algorithm and it improves the performance of prediction model.
A Modified Fuzzy C-Means Algorithm For Collaborative LMAM and School of Mathematical Sciences,
Li, Tiejun
to the Netflix Prize data set and acquire comparable accuracy with that of MF. Categories and Subject Descriptors Keywords Collaborative Filtering, Clustering, Matrix Factorization, Fuzzy C-means, Netflix Prize 1 on servers or to redistribute to lists, requires prior specific permission and/or a fee. 2nd Netflix
Inhomogeneity correction for magnetic resonance images with fuzzy C-mean algorithm
Inhomogeneity correction for magnetic resonance images with fuzzy C-mean algorithm Xiang Li of New York at Stony Brook, USA Abstract: Segmentation of magnetic resonance (MR) images plays field, Magnetic Resonance Imaging. I. Introduction Magnetic Resonance Imaging (MRI) has several
Tsantis, Stavros; Spiliopoulos, Stavros; Karnabatidis, Dimitrios; Skouroliakou, Aikaterini; Hazle, John D.; Kagadis, George C. E-mail: George.Kagadis@med.upatras.gr
2014-07-15
Purpose: Speckle suppression in ultrasound (US) images of various anatomic structures via a novel speckle noise reduction algorithm. Methods: The proposed algorithm employs an enhanced fuzzy c-means (EFCM) clustering and multiresolution wavelet analysis to distinguish edges from speckle noise in US images. The edge detection procedure involves a coarse-to-fine strategy with spatial and interscale constraints so as to classify wavelet local maxima distribution at different frequency bands. As an outcome, an edge map across scales is derived whereas the wavelet coefficients that correspond to speckle are suppressed in the inverse wavelet transform acquiring the denoised US image. Results: A total of 34 thyroid, liver, and breast US examinations were performed on a Logiq 9 US system. Each of these images was subjected to the proposed EFCM algorithm and, for comparison, to commercial speckle reduction imaging (SRI) software and another well-known denoising approach, Pizurica's method. The quantification of the speckle suppression performance in the selected set of US images was carried out via Speckle Suppression Index (SSI) with results of 0.61, 0.71, and 0.73 for EFCM, SRI, and Pizurica's methods, respectively. Peak signal-to-noise ratios of 35.12, 33.95, and 29.78 and edge preservation indices of 0.94, 0.93, and 0.86 were found for the EFCM, SIR, and Pizurica's method, respectively, demonstrating that the proposed method achieves superior speckle reduction performance and edge preservation properties. Based on two independent radiologists’ qualitative evaluation the proposed method significantly improved image characteristics over standard baseline B mode images, and those processed with the Pizurica's method. Furthermore, it yielded results similar to those for SRI for breast and thyroid images significantly better results than SRI for liver imaging, thus improving diagnostic accuracy in both superficial and in-depth structures. Conclusions: A new wavelet-based EFCM clustering model was introduced toward noise reduction and detail preservation. The proposed method improves the overall US image quality, which in turn could affect the decision-making on whether additional imaging and/or intervention is needed.
Self-organization and clustering algorithms
NASA Technical Reports Server (NTRS)
Bezdek, James C.
1991-01-01
Kohonen's feature maps approach to clustering is often likened to the k or c-means clustering algorithms. Here, the author identifies some similarities and differences between the hard and fuzzy c-Means (HCM/FCM) or ISODATA algorithms and Kohonen's self-organizing approach. The author concludes that some differences are significant, but at the same time there may be some important unknown relationships between the two methodologies. Several avenues of research are proposed.
Segmentation of pomegranate MR images using spatial fuzzy c-means (SFCM) algorithm
NASA Astrophysics Data System (ADS)
Moradi, Ghobad; Shamsi, Mousa; Sedaaghi, M. H.; Alsharif, M. R.
2011-10-01
Segmentation is one of the fundamental issues of image processing and machine vision. It plays a prominent role in a variety of image processing applications. In this paper, one of the most important applications of image processing in MRI segmentation of pomegranate is explored. Pomegranate is a fruit with pharmacological properties such as being anti-viral and anti-cancer. Having a high quality product in hand would be critical factor in its marketing. The internal quality of the product is comprehensively important in the sorting process. The determination of qualitative features cannot be manually made. Therefore, the segmentation of the internal structures of the fruit needs to be performed as accurately as possible in presence of noise. Fuzzy c-means (FCM) algorithm is noise-sensitive and pixels with noise are classified inversely. As a solution, in this paper, the spatial FCM algorithm in pomegranate MR images' segmentation is proposed. The algorithm is performed with setting the spatial neighborhood information in FCM and modification of fuzzy membership function for each class. The segmentation algorithm results on the original and the corrupted Pomegranate MR images by Gaussian, Salt Pepper and Speckle noises show that the SFCM algorithm operates much more significantly than FCM algorithm. Also, after diverse steps of qualitative and quantitative analysis, we have concluded that the SFCM algorithm with 5×5 window size is better than the other windows.
NASA Astrophysics Data System (ADS)
Kesiko?lu, M. H.; Atasever, Ü. H.; Özkan, C.
2013-10-01
Change detection analyze means that according to observations made in different times, the process of defining the change detection occurring in nature or in the state of any objects or the ability of defining the quantity of temporal effects by using multitemporal data sets. There are lots of change detection techniques met in literature. It is possible to group these techniques under two main topics as supervised and unsupervised change detection. In this study, the aim is to define the land cover changes occurring in specific area of Kayseri with unsupervised change detection techniques by using Landsat satellite images belonging to different years which are obtained by the technique of remote sensing. While that process is being made, image differencing method is going to be applied to the images by following the procedure of image enhancement. After that, the method of Principal Component Analysis is going to be applied to the difference image obtained. To determine the areas that have and don't have changes, the image is grouped as two parts by Fuzzy C-Means Clustering method. For achieving these processes, firstly the process of image to image registration is completed. As a result of this, the images are being referred to each other. After that, gray scale difference image obtained is partitioned into 3 × 3 nonoverlapping blocks. With the method of principal component analysis, eigenvector space is gained and from here, principal components are reached. Finally, feature vector space consisting principal component is partitioned into two clusters using Fuzzy C-Means Clustering and after that change detection process has been done.
T1- and T2-weighted spatially constrained fuzzy c-means clustering for brain MRI segmentation
NASA Astrophysics Data System (ADS)
Despotovi?, Ivana; Goossens, Bart; Vansteenkiste, Ewout; Philips, Wilfried
2010-03-01
The segmentation of brain tissue in magnetic resonance imaging (MRI) plays an important role in clinical analysis and is useful for many applications including studying brain diseases, surgical planning and computer assisted diagnoses. In general, accurate tissue segmentation is a difficult task, not only because of the complicated structure of the brain and the anatomical variability between subjects, but also because of the presence of noise and low tissue contrasts in the MRI images, especially in neonatal brain images. Fuzzy clustering techniques have been widely used in automated image segmentation. However, since the standard fuzzy c-means (FCM) clustering algorithm does not consider any spatial information, it is highly sensitive to noise. In this paper, we present an extension of the FCM algorithm to overcome this drawback, by combining information from both T1-weighted (T1-w) and T2-weighted (T2-w) MRI scans and by incorporating spatial information. This new spatially constrained FCM (SCFCM) clustering algorithm preserves the homogeneity of the regions better than existing FCM techniques, which often have difficulties when tissues have overlapping intensity profiles. The performance of the proposed algorithm is tested on simulated and real adult MR brain images with different noise levels, as well as on neonatal MR brain images with the gestational age of 39 weeks. Experimental quantitative and qualitative segmentation results show that the proposed method is effective and more robust to noise than other FCM-based methods. Also, SCFCM appears as a very promising tool for complex and noisy image segmentation of the neonatal brain.
NASA Astrophysics Data System (ADS)
Nasseri, Aynur; Jafar Mohammadzadeh, Mohammad; Hashem Tabatabaei Raeisi, S.
2015-04-01
This paper deals with the application of the ant colony algorithm (AC) to a seismic dataset from Dezful Embayment in the southwest region of Iran. The objective of the approach is to generate an accurate representation of faults and discontinuities to assist in pertinent matters such as well planning and field optimization. The AC analyzed all spatial discontinuities in the seismic attributes from which features were extracted. True fault information from the attributes was detected by many artificial ants, whereas noise and the remains of the reflectors were eliminated. Furthermore, the fracture enhancement procedure was conducted by three steps on seismic data of the area. In the first step several attributes such as chaos, variance/coherence and dip deviation were taken into account; the resulting maps indicate high-resolution contrast for the variance attribute. Subsequently, the enhancement of spatial discontinuities was performed and finally elimination of the noise and remains of non-faulting events was carried out by simulating the behavior of ant colonies. After considering stepwise attribute optimization, focusing on chaos and variance in particular, an attribute fusion was generated and used in the ant colony algorithm. The resulting map displayed the highest performance in feature detection along the main structural feature trend, confined to a NW-SE direction. Thus, the optimized attribute fusion might be used with greater confidence to map the structural feature network with more accuracy and resolution. In order to assess the performance of the AC in feature detection, and cross validate the reliability of the method used, fuzzy c-means clustering (FCMC) was employed for the same dataset. Comparing the maps illustrates the effectiveness and preference of the AC approach due to its high resolution contrast for structural feature detection compared to the FCMC method. Accordingly, 3D planes of discontinuity determined spatial distribution of fractures in the field in order to assist well planning. Results revealed that the high impedance location probability related to an area in the vicinity of the faults, whilst low impedance location probably could indicate zones of high permeability which indicate flow conduits. Analysis under the present study suggests that the orientation and magnitude of fractures exhibiting the main trend of NW-SE in Dezful Embayment is more susceptible to stimulation and is more likely to open for fluid flow.
Creating Personalised Energy Plans: From Groups to Individuals using Fuzzy C Means Clustering
Aickelin, Uwe
, and the impact of climate change on altering electricity demand and the greater occurrence of extreme weather- hold is assigned to one cluster/group/stereotypical profile but, in reality, may show aspects of many
Hassan, Mehdi; Chaudhry, Asmatullah; Khan, Asifullah; Iftikhar, M Aksam
2014-02-01
In this paper, a robust method is proposed for segmentation of medical images by exploiting the concept of information gain. Medical images contain inherent noise due to imaging equipment, operating environment and patient movement during image acquisition. A robust medical image segmentation technique is thus inevitable for accurate results in subsequent stages. The clustering technique proposed in this work updates fuzzy membership values and cluster centroids based on information gain computed from the local neighborhood of a pixel. The proposed approach is less sensitive to noise and produces homogeneous clustering. Experiments are performed on medical and non-medical images and results are compared with state of the art segmentation approaches. Analysis of visual and quantitative results verifies that the proposed approach outperforms other techniques both on noisy and noise free images. Furthermore, the proposed technique is used to segment a dataset of 300 real carotid artery ultrasound images. A decision system for plaque detection in the carotid artery is then proposed. Intima media thickness (IMT) is measured from the segmented images produced by the proposed approach. A feature vector based on IMT values is constructed for making decision about the presence of plaque in carotid artery using probabilistic neural network (PNN). The proposed decision system detects plaque in carotid artery images with high accuracy. Finally, effect of the proposed segmentation technique has also been investigated on classification of carotid artery ultrasound images. PMID:24239296
Hierarchical modularization of biochemical pathways using fuzzy-c means clustering.
de Luis Balaguer, Maria A; Williams, Cranos M
2014-08-01
Biological systems that are representative of regulatory, metabolic, or signaling pathways can be highly complex. Mathematical models that describe such systems inherit this complexity. As a result, these models can often fail to provide a path toward the intuitive comprehension of these systems. More coarse information that allows a perceptive insight of the system is sometimes needed in combination with the model to understand control hierarchies or lower level functional relationships. In this paper, we present a method to identify relationships between components of dynamic models of biochemical pathways that reside in different functional groups. We find primary relationships and secondary relationships. The secondary relationships reveal connections that are present in the system, which current techniques that only identify primary relationships are unable to show. We also identify how relationships between components dynamically change over time. This results in a method that provides the hierarchy of the relationships among components, which can help us to understand the low level functional structure of the system and to elucidate potential hierarchical control. As a proof of concept, we apply the algorithm to the epidermal growth factor signal transduction pathway, and to the C3 photosynthesis pathway. We identify primary relationships among components that are in agreement with previous computational decomposition studies, and identify secondary relationships that uncover connections among components that current computational approaches were unable to reveal. PMID:24196983
Wang, Junbai; Bø, Trond Hellem; Jonassen, Inge; Myklebost, Ola; Hovig, Eivind
2003-01-01
Background Using DNA microarrays, we have developed two novel models for tumor classification and target gene prediction. First, gene expression profiles are summarized by optimally selected Self-Organizing Maps (SOMs), followed by tumor sample classification by Fuzzy C-means clustering. Then, the prediction of marker genes is accomplished by either manual feature selection (visualizing the weighted/mean SOM component plane) or automatic feature selection (by pair-wise Fisher's linear discriminant). Results The proposed models were tested on four published datasets: (1) Leukemia (2) Colon cancer (3) Brain tumors and (4) NCI cancer cell lines. The models gave class prediction with markedly reduced error rates compared to other class prediction approaches, and the importance of feature selection on microarray data analysis was also emphasized. Conclusions Our models identify marker genes with predictive potential, often better than other available methods in the literature. The models are potentially useful for medical diagnostics and may reveal some insights into cancer classification. Additionally, we illustrated two limitations in tumor classification from microarray data related to the biology underlying the data, in terms of (1) the class size of data, and (2) the internal structure of classes. These limitations are not specific for the classification models used. PMID:14651757
Tang, Jing Rui; Mat Isa, Nor Ashidi; Ch’ng, Ewe Seng
2015-01-01
Despite the effectiveness of Pap-smear test in reducing the mortality rate due to cervical cancer, the criteria of the reporting standard of the Pap-smear test are mostly qualitative in nature. This study addresses the issue on how to define the criteria in a more quantitative and definite term. A negative Pap-smear test result, i.e. negative for intraepithelial lesion or malignancy (NILM), is qualitatively defined to have evenly distributed, finely granular chromatin in the nuclei of cervical squamous cells. To quantify this chromatin pattern, this study employed Fuzzy C-Means clustering as the segmentation technique, enabling different degrees of chromatin segmentation to be performed on sample images of non-neoplastic squamous cells. From the simulation results, a model representing the chromatin distribution of non-neoplastic cervical squamous cell is constructed with the following quantitative characteristics: at the best representative sensitivity level 4 based on statistical analysis and human experts’ feedbacks, a nucleus of non-neoplastic squamous cell has an average of 67 chromatins with a total area of 10.827?m2; the average distance between the nearest chromatin pair is 0.508?m and the average eccentricity of the chromatin is 0.47. PMID:26560331
NASA Astrophysics Data System (ADS)
An, Yu; Liu, Jie; Ye, Jinzuo; Mao, Yamin; Yang, Xin; Jiang, Shixin; Chi, Chongwei; Tian, Jie
2015-03-01
As an important molecular imaging modality, fluorescence molecular imaging (FMI) has the advantages of high sensitivity, low cost and ease of use. By labeling the regions of interest with fluorophore, FMI can noninvasively obtain the distribution of fluorophore in-vivo. However, due to the fact that the spectrum of fluorescence is in the section of the visible light range, there are mass of autofluorescence on the surface of the bio-tissues, which is a major disturbing factor in FMI. Meanwhile, the high-level of dark current for charge-coupled device (CCD) camera and other influencing factor can also produce a lot of background noise. In this paper, a novel method for image denoising of FMI based on fuzzy C-Means clustering (FCM) is proposed, because the fluorescent signal is the major component of the fluorescence images, and the intensity of autofluorescence and other background signals is relatively lower than the fluorescence signal. First, the fluorescence image is smoothed by sliding-neighborhood operations to initially eliminate the noise. Then, the wavelet transform (WLT) is performed on the fluorescence images to obtain the major component of the fluorescent signals. After that, the FCM method is adopt to separate the major component and background of the fluorescence images. Finally, the proposed method was validated using the original data obtained by in vivo implanted fluorophore experiment, and the results show that our proposed method can effectively obtain the fluorescence signal while eliminate the background noise, which could increase the quality of fluorescence images.
Keller, Brad M.; Nathan, Diane L.; Wang Yan; Zheng Yuanjie; Gee, James C.; Conant, Emily F.; Kontos, Despina
2012-08-15
Purpose: The amount of fibroglandular tissue content in the breast as estimated mammographically, commonly referred to as breast percent density (PD%), is one of the most significant risk factors for developing breast cancer. Approaches to quantify breast density commonly focus on either semiautomated methods or visual assessment, both of which are highly subjective. Furthermore, most studies published to date investigating computer-aided assessment of breast PD% have been performed using digitized screen-film mammograms, while digital mammography is increasingly replacing screen-film mammography in breast cancer screening protocols. Digital mammography imaging generates two types of images for analysis, raw (i.e., 'FOR PROCESSING') and vendor postprocessed (i.e., 'FOR PRESENTATION'), of which postprocessed images are commonly used in clinical practice. Development of an algorithm which effectively estimates breast PD% in both raw and postprocessed digital mammography images would be beneficial in terms of direct clinical application and retrospective analysis. Methods: This work proposes a new algorithm for fully automated quantification of breast PD% based on adaptive multiclass fuzzy c-means (FCM) clustering and support vector machine (SVM) classification, optimized for the imaging characteristics of both raw and processed digital mammography images as well as for individual patient and image characteristics. Our algorithm first delineates the breast region within the mammogram via an automated thresholding scheme to identify background air followed by a straight line Hough transform to extract the pectoral muscle region. The algorithm then applies adaptive FCM clustering based on an optimal number of clusters derived from image properties of the specific mammogram to subdivide the breast into regions of similar gray-level intensity. Finally, a SVM classifier is trained to identify which clusters within the breast tissue are likely fibroglandular, which are then aggregated into a final dense tissue segmentation that is used to compute breast PD%. Our method is validated on a group of 81 women for whom bilateral, mediolateral oblique, raw and processed screening digital mammograms were available, and agreement is assessed with both continuous and categorical density estimates made by a trained breast-imaging radiologist. Results: Strong association between algorithm-estimated and radiologist-provided breast PD% was detected for both raw (r= 0.82, p < 0.001) and processed (r= 0.85, p < 0.001) digital mammograms on a per-breast basis. Stronger agreement was found when overall breast density was assessed on a per-woman basis for both raw (r= 0.85, p < 0.001) and processed (0.89, p < 0.001) mammograms. Strong agreement between categorical density estimates was also seen (weighted Cohen's {kappa}{>=} 0.79). Repeated measures analysis of variance demonstrated no statistically significant differences between the PD% estimates (p > 0.1) due to either presentation of the image (raw vs processed) or method of PD% assessment (radiologist vs algorithm). Conclusions: The proposed fully automated algorithm was successful in estimating breast percent density from both raw and processed digital mammographic images. Accurate assessment of a woman's breast density is critical in order for the estimate to be incorporated into risk assessment models. These results show promise for the clinical application of the algorithm in quantifying breast density in a repeatable manner, both at time of imaging as well as in retrospective studies.
NASA Astrophysics Data System (ADS)
Brandl, Miriam B.; Beck, Dominik; Pham, Tuan D.
2011-06-01
The high dimensionality of image-based dataset can be a drawback for classification accuracy. In this study, we propose the application of fuzzy c-means clustering, cluster validity indices and the notation of a joint-feature-clustering matrix to find redundancies of image-features. The introduced matrix indicates how frequently features are grouped in a mutual cluster. The resulting information can be used to find data-derived feature prototypes with a common biological meaning, reduce data storage as well as computation times and improve the classification accuracy.
NASA Astrophysics Data System (ADS)
Abdulbaqi, Hayder Saad; Jafri, Mohd Zubir Mat; Omar, Ahmad Fairuz; Mustafa, Iskandar Shahrim Bin; Abood, Loay Kadom
2015-04-01
Brain tumors, are an abnormal growth of tissues in the brain. They may arise in people of any age. They must be detected early, diagnosed accurately, monitored carefully, and treated effectively in order to optimize patient outcomes regarding both survival and quality of life. Manual segmentation of brain tumors from CT scan images is a challenging and time consuming task. Size and location accurate detection of brain tumor plays a vital role in the successful diagnosis and treatment of tumors. Brain tumor detection is considered a challenging mission in medical image processing. The aim of this paper is to introduce a scheme for tumor detection in CT scan images using two different techniques Hidden Markov Random Fields (HMRF) and Fuzzy C-means (FCM). The proposed method has been developed in this research in order to construct hybrid method between (HMRF) and threshold. These methods have been applied on 4 different patient data sets. The result of comparison among these methods shows that the proposed method gives good results for brain tissue detection, and is more robust and effective compared with (FCM) techniques.
Abdulbaqi, Hayder Saad; Jafri, Mohd Zubir Mat; Omar, Ahmad Fairuz; Mustafa, Iskandar Shahrim Bin; Abood, Loay Kadom
2015-04-24
Brain tumors, are an abnormal growth of tissues in the brain. They may arise in people of any age. They must be detected early, diagnosed accurately, monitored carefully, and treated effectively in order to optimize patient outcomes regarding both survival and quality of life. Manual segmentation of brain tumors from CT scan images is a challenging and time consuming task. Size and location accurate detection of brain tumor plays a vital role in the successful diagnosis and treatment of tumors. Brain tumor detection is considered a challenging mission in medical image processing. The aim of this paper is to introduce a scheme for tumor detection in CT scan images using two different techniques Hidden Markov Random Fields (HMRF) and Fuzzy C-means (FCM). The proposed method has been developed in this research in order to construct hybrid method between (HMRF) and threshold. These methods have been applied on 4 different patient data sets. The result of comparison among these methods shows that the proposed method gives good results for brain tissue detection, and is more robust and effective compared with (FCM) techniques.
Lin, Thy-Hou; Wang, Ging-Ming; Hsu, Yao-Hua
2002-01-01
A fuzzy c-means algorithm was used to classify some 3D convex hull descriptors computed for 345 active HIV-1 protease inhibitors collected from literature and 437 inactive analogues searched from the MDL/ISIS database. The number of descriptors used to represent each compound was from 4 to 8, and they were uncorrelated using the principal component analysis. These uncorrelated descriptors were then divided into two groups and classified by the fuzzy c-means algorithm. The classification produced a clear-cut switch in membership functions computed for each uncorrelated descriptor at the group boundary. Compounds with nonswitching membership functions computed were treated as outliers, and they were counted for estimating the accuracy of the classification. The averaged accuracy of classification for the active inhibitor set was about 80% which was better than that directly classified by a linear discriminant function on the original 3D convex hull descriptors. The whole classification scheme was also applied to several sets of some conventional descriptors computed for each compound, but the averaged accuracy was around 58%. Further classification using some 3D convex hull descriptors searched from comparing the distribution of these descriptors was performed on a new data set composed of 289 outliers-deducted active inhibitors and 63 outliers identified from the inactive analogues through previous classification. This final classification identified 19 inactive analogues which were similar in structural and topological features to those of some highly active inhibitors classified together with them. PMID:12444748
Shi, Jiazheng; Sahiner, Berkman; Chan, Heang-Ping; Paramagul, Chintana; Hadjiiski, Lubomir M.; Helvie, Mark; Chenevert, Thomas
2009-01-01
The goal of this study was to develop an automated method to segment breast masses on dynamic contrast-enhanced (DCE) magnetic resonance (MR) scans and to evaluate its potential for estimating tumor volume on pre- and postchemotherapy images and tumor change in response to treatment. A radiologist experienced in interpreting breast MR scans defined a cuboid volume of interest (VOI) enclosing the mass in the MR volume at one time point within the sequence of DCE-MR scans. The corresponding VOIs over the entire time sequence were then automatically extracted. A new 3D VOI representing the local pharmacokinetic activities in the VOI was generated from the 4D VOI sequence by summarizing the temporal intensity enhancement curve of each voxel with its standard deviation. The method then used the fuzzy c-means (FCM) clustering algorithm followed by morphological filtering for initial mass segmentation. The initial segmentation was refined by the 3D level set (LS) method. The velocity field of the LS method was formulated in terms of the mean curvature which guaranteed the smoothness of the surface, the Sobel edge information which attracted the zero LS to the desired mass margin, and the FCM membership function which improved segmentation accuracy. The method was evaluated on 50 DCE-MR scans of 25 patients who underwent neoadjuvant chemotherapy. Each patient had pre- and postchemotherapy DCE-MR scans on a 1.5 T magnet. The in-plane pixel size ranged from 0.546 to 0.703 mm and the slice thickness ranged from 2.5 to 4.5 mm. The flip angle was 15°, repetition time ranged from 5.98 to 6.7 ms, and echo time ranged from 1.2 to 1.3 ms. Computer segmentation was applied to the coronal T1-weighted images. For comparison, the same radiologist who marked the VOI also manually segmented the mass on each slice. The performance of the automated method was quantified using an overlap measure, defined as the ratio of the intersection of the computer and the manual segmentation volumes to the manual segmentation volume. Pre- and postchemotherapy masses had overlap measures of 0.81±0.13 (mean±s.d.) and 0.71±0.22, respectively. The percentage volume reduction (PVR) estimated by computer and the radiologist were 55.5±43.0% (mean±s.d.) and 57.8±51.3%, respectively. Paired Student’s t test indicated that the difference between the mean PVRs estimated by computer and the radiologist did not reach statistical significance (p=0.641). The automated mass segmentation method may have the potential to assist physicians in monitoring volume change in breast masses in response to treatment. PMID:19994516
Kumar, Surendra; Ghosh, Subhojit; Tetarway, Suhash; Sinha, Rakesh Kumar
2015-07-01
In this study, the magnitude and spatial distribution of frequency spectrum in the resting electroencephalogram (EEG) were examined to address the problem of detecting alcoholism in the cerebral motor cortex. The EEG signals were recorded from chronic alcoholic conditions (n = 20) and the control group (n = 20). Data were taken from motor cortex region and divided into five sub-bands (delta, theta, alpha, beta-1 and beta-2). Three methodologies were adopted for feature extraction: (1) absolute power, (2) relative power and (3) peak power frequency. The dimension of the extracted features is reduced by linear discrimination analysis and classified by support vector machine (SVM) and fuzzy C-mean clustering. The maximum classification accuracy (88 %) with SVM clustering was achieved with the EEG spectral features with absolute power frequency on F4 channel. Among the bands, relatively higher classification accuracy was found over theta band and beta-2 band in most of the channels when computed with the EEG features of relative power. Electrodes wise CZ, C3 and P4 were having more alteration. Considering the good classification accuracy obtained by SVM with relative band power features in most of the EEG channels of motor cortex, it can be suggested that the noninvasive automated online diagnostic system for the chronic alcoholic condition can be developed with the help of EEG signals. PMID:25773367
Sokouti, Babak; Rezvan, Farshad; Dastmalchi, Siavoush
2015-08-01
G protein-coupled receptors (GPCRs) constitute the largest superfamily of integral membrane proteins (IMPs) and they tremendously contribute in the flow of information into cells. In this study, the random forest (RF) and the subtractive fuzzy c-means clustering (SBC) methods have been used to determine the importance of input variables and discriminate GPCRs from non-GPCRs using twenty amino acid and fifty pseudo amino acid compositions derived from GPCR sequences. The studied dataset was retrieved from the UniProt/SWISSPROT database and consists of 1000 GPCR and 1000 non-GPCR reviewed sequences. The top ranked RF-SBC-based model discriminates GPCRs and non-GPCRs successfully with the accuracy, sensitivity, specificity and Matthew's coefficient correlation (MCC) rates of 99.15%, 99.60%, 98.70% and 0.983%, respectively. These rates were obtained from averaged values of 5-fold cross validation using only twenty four out of fifty pseudo amino acid composition features. The results show that the proposed RF-SBC-based model outperforms other existing algorithms in terms of the evaluated performance criteria. The webserver for the proposed algorithm is available at http://brcinfo.shinyapps.io/GPCRIden. PMID:26108102
NASA Astrophysics Data System (ADS)
Castro, Marcelo A.; Thomasson, David; Avila, Nilo A.; Hufton, Jennifer; Senseney, Justin; Johnson, Reed F.; Dyall, Julie
2013-03-01
Monkeypox virus is an emerging zoonotic pathogen that results in up to 10% mortality in humans. Knowledge of clinical manifestations and temporal progression of monkeypox disease is limited to data collected from rare outbreaks in remote regions of Central and West Africa. Clinical observations show that monkeypox infection resembles variola infection. Given the limited capability to study monkeypox disease in humans, characterization of the disease in animal models is required. A previous work focused on the identification of inflammatory patterns using PET/CT image modality in two non-human primates previously inoculated with the virus. In this work we extended techniques used in computer-aided detection of lung tumors to identify inflammatory lesions from monkeypox virus infection and their progression using CT images. Accurate estimation of partial volumes of lung lesions via segmentation is difficult because of poor discrimination between blood vessels, diseased regions, and outer structures. We used hard C-means algorithm in conjunction with landmark based registration to estimate the extent of monkeypox virus induced disease before inoculation and after disease progression. Automated estimation is in close agreement with manual segmentation.
NASA Astrophysics Data System (ADS)
Zainuddin, Zarita; Lai, Kee Huong; Ong, Pauline
2013-04-01
Artificial neural networks (ANNs) are powerful mathematical models that are used to solve complex real world problems. Wavelet neural networks (WNNs), which were developed based on the wavelet theory, are a variant of ANNs. During the training phase of WNNs, several parameters need to be initialized; including the type of wavelet activation functions, translation vectors, and dilation parameter. The conventional k-means and fuzzy c-means clustering algorithms have been used to select the translation vectors. However, the solution vectors might get trapped at local minima. In this regard, the evolutionary harmony search algorithm, which is capable of searching for near-optimum solution vectors, both locally and globally, is introduced to circumvent this problem. In this paper, the conventional k-means and fuzzy c-means clustering algorithms were hybridized with the metaheuristic harmony search algorithm. In addition to obtaining the estimation of the global minima accurately, these hybridized algorithms also offer more than one solution to a particular problem, since many possible solution vectors can be generated and stored in the harmony memory. To validate the robustness of the proposed WNNs, the real world problem of epileptic seizure detection was presented. The overall classification accuracy from the simulation showed that the hybridized metaheuristic algorithms outperformed the standard k-means and fuzzy c-means clustering algorithms.
Farjam, Reza; Tsien, Christina I.; Lawrence, Theodore S.; Cao, Yue; Department of Radiology, University of Michigan, 1500 East Medical Center Drive, Med Inn Building C478, Ann Arbor, Michigan 48109-5842; Department of Biomedical Engineering, University of Michigan, 2200 Bonisteel Boulevard, Ann Arbor, Michigan 48109-2099
2014-01-15
Purpose: To develop a pharmacokinetic modelfree framework to analyze the dynamic contrast enhanced magnetic resonance imaging (DCE-MRI) data for assessment of response of brain metastases to radiation therapy. Methods: Twenty patients with 45 analyzable brain metastases had MRI scans prior to whole brain radiation therapy (WBRT) and at the end of the 2-week therapy. The volumetric DCE images covering the whole brain were acquired on a 3T scanner with approximately 5 s temporal resolution and a total scan time of about 3 min. DCE curves from all voxels of the 45 brain metastases were normalized and then temporally aligned. A DCE matrix that is constructed from the aligned DCE curves of all voxels of the 45 lesions obtained prior to WBRT is processed by principal component analysis to generate the principal components (PCs). Then, the projection coefficient maps prior to and at the end of WBRT are created for each lesion. Next, a pattern recognition technique, based upon fuzzy-c-means clustering, is used to delineate the tumor subvolumes relating to the value of the significant projection coefficients. The relationship between changes in different tumor subvolumes and treatment response was evaluated to differentiate responsive from stable and progressive tumors. Performance of the PC-defined tumor subvolume was also evaluated by receiver operating characteristic (ROC) analysis in prediction of nonresponsive lesions and compared with physiological-defined tumor subvolumes. Results: The projection coefficient maps of the first three PCs contain almost all response-related information in DCE curves of brain metastases. The first projection coefficient, related to the area under DCE curves, is the major component to determine response while the third one has a complimentary role. In ROC analysis, the area under curve of 0.88 ± 0.05 and 0.86 ± 0.06 were achieved for the PC-defined and physiological-defined tumor subvolume in response assessment. Conclusions: The PC-defined subvolume of a brain metastasis could predict tumor response to therapy similar to the physiological-defined one, while the former is determined more rapidly for clinical decision-making support.
Nested Cluster Algorithm for Frustrated Quantum Antiferromagnets
M. Nyfeler; F. -J. Jiang; F. Kämpfer; U. -J. Wiese
2008-03-25
Simulations of frustrated quantum antiferromagnets suffer from a severe sign problem. We solve the ergodicity problem of the loop-cluster algorithm in a natural way and apply a powerful strategy to address the sign problem. For the spin 1/2 Heisenberg antiferromagnet on a kagome and on a frustrated square lattice, a nested cluster algorithm eliminates the sign problem for large systems. The method is applicable to general lattice geometries but limited to moderate temperatures.
Generalized cluster algorithms for frustrated spin models
Coddington, P.D. ); Han, L. )
1994-08-01
Standard Monte Carlo cluster algorithms have proven to be very effective for many different spin models. However, they fail for frustrated spin systems. Recently, a generalized cluster algorithm was introduced that works extremely well for the fully frustrated Ising model on a square lattice by placing bonds between sites based on information from plaquettes rather than links of the lattice. Here we study some properties of this algorithm and some variants of it. We introduce a practical methodology for constructing a generalized cluster algorithm for a given spin model, and apply this method to some other frustrated Ising models. We find that such algorithms work well for simple fully frustrated Ising models in two dimensions, but appear to work poorly or not at all for more complex models such as spin glasses.
A local distribution based spatial clustering algorithm
NASA Astrophysics Data System (ADS)
Deng, Min; Liu, Qiliang; Li, Guangqiang; Cheng, Tao
2009-10-01
Spatial clustering is an important means for spatial data mining and spatial analysis, and it can be used to discover the potential spatial association rules and outliers among the spatial data. Most existing spatial clustering algorithms only utilize the spatial distance or local density to find the spatial clusters in a spatial database, without taking the spatial local distribution characters into account, so that the clustered results are unreasonable in many cases. To overcome such limitations, this paper develops a new indicator (i.e. local median angle) to measure the local distribution at first, and further proposes a new algorithm, called local distribution based spatial clustering algorithm (LDBSC in abbreviation). In the process of spatial clustering, a series of recursive search are implemented for all the entities so that those entities with its local median angle being very close or equal are clustered. In this way, all the spatial entities in the spatial database can be automatically divided into some clusters. Finally, two tests are implemented to demonstrate that the method proposed in this paper is more prominent than DBSCAN, as well as that it is very robust and feasible, and can be used to find the clusters with different shapes.
Research a New Dynamic Clustering Algorithm Based on Genetic Immunity Mechanism
NASA Astrophysics Data System (ADS)
Xu, Yuhui; Jiang, Weijin
A novel dynamic evolutionary clustering algorithm is proposed in this paper to overcome the shortcomings of fuzzy modeling method based on general clustering algorithms that fuzzy rule number should be determined beforehand. This algorithm searches for the optimal cluster number by using the improved genetic techniques to optimize string lengths of chromosomes; at the same time, the convergence of clustering center parameters is expedited with the help of Fuzzy C-Means algorithm. Moreover, by introducing memory function and vaccine inoculation mechanism of immune system, at the same time, dynamic evolutionary clustering algorithm can converge to the optimal solution rapidly and stably. The proper fuzzy rule number and exact premise parameters are obtained simultaneously when using this efficient dynamic evolutionary clustering algorithm to identify fuzzy models. The effectiveness of the proposed fuzzy modeling method based on dynamic evolutionary clustering algorithm is demonstrated by simulation examples, and the accurate non-linear fuzzy models can be obtained when the method is applied to the thermal processes.
Hierarchical link clustering algorithm in networks
NASA Astrophysics Data System (ADS)
Bodlaj, Jernej; Batagelj, Vladimir
2015-06-01
Hierarchical network clustering is an approach to find tightly and internally connected clusters (groups or communities) of nodes in a network based on its structure. Instead of nodes, it is possible to cluster links of the network. The sets of nodes belonging to clusters of links can overlap. While overlapping clusters of nodes are not always expected, they are natural in many applications. Using appropriate dissimilarity measures, we can complement the clustering strategy to consider, for example, the semantic meaning of links or nodes based on their properties. We propose a new hierarchical link clustering algorithm which in comparison to existing algorithms considers node and/or link properties (descriptions, attributes) of the input network alongside its structure using monotonic dissimilarity measures. The algorithm determines communities that form connected subnetworks (relational constraint) containing locally similar nodes with respect to their description. It is only implicitly based on the corresponding line graph of the input network, thus reducing its space and time complexities. We investigate both complexities analytically and statistically. Using provided dissimilarity measures, our algorithm can, in addition to the general overlapping community structure of input networks, uncover also related subregions inside these communities in a form of hierarchy. We demonstrate this ability on real-world and artificial network examples.
An algorithm for spatial heirarchy clustering
NASA Technical Reports Server (NTRS)
Dejesusparada, N. (principal investigator); Velasco, F. R. D.
1981-01-01
A method for utilizing both spectral and spatial redundancy in compacting and preclassifying images is presented. In multispectral satellite images, a high correlation exists between neighboring image points which tend to occupy dense and restricted regions of the feature space. The image is divided into windows of the same size where the clustering is made. The classes obtained in several neighboring windows are clustered, and then again successively clustered until only one region corresponding to the whole image is obtained. By employing this algorithm only a few points are considered in each clustering, thus reducing computational effort. The method is illustrated as applied to LANDSAT images.
Performance Comparison Of Evolutionary Algorithms For Image Clustering
NASA Astrophysics Data System (ADS)
Civicioglu, P.; Atasever, U. H.; Ozkan, C.; Besdok, E.; Karkinli, A. E.; Kesikoglu, A.
2014-09-01
Evolutionary computation tools are able to process real valued numerical sets in order to extract suboptimal solution of designed problem. Data clustering algorithms have been intensively used for image segmentation in remote sensing applications. Despite of wide usage of evolutionary algorithms on data clustering, their clustering performances have been scarcely studied by using clustering validation indexes. In this paper, the recently proposed evolutionary algorithms (i.e., Artificial Bee Colony Algorithm (ABC), Gravitational Search Algorithm (GSA), Cuckoo Search Algorithm (CS), Adaptive Differential Evolution Algorithm (JADE), Differential Search Algorithm (DSA) and Backtracking Search Optimization Algorithm (BSA)) and some classical image clustering techniques (i.e., k-means, fcm, som networks) have been used to cluster images and their performances have been compared by using four clustering validation indexes. Experimental test results exposed that evolutionary algorithms give more reliable cluster-centers than classical clustering techniques, but their convergence time is quite long.
Classification of posture maintenance data with fuzzy clustering algorithms
NASA Technical Reports Server (NTRS)
Bezdek, James C.
1991-01-01
Sensory inputs from the visual, vestibular, and proprioreceptive systems are integrated by the central nervous system to maintain postural equilibrium. Sustained exposure to microgravity causes neurosensory adaptation during spaceflight, which results in decreased postural stability until readaptation occurs upon return to the terrestrial environment. Data which simulate sensory inputs under various conditions were collected in conjunction with JSC postural control studies using a Tilt-Translation Device (TTD). The University of West Florida proposed applying the Fuzzy C-Means Clustering (FCM) Algorithms to this data with a view towards identifying various states and stages. Data supplied by NASA/JSC were submitted to the FCM algorithms in an attempt to identify and characterize cluster substructure in a mixed ensemble of pre- and post-adaptational TTD data. Following several unsuccessful trials with FCM using a full 11 dimensional data set, a set of two channels (features) were found to enable FCM to separate pre- from post-adaptational TTD data. The main conclusions are that: (1) FCM seems able to separate pre- from post-TTD subject no. 2 on the one trial that was used, but only in certain subintervals of time; and (2) Channels 2 (right rear transducer force) and 8 (hip sway bar) contain better discrimination information than other supersets and combinations of the data that were tried so far.
CLU: A new algorithm for EST clustering
Ptitsyn, Andrey; Hide, Winston
2005-01-01
Background The continuous flow of EST data remains one of the richest sources for discoveries in modern biology. The first step in EST data mining is usually associated with EST clustering, the process of grouping of original fragments according to their annotation, similarity to known genomic DNA or each other. Clustered EST data, accumulated in databases such as UniGene, STACK and TIGR Gene Indices have proven to be crucial in research areas from gene discovery to regulation of gene expression. Results We have developed a new nucleotide sequence matching algorithm and its implementation for clustering EST sequences. The program is based on the original CLU match detection algorithm, which has improved performance over the widely used d2_cluster. The CLU algorithm automatically ignores low-complexity regions like poly-tracts and short tandem repeats. Conclusion CLU represents a new generation of EST clustering algorithm with improved performance over current approaches. An early implementation can be applied in small and medium-size projects. The CLU program is available on an open source basis free of charge. It can be downloaded from PMID:16026600
CHAMELEON: A Hierarchical Clustering Algorithm Using Dynamic Modeling
Kumar, Vipin
CHAMELEON: A Hierarchical Clustering Algorithm Using Dynamic Modeling George Karypis EuiHong (Sam, densities, and sizes. In this paper, we present a novel hierarchical clustering algorithm called CHAMELEON clusters. The methodology of dynamic modeling of clusters used in CHAMELEON is applicable to all types
A Recursive Algorithm for Spatial Cluster Detection
Jiang, Xia; Cooper, Gregory F.
2007-01-01
Spatial cluster detection involves finding spatial subregions of some larger region where clusters of some event are occurring. For example, in the case of disease outbreak detection, we want to find clusters of disease cases so as to pinpoint where the outbreak is occurring. When doing spatial cluster detection, we must first articulate the subregions of the region being analyzed. A simple approach is to represent the entire region by an n · n grid. Then we let every subset of cells in the grid represent a subregion. With this representation, the number of subregions is equal to 2n2 ?1. If n is not small, it is intractable to check every subregion. The time complexity of checking all the subregions that are rectangles is ?(n4). Neill et al.8 performed Bayesian spatial cluster detection by only checking every rectangle. In the current paper, we develop a recursive algorithm which searches a richer set of subregions. We provide results of simulation experiments evaluating the detection power and accuracy of the algorithm. PMID:18693860
Cluster compression algorithm: A joint clustering/data compression concept
NASA Technical Reports Server (NTRS)
Hilbert, E. E.
1977-01-01
The Cluster Compression Algorithm (CCA), which was developed to reduce costs associated with transmitting, storing, distributing, and interpreting LANDSAT multispectral image data is described. The CCA is a preprocessing algorithm that uses feature extraction and data compression to more efficiently represent the information in the image data. The format of the preprocessed data enables simply a look-up table decoding and direct use of the extracted features to reduce user computation for either image reconstruction, or computer interpretation of the image data. Basically, the CCA uses spatially local clustering to extract features from the image data to describe spectral characteristics of the data set. In addition, the features may be used to form a sequence of scalar numbers that define each picture element in terms of the cluster features. This sequence, called the feature map, is then efficiently represented by using source encoding concepts. Various forms of the CCA are defined and experimental results are presented to show trade-offs and characteristics of the various implementations. Examples are provided that demonstrate the application of the cluster compression concept to multi-spectral images from LANDSAT and other sources.
Chaotic map clustering algorithm for EEG analysis
NASA Astrophysics Data System (ADS)
Bellotti, R.; De Carlo, F.; Stramaglia, S.
2004-03-01
The non-parametric chaotic map clustering algorithm has been applied to the analysis of electroencephalographic signals, in order to recognize the Huntington's disease, one of the most dangerous pathologies of the central nervous system. The performance of the method has been compared with those obtained through parametric algorithms, as K-means and deterministic annealing, and supervised multi-layer perceptron. While supervised neural networks need a training phase, performed by means of data tagged by the genetic test, and the parametric methods require a prior choice of the number of classes to find, the chaotic map clustering gives a natural evidence of the pathological class, without any training or supervision, thus providing a new efficient methodology for the recognition of patterns affected by the Huntington's disease.
Nested cluster algorithm for frustrated quantum antiferromagnets.
Nyfeler, M; Jiang, F-J; Kämpfer, F; Wiese, U-J
2008-06-20
Frustrated antiferromagnets are important materials whose quantum Monte Carlo simulation suffers from a severe sign problem. We construct a nested cluster algorithm which uses a powerful strategy to address this problem. For the spin 1/2 Heisenberg antiferromagnet on a kagome and on a frustrated square lattice the sign problem is eliminated for large systems. The method is applicable to general lattice geometries but limited to moderate temperatures. PMID:18643626
Hearing the clusters in a graph: A distributed algorithm
Sahai, Tuhin; Banaszuk, Andrzej
2009-01-01
We propose a novel distributed algorithm to decompose graphs or cluster data. The algorithm recovers the solution obtained from spectral clustering without need for expensive eigenvalue/ eigenvector computations. We demonstrate that by solving the wave equation on the graph, every node can assign itself to a cluster by performing a local fast Fourier transform. We prove the equivalence of our algorithm to spectral clustering, derive convergence rates and demonstrate it on examples.
A Hybrid Monkey Search Algorithm for Clustering Analysis
Chen, Xin; Zhou, Yongquan; Luo, Qifang
2014-01-01
Clustering is a popular data analysis and data mining technique. The k-means clustering algorithm is one of the most commonly used methods. However, it highly depends on the initial solution and is easy to fall into local optimum solution. In view of the disadvantages of the k-means method, this paper proposed a hybrid monkey algorithm based on search operator of artificial bee colony algorithm for clustering analysis and experiment on synthetic and real life datasets to show that the algorithm has a good performance than that of the basic monkey algorithm for clustering analysis. PMID:24772039
Energy Aware Clustering Algorithms for Wireless Sensor Networks
NASA Astrophysics Data System (ADS)
Rakhshan, Noushin; Rafsanjani, Marjan Kuchaki; Liu, Chenglian
2011-09-01
The sensor nodes deployed in wireless sensor networks (WSNs) are extremely power constrained, so maximizing the lifetime of the entire networks is mainly considered in the design. In wireless sensor networks, hierarchical network structures have the advantage of providing scalable and energy efficient solutions. In this paper, we investigate different clustering algorithms for WSNs and also compare these clustering algorithms based on metrics such as clustering distribution, cluster's load balancing, Cluster Head's (CH) selection strategy, CH's role rotation, node mobility, clusters overlapping, intra-cluster communications, reliability, security and location awareness.
MRI brain tumor segmentation based on improved fuzzy c-means method
NASA Astrophysics Data System (ADS)
Deng, Wankai; Xiao, Wei; Pan, Chao; Liu, Jianguo
2009-10-01
This paper focuses on the image segmentation, which is one of the key problems in medical image processing. A new medical image segmentation method is proposed based on fuzzy c- means algorithm and spatial information. Firstly, we classify the image into the region of interest and background using fuzzy c means algorithm. Then we use the information of the tissues' gradient and the intensity inhomogeneities of regions to improve the quality of segmentation. The sum of the mean variance in the region and the reciprocal of the mean gradient along the edge of the region are chosen as an objective function. The minimum of the sum is optimum result. The result shows that the clustering segmentation algorithm is effective.
"Improved FCM algorithm for Clustering on Web Usage Mining"
Suresh, K
2011-01-01
In this paper we present clustering method is very sensitive to the initial center values, requirements on the data set too high, and cannot handle noisy data the proposal method is using information entropy to initialize the cluster centers and introduce weighting parameters to adjust the location of cluster centers and noise problems.The navigation datasets which are sequential in nature, Clustering web data is finding the groups which share common interests and behavior by analyzing the data collected in the web servers, this improves clustering on web data efficiently using improved fuzzy c-means(FCM) clustering. Web usage mining is the application of data mining techniques to web log data repositories. It is used in finding the user access patterns from web access log. Web data Clusters are formed using on MSNBC web navigation dataset.
A Novel Clustering Algorithm Inspired by Membrane Computing
Luo, Xiaohui; Gao, Zhisheng; Wang, Jun; Pei, Zheng
2015-01-01
P systems are a class of distributed parallel computing models; this paper presents a novel clustering algorithm, which is inspired from mechanism of a tissue-like P system with a loop structure of cells, called membrane clustering algorithm. The objects of the cells express the candidate centers of clusters and are evolved by the evolution rules. Based on the loop membrane structure, the communication rules realize a local neighborhood topology, which helps the coevolution of the objects and improves the diversity of objects in the system. The tissue-like P system can effectively search for the optimal partitioning with the help of its parallel computing advantage. The proposed clustering algorithm is evaluated on four artificial data sets and six real-life data sets. Experimental results show that the proposed clustering algorithm is superior or competitive to k-means algorithm and several evolutionary clustering algorithms recently reported in the literature. PMID:25874264
Multi-Parent Clustering Algorithms from Stochastic Grammar Data Models
NASA Technical Reports Server (NTRS)
Mjoisness, Eric; Castano, Rebecca; Gray, Alexander
1999-01-01
We introduce a statistical data model and an associated optimization-based clustering algorithm which allows data vectors to belong to zero, one or several "parent" clusters. For each data vector the algorithm makes a discrete decision among these alternatives. Thus, a recursive version of this algorithm would place data clusters in a Directed Acyclic Graph rather than a tree. We test the algorithm with synthetic data generated according to the statistical data model. We also illustrate the algorithm using real data from large-scale gene expression assays.
DISC-Finder: A Distributed Algorithm for Identifying Galaxy Clusters Friends-of-Friends Algorithm
Fink, Eugene
DISC-Finder: A Distributed Algorithm for Identifying Galaxy Clusters SVD10 Friends-cube "friendships" and merge the respective clusters, using the union-find algorithm Performance 0.5 bln galaxies 8 sampling of galaxies splittingthe spaceinto cubes partitioning thegalaxies by cubes clustering withincubes
Hybridizing Evolutionary Algorithms and Clustering Algorithms to Find Source-Code Clones
Maletic, Jonathan I.
Hybridizing Evolutionary Algorithms and Clustering Algorithms to Find Source-Code Clones Andrew a hybrid approach to detect source-code clones that combines evolutionary algorithms and clustering. A case is effective in detecting groups of source-code clones. Categories and Subject Descriptors D.2.7 [Software
Incremental Clustering Algorithm For Earth Science Data Mining
Vatsavai, Raju
2009-01-01
Remote sensing data plays a key role in understanding the complex geographic phenomena. Clustering is a useful tool in discovering interesting patterns and structures within the multivariate geospatial data. One of the key issues in clustering is the specication of appropriate number of clusters, which is not obvious in many practical situations. In this paper we provide an extension of G-means algorithm which automatically learns the number of clusters present in the data and avoids over estimation of the number of clusters. Experimental evaluation on simulated and remotely sensed image data shows the effectiveness of our algorithm.
A fuzzy clustering algorithm to detect planar and quadric shapes
NASA Technical Reports Server (NTRS)
Krishnapuram, Raghu; Frigui, Hichem; Nasraoui, Olfa
1992-01-01
In this paper, we introduce a new fuzzy clustering algorithm to detect an unknown number of planar and quadric shapes in noisy data. The proposed algorithm is computationally and implementationally simple, and it overcomes many of the drawbacks of the existing algorithms that have been proposed for similar tasks. Since the clustering is performed in the original image space, and since no features need to be computed, this approach is particularly suited for sparse data. The algorithm may also be used in pattern recognition applications.
A NEW CLUSTERING ALGORITHM FOR SEGMENTATION OF MAGNETIC RESONANCE IMAGES
Slatton, Clint
A NEW CLUSTERING ALGORITHM FOR SEGMENTATION OF MAGNETIC RESONANCE IMAGES By ERHAN GOKCAY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.1 Magnetic Resonance Image Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.2 Image Formation in MRI
The Ordered Clustered Travelling Salesman Problem: A Hybrid Genetic Algorithm
Ahmed, Zakir Hussain
2014-01-01
The ordered clustered travelling salesman problem is a variation of the usual travelling salesman problem in which a set of vertices (except the starting vertex) of the network is divided into some prespecified clusters. The objective is to find the least cost Hamiltonian tour in which vertices of any cluster are visited contiguously and the clusters are visited in the prespecified order. The problem is NP-hard, and it arises in practical transportation and sequencing problems. This paper develops a hybrid genetic algorithm using sequential constructive crossover, 2-opt search, and a local search for obtaining heuristic solution to the problem. The efficiency of the algorithm has been examined against two existing algorithms for some asymmetric and symmetric TSPLIB instances of various sizes. The computational results show that the proposed algorithm is very effective in terms of solution quality and computational time. Finally, we present solution to some more symmetric TSPLIB instances. PMID:24701148
Efficient Cluster Algorithm for Spin Glasses in Any Space Dimension.
Zhu, Zheng; Ochoa, Andrew J; Katzgraber, Helmut G
2015-08-14
Spin systems with frustration and disorder are notoriously difficult to study, both analytically and numerically. While the simulation of ferromagnetic statistical mechanical models benefits greatly from cluster algorithms, these accelerated dynamics methods remain elusive for generic spin-glass-like systems. Here, we present a cluster algorithm for Ising spin glasses that works in any space dimension and speeds up thermalization by at least one order of magnitude at temperatures where thermalization is typically difficult. Our isoenergetic cluster moves are based on the Houdayer cluster algorithm for two-dimensional spin glasses and lead to a speedup over conventional state-of-the-art methods that increases with the system size. We illustrate the benefits of the isoenergetic cluster moves in two and three space dimensions, as well as the nonplanar chimera topology found in the D-Wave Inc. quantum annealing machine. PMID:26317743
A Fast Implementation of the ISODATA Clustering Algorithm
NASA Technical Reports Server (NTRS)
Memarsadeghi, Nargess; Mount, David M.; Netanyahu, Nathan S.; LeMoigne, Jacqueline
2005-01-01
Clustering is central to many image processing and remote sensing applications. ISODATA is one of the most popular and widely used clustering methods in geoscience applications, but it can run slowly, particularly with large data sets. We present a more efficient approach to ISODATA clustering, which achieves better running times by storing the points in a kd-tree and through a modification of the way in which the algorithm estimates the dispersion of each cluster. We also present an approximate version of the algorithm which allows the user to further improve the running time, at the expense of lower fidelity in computing the nearest cluster center to each point. We provide both theoretical and empirical justification that our modified approach produces clusterings that are very similar to those produced by the standard ISODATA approach. We also provide empirical studies on both synthetic data and remotely sensed Landsat and MODIS images that show that our approach has significantly lower running times.
DYNAMICAL ANALYSIS OF LOW TEMPERATURE MONTE CARLO CLUSTER ALGORITHMS
DYNAMICAL ANALYSIS OF LOW TEMPERATURE MONTE CARLO CLUSTER ALGORITHMS by Fabio Martinelli Abstract that greatly hampered Monte Carlo simulations of critical phenomena in ferromagnetic systems of statistical like plane rotators [9] or completely frustrated systems [10]. This type of stochastic algorithms
CCL: an algorithm for the efficient comparison of clusters
Hundt, R.; Schön, J. C.; Neelamraju, S.; Zagorac, J.; Jansen, M.
2013-01-01
The systematic comparison of the atomic structure of solids and clusters has become an important task in crystallography, chemistry, physics and materials science, in particular in the context of structure prediction and structure determination of nanomaterials. In this work, an efficient and robust algorithm for the comparison of cluster structures is presented, which is based on the mapping of the point patterns of the two clusters onto each other. This algorithm has been implemented as the module CCL in the structure visualization and analysis program KPLOT. PMID:23682193
Sampling Within k-Means Algorithm to Cluster Large Datasets
Bejarano, Jeremy; Bose, Koushiki; Brannan, Tyler; Thomas, Anita; Adragni, Kofi; Neerchal, Nagaraj; Ostrouchov, George
2011-08-01
Due to current data collection technology, our ability to gather data has surpassed our ability to analyze it. In particular, k-means, one of the simplest and fastest clustering algorithms, is ill-equipped to handle extremely large datasets on even the most powerful machines. Our new algorithm uses a sample from a dataset to decrease runtime by reducing the amount of data analyzed. We perform a simulation study to compare our sampling based k-means to the standard k-means algorithm by analyzing both the speed and accuracy of the two methods. Results show that our algorithm is significantly more efficient than the existing algorithm with comparable accuracy. Further work on this project might include a more comprehensive study both on more varied test datasets as well as on real weather datasets. This is especially important considering that this preliminary study was performed on rather tame datasets. Also, these datasets should analyze the performance of the algorithm on varied values of k. Lastly, this paper showed that the algorithm was accurate for relatively low sample sizes. We would like to analyze this further to see how accurate the algorithm is for even lower sample sizes. We could find the lowest sample sizes, by manipulating width and confidence level, for which the algorithm would be acceptably accurate. In order for our algorithm to be a success, it needs to meet two benchmarks: match the accuracy of the standard k-means algorithm and significantly reduce runtime. Both goals are accomplished for all six datasets analyzed. However, on datasets of three and four dimension, as the data becomes more difficult to cluster, both algorithms fail to obtain the correct classifications on some trials. Nevertheless, our algorithm consistently matches the performance of the standard algorithm while becoming remarkably more efficient with time. Therefore, we conclude that analysts can use our algorithm, expecting accurate results in considerably less time.
Cluster algorithms with empahsis on quantum spin systems
Gubernatis, J.E.; Kawashima, Naoki
1995-10-06
The purpose of this lecture is to discuss in detail the generalized approach of Kawashima and Gubernatis for the construction of cluster algorithms. We first present a brief refresher on the Monte Carlo method, describe the Swendsen-Wang algorithm, show how this algorithm follows from the Fortuin-Kastelyn transformation, and re=interpret this transformation in a form which is the basis of the generalized approach. We then derive the essential equations of the generalized approach. This derivation is remarkably simple if done from the viewpoint of probability theory, and the essential assumptions will be clearly stated. These assumptions are implicit in all useful cluster algorithms of which we are aware. They lead to a quite different perspective on cluster algorithms than found in the seminal works and in Ising model applications. Next, we illustrate how the generalized approach leads to a cluster algorithm for world-line quantum Monte Carlo simulations of Heisenberg models with S = 1/2. More succinctly, we also discuss the generalization of the Fortuin- Kasetelyn transformation to higher spin models and illustrate the essential steps for a S = 1 Heisenberg model. Finally, we summarize how to go beyond S = 1 to a general spin, XYZ model.
Functional clustering algorithm for the analysis of dynamic network data
NASA Astrophysics Data System (ADS)
Feldt, S.; Waddell, J.; Hetrick, V. L.; Berke, J. D.; ?ochowski, M.
2009-05-01
We formulate a technique for the detection of functional clusters in discrete event data. The advantage of this algorithm is that no prior knowledge of the number of functional groups is needed, as our procedure progressively combines data traces and derives the optimal clustering cutoff in a simple and intuitive manner through the use of surrogate data sets. In order to demonstrate the power of this algorithm to detect changes in network dynamics and connectivity, we apply it to both simulated neural spike train data and real neural data obtained from the mouse hippocampus during exploration and slow-wave sleep. Using the simulated data, we show that our algorithm performs better than existing methods. In the experimental data, we observe state-dependent clustering patterns consistent with known neurophysiological processes involved in memory consolidation.
Development of clustering algorithms for Compressed Baryonic Matter experiment
NASA Astrophysics Data System (ADS)
Kozlov, G. E.; Ivanov, V. V.; Lebedev, A. A.; Vassiliev, Yu. O.
2015-05-01
A clustering problem for the coordinate detectors in the Compressed Baryonic Matter (CBM) experiment is discussed. Because of the high interaction rate and huge datasets to be dealt with, clustering algorithms are required to be fast and efficient and capable of processing events with high track multiplicity. At present there are two different approaches to the problem. In the first one each fired pad bears information about its charge, while in the second one a pad can or cannot be fired, thus rendering the separation of overlapping clusters a difficult task. To deal with the latter, two different clustering algorithms were developed, integrated into the CBMROOT software environment, and tested with various types of simulated events. Both of them are found to be highly efficient and accurate.
NCUBE - A clustering algorithm based on a discretized data space
NASA Technical Reports Server (NTRS)
Eigen, D. J.; Northouse, R. A.
1974-01-01
Cluster analysis involves the unsupervised grouping of data. The process provides an automatic procedure for generating known training samples for pattern classification. NCUBE, the clustering algorithm presented, is based upon the concept of imposing a gridwork on the data space. The NCUBE computer implementation of this concept provides an easily derived form of piecewise linear discrimination. This piecewise linear discrimination permits the separation of some types of data groups that are not linearly separable.
NASA Technical Reports Server (NTRS)
Lennington, R. K.; Johnson, J. K.
1979-01-01
An efficient procedure which clusters data using a completely unsupervised clustering algorithm and then uses labeled pixels to label the resulting clusters or perform a stratified estimate using the clusters as strata is developed. Three clustering algorithms, CLASSY, AMOEBA, and ISOCLS, are compared for efficiency. Three stratified estimation schemes and three labeling schemes are also considered and compared.
The walking cluster algorithm Internal Report
genes. Based on the hypothesis that coexpressed genes are more likely to be coregulated, restricting, Belgium e-mail: frank.desmet@esat.kuleuven.ac.be 1 #12; Abstract Grouping or clustering genes based of the interaction between these genes. Based on the hypothesis that similarity in expression implies similarity
A gauge invariant cluster algorithm for the Ising spin glass
K. Langfeld; M. Quandt; W. Lutz; H. Reinhardt
2006-06-14
The frustrated Ising model in two dimensions is revisited. The frustration is quantified in terms of the number of non-trivial plaquettes which is invariant under the Nishimori gauge symmetry. The exact ground state energy is calculated using Edmond's algorithm. A novel cluster algorithm is designed which treats gauge equivalent spin glasses on equal footing and allows for efficient simulations near criticality. As a first application, the specific heat near criticality is investigated.
The C4 clustering algorithm: Clusters of galaxies in the Sloan Digital Sky Survey
Miller, Christopher J.; Nichol, Robert; Reichart, Dan; Wechsler, Risa H.; Evrard, August; Annis, James; McKay, Timothy; Bahcall, Neta; Bernardi, Mariangela; Boehringer, Hans; Connolly, Andrew; Goto, Tomo; Kniazev, Alexie; Lamb, Donald; Postman, Marc; Schneider, Donald; Sheth, Ravi; Voges, Wolfgang; /Cerro-Tololo InterAmerican Obs. /Portsmouth U., ICG /North Carolina U. /Chicago U., Astron. Astrophys. Ctr. /Chicago U., EFI /Michigan U. /Fermilab /Princeton U. Observ. /Garching, Max Planck Inst., MPE /Pittsburgh U. /Tokyo U., ICRR /Baltimore, Space Telescope Sci. /Penn State U. /Chicago U. /Stavropol, Astrophys. Observ. /Heidelberg, Max Planck Inst. Astron. /INI, SAO
2005-03-01
We present the ''C4 Cluster Catalog'', a new sample of 748 clusters of galaxies identified in the spectroscopic sample of the Second Data Release (DR2) of the Sloan Digital Sky Survey (SDSS). The C4 cluster-finding algorithm identifies clusters as overdensities in a seven-dimensional position and color space, thus minimizing projection effects that have plagued previous optical cluster selection. The present C4 catalog covers {approx}2600 square degrees of sky and ranges in redshift from z = 0.02 to z = 0.17. The mean cluster membership is 36 galaxies (with redshifts) brighter than r = 17.7, but the catalog includes a range of systems, from groups containing 10 members to massive clusters with over 200 cluster members with redshifts. The catalog provides a large number of measured cluster properties including sky location, mean redshift, galaxy membership, summed r-band optical luminosity (L{sub r}), velocity dispersion, as well as quantitative measures of substructure and the surrounding large-scale environment. We use new, multi-color mock SDSS galaxy catalogs, empirically constructed from the {Lambda}CDM Hubble Volume (HV) Sky Survey output, to investigate the sensitivity of the C4 catalog to the various algorithm parameters (detection threshold, choice of passbands and search aperture), as well as to quantify the purity and completeness of the C4 cluster catalog. These mock catalogs indicate that the C4 catalog is {approx_equal}90% complete and 95% pure above M{sub 200} = 1 x 10{sup 14} h{sup -1}M{sub {circle_dot}} and within 0.03 {le} z {le} 0.12. Using the SDSS DR2 data, we show that the C4 algorithm finds 98% of X-ray identified clusters and 90% of Abell clusters within 0.03 {le} z {le} 0.12. Using the mock galaxy catalogs and the full HV dark matter simulations, we show that the L{sub r} of a cluster is a more robust estimator of the halo mass (M{sub 200}) than the galaxy line-of-sight velocity dispersion or the richness of the cluster. However, if we exclude clusters embedded in complex large-scale environments, we find that the velocity dispersion of the remaining clusters is as good an estimator of M{sub 200} as L{sub r}. The final C4 catalog will contain {approx_equal} 2500 clusters using the full SDSS data set and will represent one of the largest and most homogeneous samples of local clusters.
CLICK: A Clustering Algorithm with Applications to Gene Expression Analysis
Shamir, Ron
of biological datasets, ranging from gene expression, cDNA oligo-#12;ngerprinting to protein sequence similarCLICK: A Clustering Algorithm with Applications to Gene Expression Analysis Roded Sharan and Ron,shamirg@math.tau.ac.il Abstract Novel DNA microarray technologies enable the mon- itoring of expression levels of thousands
Clustering Algorithms Optimizer: A Framework for Large Datasets
Horn, David
tissues, associating tissues with subtypes of a disease) as well as revealing functional classes of genesClustering Algorithms Optimizer: A Framework for Large Datasets Roy Varshavsky1,* , David Horn2 bioinformatics tasks, including categorization of protein sequences and analysis of gene-expression data
An algorithmic approach to mining unknown clusters in training data
NASA Astrophysics Data System (ADS)
Lynch, Robert S., Jr.; Willett, Peter K.
2006-04-01
In this paper, unsupervised learning is utilized to develop a method for mining unknown clusters in training data. The approach is based on the Bayesian Data Reduction Algorithm (BDRA), which has recently been developed into a patented system called the Data Extraction and Mining Software Tool (DEMIST). In the BDRA, the modeling assumption is that the discrete symbol probabilities of each class are a priori uniformly Dirichlet distributed, and it employs a "greedy" approach to selecting and discretizing the relevant features of each class for best performance. The primary metric for selecting and discretizing all relevant features contained in each class is an analytic formula for the probability of error conditioned on the training data. Thus, the primary contribution of this work is to demonstrate an algorithmic approach to finding multiple unknown clusters in training data, which represents an extension to the original data clustering algorithm. To illustrate performance, results are demonstrated using simulated data that contains multiple clusters. In general, the results of this work will demonstrate an effective method for finding multiple clusters in data mining applications.
Mekhmoukh, Abdenour; Mokrani, Karim
2015-11-01
In this paper, a new image segmentation method based on Particle Swarm Optimization (PSO) and outlier rejection combined with level set is proposed. A traditional approach to the segmentation of Magnetic Resonance (MR) images is the Fuzzy C-Means (FCM) clustering algorithm. The membership function of this conventional algorithm is sensitive to the outlier and does not integrate the spatial information in the image. The algorithm is very sensitive to noise and in-homogeneities in the image, moreover, it depends on cluster centers initialization. To improve the outlier rejection and to reduce the noise sensitivity of conventional FCM clustering algorithm, a novel extended FCM algorithm for image segmentation is presented. In general, in the FCM algorithm the initial cluster centers are chosen randomly, with the help of PSO algorithm the clusters centers are chosen optimally. Our algorithm takes also into consideration the spatial neighborhood information. These a priori are used in the cost function to be optimized. For MR images, the resulting fuzzy clustering is used to set the initial level set contour. The results confirm the effectiveness of the proposed algorithm. PMID:26299609
Adaptive clustering algorithm for community detection in complex networks.
Ye, Zhenqing; Hu, Songnian; Yu, Jun
2008-10-01
Community structure is common in various real-world networks; methods or algorithms for detecting such communities in complex networks have attracted great attention in recent years. We introduced a different adaptive clustering algorithm capable of extracting modules from complex networks with considerable accuracy and robustness. In this approach, each node in a network acts as an autonomous agent demonstrating flocking behavior where vertices always travel toward their preferable neighboring groups. An optimal modular structure can emerge from a collection of these active nodes during a self-organization process where vertices constantly regroup. In addition, we show that our algorithm appears advantageous over other competing methods (e.g., the Newman-fast algorithm) through intensive evaluation. The applications in three real-world networks demonstrate the superiority of our algorithm to find communities that are parallel with the appropriate organization in reality. PMID:18999501
ORCA: The Overdense Red-sequence Cluster Algorithm
Murphy, D N A; Bower, R G
2011-01-01
We present a new cluster detection algorithm designed for the Panoramic Survey Telescope and Rapid Response System (Pan-STARRS) survey but with generic application to any multiband data. The method makes no prior assumptions about the properties of clusters other than (a) the similarity in colour of cluster galaxies (the "red sequence") and (b) an enhanced projected surface density. The detector has three main steps: (i) it identifies cluster members by photometrically filtering the input catalogue to isolate galaxies in colour-magnitude space, (ii) a Voronoi diagram identifies regions of high surface density, (iii) galaxies are grouped into clusters with a Friends-of-Friends technique. Where multiple colours are available, we require systems to exhibit sequences in two colours. In this paper we present the algorithm and demonstrate it on two datasets. The first is a 7 square degree sample of the deep Sloan Digital Sky Survey equatorial stripe (Stripe 82), from which we detect 97 clusters with z10^13 solar ma...
Clustering online social network communities using genetic algorithms
Hajeer, Mustafa H; Dasgupta, Dipankar; Sanyal, Sugata
2013-01-01
To analyze the activities in an Online Social network (OSN), we introduce the concept of "Node of Attraction" (NoA) which represents the most active node in a network community. This NoA is identified as the origin/initiator of a post/communication which attracted other nodes and formed a cluster at any point in time. In this research, a genetic algorithm (GA) is used as a data mining method where the main objective is to determine clusters of network communities in a given OSN dataset. This approach is efficient in handling different type of discussion topics in our studied OSN - comments, emails, chat expressions, etc. and can form clusters according to one or more topics. We believe that this work can be useful in finding the source for spread of this GA-based clustering of online interactions and reports some results of experiments with real-world data and demonstrates the performance of proposed approach.
A comparison of clustering algorithms in article recommendation system
NASA Astrophysics Data System (ADS)
Tantanasiriwong, Supaporn
2012-01-01
Recommendation system is considered a tool that can be used to recommend researchers about resources that are suitable for their research of interest by using content-based filtering. In this paper, clustering algorithm as an unsupervised learning is introduced for grouping objects based on their feature selection and similarities. The information of publication in Science Cited Index is used to be dataset for clustering as a feature extraction in terms of dimensionality reduction of these articles by comparing Latent Dirichlet Allocation (LDA), Principal Component Analysis (PCA), and K-Mean to determine the best algorithm. In my experiment, the selected database consists of 2625 documents extraction extracted from SCI corpus from 2001 to 2009. Clustering into ranks as 50,100,200,250 is used to consider and using F-Measure evaluate among them in three algorithms. The result of this paper showed that LDA technique given the accuracy up to 95.5% which is the highest effective than any other clustering technique.
A comparison of clustering algorithms in article recommendation system
NASA Astrophysics Data System (ADS)
Tantanasiriwong, Supaporn
2011-12-01
Recommendation system is considered a tool that can be used to recommend researchers about resources that are suitable for their research of interest by using content-based filtering. In this paper, clustering algorithm as an unsupervised learning is introduced for grouping objects based on their feature selection and similarities. The information of publication in Science Cited Index is used to be dataset for clustering as a feature extraction in terms of dimensionality reduction of these articles by comparing Latent Dirichlet Allocation (LDA), Principal Component Analysis (PCA), and K-Mean to determine the best algorithm. In my experiment, the selected database consists of 2625 documents extraction extracted from SCI corpus from 2001 to 2009. Clustering into ranks as 50,100,200,250 is used to consider and using F-Measure evaluate among them in three algorithms. The result of this paper showed that LDA technique given the accuracy up to 95.5% which is the highest effective than any other clustering technique.
Improved Gravitation Field Algorithm and Its Application in Hierarchical Clustering
Zheng, Ming; Sun, Ying; Liu, Gui-xia; Zhou, You; Zhou, Chun-guang
2012-01-01
Background Gravitation field algorithm (GFA) is a new optimization algorithm which is based on an imitation of natural phenomena. GFA can do well both for searching global minimum and multi-minima in computational biology. But GFA needs to be improved for increasing efficiency, and modified for applying to some discrete data problems in system biology. Method An improved GFA called IGFA was proposed in this paper. Two parts were improved in IGFA. The first one is the rule of random division, which is a reasonable strategy and makes running time shorter. The other one is rotation factor, which can improve the accuracy of IGFA. And to apply IGFA to the hierarchical clustering, the initial part and the movement operator were modified. Results Two kinds of experiments were used to test IGFA. And IGFA was applied to hierarchical clustering. The global minimum experiment was used with IGFA, GFA, GA (genetic algorithm) and SA (simulated annealing). Multi-minima experiment was used with IGFA and GFA. The two experiments results were compared with each other and proved the efficiency of IGFA. IGFA is better than GFA both in accuracy and running time. For the hierarchical clustering, IGFA is used to optimize the smallest distance of genes pairs, and the results were compared with GA and SA, singular-linkage clustering, UPGMA. The efficiency of IGFA is proved. PMID:23173043
ABCluster: the artificial bee colony algorithm for cluster global optimization.
Zhang, Jun; Dolg, Michael
2015-10-01
Global optimization of cluster geometries is of fundamental importance in chemistry and an interesting problem in applied mathematics. In this work, we introduce a relatively new swarm intelligence algorithm, i.e. the artificial bee colony (ABC) algorithm proposed in 2005, to this field. It is inspired by the foraging behavior of a bee colony, and only three parameters are needed to control it. We applied it to several potential functions of quite different nature, i.e., the Coulomb-Born-Mayer, Lennard-Jones, Morse, Z and Gupta potentials. The benchmarks reveal that for long-ranged potentials the ABC algorithm is very efficient in locating the global minimum, while for short-ranged ones it is sometimes trapped into a local minimum funnel on a potential energy surface of large clusters. We have released an efficient, user-friendly, and free program "ABCluster" to realize the ABC algorithm. It is a black-box program for non-experts as well as experts and might become a useful tool for chemists to study clusters. PMID:26327507
clusterMaker: a multi-algorithm clustering plugin for Cytoscape
2011-01-01
Background In the post-genomic era, the rapid increase in high-throughput data calls for computational tools capable of integrating data of diverse types and facilitating recognition of biologically meaningful patterns within them. For example, protein-protein interaction data sets have been clustered to identify stable complexes, but scientists lack easily accessible tools to facilitate combined analyses of multiple data sets from different types of experiments. Here we present clusterMaker, a Cytoscape plugin that implements several clustering algorithms and provides network, dendrogram, and heat map views of the results. The Cytoscape network is linked to all of the other views, so that a selection in one is immediately reflected in the others. clusterMaker is the first Cytoscape plugin to implement such a wide variety of clustering algorithms and visualizations, including the only implementations of hierarchical clustering, dendrogram plus heat map visualization (tree view), k-means, k-medoid, SCPS, AutoSOME, and native (Java) MCL. Results Results are presented in the form of three scenarios of use: analysis of protein expression data using a recently published mouse interactome and a mouse microarray data set of nearly one hundred diverse cell/tissue types; the identification of protein complexes in the yeast Saccharomyces cerevisiae; and the cluster analysis of the vicinal oxygen chelate (VOC) enzyme superfamily. For scenario one, we explore functionally enriched mouse interactomes specific to particular cellular phenotypes and apply fuzzy clustering. For scenario two, we explore the prefoldin complex in detail using both physical and genetic interaction clusters. For scenario three, we explore the possible annotation of a protein as a methylmalonyl-CoA epimerase within the VOC superfamily. Cytoscape session files for all three scenarios are provided in the Additional Files section. Conclusions The Cytoscape plugin clusterMaker provides a number of clustering algorithms and visualizations that can be used independently or in combination for analysis and visualization of biological data sets, and for confirming or generating hypotheses about biological function. Several of these visualizations and algorithms are only available to Cytoscape users through the clusterMaker plugin. clusterMaker is available via the Cytoscape plugin manager. PMID:22070249
Synchronous Firefly Algorithm for Cluster Head Selection in WSN.
Baskaran, Madhusudhanan; Sadagopan, Chitra
2015-01-01
Wireless Sensor Network (WSN) consists of small low-cost, low-power multifunctional nodes interconnected to efficiently aggregate and transmit data to sink. Cluster-based approaches use some nodes as Cluster Heads (CHs) and organize WSNs efficiently for aggregation of data and energy saving. A CH conveys information gathered by cluster nodes and aggregates/compresses data before transmitting it to a sink. However, this additional responsibility of the node results in a higher energy drain leading to uneven network degradation. Low Energy Adaptive Clustering Hierarchy (LEACH) offsets this by probabilistically rotating cluster heads role among nodes with energy above a set threshold. CH selection in WSN is NP-Hard as optimal data aggregation with efficient energy savings cannot be solved in polynomial time. In this work, a modified firefly heuristic, synchronous firefly algorithm, is proposed to improve the network performance. Extensive simulation shows the proposed technique to perform well compared to LEACH and energy-efficient hierarchical clustering. Simulations show the effectiveness of the proposed method in decreasing the packet loss ratio by an average of 9.63% and improving the energy efficiency of the network when compared to LEACH and EEHC. PMID:26495431
Synchronous Firefly Algorithm for Cluster Head Selection in WSN
Baskaran, Madhusudhanan; Sadagopan, Chitra
2015-01-01
Wireless Sensor Network (WSN) consists of small low-cost, low-power multifunctional nodes interconnected to efficiently aggregate and transmit data to sink. Cluster-based approaches use some nodes as Cluster Heads (CHs) and organize WSNs efficiently for aggregation of data and energy saving. A CH conveys information gathered by cluster nodes and aggregates/compresses data before transmitting it to a sink. However, this additional responsibility of the node results in a higher energy drain leading to uneven network degradation. Low Energy Adaptive Clustering Hierarchy (LEACH) offsets this by probabilistically rotating cluster heads role among nodes with energy above a set threshold. CH selection in WSN is NP-Hard as optimal data aggregation with efficient energy savings cannot be solved in polynomial time. In this work, a modified firefly heuristic, synchronous firefly algorithm, is proposed to improve the network performance. Extensive simulation shows the proposed technique to perform well compared to LEACH and energy-efficient hierarchical clustering. Simulations show the effectiveness of the proposed method in decreasing the packet loss ratio by an average of 9.63% and improving the energy efficiency of the network when compared to LEACH and EEHC. PMID:26495431
New Cluster Algorithms Using the Broad Histogram Relation
NASA Astrophysics Data System (ADS)
Yamaguchi, Chiaki; Kawashima, Naoki; Okabe, Yutaka
2003-11-01
We describe the Monte Carlo methods based on the cluster (graph) representation for spin models. We discuss a rigorous broad histogram relation (BHR) for the bond number; the BHR for the energy was previously derived by Oliveira. A Monte Carlo dynamics based on the number of potential moves for the bond number is presented. We also extend the BHR to the loop algorithm of the quantum simulation.
Comparison of cluster expansion fitting algorithms for interactions at surfaces
NASA Astrophysics Data System (ADS)
Herder, Laura M.; Bray, Jason M.; Schneider, William F.
2015-10-01
Cluster expansions (CEs) are Ising-type interaction models that are increasingly used to model interaction and ordering phenomena at surfaces, such as the adsorbate-adsorbate interactions that control coverage-dependent adsorption or surface-vacancy interactions that control surface reconstructions. CEs are typically fit to a limited set of data derived from density functional theory (DFT) calculations. The CE fitting process involves iterative selection of DFT data points to include in a fit set and selection of interaction clusters to include in the CE. Here we compare the performance of three CE fitting algorithms-the MIT Ab-initio Phase Stability code (MAPS, the default in ATAT software), a genetic algorithm (GA), and a steepest descent (SD) algorithm-against synthetic data. The synthetic data is encoded in model Hamiltonians of varying complexity motivated by the observed behavior of atomic adsorbates on a face-centered-cubic transition metal close-packed (111) surface. We compare the performance of the leave-one-out cross-validation score against the true fitting error available from knowledge of the hidden CEs. For these systems, SD achieves lowest overall fitting and prediction error independent of the underlying system complexity. SD also most accurately predicts cluster interaction energies without ignoring or introducing extra interactions into the CE. MAPS achieves good results in fewer iterations, while the GA performs least well for these particular problems.
PARAMETER SEARCH FOR AN IMAGE PROCESSING FUZZY C-MEANS HAND GESTURE RECOGNITION SYSTEM
Wachs, Juan
times must be obtained for practical use [3]. Human-robot interaction using hand gestures providesPARAMETER SEARCH FOR AN IMAGE PROCESSING FUZZY C- MEANS HAND GESTURE RECOGNITION SYSTEM Juan Wachs a hand gesture recognition system using an optimized Image Processing-Fuzzy C-Means (FCM) algorithm
NIC-based Reduction Algorithms for Large-scale Clusters
Petrini, F; Moody, A T; Fernandez, J; Frachtenberg, E; Panda, D K
2004-07-30
Efficient algorithms for reduction operations across a group of processes are crucial for good performance in many large-scale, parallel scientific applications. While previous algorithms limit processing to the host CPU, we utilize the programmable processors and local memory available on modern cluster network interface cards (NICs) to explore a new dimension in the design of reduction algorithms. In this paper, we present the benefits and challenges, design issues and solutions, analytical models, and experimental evaluations of a family of NIC-based reduction algorithms. Performance and scalability evaluations were conducted on the ASCI Linux Cluster (ALC), a 960-node, 1920-processor machine at Lawrence Livermore National Laboratory, which uses the Quadrics QsNet interconnect. We find NIC-based reductions on modern interconnects to be more efficient than host-based implementations in both scalability and consistency. In particular, at large-scale--1812 processes--NIC-based reductions of small integer and floating-point arrays provided respective speedups of 121% and 39% over the host-based, production-level MPI implementation.
Sweeney, Timothy E.; Chen, Albert C.; Gevaert, Olivier
2015-01-01
In order to discover new subsets (clusters) of a data set, researchers often use algorithms that perform unsupervised clustering, namely, the algorithmic separation of a dataset into some number of distinct clusters. Deciding whether a particular separation (or number of clusters, K) is correct is a sort of ‘dark art’, with multiple techniques available for assessing the validity of unsupervised clustering algorithms. Here, we present a new technique for unsupervised clustering that uses multiple clustering algorithms, multiple validity metrics, and progressively bigger subsets of the data to produce an intuitive 3D map of cluster stability that can help determine the optimal number of clusters in a data set, a technique we call COmbined Mapping of Multiple clUsteriNg ALgorithms (COMMUNAL). COMMUNAL locally optimizes algorithms and validity measures for the data being used. We show its application to simulated data with a known K, and then apply this technique to several well-known cancer gene expression datasets, showing that COMMUNAL provides new insights into clustering behavior and stability in all tested cases. COMMUNAL is shown to be a useful tool for determining K in complex biological datasets, and is freely available as a package for R. PMID:26581809
A fast clustering algorithm for data with a few labeled instances.
Yang, Jinfeng; Xiao, Yong; Wang, Jiabing; Ma, Qianli; Shen, Yanhua
2015-01-01
The diameter of a cluster is the maximum intracluster distance between pairs of instances within the same cluster, and the split of a cluster is the minimum distance between instances within the cluster and instances outside the cluster. Given a few labeled instances, this paper includes two aspects. First, we present a simple and fast clustering algorithm with the following property: if the ratio of the minimum split to the maximum diameter (RSD) of the optimal solution is greater than one, the algorithm returns optimal solutions for three clustering criteria. Second, we study the metric learning problem: learn a distance metric to make the RSD as large as possible. Compared with existing metric learning algorithms, one of our metric learning algorithms is computationally efficient: it is a linear programming model rather than a semidefinite programming model used by most of existing algorithms. We demonstrate empirically that the supervision and the learned metric can improve the clustering quality. PMID:25861252
A Fast Clustering Algorithm for Data with a Few Labeled Instances
Yang, Jinfeng; Xiao, Yong; Wang, Jiabing; Ma, Qianli; Shen, Yanhua
2015-01-01
The diameter of a cluster is the maximum intracluster distance between pairs of instances within the same cluster, and the split of a cluster is the minimum distance between instances within the cluster and instances outside the cluster. Given a few labeled instances, this paper includes two aspects. First, we present a simple and fast clustering algorithm with the following property: if the ratio of the minimum split to the maximum diameter (RSD) of the optimal solution is greater than one, the algorithm returns optimal solutions for three clustering criteria. Second, we study the metric learning problem: learn a distance metric to make the RSD as large as possible. Compared with existing metric learning algorithms, one of our metric learning algorithms is computationally efficient: it is a linear programming model rather than a semidefinite programming model used by most of existing algorithms. We demonstrate empirically that the supervision and the learned metric can improve the clustering quality. PMID:25861252
MECHANISTIC-BASED GENETIC ALGORITHM SEARCH ON A BEOWULF CLUSTER OF LINUX PCS
Hoffman, Forrest M.
MECHANISTIC-BASED GENETIC ALGORITHM SEARCH ON A BEOWULF CLUSTER OF LINUX PCS Jin-Ping Gwo), Beowulf Linux cluster. ABSTRACT A simple genetic algorithm (SGA) was implemented on a cluster of Linux PCs Beowulf project was also facilitated by the freely available Linux operating system. The open source
Investigating Seed Values for the K-means Clustering Algorithm David Kronenberg
Weaver, Adam Lee
Investigating Seed Values for the K-means Clustering Algorithm David Kronenberg Abstract A major shortcoming with the K-means clustering algorithm is that it relies on random seed values to search the data set to be clustered to produce seed values that will be deterministic and will produce accurate
Novel similarity-based clustering algorithm for grouping broadcast news
NASA Astrophysics Data System (ADS)
Ibrahimov, Oktay V.; Sethi, Ishwar K.; Dimitrova, Nevenka
2002-03-01
The goal of the current paper is to introduce a novel clustering algorithm that has been designed for grouping transcribed textual documents obtained out of audio, video segments. Since audio transcripts are normally highly erroneous documents, one of the major challenges at the text processing stage is to reduce the negative impacts of errors gained at the speech recognition stage. Other difficulties come from the nature of conversational speech. In the paper we describe the main difficulties of the spoken documents and suggest an approach restricting their negative effects. In our paper we also present a clustering algorithm that groups transcripts on the base of informative closeness of documents. To carry out such partitioning we give an intuitive definition of informative field of a transcript and use it in our algorithm. To assess informative closeness of the transcripts, we apply Chi-square similarity measure, which is also described in the paper. Our experiments with Chi-square similarity measure showed its robustness and high efficacy. In particular, the performance analysis that have been carried out in regard to Chi-square and three other similarity measures such as Cosine, Dice, and Jaccard showed that Chi-square is more robust to specific features of spoken documents.
Gravitation field algorithm and its application in gene cluster
2010-01-01
Background Searching optima is one of the most challenging tasks in clustering genes from available experimental data or given functions. SA, GA, PSO and other similar efficient global optimization methods are used by biotechnologists. All these algorithms are based on the imitation of natural phenomena. Results This paper proposes a novel searching optimization algorithm called Gravitation Field Algorithm (GFA) which is derived from the famous astronomy theory Solar Nebular Disk Model (SNDM) of planetary formation. GFA simulates the Gravitation field and outperforms GA and SA in some multimodal functions optimization problem. And GFA also can be used in the forms of unimodal functions. GFA clusters the dataset well from the Gene Expression Omnibus. Conclusions The mathematical proof demonstrates that GFA could be convergent in the global optimum by probability 1 in three conditions for one independent variable mass functions. In addition to these results, the fundamental optimization concept in this paper is used to analyze how SA and GA affect the global search and the inherent defects in SA and GA. Some results and source code (in Matlab) are publicly available at http://ccst.jlu.edu.cn/CSBG/GFA. PMID:20854683
A new detection algorithm for microcalcification clusters in mammographic screening
NASA Astrophysics Data System (ADS)
Xie, Weiying; Ma, Yide; Li, Yunsong
2015-05-01
A novel approach for microcalcification clusters detection is proposed. At the first time, we make a short analysis of mammographic images with microcalcification lesions to confirm these lesions have much greater gray values than normal regions. After summarizing the specific feature of microcalcification clusters in mammographic screening, we make more focus on preprocessing step including eliminating the background, image enhancement and eliminating the pectoral muscle. In detail, Chan-Vese Model is used for eliminating background. Then, we do the application of combining morphology method and edge detection method. After the AND operation and Sobel filter, we use Hough Transform, it can be seen that the result have outperformed for eliminating the pectoral muscle which is approximately the gray of microcalcification. Additionally, the enhancement step is achieved by morphology. We make effort on mammographic image preprocessing to achieve lower computational complexity. As well known, it is difficult to robustly achieve mammograms analysis due to low contrast between normal and lesion tissues, there are also much noise in such images. After a serious preprocessing algorithm, a method based on blob detection is performed to microcalcification clusters according their specific features. The proposed algorithm has employed Laplace operator to improve Difference of Gaussians (DoG) function in terms of low contrast images. A preliminary evaluation of the proposed method performs on a known public database namely MIAS, rather than synthetic images. The comparison experiments and Cohen's kappa coefficients all demonstrate that our proposed approach can potentially obtain better microcalcification clusters detection results in terms of accuracy, sensitivity and specificity.
Classification of posture maintenance data with fuzzy clustering algorithms
NASA Technical Reports Server (NTRS)
Bezdek, James C.
1992-01-01
Sensory inputs from the visual, vestibular, and proprioreceptive systems are integrated by the central nervous system to maintain postural equilibrium. Sustained exposure to microgravity causes neurosensory adaptation during spaceflight, which results in decreased postural stability until readaptation occurs upon return to the terrestrial environment. Data which simulate sensory inputs under various sensory organization test (SOT) conditions were collected in conjunction with Johnson Space Center postural control studies using a tilt-translation device (TTD). The University of West Florida applied the fuzzy c-meams (FCM) clustering algorithms to this data with a view towards identifying various states and stages of subjects experiencing such changes. Feature analysis, time step analysis, pooling data, response of the subjects, and the algorithms used are discussed.
GX-Means: A model-based divide and merge algorithm for geospatial image clustering
Vatsavai, Raju; Symons, Christopher T; Chandola, Varun; Jun, Goo
2011-01-01
One of the practical issues in clustering is the specification of the appropriate number of clusters, which is not obvious when analyzing geospatial datasets, partly because they are huge (both in size and spatial extent) and high dimensional. In this paper we present a computationally efficient model-based split and merge clustering algorithm that incrementally finds model parameters and the number of clusters. Additionally, we attempt to provide insights into this problem and other data mining challenges that are encountered when clustering geospatial data. The basic algorithm we present is similar to the G-means and X-means algorithms; however, our proposed approach avoids certain limitations of these well-known clustering algorithms that are pertinent when dealing with geospatial data. We compare the performance of our approach with the G-means and X-means algorithms. Experimental evaluation on simulated data and on multispectral and hyperspectral remotely sensed image data demonstrates the effectiveness of our algorithm.
NASA Astrophysics Data System (ADS)
Park, Sang Ha; Lee, Seokjin; Sung, Koeng-Mo
Non-negative matrix factorization (NMF) is widely used for monaural musical sound source separation because of its efficiency and good performance. However, an additional clustering process is required because the musical sound mixture is separated into more signals than the number of musical tracks during NMF separation. In the conventional method, manual clustering or training-based clustering is performed with an additional learning process. Recently, a clustering algorithm based on the mel-frequency cepstrum coefficient (MFCC) was proposed for unsupervised clustering. However, MFCC clustering supplies limited information for clustering. In this paper, we propose various timbre features for unsupervised clustering and a clustering algorithm with these features. Simulation experiments are carried out using various musical sound mixtures. The results indicate that the proposed method improves clustering performance, as compared to conventional MFCC-based clustering.
Thermodynamic Casimir effect in films: The exchange cluster algorithm
NASA Astrophysics Data System (ADS)
Hasenbusch, Martin
2015-02-01
We study the thermodynamic Casimir force for films with various types of boundary conditions and the bulk universality class of the three-dimensional Ising model. To this end, we perform Monte Carlo simulations of the improved Blume-Capel model on the simple cubic lattice. In particular, we employ the exchange or geometric cluster cluster algorithm [Heringa and Blöte, Phys. Rev. E 57, 4976 (1998), 10.1103/PhysRevE.57.4976]. In a previous work, we demonstrated that this algorithm allows us to compute the thermodynamic Casimir force for the plate-sphere geometry efficiently. It turns out that also for the film geometry a substantial reduction of the statistical error can achieved. Concerning physics, we focus on (O ,O ) boundary conditions, where O denotes the ordinary surface transition. These are implemented by free boundary conditions on both sides of the film. Films with such boundary conditions undergo a phase transition in the universality class of the two-dimensional Ising model. We determine the inverse transition temperature for a large range of thicknesses L0 of the film and study the scaling of this temperature with L0. In the neighborhood of the transition, the thermodynamic Casimir force is affected by finite size effects, where finite size refers to a finite transversal extension L of the film. We demonstrate that these finite size effects can be computed by using the universal finite size scaling function of the free energy of the two-dimensional Ising model.
jClustering, an Open Framework for the Development of 4D Clustering Algorithms
Mateos-Pérez, José María; García-Villalba, Carmen; Pascau, Javier; Desco, Manuel; Vaquero, Juan J.
2013-01-01
We present jClustering, an open framework for the design of clustering algorithms in dynamic medical imaging. We developed this tool because of the difficulty involved in manually segmenting dynamic PET images and the lack of availability of source code for published segmentation algorithms. Providing an easily extensible open tool encourages publication of source code to facilitate the process of comparing algorithms and provide interested third parties with the opportunity to review code. The internal structure of the framework allows an external developer to implement new algorithms easily and quickly, focusing only on the particulars of the method being implemented and not on image data handling and preprocessing. This tool has been coded in Java and is presented as an ImageJ plugin in order to take advantage of all the functionalities offered by this imaging analysis platform. Both binary packages and source code have been published, the latter under a free software license (GNU General Public License) to allow modification if necessary. PMID:23990913
Inconsistent Denoising and Clustering Algorithms for Amplicon Sequence Data.
Koskinen, Kaisa; Auvinen, Petri; Björkroth, K Johanna; Hultman, Jenni
2015-08-01
Natural microbial communities have been studied for decades using the 16S rRNA gene as a marker. In recent years, the application of second-generation sequencing technologies has revolutionized our understanding of the structure and function of microbial communities in complex environments. Using these highly parallel techniques, a detailed description of community characteristics are constructed, and even the rare biosphere can be detected. The new approaches carry numerous advantages and lack many features that skewed the results using traditional techniques, but we are still facing serious bias, and the lack of reliable comparability of produced results. Here, we contrasted publicly available amplicon sequence data analysis algorithms by using two different data sets, one with defined clone-based structure, and one with food spoilage community with well-studied communities. We aimed to assess which software and parameters produce results that resemble the benchmark community best, how large differences can be detected between methods, and whether these differences are statistically significant. The results suggest that commonly accepted denoising and clustering methods used in different combinations produce significantly different outcome: clustering method impacts greatly on the number of operational taxonomic units (OTUs) and denoising algorithm influences more on taxonomic affiliations. The magnitude of the OTU number difference was up to 40-fold and the disparity between results seemed highly dependent on the community structure and diversity. Statistically significant differences in taxonomies between methods were seen even at phylum level. However, the application of effective denoising method seemed to even out the differences produced by clustering. PMID:25525895
NASA Astrophysics Data System (ADS)
Gatos, Ilias; Tsantis, Stavros; Skouroliakou, Aikaterini; Theotokas, Ioannis; Zoumpoulis, Pavlos S.; Kagadis, George C.
2015-09-01
The aim of the present study is to determine an optimal elasticity cut-off value for discriminating Healthy from Pathological fibrotic patients by means of Fuzzy C-Means automatic segmentation and maximum participation cluster mean value employment in Shear Wave Elastography (SWE) images. The clinical dataset comprised 32 subjects (16 Healthy and 16 histological or Fibroscan verified Chronic Liver Disease). An experienced Radiologist performed SWE measurement placing a region of interest (ROI) on each subject's right liver lobe providing a SWE image for each patient. Subsequently Fuzzy C-Means clustering was performed on every SWE image utilizing 5 clusters. Mean Stiffness value and pixels number of each cluster were calculated. The mean stiffness value feature of the cluster with maximum pixels number was then fed as input for ROC analysis. The selected Mean Stiffness value feature an Area Under the Curve (AUC) of 0.8633 with Optimum Cut-off value of 7.5 kPa with sensitivity and specificity values of 0.8438 and 0.875 and balanced accuracy of 0.8594. Examiner's classification measurements exhibited sensitivity, specificity and balanced accuracy value of 0.8125 with 7.1 kPa cutoff value. A new promising automatic algorithm was implemented with more objective criteria of defining optimum elasticity cut-off values for discriminating fibrosis stages for SWE. More subjects are needed in order to define if this algorithm is an objective tool to outperform manual ROI selection.
Textural defect detect using a revised ant colony clustering algorithm
NASA Astrophysics Data System (ADS)
Zou, Chao; Xiao, Li; Wang, Bingwen
2007-11-01
We propose a totally novel method based on a revised ant colony clustering algorithm (ACCA) to explore the topic of textural defect detection. In this algorithm, our efforts are mainly made on the definition of local irregularity measurement and the implementation of the revised ACCA. The local irregular measurement defined evaluates the local textural inconsistency of each pixel against their mini-environment. In our revised ACCA, the behaviors of each ant are divided into two steps: release pheromone and act. The quantity of pheromone released is proportional to the irregularity measurement; the actions of the ants to act next are chosen independently of each other in a stochastic way according to some evaluated heuristic knowledge. The independency of ants implies the inherent parallel computation architecture of this algorithm. We apply the proposed method in some typical textural images with defects. From the series of pheromone distribution map (PDM), it can be clearly seen that the pheromone distribution approaches the textual defects gradually. By some post-processing, the final distribution of pheromone can demonstrate the shape and area of the defects well.
Clustering PPI Data Based on Bacteria Foraging Optimization Algorithm Xiujuan Lei*1
Buffalo, State University of New York
Clustering PPI Data Based on Bacteria Foraging Optimization Algorithm Xiujuan Lei*1 Shuang Wu2@buffalo.edu * Corresponding author Abstract--This paper proposed a novel method using Bacteria Foraging Optimization, but also automatically determined the cluster number. Keywords-bacteria foraging optimization algorithm
User-Based Document Clustering by Redescribing Subject Descriptions with a Genetic Algorithm.
ERIC Educational Resources Information Center
Gordon, Michael D.
1991-01-01
Discussion of clustering of documents and queries in information retrieval systems focuses on the use of a genetic algorithm to adapt subject descriptions so that documents become more effective in matching relevant queries. Various types of clustering are explained, and simulation experiments used to test the genetic algorithm are described. (27…
Qin, Xiao
1 An Energy-Efficient Scheduling Algorithm Using Dynamic Voltage Scaling for Parallel Applications clusters. In this paper, we propose an energy-efficient scheduling algorithm (TDVAS) using the dynamic in clusters is to make use of cutting- edge energy-efficient processors. This is because processors
Contributions to "k"-Means Clustering and Regression via Classification Algorithms
ERIC Educational Resources Information Center
Salman, Raied
2012-01-01
The dissertation deals with clustering algorithms and transforming regression problems into classification problems. The main contributions of the dissertation are twofold; first, to improve (speed up) the clustering algorithms and second, to develop a strict learning environment for solving regression problems as classification tasks by using…
Ghahramani, Zoubin
-up approximate inference method for a Dirichlet process (i.e. countably infinite) mixture model (DPM). WhileRandomized Algorithms for Fast Bayesian Hierarchical Clustering Katherine A. Heller and Zoubin,zoubin}@gatsby.ucl.ac.uk October 18, 2005 Abstract We present two new algorithms for fast Bayesian Hierarchical Clustering on large
Lowest-ID with Adaptive ID Reassignment: A Novel Mobile Ad-Hoc Networks Clustering Algorithm
Gavalas, Damianos; Konstantopoulos, Charalampos; Mamalis, Basilis
2011-01-01
Clustering is a promising approach for building hierarchies and simplifying the routing process in mobile ad-hoc network environments. The main objective of clustering is to identify suitable node representatives, i.e. cluster heads (CHs), to store routing and topology information and maximize clusters stability. Traditional clustering algorithms suggest CH election exclusively based on node IDs or location information and involve frequent broadcasting of control packets, even when network topology remains unchanged. More recent works take into account additional metrics (such as energy and mobility) and optimize initial clustering. However, in many situations (e.g. in relatively static topologies) re-clustering procedure is hardly ever invoked; hence initially elected CHs soon reach battery exhaustion. Herein, we introduce an efficient distributed clustering algorithm that uses both mobility and energy metrics to provide stable cluster formations. CHs are initially elected based on the time and cost-efficien...
A Formal Algorithm for Verifying the Validity of Clustering Results Based on Model Checking
Huang, Shaobin; Cheng, Yuan; Lang, Dapeng; Chi, Ronghua; Liu, Guofeng
2014-01-01
The limitations in general methods to evaluate clustering will remain difficult to overcome if verifying the clustering validity continues to be based on clustering results and evaluation index values. This study focuses on a clustering process to analyze crisp clustering validity. First, we define the properties that must be satisfied by valid clustering processes and model clustering processes based on program graphs and transition systems. We then recast the analysis of clustering validity as the problem of verifying whether the model of clustering processes satisfies the specified properties with model checking. That is, we try to build a bridge between clustering and model checking. Experiments on several datasets indicate the effectiveness and suitability of our algorithms. Compared with traditional evaluation indices, our formal method can not only indicate whether the clustering results are valid but, in the case the results are invalid, can also detect the objects that have led to the invalidity. PMID:24608823
Help file for R function SpatialClust(), which implements the spatial clustering algorithm in
Reich, Brian J.
: Reich BJ, Bondell HD. A spatial Dirichlet process mixture model for clustering population genetic data of clusters. m2 (25) The number of mixture components in each spatial distribution. spatial (TRUE) TRUEHelp file for R function SpatialClust(), which implements the spatial clustering algorithm in
Arabshahi, Payman
A CLUSTER-MERGE ALGORITHM FOR SOLVING THE MINIMUM POWER BROADCAST PROBLEM IN LARGE SCALE WIRELESS-stage cluster-merge algorithm for computing minimum power broadcast trees. The cluster phase is a look-ahead variant of the Broad- cast Incremental Power algorithm [1] and the merge phase is a probabilistic positive
Two generalizations of Kohonen clustering
NASA Technical Reports Server (NTRS)
Bezdek, James C.; Pal, Nikhil R.; Tsao, Eric C. K.
1993-01-01
The relationship between the sequential hard c-means (SHCM), learning vector quantization (LVQ), and fuzzy c-means (FCM) clustering algorithms is discussed. LVQ and SHCM suffer from several major problems. For example, they depend heavily on initialization. If the initial values of the cluster centers are outside the convex hull of the input data, such algorithms, even if they terminate, may not produce meaningful results in terms of prototypes for cluster representation. This is due in part to the fact that they update only the winning prototype for every input vector. The impact and interaction of these two families with Kohonen's self-organizing feature mapping (SOFM), which is not a clustering method, but which often leads ideas to clustering algorithms is discussed. Then two generalizations of LVQ that are explicitly designed as clustering algorithms are presented; these algorithms are referred to as generalized LVQ = GLVQ; and fuzzy LVQ = FLVQ. Learning rules are derived to optimize an objective function whose goal is to produce 'good clusters'. GLVQ/FLVQ (may) update every node in the clustering net for each input vector. Neither GLVQ nor FLVQ depends upon a choice for the update neighborhood or learning rate distribution - these are taken care of automatically. Segmentation of a gray tone image is used as a typical application of these algorithms to illustrate the performance of GLVQ/FLVQ.
Sun, Liping; Luo, Yonglong; Ding, Xintao; Zhang, Ji
2014-01-01
An important component of a spatial clustering algorithm is the distance measure between sample points in object space. In this paper, the traditional Euclidean distance measure is replaced with innovative obstacle distance measure for spatial clustering under obstacle constraints. Firstly, we present a path searching algorithm to approximate the obstacle distance between two points for dealing with obstacles and facilitators. Taking obstacle distance as similarity metric, we subsequently propose the artificial immune clustering with obstacle entity (AICOE) algorithm for clustering spatial point data in the presence of obstacles and facilitators. Finally, the paper presents a comparative analysis of AICOE algorithm and the classical clustering algorithms. Our clustering model based on artificial immune system is also applied to the case of public facility location problem in order to establish the practical applicability of our approach. By using the clone selection principle and updating the cluster centers based on the elite antibodies, the AICOE algorithm is able to achieve the global optimum and better clustering effect. PMID:25435862
Sun, Liping; Luo, Yonglong; Ding, Xintao; Zhang, Ji
2014-01-01
An important component of a spatial clustering algorithm is the distance measure between sample points in object space. In this paper, the traditional Euclidean distance measure is replaced with innovative obstacle distance measure for spatial clustering under obstacle constraints. Firstly, we present a path searching algorithm to approximate the obstacle distance between two points for dealing with obstacles and facilitators. Taking obstacle distance as similarity metric, we subsequently propose the artificial immune clustering with obstacle entity (AICOE) algorithm for clustering spatial point data in the presence of obstacles and facilitators. Finally, the paper presents a comparative analysis of AICOE algorithm and the classical clustering algorithms. Our clustering model based on artificial immune system is also applied to the case of public facility location problem in order to establish the practical applicability of our approach. By using the clone selection principle and updating the cluster centers based on the elite antibodies, the AICOE algorithm is able to achieve the global optimum and better clustering effect. PMID:25435862
Sequential recombination algorithm for jet clustering and background subtraction
Jeff Tseng; Hannah Evans
2013-08-14
We investigate a new sequential recombination algorithm which effectively subtracts background as it reconstructs the jet. We examine the new algorithm's behavior in light of existing algorithms, and we find that in Monte Carlo comparisons, the new algorithm's robustness against collision backgrounds is comparable to that of other jet algorithms when the latter have been augmented by further background subtraction techniques.
C-element: a new clustering algorithm to find high quality functional modules in PPI networks.
Ghasemi, Mahdieh; Rahgozar, Maseud; Bidkhori, Gholamreza; Masoudi-Nejad, Ali
2013-01-01
Graph clustering algorithms are widely used in the analysis of biological networks. Extracting functional modules in protein-protein interaction (PPI) networks is one such use. Most clustering algorithms whose focuses are on finding functional modules try either to find a clique like sub networks or to grow clusters starting from vertices with high degrees as seeds. These algorithms do not make any difference between a biological network and any other networks. In the current research, we present a new procedure to find functional modules in PPI networks. Our main idea is to model a biological concept and to use this concept for finding good functional modules in PPI networks. In order to evaluate the quality of the obtained clusters, we compared the results of our algorithm with those of some other widely used clustering algorithms on three high throughput PPI networks from Sacchromyces Cerevisiae, Homo sapiens and Caenorhabditis elegans as well as on some tissue specific networks. Gene Ontology (GO) analyses were used to compare the results of different algorithms. Each algorithm's result was then compared with GO-term derived functional modules. We also analyzed the effect of using tissue specific networks on the quality of the obtained clusters. The experimental results indicate that the new algorithm outperforms most of the others, and this improvement is more significant when tissue specific networks are used. PMID:24039752
A highly efficient multi-core algorithm for clustering extremely large datasets
2010-01-01
Background In recent years, the demand for computational power in computational biology has increased due to rapidly growing data sets from microarray and other high-throughput technologies. This demand is likely to increase. Standard algorithms for analyzing data, such as cluster algorithms, need to be parallelized for fast processing. Unfortunately, most approaches for parallelizing algorithms largely rely on network communication protocols connecting and requiring multiple computers. One answer to this problem is to utilize the intrinsic capabilities in current multi-core hardware to distribute the tasks among the different cores of one computer. Results We introduce a multi-core parallelization of the k-means and k-modes cluster algorithms based on the design principles of transactional memory for clustering gene expression microarray type data and categorial SNP data. Our new shared memory parallel algorithms show to be highly efficient. We demonstrate their computational power and show their utility in cluster stability and sensitivity analysis employing repeated runs with slightly changed parameters. Computation speed of our Java based algorithm was increased by a factor of 10 for large data sets while preserving computational accuracy compared to single-core implementations and a recently published network based parallelization. Conclusions Most desktop computers and even notebooks provide at least dual-core processors. Our multi-core algorithms show that using modern algorithmic concepts, parallelization makes it possible to perform even such laborious tasks as cluster sensitivity and cluster number estimation on the laboratory computer. PMID:20370922
Fong, Simon; Deb, Suash; Yang, Xin-She; Zhuang, Yan
2014-01-01
Traditional K-means clustering algorithms have the drawback of getting stuck at local optima that depend on the random values of initial centroids. Optimization algorithms have their advantages in guiding iterative computation to search for global optima while avoiding local optima. The algorithms help speed up the clustering process by converging into a global optimum early with multiple search agents in action. Inspired by nature, some contemporary optimization algorithms which include Ant, Bat, Cuckoo, Firefly, and Wolf search algorithms mimic the swarming behavior allowing them to cooperatively steer towards an optimal objective within a reasonable time. It is known that these so-called nature-inspired optimization algorithms have their own characteristics as well as pros and cons in different applications. When these algorithms are combined with K-means clustering mechanism for the sake of enhancing its clustering quality by avoiding local optima and finding global optima, the new hybrids are anticipated to produce unprecedented performance. In this paper, we report the results of our evaluation experiments on the integration of nature-inspired optimization methods into K-means algorithms. In addition to the standard evaluation metrics in evaluating clustering quality, the extended K-means algorithms that are empowered by nature-inspired optimization methods are applied on image segmentation as a case study of application scenario. PMID:25202730
Deb, Suash; Yang, Xin-She
2014-01-01
Traditional K-means clustering algorithms have the drawback of getting stuck at local optima that depend on the random values of initial centroids. Optimization algorithms have their advantages in guiding iterative computation to search for global optima while avoiding local optima. The algorithms help speed up the clustering process by converging into a global optimum early with multiple search agents in action. Inspired by nature, some contemporary optimization algorithms which include Ant, Bat, Cuckoo, Firefly, and Wolf search algorithms mimic the swarming behavior allowing them to cooperatively steer towards an optimal objective within a reasonable time. It is known that these so-called nature-inspired optimization algorithms have their own characteristics as well as pros and cons in different applications. When these algorithms are combined with K-means clustering mechanism for the sake of enhancing its clustering quality by avoiding local optima and finding global optima, the new hybrids are anticipated to produce unprecedented performance. In this paper, we report the results of our evaluation experiments on the integration of nature-inspired optimization methods into K-means algorithms. In addition to the standard evaluation metrics in evaluating clustering quality, the extended K-means algorithms that are empowered by nature-inspired optimization methods are applied on image segmentation as a case study of application scenario. PMID:25202730
A density-based algorithm for discovering clusters in large spatial databases with noise
Ester, M.; Kriegel, H.P.; Sander, J.; Xu Xiaowei
1996-12-31
Clustering algorithms are attractive for the task of class identification in spatial databases. However, the application to large spatial databases rises the following requirements for clustering algorithms: minimal requirements of domain knowledge to determine the input parameters, discovery of clusters with arbitrary shape and good efficiency on large databases. The well-known clustering algorithms offer no solution to the combination of these requirements. In this paper, we present the new clustering algorithm DBSCAN relying on a density-based notion of clusters which is designed to discover clusters of arbitrary shape. DB SCAN requires only one input parameter and supports the user in determining an appropriate value for it. We performed an experimental evaluation of the effectiveness and efficiency of DBSCAN using synthetic data and real data of the SEQUOIA 2000 benchmark. The results of our experiments demonstrate that (1) DBSCAN is significantly more effective in discovering clusters of arbitrary shape than the well-known algorithm CLARANS, and that (2) DBSCAN outperforms CLARANS by a factor of more than 100 in terms of efficiency.
NASA Technical Reports Server (NTRS)
Mach, Douglas M.; Christian, Hugh J.; Blakeslee, Richard; Boccippio, Dennis J.; Goodman, Steve J.; Boeck, William
2006-01-01
We describe the clustering algorithm used by the Lightning Imaging Sensor (LIS) and the Optical Transient Detector (OTD) for combining the lightning pulse data into events, groups, flashes, and areas. Events are single pixels that exceed the LIS/OTD background level during a single frame (2 ms). Groups are clusters of events that occur within the same frame and in adjacent pixels. Flashes are clusters of groups that occur within 330 ms and either 5.5 km (for LIS) or 16.5 km (for OTD) of each other. Areas are clusters of flashes that occur within 16.5 km of each other. Many investigators are utilizing the LIS/OTD flash data; therefore, we test how variations in the algorithms for the event group and group-flash clustering affect the flash count for a subset of the LIS data. We divided the subset into areas with low (1-3), medium (4-15), high (16-63), and very high (64+) flashes to see how changes in the clustering parameters affect the flash rates in these different sizes of areas. We found that as long as the cluster parameters are within about a factor of two of the current values, the flash counts do not change by more than about 20%. Therefore, the flash clustering algorithm used by the LIS and OTD sensors create flash rates that are relatively insensitive to reasonable variations in the clustering algorithms.
An algorithm for identifying clusters of functionally related genes in genomes
Yi, Gang Man
2009-05-15
common function and are not constrained by gene expression or other properties. I tested the algorithm by analyzing genomes of a representative set of species. I identified species specific variation in percentage of clustered genes as well...
Accuracy and robustness of clustering algorithms for small-size applications in bioinformatics
NASA Astrophysics Data System (ADS)
Minicozzi, Pamela; Rapallo, Fabio; Scalas, Enrico; Dondero, Francesco
2008-11-01
The performance (accuracy and robustness) of several clustering algorithms is studied for linearly dependent random variables in the presence of noise. It turns out that the error percentage quickly increases when the number of observations is less than the number of variables. This situation is common situation in experiments with DNA microarrays. Moreover, an a posteriori criterion to choose between two discordant clustering algorithm is presented.
Jeremy Kepner; Rita Kim
2000-04-21
Clusters of galaxies are the most massive objects in the Universe and mapping their location is an important astronomical problem. This paper describes an algorithm (based on statistical signal processing methods), a software architecture (based on a hybrid layered approach) and a parallelization scheme (based on a client/server model) for finding clusters of galaxies in large astronomical databases. The Adaptive Matched Filter (AMF) algorithm presented here identifies clusters by finding the peaks in a cluster likelihood map generated by convolving a galaxy survey with a filter based on a cluster model and a background model. The method has proved successful in identifying clusters in real and simulated data. The implementation is flexible and readily executed in parallel on a network of workstations.
A new clustering algorithm applicable to multispectral and polarimetric SAR images
NASA Technical Reports Server (NTRS)
Wong, Yiu-Fai; Posner, Edward C.
1993-01-01
We describe an application of a scale-space clustering algorithm to the classification of a multispectral and polarimetric SAR image of an agricultural site. After the initial polarimetric and radiometric calibration and noise cancellation, we extracted a 12-dimensional feature vector for each pixel from the scattering matrix. The clustering algorithm was able to partition a set of unlabeled feature vectors from 13 selected sites, each site corresponding to a distinct crop, into 13 clusters without any supervision. The cluster parameters were then used to classify the whole image. The classification map is much less noisy and more accurate than those obtained by hierarchical rules. Starting with every point as a cluster, the algorithm works by melting the system to produce a tree of clusters in the scale space. It can cluster data in any multidimensional space and is insensitive to variability in cluster densities, sizes and ellipsoidal shapes. This algorithm, more powerful than existing ones, may be useful for remote sensing for land use.
Evaluation and Comparison of Clustering Algorithms in Analyzing ES Cell Gene Expression Data
Gengxin Chen1 Work phone: 516-367-6956 FAX: 516-367-8461 Saied A. Jaradat2 average-linkage clustering algorithm to identify groups of co-regulated yeast genes. Tavazoie et al. (1999) to identify clusters in the yeast cell cycle and human hematopoietic differentiation data sets. There are many
Chinese Text Clustering Algorithm Based k-means
NASA Astrophysics Data System (ADS)
Yao, Mingyu; Pi, Dechang; Cong, Xiangxiang
Text clustering is an important means and method in text mining. The process of Chinese text clustering based on k-means was emphasized, we found that new center of a cluster was easily effected by isolated text after some experiments. Average similarity of one cluster was used as a parameter, and multiplied it with a modulus between 0.75 and 1.25 to get the similarity threshold value, the texts whose similarity with original cluster center was greater than or equal to the threshold value ware collected as a candidate collection, then updated the cluster center with center of candidate collection. The experiments show that improved method averagely increased purity and F value about 10 percent over the original method.
Kent, University of
Quality Quantity and Repellent Scent Aware Artificial Bee Colony Algorithm for Clustering Unekwu, use and modification of its underlying methods [3]. Artificial bee colony (ABC) algorithm in convergence speed. Inspired by energy saving foraging behaviour of natural honey bees this paper presents
Clustering Algorithms for Non-Profiled Single-Execution Attacks on Exponentiations
International Association for Cryptologic Research (IACR)
Clustering Algorithms for Non-Profiled Single-Execution Attacks on Exponentiations Johann Heyszl1 are typically bound to the leakage of single executions due to cryptographic protocols or side-established algorithms. The proposed non-profiled single-execution attack is able to exploit any available single-execution
A vector reconstruction based clustering algorithm particularly for large-scale text collection.
Liu, Ming; Wu, Chong; Chen, Lei
2015-03-01
Along with the fast evolvement of internet technology, internet users have to face the large amount of textual data every day. Apparently, organizing texts into categories can help users dig the useful information from large-scale text collection. Clustering is one of the most promising tools for categorizing texts due to its unsupervised characteristic. Unfortunately, most of traditional clustering algorithms lose their high qualities on large-scale text collection, which mainly attributes to the high-dimensional vector space and semantic similarity among texts. To effectively and efficiently cluster large-scale text collection, this paper puts forward a vector reconstruction based clustering algorithm. Only the features that can represent the cluster are preserved in cluster's representative vector. This algorithm alternately repeats two sub-processes until it converges. One process is partial tuning sub-process, where feature's weight is fine-tuned by iterative process similar to self-organizing-mapping (SOM) algorithm. To accelerate clustering velocity, an intersection based similarity measurement and its corresponding neuron adjustment function are proposed and implemented in this sub-process. The other process is overall tuning sub-process, where the features are reallocated among different clusters. In this sub-process, the features useless to represent the cluster are removed from cluster's representative vector. Experimental results on the three text collections (including two small-scale and one large-scale text collections) demonstrate that our algorithm obtains high-quality performances on both small-scale and large-scale text collections. PMID:25539500
Ju, Chunhua
2013-01-01
Although there are many good collaborative recommendation methods, it is still a challenge to increase the accuracy and diversity of these methods to fulfill users' preferences. In this paper, we propose a novel collaborative filtering recommendation approach based on K-means clustering algorithm. In the process of clustering, we use artificial bee colony (ABC) algorithm to overcome the local optimal problem caused by K-means. After that we adopt the modified cosine similarity to compute the similarity between users in the same clusters. Finally, we generate recommendation results for the corresponding target users. Detailed numerical analysis on a benchmark dataset MovieLens and a real-world dataset indicates that our new collaborative filtering approach based on users clustering algorithm outperforms many other recommendation methods. PMID:24381525
An improved clustering algorithm of tunnel monitoring data for cloud computing.
Zhong, Luo; Tang, KunHao; Li, Lin; Yang, Guang; Ye, JingJing
2014-01-01
With the rapid development of urban construction, the number of urban tunnels is increasing and the data they produce become more and more complex. It results in the fact that the traditional clustering algorithm cannot handle the mass data of the tunnel. To solve this problem, an improved parallel clustering algorithm based on k-means has been proposed. It is a clustering algorithm using the MapReduce within cloud computing that deals with data. It not only has the advantage of being used to deal with mass data but also is more efficient. Moreover, it is able to compute the average dissimilarity degree of each cluster in order to clean the abnormal data. PMID:24982971
An Improved Clustering Algorithm of Tunnel Monitoring Data for Cloud Computing
Zhong, Luo; Tang, KunHao; Li, Lin; Yang, Guang; Ye, JingJing
2014-01-01
With the rapid development of urban construction, the number of urban tunnels is increasing and the data they produce become more and more complex. It results in the fact that the traditional clustering algorithm cannot handle the mass data of the tunnel. To solve this problem, an improved parallel clustering algorithm based on k-means has been proposed. It is a clustering algorithm using the MapReduce within cloud computing that deals with data. It not only has the advantage of being used to deal with mass data but also is more efficient. Moreover, it is able to compute the average dissimilarity degree of each cluster in order to clean the abnormal data. PMID:24982971
Lin, Nan; Jiang, Junhai; Guo, Shicheng; Xiong, Momiao
2015-01-01
Due to the advancement in sensor technology, the growing large medical image data have the ability to visualize the anatomical changes in biological tissues. As a consequence, the medical images have the potential to enhance the diagnosis of disease, the prediction of clinical outcomes and the characterization of disease progression. But in the meantime, the growing data dimensions pose great methodological and computational challenges for the representation and selection of features in image cluster analysis. To address these challenges, we first extend the functional principal component analysis (FPCA) from one dimension to two dimensions to fully capture the space variation of image the signals. The image signals contain a large number of redundant features which provide no additional information for clustering analysis. The widely used methods for removing the irrelevant features are sparse clustering algorithms using a lasso-type penalty to select the features. However, the accuracy of clustering using a lasso-type penalty depends on the selection of the penalty parameters and the threshold value. In practice, they are difficult to determine. Recently, randomized algorithms have received a great deal of attentions in big data analysis. This paper presents a randomized algorithm for accurate feature selection in image clustering analysis. The proposed method is applied to both the liver and kidney cancer histology image data from the TCGA database. The results demonstrate that the randomized feature selection method coupled with functional principal component analysis substantially outperforms the current sparse clustering algorithms in image cluster analysis. PMID:26196383
Fast Algorithms for Projected Clustering Charu C. Aggarwal Cecilia Procopiuc
Hinneburg, Alexander
)@watson.ibm.com Abstract The clustering problem is well known in the database literature for its numerous. 1 Introduction The clustering problem has been discussed extensively in the database literature and classification. Various methods have been studied in considerable detail by both the statistics and database
Uy, D.L.
1996-02-01
An algorithm for detection and identification of image clusters or {open_quotes}blobs{close_quotes} based on color information for an autonomous mobile robot is developed. The input image data are first processed using a crisp color fuszzyfier, a binary smoothing filter, and a median filter. The processed image data is then inputed to the image clusters detection and identification program. The program employed the concept of {open_quotes}elastic rectangle{close_quotes}that stretches in such a way that the whole blob is finally enclosed in a rectangle. A C-program is develop to test the algorithm. The algorithm is tested only on image data of 8x8 sizes with different number of blobs in them. The algorithm works very in detecting and identifying image clusters.
Efficient cluster Monte Carlo algorithm for Ising spin glasses in more than two space dimensions
NASA Astrophysics Data System (ADS)
Ochoa, Andrew J.; Zhu, Zheng; Katzgraber, Helmut G.
2015-03-01
A cluster algorithm that speeds up slow dynamics in simulations of nonplanar Ising spin glasses away from criticality is urgently needed. In theory, the cluster algorithm proposed by Houdayer poses no advantage over local moves in systems with a percolation threshold below 50%, such as cubic lattices. However, we show that the frustration present in Ising spin glasses prevents the growth of system-spanning clusters at temperatures roughly below the characteristic energy scale J of the problem. Adding Houdayer cluster moves to simulations of Ising spin glasses for T ~ J produces a speedup that grows with the system size over conventional local moves. We show results for the nonplanar quasi-two-dimensional Chimera graph of the D-Wave Two quantum annealer, as well as conventional three-dimensional Ising spin glasses, where in both cases the addition of cluster moves speeds up thermalization visibly in the physically-interesting low temperature regime.
A Community Detection Algorithm Based on Topology Potential and Spectral Clustering
Wang, Zhixiao; Chen, Zhaotong; Zhao, Ya; Chen, Shaoda
2014-01-01
Community detection is of great value for complex networks in understanding their inherent law and predicting their behavior. Spectral clustering algorithms have been successfully applied in community detection. This kind of methods has two inadequacies: one is that the input matrixes they used cannot provide sufficient structural information for community detection and the other is that they cannot necessarily derive the proper community number from the ladder distribution of eigenvector elements. In order to solve these problems, this paper puts forward a novel community detection algorithm based on topology potential and spectral clustering. The new algorithm constructs the normalized Laplacian matrix with nodes' topology potential, which contains rich structural information of the network. In addition, the new algorithm can automatically get the optimal community number from the local maximum potential nodes. Experiments results showed that the new algorithm gave excellent performance on artificial networks and real world networks and outperforms other community detection methods. PMID:25147846
Fuzzy-rough supervised attribute clustering algorithm and classification of microarray data.
Maji, Pradipta
2011-02-01
One of the major tasks with gene expression data is to find groups of coregulated genes whose collective expression is strongly associated with sample categories. In this regard, a new clustering algorithm, termed as fuzzy-rough supervised attribute clustering (FRSAC), is proposed to find such groups of genes. The proposed algorithm is based on the theory of fuzzy-rough sets, which directly incorporates the information of sample categories into the gene clustering process. A new quantitative measure is introduced based on fuzzy-rough sets that incorporates the information of sample categories to measure the similarity among genes. The proposed algorithm is based on measuring the similarity between genes using the new quantitative measure, whereby redundancy among the genes is removed. The clusters are refined incrementally based on sample categories. The effectiveness of the proposed FRSAC algorithm, along with a comparison with existing supervised and unsupervised gene selection and clustering algorithms, is demonstrated on six cancer and two arthritis data sets based on the class separability index and predictive accuracy of the naive Bayes' classifier, the K-nearest neighbor rule, and the support vector machine. PMID:20542768
A Morphologically Sensitive Clustering Algorithm for Identifying Arabic Roots
and words. Typically, Arabic stemming algorithms operate by "trial and error". Affixes are stripped away, these approaches have limited scope for deployment in IR. Even if substantial, their morpho-syntactic coverage
Cluster-Based Solidification and Growth Algorithm for Decagonal Quasicrystals.
Kuczera, P; Steurer, W
2015-08-21
A novel approach is used for the simulation of decagonal quasicrystal (DQC) solidification and growth. It is based on the observation that in well-ordered DQCs the atoms are largely arranged along quasiperiodically spaced planes parallel to the tenfold axis, running throughout the whole structure in five different directions. The structures themselves can be described as quasiperiodic arrangements of decagonal columnar clusters (cluster covering) that partially overlap in a systematic way. Based on these findings, we define a cluster interaction model within the mean field approximation, with effectively asymmetric interactions ranging beyond the nearest neighbors. In our Monte Carlo simulations, this leads to a long-range ordered quasiperiodic ground state. Indications of two finite-temperature unlocking phase transitions are observed, and are related to the two fundamental length scales that are characteristic for the system. PMID:26340193
The RedGOLD cluster detection algorithm and its cluster candidate catalogue for the CFHT-LS W1
NASA Astrophysics Data System (ADS)
Licitra, Rossella; Mei, Simona; Raichoor, Anand; Erben, Thomas; Hildebrandt, Hendrik
2016-01-01
We present RedGOLD (Red-sequence Galaxy Overdensity cLuster Detector), a new optical/NIR galaxy cluster detection algorithm, and apply it to the CFHT-LS W1 field. RedGOLD searches for red-sequence galaxy overdensities while minimizing contamination from dusty star-forming galaxies. It imposes an Navarro-Frenk-White profile and calculates cluster detection significance and richness. We optimize these latter two parameters using both simulations and X-ray-detected cluster catalogues, and obtain a catalogue ˜80 per cent pure up to z ˜ 1, and ˜100 per cent (˜70 per cent) complete at z ? 0.6 (z ? 1) for galaxy clusters with M ? 1014 M? at the CFHT-LS Wide depth. In the CFHT-LS W1, we detect 11 cluster candidates per deg2 out to z ˜ 1.1. When we optimize both completeness and purity, RedGOLD obtains a cluster catalogue with higher completeness and purity than other public catalogues, obtained using CFHT-LS W1 observations, for M ? 1014 M?. We use X-ray-detected cluster samples to extend the study of the X-ray temperature-optical richness relation to a lower mass threshold, and find a mass scatter at fixed richness of ?lnM|? = 0.39 ± 0.07 and ?lnM|? = 0.30 ± 0.13 for the Gozaliasl et al. and Mehrtens et al. samples. When considering similar mass ranges as previous work, we recover a smaller scatter in mass at fixed richness. We recover 93 per cent of the redMaPPer detections, and find that its richness estimates is on average ˜40-50 per cent larger than ours at z > 0.3. RedGOLD recovers X-ray cluster spectroscopic redshifts at better than 5 per cent up to z ˜ 1, and the centres within a few tens of arcseconds.
Node Non-Uniform Deployment Based on Clustering Algorithm for Underwater Sensor Networks.
Jiang, Peng; Liu, Jun; Wu, Feng
2015-01-01
A node non-uniform deployment based on clustering algorithm for underwater sensor networks (UWSNs) is proposed in this study. This algorithm is proposed because optimizing network connectivity rate and network lifetime is difficult for the existing node non-uniform deployment algorithms under the premise of improving the network coverage rate for UWSNs. A high network connectivity rate is achieved by determining the heterogeneous communication ranges of nodes during node clustering. Moreover, the concept of aggregate contribution degree is defined, and the nodes with lower aggregate contribution degrees are used to substitute the dying nodes to decrease the total movement distance of nodes and prolong the network lifetime. Simulation results show that the proposed algorithm can achieve a better network coverage rate and network connectivity rate, as well as decrease the total movement distance of nodes and prolong the network lifetime. PMID:26633408
Toward Integrating Feature Selection Algorithms for Classification and Clustering
Yu, Lei
[50], [70], [94], image retrieval [77], [86], customer relationship management [69], intrusion and compares different algorithms with a categorizing framework based on search strategies, evaluation criteria strategy. Each candidate subset is evaluated and compared with the previous best one according to a certain
A seed expanding cluster algorithm for deriving upwelling areas on sea surface temperature images
NASA Astrophysics Data System (ADS)
Nascimento, Susana; Casca, Sérgio; Mirkin, Boris
2015-12-01
In this paper a novel clustering algorithm is proposed as a version of the seeded region growing (SRG) approach for the automatic recognition of coastal upwelling from sea surface temperature (SST) images. The new algorithm, one seed expanding cluster (SEC), takes advantage of the concept of approximate clustering due to Mirkin (1996, 2013) to derive a homogeneity criterion in the format of a product rather than the conventional difference between a pixel value and the mean of values over the region of interest. It involves a boundary-oriented pixel labeling so that the cluster growing is performed by expanding its boundary iteratively. The starting point is a cluster consisting of just one seed, the pixel with the coldest temperature. The baseline version of the SEC algorithm uses Otsu's thresholding method to fine-tune the homogeneity threshold. Unfortunately, this method does not always lead to a satisfactory solution. Therefore, we introduce a self-tuning version of the algorithm in which the homogeneity threshold is locally derived from the approximation criterion over a window around the pixel under consideration. The window serves as a boundary regularizer. These two unsupervised versions of the algorithm have been applied to a set of 28 SST images of the western coast of mainland Portugal, and compared against a supervised version fine-tuned by maximizing the F-measure with respect to manually labeled ground-truth maps. The areas built by the unsupervised versions of the SEC algorithm are significantly coincident over the ground-truth regions in the cases at which the upwelling areas consist of a single continuous fragment of the SST map.
Characterizing Structurally Cohesive Clusters in Networks: Theory and Algorithms
Balasubramaniam, Chitra
2015-08-05
models discussed that were based on relaxing 13 one or more of the clique-defining properties, stability is another important prop- erty that is used for characterizing clusters in various applications, particularly in fullerene chemistry. In fullerene... stable than others, has introduced independence number as a predictor of fullerene stability. However, these models do not always require that the stability of the structure is one. In particular, relaxing the stability property of a clique could help...
NASA Astrophysics Data System (ADS)
Bansod, Babankumar S.; Pandey, O. P.
2013-05-01
Within-field variability is a well-known phenomenon and its study is at the centre of precision agriculture (PA). In this paper, site-specific spatial variability (SSSV) of apparent Electrical Conductivity (ECa) and crop yield apart from pH, moisture, temperature and di-electric constant information was analyzed to construct spatial distribution maps. Principal component analysis (PCA) and fuzzy c-means (FCM) clustering algorithm were then performed to delineate management zones (MZs). Various performance indices such as Normalized Classification Entropy (NCE) and Fuzzy Performance Index (FPI) were calculated to determine the clustering performance. The geo-referenced sensor data was analyzed for within-field classification. Results revealed that the variables could be aggregated into MZs that characterize spatial variability in soil chemical properties and crop productivity. The resulting classified MZs showed favorable agreement between ECa and crop yield variability pattern. This enables reduction in number of soil analysis needed to create application maps for certain cultivation operations.
A cluster finding algorithm based on the multiband identification of red sequence galaxies
NASA Astrophysics Data System (ADS)
Oguri, Masamune
2014-10-01
We present a new algorithm, CAMIRA, to identify clusters of galaxies in wide-field imaging survey data. We base our algorithm on the stellar population synthesis model to predict colours of red sequence galaxies at a given redshift for an arbitrary set of bandpass filters, with additional calibration using a sample of spectroscopic galaxies to improve the accuracy of the model prediction. We run the algorithm on ˜11 960 deg2 of imaging data from the Sloan Digital Sky Survey (SDSS) Data Release 8 to construct a catalogue of 71 743 clusters in the redshift range 0.1 < z < 0.6 with richness after correcting for the incompleteness of the richness estimate greater than 20. We cross-match the cluster catalogue with external cluster catalogues to find that our photometric cluster redshift estimates are accurate with low bias and scatter, and that the corrected richness correlates well with X-ray luminosities and temperatures. We use the publicly available Canada-France-Hawaii Telescope Lensing Survey shear catalogue to calibrate the mass-richness relation from stacked weak lensing analysis. Stacked weak lensing signals are detected significantly for eight subsamples of the SDSS clusters divided by redshift and richness bins, which are then compared with model predictions including miscentring effects to constrain mean halo masses of individual bins. We find the richness correlates well with the halo mass, such that the corrected richness limit of 20 corresponds to the cluster virial mass limit of about 1 × 1014 h-1 M? for the SDSS DR8 cluster sample.
Experimental realization of the Deutsch-Jozsa algorithm with a six-qubit cluster state
Vallone, Giuseppe; Donati, Gaia; Bruno, Natalia; Chiuri, Andrea; Mataloni, Paolo
2010-05-15
We describe an experimental realization of the Deutsch-Jozsa quantum algorithm to evaluate the properties of a two-bit Boolean function in the framework of one-way quantum computation. For this purpose, a two-photon six-qubit cluster state was engineered. Its peculiar topological structure is the basis of the original measurement pattern allowing the algorithm realization. The good agreement of the experimental results with the theoretical predictions, obtained at {approx}1 kHz success rate, demonstrates the correct implementation of the algorithm.
Tame, M. S.; Kim, M. S.
2010-09-15
We show that fundamental versions of the Deutsch-Jozsa and Bernstein-Vazirani quantum algorithms can be performed using a small entangled cluster state resource of only six qubits. We then investigate the minimal resource states needed to demonstrate general n-qubit versions and a scalable method to produce them. For this purpose, we propose a versatile photonic on-chip setup.
AN X-RAY SPECTRAL CLASSIFICATION ALGORITHM WITH APPLICATION TO YOUNG STELLAR CLUSTERS
Micela, Giusi
AN X-RAY SPECTRAL CLASSIFICATION ALGORITHM WITH APPLICATION TO YOUNG STELLAR CLUSTERS S. M; accepted 2006 December 30 ABSTRACT A large volume of low signal-to-noise, multidimensional data is available from the CCD imaging spectrometers aboard the Chandra X-Ray Observatory and the X-Ray Multimirror
The Union-Split Algorithm and Cluster-Based Anonymization of Social Networks
Yao, Danfeng "Daphne'
carry out a series of experiments to evaluate the graph utilities of the anonymized social net- worksThe Union-Split Algorithm and Cluster-Based Anonymization of Social Networks Brian Thompson Knowledge discovery on social network data can uncover latent social trends and produce valuable findings
Clustering Markov States into Equivalence Classes using SVD and Heuristic Search Algorithms
Smyth, Padhraic
Clustering Markov States into Equivalence Classes using SVD and Heuristic Search Algorithms This paper investigates the problem of find- ing a K-state first-order Markov chain that approximates an M-state to want to estimate a Markov chain with a very large number of states M. For example, in modeling of Web
Clustering Markov States into Equivalence Classes using SVD and Heuristic Search Algorithms
Smyth, Padhraic
Clustering Markov States into Equivalence Classes using SVD and Heuristic Search Algorithms This paper investigates the problem of #12;nd- ing a K-state #12;rst-order Markov chain that approximates an M-state #12;rst-order Markov chain, where K is typically much smaller than M . A variety of greedy
A Memetic Co-Clustering Algorithm for Gene Expression Profiles and Biological Annotation
Zell, Andreas
A Memetic Co-Clustering Algorithm for Gene Expression Profiles and Biological Annotation Nora Speer by computers. The use of this additional annotation information promises to improve biological data analysis this method at least incorporates additional biological knowledge, it still is a sequential data analysis
Color Image Segmentation Using a Spatial K-Means Clustering Algorithm
Whelan, Paul F.
Color Image Segmentation Using a Spatial K-Means Clustering Algorithm Dana Elena Ilea and Paul F danailea@eeng.dcu.ie Abstract This paper details the implementation of a new adaptive technique for color with respect to texture and color since no local constraints are applied to impose spatial continuity
An effective trust-based recommendation method using a novel graph clustering algorithm
NASA Astrophysics Data System (ADS)
Moradi, Parham; Ahmadian, Sajad; Akhlaghian, Fardin
2015-10-01
Recommender systems are programs that aim to provide personalized recommendations to users for specific items (e.g. music, books) in online sharing communities or on e-commerce sites. Collaborative filtering methods are important and widely accepted types of recommender systems that generate recommendations based on the ratings of like-minded users. On the other hand, these systems confront several inherent issues such as data sparsity and cold start problems, caused by fewer ratings against the unknowns that need to be predicted. Incorporating trust information into the collaborative filtering systems is an attractive approach to resolve these problems. In this paper, we present a model-based collaborative filtering method by applying a novel graph clustering algorithm and also considering trust statements. In the proposed method first of all, the problem space is represented as a graph and then a sparsest subgraph finding algorithm is applied on the graph to find the initial cluster centers. Then, the proposed graph clustering algorithm is performed to obtain the appropriate users/items clusters. Finally, the identified clusters are used as a set of neighbors to recommend unseen items to the current active user. Experimental results based on three real-world datasets demonstrate that the proposed method outperforms several state-of-the-art recommender system methods.
Unsupervised unstained cell detection by SIFT keypoint clustering and self-labeling algorithm.
Muallal, Firas; Schöll, Simon; Sommerfeldt, Björn; Maier, Andreas; Steidl, Stefan; Buchholz, Rainer; Hornegger, Joachim
2014-01-01
We propose a novel unstained cell detection algorithm based on unsupervised learning. The algorithm utilizes the scale invariant feature transform (SIFT), a self-labeling algorithm, and two clustering steps in order to achieve high performance in terms of time and detection accuracy. Unstained cell imaging is dominated by phase contrast and bright field microscopy. Therefore, the algorithm was assessed on images acquired using these two modalities. Five cell lines having in total 37 images and 7250 cells were considered for the evaluation: CHO, L929, Sf21, HeLa, and Bovine cells. The obtained F-measures were between 85.1 and 89.5. Compared to the state-of-the-art, the algorithm achieves very close F-measure to the supervised approaches in much less time. PMID:25320822
A priori data-driven multi-clustered reservoir generation algorithm for echo state network.
Li, Xiumin; Zhong, Ling; Xue, Fangzheng; Zhang, Anguo
2015-01-01
Echo state networks (ESNs) with multi-clustered reservoir topology perform better in reservoir computing and robustness than those with random reservoir topology. However, these ESNs have a complex reservoir topology, which leads to difficulties in reservoir generation. This study focuses on the reservoir generation problem when ESN is used in environments with sufficient priori data available. Accordingly, a priori data-driven multi-cluster reservoir generation algorithm is proposed. The priori data in the proposed algorithm are used to evaluate reservoirs by calculating the precision and standard deviation of ESNs. The reservoirs are produced using the clustering method; only the reservoir with a better evaluation performance takes the place of a previous one. The final reservoir is obtained when its evaluation score reaches the preset requirement. The prediction experiment results obtained using the Mackey-Glass chaotic time series show that the proposed reservoir generation algorithm provides ESNs with extra prediction precision and increases the structure complexity of the network. Further experiments also reveal the appropriate values of the number of clusters and time window size to obtain optimal performance. The information entropy of the reservoir reaches the maximum when ESN gains the greatest precision. PMID:25875296
A Priori Data-Driven Multi-Clustered Reservoir Generation Algorithm for Echo State Network
Li, Xiumin; Zhong, Ling; Xue, Fangzheng; Zhang, Anguo
2015-01-01
Echo state networks (ESNs) with multi-clustered reservoir topology perform better in reservoir computing and robustness than those with random reservoir topology. However, these ESNs have a complex reservoir topology, which leads to difficulties in reservoir generation. This study focuses on the reservoir generation problem when ESN is used in environments with sufficient priori data available. Accordingly, a priori data-driven multi-cluster reservoir generation algorithm is proposed. The priori data in the proposed algorithm are used to evaluate reservoirs by calculating the precision and standard deviation of ESNs. The reservoirs are produced using the clustering method; only the reservoir with a better evaluation performance takes the place of a previous one. The final reservoir is obtained when its evaluation score reaches the preset requirement. The prediction experiment results obtained using the Mackey-Glass chaotic time series show that the proposed reservoir generation algorithm provides ESNs with extra prediction precision and increases the structure complexity of the network. Further experiments also reveal the appropriate values of the number of clusters and time window size to obtain optimal performance. The information entropy of the reservoir reaches the maximum when ESN gains the greatest precision. PMID:25875296
A clustering algorithm for sample data based on environmental pollution characteristics
NASA Astrophysics Data System (ADS)
Chen, Mei; Wang, Pengfei; Chen, Qiang; Wu, Jiadong; Chen, Xiaoyun
2015-04-01
Environmental pollution has become an issue of serious international concern in recent years. Among the receptor-oriented pollution models, CMB, PMF, UNMIX, and PCA are widely used as source apportionment models. To improve the accuracy of source apportionment and classify the sample data for these models, this study proposes an easy-to-use, high-dimensional EPC algorithm that not only organizes all of the sample data into different groups according to the similarities in pollution characteristics such as pollution sources and concentrations but also simultaneously detects outliers. The main clustering process consists of selecting the first unlabelled point as the cluster centre, then assigning each data point in the sample dataset to its most similar cluster centre according to both the user-defined threshold and the value of similarity function in each iteration, and finally modifying the clusters using a method similar to k-Means. The validity and accuracy of the algorithm are tested using both real and synthetic datasets, which makes the EPC algorithm practical and effective for appropriately classifying sample data for source apportionment models and helpful for better understanding and interpreting the sources of pollution.
FctClus: A Fast Clustering Algorithm for Heterogeneous Information Networks.
Yang, Jing; Chen, Limin; Zhang, Jianpei
2015-01-01
It is important to cluster heterogeneous information networks. A fast clustering algorithm based on an approximate commute time embedding for heterogeneous information networks with a star network schema is proposed in this paper by utilizing the sparsity of heterogeneous information networks. First, a heterogeneous information network is transformed into multiple compatible bipartite graphs from the compatible point of view. Second, the approximate commute time embedding of each bipartite graph is computed using random mapping and a linear time solver. All of the indicator subsets in each embedding simultaneously determine the target dataset. Finally, a general model is formulated by these indicator subsets, and a fast algorithm is derived by simultaneously clustering all of the indicator subsets using the sum of the weighted distances for all indicators for an identical target object. The proposed fast algorithm, FctClus, is shown to be efficient and generalizable and exhibits high clustering accuracy and fast computation speed based on a theoretic analysis and experimental verification. PMID:26090857
Cluster-Based Multipolling Sequencing Algorithm for Collecting RFID Data in Wireless LANs
NASA Astrophysics Data System (ADS)
Choi, Woo-Yong; Chatterjee, Mainak
2015-03-01
With the growing use of RFID (Radio Frequency Identification), it is becoming important to devise ways to read RFID tags in real time. Access points (APs) of IEEE 802.11-based wireless Local Area Networks (LANs) are being integrated with RFID networks that can efficiently collect real-time RFID data. Several schemes, such as multipolling methods based on the dynamic search algorithm and random sequencing, have been proposed. However, as the number of RFID readers associated with an AP increases, it becomes difficult for the dynamic search algorithm to derive the multipolling sequence in real time. Though multipolling methods can eliminate the polling overhead, we still need to enhance the performance of the multipolling methods based on random sequencing. To that extent, we propose a real-time cluster-based multipolling sequencing algorithm that drastically eliminates more than 90% of the polling overhead, particularly so when the dynamic search algorithm fails to derive the multipolling sequence in real time.
KD-tree based clustering algorithm for fast face recognition on large-scale data
NASA Astrophysics Data System (ADS)
Wang, Yuanyuan; Lin, Yaping; Yang, Junfeng
2015-07-01
This paper proposes an acceleration method for large-scale face recognition system. When dealing with a large-scale database, face recognition is time-consuming. In order to tackle this problem, we employ the k-means clustering algorithm to classify face data. Specifically, the data in each cluster are stored in the form of the kd-tree, and face feature matching is conducted with the kd-tree based nearest neighborhood search. Experiments on CAS-PEAL and self-collected database show the effectiveness of our proposed method.
Fast randomized Hough transformation track initiation algorithm based on multi-scale clustering
NASA Astrophysics Data System (ADS)
Wan, Minjie; Gu, Guohua; Chen, Qian; Qian, Weixian; Wang, Pengcheng
2015-10-01
A fast randomized Hough transformation track initiation algorithm based on multi-scale clustering is proposed to overcome existing problems in traditional infrared search and track system(IRST) which cannot provide movement information of the initial target and select the threshold value of correlation automatically by a two-dimensional track association algorithm based on bearing-only information . Movements of all the targets are presumed to be uniform rectilinear motion throughout this new algorithm. Concepts of space random sampling, parameter space dynamic linking table and convergent mapping of image to parameter space are developed on the basis of fast randomized Hough transformation. Considering the phenomenon of peak value clustering due to shortcomings of peak detection itself which is built on threshold value method, accuracy can only be ensured on condition that parameter space has an obvious peak value. A multi-scale idea is added to the above-mentioned algorithm. Firstly, a primary association is conducted to select several alternative tracks by a low-threshold .Then, alternative tracks are processed by multi-scale clustering methods , through which accurate numbers and parameters of tracks are figured out automatically by means of transforming scale parameters. The first three frames are processed by this algorithm in order to get the first three targets of the track , and then two slightly different gate radius are worked out , mean value of which is used to be the global threshold value of correlation. Moreover, a new model for curvilinear equation correction is applied to the above-mentioned track initiation algorithm for purpose of solving the problem of shape distortion when a space three-dimensional curve is mapped to a two-dimensional bearing-only space. Using sideways-flying, launch and landing as examples to build models and simulate, the application of the proposed approach in simulation proves its effectiveness , accuracy , and adaptivity of correlation threshold selection.
Rondina, Gustavo G; Da Silva, Juarez L F
2013-09-23
Suggestions for improving the Basin-Hopping Monte Carlo (BHMC) algorithm for unbiased global optimization of clusters and nanoparticles are presented. The traditional basin-hopping exploration scheme with Monte Carlo sampling is improved by bringing together novel strategies and techniques employed in different global optimization methods, however, with the care of keeping the underlying algorithm of BHMC unchanged. The improvements include a total of eleven local and nonlocal trial operators tailored for clusters and nanoparticles that allow an efficient exploration of the potential energy surface, two different strategies (static and dynamic) of operator selection, and a filter operator to handle unphysical solutions. In order to assess the efficiency of our strategies, we applied our implementation to several classes of systems, including Lennard-Jones and Sutton-Chen clusters with up to 147 and 148 atoms, respectively, a set of Lennard-Jones nanoparticles with sizes ranging from 200 to 1500 atoms, binary Lennard-Jones clusters with up to 100 atoms, (AgPd)55 alloy clusters described by the Sutton-Chen potential, and aluminum clusters with up to 30 atoms described within the density functional theory framework. Using unbiased global search our implementation was able to reproduce successfully the great majority of all published results for the systems considered and in many cases with more efficiency than the standard BHMC. We were also able to locate previously unknown global minimum structures for some of the systems considered. This revised BHMC method is a valuable tool for aiding theoretical investigations leading to a better understanding of atomic structures of clusters and nanoparticles. PMID:23957311
Aickelin, Uwe
Identifying Stable Breast Cancer Subgroups Using Semi-supervised Fuzzy c-means on a Reduced Panel clinically-useful and stable breast cancer subgroups using a reduced panel of biomarkers. First, we on clustering of breast cancer data. The stability of the subgroups found are assessed based on comparison
Performance Analysis of Apriori Algorithm with Different Data Structures on Hadoop Cluster
NASA Astrophysics Data System (ADS)
Singh, Sudhakar; Garg, Rakhi; Mishra, P. K.
2015-10-01
Mining frequent itemsets from massive datasets is always being a most important problem of data mining. Apriori is the most popular and simplest algorithm for frequent itemset mining. To enhance the efficiency and scalability of Apriori, a number of algorithms have been proposed addressing the design of efficient data structures, minimizing database scan and parallel and distributed processing. MapReduce is the emerging parallel and distributed technology to process big datasets on Hadoop Cluster. To mine big datasets it is essential to re-design the data mining algorithm on this new paradigm. In this paper, we implement three variations of Apriori algorithm using data structures hash tree, trie and hash table trie i.e. trie with hash technique on MapReduce paradigm. We emphasize and investigate the significance of these three data structures for Apriori algorithm on Hadoop cluster, which has not been given attention yet. Experiments are carried out on both real life and synthetic datasets which shows that hash table trie data structures performs far better than trie and hash tree in terms of execution time. Moreover the performance in case of hash tree becomes worst.
Study of cluster reconstruction and track fitting algorithms for CGEM-IT at BESIII
Yue Guo; Liang-Liang Wang; Xu-Dong Ju; Ling-Hui Wu; Qing-Lei Xiu; Hai-Xia Wang; Ming-Yi Dong; Jing-Ran Hu; Wei-Dong Li; Wei-Guo Li; Huai-Min Liu; Qun Ou-Yang; Xiao-Yan Shen; Ye Yuan; Yao Zhang
2015-04-10
Considering the aging effects of existing Inner Drift Chamber (IDC) of BES\\uppercase\\expandafter{\\romannumeral3}, a GEM based inner tracker is proposed to be designed and constructed as an upgrade candidate for IDC. This paper introduces a full simulation package of CGEM-IT with a simplified digitization model, describes the development of the softwares for cluster reconstruction and track fitting algorithm based on Kalman filter method for CGEM-IT. Preliminary results from the reconstruction algorithms are obtained using a Monte Carlo sample of single muon events in CGEM-IT.
Karimi, Abbas; Afsharfarnia, Abbas; Zarafshan, Faraneh; Al-Haddad, S. A. R.
2014-01-01
The stability of clusters is a serious issue in mobile ad hoc networks. Low stability of clusters may lead to rapid failure of clusters, high energy consumption for reclustering, and decrease in the overall network stability in mobile ad hoc network. In order to improve the stability of clusters, weight-based clustering algorithms are utilized. However, these algorithms only use limited features of the nodes. Thus, they decrease the weight accuracy in determining node's competency and lead to incorrect selection of cluster heads. A new weight-based algorithm presented in this paper not only determines node's weight using its own features, but also considers the direct effect of feature of adjacent nodes. It determines the weight of virtual links between nodes and the effect of the weights on determining node's final weight. By using this strategy, the highest weight is assigned to the best choices for being the cluster heads and the accuracy of nodes selection increases. The performance of new algorithm is analyzed by using computer simulation. The results show that produced clusters have longer lifetime and higher stability. Mathematical simulation shows that this algorithm has high availability in case of failure. PMID:25114965
Utilizing unsupervised learning to cluster data in the Bayesian data reduction algorithm
NASA Astrophysics Data System (ADS)
Lynch, Robert S., Jr.; Willett, Peter K.
2005-03-01
In this paper, unsupervised learning is utilized to illustrate the ability of the Bayesian Data Reduction Algorithm (BDRA) to cluster unlabeled training data. The BDRA is based on the assumption that the discrete symbol probabilities of each class are a priori uniformly Dirichlet distributed, and it employs a "greedy" approach (similar to a backward sequential feature search) for reducing irrelevant features from the training data of each class. Notice that reducing irrelevant features is synonymous here with selecting those features that provide best classification performance; the metric for making data reducing decisions is an analytic formula for the probability of error conditioned on the training data. The contribution of this work is to demonstrate how clustering performance varies depending on the method utilized for unsupervised training. To illustrate performance, results are demonstrated using simulated data. In general, the results of this work have implications for finding clusters in data mining applications.
Crowded Cluster Cores: Algorithms for Deblending in Dark Energy Survey Images
Zhang, Yuanyuan; Bertin, Emmanuel; Jeltema, Tesla; Miller, Christopher J; Rykoff, Eli; Song, Jeeseon
2014-01-01
Deep optical images are often crowded with overlapping objects. This is especially true in the cores of galaxy clusters, where images of dozens of galaxies may lie atop one another. Accurate measurements of cluster properties require deblending algorithms designed to automatically extract a list of individual objects and decide what fraction of the light in each pixel comes from each object. We present new software called the Gradient And INterpolation based deblender (GAIN) as a secondary deblender to improve deblending the images of cluster cores. This software relies on using image intensity gradient and using an image interpolation technique usually used to correct flawed terrestrial digital images. We test this software on Dark Energy Survey coadd images. GAIN helps extracting unbiased photometry measurement for blended sources. It also helps improving detection completeness while introducing only a modest amount of spurious detections. For example, when applied to deep images simulated with high level o...
User Activity Recognition in Smart Homes Using Pattern Clustering Applied to Temporal ANN Algorithm.
Bourobou, Serge Thomas Mickala; Yoo, Younghwan
2015-01-01
This paper discusses the possibility of recognizing and predicting user activities in the IoT (Internet of Things) based smart environment. The activity recognition is usually done through two steps: activity pattern clustering and activity type decision. Although many related works have been suggested, they had some limited performance because they focused only on one part between the two steps. This paper tries to find the best combination of a pattern clustering method and an activity decision algorithm among various existing works. For the first step, in order to classify so varied and complex user activities, we use a relevant and efficient unsupervised learning method called the K-pattern clustering algorithm. In the second step, the training of smart environment for recognizing and predicting user activities inside his/her personal space is done by utilizing the artificial neural network based on the Allen's temporal relations. The experimental results show that our combined method provides the higher recognition accuracy for various activities, as compared with other data mining classification algorithms. Furthermore, it is more appropriate for a dynamic environment like an IoT based smart home. PMID:26007738
User Activity Recognition in Smart Homes Using Pattern Clustering Applied to Temporal ANN Algorithm
Bourobou, Serge Thomas Mickala; Yoo, Younghwan
2015-01-01
This paper discusses the possibility of recognizing and predicting user activities in the IoT (Internet of Things) based smart environment. The activity recognition is usually done through two steps: activity pattern clustering and activity type decision. Although many related works have been suggested, they had some limited performance because they focused only on one part between the two steps. This paper tries to find the best combination of a pattern clustering method and an activity decision algorithm among various existing works. For the first step, in order to classify so varied and complex user activities, we use a relevant and efficient unsupervised learning method called the K-pattern clustering algorithm. In the second step, the training of smart environment for recognizing and predicting user activities inside his/her personal space is done by utilizing the artificial neural network based on the Allen’s temporal relations. The experimental results show that our combined method provides the higher recognition accuracy for various activities, as compared with other data mining classification algorithms. Furthermore, it is more appropriate for a dynamic environment like an IoT based smart home. PMID:26007738
Wang, Jianxin; Li, Min; Chen, Jianer; Pan, Yi
2011-01-01
As advances in the technologies of predicting protein interactions, huge data sets portrayed as networks have been available. Identification of functional modules from such networks is crucial for understanding principles of cellular organization and functions. However, protein interaction data produced by high-throughput experiments are generally associated with high false positives, which makes it difficult to identify functional modules accurately. In this paper, we propose a fast hierarchical clustering algorithm HC-PIN based on the local metric of edge clustering value which can be used both in the unweighted network and in the weighted network. The proposed algorithm HC-PIN is applied to the yeast protein interaction network, and the identified modules are validated by all the three types of Gene Ontology (GO) Terms: Biological Process, Molecular Function, and Cellular Component. The experimental results show that HC-PIN is not only robust to false positives, but also can discover the functional modules with low density. The identified modules are statistically significant in terms of three types of GO annotations. Moreover, HC-PIN can uncover the hierarchical organization of functional modules with the variation of its parameter's value, which is approximatively corresponding to the hierarchical structure of GO annotations. Compared to other previous competing algorithms, our algorithm HC-PIN is faster and more accurate. PMID:20733244
A Fast Cluster Motif Finding Algorithm for ChIP-Seq Data Sets.
Zhang, Yipu; Wang, Ping
2015-01-01
New high-throughput technique ChIP-seq, coupling chromatin immunoprecipitation experiment with high-throughput sequencing technologies, has extended the identification of binding locations of a transcription factor to the genome-wide regions. However, the most existing motif discovery algorithms are time-consuming and limited to identify binding motifs in ChIP-seq data which normally has the significant characteristics of large scale data. In order to improve the efficiency, we propose a fast cluster motif finding algorithm, named as FCmotif, to identify the (l, ?d) motifs in large scale ChIP-seq data set. It is inspired by the emerging substrings mining strategy to find the enriched substrings and then searching the neighborhood instances to construct PWM and cluster motifs in different length. FCmotif is not following the OOPS model constraint and can find long motifs. The effectiveness of proposed algorithm has been proved by experiments on the ChIP-seq data sets from mouse ES cells. The whole detection of the real binding motifs and processing of the full size data of several megabytes finished in a few minutes. The experimental results show that FCmotif has advantageous to deal with the (l, ?d) motif finding in the ChIP-seq data; meanwhile it also demonstrates better performance than other current widely-used algorithms such as MEME, Weeder, ChIPMunk, and DREME. PMID:26236718
A Fast Cluster Motif Finding Algorithm for ChIP-Seq Data Sets
Zhang, Yipu; Wang, Ping
2015-01-01
New high-throughput technique ChIP-seq, coupling chromatin immunoprecipitation experiment with high-throughput sequencing technologies, has extended the identification of binding locations of a transcription factor to the genome-wide regions. However, the most existing motif discovery algorithms are time-consuming and limited to identify binding motifs in ChIP-seq data which normally has the significant characteristics of large scale data. In order to improve the efficiency, we propose a fast cluster motif finding algorithm, named as FCmotif, to identify the (l, ?d) motifs in large scale ChIP-seq data set. It is inspired by the emerging substrings mining strategy to find the enriched substrings and then searching the neighborhood instances to construct PWM and cluster motifs in different length. FCmotif is not following the OOPS model constraint and can find long motifs. The effectiveness of proposed algorithm has been proved by experiments on the ChIP-seq data sets from mouse ES cells. The whole detection of the real binding motifs and processing of the full size data of several megabytes finished in a few minutes. The experimental results show that FCmotif has advantageous to deal with the (l, ?d) motif finding in the ChIP-seq data; meanwhile it also demonstrates better performance than other current widely-used algorithms such as MEME, Weeder, ChIPMunk, and DREME. PMID:26236718
NASA Astrophysics Data System (ADS)
Plaza, Antonio; Chang, Chein-I.; Plaza, Javier; Valencia, David
2006-05-01
The incorporation of hyperspectral sensors aboard airborne/satellite platforms is currently producing a nearly continual stream of multidimensional image data, and this high data volume has soon introduced new processing challenges. The price paid for the wealth spatial and spectral information available from hyperspectral sensors is the enormous amounts of data that they generate. Several applications exist, however, where having the desired information calculated quickly enough for practical use is highly desirable. High computing performance of algorithm analysis is particularly important in homeland defense and security applications, in which swift decisions often involve detection of (sub-pixel) military targets (including hostile weaponry, camouflage, concealment, and decoys) or chemical/biological agents. In order to speed-up computational performance of hyperspectral imaging algorithms, this paper develops several fast parallel data processing techniques. Techniques include four classes of algorithms: (1) unsupervised classification, (2) spectral unmixing, and (3) automatic target recognition, and (4) onboard data compression. A massively parallel Beowulf cluster (Thunderhead) at NASA's Goddard Space Flight Center in Maryland is used to measure parallel performance of the proposed algorithms. In order to explore the viability of developing onboard, real-time hyperspectral data compression algorithms, a Xilinx Virtex-II field programmable gate array (FPGA) is also used in experiments. Our quantitative and comparative assessment of parallel techniques and strategies may help image analysts in selection of parallel hyperspectral algorithms for specific applications.
Effects of algorithm for diagnosis of active labour: cluster randomised trial
2008-01-01
Objective To compare the effectiveness of an algorithm for diagnosis of active labour in primiparous women with standard care in terms of maternal and neonatal outcomes. Design Cluster randomised trial. Setting Maternity units in Scotland with at least 800 annual births. Participants 4503 women giving birth for the first time, in 14 maternity units. Seven experimental clusters collected data from a baseline sample of 1029 women and a post-implementation sample of 896 women. The seven control clusters had a baseline sample of 1291 women and a post-implementation sample of 1287 women. Intervention Use of an algorithm by midwives to assist their diagnosis of active labour, compared with standard care. Main outcomes Primary outcome: use of oxytocin for augmentation of labour. Secondary outcomes: medical interventions in labour, admission management, and birth outcome. Results No significant difference was found between groups in percentage use of oxytocin for augmentation of labour (experimental minus control, difference=0.3, 95% confidence interval ?9.2 to 9.8; P=0.9) or in the use of medical interventions in labour. Women in the algorithm group were more likely to be discharged from the labour suite after their first labour assessment (difference=?19.2, ?29.9 to ?8.6; P=0.002) and to have more pre-labour admissions (0.29, 0.04 to 0.55; P=0.03). Conclusions Use of an algorithm to assist midwives with the diagnosis of active labour in primiparous women did not result in a reduction in oxytocin use or in medical intervention in spontaneous labour. Significantly more women in the experimental group were discharged home after their first labour ward assessment. Trial registration Current Controlled Trials ISRCTN00522952. PMID:19064606
Possibilistic clustering for shape recognition
NASA Technical Reports Server (NTRS)
Keller, James M.; Krishnapuram, Raghu
1993-01-01
Clustering methods have been used extensively in computer vision and pattern recognition. Fuzzy clustering has been shown to be advantageous over crisp (or traditional) clustering in that total commitment of a vector to a given class is not required at each iteration. Recently fuzzy clustering methods have shown spectacular ability to detect not only hypervolume clusters, but also clusters which are actually 'thin shells', i.e., curves and surfaces. Most analytic fuzzy clustering approaches are derived from Bezdek's Fuzzy C-Means (FCM) algorithm. The FCM uses the probabilistic constraint that the memberships of a data point across classes sum to one. This constraint was used to generate the membership update equations for an iterative algorithm. Unfortunately, the memberships resulting from FCM and its derivatives do not correspond to the intuitive concept of degree of belonging, and moreover, the algorithms have considerable trouble in noisy environments. Recently, the clustering problem was cast into the framework of possibility theory. Our approach was radically different from the existing clustering methods in that the resulting partition of the data can be interpreted as a possibilistic partition, and the membership values may be interpreted as degrees of possibility of the points belonging to the classes. An appropriate objective function whose minimum will characterize a good possibilistic partition of the data was constructed, and the membership and prototype update equations from necessary conditions for minimization of our criterion function were derived. The ability of this approach to detect linear and quartic curves in the presence of considerable noise is shown.
Crowded Cluster Cores: An Algorithm for Deblending in Dark Energy Survey Images
NASA Astrophysics Data System (ADS)
Zhang, Yuanyuan; McKay, Timothy A.; Bertin, Emmanuel; Jeltema, Tesla; Miller, Christopher J.; Rykoff, Eli; Song, Jeeseon
2015-12-01
Deep optical images are often crowded with overlapping objects. This is especially true in the cores of galaxy clusters, where images of dozens of galaxies may lie atop one another. Accurate measurements of cluster properties require deblending algorithms designed to automatically extract a list of individual objects and decide what fraction of the light in each pixel comes from each object. In this article, we introduce a new software tool called the Gradient And Interpolation based (GAIN) deblender. GAIN is used as a secondary deblender to improve the separation of overlapping objects in galaxy cluster cores in Dark Energy Survey images. It uses image intensity gradients and an interpolation technique originally developed to correct flawed digital images. This paper is dedicated to describing the algorithm of the GAIN deblender and its applications, but we additionally include modest tests of the software based on real Dark Energy Survey co-add images. GAIN helps to extract an unbiased photometry measurement for blended sources and improve detection completeness, while introducing few spurious detections. When applied to processed Dark Energy Survey data, GAIN serves as a useful quick fix when a high level of deblending is desired.
2014-01-01
Background Accurate genotype calling is a pre-requisite of a successful Genome-Wide Association Study (GWAS). Although most genotyping algorithms can achieve an accuracy rate greater than 99% for genotyping DNA samples without copy number alterations (CNAs), almost all of these algorithms are not designed for genotyping tumor samples that are known to have large regions of CNAs. Results This study aims to develop a statistical method that can accurately genotype tumor samples with CNAs. The proposed method adds a Bayesian layer to a cluster regression model and is termed a Bayesian Cluster Regression-based genotyping algorithm (BCRgt). We demonstrate that high concordance rates with HapMap calls can be achieved without using reference/training samples, when CNAs do not exist. By adding a training step, we have obtained higher genotyping concordance rates, without requiring large sample sizes. When CNAs exist in the samples, accuracy can be dramatically improved in regions with DNA copy loss and slightly improved in regions with copy number gain, comparing with the Bayesian Robust Linear Model with Mahalanobis distance classifier (BRLMM). Conclusions In conclusion, we have demonstrated that BCRgt can provide accurate genotyping calls for tumor samples with CNAs. PMID:24629125
Mustapha, Ibrahim; Mohd Ali, Borhanuddin; Rasid, Mohd Fadlee A; Sali, Aduwati; Mohamad, Hafizal
2015-01-01
It is well-known that clustering partitions network into logical groups of nodes in order to achieve energy efficiency and to enhance dynamic channel access in cognitive radio through cooperative sensing. While the topic of energy efficiency has been well investigated in conventional wireless sensor networks, the latter has not been extensively explored. In this paper, we propose a reinforcement learning-based spectrum-aware clustering algorithm that allows a member node to learn the energy and cooperative sensing costs for neighboring clusters to achieve an optimal solution. Each member node selects an optimal cluster that satisfies pairwise constraints, minimizes network energy consumption and enhances channel sensing performance through an exploration technique. We first model the network energy consumption and then determine the optimal number of clusters for the network. The problem of selecting an optimal cluster is formulated as a Markov Decision Process (MDP) in the algorithm and the obtained simulation results show convergence, learning and adaptability of the algorithm to dynamic environment towards achieving an optimal solution. Performance comparisons of our algorithm with the Groupwise Spectrum Aware (GWSA)-based algorithm in terms of Sum of Square Error (SSE), complexity, network energy consumption and probability of detection indicate improved performance from the proposed approach. The results further reveal that an energy savings of 9% and a significant Primary User (PU) detection improvement can be achieved with the proposed approach. PMID:26287191
Mustapha, Ibrahim; Ali, Borhanuddin Mohd; Rasid, Mohd Fadlee A.; Sali, Aduwati; Mohamad, Hafizal
2015-01-01
It is well-known that clustering partitions network into logical groups of nodes in order to achieve energy efficiency and to enhance dynamic channel access in cognitive radio through cooperative sensing. While the topic of energy efficiency has been well investigated in conventional wireless sensor networks, the latter has not been extensively explored. In this paper, we propose a reinforcement learning-based spectrum-aware clustering algorithm that allows a member node to learn the energy and cooperative sensing costs for neighboring clusters to achieve an optimal solution. Each member node selects an optimal cluster that satisfies pairwise constraints, minimizes network energy consumption and enhances channel sensing performance through an exploration technique. We first model the network energy consumption and then determine the optimal number of clusters for the network. The problem of selecting an optimal cluster is formulated as a Markov Decision Process (MDP) in the algorithm and the obtained simulation results show convergence, learning and adaptability of the algorithm to dynamic environment towards achieving an optimal solution. Performance comparisons of our algorithm with the Groupwise Spectrum Aware (GWSA)-based algorithm in terms of Sum of Square Error (SSE), complexity, network energy consumption and probability of detection indicate improved performance from the proposed approach. The results further reveal that an energy savings of 9% and a significant Primary User (PU) detection improvement can be achieved with the proposed approach. PMID:26287191
Cloud classification from satellite data using a fuzzy sets algorithm: A polar example
NASA Technical Reports Server (NTRS)
Key, J. R.; Maslanik, J. A.; Barry, R. G.
1988-01-01
Where spatial boundaries between phenomena are diffuse, classification methods which construct mutually exclusive clusters seem inappropriate. The Fuzzy c-means (FCM) algorithm assigns each observation to all clusters, with membership values as a function of distance to the cluster center. The FCM algorithm is applied to AVHRR data for the purpose of classifying polar clouds and surfaces. Careful analysis of the fuzzy sets can provide information on which spectral channels are best suited to the classification of particular features, and can help determine likely areas of misclassification. General agreement in the resulting classes and cloud fraction was found between the FCM algorithm, a manual classification, and an unsupervised maximum likelihood classifier.
Gao, Ying; Wkram, Chris Hadri; Duan, Jiajie; Chou, Jarong
2015-01-01
In order to prolong the network lifetime, energy-efficient protocols adapted to the features of wireless sensor networks should be used. This paper explores in depth the nature of heterogeneous wireless sensor networks, and finally proposes an algorithm to address the problem of finding an effective pathway for heterogeneous clustering energy. The proposed algorithm implements cluster head selection according to the degree of energy attenuation during the network's running and the degree of candidate nodes' effective coverage on the whole network, so as to obtain an even energy consumption over the whole network for the situation with high degree of coverage. Simulation results show that the proposed clustering protocol has better adaptability to heterogeneous environments than existing clustering algorithms in prolonging the network lifetime. PMID:26690440
A Comparative Study of Self-organizing Clustering Algorithms Dignet and ART2.
Thomopoulos, Stelios C.A.; Wann, Chin Der
1997-06-01
A comparative study of two self-organizing clustering neural network algorithms, Dignet and ART2, has been conducted. The differences in architecture and learning procedures between the two models are compared. Comparative computer simulations on data clustering and signal detection problems with Gaussian noise were used for investigating the performance of Dignet and "fast learning" ART2. The study shows that Dignet, with a simple architecture and straightforward dynamics, is more flexible with the choice of different metrics for the measure of similarity. The system parameters in Dignet can be analytically determined from a self-adjusting process; moreover, the initial threshold value used in Dignet is directly determined from a lower-bound of the desirable operational signal-to-noise ratio. Simulations show that Dignet generally exhibits faster learning and better clustering performance on statistical pattern recognition problems. A simplified ART2 model (SART2) is derived by adopting the structural concepts from Dignet. SART2 exhibits faster learning and eliminates a "false conviction" problem that exists in the "fast learning" ART2. The comparative study is benchmarked against statistical data clustering and signal detection problems. Copyright 1997 Elsevier Science Ltd. PMID:12662867
NASA Astrophysics Data System (ADS)
Schuetter, Jared Michael
Excavating cairns in southern Arabia is a way for anthropologists to understand which factors led ancient settlers to transition from a pastoral lifestyle and tribal narrative to the formation of states that exist today. Locating these monuments has traditionally been done in the field, relying on eyewitness reports and costly searches through the arid landscape. In this thesis, an algorithm for automatically detecting cairns in satellite imagery is presented. The algorithm uses a set of filters in a window based approach to eliminate background pixels and other objects that do not look like cairns. The resulting set of detected objects constitutes fewer than 0.001% of the pixels in the satellite image, and contains the objects that look the most like cairns in imagery. When a training set of cairns is available, a further reduction of this set of objects can take place, along with a likelihood-based ranking system. To aid in cairn detection, the satellite image is also clustered to determine land-form classes that tend to be consistent with the presence of cairns. Due to the large number of pixels in the image, a subsample spectral clustering algorithm called "Multiple Sample Data Spectroscopic clustering" is used. This multiple sample clustering procedure is motivated by perturbation studies on single sample spectral algorithms. The studies, presented in this thesis, show that sampling variability in the single sample approach can cause an unsatisfactory level of instability in clustering results. The multiple sample data spectroscopic clustering algorithm is intended to stabilize this perturbation by combining information from different samples. While sampling variability is still present, the use of multiple samples mitigates its effect on cluster results. Finally, a step-through of the cairn detection algorithm and satellite image clustering are given for an image in the Hadramawt region of Yemen. The top ranked detected objects are presented, and a discussion of parameter selection and future work follows.
Development of a Genetic Algorithm to Automate Clustering of a Dependency Structure Matrix
NASA Technical Reports Server (NTRS)
Rogers, James L.; Korte, John J.; Bilardo, Vincent J.
2006-01-01
Much technology assessment and organization design data exists in Microsoft Excel spreadsheets. Tools are needed to put this data into a form that can be used by design managers to make design decisions. One need is to cluster data that is highly coupled. Tools such as the Dependency Structure Matrix (DSM) and a Genetic Algorithm (GA) can be of great benefit. However, no tool currently combines the DSM and a GA to solve the clustering problem. This paper describes a new software tool that interfaces a GA written as an Excel macro with a DSM in spreadsheet format. The results of several test cases are included to demonstrate how well this new tool works.
CLUSTAG & WCLUSTAG: Hierarchical Clustering Algorithms for Efficient Tag-SNP Selection
NASA Astrophysics Data System (ADS)
Ao, Sio-Iong
More than 6 million single nucleotide polymorphisms (SNPs) in the human genome have been genotyped by the HapMap project. Although only a pro portion of these SNPs are functional, all can be considered as candidate markers for indirect association studies to detect disease-related genetic variants. The complete screening of a gene or a chromosomal region is nevertheless an expensive undertak ing for association studies. A key strategy for improving the efficiency of association studies is to select a subset of informative SNPs, called tag SNPs, for analysis. In the chapter, hierarchical clustering algorithms have been proposed for efficient tag SNP selection.
Meanie3D - a mean-shift based, multivariate, multi-scale clustering and tracking algorithm
NASA Astrophysics Data System (ADS)
Simon, Jürgen-Lorenz; Malte, Diederich; Silke, Troemel
2014-05-01
Project OASE is the one of 5 work groups at the HErZ (Hans Ertel Centre for Weather Research), an ongoing effort by the German weather service (DWD) to further research at Universities concerning weather prediction. The goal of project OASE is to gain an object-based perspective on convective events by identifying them early in the onset of convective initiation and follow then through the entire lifecycle. The ability to follow objects in this fashion requires new ways of object definition and tracking, which incorporate all the available data sets of interest, such as Satellite imagery, weather Radar or lightning counts. The Meanie3D algorithm provides the necessary tool for this purpose. Core features of this new approach to clustering (object identification) and tracking are the ability to identify objects using the mean-shift algorithm applied to a multitude of variables (multivariate), as well as the ability to detect objects on various scales (multi-scale) using elements of Scale-Space theory. The algorithm works in 2D as well as 3D without modifications. It is an extension of a method well known from the field of computer vision and image processing, which has been tailored to serve the needs of the meteorological community. In spite of the special application to be demonstrated here (like convective initiation), the algorithm is easily tailored to provide clustering and tracking for a wide class of data sets and problems. In this talk, the demonstration is carried out on two of the OASE group's own composite sets. One is a 2D nationwide composite of Germany including C-Band Radar (2D) and Satellite information, the other a 3D local composite of the Bonn/Jülich area containing a high-resolution 3D X-Band Radar composite.
A computational algorithm for functional clustering of proteome dynamics during development.
Wang, Yaqun; Wang, Ningtao; Hao, Han; Guo, Yunqian; Zhen, Yan; Shi, Jisen; Wu, Rongling
2014-06-01
Phenotypic traits, such as seed development, are a consequence of complex biochemical interactions among genes, proteins and metabolites, but the underlying mechanisms that operate in a coordinated and sequential manner remain elusive. Here, we address this issue by developing a computational algorithm to monitor proteome changes during the course of trait development. The algorithm is built within the mixture-model framework in which each mixture component is modeled by a specific group of proteins that display a similar temporal pattern of expression in trait development. A nonparametric approach based on Legendre orthogonal polynomials was used to fit dynamic changes of protein expression, increasing the power and flexibility of protein clustering. By analyzing a dataset of proteomic dynamics during early embryogenesis of the Chinese fir, the algorithm has successfully identified several distinct types of proteins that coordinate with each other to determine seed development in this forest tree commercially and environmentally important to China. The algorithm will find its immediate applications for the characterization of mechanistic underpinnings for any other biological processes in which protein abundance plays a key role. PMID:24955031
Fong, Simon
2012-01-01
Voice biometrics has a long history in biosecurity applications such as verification and identification based on characteristics of the human voice. The other application called voice classification which has its important role in grouping unlabelled voice samples, however, has not been widely studied in research. Lately voice classification is found useful in phone monitoring, classifying speakers' gender, ethnicity and emotion states, and so forth. In this paper, a collection of computational algorithms are proposed to support voice classification; the algorithms are a combination of hierarchical clustering, dynamic time wrap transform, discrete wavelet transform, and decision tree. The proposed algorithms are relatively more transparent and interpretable than the existing ones, though many techniques such as Artificial Neural Networks, Support Vector Machine, and Hidden Markov Model (which inherently function like a black box) have been applied for voice verification and voice identification. Two datasets, one that is generated synthetically and the other one empirically collected from past voice recognition experiment, are used to verify and demonstrate the effectiveness of our proposed voice classification algorithm. PMID:22619492
Fong, Simon
2012-01-01
Voice biometrics has a long history in biosecurity applications such as verification and identification based on characteristics of the human voice. The other application called voice classification which has its important role in grouping unlabelled voice samples, however, has not been widely studied in research. Lately voice classification is found useful in phone monitoring, classifying speakers' gender, ethnicity and emotion states, and so forth. In this paper, a collection of computational algorithms are proposed to support voice classification; the algorithms are a combination of hierarchical clustering, dynamic time wrap transform, discrete wavelet transform, and decision tree. The proposed algorithms are relatively more transparent and interpretable than the existing ones, though many techniques such as Artificial Neural Networks, Support Vector Machine, and Hidden Markov Model (which inherently function like a black box) have been applied for voice verification and voice identification. Two datasets, one that is generated synthetically and the other one empirically collected from past voice recognition experiment, are used to verify and demonstrate the effectiveness of our proposed voice classification algorithm. PMID:22619492
KANTS: a stigmergic ant algorithm for cluster analysis and swarm art.
Fernandes, Carlos M; Mora, Antonio M; Merelo, Juan J; Rosa, Agostinho C
2014-06-01
KANTS is a swarm intelligence clustering algorithm inspired by the behavior of social insects. It uses stigmergy as a strategy for clustering large datasets and, as a result, displays a typical behavior of complex systems: self-organization and global patterns emerging from the local interaction of simple units. This paper introduces a simplified version of KANTS and describes recent experiments with the algorithm in the context of a contemporary artistic and scientific trend called swarm art, a type of generative art in which swarm intelligence systems are used to create artwork or ornamental objects. KANTS is used here for generating color drawings from the input data that represent real-world phenomena, such as electroencephalogram sleep data. However, the main proposal of this paper is an art project based on well-known abstract paintings, from which the chromatic values are extracted and used as input. Colors and shapes are therefore reorganized by KANTS, which generates its own interpretation of the original artworks. The project won the 2012 Evolutionary Art, Design, and Creativity Competition. PMID:23912505
A contiguity-enhanced k-means clustering algorithm for unsupervised multispectral image segmentation
Theiler, J.; Gisler, G.
1997-07-01
The recent and continuing construction of multi and hyper spectral imagers will provide detailed data cubes with information in both the spatial and spectral domain. This data shows great promise for remote sensing applications ranging from environmental and agricultural to national security interests. The reduction of this voluminous data to useful intermediate forms is necessary both for downlinking all those bits and for interpreting them. Smart onboard hardware is required, as well as sophisticated earth bound processing. A segmented image (in which the multispectral data in each pixel is classified into one of a small number of categories) is one kind of intermediate form which provides some measure of data compression. Traditional image segmentation algorithms treat pixels independently and cluster the pixels according only to their spectral information. This neglects the implicit spatial information that is available in the image. We will suggest a simple approach; a variant of the standard k-means algorithm which uses both spatial and spectral properties of the image. The segmented image has the property that pixels which are spatially contiguous are more likely to be in the same class than are random pairs of pixels. This property naturally comes at some cost in terms of the compactness of the clusters in the spectral domain, but we have found that the spatial contiguity and spectral compactness properties are nearly orthogonal, which means that we can make considerable improvements in the one with minimal loss in the other.
`Inter-Arrival Time' Inspired Algorithm and its Application in Clustering and Molecular Phylogeny
NASA Astrophysics Data System (ADS)
Kolekar, Pandurang S.; Kale, Mohan M.; Kulkarni-Kale, Urmila
2010-10-01
Bioinformatics, being multidisciplinary field, involves applications of various methods from allied areas of Science for data mining using computational approaches. Clustering and molecular phylogeny is one of the key areas in Bioinformatics, which help in study of classification and evolution of organisms. Molecular phylogeny algorithms can be divided into distance based and character based methods. But most of these methods are dependent on pre-alignment of sequences and become computationally intensive with increase in size of data and hence demand alternative efficient approaches. `Inter arrival time distribution' (IATD) is a popular concept in the theory of stochastic system modeling but its potential in molecular data analysis has not been fully explored. The present study reports application of IATD in Bioinformatics for clustering and molecular phylogeny. The proposed method provides IATDs of nucleotides in genomic sequences. The distance function based on statistical parameters of IATDs is proposed and distance matrix thus obtained is used for the purpose of clustering and molecular phylogeny. The method is applied on a dataset of 3' non-coding region sequences (NCR) of Dengue virus type 3 (DENV-3), subtype III, reported in 2008. The phylogram thus obtained revealed the geographical distribution of DENV-3 isolates. Sri Lankan DENV-3 isolates were further observed to be clustered in two sub-clades corresponding to pre and post Dengue hemorrhagic fever emergence groups. These results are consistent with those reported earlier, which are obtained using pre-aligned sequence data as an input. These findings encourage applications of the IATD based method in molecular phylogenetic analysis in particular and data mining in general.
Wu, Jia-Rui; Guo, Wei-Xian; Zhang, Xiao-Meng; Yang, Bing; Zhang, Bing
2014-02-01
Based on the data mining methods of association rules and clustering algorithm, the 188 prescriptions for cough that built by Yan Zhenghua were collected and analyzed to get the frequency of drug usage and the relationship between drugs. From which we could conclude the experiences of Yan Zhenghua for the treatment of cough. The results of the analysis were that 20 core combinations were dig out, such as Bambusae Caulis in Taenias-Almond-Sactmarsh Aster. And there were 10 new prescriptions were found out, such as Sactmarsh Aster-Scutellariae Radix-Album Viscum-Bambusae Caulis in Taenian-Eriobotryae Folium. The results of the analysis were proved that Yan Zhenghua was good at curing cough by using the traditional Chinese medicine that can dispel wind and heat from the body, and remove heat from the lung to relieve cough. PMID:25204134
NASA Astrophysics Data System (ADS)
Bagheripour, Parisa; Asoodeh, Mojtaba
2013-12-01
Porosity, the void portion of reservoir rocks, determines the volume of hydrocarbon accumulation and has a great control on assessment and development of hydrocarbon reservoirs. Accurate determination of porosity from core analysis is highly cost, time, and labor intensive. Therefore, the mission of finding an accurate, fast and cheap way of determining porosity is unavoidable. On the other hand, conventional well log data, available in almost all wells contain invaluable implicit information about the porosity. Therefore, an intelligent system can explicate this information. Fuzzy logic is a powerful tool for handling geosciences problem which is associated with uncertainty. However, determination of the best fuzzy formulation is still an issue. This study purposes an improved strategy, called hybrid genetic algorithm-pattern search (GA-PS) technique, against the widely held subtractive clustering (SC) method for setting up fuzzy rules between core porosity and petrophysical logs. Hybrid GA-PS technique is capable of extracting optimal parameters for fuzzy clusters (membership functions) which consequently results in the best fuzzy formulation. Results indicate that GA-PS technique manipulates both mean and variance of Gaussian membership functions contrary to SC that only has a control on mean of Gaussian membership functions. A comparison between hybrid GA-PS technique and SC method confirmed the superiority of GA-PS technique in setting up fuzzy rules. The proposed strategy was successfully applied to one of the Iranian carbonate reservoir rocks.
Ansari, Elnaz Saberi; Eslahchi, Changiz; Pezeshk, Hamid; Sadeghi, Mehdi
2014-09-01
Decomposition of structural domains is an essential task in classifying protein structures, predicting protein function, and many other proteomics problems. As the number of known protein structures in PDB grows exponentially, the need for accurate automatic domain decomposition methods becomes more essential. In this article, we introduce a bottom-up algorithm for assigning protein domains using a graph theoretical approach. This algorithm is based on a center-based clustering approach. For constructing initial clusters, members of an independent dominating set for the graph representation of a protein are considered as the centers. A distance matrix is then defined for these clusters. To obtain final domains, these clusters are merged using the compactness principle of domains and a method similar to the neighbor-joining algorithm considering some thresholds. The thresholds are computed using a training set consisting of 50 protein chains. The algorithm is implemented using C++ language and is named ProDomAs. To assess the performance of ProDomAs, its results are compared with seven automatic methods, against five publicly available benchmarks. The results show that ProDomAs outperforms other methods applied on the mentioned benchmarks. The performance of ProDomAs is also evaluated against 6342 chains obtained from ASTRAL SCOP 1.71. ProDomAs is freely available at http://www.bioinf.cs.ipm.ir/software/prodomas. PMID:24596179
Ashton, Douglas J; Liu, Jiwen; Luijten, Erik; Wilding, Nigel B
2010-11-21
Highly size-asymmetrical fluid mixtures arise in a variety of physical contexts, notably in suspensions of colloidal particles to which much smaller particles have been added in the form of polymers or nanoparticles. Conventional schemes for simulating models of such systems are hamstrung by the difficulty of relaxing the large species in the presence of the small one. Here we describe how the rejection-free geometrical cluster algorithm of Liu and Luijten [J. Liu and E. Luijten, Phys. Rev. Lett. 92, 035504 (2004)] can be embedded within a restricted Gibbs ensemble to facilitate efficient and accurate studies of fluid phase behavior of highly size-asymmetrical mixtures. After providing a detailed description of the algorithm, we summarize the bespoke analysis techniques of [Ashton et al., J. Chem. Phys. 132, 074111 (2010)] that permit accurate estimates of coexisting densities and critical-point parameters. We apply our methods to study the liquid-vapor phase diagram of a particular mixture of Lennard-Jones particles having a 10:1 size ratio. As the reservoir volume fraction of small particles is increased in the range of 0%-5%, the critical temperature decreases by approximately 50%, while the critical density drops by some 30%. These trends imply that in our system, adding small particles decreases the net attraction between large particles, a situation that contrasts with hard-sphere mixtures where an attractive depletion force occurs. PMID:21090849
Marchal, Rémi; Carbonnière, Philippe; Pouchan, Claude
2015-01-22
The study of atomic clusters has become an increasingly active area of research in the recent years because of the fundamental interest in studying a completely new area that can bridge the gap between atomic and solid state physics. Due to their specific properties, such compounds are of great interest in the field of nanotechnology [1,2]. Here, we would present our GSAM algorithm based on a DFT exploration of the PES to find the low lying isomers of such compounds. This algorithm includes the generation of an intial set of structure from which the most relevant are selected. Moreover, an optimization process, called raking optimization, able to discard step by step all the non physically reasonnable configurations have been implemented to reduce the computational cost of this algorithm. Structural properties of Ga{sub n}Asm clusters will be presented as an illustration of the method.
NASA Astrophysics Data System (ADS)
Huang, Fang; Liu, Dingsheng; Tan, Xicheng; Wang, Jian; Chen, Yunping; He, Binbin
2011-04-01
To design and implement an open-source parallel GIS (OP-GIS) based on a Linux cluster, the parallel inverse distance weighting (IDW) interpolation algorithm has been chosen as an example to explore the working model and the principle of algorithm parallel pattern (APP), one of the parallelization patterns for OP-GIS. Based on an analysis of the serial IDW interpolation algorithm of GRASS GIS, this paper has proposed and designed a specific parallel IDW interpolation algorithm, incorporating both single process, multiple data (SPMD) and master/slave (M/S) programming modes. The main steps of the parallel IDW interpolation algorithm are: (1) the master node packages the related information, and then broadcasts it to the slave nodes; (2) each node calculates its assigned data extent along one row using the serial algorithm; (3) the master node gathers the data from all nodes; and (4) iterations continue until all rows have been processed, after which the results are outputted. According to the experiments performed in the course of this work, the parallel IDW interpolation algorithm can attain an efficiency greater than 0.93 compared with similar algorithms, which indicates that the parallel algorithm can greatly reduce processing time and maximize speed and performance.
NASA Astrophysics Data System (ADS)
Valaparla, Sunil K.; Peng, Qi; Gao, Feng; Clarke, Geoffrey D.
2014-03-01
Accurate measurements of human body fat distribution are desirable because excessive body fat is associated with impaired insulin sensitivity, type 2 diabetes mellitus (T2DM) and cardiovascular disease. In this study, we hypothesized that the performance of water suppressed (WS) MRI is superior to non-water suppressed (NWS) MRI for volumetric assessment of abdominal subcutaneous (SAT), intramuscular (IMAT), visceral (VAT), and total (TAT) adipose tissues. We acquired T1-weighted images on a 3T MRI system (TIM Trio, Siemens), which was analyzed using semi-automated segmentation software that employs a fuzzy c-means (FCM) clustering algorithm. Sixteen contiguous axial slices, centered at the L4-L5 level of the abdomen, were acquired in eight T2DM subjects with water suppression (WS) and without (NWS). Histograms from WS images show improved separation of non-fatty tissue pixels from fatty tissue pixels, compared to NWS images. Paired t-tests of WS versus NWS showed a statistically significant lower volume of lipid in the WS images for VAT (145.3 cc less, p=0.006) and IMAT (305 cc less, p<0.001), but not SAT (14.1 cc more, NS). WS measurements of TAT also resulted in lower fat volumes (436.1 cc less, p=0.002). There is strong correlation between WS and NWS quantification methods for SAT measurements (r=0.999), but poorer correlation for VAT studies (r=0.845). These results suggest that NWS pulse sequences may overestimate adipose tissue volumes and that WS pulse sequences are more desirable due to the higher contrast generated between fatty and non-fatty tissues.
NASA Technical Reports Server (NTRS)
Dasarathy, B. V.
1976-01-01
Learning of discriminant hyperplanes in imperfectly supervised or unsupervised training sample sets with unreliably labeled samples along the fuzzy joint boundaries between sample clusters is discussed, with the discriminant hyperplane designed to be a least-squares fit to the unreliably labeled data points. (Samples along the fuzzy boundary jump back and forth from one cluster to the other in recursive cluster stabilization and are considered unreliably labeled.) Minimization of the distances of these unreliably labeled samples from the hyperplanes does not sacrifice the ability to discriminate between classes represented by reliably labeled subsets of samples. An equivalent unconstrained linear inequality problem is formulated and algorithms for its solution are indicated. Landsat earth sensing data were used in confirming the validity and computational feasibility of the approach, which should be useful in deriving discriminant hyperplanes separating clusters with fuzzy boundaries, given supervised training sample sets with unreliably labeled boundary samples.
NASA Astrophysics Data System (ADS)
Chang, Seongmin; Baek, Sungmin; Kim, Ki-Ook; Cho, Maenghyo
2015-06-01
A system identification method has been proposed to validate finite element models of complex structures using measured modal data. Finite element method is used for the system identification as well as the structural analysis. In perturbation methods, the perturbed system is expressed as a combination of the baseline structure and the related perturbations. The changes in dynamic responses are applied to determine the structural modifications so that the equilibrium may be satisfied in the perturbed system. In practical applications, the dynamic measurements are carried out on a limited number of accessible nodes and associated degrees of freedom. The equilibrium equation is, in principle, expressed in terms of the measured (master, primary) and unmeasured (slave, secondary) degrees of freedom. Only the specified degrees of freedom are included in the equation formulation for identification and the unspecified degrees of freedom are eliminated through the iterative improved reduction scheme. A large number of system parameters are included as the unknown variables in the system identification of large-scaled structures. The identification problem with large number of system parameters requires a large amount of computation time and resources. In the present study, a hierarchical clustering algorithm is applied to reduce the number of system parameters effectively. Numerical examples demonstrate that the proposed method greatly improves the accuracy and efficiency in the inverse problem of identification.
Stochastic modeling for trajectories drift in the ocean: Application of Density Clustering Algorithm
E. Y. Shchekinova; Y. Kumkar
2015-05-18
The aim of this study is to address the effects of wind-induced drift on a floating sea objects using high--resolution ocean forecast data and atmospheric data. Two applications of stochastic Leeway model for prediction of trajectories drift in the Mediterranean sea are presented: long-term simulation of sea drifters in the western Adriatic sea (21.06.2009-23.06.2009) and numerical reconstruction of the Elba accident (21.06.2009-23.06.2009). Long-term simulations in the western Adriatic sea are performed using wind data from the European Center for Medium-Range Weather Forecast (ECMWF) and currents from the Adriatic Forecasting System (AFS). An algorithm of spatial clustering is proposed to identify the most probable search areas with a high density of drifters. The results are compared for different simulation scenarios using different categories of drifters and forcing fields. The reconstruction of sea object drift near to the Elba Island is performed using surface currents from the Mediterranean Forecasting System (MFS) and atmospheric forcing fields from the ECMWF. The results showed that draft-limited to an upper surface drifters more closely reproduced target trajectory during the accident.
Fleisch, Markus C.; Maxell, Christopher A.; Kuper, Claudia K.; Brown, Erika T.; Parvin, Bahram; Barcellos-Hoff, Mary-Helen; Costes,Sylvain V.
2006-03-08
Centrosomes are small organelles that organize the mitoticspindle during cell division and are also involved in cell shape andpolarity. Within epithelial tumors, such as breast cancer, and somehematological tumors, centrosome abnormalities (CA) are common, occurearly in disease etiology, and correlate with chromosomal instability anddisease stage. In situ quantification of CA by optical microscopy ishampered by overlap and clustering of these organelles, which appear asfocal structures. CA has been frequently associated with Tp53 status inpremalignant lesions and tumors. Here we describe an approach toaccurately quantify centrosomes in tissue sections and tumors.Considering proliferation and baseline amplification rate the resultingpopulation based ratio of centrosomes per nucleus allow the approximationof the proportion of cells with CA. Using this technique we show that20-30 percent of cells have amplified centrosomes in Tp53 null mammarytumors. Combining fluorescence detection, deconvolution microscopy and amathematical algorithm applied to a maximum intensity projection we showthat this approach is superior to traditional investigator based visualanalysis or threshold-based techniques.
Stochastic modeling for trajectories drift in the ocean: Application of Density Clustering Algorithm
Shchekinova, E Y
2015-01-01
The aim of this study is to address the effects of wind-induced drift on a floating sea objects using high--resolution ocean forecast data and atmospheric data. Two applications of stochastic Leeway model for prediction of trajectories drift in the Mediterranean sea are presented: long-term simulation of sea drifters in the western Adriatic sea (21.06.2009-23.06.2009) and numerical reconstruction of the Elba accident (21.06.2009-23.06.2009). Long-term simulations in the western Adriatic sea are performed using wind data from the European Center for Medium-Range Weather Forecast (ECMWF) and currents from the Adriatic Forecasting System (AFS). An algorithm of spatial clustering is proposed to identify the most probable search areas with a high density of drifters. The results are compared for different simulation scenarios using different categories of drifters and forcing fields. The reconstruction of sea object drift near to the Elba Island is performed using surface currents from the Mediterranean Forecastin...
Delineation and quantitation of brain lesions by fuzzy clustering in positron emission tomography.
Boudraa, A E; Champier, J; Cinotti, L; Bordet, J C; Lavenne, F; Mallet, J J
1996-01-01
In this study, we investigate the application of the fuzzy clustering to the anatomical localization and quantitation of brain lesions in Positron Emission Tomography (PET) images. The method is based on the Fuzzy C-Means (FCM) algorithm. The algorithm segments the PET image data points into a given number of clusters. Each cluster is an homogeneous region of the brain (e.g. tumor). A feature vector is assigned to a cluster which has the highest membership degree. Having the label affected by the FCM algorithm to a cluster, one may easily compute the corresponding spatial localization, area and perimeter. Studies concerning the evolution of a tumor after different treatments in two patients are presented. PMID:8891420
Campana, R; Massaro, E; Tinebra, F; Tosti, G
2013-01-01
The minimal spanning tree (MST) algorithm is a graph-theoretical cluster-finding method. We previously applied it to gamma-ray bidimensional images, showing that it is quite sensitive in finding faint sources. Possible sources are associated with the regions where the photon arrival directions clusterize. MST selects clusters starting from a particular "tree" connecting all the point of the image and performing a cut based on the angular distance between photons, with a number of events higher than a given threshold. In this paper, we show how a further filtering, based on some parameters linked to the cluster properties, can be applied to reduce spurious detections. We find that the most efficient parameter for this secondary selection is the magnitude M of a cluster, defined as the product of its number of events by its clustering degree. We test the sensitivity of the method by means of simulated and real Fermi-Large Area Telescope (LAT) fields. Our results show that sqrt(M) is strongly correlated with oth...
Yang, Liu; Lu, Yinzhi; Zhong, Yuanchang; Wu, Xuegang; Yang, Simon X
2015-01-01
Energy resource limitation is a severe problem in traditional wireless sensor networks (WSNs) because it restricts the lifetime of network. Recently, the emergence of energy harvesting techniques has brought with them the expectation to overcome this problem. In particular, it is possible for a sensor node with energy harvesting abilities to work perpetually in an Energy Neutral state. In this paper, a Multi-hop Energy Neutral Clustering (MENC) algorithm is proposed to construct the optimal multi-hop clustering architecture in energy harvesting WSNs, with the goal of achieving perpetual network operation. All cluster heads (CHs) in the network act as routers to transmit data to base station (BS) cooperatively by a multi-hop communication method. In addition, by analyzing the energy consumption of intra- and inter-cluster data transmission, we give the energy neutrality constraints. Under these constraints, every sensor node can work in an energy neutral state, which in turn provides perpetual network operation. Furthermore, the minimum network data transmission cycle is mathematically derived using convex optimization techniques while the network information gathering is maximal. Simulation results show that our protocol can achieve perpetual network operation, so that the consistent data delivery is guaranteed. In addition, substantial improvements on the performance of network throughput are also achieved as compared to the famous traditional clustering protocol LEACH and recent energy harvesting aware clustering protocols. PMID:26712764
Possibilistic clustering for shape recognition
NASA Technical Reports Server (NTRS)
Keller, James M.; Krishnapuram, Raghu
1992-01-01
Clustering methods have been used extensively in computer vision and pattern recognition. Fuzzy clustering has been shown to be advantageous over crisp (or traditional) clustering in that total commitment of a vector to a given class is not required at each iteration. Recently fuzzy clustering methods have shown spectacular ability to detect not only hypervolume clusters, but also clusters which are actually 'thin shells', i.e., curves and surfaces. Most analytic fuzzy clustering approaches are derived from Bezdek's Fuzzy C-Means (FCM) algorithm. The FCM uses the probabilistic constraint that the memberships of a data point across classes sum to one. This constraint was used to generate the membership update equations for an iterative algorithm. Unfortunately, the memberships resulting from FCM and its derivatives do not correspond to the intuitive concept of degree of belonging, and moreover, the algorithms have considerable trouble in noisy environments. Recently, we cast the clustering problem into the framework of possibility theory. Our approach was radically different from the existing clustering methods in that the resulting partition of the data can be interpreted as a possibilistic partition, and the membership values may be interpreted as degrees of possibility of the points belonging to the classes. We constructed an appropriate objective function whose minimum will characterize a good possibilistic partition of the data, and we derived the membership and prototype update equations from necessary conditions for minimization of our criterion function. In this paper, we show the ability of this approach to detect linear and quartic curves in the presence of considerable noise.
Bhattacharya, Anindya; De, Rajat K
2010-08-01
Distance based clustering algorithms can group genes that show similar expression values under multiple experimental conditions. They are unable to identify a group of genes that have similar pattern of variation in their expression values. Previously we developed an algorithm called divisive correlation clustering algorithm (DCCA) to tackle this situation, which is based on the concept of correlation clustering. But this algorithm may also fail for certain cases. In order to overcome these situations, we propose a new clustering algorithm, called average correlation clustering algorithm (ACCA), which is able to produce better clustering solution than that produced by some others. ACCA is able to find groups of genes having more common transcription factors and similar pattern of variation in their expression values. Moreover, ACCA is more efficient than DCCA with respect to the time of execution. Like DCCA, we use the concept of correlation clustering concept introduced by Bansal et al. ACCA uses the correlation matrix in such a way that all genes in a cluster have the highest average correlation values with the genes in that cluster. We have applied ACCA and some well-known conventional methods including DCCA to two artificial and nine gene expression datasets, and compared the performance of the algorithms. The clustering results of ACCA are found to be more significantly relevant to the biological annotations than those of the other methods. Analysis of the results show the superiority of ACCA over some others in determining a group of genes having more common transcription factors and with similar pattern of variation in their expression profiles. Availability of the software: The software has been developed using C and Visual Basic languages, and can be executed on the Microsoft Windows platforms. The software may be downloaded as a zip file from http://www.isical.ac.in/~rajat. Then it needs to be installed. Two word files (included in the zip file) need to be consulted before installation and execution of the software. PMID:20144735
Julia Fornleitner; Gerhard Kahl
2007-09-03
Introducing genetic algorithms as a reliable and efficient tool to find ordered equilibrium structures, we predict minimum energy configurations of the square shoulder system for different values of corona width $\\lambda$. Varying systematically the pressure for different values of $\\lambda$ we obtain complete sequences of minimum energy configurations which provide a deeper understanding of the system's strategies to arrange particles in an energetically optimized fashion, leading to the competing self-assembly scenarios of cluster-formation vs. lane-formation.
Sumithra, Subramaniam; Victoire, T. Aruldoss Albert
2015-01-01
Due to large dimension of clusters and increasing size of sensor nodes, finding the optimal route and cluster for large wireless sensor networks (WSN) seems to be highly complex and cumbersome. This paper proposes a new method to determine a reasonably better solution of the clustering and routing problem with the highest concern of efficient energy consumption of the sensor nodes for extending network life time. The proposed method is based on the Differential Evolution (DE) algorithm with an improvised search operator called Diversified Vicinity Procedure (DVP), which models a trade-off between energy consumption of the cluster heads and delay in forwarding the data packets. The obtained route using the proposed method from all the gateways to the base station is comparatively lesser in overall distance with less number of data forwards. Extensive numerical experiments demonstrate the superiority of the proposed method in managing energy consumption of the WSN and the results are compared with the other algorithms reported in the literature. PMID:26516635
Sumithra, Subramaniam; Victoire, T Aruldoss Albert
2015-01-01
Due to large dimension of clusters and increasing size of sensor nodes, finding the optimal route and cluster for large wireless sensor networks (WSN) seems to be highly complex and cumbersome. This paper proposes a new method to determine a reasonably better solution of the clustering and routing problem with the highest concern of efficient energy consumption of the sensor nodes for extending network life time. The proposed method is based on the Differential Evolution (DE) algorithm with an improvised search operator called Diversified Vicinity Procedure (DVP), which models a trade-off between energy consumption of the cluster heads and delay in forwarding the data packets. The obtained route using the proposed method from all the gateways to the base station is comparatively lesser in overall distance with less number of data forwards. Extensive numerical experiments demonstrate the superiority of the proposed method in managing energy consumption of the WSN and the results are compared with the other algorithms reported in the literature. PMID:26516635
Li, Weizhong [San Diego Supercomputer Center
2013-01-22
San Diego Supercomputer Center's Weizhong Li on "Effective Analysis of NGS Metagenomic Data with Ultra-fast Clustering Algorithms" at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.
A Scalable Parallel Subspace Clustering Algorithm for Massive Data Sets Harsha S Nagesh
in statistics, machine learning [12], pattern recognition and image processing [8]. Intuitively, clustering of adaptive grids and present a truly un-supervised clustering al- gorithm requiring no user inputs. Our current methods with much better quality of clustering. Performance results on both real and synthetic
Nagwani, Naresh Kumar; Deo, Shirish V.
2014-01-01
Understanding of the compressive strength of concrete is important for activities like construction arrangement, prestressing operations, and proportioning new mixtures and for the quality assurance. Regression techniques are most widely used for prediction tasks where relationship between the independent variables and dependent (prediction) variable is identified. The accuracy of the regression techniques for prediction can be improved if clustering can be used along with regression. Clustering along with regression will ensure the more accurate curve fitting between the dependent and independent variables. In this work cluster regression technique is applied for estimating the compressive strength of the concrete and a novel state of the art is proposed for predicting the concrete compressive strength. The objective of this work is to demonstrate that clustering along with regression ensures less prediction errors for estimating the concrete compressive strength. The proposed technique consists of two major stages: in the first stage, clustering is used to group the similar characteristics concrete data and then in the second stage regression techniques are applied over these clusters (groups) to predict the compressive strength from individual clusters. It is found from experiments that clustering along with regression techniques gives minimum errors for predicting compressive strength of concrete; also fuzzy clustering algorithm C-means performs better than K-means algorithm. PMID:25374939
NASA Astrophysics Data System (ADS)
Davis, Jack B. A.; Shayeghi, Armin; Horswell, Sarah L.; Johnston, Roy L.
2015-08-01
A new open-source parallel genetic algorithm, the Birmingham parallel genetic algorithm, is introduced for the direct density functional theory global optimisation of metallic nanoparticles. The program utilises a pool genetic algorithm methodology for the efficient use of massively parallel computational resources. The scaling capability of the Birmingham parallel genetic algorithm is demonstrated through its application to the global optimisation of iridium clusters with 10 to 20 atoms, a catalytically important system with interesting size-specific effects. This is the first study of its type on Iridium clusters of this size and the parallel algorithm is shown to be capable of scaling beyond previous size restrictions and accurately characterising the structures of these larger system sizes. By globally optimising the system directly at the density functional level of theory, the code captures the cubic structures commonly found in sub-nanometre sized Ir clusters.A new open-source parallel genetic algorithm, the Birmingham parallel genetic algorithm, is introduced for the direct density functional theory global optimisation of metallic nanoparticles. The program utilises a pool genetic algorithm methodology for the efficient use of massively parallel computational resources. The scaling capability of the Birmingham parallel genetic algorithm is demonstrated through its application to the global optimisation of iridium clusters with 10 to 20 atoms, a catalytically important system with interesting size-specific effects. This is the first study of its type on Iridium clusters of this size and the parallel algorithm is shown to be capable of scaling beyond previous size restrictions and accurately characterising the structures of these larger system sizes. By globally optimising the system directly at the density functional level of theory, the code captures the cubic structures commonly found in sub-nanometre sized Ir clusters. Electronic supplementary information (ESI) available. See DOI: 10.1039/C5NR03774C
Solution of facility location problem in Turkey by using fuzzy C-means method
NASA Astrophysics Data System (ADS)
Kocakaya, Mustafa Nabi; Türkak?n, Osman Hürol
2013-10-01
Facility location problem is one of most frequent problems, which is encountered while deciding facility places such as factories, warehouses. There are various techniques developed to solve facility location problems. Fuzzy c-means method is one of the most usable techniques between them. In this study, optimum warehouse location for natural stone mines is found by using fuzzy c-means method.
KtJet: A C++ implementation of the Kt clustering algorithm
J. M. Butterworth; J. P. Couchman; B. E. Cox; B. M. Waugh
2002-10-01
A C++ implementation of the Kt jet algorithm for high energy particle collisions is presented. The time performance of this implementation is comparable to the widely used Fortran implementation. Identical algorithmic functionality is provided, with a clean and intuitive user interface and additional recombination schemes. A short description of the algorithm and examples of its use are given.
NASA Astrophysics Data System (ADS)
Thanos, Konstantinos-Georgios; Thomopoulos, Stelios C. A.
2014-06-01
The study in this paper belongs to a more general research of discovering facial sub-clusters in different ethnicity face databases. These new sub-clusters along with other metadata (such as race, sex, etc.) lead to a vector for each face in the database where each vector component represents the likelihood of participation of a given face to each cluster. This vector is then used as a feature vector in a human identification and tracking system based on face and other biometrics. The first stage in this system involves a clustering method which evaluates and compares the clustering results of five different clustering algorithms (average, complete, single hierarchical algorithm, k-means and DIGNET), and selects the best strategy for each data collection. In this paper we present the comparative performance of clustering results of DIGNET and four clustering algorithms (average, complete, single hierarchical and k-means) on fabricated 2D and 3D samples, and on actual face images from various databases, using four different standard metrics. These metrics are the silhouette figure, the mean silhouette coefficient, the Hubert test ? coefficient, and the classification accuracy for each clustering result. The results showed that, in general, DIGNET gives more trustworthy results than the other algorithms when the metrics values are above a specific acceptance threshold. However when the evaluation results metrics have values lower than the acceptance threshold but not too low (too low corresponds to ambiguous results or false results), then it is necessary for the clustering results to be verified by the other algorithms.
NASA Astrophysics Data System (ADS)
Komura, Yukihiro; Okabe, Yutaka
2013-01-01
We present multiple GPU computing with the common unified device architecture (CUDA) for the Swendsen-Wang multi-cluster algorithm of two-dimensional (2D) q-state Potts model. Extending our algorithm for single GPU computing [Y. Komura, Y. Okabe, GPU-based Swendsen-Wang multi-cluster algorithm for the simulation of two-dimensional classical spin systems, Comput. Phys. Comm. 183 (2012) 1155-1161], we realize the GPU computation of the Swendsen-Wang multi-cluster algorithm for multiple GPUs. We implement our code on the large-scale open science supercomputer TSUBAME 2.0, and test the performance and the scalability of the simulation of the 2D Potts model. The performance on Tesla M2050 using 256 GPUs is obtained as 37.3 spin flips per a nano second for the q=2 Potts model (Ising model) at the critical temperature with the linear system size L=65536.
Silva, M X; Galvão, B R L; Belchior, J C
2014-09-01
The potential energy hypersurface associated with sodium-potassium alloy clusters is explored via an enhanced genetic algorithm, where two different operators are added to the standard evolutionary procedure. Based on the recent result that the empirical Gupta many-body potential yields reasonable results for clusters with more than seven atoms, we have employed this function in the evaluation of the energies. Agglomerates from seven to the well-established 55-atom structure are studied, and their second-order energy difference and excess energies are calculated. It is found that the most stable alloys (compared to the homonuclear counterparts) are found with the proportion of sodium atoms in the range of 30 to 40%. The experimental propensity of core-shell segregation is successfully predicted by the current approach. PMID:25208555
Chen, Wei-Chen; Ostrouchov, George; Pugmire, Dave; Prabhat,; Wehner, Michael
2013-01-01
We develop a parallel EM algorithm for multivariate Gaussian mixture models and use it to perform model-based clustering of a large climate data set. Three variants of the EM algorithm are reformulated in parallel and a new variant that is faster is presented. All are implemented using the single program, multiple data (SPMD) programming model, which is able to take advantage of the combined collective memory of large distributed computer architectures to process larger data sets. Displays of the estimated mixture model rather than the data allow us to explore multivariate relationships in a way that scales to arbitrary size data. We study the performance of our methodology on simulated data and apply our methodology to a high resolution climate dataset produced by the community atmosphere model (CAM5). This article has supplementary material online.
Yilmaz, Nihat; Inan, Onur; Uzer, Mustafa Serter
2014-05-01
The most important factors that prevent pattern recognition from functioning rapidly and effectively are the noisy and inconsistent data in databases. This article presents a new data preparation method based on clustering algorithms for diagnosis of heart and diabetes diseases. In this method, a new modified K-means Algorithm is used for clustering based data preparation system for the elimination of noisy and inconsistent data and Support Vector Machines is used for classification. This newly developed approach was tested in the diagnosis of heart diseases and diabetes, which are prevalent within society and figure among the leading causes of death. The data sets used in the diagnosis of these diseases are the Statlog (Heart), the SPECT images and the Pima Indians Diabetes data sets obtained from the UCI database. The proposed system achieved 97.87 %, 98.18 %, 96.71 % classification success rates from these data sets. Classification accuracies for these data sets were obtained through using 10-fold cross-validation method. According to the results, the proposed method of performance is highly successful compared to other results attained, and seems very promising for pattern recognition applications. PMID:24737307
A new method based on Dempster-Shafer theory and fuzzy c-means for brain MRI segmentation
NASA Astrophysics Data System (ADS)
Liu, Jie; Lu, Xi; Li, Yunpeng; Chen, Xiaowu; Deng, Yong
2015-10-01
In this paper, a new method is proposed to decrease sensitiveness to motion noise and uncertainty in magnetic resonance imaging (MRI) segmentation especially when only one brain image is available. The method is approached with considering spatial neighborhood information by fusing the information of pixels with their neighbors with Dempster-Shafer (DS) theory. The basic probability assignment (BPA) of each single hypothesis is obtained from the membership function of applying fuzzy c-means (FCM) clustering to the gray levels of the MRI. Then multiple hypotheses are generated according to the single hypothesis. Then we update the objective pixel’s BPA by fusing the BPA of the objective pixel and those of its neighbors to get the final result. Some examples in MRI segmentation are demonstrated at the end of the paper, in which our method is compared with some previous methods. The results show that the proposed method is more effective than other methods in motion-blurred MRI segmentation.
Risk Mapping of Cutaneous Leishmaniasis via a Fuzzy C Means-based Neuro-Fuzzy Inference System
NASA Astrophysics Data System (ADS)
Akhavan, P.; Karimi, M.; Pahlavani, P.
2014-10-01
Finding pathogenic factors and how they are spread in the environment has become a global demand, recently. Cutaneous Leishmaniasis (CL) created by Leishmania is a special parasitic disease which can be passed on to human through phlebotomus of vector-born. Studies show that economic situation, cultural issues, as well as environmental and ecological conditions can affect the prevalence of this disease. In this study, Data Mining is utilized in order to predict CL prevalence rate and obtain a risk map. This case is based on effective environmental parameters on CL and a Neuro-Fuzzy system was also used. Learning capacity of Neuro-Fuzzy systems in neural network on one hand and reasoning power of fuzzy systems on the other, make it very efficient to use. In this research, in order to predict CL prevalence rate, an adaptive Neuro-fuzzy inference system with fuzzy inference structure of fuzzy C Means clustering was applied to determine the initial membership functions. Regarding to high incidence of CL in Ilam province, counties of Ilam, Mehran, and Dehloran have been examined and evaluated. The CL prevalence rate was predicted in 2012 by providing effective environmental map and topography properties including temperature, moisture, annual, rainfall, vegetation and elevation. Results indicate that the model precision with fuzzy C Means clustering structure rises acceptable RMSE values of both training and checking data and support our analyses. Using the proposed data mining technology, the pattern of disease spatial distribution and vulnerable areas become identifiable and the map can be used by experts and decision makers of public health as a useful tool in management and optimal decision-making.
NASA Astrophysics Data System (ADS)
Cazade, Pierre-André; Zheng, Wenwei; Prada-Gracia, Diego; Berezovska, Ganna; Rao, Francesco; Clementi, Cecilia; Meuwly, Markus
2015-01-01
The ligand migration network for O2-diffusion in truncated Hemoglobin N is analyzed based on three different clustering schemes. For coordinate-based clustering, the conventional k-means and the kinetics-based Markov Clustering (MCL) methods are employed, whereas the locally scaled diffusion map (LSDMap) method is a collective-variable-based approach. It is found that all three methods agree well in their geometrical definition of the most important docking site, and all experimentally known docking sites are recovered by all three methods. Also, for most of the states, their population coincides quite favourably, whereas the kinetics of and between the states differs. One of the major differences between k-means and MCL clustering on the one hand and LSDMap on the other is that the latter finds one large primary cluster containing the Xe1a, IS1, and ENT states. This is related to the fact that the motion within the state occurs on similar time scales, whereas structurally the state is found to be quite diverse. In agreement with previous explicit atomistic simulations, the Xe3 pocket is found to be a highly dynamical site which points to its potential role as a hub in the network. This is also highlighted in the fact that LSDMap cannot identify this state. First passage time distributions from MCL clusterings using a one- (ligand-position) and two-dimensional (ligand-position and protein-structure) descriptor suggest that ligand- and protein-motions are coupled. The benefits and drawbacks of the three methods are discussed in a comparative fashion and highlight that depending on the questions at hand the best-performing method for a particular data set may differ.
Kandalla, Krishna; Subramoni, Hari; Vishnu, Abhinav; Panda, Dhabaleswar K.
2010-04-01
Modern high performance computing systems are being increasingly deployed in a hierarchical fashion with multi-core computing platforms forming the base of the hierarchy. These systems are usually comprised of multiple racks, with each rack consisting of a finite number of chassis, with each chassis having multiple compute nodes or blades, based on multi-core architectures. The networks are also hierarchical with multiple levels of switches. Message exchange operations between processes that belong to different racks involve multiple hops across different switches and this directly affects the performance of collective operations. In this paper, we take on the challenges involved in detecting the topology of large scale InfiniBand clusters and leveraging this knowledge to design efficient topology-aware algorithms for collective operations. We also propose a communication model to analyze the communication costs involved in collective operations on large scale supercomputing systems. We have analyzed the performance characteristics of two collectives, MPI_Gather and MPI_Scatter on such systems and we have proposed topology-aware algorithms for these operations. Our experimental results have shown that the proposed algorithms can improve the performance of these collective operations by almost 54% at the micro-benchmark level.
Sensitivity evaluation of dynamic speckle activity measurements using clustering methods
Etchepareborda, Pablo; Federico, Alejandro; Kaufmann, Guillermo H.
2010-07-01
We evaluate and compare the use of competitive neural networks, self-organizing maps, the expectation-maximization algorithm, K-means, and fuzzy C-means techniques as partitional clustering methods, when the sensitivity of the activity measurement of dynamic speckle images needs to be improved. The temporal history of the acquired intensity generated by each pixel is analyzed in a wavelet decomposition framework, and it is shown that the mean energy of its corresponding wavelet coefficients provides a suited feature space for clustering purposes. The sensitivity obtained by using the evaluated clustering techniques is also compared with the well-known methods of Konishi-Fujii, weighted generalized differences, and wavelet entropy. The performance of the partitional clustering approach is evaluated using simulated dynamic speckle patterns and also experimental data.
Scalable Algorithms for Complete Exchange on MultiCluster Networks Alfredo Goldman
Goldman, Alfredo
of exchanging messages of different sizes, but they did not allow preemption on mes sages sending, once and receive at most one mes sage simultaneously. This limitation is well known in the literature as one phase t, processor p i sends its mes sage to processor p i+t mod n . HL Hypercubelike Algorithm
VIDEO OBJECT SEGMENTATION AND TRACKING USING PROBABILISTIC FUZZY C-MEANS
Zhang, Xiao-Ping
VIDEO OBJECT SEGMENTATION AND TRACKING USING PROBABILISTIC FUZZY C-MEANS Jian Zhou, Xiao-Ping Zhang, Canada, M5B 2K3 E-mail: {jzhou, xzhang}@ee.ryerson.ca ABSTRACT Automatic video object segmentation to associate the segmented regions to form video objects. Temporal tracking is achieved by projecting
Multiple Classifier Based on Fuzzy C-Means for a Flower Image Retrieval Keita Fukuda
Takiguchi, Tetsuya
Multiple Classifier Based on Fuzzy C-Means for a Flower Image Retrieval Keita Fukuda , Tetsuya: takigu@kobe-u.ac.jp Email: ariki@kobe-u.ac.jp Abstract In this paper, we propose a flower image retrieval on the flower structure. Flowers may be classified into three types according to their structure: Rounded
Muhammad, Durreshahwar; Foret, Jessica; Brady, Siobhan M.; Ducoste, Joel J.; Tuck, James; Long, Terri A.; Williams, Cranos
2015-01-01
Time course transcriptome datasets are commonly used to predict key gene regulators associated with stress responses and to explore gene functionality. Techniques developed to extract causal relationships between genes from high throughput time course expression data are limited by low signal levels coupled with noise and sparseness in time points. We deal with these limitations by proposing the Cluster and Differential Alignment Algorithm (CDAA). This algorithm was designed to process transcriptome data by first grouping genes based on stages of activity and then using similarities in gene expression to predict influential connections between individual genes. Regulatory relationships are assigned based on pairwise alignment scores generated using the expression patterns of two genes and some inferred delay between the regulator and the observed activity of the target. We applied the CDAA to an iron deficiency time course microarray dataset to identify regulators that influence 7 target transcription factors known to participate in the Arabidopsis thaliana iron deficiency response. The algorithm predicted that 7 regulators previously unlinked to iron homeostasis influence the expression of these known transcription factors. We validated over half of predicted influential relationships using qRT-PCR expression analysis in mutant backgrounds. One predicted regulator-target relationship was shown to be a direct binding interaction according to yeast one-hybrid (Y1H) analysis. These results serve as a proof of concept emphasizing the utility of the CDAA for identifying unknown or missing nodes in regulatory cascades, providing the fundamental knowledge needed for constructing predictive gene regulatory networks. We propose that this tool can be used successfully for similar time course datasets to extract additional information and infer reliable regulatory connections for individual genes. PMID:26317202
A new time dependent density functional algorithm for large systems and plasmons in metal clusters.
Baseggio, Oscar; Fronzoni, Giovanna; Stener, Mauro
2015-07-14
A new algorithm to solve the Time Dependent Density Functional Theory (TDDFT) equations in the space of the density fitting auxiliary basis set has been developed and implemented. The method extracts the spectrum from the imaginary part of the polarizability at any given photon energy, avoiding the bottleneck of Davidson diagonalization. The original idea which made the present scheme very efficient consists in the simplification of the double sum over occupied-virtual pairs in the definition of the dielectric susceptibility, allowing an easy calculation of such matrix as a linear combination of constant matrices with photon energy dependent coefficients. The method has been applied to very different systems in nature and size (from H2 to [Au147](-)). In all cases, the maximum deviations found for the excitation energies with respect to the Amsterdam density functional code are below 0.2 eV. The new algorithm has the merit not only to calculate the spectrum at whichever photon energy but also to allow a deep analysis of the results, in terms of transition contribution maps, Jacob plasmon scaling factor, and induced density analysis, which have been all implemented. PMID:26178089
Jiang, Joe-Air; Chen, Chia-Pang; Chuang, Cheng-Long; Lin, Tzu-Shiang; Tseng, Chwan-Lu; Yang, En-Cheng; Wang, Yung-Chung
2009-01-01
Deployment of wireless sensor networks (WSNs) has drawn much attention in recent years. Given the limited energy for sensor nodes, it is critical to implement WSNs with energy efficiency designs. Sensing coverage in networks, on the other hand, may degrade gradually over time after WSNs are activated. For mission-critical applications, therefore, energy-efficient coverage control should be taken into consideration to support the quality of service (QoS) of WSNs. Usually, coverage-controlling strategies present some challenging problems: (1) resolving the conflicts while determining which nodes should be turned off to conserve energy; (2) designing an optimal wake-up scheme that avoids awakening more nodes than necessary. In this paper, we implement an energy-efficient coverage control in cluster-based WSNs using a Memetic Algorithm (MA)-based approach, entitled CoCMA, to resolve the challenging problems. The CoCMA contains two optimization strategies: a MA-based schedule for sensor nodes and a wake-up scheme, which are responsible to prolong the network lifetime while maintaining coverage preservation. The MA-based schedule is applied to a given WSN to avoid unnecessary energy consumption caused by the redundant nodes. During the network operation, the wake-up scheme awakens sleeping sensor nodes to recover coverage hole caused by dead nodes. The performance evaluation of the proposed CoCMA was conducted on a cluster-based WSN (CWSN) under either a random or a uniform deployment of sensor nodes. Simulation results show that the performance yielded by the combination of MA and wake-up scheme is better than that in some existing approaches. Furthermore, CoCMA is able to activate fewer sensor nodes to monitor the required sensing area. PMID:22408561
Wu, Shandong; Weinstein, Susan P.; Conant, Emily F.; Kontos, Despina
2013-12-15
Purpose: Breast magnetic resonance imaging (MRI) plays an important role in the clinical management of breast cancer. Studies suggest that the relative amount of fibroglandular (i.e., dense) tissue in the breast as quantified in MR images can be predictive of the risk for developing breast cancer, especially for high-risk women. Automated segmentation of the fibroglandular tissue and volumetric density estimation in breast MRI could therefore be useful for breast cancer risk assessment. Methods: In this work the authors develop and validate a fully automated segmentation algorithm, namely, an atlas-aided fuzzy C-means (FCM-Atlas) method, to estimate the volumetric amount of fibroglandular tissue in breast MRI. The FCM-Atlas is a 2D segmentation method working on a slice-by-slice basis. FCM clustering is first applied to the intensity space of each 2D MR slice to produce an initial voxelwise likelihood map of fibroglandular tissue. Then a prior learned fibroglandular tissue likelihood atlas is incorporated to refine the initial FCM likelihood map to achieve enhanced segmentation, from which the absolute volume of the fibroglandular tissue (|FGT|) and the relative amount (i.e., percentage) of the |FGT| relative to the whole breast volume (FGT%) are computed. The authors' method is evaluated by a representative dataset of 60 3D bilateral breast MRI scans (120 breasts) that span the full breast density range of the American College of Radiology Breast Imaging Reporting and Data System. The automated segmentation is compared to manual segmentation obtained by two experienced breast imaging radiologists. Segmentation performance is assessed by linear regression, Pearson's correlation coefficients, Student's pairedt-test, and Dice's similarity coefficients (DSC). Results: The inter-reader correlation is 0.97 for FGT% and 0.95 for |FGT|. When compared to the average of the two readers’ manual segmentation, the proposed FCM-Atlas method achieves a correlation ofr = 0.92 for FGT% and r = 0.93 for |FGT|, and the automated segmentation is not statistically significantly different (p = 0.46 for FGT% and p = 0.55 for |FGT|). The bilateral correlation between left breasts and right breasts for the FGT% is 0.94, 0.92, and 0.95 for reader 1, reader 2, and the FCM-Atlas, respectively; likewise, for the |FGT|, it is 0.92, 0.92, and 0.93, respectively. For the spatial segmentation agreement, the automated algorithm achieves a DSC of 0.69 ± 0.1 when compared to reader 1 and 0.61 ± 0.1 for reader 2, respectively, while the DSC between the two readers’ manual segmentation is 0.67 ± 0.15. Additional robustness analysis shows that the segmentation performance of the authors' method is stable both with respect to selecting different cases and to varying the number of cases needed to construct the prior probability atlas. The authors' results also show that the proposed FCM-Atlas method outperforms the commonly used two-cluster FCM-alone method. The authors' method runs at ?5 min for each 3D bilateral MR scan (56 slices) for computing the FGT% and |FGT|, compared to ?55 min needed for manual segmentation for the same purpose. Conclusions: The authors' method achieves robust segmentation and can serve as an efficient tool for processing large clinical datasets for quantifying the fibroglandular tissue content in breast MRI. It holds a great potential to support clinical applications in the future including breast cancer risk assessment.
A binned clustering algorithm to detect high-Z material using cosmic muons
NASA Astrophysics Data System (ADS)
Thomay, C.; Velthuis, J. J.; Baesso, P.; Cussans, D.; Morris, P. A. W.; Steer, C.; Burns, J.; Quillin, S.; Stapleton, M.
2013-10-01
We present a novel approach to the detection of special nuclear material using cosmic rays. Muon Scattering Tomography (MST) is a method for using cosmic muons to scan cargo containers and vehicles for special nuclear material. Cosmic muons are abundant, highly penetrating, not harmful for organic tissue, cannot be screened against, and can easily be detected, which makes them highly suited to the use of cargo scanning. Muons undergo multiple Coulomb scattering when passing through material, and the amount of scattering is roughly proportional to the square of the atomic number Z of the material. By reconstructing incoming and outgoing tracks, we can obtain variables to identify high-Z material. In a real life application, this has to happen on a timescale of 1 min and thus with small numbers of muons. We have built a detector system using resistive plate chambers (RPCs): 12 layers of RPCs allow for the readout of 6 x and 6 y positions, by which we can reconstruct incoming and outgoing tracks. In this work we detail the performance of an algorithm by which we separate high-Z targets from low-Z background, both for real data from our prototype setup and for MC simulation of a cargo container-sized setup. (c) British Crown Owned Copyright 2013/AWE
Thimmaiah, Tim; Voje, William E; Carothers, James M
2015-01-01
With progress toward inexpensive, large-scale DNA assembly, the demand for simulation tools that allow the rapid construction of synthetic biological devices with predictable behaviors continues to increase. By combining engineered transcript components, such as ribosome binding sites, transcriptional terminators, ligand-binding aptamers, catalytic ribozymes, and aptamer-controlled ribozymes (aptazymes), gene expression in bacteria can be fine-tuned, with many corollaries and applications in yeast and mammalian cells. The successful design of genetic constructs that implement these kinds of RNA-based control mechanisms requires modeling and analyzing kinetically determined co-transcriptional folding pathways. Transcript design methods using stochastic kinetic folding simulations to search spacer sequence libraries for motifs enabling the assembly of RNA component parts into static ribozyme- and dynamic aptazyme-regulated expression devices with quantitatively predictable functions (rREDs and aREDs, respectively) have been described (Carothers et al., Science 334:1716-1719, 2011). Here, we provide a detailed practical procedure for computational transcript design by illustrating a high throughput, multiprocessor approach for evaluating spacer sequences and generating functional rREDs. This chapter is written as a tutorial, complete with pseudo-code and step-by-step instructions for setting up a computational cluster with an Amazon, Inc. web server and performing the large numbers of kinefold-based stochastic kinetic co-transcriptional folding simulations needed to design functional rREDs and aREDs. The method described here should be broadly applicable for designing and analyzing a variety of synthetic RNA parts, devices and transcripts. PMID:25487092
Adaptive fuzzy leader clustering of complex data sets in pattern recognition
NASA Technical Reports Server (NTRS)
Newton, Scott C.; Pemmaraju, Surya; Mitra, Sunanda
1992-01-01
A modular, unsupervised neural network architecture for clustering and classification of complex data sets is presented. The adaptive fuzzy leader clustering (AFLC) architecture is a hybrid neural-fuzzy system that learns on-line in a stable and efficient manner. The initial classification is performed in two stages: a simple competitive stage and a distance metric comparison stage. The cluster prototypes are then incrementally updated by relocating the centroid positions from fuzzy C-means system equations for the centroids and the membership values. The AFLC algorithm is applied to the Anderson Iris data and laser-luminescent fingerprint image data. It is concluded that the AFLC algorithm successfully classifies features extracted from real data, discrete or continuous.
Houseman, E Andres; Christensen, Brock C; Yeh, Ru-Fang; Marsit, Carmen J; Karagas, Margaret R; Wrensch, Margaret; Nelson, Heather H; Wiemels, Joseph; Zheng, Shichun; Wiencke, John K; Kelsey, Karl T
2008-01-01
Background Epigenetics is the study of heritable changes in gene function that cannot be explained by changes in DNA sequence. One of the most commonly studied epigenetic alterations is cytosine methylation, which is a well recognized mechanism of epigenetic gene silencing and often occurs at tumor suppressor gene loci in human cancer. Arrays are now being used to study DNA methylation at a large number of loci; for example, the Illumina GoldenGate platform assesses DNA methylation at 1505 loci associated with over 800 cancer-related genes. Model-based cluster analysis is often used to identify DNA methylation subgroups in data, but it is unclear how to cluster DNA methylation data from arrays in a scalable and reliable manner. Results We propose a novel model-based recursive-partitioning algorithm to navigate clusters in a beta mixture model. We present simulations that show that the method is more reliable than competing nonparametric clustering approaches, and is at least as reliable as conventional mixture model methods. We also show that our proposed method is more computationally efficient than conventional mixture model approaches. We demonstrate our method on the normal tissue samples and show that the clusters are associated with tissue type as well as age. Conclusion Our proposed recursively-partitioned mixture model is an effective and computationally efficient method for clustering DNA methylation data. PMID:18782434
NASA Astrophysics Data System (ADS)
Winter, André; Rieger, Heiko; Vojta, Matthias; Bulla, Ralf
2009-01-01
A continuous time cluster algorithm for two-level systems coupled to a dissipative bosonic bath is presented and applied to the sub-Ohmic spin-boson model. When the power s of the spectral function J(?)??s is smaller than 1/2, the critical exponents are found to be classical, mean-field like. Potential sources for the discrepancy with recent renormalization group predictions are traced back to the effect of a dangerously irrelevant variable.
NASA Technical Reports Server (NTRS)
Werth, L. F. (principal investigator)
1981-01-01
Both the iterative self-organizing clustering system (ISOCLS) and the CLASSY algorithms were applied to forest and nonforest classes for one 1:24,000 quadrangle map of northern Idaho and the classification and mapping accuracies were evaluated with 1:30,000 color infrared aerial photography. Confusion matrices for the two clustering algorithms were generated and studied to determine which is most applicable to forest and rangeland inventories in future projects. In an unsupervised mode, ISOCLS requires many trial-and-error runs to find the proper parameters to separate desired information classes. CLASSY tells more in a single run concerning the classes that can be separated, shows more promise for forest stratification than ISOCLS, and shows more promise for consistency. One major drawback to CLASSY is that important forest and range classes that are smaller than a minimum cluster size will be combined with other classes. The algorithm requires so much computer storage that only data sets as small as a quadrangle can be used at one time.
NASA Astrophysics Data System (ADS)
Nandy, Subhajit; Chaudhury, Pinaki; Bhattacharyya, S. P.
2010-06-01
We present a genetic algorithm based investigation of structural fragmentation in dicationic noble gas clusters, Arn+2, Krn+2, and Xen+2, where n denotes the size of the cluster. Dications are predicted to be stable above a threshold size of the cluster when positive charges are assumed to remain localized on two noble gas atoms and the Lennard-Jones potential along with bare Coulomb and ion-induced dipole interactions are taken into account for describing the potential energy surface. Our cutoff values are close to those obtained experimentally [P. Scheier and T. D. Mark, J. Chem. Phys. 11, 3056 (1987)] and theoretically [J. G. Gay and B. J. Berne, Phys. Rev. Lett. 49, 194 (1982)]. When the charges are allowed to be equally distributed over four noble gas atoms in the cluster and the nonpolarization interaction terms are allowed to remain unchanged, our method successfully identifies the size threshold for stability as well as the nature of the channels of dissociation as function of cluster size. In Arn2+, for example, fissionlike fragmentation is predicted for n =55 while for n =43, the predicted outcome is nonfission fragmentation in complete agreement with earlier work [Golberg et al., J. Chem. Phys. 100, 8277 (1994)].
Chen, Zhijia; Zhu, Yuanchang; Di, Yanqiang; Feng, Shaochong
2015-01-01
In IaaS (infrastructure as a service) cloud environment, users are provisioned with virtual machines (VMs). To allocate resources for users dynamically and effectively, accurate resource demands predicting is essential. For this purpose, this paper proposes a self-adaptive prediction method using ensemble model and subtractive-fuzzy clustering based fuzzy neural network (ESFCFNN). We analyze the characters of user preferences and demands. Then the architecture of the prediction model is constructed. We adopt some base predictors to compose the ensemble model. Then the structure and learning algorithm of fuzzy neural network is researched. To obtain the number of fuzzy rules and the initial value of the premise and consequent parameters, this paper proposes the fuzzy c-means combined with subtractive clustering algorithm, that is, the subtractive-fuzzy clustering. Finally, we adopt different criteria to evaluate the proposed method. The experiment results show that the method is accurate and effective in predicting the resource demands. PMID:25691896
Chen, Zhijia; Zhu, Yuanchang; Di, Yanqiang; Feng, Shaochong
2015-01-01
In IaaS (infrastructure as a service) cloud environment, users are provisioned with virtual machines (VMs). To allocate resources for users dynamically and effectively, accurate resource demands predicting is essential. For this purpose, this paper proposes a self-adaptive prediction method using ensemble model and subtractive-fuzzy clustering based fuzzy neural network (ESFCFNN). We analyze the characters of user preferences and demands. Then the architecture of the prediction model is constructed. We adopt some base predictors to compose the ensemble model. Then the structure and learning algorithm of fuzzy neural network is researched. To obtain the number of fuzzy rules and the initial value of the premise and consequent parameters, this paper proposes the fuzzy c-means combined with subtractive clustering algorithm, that is, the subtractive-fuzzy clustering. Finally, we adopt different criteria to evaluate the proposed method. The experiment results show that the method is accurate and effective in predicting the resource demands. PMID:25691896
Athens, University of
Efficient Image Compression of Medical Images Using the Wavelet Transform and Fuzzy c compression technique, when applied to a set of medical images. 1. Introduction Image compression plays as that they could be used in a multitude of purposes. For instance, it is necessary that medical images
Single Pass Fuzzy C Means Prodip Hore, Lawrence O. Hall, and Dmitry B. Goldgof
Hall, Lawrence O.
address the crisp case of clustering, which cannot be easily generalized to the fuzzy case. In this paper for the crisp case, not as much work has been done for the fuzzy case. As pointed out in [10], the crisp case
NASA Astrophysics Data System (ADS)
Ward, W. O. C.; Wilkinson, P. B.; Chambers, J. E.; Oxby, L. S.; Bai, L.
2014-04-01
A novel method for the effective identification of bedrock subsurface elevation from electrical resistivity tomography images is described. Identifying subsurface boundaries in the topographic data can be difficult due to smoothness constraints used in inversion, so a statistical population-based approach is used that extends previous work in calculating isoresistivity surfaces. The analysis framework involves a procedure for guiding a clustering approach based on the fuzzy c-means algorithm. An approximation of resistivity distributions, found using kernel density estimation, was utilized as a means of guiding the cluster centroids used to classify data. A fuzzy method was chosen over hard clustering due to uncertainty in hard edges in the topography data, and a measure of clustering uncertainty was identified based on the reciprocal of cluster membership. The algorithm was validated using a direct comparison of known observed bedrock depths at two 3-D survey sites, using real-time GPS information of exposed bedrock by quarrying on one site, and borehole logs at the other. Results show similarly accurate detection as a leading isosurface estimation method, and the proposed algorithm requires significantly less user input and prior site knowledge. Furthermore, the method is effectively dimension-independent and will scale to data of increased spatial dimensions without a significant effect on the runtime. A discussion on the results by automated versus supervised analysis is also presented.
On the analysis of BIS stage epochs via fuzzy clustering.
Nasibov, Efendi; Ozgören, Murat; Ulutagay, Gözde; Oniz, Adile; Kocaaslan, Sibel
2010-06-01
Among various types of clustering methods, partition-based methods such as k-means and FCM are widely used in the analysis of such data. However, when duration between stimuli is different, such methods are not able to provide satisfactory results because they find equal size clusters according to the fundamental running principle of these methods. In such cases, neighborhood-based clustering methods can give more satisfactory results because measurement series are separated from one another according to dramatic breaking points. In recent years, bispectral index (BIS) monitoring, which is used for monitoring the level of anesthesia, has been used in sleep studies. Sleep stages are classically scored according to the Rechtschaffen and Kales (R&K) scoring system. BIS has been shown to have a strong correlation with the R&K scoring system. In this study, fuzzy neighborhood/density-based spatial clustering of applications with noise (FN-DBSCAN) that combines speed of the DBSCAN algorithm and robustness of the NRFJP algorithm is applied to BIS measurement series. As a result of experiments, we can conclude that, by using BIS data, the FN-DBSCAN method estimates sleep stages better than the fuzzy c-means method. PMID:20156029
NASA Astrophysics Data System (ADS)
Feng, Jian-xin; Tang, Jia-fu; Wang, Guang-xing
2007-04-01
On the basis of the analysis of clustering algorithm that had been proposed for MANET, a novel clustering strategy was proposed in this paper. With the trust defined by statistical hypothesis in probability theory and the cluster head selected by node trust and node mobility, this strategy can realize the function of the malicious nodes detection which was neglected by other clustering algorithms and overcome the deficiency of being incapable of implementing the relative mobility metric of corresponding nodes in the MOBIC algorithm caused by the fact that the receiving power of two consecutive HELLO packet cannot be measured. It's an effective solution to cluster MANET securely.
NASA Astrophysics Data System (ADS)
Shenvi, Neil; van Aggelen, Helen; Yang, Yang; Yang, Weitao; Schwerdtfeger, Christine; Mazziotti, David
2013-08-01
Tensor hypercontraction is a method that allows the representation of a high-rank tensor as a product of lower-rank tensors. In this paper, we show how tensor hypercontraction can be applied to both the electron repulsion integral tensor and the two-particle excitation amplitudes used in the parametric 2-electron reduced density matrix (p2RDM) algorithm. Because only O(r) auxiliary functions are needed in both of these approximations, our overall algorithm can be shown to scale as O(r4), where r is the number of single-particle basis functions. We apply our algorithm to several small molecules, hydrogen chains, and alkanes to demonstrate its low formal scaling and practical utility. Provided we use enough auxiliary functions, we obtain accuracy similar to that of the standard p2RDM algorithm, somewhere between that of CCSD and CCSD(T).
Shenvi, Neil; van Aggelen, Helen; Yang, Yang; Yang, Weitao; Schwerdtfeger, Christine; Mazziotti, David
2013-08-01
Tensor hypercontraction is a method that allows the representation of a high-rank tensor as a product of lower-rank tensors. In this paper, we show how tensor hypercontraction can be applied to both the electron repulsion integral tensor and the two-particle excitation amplitudes used in the parametric 2-electron reduced density matrix (p2RDM) algorithm. Because only O(r) auxiliary functions are needed in both of these approximations, our overall algorithm can be shown to scale as O(r(4)), where r is the number of single-particle basis functions. We apply our algorithm to several small molecules, hydrogen chains, and alkanes to demonstrate its low formal scaling and practical utility. Provided we use enough auxiliary functions, we obtain accuracy similar to that of the standard p2RDM algorithm, somewhere between that of CCSD and CCSD(T). PMID:23927246
Segmentation of Spin-Echo MRI brain images: a comparison study of Crisp and Fuzzy algorithms
Chung, Maranatha
1993-01-01
This thesis presents a scheme for segmenting Spin-Echo MRI brain images based on Fuzzy C-Mean (FCM) clustering techniques. This scheme consists of feature extraction, feature conditioning or evaluation, and thresholded FCM clustering. Feature...
Cazade, Pierre-André; Berezovska, Ganna; Meuwly, Markus; Zheng, Wenwei; Clementi, Cecilia; Prada-Gracia, Diego; Rao, Francesco
2015-01-14
The ligand migration network for O{sub 2}–diffusion in truncated Hemoglobin N is analyzed based on three different clustering schemes. For coordinate-based clustering, the conventional k–means and the kinetics-based Markov Clustering (MCL) methods are employed, whereas the locally scaled diffusion map (LSDMap) method is a collective-variable-based approach. It is found that all three methods agree well in their geometrical definition of the most important docking site, and all experimentally known docking sites are recovered by all three methods. Also, for most of the states, their population coincides quite favourably, whereas the kinetics of and between the states differs. One of the major differences between k–means and MCL clustering on the one hand and LSDMap on the other is that the latter finds one large primary cluster containing the Xe1a, IS1, and ENT states. This is related to the fact that the motion within the state occurs on similar time scales, whereas structurally the state is found to be quite diverse. In agreement with previous explicit atomistic simulations, the Xe3 pocket is found to be a highly dynamical site which points to its potential role as a hub in the network. This is also highlighted in the fact that LSDMap cannot identify this state. First passage time distributions from MCL clusterings using a one- (ligand-position) and two-dimensional (ligand-position and protein-structure) descriptor suggest that ligand- and protein-motions are coupled. The benefits and drawbacks of the three methods are discussed in a comparative fashion and highlight that depending on the questions at hand the best-performing method for a particular data set may differ.
Holt, Richard C.
support for large software projects. 1 Introduction A well documented architecture can improve the quality and maintainability of a software system. However, many existing systems often do not have their architecture docu in this task [13, 24]. Software clustering refers to the decomposition of a soft- ware system into meaningful
Krejci, Adam; Hupp, Ted R.; Lexa, Matej; Vojtesek, Borivoj; Muller, Petr
2016-01-01
Motivation: Proteins often recognize their interaction partners on the basis of short linear motifs located in disordered regions on proteins’ surface. Experimental techniques that study such motifs use short peptides to mimic the structural properties of interacting proteins. Continued development of these methods allows for large-scale screening, resulting in vast amounts of peptide sequences, potentially containing information on multiple protein-protein interactions. Processing of such datasets is a complex but essential task for large-scale studies investigating protein-protein interactions. Results: The software tool presented in this article is able to rapidly identify multiple clusters of sequences carrying shared specificity motifs in massive datasets from various sources and generate multiple sequence alignments of identified clusters. The method was applied on a previously published smaller dataset containing distinct classes of ligands for SH3 domains, as well as on a new, an order of magnitude larger dataset containing epitopes for several monoclonal antibodies. The software successfully identified clusters of sequences mimicking epitopes of antibody targets, as well as secondary clusters revealing that the antibodies accept some deviations from original epitope sequences. Another test indicates that processing of even much larger datasets is computationally feasible. Availability and implementation: Hammock is published under GNU GPL v. 3 license and is freely available as a standalone program (from http://www.recamo.cz/en/software/hammock-cluster-peptides/) or as a tool for the Galaxy toolbox (from https://toolshed.g2.bx.psu.edu/view/hammock/hammock). The source code can be downloaded from https://github.com/hammock-dev/hammock/releases. Contact: muller@mou.cz Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26342231
NASA Astrophysics Data System (ADS)
Huang, Xiaoming; Sai, Linwei; Jiang, Xue; Zhao, Jijun
2013-02-01
Employing genetic algorithm incorporated with density functional theory calculations we determined the lowest-energy structures of cationic Na n + clusters ( n = 9, 15, 21, 26, 31, 36, 41, 50 and 59). We revealed a transition of growth pattern from "polyicosahedral" sequence to the Mackay icosahedral motif at around n = 40. Based on the ground-state structures the size dependent electronic properties of Na n + clusters including the binding energies, HOMO-LUMO gaps, electron density of states and photoabsorption spectra were discussed. As cluster size increases, the HOMO-LUMO gap of Na n + cluster gradually reduces and converges to metallic behavior of bulk crystal rapidly. The photoabsorption spectra of Na n + clusters from our calculations agree with experimental data rather well, confirming the reliability of our theoretical approaches.
Delineation of river bed-surface patches by clustering high-resolution spatial grain size data
NASA Astrophysics Data System (ADS)
Nelson, Peter A.; Bellugi, Dino; Dietrich, William E.
2014-01-01
The beds of gravel-bed rivers commonly display distinct sorting patterns, which at length scales of ~ 0.1 - 1 channel widths appear to form an organization of patches or facies. This paper explores alternatives to traditional visual facies mapping by investigating methods of patch delineation in which clustering analysis is applied to a high-resolution grid of spatial grain-size distributions (GSDs) collected during a flume experiment. Specifically, we examine four clustering techniques: 1) partitional clustering of grain-size distributions with the k-means algorithm (assigning each GSD to a type of patch based solely on its distribution characteristics), 2) spatially-constrained agglomerative clustering ("growing" patches by merging adjacent GSDs, thus generating a hierarchical structure of patchiness), 3) spectral clustering using Normalized Cuts (using the spatial distance between GSDs and the distribution characteristics to generate a matrix describing the similarity between all GSDs, and using the eigenvalues of this matrix to divide the bed into patches), and 4) fuzzy clustering with the fuzzy c-means algorithm (assigning each GSD a membership probability to every patch type). For each clustering method, we calculate metrics describing how well-separated cluster-average GSDs are and how patches are arranged in space. We use these metrics to compute optimal clustering parameters, to compare the clustering methods against each other, and to compare clustering results with patches mapped visually during the flume experiment.All clustering methods produced better-separated patch GSDs than the visually-delineated patches. Although they do not produce crisp cluster assignment, fuzzy algorithms provide useful information that can characterize the uncertainty of a location on the bed belonging to any particular type of patch, and they can be used to characterize zones of transition from one patch to another. The extent to which spatial information influences clustering leads to a trade-off between the quality of GSD separation between patch types and the spatial coherence of patches. Methods incorporating spatial information during the clustering process tended to produce a finite number of types of patches. As methods improve for collecting high-resolution grain size data, the approaches described here can be scaled up to field studies to better characterize the grain size heterogeneity of river beds.
Vilalta, Ricardo
THEMATIC MAPS OF MARTIAN TOPOGRAPHY GENERATED BY A CLUSTERING ALGORITHM. R. Vilalta, Dept digital topography. These maps show spatial distribution of topo- graphical features and are generated for the observable topography. Traditionally, the descriptive method, applied to imagery data, has been used to study
NASA Astrophysics Data System (ADS)
Turan, Muhammed K.; Sehirli, Eftal; Elen, Abdullah; Karas, Ismail R.
2015-07-01
Gel electrophoresis (GE) is one of the most used method to separate DNA, RNA, protein molecules according to size, weight and quantity parameters in many areas such as genetics, molecular biology, biochemistry, microbiology. The main way to separate each molecule is to find borders of each molecule fragment. This paper presents a software application that show columns edges of DNA fragments in 3 steps. In the first step the application obtains lane histograms of agarose gel electrophoresis images by doing projection based on x-axis. In the second step, it utilizes k-means clustering algorithm to classify point values of lane histogram such as left side values, right side values and undesired values. In the third step, column edges of DNA fragments is shown by using mean algorithm and mathematical processes to separate DNA fragments from the background in a fully automated way. In addition to this, the application presents locations of DNA fragments and how many DNA fragments exist on images captured by a scientific camera.
Distance Measures Hierarchical Clustering
Ullman, Jeffrey D.
1 Clustering Distance Measures Hierarchical Clustering k -Means Algorithms #12;2 The Problem Measures Each clustering problem is based on some kind of "distance" between points. Two major classes of distance measure: 1. Euclidean 2. Non-Euclidean #12;12 Euclidean Vs. Non-Euclidean A Euclidean space has
Adaptive Clustering of Hypermedia Documents.
ERIC Educational Resources Information Center
Johnson, Andrew; Fotouhi, Farshad
1996-01-01
Discussion of hypermedia systems focuses on a comparison of two types of adaptive algorithm (genetic algorithm and neural network) in clustering hypermedia documents. These clusters allow the user to index into the nodes to find needed information more quickly, since clustering is "personalized" based on the user's paths rather than representing…
Bicego, Manuele
Watershed-Based Unsupervised Clustering Manuele Bicego, Marco Cristani, Andrea Fusiello purpose clustering algorithm is presented, based on the watershed algorithm. The proposed approach defines. The clustering is then performed using the well-known watershed algorithm, paying particular attention
NASA Technical Reports Server (NTRS)
Hall, Lawrence O.; Bensaid, Amine M.; Clarke, Laurence P.; Velthuizen, Robert P.; Silbiger, Martin S.; Bezdek, James C.
1992-01-01
Magnetic resonance (MR) brain section images are segmented and then synthetically colored to give visual representations of the original data with three approaches: the literal and approximate fuzzy c-means unsupervised clustering algorithms and a supervised computational neural network, a dynamic multilayered perception trained with the cascade correlation learning algorithm. Initial clinical results are presented on both normal volunteers and selected patients with brain tumors surrounded by edema. Supervised and unsupervised segmentation techniques provide broadly similar results. Unsupervised fuzzy algorithms were visually observed to show better segmentation when compared with raw image data for volunteer studies. However, for a more complex segmentation problem with tumor/edema or cerebrospinal fluid boundary, where the tissues have similar MR relaxation behavior, inconsistency in rating among experts was observed.
NASA Astrophysics Data System (ADS)
Douglass, Michael; Bezak, Eva; Penfold, Scott
2015-04-01
The preliminary framework of a combined radiobiological model is developed and calibrated in the current work. The model simulates the production of individual cells forming a tumour, the spatial distribution of individual ionization events (using Geant4-DNA) and the stochastic biochemical repair of DNA double strand breaks (DSBs) leading to the prediction of survival or death of individual cells. In the current work, we expand upon a previously developed tumour generation and irradiation model to include a stochastic ionization damage clustering and DNA lesion repair model. The Geant4 code enabled the positions of each ionization event in the cells to be simulated and recorded for analysis. An algorithm was developed to cluster the ionization events in each cell into simple and complex double strand breaks. The two lesion kinetic (TLK) model was then adapted to predict DSB repair kinetics and the resultant cell survival curve. The parameters in the cell survival model were then calibrated using experimental cell survival data of V79 cells after low energy proton irradiation. A monolayer of V79 cells was simulated using the tumour generation code developed previously. The cells were then irradiated by protons with mean energies of 0.76 MeV and 1.9 MeV using a customized version of Geant4. By replicating the experimental parameters of a low energy proton irradiation experiment and calibrating the model with two sets of data, the model is now capable of predicting V79 cell survival after low energy (<2 MeV) proton irradiation for a custom set of input parameters. The novelty of this model is the realistic cellular geometry which can be irradiated using Geant4-DNA and the method in which the double strand breaks are predicted from clustering the spatial distribution of ionisation events. Unlike the original TLK model which calculates a tumour average cell survival probability, the cell survival probability is calculated for each cell in the geometric tumour model developed in the current work. This model uses fundamental measurable microscopic quantities such as genome length rather than macroscopic radiobiological quantities such as alpha/beta ratios. This means that the model can be theoretically used under a wide range of conditions with a single set of input parameters once calibrated for a given cell line.
NASA Astrophysics Data System (ADS)
Komura, Yukihiro; Okabe, Yutaka
2014-03-01
We present sample CUDA programs for the GPU computing of the Swendsen-Wang multi-cluster spin flip algorithm. We deal with the classical spin models; the Ising model, the q-state Potts model, and the classical XY model. As for the lattice, both the 2D (square) lattice and the 3D (simple cubic) lattice are treated. We already reported the idea of the GPU implementation for 2D models (Komura and Okabe, 2012). We here explain the details of sample programs, and discuss the performance of the present GPU implementation for the 3D Ising and XY models. We also show the calculated results of the moment ratio for these models, and discuss phase transitions. Catalogue identifier: AERM_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AERM_v1_0.html Program obtainable from: CPC Program Library, Queen’s University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 5632 No. of bytes in distributed program, including test data, etc.: 14688 Distribution format: tar.gz Programming language: C, CUDA. Computer: System with an NVIDIA CUDA enabled GPU. Operating system: System with an NVIDIA CUDA enabled GPU. Classification: 23. External routines: NVIDIA CUDA Toolkit 3.0 or newer Nature of problem: Monte Carlo simulation of classical spin systems. Ising, q-state Potts model, and the classical XY model are treated for both two-dimensional and three-dimensional lattices. Solution method: GPU-based Swendsen-Wang multi-cluster spin flip Monte Carlo method. The CUDA implementation for the cluster-labeling is based on the work by Hawick et al. [1] and that by Kalentev et al. [2]. Restrictions: The system size is limited depending on the memory of a GPU. Running time: For the parameters used in the sample programs, it takes about a minute for each program. Of course, it depends on the system size, the number of Monte Carlo steps, etc. References: [1] K.A. Hawick, A. Leist, and D. P. Playne, Parallel Computing 36 (2010) 655-678 [2] O. Kalentev, A. Rai, S. Kemnitzb, and R. Schneider, J. Parallel Distrib. Comput. 71 (2011) 615-620
A robust elicitation algorithm for discovering DNA motifs using fuzzy self-organizing maps.
Wang, Dianhui; Tapan, Sarwar
2013-10-01
It is important to identify DNA motifs in promoter regions to understand the mechanism of gene regulation. Computational approaches for finding DNA motifs are well recognized as useful tools to biologists, which greatly help in saving experimental time and cost in wet laboratories. Self-organizing maps (SOMs), as a powerful clustering tool, have demonstrated good potential for problem solving. However, the current SOM-based motif discovery algorithms unfairly treat data samples lying around the cluster boundaries by assigning them to one of the nodes, which may result in unreliable system performance. This paper aims to develop a robust framework for discovering DNA motifs, where fuzzy SOMs, with an integration of fuzzy c-means membership functions and a standard batch-learning scheme, are employed to extract putative motifs with varying length in a recursive manner. Experimental results on eight real datasets show that our proposed algorithm outperforms the other searching tools such as SOMBRERO, SOMEA, MEME, AlignACE, and WEEDER in terms of the F-measure and algorithm reliability. It is observed that a remarkable 24.6% improvement can be achieved compared to the state-of-the-art SOMBRERO. Furthermore, our algorithm can produce a 20% and 6.6% improvement over SOMBRERO and SOMEA, respectively, in finding multiple motifs on five artificial datasets. PMID:24808603
Haplotyping Problem, A Clustering Approach
Eslahchi, Changiz; Sadeghi, Mehdi; Pezeshk, Hamid; Kargar, Mehdi; Poormohammadi, Hadi
2007-09-06
Construction of two haplotypes from a set of Single Nucleotide Polymorphism (SNP) fragments is called haplotype reconstruction problem. One of the most popular computational model for this problem is Minimum Error Correction (MEC). Since MEC is an NP-hard problem, here we propose a novel heuristic algorithm based on clustering analysis in data mining for haplotype reconstruction problem. Based on hamming distance and similarity between two fragments, our iterative algorithm produces two clusters of fragments; then, in each iteration, the algorithm assigns a fragment to one of the clusters. Our results suggest that the algorithm has less reconstruction error rate in comparison with other algorithms.
Xu, Xin; Huang, Zhenhua; Graves, Daniel; Pedrycz, Witold
2014-12-01
In order to deal with the sequential decision problems with large or continuous state spaces, feature representation and function approximation have been a major research topic in reinforcement learning (RL). In this paper, a clustering-based graph Laplacian framework is presented for feature representation and value function approximation (VFA) in RL. By making use of clustering-based techniques, that is, K-means clustering or fuzzy C-means clustering, a graph Laplacian is constructed by subsampling in Markov decision processes (MDPs) with continuous state spaces. The basis functions for VFA can be automatically generated from spectral analysis of the graph Laplacian. The clustering-based graph Laplacian is integrated with a class of approximation policy iteration algorithms called representation policy iteration (RPI) for RL in MDPs with continuous state spaces. Simulation and experimental results show that, compared with previous RPI methods, the proposed approach needs fewer sample points to compute an efficient set of basis functions and the learning control performance can be improved for a variety of parameter settings. PMID:24802018
NASA Astrophysics Data System (ADS)
Hsu, Kuo-Hsien
2012-11-01
Formosat-2 image is a kind of high-spatial-resolution (2 meters GSD) remote sensing satellite data, which includes one panchromatic band and four multispectral bands (Blue, Green, Red, near-infrared). An essential sector in the daily processing of received Formosat-2 image is to estimate the cloud statistic of image using Automatic Cloud Coverage Assessment (ACCA) algorithm. The information of cloud statistic of image is subsequently recorded as an important metadata for image product catalog. In this paper, we propose an ACCA method with two consecutive stages: preprocessing and post-processing analysis. For pre-processing analysis, the un-supervised K-means classification, Sobel's method, thresholding method, non-cloudy pixels reexamination, and cross-band filter method are implemented in sequence for cloud statistic determination. For post-processing analysis, Box-Counting fractal method is implemented. In other words, the cloud statistic is firstly determined via pre-processing analysis, the correctness of cloud statistic of image of different spectral band is eventually cross-examined qualitatively and quantitatively via post-processing analysis. The selection of an appropriate thresholding method is very critical to the result of ACCA method. Therefore, in this work, We firstly conduct a series of experiments of the clustering-based and spatial thresholding methods that include Otsu's, Local Entropy(LE), Joint Entropy(JE), Global Entropy(GE), and Global Relative Entropy(GRE) method, for performance comparison. The result shows that Otsu's and GE methods both perform better than others for Formosat-2 image. Additionally, our proposed ACCA method by selecting Otsu's method as the threshoding method has successfully extracted the cloudy pixels of Formosat-2 image for accurate cloud statistic estimation.
Dynamic clustering via asymptotics of the dependent Dirichlet process mixture
Campbell, Trevor David
2013-01-01
This paper presents a novel algorithm, based upon the dependent Dirichlet process mixture model (DDPMM), for clustering batch-sequential data containing an unknown number of evolving clusters. The algorithm is derived via ...
Symmetry Based Automatic Evolution of Clusters: A New Approach to Data Clustering.
Vijendra, Singh; Laxman, Sahoo
2015-01-01
We present a multiobjective genetic clustering approach, in which data points are assigned to clusters based on new line symmetry distance. The proposed algorithm is called multiobjective line symmetry based genetic clustering (MOLGC). Two objective functions, first the Davies-Bouldin (DB) index and second the line symmetry distance based objective functions, are used. The proposed algorithm evolves near-optimal clustering solutions using multiple clustering criteria, without a priori knowledge of the actual number of clusters. The multiple randomized K dimensional (Kd) trees based nearest neighbor search is used to reduce the complexity of finding the closest symmetric points. Experimental results based on several artificial and real data sets show that proposed clustering algorithm can obtain optimal clustering solutions in terms of different cluster quality measures in comparison to existing SBKM and MOCK clustering algorithms. PMID:26339233
Symmetry Based Automatic Evolution of Clusters: A New Approach to Data Clustering
Vijendra, Singh; Laxman, Sahoo
2015-01-01
We present a multiobjective genetic clustering approach, in which data points are assigned to clusters based on new line symmetry distance. The proposed algorithm is called multiobjective line symmetry based genetic clustering (MOLGC). Two objective functions, first the Davies-Bouldin (DB) index and second the line symmetry distance based objective functions, are used. The proposed algorithm evolves near-optimal clustering solutions using multiple clustering criteria, without a priori knowledge of the actual number of clusters. The multiple randomized K dimensional (Kd) trees based nearest neighbor search is used to reduce the complexity of finding the closest symmetric points. Experimental results based on several artificial and real data sets show that proposed clustering algorithm can obtain optimal clustering solutions in terms of different cluster quality measures in comparison to existing SBKM and MOCK clustering algorithms. PMID:26339233
NASA Astrophysics Data System (ADS)
Liu, Xiaoming; Mei, Ming; Liu, Jun; Hu, Wei
2015-12-01
Clustered microcalcifications (MCs) in mammograms are an important early sign of breast cancer in women. Their accurate detection is important in computer-aided detection (CADe). In this paper, we integrated the possibilistic fuzzy c-means (PFCM) clustering algorithm and weighted support vector machine (WSVM) for the detection of MC clusters in full-field digital mammograms (FFDM). For each image, suspicious MC regions are extracted with region growing and active contour segmentation. Then geometry and texture features are extracted for each suspicious MC, a mutual information-based supervised criterion is used to select important features, and PFCM is applied to cluster the samples into two clusters. Weights of the samples are calculated based on possibilities and typicality values from the PFCM, and the ground truth labels. A weighted nonlinear SVM is trained. During the test process, when an unknown image is presented, suspicious regions are located with the segmentation step, selected features are extracted, and the suspicious MC regions are classified as containing MC or not by the trained weighted nonlinear SVM. Finally, the MC regions are analyzed with spatial information to locate MC clusters. The proposed method is evaluated using a database of 410 clinical mammograms and compared with a standard unweighted support vector machine (SVM) classifier. The detection performance is evaluated using response receiver operating (ROC) curves and free-response receiver operating characteristic (FROC) curves. The proposed method obtained an area under the ROC curve of 0.8676, while the standard SVM obtained an area of 0.8268 for MC detection. For MC cluster detection, the proposed method obtained a high sensitivity of 92 % with a false-positive rate of 2.3 clusters/image, and it is also better than standard SVM with 4.7 false-positive clusters/image at the same sensitivity.
A hybrid skull-stripping algorithm based on adaptive balloon snake models
NASA Astrophysics Data System (ADS)
Liu, Hung-Ting; Sheu, Tony W. H.; Chang, Herng-Hua
2013-02-01
Skull-stripping is one of the most important preprocessing steps in neuroimage analysis. We proposed a hybrid algorithm based on an adaptive balloon snake model to handle this challenging task. The proposed framework consists of two stages: first, the fuzzy possibilistic c-means (FPCM) is used for voxel clustering, which provides a labeled image for the snake contour initialization. In the second stage, the contour is initialized outside the brain surface based on the FPCM result and evolves under the guidance of the balloon snake model, which drives the contour with an adaptive inward normal force to capture the boundary of the brain. The similarity indices indicate that our method outperformed the BSE and BET methods in skull-stripping the MR image volumes in the IBSR data set. Experimental results show the effectiveness of this new scheme and potential applications in a wide variety of skull-stripping applications.
Static and Dynamic Information Organization with Star Clusters
Aslam, Javed
Static and Dynamic Information Organization with Star Clusters Javed Aslam Katya Pelekhov Daniela on TREC data. We introduce the o#Âline and onÂline star clustering algorithms for information or and average link clustering algorithms. Since the star algorithm is also highly e#cient and simple
Garibaldi, Jon
is essential in reducing mortality rate. Fourier-transform infrared spectroscopy (FTIR) technology has been and other diseases [1-5]. This technology is based on Fourier-transform infrared spectroscopy. Different and Information Technology The University of Nottingham, United Kingdom {xyw,jmg,txo}@cs.nott.ac.uk Abstract
Li, Ke; Liu, Yi; Wang, Quanxin; Wu, Yalei; Song, Shimin; Sun, Yi; Liu, Tengchong; Wang, Jun; Li, Yang; Du, Shaoyi
2015-01-01
This paper proposes a novel multi-label classification method for resolving the spacecraft electrical characteristics problems which involve many unlabeled test data processing, high-dimensional features, long computing time and identification of slow rate. Firstly, both the fuzzy c-means (FCM) offline clustering and the principal component feature extraction algorithms are applied for the feature selection process. Secondly, the approximate weighted proximal support vector machine (WPSVM) online classification algorithms is used to reduce the feature dimension and further improve the rate of recognition for electrical characteristics spacecraft. Finally, the data capture contribution method by using thresholds is proposed to guarantee the validity and consistency of the data selection. The experimental results indicate that the method proposed can obtain better data features of the spacecraft electrical characteristics, improve the accuracy of identification and shorten the computing time effectively. PMID:26544549
Fully Automated Complementary DNA Microarray Segmentation using a Novel Fuzzy-based Algorithm
Saberkari, Hamidreza; Bahrami, Sheyda; Shamsi, Mousa; Amoshahy, Mohammad Javad; Ghavifekr, Habib Badri; Sedaaghi, Mohammad Hossein
2015-01-01
DNA microarray is a powerful approach to study simultaneously, the expression of 1000 of genes in a single experiment. The average value of the fluorescent intensity could be calculated in a microarray experiment. The calculated intensity values are very close in amount to the levels of expression of a particular gene. However, determining the appropriate position of every spot in microarray images is a main challenge, which leads to the accurate classification of normal and abnormal (cancer) cells. In this paper, first a preprocessing approach is performed to eliminate the noise and artifacts available in microarray cells using the nonlinear anisotropic diffusion filtering method. Then, the coordinate center of each spot is positioned utilizing the mathematical morphology operations. Finally, the position of each spot is exactly determined through applying a novel hybrid model based on the principle component analysis and the spatial fuzzy c-means clustering (SFCM) algorithm. Using a Gaussian kernel in SFCM algorithm will lead to improving the quality in complementary DNA microarray segmentation. The performance of the proposed algorithm has been evaluated on the real microarray images, which is available in Stanford Microarray Databases. Results illustrate that the accuracy of microarray cells segmentation in the proposed algorithm reaches to 100% and 98% for noiseless/noisy cells, respectively. PMID:26284175
Time series clustering analysis of health-promoting behavior
NASA Astrophysics Data System (ADS)
Yang, Chi-Ta; Hung, Yu-Shiang; Deng, Guang-Feng
2013-10-01
Health promotion must be emphasized to achieve the World Health Organization goal of health for all. Since the global population is aging rapidly, ComCare elder health-promoting service was developed by the Taiwan Institute for Information Industry in 2011. Based on the Pender health promotion model, ComCare service offers five categories of health-promoting functions to address the everyday needs of seniors: nutrition management, social support, exercise management, health responsibility, stress management. To assess the overall ComCare service and to improve understanding of the health-promoting behavior of elders, this study analyzed health-promoting behavioral data automatically collected by the ComCare monitoring system. In the 30638 session records collected for 249 elders from January, 2012 to March, 2013, behavior patterns were identified by fuzzy c-mean time series clustering algorithm combined with autocorrelation-based representation schemes. The analysis showed that time series data for elder health-promoting behavior can be classified into four different clusters. Each type reveals different health-promoting needs, frequencies, function numbers and behaviors. The data analysis result can assist policymakers, health-care providers, and experts in medicine, public health, nursing and psychology and has been provided to Taiwan National Health Insurance Administration to assess the elder health-promoting behavior.
Orbit Clustering Based on Transfer Cost
NASA Technical Reports Server (NTRS)
Gustafson, Eric D.; Arrieta-Camacho, Juan J.; Petropoulos, Anastassios E.
2013-01-01
We propose using cluster analysis to perform quick screening for combinatorial global optimization problems. The key missing component currently preventing cluster analysis from use in this context is the lack of a useable metric function that defines the cost to transfer between two orbits. We study several proposed metrics and clustering algorithms, including k-means and the expectation maximization algorithm. We also show that proven heuristic methods such as the Q-law can be modified to work with cluster analysis.
Color segmentation using MDL clustering
NASA Astrophysics Data System (ADS)
Wallace, Richard S.; Suenaga, Yasuhito
1991-02-01
This paper describes a procedure for segmentation of color face images. A cluster analysis algorithm uses a subsample of the input image color pixels to detect clusters in color space. The clustering program consists of two parts. The first part searches for a hierarchical clustering using the NIHC algorithm. The second part searches the resultant cluster tree for a level clustering having minimum description length (MDL). One of the primary advantages of the MDL paradigm is that it enables writing robust vision algorithms that do not depend on user-specified threshold parameters or other " magic numbers. " This technical note describes an application of minimal length encoding in the analysis of digitized human face images at the NTT Human Interface Laboratories. We use MDL clustering to segment color images of human faces. For color segmentation we search for clusters in color space. Using only a subsample of points from the original face image our clustering program detects color clusters corresponding to the hair skin and background regions in the image. Then a maximum likelyhood classifier assigns the remaining pixels to each class. The clustering program tends to group small facial features such as the nostrils mouth and eyes together but they can be separated from the larger classes through connected components analysis.
The ATLAS collaboration
2015-01-01
The properties of pixel clusters in dense environments are studied with $\\sqrt{s}$ = 13 TeV proton-proton collisions from the LHC, recorded by ATLAS from June to July 2015. A novel method to evaluate the performance of the artificial neural network used for identifying pixel clusters created by multiple particles is presented. Using this method, the results in data and Monte Carlo simulation are compared. The neural network, as part of the track reconstruction, shows the expected response when used on collimated tracks.
Comparing clustering and partitioning strategies
NASA Astrophysics Data System (ADS)
Afonso, Carlos; Ferreira, Fábio; Exposto, José; Pereira, Ana I.
2012-09-01
In this work we compare balance and edge-cut evaluation metrics to measure the performance of two wellknown graph data-grouping algorithms applied to four web and social network graphs. One of the algorithms employs a partitioning technique using Kmetis tool, and the other employs a clustering technique using Scluster tool. Because clustering algorithms use a similarity measure between each graph node and partitioning algorithms use a dissimilarity measure (weight), it was necessary to apply a normalized function to convert weighted graphs to similarity matrices.
Swarm Intelligence in Text Document Clustering
Cui, Xiaohui; Potok, Thomas E
2008-01-01
Social animals or insects in nature often exhibit a form of emergent collective behavior. The research field that attempts to design algorithms or distributed problem-solving devices inspired by the collective behavior of social insect colonies is called Swarm Intelligence. Compared to the traditional algorithms, the swarm algorithms are usually flexible, robust, decentralized and self-organized. These characters make the swarm algorithms suitable for solving complex problems, such as document collection clustering. The major challenge of today's information society is being overwhelmed with information on any topic they are searching for. Fast and high-quality document clustering algorithms play an important role in helping users to effectively navigate, summarize, and organize the overwhelmed information. In this chapter, we introduce three nature inspired swarm intelligence clustering approaches for document clustering analysis. These clustering algorithms use stochastic and heuristic principles discovered from observing bird flocks, fish schools and ant food forage.
Convex Clustering: An Attractive Alternative to Hierarchical Clustering
Chen, Gary K.; Chi, Eric C.; Ranola, John Michael O.; Lange, Kenneth
2015-01-01
The primary goal in cluster analysis is to discover natural groupings of objects. The field of cluster analysis is crowded with diverse methods that make special assumptions about data and address different scientific aims. Despite its shortcomings in accuracy, hierarchical clustering is the dominant clustering method in bioinformatics. Biologists find the trees constructed by hierarchical clustering visually appealing and in tune with their evolutionary perspective. Hierarchical clustering operates on multiple scales simultaneously. This is essential, for instance, in transcriptome data, where one may be interested in making qualitative inferences about how lower-order relationships like gene modules lead to higher-order relationships like pathways or biological processes. The recently developed method of convex clustering preserves the visual appeal of hierarchical clustering while ameliorating its propensity to make false inferences in the presence of outliers and noise. The solution paths generated by convex clustering reveal relationships between clusters that are hidden by static methods such as k-means clustering. The current paper derives and tests a novel proximal distance algorithm for minimizing the objective function of convex clustering. The algorithm separates parameters, accommodates missing data, and supports prior information on relationships. Our program CONVEXCLUSTER incorporating the algorithm is implemented on ATI and nVidia graphics processing units (GPUs) for maximal speed. Several biological examples illustrate the strengths of convex clustering and the ability of the proximal distance algorithm to handle high-dimensional problems. CONVEXCLUSTER can be freely downloaded from the UCLA Human Genetics web site at http://www.genetics.ucla.edu/software/ PMID:25965340
Hierarchical Dirichlet process model for gene expression clustering.
Wang, Liming; Wang, Xiaodong
2013-01-01
: Clustering is an important data processing tool for interpreting microarray data and genomic network inference. In this article, we propose a clustering algorithm based on the hierarchical Dirichlet processes (HDP). The HDP clustering introduces a hierarchical structure in the statistical model which captures the hierarchical features prevalent in biological data such as the gene express data. We develop a Gibbs sampling algorithm based on the Chinese restaurant metaphor for the HDP clustering. We apply the proposed HDP algorithm to both regulatory network segmentation and gene expression clustering. The HDP algorithm is shown to outperform several popular clustering algorithms by revealing the underlying hierarchical structure of the data. For the yeast cell cycle data, we compare the HDP result to the standard result and show that the HDP algorithm provides more information and reduces the unnecessary clustering fragments. PMID:23587447
A GMBCG Galaxy Cluster Catalog of 55,424 Rich Clusters from SDSS DR7
Hao, Jiangang; McKay, Timothy A.; Koester, Benjamin P.; Rykoff, Eli S.; Rozo, Eduardo; Annis, James; Wechsler, Risa H.; Evrard, August; Siegel, Seth R.; Becker, Matthew; Busha, Michael; Gerdes, David; Johnston, David E.; Sheldon, Erin; /Brookhaven
2011-08-22
We present a large catalog of optically selected galaxy clusters from the application of a new Gaussian Mixture Brightest Cluster Galaxy (GMBCG) algorithm to SDSS Data Release 7 data. The algorithm detects clusters by identifying the red sequence plus Brightest Cluster Galaxy (BCG) feature, which is unique for galaxy clusters and does not exist among field galaxies. Red sequence clustering in color space is detected using an Error Corrected Gaussian Mixture Model. We run GMBCG on 8240 square degrees of photometric data from SDSS DR7 to assemble the largest ever optical galaxy cluster catalog, consisting of over 55,000 rich clusters across the redshift range from 0.1 < z < 0.55. We present Monte Carlo tests of completeness and purity and perform cross-matching with X-ray clusters and with the maxBCG sample at low redshift. These tests indicate high completeness and purity across the full redshift range for clusters with 15 or more members.
A GMBCG galaxy cluster catalog of 55,880 rich clusters from SDSS DR7
Hao, Jiangang; McKay, Timothy A.; Koester, Benjamin P.; Rykoff, Eli S.; Rozo, Eduardo; Annis, James; Wechsler, Risa H.; Evrard, August; Siegel, Seth R.; Becker, Matthew; Busha, Michael; /Fermilab /Michigan U. /Chicago U., Astron. Astrophys. Ctr. /UC, Santa Barbara /KICP, Chicago /KIPAC, Menlo Park /SLAC /Caltech /Brookhaven
2010-08-01
We present a large catalog of optically selected galaxy clusters from the application of a new Gaussian Mixture Brightest Cluster Galaxy (GMBCG) algorithm to SDSS Data Release 7 data. The algorithm detects clusters by identifying the red sequence plus Brightest Cluster Galaxy (BCG) feature, which is unique for galaxy clusters and does not exist among field galaxies. Red sequence clustering in color space is detected using an Error Corrected Gaussian Mixture Model. We run GMBCG on 8240 square degrees of photometric data from SDSS DR7 to assemble the largest ever optical galaxy cluster catalog, consisting of over 55,000 rich clusters across the redshift range from 0.1 < z < 0.55. We present Monte Carlo tests of completeness and purity and perform cross-matching with X-ray clusters and with the maxBCG sample at low redshift. These tests indicate high completeness and purity across the full redshift range for clusters with 15 or more members.
Evolutionary Monte Carlo Methods for Clustering
Wong, Wing Hung
-means clustering, variable selection) or a density (e.g., posterior from a Dirichlet process mixture model prior, K-means clustering, and the MCLUST algorithm. Key Words: Dirichlet process mixture model; Gibbs possible clusters. In some cases, such as the Bayesian Dirichlet process mixture models, this density
1 Interval Set Clustering of Web Users using Modified Kohonen Self-Organizing Maps based attempts have adapted the K-means clustering algorithm as well as genetic algorithms based on rough sets to find interval sets of clusters. The genetic algorithms based clustering may not be able to handle large
Chalmers, Eric; Le, Jonathan; Sukhdeep, Dulai; Watt, Joe; Andersen, John; Lou, Edmond
2014-01-01
When children walk on their toes for no known reason, the condition is called Idiopathic Toe Walking (ITW). Assessing the true severity of ITW can be difficult because children can alter their gait while under observation in clinic. The ability to monitor the foot angle during daily life outside of clinic may improve the assessment of ITW. A foot-worn, battery-powered inertial sensing device has been designed to monitor patients' foot angle during daily activities. The monitor includes a 3-axis accelerometer, 2-axis gyroscope, and a low-power microcontroller. The device is necessarily small, with limited battery capacity and processing power. Therefore a high-accuracy but low-complexity inertial sensing algorithm is needed. This paper compares several low-complexity algorithms' aptitude for foot-angle measurement: accelerometer-only measurement, finite impulse response (FIR) and infinite impulse response (IIR) complementary filtering, and a new dynamic predict-correct style algorithm developed using fuzzy c-means clustering. A total of 11 subjects each walked 20 m with the inertial sensing device fixed to one foot; 10 m with normal gait and 10 m simulating toe walking. A cross-validation scheme was used to obtain a low-bias estimate of each algorithm's angle measurement accuracy. The new predict-correct algorithm achieved the lowest angle measurement error: <5° mean error during normal and toe walking. The IIR complementary filtering algorithm achieved almost-as good accuracy with less computational complexity. These two algorithms seem to have good aptitude for the foot-angle measurement problem, and would be good candidates for use in a long-term monitoring device for toe-walking assessment. PMID:24050952
Toward Parallel Document Clustering
Mogill, Jace A.; Haglin, David J.
2011-09-01
A key challenge to automated clustering of documents in large text corpora is the high cost of comparing documents in a multimillion dimensional document space. The Anchors Hierarchy is a fast data structure and algorithm for localizing data based on a triangle inequality obeying distance metric, the algorithm strives to minimize the number of distance calculations needed to cluster the documents into “anchors” around reference documents called “pivots”. We extend the original algorithm to increase the amount of available parallelism and consider two implementations: a complex data structure which affords efficient searching, and a simple data structure which requires repeated sorting. The sorting implementation is integrated with a text corpora “Bag of Words” program and initial performance results of end-to-end a document processing workflow are reported.
ECSAGO: Evolutionary Clustering with Self Adaptive Genetic Operators
ECSAGO: Evolutionary Clustering with Self Adaptive Genetic Operators Elizabeth Leon, Olfa Nasraoui, and Jonatan Gomez Abstract-- We present an algorithm for Evolutionary Clus- tering with Self Adaptive Genetic Evolutionary (HAEA) algorithms. The UNC is a genetic clustering algorithm that is robust to noise and is able
Grande, J A; Andújar, J M; Aroba, J; de la Torre, M L; Beltrán, R
2005-04-01
In the present work, Acid Mine Drainage (AMD) processes in the Chorrito Stream, which flows into the Cobica River (Iberian Pyrite Belt, Southwest Spain) are characterized by means of clustering techniques based on fuzzy logic. Also, pH behavior in contrast to precipitation is clearly explained, proving that the influence of rainfall inputs on the acidity and, as a result, on the metal load of a riverbed undergoing AMD processes highly depends on the moment when it occurs. In general, the riverbed dynamic behavior is the response to the sum of instant stimuli produced by isolated rainfall, the seasonal memory depending on the moment of the target hydrological year and, finally, the own inertia of the river basin, as a result of an accumulation process caused by age-long mining activity. PMID:15798799
Image segmentation using fuzzy LVQ clustering networks
NASA Technical Reports Server (NTRS)
Tsao, Eric Chen-Kuo; Bezdek, James C.; Pal, Nikhil R.
1992-01-01
In this note we formulate image segmentation as a clustering problem. Feature vectors extracted from a raw image are clustered into subregions, thereby segmenting the image. A fuzzy generalization of a Kohonen learning vector quantization (LVQ) which integrates the Fuzzy c-Means (FCM) model with the learning rate and updating strategies of the LVQ is used for this task. This network, which segments images in an unsupervised manner, is thus related to the FCM optimization problem. Numerical examples on photographic and magnetic resonance images are given to illustrate this approach to image segmentation.
Muetterties, Earl L.
1980-05-01
Metal cluster chemistry is one of the most rapidly developing areas of inorganic and organometallic chemistry. Prior to 1960 only a few metal clusters were well characterized. However, shortly after the early development of boron cluster chemistry, the field of metal cluster chemistry began to grow at a very rapid rate and a structural and a qualitative theoretical understanding of clusters came quickly. Analyzed here is the chemistry and the general significance of clusters with particular emphasis on the cluster research within my group. The importance of coordinately unsaturated, very reactive metal clusters is the major subject of discussion.
Using Star Clusters for Filtering Javed Aslam Katya Pelekhov Daniela Rus
Aslam, Javed
Using Star Clusters for Filtering Javed Aslam Katya Pelekhov Daniela Rus Department of Computer to the filtering task. We use the on-line version of the star algorithm [JPR98, JPR99] as the clustering tool algorithm for organizing static and dynamic information by topic using the star cluster algorithm. We do
Using Star Clusters for Filtering Javed Aslam Katya Pelekhov Daniela Rus
Aslam, Javed
Using Star Clusters for Filtering Javed Aslam Katya Pelekhov Daniela Rus Department of Computer to the filtering task. We use the onÂline version of the star algorithm [JPR98, JPR99] as the clustering tool#cient algorithm for organizing static and dynamic information by topic using the star cluster algorithm. We do
SIMPLE : A Methodology for Programming High Performance Algorithms on
Bader, David A.
SIMPLE : A Methodology for Programming High Performance Algorithms on Clusters of Symmetric describe a methodology for developing high performance programs running on clusters of SMP nodes. Our the optimal methodology for these platforms. Programming Message Passing Simple Programming Shared Memory
Partially supervised speaker clustering.
Tang, Hao; Chu, Stephen Mingyu; Hasegawa-Johnson, Mark; Huang, Thomas S
2012-05-01
Content-based multimedia indexing, retrieval, and processing as well as multimedia databases demand the structuring of the media content (image, audio, video, text, etc.), one significant goal being to associate the identity of the content to the individual segments of the signals. In this paper, we specifically address the problem of speaker clustering, the task of assigning every speech utterance in an audio stream to its speaker. We offer a complete treatment to the idea of partially supervised speaker clustering, which refers to the use of our prior knowledge of speakers in general to assist the unsupervised speaker clustering process. By means of an independent training data set, we encode the prior knowledge at the various stages of the speaker clustering pipeline via 1) learning a speaker-discriminative acoustic feature transformation, 2) learning a universal speaker prior model, and 3) learning a discriminative speaker subspace, or equivalently, a speaker-discriminative distance metric. We study the directional scattering property of the Gaussian mixture model (GMM) mean supervector representation of utterances in the high-dimensional space, and advocate exploiting this property by using the cosine distance metric instead of the euclidean distance metric for speaker clustering in the GMM mean supervector space. We propose to perform discriminant analysis based on the cosine distance metric, which leads to a novel distance metric learning algorithm—linear spherical discriminant analysis (LSDA). We show that the proposed LSDA formulation can be systematically solved within the elegant graph embedding general dimensionality reduction framework. Our speaker clustering experiments on the GALE database clearly indicate that 1) our speaker clustering methods based on the GMM mean supervector representation and vector-based distance metrics outperform traditional speaker clustering methods based on the “bag of acoustic features” representation and statistical model-based distance metrics, 2) our advocated use of the cosine distance metric yields consistent increases in the speaker clustering performance as compared to the commonly used euclidean distance metric, 3) our partially supervised speaker clustering concept and strategies significantly improve the speaker clustering performance over the baselines, and 4) our proposed LSDA algorithm further leads to state-of-the-art speaker clustering performance. PMID:21844626
Construction of the Ensemble of Logical Models in Cluster Analysis
NASA Astrophysics Data System (ADS)
Berikov, Vladimir
In this paper, the algorithm of cluster analysis based on the ensemble of tree-like logical models (decision trees) is proposed. During the construction of the ensemble, the algorithm takes into account distances between logical statements describing clusters. Besides, we consider some properties of the Bayes model of classification. These properties are used at the motivation of information-probabilistic criterion of clustering quality. The results of experimental studies demonstrate the effectiveness of the suggested algorithm.
Clustering for unsupervised fault diagnosis in nuclear turbine shut-down transients
NASA Astrophysics Data System (ADS)
Baraldi, Piero; Di Maio, Francesco; Rigamonti, Marco; Zio, Enrico; Seraoui, Redouane
2015-06-01
Empirical methods for fault diagnosis usually entail a process of supervised training based on a set of examples of signal evolutions "labeled" with the corresponding, known classes of fault. However, in practice, the signals collected during plant operation may be, very often, "unlabeled", i.e., the information on the corresponding type of occurred fault is not available. To cope with this practical situation, in this paper we develop a methodology for the identification of transient signals showing similar characteristics, under the conjecture that operational/faulty transient conditions of the same type lead to similar behavior in the measured signals evolution. The methodology is founded on a feature extraction procedure, which feeds a spectral clustering technique, embedding the unsupervised fuzzy C-means (FCM) algorithm, which evaluates the functional similarity among the different operational/faulty transients. A procedure for validating the plausibility of the obtained clusters is also propounded based on physical considerations. The methodology is applied to a real industrial case, on the basis of 148 shut-down transients of a Nuclear Power Plant (NPP) steam turbine.
Processing of rock core microtomography images: Using seven different machine learning algorithms
NASA Astrophysics Data System (ADS)
Chauhan, Swarup; Rühaak, Wolfram; Khan, Faisal; Enzmann, Frieder; Mielke, Philipp; Kersten, Michael; Sass, Ingo
2016-01-01
The abilities of machine learning algorithms to process X-ray microtomographic rock images were determined. The study focused on the use of unsupervised, supervised, and ensemble clustering techniques, to segment X-ray computer microtomography rock images and to estimate the pore spaces and pore size diameters in the rocks. The unsupervised k-means technique gave the fastest processing time and the supervised least squares support vector machine technique gave the slowest processing time. Multiphase assemblages of solid phases (minerals and finely grained minerals) and the pore phase were found on visual inspection of the images. In general, the accuracy in terms of porosity values and pore size distribution was found to be strongly affected by the feature vectors selected. Relative porosity average value of 15.92±1.77% retrieved from all the seven machine learning algorithm is in very good agreement with the experimental results of 17±2%, obtained using gas pycnometer. Of the supervised techniques, the least square support vector machine technique is superior to feed forward artificial neural network because of its ability to identify a generalized pattern. In the ensemble classification techniques boosting technique converged faster compared to bragging technique. The k-means technique outperformed the fuzzy c-means and self-organized maps techniques in terms of accuracy and speed.
A HIERARCHICAL IMAGE SEGMENTATION ALGORITHM Wei Yu, Jason Fritts
Fritts, Jason
relationships, and an image segmentation algorithm that efficiently constructs this representation. The overall blobs. The segmentation algorithm builds this hierarchical representation using an efficient clustering effectively. 1. INTRODUCTION Object extraction via image segmentation plays a key role in many applications
Automatic Clustering Using Multi-objective Particle Swarm and Simulated Annealing
Abubaker, Ahmad; Baharum, Adam; Alrefaei, Mahmoud
2015-01-01
This paper puts forward a new automatic clustering algorithm based on Multi-Objective Particle Swarm Optimization and Simulated Annealing, “MOPSOSA”. The proposed algorithm is capable of automatic clustering which is appropriate for partitioning datasets to a suitable number of clusters. MOPSOSA combines the features of the multi-objective based particle swarm optimization (PSO) and the Multi-Objective Simulated Annealing (MOSA). Three cluster validity indices were optimized simultaneously to establish the suitable number of clusters and the appropriate clustering for a dataset. The first cluster validity index is centred on Euclidean distance, the second on the point symmetry distance, and the last cluster validity index is based on short distance. A number of algorithms have been compared with the MOPSOSA algorithm in resolving clustering problems by determining the actual number of clusters and optimal clustering. Computational experiments were carried out to study fourteen artificial and five real life datasets. PMID:26132309
NASA Technical Reports Server (NTRS)
Wang, Lui; Bayer, Steven E.
1991-01-01
Genetic algorithms are mathematical, highly parallel, adaptive search procedures (i.e., problem solving methods) based loosely on the processes of natural genetics and Darwinian survival of the fittest. Basic genetic algorithms concepts are introduced, genetic algorithm applications are introduced, and results are presented from a project to develop a software tool that will enable the widespread use of genetic algorithm technology.
DBRS: A Density-Based Spatial Clustering Method with Random Sampling
DBRS: A Density-Based Spatial Clustering Method with Random Sampling Xin Wang and Howard J}@cs.uregina.ca Abstract In this paper, we propose a novel density-based spatial clustering method called DBRS. The algorithm can identify clusters of widely varying shapes, clusters of varying densities, clusters which
Acceleration of the LBG algorithm
NASA Astrophysics Data System (ADS)
Wu, Xiaolin; Guan, Lian
1994-02-01
A concentric spherical search technique is proposed to speed up the clustering process in VQ design. A linear data structure is incorporated into the LBG algorithm to keep and update the information about the proximity among the codewords. This proximity information can significantly reduce the number of candidate codewords to be the closest to a given training vector. An improved k-means type VQ design algorithm is proposed based on the new search technique and the supporting data structure.
Hierarchical clustering using mutual information
NASA Astrophysics Data System (ADS)
Kraskov, A.; Stögbauer, H.; Andrzejak, R. G.; Grassberger, P.
2005-04-01
We present a conceptually simple method for hierarchical clustering of data called mutual information clustering (MIC) algorithm. It uses mutual information (MI) as a similarity measure and exploits its grouping property: The MI between three objects X, Y, and Z is equal to the sum of the MI between X and Y, plus the MI between Z and the combined object (XY). We use this both in the Shannon (probabilistic) version of information theory and in the Kolmogorov (algorithmic) version. We apply our method to the construction of phylogenetic trees from mitochondrial DNA sequences and to the output of independent components analysis (ICA) as illustrated with the ECG of a pregnant woman.
Wu, Jianfa; Peng, Dahao; Li, Zhuping; Zhao, Li; Ling, Huanzhang
2015-01-01
To effectively and accurately detect and classify network intrusion data, this paper introduces a general regression neural network (GRNN) based on the artificial immune algorithm with elitist strategies (AIAE). The elitist archive and elitist crossover were combined with the artificial immune algorithm (AIA) to produce the AIAE-GRNN algorithm, with the aim of improving its adaptivity and accuracy. In this paper, the mean square errors (MSEs) were considered the affinity function. The AIAE was used to optimize the smooth factors of the GRNN; then, the optimal smooth factor was solved and substituted into the trained GRNN. Thus, the intrusive data were classified. The paper selected a GRNN that was separately optimized using a genetic algorithm (GA), particle swarm optimization (PSO), and fuzzy C-mean clustering (FCM) to enable a comparison of these approaches. As shown in the results, the AIAE-GRNN achieves a higher classification accuracy than PSO-GRNN, but the running time of AIAE-GRNN is long, which was proved first. FCM and GA-GRNN were eliminated because of their deficiencies in terms of accuracy and convergence. To improve the running speed, the paper adopted principal component analysis (PCA) to reduce the dimensions of the intrusive data. With the reduction in dimensionality, the PCA-AIAE-GRNN decreases in accuracy less and has better convergence than the PCA-PSO-GRNN, and the running speed of the PCA-AIAE-GRNN was relatively improved. The experimental results show that the AIAE-GRNN has a higher robustness and accuracy than the other algorithms considered and can thus be used to classify the intrusive data. PMID:25807466
Feature Clustering for Accelerating Parallel Coordinate Descent
Scherrer, Chad; Tewari, Ambuj; Halappanavar, Mahantesh; Haglin, David J.
2012-12-06
We demonstrate an approach for accelerating calculation of the regularization path for L1 sparse logistic regression problems. We show the benefit of feature clustering as a preconditioning step for parallel block-greedy coordinate descent algorithms.
Sanfilippo, Antonio P.; Calapristi, Augustin J.; Crow, Vernon L.; Hetzler, Elizabeth G.; Turner, Alan E.
2004-05-26
We present an approach to the disambiguation of cluster labels that capitalizes on the notion of semantic similarity to assign WordNet senses to cluster labels. The approach provides interesting insights on how document clustering can provide the basis for developing a novel approach to word sense disambiguation.
Clusters and Clusters of Clusters in Collisions
NASA Astrophysics Data System (ADS)
Manil, B.; Bernigaud, V.; Boduch, P.; Cassimi, A.; Kamalou, O.; Lenoir, J.; Maunoury, L.; Rangama, J.; Huber, B. A.; Jensen, J.; Schmidt, H. T.; Zettergren, H.; Cederquist, H.; Tomita, S.; Hvelplund, P.; Alvarado, F.; Bari, S.; Lecointre, A.; Schlathölter, T.
2006-11-01
Pure and mixed clusters of fullerenes (C60 and C70) as well as of nucleobases have been produced within a cluster aggregation source and have been multiply ionised in collisions with highly charged ions. Multiply charged clusters and the corresponding appearance sizes have been identified for charge states up to q=5. In the fullerene case, the dominant fragmentation channel leads to the emission of singly charged fullerenes. Furthermore, it is concluded that the fullerene clusters, which in their neutral state are insulators, bound only by weak van der Waals forces, become conducting as soon as being multiply charged. Thus, the excess charges turn out to be delocalised. This phenomenon is explained by a rapid charge transfer to neighboured fullerene molecules well in agreement with predictions of the conducting sphere model. In the case of biomolecular clusters, it is found that the aggregation probability varies strongly for different nucleobases. In some cases the cluster distributions show strong variations due to possible shell effects. In the case of mixed biomolecular clusters a strong enhancement is observed for those clusters containing a so called Watson-Crick pair, for example a dimer of thymine and adenine.
Analyzing geographic clustered response
Merrill, D.W.; Selvin, S.; Mohr, M.S.
1991-08-01
In the study of geographic disease clusters, an alternative to traditional methods based on rates is to analyze case locations on a transformed map in which population density is everywhere equal. Although the analyst's task is thereby simplified, the specification of the density equalizing map projection (DEMP) itself is not simple and continues to be the subject of considerable research. Here a new DEMP algorithm is described, which avoids some of the difficulties of earlier approaches. The new algorithm (a) avoids illegal overlapping of transformed polygons; (b) finds the unique solution that minimizes map distortion; (c) provides constant magnification over each map polygon; (d) defines a continuous transformation over the entire map domain; (e) defines an inverse transformation; (f) can accept optional constraints such as fixed boundaries; and (g) can use commercially supported minimization software. Work is continuing to improve computing efficiency and improve the algorithm. 21 refs., 15 figs., 2 tabs.
arXiv:1012.3697v3[cs.DS]16Oct2012 Analysis of Agglomerative Clustering
to this problem (for all values of k) is the agglomerative clustering algorithm with the complete linkage strategy theoretically. In this paper, we analyze the agglomerative complete linkage clustering algorithm. Assuming) as well. Keywords: agglomerative clustering, hierarchical clustering, complete linkage, approximation guar
arXiv:1012.3697v4[cs.DS]7Mar2014 Analysis of Agglomerative Clustering
to this problem (for all values of k) is the agglomerative clustering algorithm with the complete linkage strategy theoretically. In this paper, we analyze the agglomerative complete linkage clustering algorithm. Assuming) as well. Keywords: agglomerative clustering, hierarchical clustering, complete linkage, approximation guar
Convalescing Cluster Configuration Using a Superlative Framework
Sabitha, R.; Karthik, S.
2015-01-01
Competent data mining methods are vital to discover knowledge from databases which are built as a result of enormous growth of data. Various techniques of data mining are applied to obtain knowledge from these databases. Data clustering is one such descriptive data mining technique which guides in partitioning data objects into disjoint segments. K-means algorithm is a versatile algorithm among the various approaches used in data clustering. The algorithm and its diverse adaptation methods suffer certain problems in their performance. To overcome these issues a superlative algorithm has been proposed in this paper to perform data clustering. The specific feature of the proposed algorithm is discretizing the dataset, thereby improving the accuracy of clustering, and also adopting the binary search initialization method to generate cluster centroids. The generated centroids are fed as input to K-means approach which iteratively segments the data objects into respective clusters. The clustered results are measured for accuracy and validity. Experiments conducted by testing the approach on datasets from the UC Irvine Machine Learning Repository evidently show that the accuracy and validity measure is higher than the other two approaches, namely, simple K-means and Binary Search method. Thus, the proposed approach proves that discretization process will improve the efficacy of descriptive data mining tasks. PMID:26543895
Completely Connected Clustered Graphs Sabine Cornelsen1
Brandes, Ulrik
Completely Connected Clustered Graphs Sabine Cornelsen1 and Dorothea Wagner2 1 Universit`a dell this criterion to test in quadratic time, whether a connected clustered graph is c-planar. A linear time algorithm that solves this problem was given by Dahlhaus [4]. A first attempt for testing whether certain
White Matter Supervoxel Segmentation by Axial DP-means Clustering
Laidlaw, David
Diffusion MR imaging enables the quantitative measurement of water molecule diffusion, which exhibits. They also explored the asymptotic relationship between mixture models and hard clustering algorithms [2]. These hard clustering algorithms tend to be more efficient, scalable, and easy to implement, at the cost
Misty Mountain clustering: application to fast unsupervised flow cytometry gating
2010-01-01
Background There are many important clustering questions in computational biology for which no satisfactory method exists. Automated clustering algorithms, when applied to large, multidimensional datasets, such as flow cytometry data, prove unsatisfactory in terms of speed, problems with local minima or cluster shape bias. Model-based approaches are restricted by the assumptions of the fitting functions. Furthermore, model based clustering requires serial clustering for all cluster numbers within a user defined interval. The final cluster number is then selected by various criteria. These supervised serial clustering methods are time consuming and frequently different criteria result in different optimal cluster numbers. Various unsupervised heuristic approaches that have been developed such as affinity propagation are too expensive to be applied to datasets on the order of 106 points that are often generated by high throughput experiments. Results To circumvent these limitations, we developed a new, unsupervised density contour clustering algorithm, called Misty Mountain, that is based on percolation theory and that efficiently analyzes large data sets. The approach can be envisioned as a progressive top-down removal of clouds covering a data histogram relief map to identify clusters by the appearance of statistically distinct peaks and ridges. This is a parallel clustering method that finds every cluster after analyzing only once the cross sections of the histogram. The overall run time for the composite steps of the algorithm increases linearly by the number of data points. The clustering of 106 data points in 2D data space takes place within about 15 seconds on a standard laptop PC. Comparison of the performance of this algorithm with other state of the art automated flow cytometry gating methods indicate that Misty Mountain provides substantial improvements in both run time and in the accuracy of cluster assignment. Conclusions Misty Mountain is fast, unbiased for cluster shape, identifies stable clusters and is robust to noise. It provides a useful, general solution for multidimensional clustering problems. We demonstrate its suitability for automated gating of flow cytometry data. PMID:20932336
Algorithms for Gene Clustering Analysis on Genomes
Yi, Gang Man
2012-07-16
The increased availability of data in biological databases provides many opportunities for understanding biological processes through these data. As recent attention has shifted from sequence analysis to higher-level ...
APPROXIMATION ALGORITHMS FOR CLUSTERING A Dissertation
Swamy, Chaitanya
. These problems arise in a wide range of applications such as, plant or warehouse location problems, cache focus on a class of objective functions more commonly referred to as facility location problems of these problems, the uncapacitated facility location (UFL) prob- lem, we want to open facilities at some subset
APPROXIMATION ALGORITHMS FOR CLUSTERING A Dissertation
Swamy, Chaitanya
arise in a wide range of applications such as, plant or warehouse location problems, cache placement on a class of objective functions more commonly referred to as facility location problems. These problems of these problems, the uncapacitated facility location (UFL) prob lem, we want to open facilities at some subset
STRONG GRAVITATIONAL LENSING: BLUEPRINTS FOR GALAXY-CLUSTER CORE RECONSTRUCTION
Fournier, John J.F.
STRONG GRAVITATIONAL LENSING: BLUEPRINTS FOR GALAXY-CLUSTER CORE RECONSTRUCTION by PETER ROBERT on a well-studied collection of lensed objects in the galaxy-cluster MS 2137. Con dent in the algorithm, we apply it second time to predict the distribution of mass in the galaxy-cluster MS 1455 responsible
Detection and Analysis of Galaxy Clusters in SDSS Jordan Levy
Leka, K. D .
Detection and Analysis of Galaxy Clusters in SDSS Jordan Levy Dept. of Physics & Astronomy distribution of galaxies throughout the universe, showing that galaxies often form systems of clusters whose of galaxies, we detect galaxy clusters using a custom implementation of the HOP algorithm. Mass functions
Clustering Very Large Data Sets with Principal Direction Divisive Partitioning
Boley, Daniel
Clustering Very Large Data Sets with Principal Direction Divisive Partitioning David Littau1 of very large data sets. We define a very large data set as a data set which will not fit into memory at once. Many clustering algorithms require that the data set be scanned many times during the clustering
Analysis of Agglomerative Clustering Marcel R. Ackermann1
with the complete linkage strategy. For decades this algorithm has been widely used by practitioners. However, it is not well studied theoretically. In this paper we analyze the agglomerative complete linkage clustering agglomerative clustering, hierarchical clustering, complete linkage, ap- proximation guarantees 1 Introduction
Spatial cluster detection using dynamic programming
2012-01-01
Background The task of spatial cluster detection involves finding spatial regions where some property deviates from the norm or the expected value. In a probabilistic setting this task can be expressed as finding a region where some event is significantly more likely than usual. Spatial cluster detection is of interest in fields such as biosurveillance, mining of astronomical data, military surveillance, and analysis of fMRI images. In almost all such applications we are interested both in the question of whether a cluster exists in the data, and if it exists, we are interested in finding the most accurate characterization of the cluster. Methods We present a general dynamic programming algorithm for grid-based spatial cluster detection. The algorithm can be used for both Bayesian maximum a-posteriori (MAP) estimation of the most likely spatial distribution of clusters and Bayesian model averaging over a large space of spatial cluster distributions to compute the posterior probability of an unusual spatial clustering. The algorithm is explained and evaluated in the context of a biosurveillance application, specifically the detection and identification of Influenza outbreaks based on emergency department visits. A relatively simple underlying model is constructed for the purpose of evaluating the algorithm, and the algorithm is evaluated using the model and semi-synthetic test data. Results When compared to baseline methods, tests indicate that the new algorithm can improve MAP estimates under certain conditions: the greedy algorithm we compared our method to was found to be more sensitive to smaller outbreaks, while as the size of the outbreaks increases, in terms of area affected and proportion of individuals affected, our method overtakes the greedy algorithm in spatial precision and recall. The new algorithm performs on-par with baseline methods in the task of Bayesian model averaging. Conclusions We conclude that the dynamic programming algorithm performs on-par with other available methods for spatial cluster detection and point to its low computational cost and extendability as advantages in favor of further research and use of the algorithm. PMID:22443103
A Lightweight Classification Algorithm for Energy Conservation in Wireless Sensor Networks
Wang, Wenye
A Lightweight Classification Algorithm for Energy Conservation in Wireless Sensor Networks Nurcan a new algorithm of lightweight and dynamic classification. By this algorithm, energy consumption the communication inside and out- side of a cluster, and cluster heads also aggregate the received information
Clustering Acoustic Segments Using Multi-Stage Agglomerative Hierarchical Clustering.
Lerato, Lerato; Niesler, Thomas
2015-01-01
Agglomerative hierarchical clustering becomes infeasible when applied to large datasets due to its O(N2) storage requirements. We present a multi-stage agglomerative hierarchical clustering (MAHC) approach aimed at large datasets of speech segments. The algorithm is based on an iterative divide-and-conquer strategy. The data is first split into independent subsets, each of which is clustered separately. Thus reduces the storage required for sequential implementations, and allows concurrent computation on parallel computing hardware. The resultant clusters are merged and subsequently re-divided into subsets, which are passed to the following iteration. We show that MAHC can match and even surpass the performance of the exact implementation when applied to datasets of speech segments. PMID:26517376
Clustering Acoustic Segments Using Multi-Stage Agglomerative Hierarchical Clustering
2015-01-01
Agglomerative hierarchical clustering becomes infeasible when applied to large datasets due to its O(N2) storage requirements. We present a multi-stage agglomerative hierarchical clustering (MAHC) approach aimed at large datasets of speech segments. The algorithm is based on an iterative divide-and-conquer strategy. The data is first split into independent subsets, each of which is clustered separately. Thus reduces the storage required for sequential implementations, and allows concurrent computation on parallel computing hardware. The resultant clusters are merged and subsequently re-divided into subsets, which are passed to the following iteration. We show that MAHC can match and even surpass the performance of the exact implementation when applied to datasets of speech segments. PMID:26517376
Data comparison algorithms for arms control treaty verification
Bieber, A.M. Jr.
1993-08-01
Arms control treaty verification measures often require comparison of measurements made on treaty-limited items (TLIs) with nominal or representative values in order to verify the nature or identity of the (TLIs) in question. This paper discusses some algorithms for comparing measurements on TLIs, including algorithms based on least-squares fitting techniques and multivariate algorithms based on cluster analysis and Mahalanobis distances.
Bipartite graph partitioning and data clustering
Zha, Hongyuan; He, Xiaofeng; Ding, Chris; Gu, Ming; Simon, Horst D.
2001-05-07
Many data types arising from data mining applications can be modeled as bipartite graphs, examples include terms and documents in a text corpus, customers and purchasing items in market basket analysis and reviewers and movies in a movie recommender system. In this paper, the authors propose a new data clustering method based on partitioning the underlying biopartite graph. The partition is constructed by minimizing a normalized sum of edge weights between unmatched pairs of vertices of the bipartite graph. They show that an approximate solution to the minimization problem can be obtained by computing a partial singular value decomposition (SVD) of the associated edge weight matrix of the bipartite graph. They point out the connection of their clustering algorithm to correspondence analysis used in multivariate analysis. They also briefly discuss the issue of assigning data objects to multiple clusters. In the experimental results, they apply their clustering algorithm to the problem of document clustering to illustrate its effectiveness and efficiency.
Clustering of High Throughput Gene Expression Data
Pirim, Harun; Ek?io?lu, Burak; Perkins, Andy; Yüceer, Çetin
2012-01-01
High throughput biological data need to be processed, analyzed, and interpreted to address problems in life sciences. Bioinformatics, computational biology, and systems biology deal with biological problems using computational methods. Clustering is one of the methods used to gain insight into biological processes, particularly at the genomics level. Clearly, clustering can be used in many areas of biological data analysis. However, this paper presents a review of the current clustering algorithms designed especially for analyzing gene expression data. It is also intended to introduce one of the main problems in bioinformatics - clustering gene expression data - to the operations research community. PMID:23144527
NASA Technical Reports Server (NTRS)
1999-01-01
Penetrating 25,000 light-years of obscuring dust and myriad stars, NASA's Hubble Space Telescope has provided the clearest view yet of one of the largest young clusters of stars inside our Milky Way galaxy, located less than 100 light-years from the very center of the Galaxy. Having the equivalent mass greater than 10,000 stars like our sun, the monster cluster is ten times larger than typical young star clusters scattered throughout our Milky Way. It is destined to be ripped apart in just a few million years by gravitational tidal forces in the galaxy's core. But in its brief lifetime it shines more brightly than any other star cluster in the Galaxy. Quintuplet Cluster is 4 million years old. It has stars on the verge of blowing up as supernovae. It is the home of the brightest star seen in the galaxy, called the Pistol star. This image was taken in infrared light by Hubble's NICMOS camera in September 1997. The false colors correspond to infrared wavelengths. The galactic center stars are white, the red stars are enshrouded in dust or behind dust, and the blue stars are foreground stars between us and the Milky Way's center. The cluster is hidden from direct view behind black dust clouds in the constellation Sagittarius. If the cluster could be seen from earth it would appear to the naked eye as a 3rd magnitude star, 1/6th of a full moon's diameter apart.
Large-Scale Multi-Dimensional Document Clustering on GPU Clusters
Cui, Xiaohui; Mueller, Frank; Zhang, Yongpeng; Potok, Thomas E
2010-01-01
Document clustering plays an important role in data mining systems. Recently, a flocking-based document clustering algorithm has been proposed to solve the problem through simulation resembling the flocking behavior of birds in nature. This method is superior to other clustering algorithms, including k-means, in the sense that the outcome is not sensitive to the initial state. One limitation of this approach is that the algorithmic complexity is inherently quadratic in the number of documents. As a result, execution time becomes a bottleneck with large number of documents. In this paper, we assess the benefits of exploiting the computational power of Beowulf-like clusters equipped with contemporary Graphics Processing Units (GPUs) as a means to significantly reduce the runtime of flocking-based document clustering. Our framework scales up to over one million documents processed simultaneously in a sixteennode GPU cluster. Results are also compared to a four-node cluster with higher-end GPUs. On these clusters, we observe 30X-50X speedups, which demonstrates the potential of GPU clusters to efficiently solve massive data mining problems. Such speedups combined with the scalability potential and accelerator-based parallelization are unique in the domain of document-based data mining, to the best of our knowledge.
NASA Astrophysics Data System (ADS)
Kneib, Jean-Paul; Natarajan, Priyamvada
2011-11-01
Clusters of galaxies are the most recently assembled, massive, bound structures in the Universe. As predicted by General Relativity, given their masses, clusters strongly deform space-time in their vicinity. Clusters act as some of the most powerful gravitational lenses in the Universe. Light rays traversing through clusters from distant sources are hence deflected, and the resulting images of these distant objects therefore appear distorted and magnified. Lensing by clusters occurs in two regimes, each with unique observational signatures. The strong lensing regime is characterized by effects readily seen by eye, namely, the production of giant arcs, multiple images, and arclets. The weak lensing regime is characterized by small deformations in the shapes of background galaxies only detectable statistically. Cluster lenses have been exploited successfully to address several important current questions in cosmology: (i) the study of the lens(es)—understanding cluster mass distributions and issues pertaining to cluster formation and evolution, as well as constraining the nature of dark matter; (ii) the study of the lensed objects—probing the properties of the background lensed galaxy population—which is statistically at higher redshifts and of lower intrinsic luminosity thus enabling the probing of galaxy formation at the earliest times right up to the Dark Ages; and (iii) the study of the geometry of the Universe—as the strength of lensing depends on the ratios of angular diameter distances between the lens, source and observer, lens deflections are sensitive to the value of cosmological parameters and offer a powerful geometric tool to probe Dark Energy. In this review, we present the basics of cluster lensing and provide a current status report of the field.
Jean-Paul Kneib; Priyamvada Natarajan
2012-02-03
Clusters of galaxies are the most recently assembled, massive, bound structures in the Universe. As predicted by General Relativity, given their masses, clusters strongly deform space-time in their vicinity. Clusters act as some of the most powerful gravitational lenses in the Universe. Light rays traversing through clusters from distant sources are hence deflected, and the resulting images of these distant objects therefore appear distorted and magnified. Lensing by clusters occurs in two regimes, each with unique observational signatures. The strong lensing regime is characterized by effects readily seen by eye, namely, the production of giant arcs, multiple-images, and arclets. The weak lensing regime is characterized by small deformations in the shapes of background galaxies only detectable statistically. Cluster lenses have been exploited successfully to address several important current questions in cosmology: (i) the study of the lens(es) - understanding cluster mass distributions and issues pertaining to cluster formation and evolution, as well as constraining the nature of dark matter; (ii) the study of the lensed objects - probing the properties of the background lensed galaxy population - which is statistically at higher redshifts and of lower intrinsic luminosity thus enabling the probing of galaxy formation at the earliest times right up to the Dark Ages; and (iii) the study of the geometry of the Universe - as the strength of lensing depends on the ratios of angular diameter distances between the lens, source and observer, lens deflections are sensitive to the value of cosmological parameters and offer a powerful geometric tool to probe Dark Energy. In this review, we present the basics of cluster lensing and provide a current status report of the field.
Automated variable weighting in k-means type clustering.
Huang, Joshua Zhexue; Ng, Michael K; Rong, Hongqiang; Li, Zichen
2005-05-01
This paper proposes a k-means type clustering algorithm that can automatically calculate variable weights. A new step is introduced to the k-means clustering process to iteratively update variable weights based on the current partition of data and a formula for weight calculation is proposed. The convergency theorem of the new clustering process is given. The variable weights produced by the algorithm measure the importance of variables in clustering and can be used in variable selection in data mining applications where large and complex real data are often involved. Experimental results on both synthetic and real data have shown that the new algorithm outperformed the standard k-means type algorithms in recovering clusters in data. PMID:15875789
MODEL-BASED CLUSTERING FOR CLASSIFICATION OF AQUATIC SYSTEMS AND DIAGNOSIS OF ECOLOGICAL STRESS
Clustering approaches were developed using the classification likelihood, the mixture likelihood, and also using a randomization approach with a model index. Using a clustering approach based on the mixture and classification likelihoods, we have developed an algorithm that...
Joint clustering of protein interaction networks through Markov random walk.
Wang, Yijie; Qian, Xiaoning
2014-01-01
Biological networks obtained by high-throughput profiling or human curation are typically noisy. For functional module identification, single network clustering algorithms may not yield accurate and robust results. In order to borrow information across multiple sources to alleviate such problems due to data quality, we propose a new joint network clustering algorithm ASModel in this paper. We construct an integrated network to combine network topological information based on protein-protein interaction (PPI) datasets and homological information introduced by constituent similarity between proteins across networks. A novel random walk strategy on the integrated network is developed for joint network clustering and an optimization problem is formulated by searching for low conductance sets defined on the derived transition matrix of the random walk, which fuses both topology and homology information. The optimization problem of joint clustering is solved by a derived spectral clustering algorithm. Network clustering using several state-of-the-art algorithms has been implemented to both PPI networks within the same species (two yeast PPI networks and two human PPI networks) and those from different species (a yeast PPI network and a human PPI network). Experimental results demonstrate that ASModel outperforms the existing single network clustering algorithms as well as another recent joint clustering algorithm in terms of complex prediction and Gene Ontology (GO) enrichment analysis. PMID:24565376
Reactive Collision Avoidance Algorithm
NASA Technical Reports Server (NTRS)
Scharf, Daniel; Acikmese, Behcet; Ploen, Scott; Hadaegh, Fred
2010-01-01
The reactive collision avoidance (RCA) algorithm allows a spacecraft to find a fuel-optimal trajectory for avoiding an arbitrary number of colliding spacecraft in real time while accounting for acceleration limits. In addition to spacecraft, the technology can be used for vehicles that can accelerate in any direction, such as helicopters and submersibles. In contrast to existing, passive algorithms that simultaneously design trajectories for a cluster of vehicles working to achieve a common goal, RCA is implemented onboard spacecraft only when an imminent collision is detected, and then plans a collision avoidance maneuver for only that host vehicle, thus preventing a collision in an off-nominal situation for which passive algorithms cannot. An example scenario for such a situation might be when a spacecraft in the cluster is approaching another one, but enters safe mode and begins to drift. Functionally, the RCA detects colliding spacecraft, plans an evasion trajectory by solving the Evasion Trajectory Problem (ETP), and then recovers after the collision is avoided. A direct optimization approach was used to develop the algorithm so it can run in real time. In this innovation, a parameterized class of avoidance trajectories is specified, and then the optimal trajectory is found by searching over the parameters. The class of trajectories is selected as bang-off-bang as motivated by optimal control theory. That is, an avoiding spacecraft first applies full acceleration in a constant direction, then coasts, and finally applies full acceleration to stop. The parameter optimization problem can be solved offline and stored as a look-up table of values. Using a look-up table allows the algorithm to run in real time. Given a colliding spacecraft, the properties of the collision geometry serve as indices of the look-up table that gives the optimal trajectory. For multiple colliding spacecraft, the set of trajectories that avoid all spacecraft is rapidly searched on-line. The optimal avoidance trajectory is implemented as a receding-horizon model predictive control law. Therefore, at each time step, the optimal avoidance trajectory is found and the first time step of its acceleration is applied. At the next time step of the control computer, the problem is re-solved and the new first time step is again applied. This continual updating allows the RCA algorithm to adapt to a colliding spacecraft that is making erratic course changes.
Huang, Wei; Oh, Sung-Kwun; Pedrycz, Witold
2014-12-01
In this study, we propose Hybrid Radial Basis Function Neural Networks (HRBFNNs) realized with the aid of fuzzy clustering method (Fuzzy C-Means, FCM) and polynomial neural networks. Fuzzy clustering used to form information granulation is employed to overcome a possible curse of dimensionality, while the polynomial neural network is utilized to build local models. Furthermore, genetic algorithm (GA) is exploited here to optimize the essential design parameters of the model (including fuzzification coefficient, the number of input polynomial fuzzy neurons (PFNs), and a collection of the specific subset of input PFNs) of the network. To reduce dimensionality of the input space, principal component analysis (PCA) is considered as a sound preprocessing vehicle. The performance of the HRBFNNs is quantified through a series of experiments, in which we use several modeling benchmarks of different levels of complexity (different number of input variables and the number of available data). A comparative analysis reveals that the proposed HRBFNNs exhibit higher accuracy in comparison to the accuracy produced by some models reported previously in the literature. PMID:25233483
Robust Face Clustering Via Tensor Decomposition.
Cao, Xiaochun; Wei, Xingxing; Han, Yahong; Lin, Dongdai
2015-11-01
Face clustering is a key component either in image managements or video analysis. Wild human faces vary with the poses, expressions, and illumination changes. All kinds of noises, like block occlusions, random pixel corruptions, and various disguises may also destroy the consistency of faces referring to the same person. This motivates us to develop a robust face clustering algorithm that is less sensitive to these noises. To retain the underlying structured information within facial images, we use tensors to represent faces, and then accomplish the clustering task based on the tensor data. The proposed algorithm is called robust tensor clustering (RTC), which firstly finds a lower-rank approximation of the original tensor data using a L1 norm optimization function. Because L1 norm does not exaggerate the effect of noises compared with L2 norm, the minimization of the L1 norm approximation function makes RTC robust. Then, we compute high-order singular value decomposition of this approximate tensor to obtain the final clustering results. Different from traditional algorithms solving the approximation function with a greedy strategy, we utilize a nongreedy strategy to obtain a better solution. Experiments conducted on the benchmark facial datasets and gait sequences demonstrate that RTC has better performance than the state-of-the-art clustering algorithms and is more robust to noises. PMID:25546869
Multiple Manifold Clustering Using Curvature Constrained Path
Babaeian, Amir; Bayestehtashk, Alireza; Bandarabadi, Mojtaba
2015-01-01
The problem of multiple surface clustering is a challenging task, particularly when the surfaces intersect. Available methods such as Isomap fail to capture the true shape of the surface near by the intersection and result in incorrect clustering. The Isomap algorithm uses shortest path between points. The main draw back of the shortest path algorithm is due to the lack of curvature constrained where causes to have a path between points on different surfaces. In this paper we tackle this problem by imposing a curvature constraint to the shortest path algorithm used in Isomap. The algorithm chooses several landmark nodes at random and then checks whether there is a curvature constrained path between each landmark node and every other node in the neighborhood graph. We build a binary feature vector for each point where each entry represents the connectivity of that point to a particular landmark. Then the binary feature vectors could be used as a input of conventional clustering algorithm such as hierarchical clustering. We apply our method to simulated and some real datasets and show, it performs comparably to the best methods such as K-manifold and spectral multi-manifold clustering. PMID:26375819
Java implementation of Class Association Rule algorithms
Energy Science and Technology Software Center (ESTSC)
2007-08-30
Java implementation of three Class Association Rule mining algorithms, NETCAR, CARapriori, and clustering based rule mining. NETCAR algorithm is a novel algorithm developed by Makio Tamura. The algorithm is discussed in a paper: UCRL-JRNL-232466-DRAFT, and would be published in a peer review scientific journal. The software is used to extract combinations of genes relevant with a phenotype from a phylogenetic profile and a phenotype profile. The phylogenetic profiles is represented by a binary matrix andmore »a phenotype profile is represented by a binary vector. The present application of this software will be in genome analysis, however, it could be applied more generally.« less
A spatial dirichlet process mixture model for clustering population genetics data.
Reich, Brian J; Bondell, Howard D
2011-06-01
Identifying homogeneous groups of individuals is an important problem in population genetics. Recently, several methods have been proposed that exploit spatial information to improve clustering algorithms. In this article, we develop a Bayesian clustering algorithm based on the Dirichlet process prior that uses both genetic and spatial information to classify individuals into homogeneous clusters for further study. We study the performance of our method using a simulation study and use our model to cluster wolverines in Western Montana using microsatellite data. PMID:20825394
Sobel, E.; Lange, K.; O`Connell, J.R.
1996-12-31
Haplotyping is the logical process of inferring gene flow in a pedigree based on phenotyping results at a small number of genetic loci. This paper formalizes the haplotyping problem and suggests four algorithms for haplotype reconstruction. These algorithms range from exhaustive enumeration of all haplotype vectors to combinatorial optimization by simulated annealing. Application of the algorithms to published genetic analyses shows that manual haplotyping is often erroneous. Haplotyping is employed in screening pedigrees for phenotyping errors and in positional cloning of disease genes from conserved haplotypes in population isolates. 26 refs., 6 figs., 3 tabs.
Donchev, Todor I. (Urbana, IL); Petrov, Ivan G. (Champaign, IL)
2011-05-31
Described herein is an apparatus and a method for producing atom clusters based on a gas discharge within a hollow cathode. The hollow cathode includes one or more walls. The one or more walls define a sputtering chamber within the hollow cathode and include a material to be sputtered. A hollow anode is positioned at an end of the sputtering chamber, and atom clusters are formed when a gas discharge is generated between the hollow anode and the hollow cathode.
Submitted to IEEE Trans. on Robotics & Automation Ordinal Comparison of Heuristic Algorithms Using
Wu, David
Submitted to IEEE Trans. on Robotics & Automation 1 Ordinal Comparison of Heuristic Algorithms clustering algorithms in the IMSL library is conducted. The results demonstrate that our method can determine
Clustering memes in social media streams
JafariAsbagh, Mohsen; Varol, Onur; Menczer, Filippo; Flammini, Alessandro
2014-01-01
The problem of clustering content in social media has pervasive applications, including the identification of discussion topics, event detection, and content recommendation. Here we describe a streaming framework for online detection and clustering of memes in social media, specifically Twitter. A pre-clustering procedure, namely protomeme detection, first isolates atomic tokens of information carried by the tweets. Protomemes are thereafter aggregated, based on multiple similarity measures, to obtain memes as cohesive groups of tweets reflecting actual concepts or topics of discussion. The clustering algorithm takes into account various dimensions of the data and metadata, including natural language, the social network, and the patterns of information diffusion. As a result, our system can build clusters of semantically, structurally, and topically related tweets. The clustering process is based on a variant of Online K-means that incorporates a memory mechanism, used to "forget" old memes and replace them o...
Global optimization method using SLE and adaptive RBF based on fuzzy clustering
NASA Astrophysics Data System (ADS)
Zhu, Huaguang; Liu, Li; Long, Teng; Zhao, Junfeng
2012-07-01
High fidelity analysis models, which are beneficial to improving the design quality, have been more and more widely utilized in the modern engineering design optimization problems. However, the high fidelity analysis models are so computationally expensive that the time required in design optimization is usually unacceptable. In order to improve the efficiency of optimization involving high fidelity analysis models, the optimization efficiency can be upgraded through applying surrogates to approximate the computationally expensive models, which can greately reduce the computation time. An efficient heuristic global optimization method using adaptive radial basis function (RBF) based on fuzzy clustering (ARFC) is proposed. In this method, a novel algorithm of maximin Latin hypercube design using successive local enumeration (SLE) is employed to obtain sample points with good performance in both space-filling and projective uniformity properties, which does a great deal of good to metamodels accuracy. RBF method is adopted for constructing the metamodels, and with the increasing the number of sample points the approximation accuracy of RBF is gradually enhanced. The fuzzy c-means clustering method is applied to identify the reduced attractive regions in the original design space. The numerical benchmark examples are used for validating the performance of ARFC. The results demonstrates that for most application examples the global optima are effectively obtained and comparison with adaptive response surface method (ARSM) proves that the proposed method can intuitively capture promising design regions and can efficiently identify the global or near-global design optimum. This method improves the efficiency and global convergence of the optimization problems, and gives a new optimization strategy for engineering design optimization problems involving computationally expensive models.
Archambeau, Cédric; Verleysen, Michel
2007-01-01
A new variational Bayesian learning algorithm for Student-t mixture models is introduced. This algorithm leads to (i) robust density estimation, (ii) robust clustering and (iii) robust automatic model selection. Gaussian mixture models are learning machines which are based on a divide-and-conquer approach. They are commonly used for density estimation and clustering tasks, but are sensitive to outliers. The Student-t distribution has heavier tails than the Gaussian distribution and is therefore less sensitive to any departure of the empirical distribution from Gaussianity. As a consequence, the Student-t distribution is suitable for constructing robust mixture models. In this work, we formalize the Bayesian Student-t mixture model as a latent variable model in a different way from Svensén and Bishop [Svensén, M., & Bishop, C. M. (2005). Robust Bayesian mixture modelling. Neurocomputing, 64, 235-252]. The main difference resides in the fact that it is not necessary to assume a factorized approximation of the posterior distribution on the latent indicator variables and the latent scale variables in order to obtain a tractable solution. Not neglecting the correlations between these unobserved random variables leads to a Bayesian model having an increased robustness. Furthermore, it is expected that the lower bound on the log-evidence is tighter. Based on this bound, the model complexity, i.e. the number of components in the mixture, can be inferred with a higher confidence. PMID:17011164
John C. Baez; Mike Stay
2013-02-25
Algorithmic entropy can be seen as a special case of entropy as studied in statistical mechanics. This viewpoint allows us to apply many techniques developed for use in thermodynamics to the subject of algorithmic information theory. In particular, suppose we fix a universal prefix-free Turing machine and let X be the set of programs that halt for this machine. Then we can regard X as a set of 'microstates', and treat any function on X as an 'observable'. For any collection of observables, we can study the Gibbs ensemble that maximizes entropy subject to constraints on expected values of these observables. We illustrate this by taking the log runtime, length, and output of a program as observables analogous to the energy E, volume V and number of molecules N in a container of gas. The conjugate variables of these observables allow us to define quantities which we call the 'algorithmic temperature' T, 'algorithmic pressure' P and algorithmic potential' mu, since they are analogous to the temperature, pressure and chemical potential. We derive an analogue of the fundamental thermodynamic relation dE = T dS - P d V + mu dN, and use it to study thermodynamic cycles analogous to those for heat engines. We also investigate the values of T, P and mu for which the partition function converges. At some points on the boundary of this domain of convergence, the partition function becomes uncomputable. Indeed, at these points the partition function itself has nontrivial algorithmic entropy.
Star Clusters Sterrenstelsels & Kosmos
Weijgaert, Rien van de
Star Clusters Sterrenstelsels & Kosmos deel 2 1 #12;Types of star clusters 2 #12;Open or Galactic Clusters · "Open" or Galactic clusters are low mass, relatively small (~10 pc diameter) clusters of stars in the Galactic disk containing stars · The Pleiades cluster is a good example of an open cluster
Catching Dissolving Clusters: A New Approach
NASA Astrophysics Data System (ADS)
Pellerin, A.; Meyer, M.; Harris, J.; Calzetti, D.
2008-06-01
Traditional studies of stellar clusters in external galaxies use surface photometry and therefore focus on systems that are still bright and compact enough to be separated from the stellar background. Consequently, the latter stages of unbound cluster evolution are still poorly understood. This dramatically constrains our knowledge of the dissolution processes of stellar clusters in various physical environments. We present the first results of a new approach to directly detect and quantify the characteristics of evolved stellar clusters. Using the exceptional spatial resolution and sensitivity of HST/ACS images to resolve the stellar content of nearby galaxies, we construct colour-magnitude diagrams for the observed fields. This enables us to separate the younger population likely present in young clusters from the older stellar content of the star field background. We utilize a clustering algorithm to assign each star to a group based on its local spatial density. This novel approach makes use of algorithms typically applied in N-body and cosmological studies. We test the method and show that it successfully detects less compact clusters that would normally be lost in the star field background. We also detect B-type stars well spread out in the galaxy disk of NGC 1313, probably the result of infant mortality of stellar clusters.
Deterministic algorithm with agglomerative heuristic for location problems
NASA Astrophysics Data System (ADS)
Kazakovtsev, L.; Stupina, A.
2015-10-01
Authors consider the clustering problem solved with the k-means method and p-median problem with various distance metrics. The p-median problem and the k-means problem as its special case are most popular models of the location theory. They are implemented for solving problems of clustering and many practically important logistic problems such as optimal factory or warehouse location, oil or gas wells, optimal drilling for oil offshore, steam generators in heavy oil fields. Authors propose new deterministic heuristic algorithm based on ideas of the Information Bottleneck Clustering and genetic algorithms with greedy heuristic. In this paper, results of running new algorithm on various data sets are given in comparison with known deterministic and stochastic methods. New algorithm is shown to be significantly faster than the Information Bottleneck Clustering method having analogous preciseness.
Adaptive pyramidal clustering for shortest path determination
NASA Astrophysics Data System (ADS)
Olson, Keith; Speigle, Scott A.
1996-05-01
This paper will present a unique concept implemented in a software design that determines near optimal paths between hundreds of randomly connected nodes of interest in a faster time than current near optimal path determining algorithms. The adaptive pyramidal clustering (APC) approach to determining near optimal paths between numerous nodes uses an adaptive neural network along with classical heuristic search techniques. This combination is represented by a nearest neighbor clustering up function (performed by the neural network) and a trickle down pruning function (performed by the heuristic search). The function of the adaptive neural network is a significant reason why the APC algorithm is superior to several well known approaches. The APC algorithm has already been applied to autonomous route planning for unmanned ground vehicles. The intersections represent navigational waypoints that can be selected as source and destination locations. The APC algorithm then determines a near optimal path to navigate between the selected waypoints.
Mercurio, D; Podofillini, L; Zio, E; Dang, V N
2009-11-01
This paper illustrates a method to identify and classify scenarios generated in a dynamic event tree (DET) analysis. Identification and classification are carried out by means of an evolutionary possibilistic fuzzy C-means clustering algorithm which takes into account not only the final system states but also the timing of the events and the process evolution. An application is considered with regards to the scenarios generated following a steam generator tube rupture in a nuclear power plant. The scenarios are generated by the accident dynamic simulator (ADS), coupled to a RELAP code that simulates the thermo-hydraulic behavior of the plant and to an operators' crew model, which simulates their cognitive and procedures-guided responses. A set of 60 scenarios has been generated by the ADS DET tool. The classification approach has grouped the 60 scenarios into 4 classes of dominant scenarios, one of which was not anticipated a priori but was "discovered" by the classifier. The proposed approach may be considered as a first effort towards the application of identification and classification approaches to scenarios post-processing for real-scale dynamic safety assessments. PMID:19819366
Segmentation of clustered nuclei with shape markers and marking function
Rajapakse, Jagath
We present a method to separate clustered nuclei from fluorescence microscopy cellular images, using shape markers and marking function in a watershed-like algorithm. Shape markers are extracted using an adaptive H-minima ...
Bayesian variable selection in clustering via dirichlet process mixture models
Kim, Sinae
2007-09-17
simultane- ously. I use Dirichlet process mixture models to define the cluster structure and to introduce in the model a latent binary vector to identify discriminating variables. I update the variable selection index using a Metropolis algorithm and obtain...
Dynamical Mass Measurements of Contaminated Galaxy Clusters Using Machine Learning
NASA Astrophysics Data System (ADS)
Ntampaka, Michelle; Trac, Hy; Sutherland, Dougal; Fromenteau, Sebastien; Poczos, Barnabas; Schneider, Jeff
2016-01-01
Galaxy clusters are a rich source of information for examining fundamental astrophysical processes and cosmological parameters, however, employing clusters as cosmological probes requires accurate mass measurements derived from cluster observables. We study dynamical mass measurements of galaxy clusters contaminated by interlopers, and show that a modern machine learning (ML) algorithm can predict masses by better than a factor of two compared to a standard scaling relation approach. We create a mock catalog from Multidark's publicly-available N-body MDPL1 simulation where a simple cylindrical cut around the cluster center allows interlopers to contaminate the clusters. In the standard approach, we use a power law scaling relation to infer cluster mass from galaxy line of sight (LOS) velocity dispersion. The presence of interlopers in the catalog produces a wide, flat fractional mass error distribution, with width = 2.13. We employ the Support Distribution Machine (SDM) class of algorithms to learn from distributions of data to predict single values. Applied to distributions of galaxy observables such as LOS velocity and projected distance from the cluster center, SDM yields better than a factor-of-two improvement (width = 0.67). Remarkably, SDM applied to contaminated clusters is better able to recover masses than even a scaling relation approach applied to uncontaminated clusters. We show that the SDM method more accurately reproduces the cluster mass function, making it a valuable tool for employing cluster observations to evaluate cosmological models.
Cui, Xiaohui; Potok, Thomas E
2006-01-01
The Flocking model, first proposed by Craig Reynolds, is one of the first bio-inspired computational collective behavior models that has many popular applications, such as animation. Our early research has resulted in a flock clustering algorithm that can achieve better performance than the Kmeans or the Ant clustering algorithms for data clustering. This algorithm generates a clustering of a given set of data through the embedding of the highdimensional data items on a two-dimensional grid for efficient clustering result retrieval and visualization. In this paper, we propose a bio-inspired clustering model, the Multiple Species Flocking clustering model (MSF), and present a distributed multi-agent MSF approach for document clustering.
Spatio-Temporal Clustering of Monitoring Network
NASA Astrophysics Data System (ADS)
Hussain, I.; Pilz, J.
2009-04-01
Pakistan has much diversity in seasonal variation of different locations. Some areas are in desserts and remain very hot and waterless, for example coastal areas are situated along the Arabian Sea and have very warm season and a little rainfall. Some areas are covered with mountains, have very low temperature and heavy rainfall; for instance Karakoram ranges. The most important variables that have an impact on the climate are temperature, precipitation, humidity, wind speed and elevation. Furthermore, it is hard to find homogeneous regions in Pakistan with respect to climate variation. Identification of homogeneous regions in Pakistan can be useful in many aspects. It can be helpful for prediction of the climate in the sub-regions and for optimizing the number of monitoring sites. In the earlier literature no one tried to identify homogeneous regions of Pakistan with respect to climate variation. There are only a few papers about spatio-temporal clustering of monitoring network. Steinhaus (1956) presented the well-known K-means clustering method. It can identify a predefined number of clusters by iteratively assigning centriods to clusters based. Castro et al. (1997) developed a genetic heuristic algorithm to solve medoids based clustering. Their method is based on genetic recombination upon random assorting recombination. The suggested method is appropriate for clustering the attributes which have genetic characteristics. Sap and Awan (2005) presented a robust weighted kernel K-means algorithm incorporating spatial constraints for clustering climate data. The proposed algorithm can effectively handle noise, outliers and auto-correlation in the spatial data, for effective and efficient data analysis by exploring patterns and structures in the data. Soltani and Modarres (2006) used hierarchical and divisive cluster analysis to categorize patterns of rainfall in Iran. They only considered rainfall at twenty-eight monitoring sites and concluded that eight clusters existed. Soltani and Modarres (2006) classified the sites by using only average rainfall of sites, they did not consider time replications and spatial coordinates. Kerby et.al (2007) purposed spatial clustering method based on likelihood. They took account of the geographic locations through the variance covariance matrix. Their purposed method works like hierarchical clustering methods. Moreovere, it is inappropiriate for time replication data and could not perform well for large number of sites. Tuia.et.al (2008) used scan statistics for identifying spatio-temporal clusters for fire sequences in the Tuscany region in Italy. The scan statistics clustering method was developed by Kulldorff et al. (1997) to detect spatio-temporal clusters in epidemiology and assessing their significance. The purposed scan statistics method is used only for univariate discrete stochastic random variables. In this paper we make use of a very simple approach for spatio-temporal clustering which can create separable and homogeneous clusters. Most of the clustering methods are based on Euclidean distances. It is well known that geographic coordinates are spherical coordinates and estimating Euclidean distances from spherical coordinates is inappropriate. As a transformation from geographic coordinates to rectangular (D-plane) coordinates we use the Lambert projection method. The partition around medoids clustering method is incorporated on the data including D-plane coordinates. Ordinary kriging is taken as validity measure for the precipitation data. The kriging results for clusters are more accurate and have less variation compared to complete monitoring network precipitation data. References Casto.V.E and Murray.A.T (1997). Spatial Clustering with Data Mining with Genetic Algorithms. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.56.8573 Kaufman.L and Rousseeuw.P.J (1990). Finding Groups in Data: An Introduction to Cluster Analysis. Wiley series of Probability and Mathematical Statistics, New York. Kulldorf.M (1997). A spatial scan statistic. Commun. Stat.-Theor. Math. 26(6)
Liu, Ming-Yu; Tuzel, Oncel; Ramalingam, Srikumar; Chellappa, Rama
2014-01-01
We propose a new objective function for clustering. This objective function consists of two components: the entropy rate of a random walk on a graph and a balancing term. The entropy rate favors formation of compact and homogeneous clusters, while the balancing function encourages clusters with similar sizes and penalizes larger clusters that aggressively group samples. We present a novel graph construction for the graph associated with the data and show that this construction induces a matroid--a combinatorial structure that generalizes the concept of linear independence in vector spaces. The clustering result is given by the graph topology that maximizes the objective function under the matroid constraint. By exploiting the submodular and monotonic properties of the objective function, we develop an efficient greedy algorithm. Furthermore, we prove an approximation bound of (1/2) for the optimality of the greedy solution. We validate the proposed algorithm on various benchmarks and show its competitive performances with respect to popular clustering algorithms. We further apply it for the task of superpixel segmentation. Experiments on the Berkeley segmentation data set reveal its superior performances over the state-of-the-art superpixel segmentation algorithms in all the standard evaluation metrics. PMID:24231869
Booth, Thomas E; Gubernatis, James E
2008-01-01
We present a procedure that in many cases enables the Monte Carlo sampling of states of a large system from the sampling of states of a smaller system. We illustrate this procedure, which we call the sewing algorithm, for sampling states from the transfer matrix of the two-dimensional Ising model.
Localized Ambient Solidity Separation Algorithm Based Computer User Segmentation.
Sun, Xiao; Zhang, Tongda; Chai, Yueting; Liu, Yi
2015-01-01
Most of popular clustering methods typically have some strong assumptions of the dataset. For example, the k-means implicitly assumes that all clusters come from spherical Gaussian distributions which have different means but the same covariance. However, when dealing with datasets that have diverse distribution shapes or high dimensionality, these assumptions might not be valid anymore. In order to overcome this weakness, we proposed a new clustering algorithm named localized ambient solidity separation (LASS) algorithm, using a new isolation criterion called centroid distance. Compared with other density based isolation criteria, our proposed centroid distance isolation criterion addresses the problem caused by high dimensionality and varying density. The experiment on a designed two-dimensional benchmark dataset shows that our proposed LASS algorithm not only inherits the advantage of the original dissimilarity increments clustering method to separate naturally isolated clusters but also can identify the clusters which are adjacent, overlapping, and under background noise. Finally, we compared our LASS algorithm with the dissimilarity increments clustering method on a massive computer user dataset with over two million records that contains demographic and behaviors information. The results show that LASS algorithm works extremely well on this computer user dataset and can gain more knowledge from it. PMID:26221133
Localized Ambient Solidity Separation Algorithm Based Computer User Segmentation
Sun, Xiao; Zhang, Tongda; Chai, Yueting; Liu, Yi
2015-01-01
Most of popular clustering methods typically have some strong assumptions of the dataset. For example, the k-means implicitly assumes that all clusters come from spherical Gaussian distributions which have different means but the same covariance. However, when dealing with datasets that have diverse distribution shapes or high dimensionality, these assumptions might not be valid anymore. In order to overcome this weakness, we proposed a new clustering algorithm named localized ambient solidity separation (LASS) algorithm, using a new isolation criterion called centroid distance. Compared with other density based isolation criteria, our proposed centroid distance isolation criterion addresses the problem caused by high dimensionality and varying density. The experiment on a designed two-dimensional benchmark dataset shows that our proposed LASS algorithm not only inherits the advantage of the original dissimilarity increments clustering method to separate naturally isolated clusters but also can identify the clusters which are adjacent, overlapping, and under background noise. Finally, we compared our LASS algorithm with the dissimilarity increments clustering method on a massive computer user dataset with over two million records that contains demographic and behaviors information. The results show that LASS algorithm works extremely well on this computer user dataset and can gain more knowledge from it. PMID:26221133
Twin Support Vector Machine for Clustering.
Wang, Zhen; Shao, Yuan-Hai; Bai, Lan; Deng, Nai-Yang
2015-10-01
The twin support vector machine (TWSVM) is one of the powerful classification methods. In this brief, a TWSVM-type clustering method, called twin support vector clustering (TWSVC), is proposed. Our TWSVC includes both linear and nonlinear versions. It determines k cluster center planes by solving a series of quadratic programming problems. To make TWSVC more efficient and stable, an initialization algorithm based on the nearest neighbor graph is also suggested. The experimental results on several benchmark data sets have shown a comparable performance of our TWSVC. PMID:25576578
Meta-analytic clustering of molecular neuroimaging studies Finn Arup Nielsen(1,2), Claus Svarer(1),
- geting serotonin-2A and benzodiazepine receptors. The K-means algorithm consistently clustered the serotonin-2A studies, performed using altanserin and 5-I-R91150 tracers, into one cluster and benzodiazepine
CLUM: a cluster program for analyzing microarray data.
Irigoien, I; Fernandez, E; Vives, S; Arenas, C
2008-08-01
Microarray technology is increasingly being applied in biological and medical research to address a wide range of problems. Cluster analysis has proven to be a very useful tool for investigating the structure of microarray data. This paper presents a program for clustering microarray data, which is based on the so call path-distance. The algorithm gives in each step a partition in two clusters and no prior assumptions on the structure of clusters are required. It assigns each object (gene or sample) to only one cluster and gives the global optimum for the function that quantifies the adequacy of a given partition of the sample into k clusters. The program was tested on experimental data sets, showing the robustness of the algorithm. PMID:18825964
d2_cluster: a validated method for clustering EST and full-length cDNAsequences.
Burke, J; Davison, D; Hide, W
1999-11-01
Several efforts are under way to condense single-read expressed sequence tags (ESTs) and full-length transcript data on a large scale by means of clustering or assembly. One goal of these projects is the construction of gene indices where transcripts are partitioned into index classes (or clusters) such that they are put into the same index class if and only if they represent the same gene. Accurate gene indexing facilitates gene expression studies and inexpensive and early partial gene sequence discovery through the assembly of ESTs that are derived from genes that have yet to be positionally cloned or obtained directly through genomic sequencing. We describe d2_cluster, an agglomerative algorithm for rapidly and accurately partitioning transcript databases into index classes by clustering sequences according to minimal linkage or "transitive closure" rules. We then evaluate the relative efficiency of d2_cluster with respect to other clustering tools. UniGene is chosen for comparison because of its high quality and wide acceptance. It is shown that although d2_cluster and UniGene produce results that are between 83% and 90% identical, the joining rate of d2_cluster is between 8% and 20% greater than UniGene. Finally, we present the first published rigorous evaluation of under and over clustering (in other words, of type I and type II errors) of a sequence clustering algorithm, although the existence of highly identical gene paralogs means that care must be taken in the interpretation of the type II error. Upper bounds for these d2_cluster error rates are estimated at 0.4% and 0.8%, respectively. In other words, the sensitivity and selectivity of d2_cluster are estimated to be >99.6% and 99.2%. PMID:10568753
ASteCA: Automated Stellar Cluster Analysis
NASA Astrophysics Data System (ADS)
Perren, G. I.; Vázquez, R. A.; Piatti, A. E.
2015-04-01
We present the Automated Stellar Cluster Analysis package (ASteCA), a suit of tools designed to fully automate the standard tests applied on stellar clusters to determine their basic parameters. The set of functions included in the code make use of positional and photometric data to obtain precise and objective values for a given cluster's center coordinates, radius, luminosity function and integrated color magnitude, as well as characterizing through a statistical estimator its probability of being a true physical cluster rather than a random overdensity of field stars. ASteCA incorporates a Bayesian field star decontamination algorithm capable of assigning membership probabilities using photometric data alone. An isochrone fitting process based on the generation of synthetic clusters from theoretical isochrones and selection of the best fit through a genetic algorithm is also present, which allows ASteCA to provide accurate estimates for a cluster's metallicity, age, extinction and distance values along with its uncertainties. To validate the code we applied it on a large set of over 400 synthetic MASSCLEAN clusters with varying degrees of field star contamination as well as a smaller set of 20 observed Milky Way open clusters (Berkeley 7, Bochum 11, Czernik 26, Czernik 30, Haffner 11, Haffner 19, NGC 133, NGC 2236, NGC 2264, NGC 2324, NGC 2421, NGC 2627, NGC 6231, NGC 6383, NGC 6705, Ruprecht 1, Tombaugh 1, Trumpler 1, Trumpler 5 and Trumpler 14) studied in the literature. The results show that ASteCA is able to recover cluster parameters with an acceptable precision even for those clusters affected by substantial field star contamination. ASteCA is written in Python and is made available as an open source code which can be downloaded ready to be used from its official site.
A Distributed Flocking Approach for Information Stream Clustering Analysis
Cui, Xiaohui; Potok, Thomas E
2006-01-01
Intelligence analysts are currently overwhelmed with the amount of information streams generated everyday. There is a lack of comprehensive tool that can real-time analyze the information streams. Document clustering analysis plays an important role in improving the accuracy of information retrieval. However, most clustering technologies can only be applied for analyzing the static document collection because they normally require a large amount of computation resource and long time to get accurate result. It is very difficult to cluster a dynamic changed text information streams on an individual computer. Our early research has resulted in a dynamic reactive flock clustering algorithm which can continually refine the clustering result and quickly react to the change of document contents. This character makes the algorithm suitable for cluster analyzing dynamic changed document information, such as text information stream. Because of the decentralized character of this algorithm, a distributed approach is a very natural way to increase the clustering speed of the algorithm. In this paper, we present a distributed multi-agent flocking approach for the text information stream clustering and discuss the decentralized architectures and communication schemes for load balance and status information synchronization in this approach.
Curve skeleton extraction by coupled graph contraction and surface clustering
Martin, Ralph R.
Curve skeleton extraction by coupled graph contraction and surface clustering Wei Jiang a , Kai Xu November 2012 Keywords: Curve skeleton Voronoi Graph contraction Surface clustering Deformation a b s t r a c t In this paper, we present a practical algorithm to extract a curve skeleton of a 3D shape
Clusterability Detection and Initial Seed Selection in Large Data Sets
Bystroff, Chris
Clusterability Detection and Initial Seed Selection in Large Data Sets Scott Epter \\Lambda Mukkai Krishnamoorthy Mohammed Zaki fepters,moorthy,zakig@cs.rpi.edu Paper # 368 Abstract The need for a preliminary useful seed input to a chosen clustering algorithm. We present a framework for the definition
Distant Cluster Search: Welcoming some Newcomers from the EIS
Catarina Lobo; Angela Iovino; Guido Chincarini
1999-11-26
We present a new automated cluster search algorithm, stressing its advantages relatively to others available in the literature. Applying it to the photometric data of the ESO Imaging Survey (EIS, Renzini & da Costa 1997) allowed us to produce, with a high degree of confidence, a new catalogue of cluster candidates up to z approximately equal to 1.1.
A Review of Monte Carlo Tests of Cluster Analysis.
ERIC Educational Resources Information Center
Milligan, Glenn W.
1981-01-01
Monte Carlo validation studies of clustering algorithms, including Ward's minimum variance hierarchical method, are reviewed. Caution concerning the uncritical selection of Ward's method for recovering cluster structure is advised. Alternative explanations for differential recovery performance are explored and recommendations are made for future…
Application of Simulated Annealing to Clustering Tuples in Databases.
ERIC Educational Resources Information Center
Bell, D. A.; And Others
1990-01-01
Investigates the value of applying principles derived from simulated annealing to clustering tuples in database design, and compares this technique with a graph-collapsing clustering method. It is concluded that, while the new method does give superior results, the expense involved in algorithm run time is prohibitive. (24 references) (CLB)
DNA MICROARRAY DATA CLUSTERING USING GROWING SELF ORGANIZING NETWORKS
Koprinska, Irena
DNA MICROARRAY DATA CLUSTERING USING GROWING SELF ORGANIZING NETWORKS Kim Jackson and Irena}@it.usyd.edu.au ABSTRACT Recent advances in DNA microarray technology have allowed biologists to simultaneously monitor Neural Gas) for clustering DNA microarray data, comparing them with traditional algorithms. 1
The global optimization of Morse clusters by potential energy transformations
Neumaier, Arnold
The global optimization of Morse clusters by potential energy transformations J. P. K. Doye Â· R. H to be modelled. Morse clusters provide a particularly tough test system for global optimization algorithms of a biomolecule. In particular, large values are very challenging and, until now, no unbiased global optimization
Improved Sampling Algorithms in Lattice QCD
Arjun Singh Gambhir; Kostas Orginos
2015-06-19
Reverse Monte Carlo (RMC) is an algorithm that incorporates stochastic modification of the action as part of the process that updates the fields in a Monte Carlo simulation. Such update moves have the potential of lowering or eliminating potential barriers that may cause inefficiencies in exploring the field configuration space. The highly successful Cluster algorithms for spin systems can be derived from the RMC framework. In this work we apply RMC ideas to pure gauge theory, aiming to alleviate the critical slowing down observed in the topological charge evolution as well as other long distance observables. We present various formulations of the basic idea and report on our numerical experiments with these algorithms.
Parallelizing Clustering of Geoscientific Data Sets using Data Streams Silvia Nittel
Nittel, Silvia
Parallelizing Clustering of Geoscientific Data Sets using Data Streams Silvia Nittel Spatial Computing data mining algorithms such as clustering on massive geospatial data sets is still not feasible paradigm. The so-called partial/merge k-means algorithm is implemented as a set of data stream operators
The Voronoi Tessellation Cluster Finder in 2 1 Dimensions
Soares-Santos, Marcelle; de Carvalho, Reinaldo R.; Annis, James; Gal, Roy R.; La Barbera, Francesco; Lopes, Paulo A.A.; Wechsler, Risa H.; Busha, Michael T.; Gerke, Brian F.; /SLAC /KIPAC, Menlo Park
2011-06-23
We present a detailed description of the Voronoi Tessellation (VT) cluster finder algorithm in 2+1 dimensions, which improves on past implementations of this technique. The need for cluster finder algorithms able to produce reliable cluster catalogs up to redshift 1 or beyond and down to 10{sup 13.5} solar masses is paramount especially in light of upcoming surveys aiming at cosmological constraints from galaxy cluster number counts. We build the VT in photometric redshift shells and use the two-point correlation function of the galaxies in the field to both determine the density threshold for detection of cluster candidates and to establish their significance. This allows us to detect clusters in a self-consistent way without any assumptions about their astrophysical properties. We apply the VT to mock catalogs which extend to redshift 1.4 reproducing the ?CDM cosmology and the clustering properties observed in the Sloan Digital Sky Survey data. An objective estimate of the cluster selection function in terms of the completeness and purity as a function of mass and redshift is as important as having a reliable cluster finder. We measure these quantities by matching the VT cluster catalog with the mock truth table. We show that the VT can produce a cluster catalog with completeness and purity >80% for the redshift range up to {approx}1 and mass range down to {approx}10{sup 13.5} solar masses.
BioCluster: Tool for Identification and Clustering of Enterobacteriaceae Based on Biochemical Data
Abdullah, Ahmed; Sabbir Alam, S.M.; Sultana, Munawar; Hossain, M. Anwar
2015-01-01
Presumptive identification of different Enterobacteriaceae species is routinely achieved based on biochemical properties. Traditional practice includes manual comparison of each biochemical property of the unknown sample with known reference samples and inference of its identity based on the maximum similarity pattern with the known samples. This process is labor-intensive, time-consuming, error-prone, and subjective. Therefore, automation of sorting and similarity in calculation would be advantageous. Here we present a MATLAB-based graphical user interface (GUI) tool named BioCluster. This tool was designed for automated clustering and identification of Enterobacteriaceae based on biochemical test results. In this tool, we used two types of algorithms, i.e., traditional hierarchical clustering (HC) and the Improved Hierarchical Clustering (IHC), a modified algorithm that was developed specifically for the clustering and identification of Enterobacteriaceae species. IHC takes into account the variability in result of 1–47 biochemical tests within this Enterobacteriaceae family. This tool also provides different options to optimize the clustering in a user-friendly way. Using computer-generated synthetic data and some real data, we have demonstrated that BioCluster has high accuracy in clustering and identifying enterobacterial species based on biochemical test data. This tool can be freely downloaded at http://microbialgen.du.ac.bd/biocluster/. PMID:26216453
Butterfly Mixing: Accelerating Incremental-Update Algorithms on Huasha Zhao
Canny, John
Butterfly Mixing: Accelerating Incremental-Update Algorithms on Clusters Huasha Zhao John Canny introduce and analyze butterfly mixing, an ap- proach which interleaves communication with computa- tion. We evaluate butterfly mixing on stochastic gradi- ent algorithms for logistic regression and SVM, on two
2015-01-01
Background Though cluster analysis has become a routine analytic task for bioinformatics research, it is still arduous for researchers to assess the quality of a clustering result. To select the best clustering method and its parameters for a dataset, researchers have to run multiple clustering algorithms and compare them. However, such a comparison task with multiple clustering results is cognitively demanding and laborious. Results In this paper, we present XCluSim, a visual analytics tool that enables users to interactively compare multiple clustering results based on the Visual Information Seeking Mantra. We build a taxonomy for categorizing existing techniques of clustering results visualization in terms of the Gestalt principles of grouping. Using the taxonomy, we choose the most appropriate interactive visualizations for presenting individual clustering results from different types of clustering algorithms. The efficacy of XCluSim is shown through case studies with a bioinformatician. Conclusions Compared to other relevant tools, XCluSim enables users to compare multiple clustering results in a more scalable manner. Moreover, XCluSim supports diverse clustering algorithms and dedicated visualizations and interactions for different types of clustering results, allowing more effective exploration of details on demand. Through case studies with a bioinformatics researcher, we received positive feedback on the functionalities of XCluSim, including its ability to help identify stably clustered items across multiple clustering results. PMID:26328893
Convergence of the EM Algorithm for Gaussian Mixtures with Unbalanced Mixing Coefficients
Gildea, Daniel
propose a similar deterministic anti- annealing based algorithm for the Dirichlet process mixture model problem of clustering large datasets with high dynamic range in cluster sizes. The Gaussian mixture model is a powerful model for data clustering (McLachlan & Peel, 2000). It models the data as a mixture of multiple
NASA Technical Reports Server (NTRS)
Eigen, D. J.; Fromm, F. R.; Northouse, R. A.
1974-01-01
A new clustering algorithm is presented that is based on dimensional information. The algorithm includes an inherent feature selection criterion, which is discussed. Further, a heuristic method for choosing the proper number of intervals for a frequency distribution histogram, a feature necessary for the algorithm, is presented. The algorithm, although usable as a stand-alone clustering technique, is then utilized as a global approximator. Local clustering techniques and configuration of a global-local scheme are discussed, and finally the complete global-local and feature selector configuration is shown in application to a real-time adaptive classification scheme for the analysis of remote sensed multispectral scanner data.
Fuzzy and Crisp Mahalanobis Fixed Point Clusters
Hennig, Christian
Fuzzy and Crisp Mahalanobis Fixed Point Clusters Christian Hennig 1 Fachbereich Mathematik Â SPST of the Mahalanobis distance. Crisp FPCs (where outlyingness is defined with a 0Â1 weight function) are compared to fuzzy FPCs where outliers are smoothly downweighted. An algorithm to find substantial crisp and fuzzy
Spectral Imaging of Galaxy Clusters with Planck
NASA Astrophysics Data System (ADS)
Bourdin, H.; Mazzotta, P.; Rasia, E.
2015-12-01
The Sunyaev-Zeldovich (SZ) effect is a promising tool for detecting the presence of hot gas out to the galaxy cluster peripheries. We developed a spectral imaging algorithm dedicated to the SZ observations of nearby galaxy clusters with Planck, with the aim of revealing gas density anisotropies related to the filamentary accretion of materials, or pressure discontinuities induced by the propagation of shock fronts. To optimize an unavoidable trade-off between angular resolution and precision of the SZ flux measurements, the algorithm performs a multi-scale analysis of the SZ maps as well as of other extended components, such as the cosmic microwave background (CMB) anisotropies and the Galactic thermal dust. The demixing of the SZ signal is tackled through kernel-weighted likelihood maximizations. The CMB anisotropies are further analyzed through a wavelet analysis, while the Galactic foregrounds and SZ maps are analyzed via a curvelet analysis that best preserves their anisotropic details. The algorithm performance has been tested against mock observations of galaxy clusters obtained by simulating the Planck High Frequency Instrument and by pointing at a few characteristic positions in the sky. These tests suggest that Planck should easily allow us to detect filaments in the cluster peripheries and detect large-scale shocks in colliding galaxy clusters that feature favorable geometry.
Clustering-based limb identification for pressure ulcer risk assessment.
Baran Pouyan, M; Nourani, M; Pompeo, M
2015-08-01
Bedridden patients have a high risk of developing pressure ulcers. Risk assessment for pressure ulceration is critical for preventive care. For a reliable assessment, we need to identify and track the limbs continuously and accurately. In this paper, we propose a method to identify body limbs using a pressure mat. Three prevalent sleep postures (supine, left and right postures) are considered. Then, predefined number of limbs (body parts) are identified by applying Fuzzy C-Means (FCM) clustering on key attributes. We collected data from 10 adult subjects and achieved average accuracy of 93.2% for 10 limbs in supine and 7 limbs in left/right postures. PMID:26737228
CLUSTER HEAD SELECTION USING RF SIGNAL STRENGTH Scott Fazackerley, Alan Paeth, Ramon Lawrence
Lawrence, Ramon
for transmission in addi- tion to forming favourable clusters based on Received Signal Strength Indication (RSSI transmission phase. To address this issue, a RF signal strength algorithm based on link quality is presented, Cluster Head Selection, Link Length, Energy Efficiency, RSSI 1. INTRODUCTION Cluster head selection
Hybrid Clustering by Integrating Text and Citation based Graphs in Journal Database Analysis
Hybrid Clustering by Integrating Text and Citation based Graphs in Journal Database Analysis Xinhai to large scale journal database and immediate clustering task. On the hand, many graph partition algorithms.Glanzel@econ.kuleuven.ac.be Abstract We propose a hybrid clustering strategy by integrating heterogeneous information sources as graphs
Feature Selection and Gene Clustering from Gene Expression Data Pabitra Mitra
Mitra, Pabitra
with a tissue category or disease [4]. A related task is gene clustering or partition- ing of genes into wellFeature Selection and Gene Clustering from Gene Expression Data Pabitra Mitra Machine Intelligence@isical.ac.in Abstract In this article we describe an algorithm for feature se- lection and gene clustering from high
Power-Aware Scheduling of Virtual Machines in DVFS-enabled Clusters
Power-Aware Scheduling of Virtual Machines in DVFS-enabled Clusters Gregor von Laszewski, Lizhe. This paper focuses on scheduling virtual machines in a compute cluster to reduce power consumption via and implementation of an efficient scheduling algorithm to allocate virtual machines in a DVFS-enabled cluster
Determining the Number of Clusters in a Data Set Without Graphical Interpretation
NASA Technical Reports Server (NTRS)
Aguirre, Nathan S.; Davies, Misty D.
2011-01-01
Cluster analysis is a data mining technique that is meant ot simplify the process of classifying data points. The basic clustering process requires an input of data points and the number of clusters wanted. The clustering algorithm will then pick starting C points for the clusters, which can be either random spatial points or random data points. It then assigns each data point to the nearest C point where "nearest usually means Euclidean distance, but some algorithms use another criterion. The next step is determining whether the clustering arrangement this found is within a certain tolerance. If it falls within this tolerance, the process ends. Otherwise the C points are adjusted based on how many data points are in each cluster, and the steps repeat until the algorithm converges,
NASA Astrophysics Data System (ADS)
Chen, Naijin
2013-03-01
Level Based Partitioning (LBP) algorithm, Cluster Based Partitioning (CBP) algorithm and Enhance Static List (ESL) temporal partitioning algorithm based on adjacent matrix and adjacent table are designed and implemented in this paper. Also partitioning time and memory occupation based on three algorithms are compared. Experiment results show LBP partitioning algorithm possesses the least partitioning time and better parallel character, as far as memory occupation and partitioning time are concerned, algorithms based on adjacent table have less partitioning time and less space memory occupation.
Quantum algorithms for supervised and unsupervised machine learning
Seth Lloyd; Masoud Mohseni; Patrick Rebentrost
2013-11-04
Machine-learning tasks frequently involve problems of manipulating and classifying large numbers of vectors in high-dimensional spaces. Classical algorithms for solving such problems typically take time polynomial in the number of vectors and the dimension of the space. Quantum computers are good at manipulating high-dimensional vectors in large tensor product spaces. This paper provides supervised and unsupervised quantum machine learning algorithms for cluster assignment and cluster finding. Quantum machine learning can take time logarithmic in both the number of vectors and their dimension, an exponential speed-up over classical algorithms.
Using Grey Wolf Algorithm to Solve the Capacitated Vehicle Routing Problem
NASA Astrophysics Data System (ADS)
Korayem, L.; Khorsid, M.; Kassem, S. S.
2015-05-01
The capacitated vehicle routing problem (CVRP) is a class of the vehicle routing problems (VRPs). In CVRP a set of identical vehicles having fixed capacities are required to fulfill customers' demands for a single commodity. The main objective is to minimize the total cost or distance traveled by the vehicles while satisfying a number of constraints, such as: the capacity constraint of each vehicle, logical flow constraints, etc. One of the methods employed in solving the CVRP is the cluster-first route-second method. It is a technique based on grouping of customers into a number of clusters, where each cluster is served by one vehicle. Once clusters are formed, a route determining the best sequence to visit customers is established within each cluster. The recently bio-inspired grey wolf optimizer (GWO), introduced in 2014, has proven to be efficient in solving unconstrained, as well as, constrained optimization problems. In the current research, our main contributions are: combining GWO with the traditional K-means clustering algorithm to generate the ‘K-GWO’ algorithm, deriving a capacitated version of the K-GWO algorithm by incorporating a capacity constraint into the aforementioned algorithm, and finally, developing 2 new clustering heuristics. The resulting algorithm is used in the clustering phase of the cluster-first route-second method to solve the CVR problem. The algorithm is tested on a number of benchmark problems with encouraging results.
Emonet, Thierry
-type embryos. (B) Mean straightness from three notum1a over-expressing embryos. (C) Mean straightness from three SU5402-treated embryos. In wild-type embryos, the DM has the straightest tracks. The straightness. (A) Means of four wild-type embryos. (B) Means of three notum1a over-expressing embryos. (C) Means
Euclid's Algorithm, Guass' Elimination and Buchberger's Algorithm
International Association for Cryptologic Research (IACR)
Euclid's Algorithm, Guass' Elimination and Buchberger's Algorithm Shaohua Zhang School of Mathematics, Shandong University, Jinan, Shandong, 250100, PRC Abstract: It is known that Euclid's algorithm of several positive integers, which can be regarded as the generalization of Euclid's algorithm. This enables
Overlapping clusters for distributed computation.
Mirrokni, Vahab; Andersen, Reid; Gleich, David F.
2010-11-01
Scalable, distributed algorithms must address communication problems. We investigate overlapping clusters, or vertex partitions that intersect, for graph computations. This setup stores more of the graph than required but then affords the ease of implementation of vertex partitioned algorithms. Our hope is that this technique allows us to reduce communication in a computation on a distributed graph. The motivation above draws on recent work in communication avoiding algorithms. Mohiyuddin et al. (SC09) design a matrix-powers kernel that gives rise to an overlapping partition. Fritzsche et al. (CSC2009) develop an overlapping clustering for a Schwarz method. Both techniques extend an initial partitioning with overlap. Our procedure generates overlap directly. Indeed, Schwarz methods are commonly used to capitalize on overlap. Elsewhere, overlapping communities (Ahn et al, Nature 2009; Mishra et al. WAW2007) are now a popular model of structure in social networks. These have long been studied in statistics (Cole and Wishart, CompJ 1970). We present two types of results: (i) an estimated swapping probability {rho}{infinity}; and (ii) the communication volume of a parallel PageRank solution (link-following {alpha} = 0.85) using an additive Schwarz method. The volume ratio is the amount of extra storage for the overlap (2 means we store the graph twice). Below, as the ratio increases, the swapping probability and PageRank communication volume decreases.
An Extended Affinity Propagation Clustering Method Based on Different Data Density Types
Zhao, XiuLi; Xu, WeiXiang
2015-01-01
Affinity propagation (AP) algorithm, as a novel clustering method, does not require the users to specify the initial cluster centers in advance, which regards all data points as potential exemplars (cluster centers) equally and groups the clusters totally by the similar degree among the data points. But in many cases there exist some different intensive areas within the same data set, which means that the data set does not distribute homogeneously. In such situation the AP algorithm cannot group the data points into ideal clusters. In this paper, we proposed an extended AP clustering algorithm to deal with such a problem. There are two steps in our method: firstly the data set is partitioned into several data density types according to the nearest distances of each data point; and then the AP clustering method is, respectively, used to group the data points into clusters in each data density type. Two experiments are carried out to evaluate the performance of our algorithm: one utilizes an artificial data set and the other uses a real seismic data set. The experiment results show that groups are obtained more accurately by our algorithm than OPTICS and AP clustering algorithm itself. PMID:25685144
An extended affinity propagation clustering method based on different data density types.
Zhao, XiuLi; Xu, WeiXiang
2015-01-01
Affinity propagation (AP) algorithm, as a novel clustering method, does not require the users to specify the initial cluster centers in advance, which regards all data points as potential exemplars (cluster centers) equally and groups the clusters totally by the similar degree among the data points. But in many cases there exist some different intensive areas within the same data set, which means that the data set does not distribute homogeneously. In such situation the AP algorithm cannot group the data points into ideal clusters. In this paper, we proposed an extended AP clustering algorithm to deal with such a problem. There are two steps in our method: firstly the data set is partitioned into several data density types according to the nearest distances of each data point; and then the AP clustering method is, respectively, used to group the data points into clusters in each data density type. Two experiments are carried out to evaluate the performance of our algorithm: one utilizes an artificial data set and the other uses a real seismic data set. The experiment results show that groups are obtained more accurately by our algorithm than OPTICS and AP clustering algorithm itself. PMID:25685144
Optimized Hypergraph Clustering-based Network Security Log Mining*
NASA Astrophysics Data System (ADS)
Che, Jianhua; Lin, Weimin; Yu, Yong; Yao, Wei
With network's growth and popularization, network security experts are facing bigger and bigger network security log. Network security log is a kind of valuable and important information recording various network behaviors, and has the features of large-scale and high dimension. Therefore, how to analyze these network security log to enhance the security of network becomes the focus of many researchers. In this paper, we first design a frequent attack sequencebased hypergraph clustering algorithm to mine the network security log, and then improve this algorithm with a synthetic measure of hyperedge weight and two optimization functions of clustering result. The experimental results show that the synthetic measure and optimization functions can promote significantly the coverage and precision of clustering result. The optimized hypergraph clustering algorithm provides a data analyzing method for intrusion detecting and active forewarning of network.
A New Method of Open Cluster Membership Determination
NASA Astrophysics Data System (ADS)
Gao, Xin-hua; Chen, Li; Hou, Zhen-jie
2014-07-01
Membership determination is the key-important step to study open clusters, which can directly influence on the estimation of open clusters’ physical parameters. DBSCAN (Density Based Spatial Clustering of Applications with Noise) is a density-based clustering algorithm in data mining techniques. In this paper the DBSCAN algorithm has been used for the first time to make the membership determination of the open clusters NGC 6791 and M 67 (NGC 2682). Our results indicate that the DBSCAN algorithm can effectively eliminate the contamination of field stars. The obtained member stars of NGC 6791 exhibit clearly a doubled main-sequence structure in the color-magnitude diagram, implying that NGC 6791 may have a more complicated history of star formation and evolution. The clustering analysis of M67 indicates the presence of mass segregation, and the distinct relative motion between the central part and the outer part of the cluster. These results demonstrate that the DBSCAN algorithm is an effective method of membership determination, and that it has some advantages superior to the conventional kinematic method.
Projection effects in mass-selected galaxy-cluster samples
Reblinsky, K; Reblinsky, Katrin; Bartelmann, Matthias
1999-01-01
Selection of galaxy clusters by mass is now possible due to weak gravitational lensing effects. It is an important question then whether this type of selection reduces the projection effects prevalent in optically selected cluster samples. We address this question using simulated data, from which we construct synthetic cluster catalogues both with Abell's criterion and an aperture-mass estimator sensitive to gravitational tidal effects. The signal-to-noise ratio of the latter allows to some degree to control the properties of the cluster sample. For the first time, we apply the cluster-detection algorithm proposed by Schneider to large-scale structure simulations. We find that selection of clusters through weak gravitational lensing is more reliable in terms of completeness and spurious detections. Choosing the signal-to-noise threshold appropriately, the completeness can be increased up to 100%, and the fraction of spurious detections can significantly be reduced compared to Abell-selected cluster samples. W...
Yoo, Sung-Hoon; Oh, Sung-Kwun; Pedrycz, Witold
2015-09-01
In this study, we propose a hybrid method of face recognition by using face region information extracted from the detected face region. In the preprocessing part, we develop a hybrid approach based on the Active Shape Model (ASM) and the Principal Component Analysis (PCA) algorithm. At this step, we use a CCD (Charge Coupled Device) camera to acquire a facial image by using AdaBoost and then Histogram Equalization (HE) is employed to improve the quality of the image. ASM extracts the face contour and image shape to produce a personal profile. Then we use a PCA method to reduce dimensionality of face images. In the recognition part, we consider the improved Radial Basis Function Neural Networks (RBF NNs) to identify a unique pattern associated with each person. The proposed RBF NN architecture consists of three functional modules realizing the condition phase, the conclusion phase, and the inference phase completed with the help of fuzzy rules coming in the standard 'if-then' format. In the formation of the condition part of the fuzzy rules, the input space is partitioned with the use of Fuzzy C-Means (FCM) clustering. In the conclusion part of the fuzzy rules, the connections (weights) of the RBF NNs are represented by four kinds of polynomials such as constant, linear, quadratic, and reduced quadratic. The values of the coefficients are determined by running a gradient descent method. The output of the RBF NNs model is obtained by running a fuzzy inference method. The essential design parameters of the network (including learning rate, momentum coefficient and fuzzification coefficient used by the FCM) are optimized by means of Differential Evolution (DE). The proposed P-RBF NNs (Polynomial based RBF NNs) are applied to facial recognition and its performance is quantified from the viewpoint of the output performance and recognition rate. PMID:26163042
Closser, Kristina D; Ge, Qinghui; Mao, Yuezhi; Shao, Yihan; Head-Gordon, Martin
2015-12-01
We develop a local excited-state method, based on the configuration interaction singles (CIS) wave function, for large atomic and molecular clusters. This method exploits the properties of absolutely localized molecular orbitals (ALMOs), which strictly limits the total number of excitations, and results in formal scaling with the third power of the system size for computing the full spectrum of ALMO-CIS excited states. The derivation of the equations and design of the algorithm are discussed in detail, with particular emphasis on the computational scaling. Clusters containing ?500 atoms were used in evaluating the scaling, which agrees with the theoretical predictions, and the accuracy of the method is evaluated with respect to standard CIS. A pioneering application to the size dependence of the helium cluster spectrum is also presented for clusters of 25-231 atoms, the largest of which results in the computation of 2310 excited states per sampled cluster geometry. PMID:26609558
Simple and Effective Adaptive Routing Algorithms Using Multi-Layer Wormhole Networks
Zhu, Dakai
. On the other hand, adaptive routing algorithms adjust to these changes, but require more complex hardwareSimple and Effective Adaptive Routing Algorithms Using Multi-Layer Wormhole Networks Kyung Min Su in multicomputer systems, clusters, or chip multiprocessors (CMPs). Among various routing algorithms
New applications of the genetic algorithm for the interpretation of high-resolution spectra1
Nijmegen, University of
804 New applications of the genetic algorithm for the interpretation of high-resolution spectra1 W. An alternative approach is unassigned fits of the spectra using genetic algorithms (GAs) with special cost, genetic algorithm, biomolecules, structure, van der Waals clusters. Résumé : La spectroscopie électronique
Abelian non-global logarithms from soft gluon clustering
NASA Astrophysics Data System (ADS)
Kelley, Randall; Walsh, Jonathan R.; Zuberi, Saba
2012-09-01
Most recombination-style jet algorithms cluster soft gluons in a complex way. This leads to previously identified correlations in the soft gluon phase space and introduces logarithmic corrections to jet cross sections, which are known as clustering logarithms. The leading Abelian clustering logarithms occur at least at next-to leading logarithm (NLL) in the exponent of the distribution. Using the framework of Soft Collinear Effective Theory (SCET), we show that new clustering effects contributing at NLL arise at each order. While numerical resummation of clustering logs is possible, it is unlikely that they can be analytically resummed to NLL. Clustering logarithms make the anti-kT algorithm theoretically preferred, for which they are power suppressed. They can arise in Abelian and non-Abelian terms, and we calculate the Abelian clustering logarithms at O ( {?_s^2} ) for the jet mass distribution using the Cambridge/Aachen and kT algorithms, including jet radius dependence, which extends previous results. We find that clustering logarithms can be naturally thought of as a class of non-global logarithms, which have traditionally been tied to non-Abelian correlations in soft gluon emission.
Dynamical Mass Measurements of Contaminated Galaxy Clusters Using Machine Learning
Ntampaka, M; Sutherland, D J; Fromenteau, S; Poczos, B; Schneider, J
2015-01-01
We study dynamical mass measurements of galaxy clusters contaminated by interlopers and show that a modern machine learning (ML) algorithm can predict masses by better than a factor of two compared to a standard scaling relation approach. We create two mock catalogs from Multidark's publicly-available N-body MDPL1 simulation, one with perfect galaxy cluster membership information and the other where a simple cylindrical cut around the cluster center allows interlopers to contaminate the clusters. In the standard approach, we use a power law scaling relation to infer cluster mass from galaxy line of sight (LOS) velocity dispersion. Assuming perfect membership knowledge, this unrealistic case produces a wide fractional mass error distribution, with width = 0.87. Interlopers introduce additional scatter, significantly widening the error distribution further (width = 2.13). We employ the Support Distribution Machine (SDM) class of algorithms to learn from distributions of data to predict single values. Applied to...
Globular Clusters in nearby Galaxy Cluster
Michael Hilker
2002-10-21
The discovery of a large population of intra-cluster star clusters in the central region of the Hydra I and Centaurus galaxy clusters is presented. Based on deep VLT photometry (V,I), many star clusters have been identified not only around the early-type galaxies, but also in the intra-cluster field, as far as 250 kpc from the galaxy centres. These intra-cluster globulars are predominantly blue, with a siginificant fraction being even bluer than the metal-poor halo clusters around massive galaxies. When interpreted as a metallicity effect they would have iron abundances around [Fe/H] = -2 dex. However, they might also be relatively young clusters which could have formed in tidal tails during recent interactions of the central galaxies.
Do open clusters have distinguishable chemical signatures?
Blanco-Cuaresma, S; Heiter, U
2015-01-01
Past studies have already shown that stars in open clusters are chemically homogeneous (e.g. De Silva et al. 2006, 2007 and 2009). These results support the idea that stars born from the same giant molecular cloud should have the same chemical composition. In this context, the chemical tagging technique was proposed by Freeman et al. 2002. The principle is to recover disrupted stellar clusters by looking only to the stellar chemical composition. In order to evaluate the feasibility of this approach, it is necessary to test if we can distinguish between stars born from different molecular clouds. For this purpose, we studied the chemical composition of stars in 32 old and intermediate-age open clusters, and we applied machine learning algorithms to recover the original cluster by only considering the chemical signatures.
Do open clusters have distinguishable chemical signatures?
NASA Astrophysics Data System (ADS)
Blanco-Cuaresma, S.; Soubiran, C.; Heiter, U.
2014-07-01
Past studies have already shown that stars in open clusters are chemically homogeneous (e.g. De Silva et al. 2006, 2007 and 2009). These results support the idea that stars born from the same giant molecular cloud should have the same chemical composition. In this context, the chemical tagging technique was proposed by Freeman et al. (2002). The principle is to recover disrupted stellar clusters by looking only to the stellar chemical composition. In order to evaluate the feasibility of this approach, it is necessary to test if we can distinguish between stars born from different molecular clouds. For this purpose, we studied the chemical composition of stars in 32 old and intermediate-age open clusters, and we applied machine learning algorithms to recover the original cluster by only considering the chemical signatures.
Semiparametric binary model for clustered survival data
NASA Astrophysics Data System (ADS)
Arlin, Rifina; Ibrahim, Noor Akma; Arasan, Jayanthi; Bakar, Rizam Abu
2015-10-01
This paper considers a method to analyze semiparametric binary models for clustered survival data when the responses are correlated. We extend parametric generalized estimating equation (GEE) to semiparametric GEE by introducing smoothing spline into the model. A backfitting algorithm is used in the derivation of the estimating equation for the parametric and nonparametric components of a semiparametric binary covariate model. The properties of the estimates for both are evaluated using simulation studies. We investigated the effects of the strength of cluster correlation and censoring rates on properties of the parameters estimate. The effect of the number of clusters and cluster size are also discussed. Results show that the GEE-SS are consistent and efficient for parametric component and nonparametric component of semiparametric binary covariates.
Global optimization method for cluster structures
NASA Astrophysics Data System (ADS)
Wang, Yin; Zhuang, Jun; Ning, Xi-Jing
2008-08-01
A time-going-backward quasidynamics method is developed for global optimization of cluster structures, and its merits are examined by a simple classical mechanics model, indicating that the probability for the system to jump over high potential barriers by this method is much higher than that by common annealing methods. The method is then used to investigate the isomers of a Lennard-Jones cluster containing 38 atoms and the C60 cluster with the Brenner potential, and can easily give the most stable structures, which are difficult to obtain by common annealing methods. In addition, for small carbon clusters Cn (n=21 30) , most of the potential energies optimized by this method are much lower than those obtained by a genetic algorithm.
STAR CLUSTERS IN INTERACTING GALAXIES : FORMATION AND EVOLUTION
NASA Astrophysics Data System (ADS)
Maji, Moupiya; Zhu, Qirong; Li, Yuexing; Charlton, Jane C.
2014-06-01
Galaxy mergers are vast stellar nurseries where star clusters form, but the immense tidal disruption forces are also at work, which causes destruction of the cluster systems. This simultaneous formation and destruction of the clusters leads to selective survival of clusters. Here, we investigate the formation and evolution of star clusters in interacting galaxies using a suite of high-resolution smoothed particle hydrodynamics simulations. In the simulations, star clusters are identified using the cluster finding algorithm, the Amiga Halo Finder. We found that the mass function of the star clusters change significantly during the course of galaxy merger, and the peak of density probability distribution shifts from high mass to ~7x10^5 Msun at late times. Furthermore, we found that the survived clusters are those that have the highest specific binding energy. Our results suggest that high-mass clusters are more easily destroyed than the low-mass ones, and that the observed peak of globular cluster mass function at 3x10^5 Msun may be a result of such selective survival of clusters.
Sequence comparison on a cluster of workstations using the PVM system
Guan, X.; Mural, R.J.; Uberbacher, E.C.
1995-02-01
We have implemented a distributed sequence comparison algorithm on a cluster of workstations using the PVM paradigm. This implementation has achieved similar performance to the intel iPSC/860 Hypercube, a massively parallel computer. The distributed sequence comparison algorithm serves as a search tool for two Internet servers GRAIL and GENQUEST. This paper describes the implementation and the performance of the algorithm.
Exciton trapping on an energy staircase of random guest clusters in binary molecular crystals
NASA Astrophysics Data System (ADS)
Tränkle, E.; von Borczyskowski, C.; Brown, R.
1993-12-01
A stochastical model with exponential hopping rates and an energy staircase on guest cluster states is applied to chemically mixed crystals like p-dichlorobenzene (DCB) in p-dibromobenzene (DBB). The minimal process algorithm is used to calculate transport properties and populations of the clusters for four cluster topologies. Self-trapping on dimer clusters is found in three cases. For doping concentrations over 3% the agreement with DCB/DBB data is satisfactory for the three-dimensional topologies.
Sketch and Validate for Big Data Clustering
NASA Astrophysics Data System (ADS)
Traganitis, Panagiotis A.; Slavakis, Konstantinos; Giannakis, Georgios B.
2015-06-01
In response to the need for learning tools tuned to big data analytics, the present paper introduces a framework for efficient clustering of huge sets of (possibly high-dimensional) data. Building on random sampling and consensus (RANSAC) ideas pursued earlier in a different (computer vision) context for robust regression, a suite of novel dimensionality and set-reduction algorithms is developed. The advocated sketch-and-validate (SkeVa) family includes two algorithms that rely on K-means clustering per iteration on reduced number of dimensions and/or feature vectors: The first operates in a batch fashion, while the second sequential one offers computational efficiency and suitability with streaming modes of operation. For clustering even nonlinearly separable vectors, the SkeVa family offers also a member based on user-selected kernel functions. Further trading off performance for reduced complexity, a fourth member of the SkeVa family is based on a divergence criterion for selecting proper minimal subsets of feature variables and vectors, thus bypassing the need for K-means clustering per iteration. Extensive numerical tests on synthetic and real data sets highlight the potential of the proposed algorithms, and demonstrate their competitive performance relative to state-of-the-art random projection alternatives.
Non-Manhattan layout extraction algorithm
NASA Astrophysics Data System (ADS)
Satkhozhina, Aziza; Ahmadullin, Ildus; Allebach, Jan P.; Lin, Qian; Liu, Jerry; Tretter, Daniel; O'Brien-Strain, Eamonn; Hunter, Andrew
2013-03-01
Automated publishing requires large databases containing document page layout templates. The number of layout templates that need to be created and stored grows exponentially with the complexity of the document layouts. A better approach for automated publishing is to reuse layout templates of existing documents for the generation of new documents. In this paper, we present an algorithm for template extraction from a docu- ment page image. We use the cost-optimized segmentation algorithm (COS) to segment the image, and Voronoi decomposition to cluster the text regions. Then, we create a block image where each block represents a homo- geneous region of the document page. We construct a geometrical tree that describes the hierarchical structure of the document page. We also implement a font recognition algorithm to analyze the font of each text region. We present a detailed description of the algorithm and our preliminary results.
Srinivasan, Thenmozhi; Palanisamy, Balasubramanie
2015-01-01
Clusters of high-dimensional data techniques are emerging, according to data noisy and poor quality challenges. This paper has been developed to cluster data using high-dimensional similarity based PCM (SPCM), with ant colony optimization intelligence which is effective in clustering nonspatial data without getting knowledge about cluster number from the user. The PCM becomes similarity based by using mountain method with it. Though this is efficient clustering, it is checked for optimization using ant colony algorithm with swarm intelligence. Thus the scalable clustering technique is obtained and the evaluation results are checked with synthetic datasets. PMID:26495413
Liu, Yuanchao; Liu, Ming; Wang, Xin
2015-01-01
The objective of text clustering is to divide document collections into clusters based on the similarity between documents. In this paper, an extension-based feature modeling approach towards semantically sensitive text clustering is proposed along with the corresponding feature space construction and similarity computation method. By combining the similarity in traditional feature space and that in extension space, the adverse effects of the complexity and diversity of natural language can be addressed and clustering semantic sensitivity can be improved correspondingly. The generated clusters can be organized using different granularities. The experimental evaluations on well-known clustering algorithms and datasets have verified the effectiveness of our approach. PMID:25794172
Possibilistic-clustering-based MR brain image segmentation with accurate initialization
NASA Astrophysics Data System (ADS)
Liao, Qingmin; Deng, Yingying; Dou, Weibei; Ruan, Su; Bloyet, Daniel
2004-01-01
Magnetic resonance image analysis by computer is useful to aid diagnosis of malady. We present in this paper a automatic segmentation method for principal brain tissues. It is based on the possibilistic clustering approach, which is an improved fuzzy c-means clustering method. In order to improve the efficiency of clustering process, the initial value problem is discussed and solved by combining with a histogram analysis method. Our method can automatically determine number of classes to cluster and the initial values for each class. It has been tested on a set of forty MR brain images with or without the presence of tumor. The experimental results showed that it is simple, rapid and robust to segment the principal brain tissues.
Mining Approximate Order Preserving Clusters in the Presence of Noise
Zhang, Mengsheng; Wang, Wei; Liu, Jinze
2010-01-01
Subspace clustering has attracted great attention due to its capability of finding salient patterns in high dimensional data. Order preserving subspace clusters have been proven to be important in high throughput gene expression analysis, since functionally related genes are often co-expressed under a set of experimental conditions. Such co-expression patterns can be represented by consistent orderings of attributes. Existing order preserving cluster models require all objects in a cluster have identical attribute order without deviation. However, real data are noisy due to measurement technology limitation and experimental variability which prohibits these strict models from revealing true clusters corrupted by noise. In this paper, we study the problem of revealing the order preserving clusters in the presence of noise. We propose a noise-tolerant model called approximate order preserving cluster (AOPC). Instead of requiring all objects in a cluster have identical attribute order, we require that (1) at least a certain fraction of the objects have identical attribute order; (2) other objects in the cluster may deviate from the consensus order by up to a certain fraction of attributes. We also propose an algorithm to mine AOPC. Experiments on gene expression data demonstrate the efficiency and effectiveness of our algorithm. PMID:20689652
NASA Astrophysics Data System (ADS)
Thong Kieu, Duy; Kepic, Anton
2015-04-01
Geophysical inversion produces very useful images of earth parameters; however, inversion results usually suffer from inherent non-uniqueness: many subsurface models with different structures and parameters can explain the measurements. To reduce the ambiguity, extra information about the earth's structure and physical properties is needed. This prior information can be extracted from geological principles, prior petrophysical information from well logs, and complementary information from other geophysical methods. Any technique used to constrain inversion should be able to integrate the prior information and to guide updating inversion process in terms of the geological model. In this research, we have adopted fuzzy c-means (FCM) clustering technique for this purpose. FCM is a clustering method that allows us to divide the model of physical parameters into a few clusters of representative values that also may relate to geological units based on the similarity of the geophysical properties. This exploits the fact that in many geological environments the earth is comprised of a few distinctive rock units with different physical properties. Therefore FCM can provide a platform to constrain geophysical inversion, and should tend to produce models that are geologically meaningful. FCM was incorporated in both separate and co-operative inversion processing of seismic and magnetotelluric (MT) data with petrophysical constraints. Using petrophysical information through FCM assists the inversion to build a reliable earth model. In this algorithm, FCM plays a role of guider; it uses the prior information to drive the model update process, and also forming an earth model filled with rocks units rather than smooth transitions when the boundary is in doubt. Where petrophysical information from well logs or core measurement is not locally available the cluster petrophysics may be solved for in inversion as well if some knowledge of how many distinctive geological exist. A better way to handle this limitation on prior petrophysical knowledge is to integrate complementary information of seismic reflection and MT data using a co-operative inversion process. This strategy can utilize the high resolution of seismic data to support low resolution of MT data and vice versa; the seismic reflection model, which lacks low frequency information, benefits from the general background produced by MT data. Hence, the most comprehensive process is to use co-operative inversion of seismic reflection and MT data constrained by petrophysics via FCM. Using synthetic examples, we show that our methods can effectively recover the true earth models. Separate inversion of seismic and MT using FCM produces significantly better results than a conventional inversion process. The co-operative inversion of seismic and MT data demonstrate that seismic data increases the effective resolution of MT data and in turn seismic impedance models benefit from the lower frequency data in the MT model. Key words: seismic inversion; magnetotelluric inversion; co-operative inversion; petrophysical constraint; fuzzy c-means
Fontana, W.
1990-12-13
In this paper complex adaptive systems are defined by a self- referential loop in which objects encode functions that act back on these objects. A model for this loop is presented. It uses a simple recursive formal language, derived from the lambda-calculus, to provide a semantics that maps character strings into functions that manipulate symbols on strings. The interaction between two functions, or algorithms, is defined naturally within the language through function composition, and results in the production of a new function. An iterated map acting on sets of functions and a corresponding graph representation are defined. Their properties are useful to discuss the behavior of a fixed size ensemble of randomly interacting functions. This function gas'', or Turning gas'', is studied under various conditions, and evolves cooperative interaction patterns of considerable intricacy. These patterns adapt under the influence of perturbations consisting in the addition of new random functions to the system. Different organizations emerge depending on the availability of self-replicators.
The Swift AGN and Cluster Survey
NASA Astrophysics Data System (ADS)
Danae Griffin, Rhiannon; Dai, Xinyu; Kochanek, Christopher S.; Bregman, Joel N.; Nugent, Jenna
2016-01-01
The Swift active galactic nucleus (AGN) and Cluster Survey (SACS) uses 125 deg^2 of Swift X-ray Telescope serendipitous fields with variable depths surrounding X-ray bursts to provide a medium depth (4 × 10^-15 erg cm^-2 s^-1) and area survey filling the gap between deep, narrow Chandra/XMM-Newton surveys and wide, shallow ROSAT surveys. Here, we present the first two papers in a series of publications for SACS. In the first paper, we introduce our method and catalog of 22,563 point sources and 442 extended sources. We examine the number counts of the AGN and galaxy cluster populations. SACS provides excellent constraints on the AGN number counts at the bright end with negligible uncertainties due to cosmic variance, and these constraints are consistent with previous measurements. The depth and areal coverage of SACS is well suited for galaxy cluster surveys outside the local universe, reaching z ˜ 1 for massive clusters. In the second paper, we use Sloan Digital Sky Survey (SDSS) DR8 data to study the 203 extended SACS sources that are located within the SDSS footprint. We search for galaxy over-densities in 3-D space using SDSS galaxies and their photometric redshifts near the Swift galaxy cluster candidates. We find 103 Swift clusters with a > 3? over-density. The remaining targets are potentially located at higher redshifts and require deeper optical follow-up observations for confirmations as galaxy clusters. We present a series of cluster properties including the redshift, BCG magnitude, BCG-to-X-ray center offset, optical richness, X-ray luminosity and red sequences. We compare the observed redshift distribution of the sample with a theoretical model, and find that our sample is complete for z ? 0.3 and 80% complete for z ? 0.4, consistent with the survey depth of SDSS. We also match our SDSS confirmed Swift clusters to existing cluster catalogs, and find 42, 2 and 1 matches in optical, X-ray and SZ catalogs, respectively, so the majority of these clusters are new detections. These analysis results suggest that our Swift cluster selection algorithm presented in our first paper has yielded a statistically well-defined cluster sample for further studying cluster evolution and cosmology.
3D cluster members and near-infrared distance of open cluster NGC 6819
NASA Astrophysics Data System (ADS)
Gao, Xin-Hua; Xu, Shou-Kun; Chen, Li
2015-12-01
In order to obtain clean members of the open cluster NGC 6819, the proper motions and radial velocities of 1691 stars are used to construct a three-dimensional (3D) velocity space. Based on the DBSCAN clustering algorithm, 537 3D cluster members are obtained. From the 537 3D cluster members, the average radial velocity and absolute proper motion of the cluster are Vr = +2.30 ± 0.04 km s?1 and (PMRA, PMDec) = (?2.5 ± 0.5, ?4.3 ± 0.5) mas yr?1, respectively. The proper motions, radial velocities, spatial positions and color-magnitude diagram of the 537 3D members indicate that our membership determination is effective. Among the 537 3D cluster members, 15 red clump giants can be easily identified by eye and are used as reliable standard candles for the distance estimate of the cluster. The distance modulus of the cluster is determined to be (m ? M)0 = 11.86 ± 0.05 mag (2355 ± 54 pc), which is quite consistent with published values. The uncertainty of our distance modulus is dominated by the intrinsic dispersion in the luminosities of red clump giants (? 0.04 mag).
Scatter/Gather Clustering: Flexibly Incorporating User Feedback to Steer Clustering Results.
Hossain, M S; Ojili, Praveen Kumar Reddy; Grimm, C; Muller, R; Watson, L T; Ramakrishnan, N
2012-12-01
Significant effort has been devoted to designing clustering algorithms that are responsive to user feedback or that incorporate prior domain knowledge in the form of constraints. However, users desire more expressive forms of interaction to influence clustering outcomes. In our experiences working with diverse application scientists, we have identified an interaction style scatter/gather clustering that helps users iteratively restructure clustering results to meet their expectations. As the names indicate, scatter and gather are dual primitives that describe whether clusters in a current segmentation should be broken up further or, alternatively, brought back together. By combining scatter and gather operations in a single step, we support very expressive dynamic restructurings of data. Scatter/gather clustering is implemented using a nonlinear optimization framework that achieves both locality of clusters and satisfaction of user-supplied constraints. We illustrate the use of our scatter/gather clustering approach in a visual analytic application to study baffle shapes in the bat biosonar (ears and nose) system. We demonstrate how domain experts are adept at supplying scatter/gather constraints, and how our framework incorporates these constraints effectively without requiring numerous instance-level constraints. PMID:26357192
Unsupervised classification of multivariate geostatistical data: Two algorithms
NASA Astrophysics Data System (ADS)
Romary, Thomas; Ors, Fabien; Rivoirard, Jacques; Deraisme, Jacques
2015-12-01
With the increasing development of remote sensing platforms and the evolution of sampling facilities in mining and oil industry, spatial datasets are becoming increasingly large, inform a growing number of variables and cover wider and wider areas. Therefore, it is often necessary to split the domain of study to account for radically different behaviors of the natural phenomenon over the domain and to simplify the subsequent modeling step. The definition of these areas can be seen as a problem of unsupervised classification, or clustering, where we try to divide the domain into homogeneous domains with respect to the values taken by the variables in hand. The application of classical clustering methods, designed for independent observations, does not ensure the spatial coherence of the resulting classes. Image segmentation methods, based on e.g. Markov random fields, are not adapted to irregularly sampled data. Other existing approaches, based on mixtures of Gaussian random functions estimated via the expectation-maximization algorithm, are limited to reasonable sample sizes and a small number of variables. In this work, we propose two algorithms based on adaptations of classical algorithms to multivariate geostatistical data. Both algorithms are model free and can handle large volumes of multivariate, irregularly spaced data. The first one proceeds by agglomerative hierarchical clustering. The spatial coherence is ensured by a proximity condition imposed for two clusters to merge. This proximity condition relies on a graph organizing the data in the coordinates space. The hierarchical algorithm can then be seen as a graph-partitioning algorithm. Following this interpretation, a spatial version of the spectral clustering algorithm is also proposed. The performances of both algorithms are assessed on toy examples and a mining dataset.
The structure of young star clusters
NASA Astrophysics Data System (ADS)
Gladwin, P. P.; Kitsionas, S.; Boffin, H. M. J.; Whitworth, A. P.
1999-01-01
In this paper we analyse and compare the clustering of young stars in Chamaeleon I and Taurus. We compute the mean surface density of companion stars N as a function of angular displacement theta from each star. We then fit N theta) with two simultaneous power laws, i.e. N(theta) ~ K_bintheta^-beta_bin + K_clutheta^-beta_clu. For Chamaeleon I, we obtain beta_bin= 1.97 +/- and beta_clu= 0.28 +/- 0.06, with the elbow at theta_elb~ 0 011 +/- 0 004. For Taurus, we obtain beta_bin= 2.02 +/- 0.04 and beta _clu= 0.87 +/- 0.01, with the elbow at theta _elb~ 0 013 +/- 0 003. For both star clusters the observational data make large (~ 5 sigma) systematic excursions from the best-fitting curve in the binary regime (theta < theta_elb). These excursions are visible also in the data used by Larson and Simon, and may be attributable to evolutionary effects of the types discussed recently by Nakajima et al. and Bate et al. In the clustering regime (theta > theta_elb) the data conform to the best-fitting curve very well, but the beta_clu values we obtain differ significantly from those obtained by other workers. These differences are due partly to the use of different samples, and partly to different methods of analysis. We also calculate the box dimensions for the two star clusters: for Chamaeleon I we obtain D_box~=1.51+/-0.12, and for Taurus D_box~=1.39+/-0.01. However, the limited dynamic range makes these estimates simply descriptors of the large-scale clustering, and not admissible evidence for fractality. We propose two algorithms for objectively generating maps of constant stellar surface density in young star clusters. Such maps are useful for comparison with molecular-line and dust-continuum maps of star-forming clouds, and with the results of numerical simulations of star formation. They are also useful because they retain information that is suppressed in the evaluation of N(theta). Algorithm I (SCATTER) uses a universal smoothing length, and therefore has a restricted dynamic range, but it is implicitly normalized. Algorithm II (GATHER) uses a local smoothing length, which gives it much greater dynamic range, but it has to be normalized explicitly. Both algorithms appear to capture well the features that the human eye sees. We are exploring ways of analysing such maps to discriminate between fractal structure and single-level clustering, and to determine the degree of central condensation in small-N clusters.
NASA Technical Reports Server (NTRS)
Choudhary, Alok N.; Patel, Janak H.; Ahuja, Narendra
1989-01-01
In part 1 architecture of NETRA is presented. A performance evaluation of NETRA using several common vision algorithms is also presented. Performance of algorithms when they are mapped on one cluster is described. It is shown that SIMD, MIMD, and systolic algorithms can be easily mapped onto processor clusters, and almost linear speedups are possible. For some algorithms, analytical performance results are compared with implementation performance results. It is observed that the analysis is very accurate. Performance analysis of parallel algorithms when mapped across clusters is presented. Mappings across clusters illustrate the importance and use of shared as well as distributed memory in achieving high performance. The parameters for evaluation are derived from the characteristics of the parallel algorithms, and these parameters are used to evaluate the alternative communication strategies in NETRA. Furthermore, the effect of communication interference from other processors in the system on the execution of an algorithm is studied. Using the analysis, performance of many algorithms with different characteristics is presented. It is observed that if communication speeds are matched with the computation speeds, good speedups are possible when algorithms are mapped across clusters.
Cluster analysis of simulated gravitational wave triggers using constrained validation clustering
NASA Astrophysics Data System (ADS)
Zhang, Ting; Tang, Lappoon; Mukherjee, Soma
2010-10-01
The data collected in the science run of LIGO calls for a thorough analysis of the glitches seen in the gravitational wave channels, as well as in the auxiliary and environmental channels. Rapid growth in size and number of available databases requires fast and accurate data mining algorithms for timely glitch analysis. The study presents a new technique in cluster analysis that we call constrained validation clustering (CV clustering) for mining patterns in gravitational wave burst triggers. The approach avoids using Gaussianity assumptions on data distribution, and was shown to outperform a state of the art in clustering -- G-means -- when K, the number of clusters, is unknown (Tang et. al., 08); experimental results suggested that Guassian mixture assumption can be too strong as a machine learning bias in mining gravitational wave data, evidenced by very severe overfitting of data by G-means. Our current focus is on upgrading CV clustering to utilizing random sampling and stochastic optimization techniques. Preliminary results indicate that such an enhancement can potentially bring about a forty fold increase in computational efficiency while suffering minor degrade in model quality. A current future direction is in further improving quality of models learned by the algorithm for making it an effective approach for real LIGO data analysis.
Rough hypercuboid based supervised clustering of miRNAs.
Paul, Sushmita; Vera, Julio
2015-07-01
The microRNAs are small, endogenous non-coding RNAs found in plants, animals, and some viruses, which function in RNA silencing and post-transcriptional regulation of gene expression. It is suggested by various genome-wide studies that a substantial fraction of miRNA genes is likely to form clusters. The coherent expression of the miRNA clusters can then be used to classify samples according to the clinical outcome. In this regard, a new clustering algorithm, termed as rough hypercuboid based supervised attribute clustering (RH-SAC), is proposed to find such groups of miRNAs. The proposed algorithm is based on the theory of rough set, which directly incorporates the information of sample categories into the miRNA clustering process, generating a supervised clustering algorithm for miRNAs. The effectiveness of the new approach is demonstrated on several publicly available miRNA expression data sets using support vector machine. The so-called B.632+ bootstrap error estimate is used to minimize the variability and biasedness of the derived results. The association of the miRNA clusters to various biological pathways is also shown by doing pathway enrichment analysis. PMID:25996345
SimClust - A Program to Simulate Star Clusters
Deveikis, V; Stonkute, R; Bridzius, A; Vansevicius, V
2009-01-01
We present a program tool, SimClust, designed for Monte-Carlo modeling of star clusters. It populates the available stellar isochrones with stars according to the initial mass function and distributes stars randomly following the analytical surface number density profile. The tool is aimed at simulating realistic images of extragalactic star clusters and can be used to: (i) optimize object detection algorithms, (ii) perform artificial cluster tests for the analysis of star cluster surveys, and (iii) assess the stochastic effects introduced into photometric and structural parameters of clusters due to random distribution of luminous stars and non-uniform interstellar extinction. By applying SimClust, we have demonstrated a significant influence of stochastic effects on the determined photometric and structural parameters of low-mass star clusters in the M31 galaxy disk. The source code and examples are available at the SimClust website: http://www.astro.ff.vu.lt/software/simclust/
Young star clusters: Progenitors of globular clusters!?
P. Anders; U. Fritze--v. Alvensleben; R. de Grijs
2003-09-04
Star cluster formation is a major mode of star formation in the extreme conditions of interacting galaxies and violent starbursts. Young clusters are observed to form in a variety of such galaxies, a substantial number resembling the progenitors of globular clusters in mass and size, but with significantly enhanced metallicity. From studies of the metal-poor and metal-rich star cluster populations of galaxies, we can therefore learn about the violent star formation history of these galaxies, and eventually about galaxy formation and evolution. We present a new set of evolutionary synthesis models of our GALEV code, with special emphasis on the gaseous emission of presently forming star clusters, and a new tool to compare extensive model grids with multi-color broad-band observations to determine individual cluster masses, metallicities, ages and extinction values independently. First results for young star clusters in the dwarf starburst galaxy NGC 1569 are presented. The mass distributions determined for the young clusters give valuable input to dynamical star cluster system evolution models, regarding survival and destruction of clusters. We plan to investigate an age sequence of galaxy mergers to see dynamical destruction effects in process.
The Angular Clustering of Distant Galaxy Clusters
Casey Papovich
2007-12-12
We discuss the angular clustering of galaxy clusters at z > 1 selected within 50 sq. deg from the Spitzer Wide-Infrared Extragalactic survey. We employ a simple color selection to identify high redshift galaxies with no dependence on galaxy rest-frame optical color using Spitzer IRAC 3.6 and 4.5 micron photometry. The majority (>90%) of galaxies with z > 1.3 are identified with [3.6] - [4.5] > -0.1 AB mag. We identify candidate galaxy clusters at z > 1 by selecting overdensities of >26-28 objects with [3.6] - [4.5] > -0.1 mag within radii of 1.4 arcminutes, which corresponds to r galaxy clusters show strong angular clustering, with an angular correlation function represented by w(theta) = (3.1 +/- 0.5) theta^(-1.1 +/- 0.1) with theta in units of arcminutes over scales of 2-100 arcminutes. Assuming the redshift distribution of these galaxy clusters follows a fiducial model, these galaxy clusters have a spatial-clustering scale length r_0 = 22.4 +/- 3.6 Mpc/h, and a comoving number density n = 1.2 +/- 0.1 x 10^-6 (Mpc/h)^-3. The correlation scale length and number density of these objects are comparable to those of rich galaxy clusters at low redshift. The number density of these high-redshift clusters correspond to dark-matter halos larger than 3-5 x 10^13 h^-1 solar masses at z=1.5. Assuming the dark halos hosting these high-redshift clusters grow following Lambda-CDM models, these clusters will reside in halos larger than 1-2 x 10^14 h^-1 solar masses at z=0.2, comparable to rich galaxy clusters.
AptaCluster - A Method to Cluster HT-SELEX Aptamer Pools and Lessons from its Application
Hoinka, Jan; Berezhnoy, Alexey; Sauna, Zuben E.; Przytycka, Teresa M.
2014-01-01
Systematic Evolution of Ligands by EXponential Enrichment (SELEX) is a well established experimental procedure to identify aptamers - synthetic single-stranded (ribo)nucleic molecules that bind to a given molecular target. Recently, new sequencing technologies have revolutionized the SELEX protocol by allowing for deep sequencing of the selection pools after each cycle. The emergence of High Throughput SELEX (HT-SELEX) has opened the field to new computational opportunities and challenges that are yet to be addressed. To aid the analysis of the results of HT-SELEX and to advance the understanding of the selection process itself, we developed AptaCluster. This algorithm allows for an efficient clustering of whole HT-SELEX aptamer pools; a task that could not be accomplished with traditional clustering algorithms due to the enormous size of such datasets. We performed HT-SELEX with Interleukin 10 receptor alpha chain (IL-10RA) as the target molecule and used AptaCluster to analyze the resulting sequences. AptaCluster allowed for the first survey of the relationships between sequences in different selection rounds and revealed previously not appreciated properties of the SELEX protocol. As the first tool of this kind, AptaCluster enables novel ways to analyze and to optimize the HT-SELEX procedure. Our AptaCluster algorithm is available as a very fast multiprocessor implementation upon request. PMID:25558474
Ant colony optimization approaches to clustering of lung nodules from CT images.
Gopalakrishnan, Ravichandran C; Kuppusamy, Veerakumar
2014-01-01
Lung cancer is becoming a threat to mankind. Applying machine learning algorithms for detection and segmentation of irregular shaped lung nodules remains a remarkable milestone in CT scan image analysis research. In this paper, we apply ACO algorithm for lung nodule detection. We have compared the performance against three other algorithms, namely, Otsu algorithm, watershed algorithm, and global region based segmentation. In addition, we suggest a novel approach which involves variations of ACO, namely, refined ACO, logical ACO, and variant ACO. Variant ACO shows better reduction in false positives. In addition we propose black circular neighborhood approach to detect nodule centers from the edge detected image. Genetic algorithm based clustering is performed to cluster the nodules based on intensity, shape, and size. The performance of the overall approach is compared with hierarchical clustering to establish the improvisation in the proposed approach. PMID:25525455
Ant Colony Optimization Approaches to Clustering of Lung Nodules from CT Images
Gopalakrishnan, Ravichandran C.; Kuppusamy, Veerakumar
2014-01-01
Lung cancer is becoming a threat to mankind. Applying machine learning algorithms for detection and segmentation of irregular shaped lung nodules remains a remarkable milestone in CT scan image analysis research. In this paper, we apply ACO algorithm for lung nodule detection. We have compared the performance against three other algorithms, namely, Otsu algorithm, watershed algorithm, and global region based segmentation. In addition, we suggest a novel approach which involves variations of ACO, namely, refined ACO, logical ACO, and variant ACO. Variant ACO shows better reduction in false positives. In addition we propose black circular neighborhood approach to detect nodule centers from the edge detected image. Genetic algorithm based clustering is performed to cluster the nodules based on intensity, shape, and size. The performance of the overall approach is compared with hierarchical clustering to establish the improvisation in the proposed approach. PMID:25525455
Structure and Dynamics of the Coma Cluster
Matthew Colless; Andrew M. Dunn
1995-08-16
We examine the structure and dynamics of the galaxies in the Coma cluster using a catalog of 552 redshifts including 243 new measurements. The velocity distribution is shown to be non-Gaussian due to structure associated with the group of galaxies around NGC4839, 40 arcmin SW of the cluster core. We apply a mixture-modelling algorithm to the galaxy sample and obtain a robust partition into two subclusters which we use to examine the system's dynamics. We find that the late-type galaxies are freely-falling into a largely virialised cluster core dominated by early types. We obtain a virial mass for the main cluster in close agreement with the estimates derived from recent X-ray data. The mass of the NGC4839 group is about 5-10% the mass of the main cluster. Assuming the main cluster and the NGC4839 group follow a linear two-body orbit, the favored solution has the two clusters lying at 74 degrees to the line of sight at a true separation of 0.8 Mpc and moving together at 1700 km/s. The cluster core shows evidence of an ongoing merger between two subclusters centered in projection on the dominant galaxies NGC4874 and NGC4889 but offset in velocity by 300 km/s and 1100 km/s respectively. Combining these results with X-ray and radio observations, and an interpretation of the presence or lack of an extended halo around the dominant galaxies, we develop a merger history for the Coma cluster.
Vilalta, Ricardo
Using Representative-Based Clustering for Nearest Neighbor Dataset Editing Christoph F. Eick, vilalta}@cs.uh.edu Abstract The goal of dataset editing in instance-based learning is to remove objects-based clustering algorithms for nearest neighbor dataset editing. We term this approach supervised clustering
Liu, Ying; Navathe, Shamkant B; Civera, Jorge; Dasigi, Venu; Ram, Ashwin; Ciliax, Brian J; Dingledine, Ray
2005-01-01
Partitioning closely related genes into clusters has become an important element of practically all statistical analyses of microarray data. A number of computer algorithms have been developed for this task. Although these algorithms have demonstrated their usefulness for gene clustering, some basic problems remain. This paper describes our work on extracting functional keywords from MEDLINE for a set of genes that are isolated for further study from microarray experiments based on their differential expression patterns. The sharing of functional keywords among genes is used as a basis for clustering in a new approach called BEA-PARTITION in this paper. Functional keywords associated with genes were extracted from MEDLINE abstracts. We modified the Bond Energy Algorithm (BEA), which is widely accepted in psychology and database design but is virtually unknown in bioinformatics, to cluster genes by functional keyword associations. The results showed that BEA-PARTITION and hierarchical clustering algorithm outperformed k-means clustering and self-organizing map by correctly assigning 25 of 26 genes in a test set of four known gene groups. To evaluate the effectiveness of BEA-PARTITION for clustering genes identified by microarray profiles, 44 yeast genes that are differentially expressed during the cell cycle and have been widely studied in the literature were used as a second test set. Using established measures of cluster quality, the results produced by BEA-PARTITION had higher purity, lower entropy, and higher mutual information than those produced by k-means and self-organizing map. Whereas BEA-PARTITION and the hierarchical clustering produced similar quality of clusters, BEA-PARTITION provides clear cluster boundaries compared to the hierarchical clustering. BEA-PARTITION is simple to implement and provides a powerful approach to clustering genes or to any clustering problem where starting matrices are available from experimental observations. PMID:17044165
Appearance of bulk-like motifs in Si, Ge, and Al clusters.
Lu, Wen-Cai; Wang, C Z; Zhao, Li-Zhen; Zhang, Wei; Qin, Wei; Ho, K M
2010-08-14
Using a genetic algorithm method to search for low-energy structures, we studied the evolution of structural motifs in Si, Ge, and Al clusters. We were able to observe how bulk-like structural motifs occur in these clusters as the size of the system increases, replacing structural motifs characteristic of clusters at smaller sizes. Si and Ge clusters adopt prolate structures at small sizes. While Si clusters switch to a spherical motif around the size of 30 atoms, Ge clusters exhibit plate-like motifs at the size of 40-atom clusters before transforming into more spherical shapes. For Al clusters, an ordered layered structural motif begins to appear at a relatively small cluster size around 25-27 atoms. PMID:20552120
An Island Grouping Genetic Algorithm for Fuzzy Partitioning Problems
Salcedo-Sanz, S.; Del Ser, J.; Geem, Z. W.
2014-01-01
This paper presents a novel fuzzy clustering technique based on grouping genetic algorithms (GGAs), which are a class of evolutionary algorithms especially modified to tackle grouping problems. Our approach hinges on a GGA devised for fuzzy clustering by means of a novel encoding of individuals (containing elements and clusters sections), a new fitness function (a superior modification of the Davies Bouldin index), specially tailored crossover and mutation operators, and the use of a scheme based on a local search and a parallelization process, inspired from an island-based model of evolution. The overall performance of our approach has been assessed over a number of synthetic and real fuzzy clustering problems with different objective functions and distance measures, from which it is concluded that the proposed approach shows excellent performance in all cases. PMID:24977235
Variable selection in multivariate calibration based on clustering of variable concept.
Farrokhnia, Maryam; Karimi, Sadegh
2016-01-01
Recently we have proposed a new variable selection algorithm, based on clustering of variable concept (CLoVA) in classification problem. With the same idea, this new concept has been applied to a regression problem and then the obtained results have been compared with conventional variable selection strategies for PLS. The basic idea behind the clustering of variable is that, the instrument channels are clustered into different clusters via clustering algorithms. Then, the spectral data of each cluster are subjected to PLS regression. Different real data sets (Cargill corn, Biscuit dough, ACE QSAR, Soy, and Tablet) have been used to evaluate the influence of the clustering of variables on the prediction performances of PLS. Almost in the all cases, the statistical parameter especially in prediction error shows the superiority of CLoVA-PLS respect to other variable selection strategies. Finally the synergy clustering of variable (sCLoVA-PLS), which is used the combination of cluster, has been proposed as an efficient and modification of CLoVA algorithm. The obtained statistical parameter indicates that variable clustering can split useful part from redundant ones, and then based on informative cluster; stable model can be reached. PMID:26703255
MASSCLEAN: MASSive CLuster Evolution and ANalysis package -- A new tool for stellar clusters
NASA Astrophysics Data System (ADS)
Popescu, Bogdan
2010-11-01
Stellar clusters are laboratories for stellar evolution. Their stellar content have an uniform age and chemical composition, but span a large mass interval. The majority of stars are born in clusters and end up in the general field population. An accurate characterization of stellar clusters could be used to built better models, from stellar evolution to the evolution of an entire galaxy. Regardless of the fact that they are so close, for many Milky Way clusters it is difficult to be observed because they are obscured by the dust in the disk of our Galaxy. The clusters from the Local Group and beyond are too distant, so only their integrated properties could be used most of the time. There is one way to analyze the observational data, to search for clusters, and to describe them: simulations. MASSCLEAN (MASSive CLuster Evolution and ANalysis) package was developed to provide a better characterization of Galactic clusters, to derive selection effects of current surveys, and to provide information about the extra-galactic clusters. Simulations of known Galactic clusters are used to get better constraints on their parameters, like mass, age, extinction, chemical composition and distance. This is the traditional way to describe the Galactic clusters, fitting the data using the available models. The difference is that MASSCLEAN simulations provide a consistent set of parameters. The majority of extra-galactic clusters are known only from their integrated properties, integrated magnitudes and colors. The current models for stellar populations are available only in the infinite mass limit. But the real clusters have a finite mass, and their integrated colors show a large dispersion (stochastic fluctuations). The description of the variation of integrated colors as a function of mass and age lead to the creation of MASSCLEANcolors database, based on 70 million Monte Carlo simulations. Since the entries in the database form a consistent set of integrated colors, integrated magnitudes, age and mass, they can be used to inprove the current age and mass determinations for extra-galactic clusters. The accuracy could be inproved only by solving for age and mass together, and this is done by MASSCLEANage. An introduction to stellar clusters is presented in Chapter 1. A general description of the MASSCLEAN package is presented in Chapter 2, including its Galactic and extra-galactic applications. New search algorithms for stellar clusters and derivation of selection effects for current surveys should be based on cluster simulations. The variation of the integrated colors for stellar clusters as a function of mass and age is presented in Chapter 3. The color dispersion is presented as standard deviation. The entire distribution of integrated colors is presented in Chapter 4, along with MASSCLEAN age and new values for age and mass for 37 LMC clusters. A larger population of LMC clusters with new estimations for age and mass is presented in Chapter 5. The new results for 600 clusters provide a new look to the LMC cluster population. The complete results for over 900 known clusters could be used to study the possible mass dependence of the cluster lifetime, dissolution time and infant mortality.
Survey on granularity clustering.
Ding, Shifei; Du, Mingjing; Zhu, Hong
2015-12-01
With the rapid development of uncertain artificial intelligent and the arrival of big data era, conventional clustering analysis and granular computing fail to satisfy the requirements of intelligent information processing in this new case. There is the essential relationship between granular computing and clustering analysis, so some researchers try to combine granular computing with clustering analysis. In the idea of granularity, the researchers expand the researches in clustering analysis and look for the best clustering results with the help of the basic theories and methods of granular computing. Granularity clustering method which is proposed and studied has attracted more and more attention. This paper firstly summarizes the background of granularity clustering and the intrinsic connection between granular computing and clustering analysis, and then mainly reviews the research status and various methods of granularity clustering. Finally, we analyze existing problem and propose further research. PMID:26557926
Global structual optimizations of surface systems with a genetic algorithm
Chuang, Feng-Chuan
2005-05-01
Global structural optimizations with a genetic algorithm were performed for atomic cluster and surface systems including aluminum atomic clusters, Si magic clusters on the Si(111) 7 x 7 surface, silicon high-index surfaces, and Ag-induced Si(111) reconstructions. First, the global structural optimizations of neutral aluminum clusters Al{sub n} (n up to 23) were performed using a genetic algorithm coupled with a tight-binding potential. Second, a genetic algorithm in combination with tight-binding and first-principles calculations were performed to study the structures of magic clusters on the Si(111) 7 x 7 surface. Extensive calculations show that the magic cluster observed in scanning tunneling microscopy (STM) experiments consist of eight Si atoms. Simulated STM images of the Si magic cluster exhibit a ring-like feature similar to STM experiments. Third, a genetic algorithm coupled with a highly optimized empirical potential were used to determine the lowest energy structure of high-index semiconductor surfaces. The lowest energy structures of Si(105) and Si(114) were determined successfully. The results of Si(105) and Si(114) are reported within the framework of highly optimized empirical potential and first-principles calculations. Finally, a genetic algorithm coupled with Si and Ag tight-binding potentials were used to search for Ag-induced Si(111) reconstructions at various Ag and Si coverages. The optimized structural models of {radical}3 x {radical}3, 3 x 1, and 5 x 2 phases were reported using first-principles calculations. A novel model is found to have lower surface energy than the proposed double-honeycomb chained (DHC) model both for Au/Si(111) 5 x 2 and Ag/Si(111) 5 x 2 systems.
d2_cluster: A Validated Method for Clustering EST and Full-Length cDNA Sequences
Burke, John; Davison, Dan; Hide, Winston
1999-01-01
Several efforts are under way to condense single-read expressed sequence tags (ESTs) and full-length transcript data on a large scale by means of clustering or assembly. One goal of these projects is the construction of gene indices where transcripts are partitioned into index classes (or clusters) such that they are put into the same index class if and only if they represent the same gene. Accurate gene indexing facilitates gene expression studies and inexpensive and early partial gene sequence discovery through the assembly of ESTs that are derived from genes that have yet to be positionally cloned or obtained directly through genomic sequencing. We describe d2_cluster, an agglomerative algorithm for rapidly and accurately partitioning transcript databases into index classes by clustering sequences according to minimal linkage or “transitive closure” rules. We then evaluate the relative efficiency of d2_cluster with respect to other clustering tools. UniGene is chosen for comparison because of its high quality and wide acceptance. It is shown that although d2_cluster and UniGene produce results that are between 83% and 90% identical, the joining rate of d2_cluster is between 8% and 20% greater than UniGene. Finally, we present the first published rigorous evaluation of under and over clustering (in other words, of type I and type II errors) of a sequence clustering algorithm, although the existence of highly identical gene paralogs means that care must be taken in the interpretation of the type II error. Upper bounds for these d2_cluster error rates are estimated at 0.4% and 0.8%, respectively. In other words, the sensitivity and selectivity of d2_cluster are estimated to be >99.6% and 99.2%. [Supplementary material to this paper may be found online at www.genome.org and at www.pangeasystems.com.] PMID:10568753
Progeny Clustering: A Method to Identify Biological Phenotypes
Hu, Chenyue W.; Kornblau, Steven M.; Slater, John H.; Qutub, Amina A.
2015-01-01
Estimating the optimal number of clusters is a major challenge in applying cluster analysis to any type of dataset, especially to biomedical datasets, which are high-dimensional and complex. Here, we introduce an improved method, Progeny Clustering, which is stability-based and exceptionally efficient in computing, to find the ideal number of clusters. The algorithm employs a novel Progeny Sampling method to reconstruct cluster identity, a co-occurrence probability matrix to assess the clustering stability, and a set of reference datasets to overcome inherent biases in the algorithm and data space. Our method was shown successful and robust when applied to two synthetic datasets (datasets of two-dimensions and ten-dimensions containing eight dimensions of pure noise), two standard biological datasets (the Iris dataset and Rat CNS dataset) and two biological datasets (a cell phenotype dataset and an acute myeloid leukemia (AML) reverse phase protein array (RPPA) dataset). Progeny Clustering outperformed some popular clustering evaluation methods in the ten-dimensional synthetic dataset as well as in the cell phenotype dataset, and it was the only method that successfully discovered clinically meaningful patient groupings in the AML RPPA dataset. PMID:26267476
An evaluation of centrality measures used in cluster analysis
NASA Astrophysics Data System (ADS)
Engström, Christopher; Silvestrov, Sergei
2014-12-01
Clustering of data into groups of similar objects plays an important part when analysing many types of data, especially when the datasets are large as they often are in for example bioinformatics, social networks and computational linguistics. Many clustering algorithms such as K-means and some types of hierarchical clustering need a number of centroids representing the 'center' of the clusters. The choice of centroids for the initial clusters often plays an important role in the quality of the clusters. Since a data point with a high centrality supposedly lies close to the 'center' of some cluster, this can be used to assign centroids rather than through some other method such as picking them at random. Some work have been done to evaluate the use of centrality measures such as degree, betweenness and eigenvector centrality in clustering algorithms. The aim of this article is to compare and evaluate the usefulness of a number of common centrality measures such as the above mentioned and others such as PageRank and related measures.
Randomized Algorithms Robert Elsasser
Elsässer, Robert
Randomized Algorithms Robert Els¨asser 20. April 2011 Program of the day: · Computing shortest paths Robert Els¨asser Universit¨at Paderborn Randomized Algorithms SS 11 0 #12;2. Computing Distances2 n) using a randomized algorithms. Robert Els¨asser Universit¨at Paderborn Randomized Algorithms SS
Randomized Algorithms Robert Elsasser
Elsässer, Robert
Randomized Algorithms Robert Els¨asser 27. April 2011 Program of the day: · Computing shortest paths Robert Els¨asser Universit¨at Paderborn Randomized Algorithms SS 11 0 #12;2. Computing Distances2 n) using a randomized algorithms. Robert Els¨asser Universit¨at Paderborn Randomized Algorithms SS
Graph Algorithms Robert Elsasser
Elsässer, Robert
Graph Algorithms Robert Els¨asser 08 December 2011 Program of the day: · Algorithms in interconnection net- works Robert Els¨asser Universit¨at Paderborn Graph Algorithms WS 11/12 0 #12;Hypercube a Hamiltonian cycle. Robert Els¨asser Universit¨at Paderborn Graph Algorithms WS 11/12 1 #12;Broadcasting
Euclid's Algorithm Andreas Klappenecker
Klappenecker, Andreas
Euclid's Algorithm Andreas Klappenecker CPSC 629 Analysis of Algorithms We begin this course by revisiting an algorithm that is more than 2000 years old. The algorithm was described in Book VII of Euclid's Elements, but it is certainly much older. You might recall that Euclid was one of the first faculty members
SAT Algorithms Oliver Kullmann
Vollmer, Heribert
SAT Algorithms Oliver Kullmann CNFs Clause-sets, assignments Properties Data structures algorithms Oliver Kullmann Computer Science Department University of Wales Swansea Complexity of Constraints Dagstuhl, October 3, 2006 SAT algorithms for boolean CNFs: Local search and DPLL #12;SAT Algorithms Oliver
Library of Continuation Algorithms
Energy Science and Technology Software Center (ESTSC)
2005-03-01
LOCA (Library of Continuation Algorithms) is scientific software written in C++ that provides advanced analysis tools for nonlinear systems. In particular, it provides parameter continuation algorithms. bifurcation tracking algorithms, and drivers for linear stability analysis. The algorithms are aimed at large-scale applications that use Newton?s method for their nonlinear solve.
Geist, G.A.; Howell, G.W.; Watkins, D.S.
1997-11-01
The BR algorithm, a new method for calculating the eigenvalues of an upper Hessenberg matrix, is introduced. It is a bulge-chasing algorithm like the QR algorithm, but, unlike the QR algorithm, it is well adapted to computing the eigenvalues of the narrowband, nearly tridiagonal matrices generated by the look-ahead Lanczos process. This paper describes the BR algorithm and gives numerical evidence that it works well in conjunction with the Lanczos process. On the biggest problems run so far, the BR algorithm beats the QR algorithm by a factor of 30--60 in computing time and a factor of over 100 in matrix storage space.
Automated intensity descent algorithm for interpretation of complex high-resolution mass spectra.
Chen, Li; Sze, Siu Kwan; Yang, He
2006-07-15
This paper describes a new automated intensity descent algorithm for analysis of complex high-resolution mass spectra. The algorithm has been successfully applied to interpret Fourier transform mass spectra of proteins; however, it should be generally applicable to complex high-resolution mass spectra of large molecules recorded by other instruments. The algorithm locates all possible isotopic clusters by a novel peak selection method and a robust cluster subtraction technique according to the order of descending peak intensity after global noise level estimation and baseline correction. The peak selection method speeds up charge state determination and isotopic cluster identification. A Lorentzian-based peak subtraction technique resolves overlapping clusters in high peak density regions. A noise flag value is introduced to minimize false positive isotopic clusters. Moreover, correlation coefficients and matching errors between the identified isotopic multiplets and the averagine isotopic abundance distribution are the criteria for real isotopic clusters. The best fitted averagine isotopic abundance distribution of each isotopic cluster determines the charge state and the monoisotopic mass. Three high-resolution mass spectra were interpreted by the program. The results show that the algorithm is fast in computational speed, robust in identification of overlapping clusters, and efficient in minimization of false positives. In approximately 2 min, the program identified 611 isotopic clusters for a plasma ECD spectrum of carbonic anhydrase. Among them, 50 new identified isotopic clusters, which were missed previously by other methods, have been discovered in the high peak density regions or as weak clusters by this algorithm. As a result, 18 additional new bond cleavages have been identified from the 50 new clusters of carbonic anhydrase. PMID:16841924
Vanishing point detection based on an artificial bee colony algorithm
NASA Astrophysics Data System (ADS)
Han, Lei; Fan, Tanghuai; Zheng, Shengnan; Huang, Chenrong
2015-05-01
Vanishing points (VPs) are crucial for inferring the three-dimensional structure of a scene and can be exploited in various computer vision applications. Previous VP detection algorithms have been proven effective but generally cannot guarantee a strong performance in both accuracy and computational time. We propose an artificial bee colony algorithm called dynamic clustering artificial bee colony (DCABC) that accurately and efficiently detects VPs in the image plane. The task is regarded as a dynamic line-clustering problem, and the line clusters are initialized by their orientation information. Inspired by the foraging behavior of bees, DCABC selects the clustering center and reclassifies the line segments based on a distance criterion until the terminating condition is met. The optimal line clusters determine the estimated VP. The dissimilarity among solutions is measured by the Hamming distance between two binary vectors, which simplifies the new solution construction. The performances of the proposed and existing algorithms are evaluated on the York Urban database. The results verify the efficiency and accuracy of our proposed algorithm.
Cluster fusion-fission dynamics in the Singapore stock exchange
NASA Astrophysics Data System (ADS)
Teh, Boon Kin; Cheong, Siew Ann
2015-10-01
In this paper, we investigate how the cross-correlations between stocks in the Singapore stock exchange (SGX) evolve over 2008 and 2009 within overlapping one-month time windows. In particular, we examine how these cross-correlations change before, during, and after the Sep-Oct 2008 Lehman Brothers Crisis. To do this, we extend the complete-linkage hierarchical clustering algorithm, to obtain robust clusters of stocks with stronger intracluster correlations, and weaker intercluster correlations. After we identify the robust clusters in all time windows, we visualize how these change in the form of a fusion-fission diagram. Such a diagram depicts graphically how the cluster sizes evolve, the exchange of stocks between clusters, as well as how strongly the clusters mix. From the fusion-fission diagram, we see a giant cluster growing and disintegrating in the SGX, up till the Lehman Brothers Crisis in September 2008 and the market crashes of October 2008. After the Lehman Brothers Crisis, clusters in the SGX remain small for few months before giant clusters emerge once again. In the aftermath of the crisis, we also find strong mixing of component stocks between clusters. As a result, the correlation between initially strongly-correlated pairs of stocks decay exponentially with average life time of about a month. These observations impact strongly how portfolios and trading strategies should be formulated.
Similarity-Driven Cluster Merging Method for Unsupervised Fuzzy Clustering
Xiong, Xuejian
In this paper, a similarity-driven cluster merging method is proposed for unsupervised fuzzy clustering. The cluster merging method is used to resolve the problem of cluster validation. Starting with an overspecified number ...
An incremental DPMM-based method for trajectory clustering, modeling, and retrieval.
Hu, Weiming; Li, Xi; Tian, Guodong; Maybank, Stephen; Zhang, Zhongfei
2013-05-01
Trajectory analysis is the basis for many applications, such as indexing of motion events in videos, activity recognition, and surveillance. In this paper, the Dirichlet process mixture model (DPMM) is applied to trajectory clustering, modeling, and retrieval. We propose an incremental version of a DPMM-based clustering algorithm and apply it to cluster trajectories. An appropriate number of trajectory clusters is determined automatically. When trajectories belonging to new clusters arrive, the new clusters can be identified online and added to the model without any retraining using the previous data. A time-sensitive Dirichlet process mixture model (tDPMM) is applied to each trajectory cluster for learning the trajectory pattern which represents the time-series characteristics of the trajectories in the cluster. Then, a parameterized index is constructed for each cluster. A novel likelihood estimation algorithm for the tDPMM is proposed, and a trajectory-based video retrieval model is developed. The tDPMM-based probabilistic matching method and the DPMM-based model growing method are combined to make the retrieval model scalable and adaptable. Experimental comparisons with state-of-the-art algorithms demonstrate the effectiveness of our algorithm. PMID:23520251
The theory of variational hybrid quantum-classical algorithms
McClean, Jarrod R; Babbush, Ryan; Aspuru-Guzik, Alán
2015-01-01
Many quantum algorithms have daunting resource requirements when compared to what is available today. To address this discrepancy, a quantum-classical hybrid optimization scheme known as "the quantum variational eigensolver" was developed with the philosophy that even minimal quantum resources could be made useful when used in conjunction with classical routines. In this work we extend the general theory of this algorithm and suggest algorithmic improvements for practical implementations. Specifically, we develop a variational adiabatic ansatz and explore unitary coupled cluster where we establish a connection from second order unitary coupled cluster to universal gate sets through relaxation of exponential splitting. We introduce the concept of quantum variational error suppression that allows some errors to be suppressed naturally in this algorithm on a pre-threshold quantum device. Additionally, we analyze truncation and correlated sampling in Hamiltonian averaging as ways to reduce the cost of this proced...
Revisiting k-means: New Algorithms via Bayesian Nonparametrics
Kulis, Brian
2011-01-01
One of the many benefits of Bayesian nonparametric processes such as the Dirichlet process is that they can be used for modeling infinite mixture models, thus providing a flexible answer to the question of how many clusters exist in a data set. For the most part, such flexibility is currently lacking in techniques based on hard clustering, such as k-means, graph cuts, and Bregman hard clustering. For finite mixture models, there is a precise connection between k-means and mixtures of Gaussians, obtained by an appropriate limiting argument. In this paper, we apply a similar technique to an infinite mixture arising from the Dirichlet process (DP). We show that a Gibbs sampling algorithm for DP mixtures approaches a hard clustering algorithm in the limit, and further that the resulting algorithm monotonically minimizes an elegant underlying k-means-like objective that includes a penalty term based on the number of clusters. We generalize our analysis to the case of clustering multiple related data sets through a...
SAR image change detection using watershed and spectral clustering
NASA Astrophysics Data System (ADS)
Niu, Ruican; Jiao, L. C.; Wang, Guiting; Feng, Jie
2011-12-01
A new method of change detection in SAR images based on spectral clustering is presented in this paper. Spectral clustering is employed to extract change information from a pair images acquired on the same geographical area at different time. Watershed transform is applied to initially segment the big image into non-overlapped local regions, leading to reduce the complexity. Experiments results and system analysis confirm the effectiveness of the proposed algorithm.
Skin cells segmentation algorithm based on spectral angle and distance score
NASA Astrophysics Data System (ADS)
Li, Qingli; Chang, Li; Liu, Hongying; Zhou, Mei; Wang, Yiting; Guo, Fangmin
2015-11-01
In the diagnosis of skin diseases by analyzing histopathological images of skin sections, the automated segmentation of cells in the epidermis area is an important step. Light microscopy based traditional methods usually cannot generate satisfying segmentation results due to complicated skin structures and limited information of this kind of image. In this study, we use a molecular hyperspectral imaging system to observe skin sections and propose a spectral based algorithm to segment epithelial cells. Unlike pixel-wise segmentation methods, the proposed algorithm considers both the spectral angle and the distance score between the test and the reference spectrum for segmentation. The experimental results indicate that the proposed algorithm performs better than the K-Means, fuzzy C-means, and spectral angle mapper algorithms because it can identify pixels with similar spectral angle but a different spectral distance.
NASA Astrophysics Data System (ADS)
Qureshi, Kalim; Rashid, Haroon
In this paper we practically compared the performance of three blocked based parallel multiplication algorithms with simple fixed size runtime task scheduling strategy on homogeneous cluster of workstations. Parallel Virtual Machine (PVM) was used for this study.
Blocked inverted indices for exact clustering of large chemical spaces.
Thiel, Philipp; Sach-Peltason, Lisa; Ottmann, Christian; Kohlbacher, Oliver
2014-09-22
The calculation of pairwise compound similarities based on fingerprints is one of the fundamental tasks in chemoinformatics. Methods for efficient calculation of compound similarities are of the utmost importance for various applications like similarity searching or library clustering. With the increasing size of public compound databases, exact clustering of these databases is desirable, but often computationally prohibitively expensive. We present an optimized inverted index algorithm for the calculation of all pairwise similarities on 2D fingerprints of a given data set. In contrast to other algorithms, it neither requires GPU computing nor yields a stochastic approximation of the clustering. The algorithm has been designed to work well with multicore architectures and shows excellent parallel speedup. As an application example of this algorithm, we implemented a deterministic clustering application, which has been designed to decompose virtual libraries comprising tens of millions of compounds in a short time on current hardware. Our results show that our implementation achieves more than 400 million Tanimoto similarity calculations per second on a common desktop CPU. Deterministic clustering of the available chemical space thus can be done on modern multicore machines within a few days. PMID:25136755
Parallel Algorithms IV Topics: image analysis algorithms
Balasubramonian, Rajeev
the width of a pixel #12;9 Algorithm #12;10 Example #12;11 Convex Hull Â· For a set of points S in a plane, the convex hull is the smallest convex polygon that contains all points in S #12;12 Algorithm on an N-Cell Linear Array #12;13 Example for Lower Hull Point #12;14 Algorithm Complexity #12;15 Reducing Complexity
Extending applicability of cluster based pattern recognition with efficient approximation techniques
Martinez, R.F.; Osbourn, G.C.
1997-03-01
The fundamental goal of this research has been to improve computational efficiency of the Visually Empirical Region of Influence (VERI) based clustering and pattern recognition (PR) algorithms we developed in previous work. The original clustering algorithm, when applied to data sets with N points, ran in time proportional to N{sup 3} (denoted with the notation O (N{sup 3})), which limited the size of data sets it could find solutions for. Results generated from our original clustering algorithm were superior to commercial clustering packages. These results warranted our efforts to improve the runtimes of our algorithms. This report describes the new algorithms, advances and obstacles met in their development. The report gives qualitative and quantitative analysis of the improved algorithms performances. With the information in this report, an interested user can determine which algorithm is best for a given problem in clustering (2-D) or PR (K-D), and can estimate how long it will run using the runtime plots of the algorithms before using any software.
Long-term surface EMG monitoring using K-means clustering and compressive sensing
NASA Astrophysics Data System (ADS)
Balouchestani, Mohammadreza; Krishnan, Sridhar
2015-05-01
In this work, we present an advanced K-means clustering algorithm based on Compressed Sensing theory (CS) in combination with the K-Singular Value Decomposition (K-SVD) method for Clustering of long-term recording of surface Electromyography (sEMG) signals. The long-term monitoring of sEMG signals aims at recording of the electrical activity produced by muscles which are very useful procedure for treatment and diagnostic purposes as well as for detection of various pathologies. The proposed algorithm is examined for three scenarios of sEMG signals including healthy person (sEMG-Healthy), a patient with myopathy (sEMG-Myopathy), and a patient with neuropathy (sEMG-Neuropathr), respectively. The proposed algorithm can easily scan large sEMG datasets of long-term sEMG recording. We test the proposed algorithm with Principal Component Analysis (PCA) and Linear Correlation Coefficient (LCC) dimensionality reduction methods. Then, the output of the proposed algorithm is fed to K-Nearest Neighbours (K-NN) and Probabilistic Neural Network (PNN) classifiers in order to calclute the clustering performance. The proposed algorithm achieves a classification accuracy of 99.22%. This ability allows reducing 17% of Average Classification Error (ACE), 9% of Training Error (TE), and 18% of Root Mean Square Error (RMSE). The proposed algorithm also reduces 14% clustering energy consumption compared to the existing K-Means clustering algorithm.
Simulation of dynamic communication clusters in a factory environment
NASA Astrophysics Data System (ADS)
Isomoto, Kazunori; Johnson, Neil; Okada, Saburo; Fujita, Satoshi; Yamashita, Masafumi
1998-10-01
In a previous paper we proposed a new kind of factory system that uses mobile robots to transport parts and materials between assembly stations. In this system, which we call an `assembly network,' the factory's assembly stations operate independently of a centralized production plan and negotiate with each other (via message passing) for access to robots. Using this assembly network as a testbed, we are developing distributed algorithms to allow the stations to organize themselves into small `communication clusters.' In this paper we will describe the current clustering algorithm that we are using in more detail than in the previous paper.
Egocentric daily activity recognition via multitask clustering.
Yan, Yan; Ricci, Elisa; Liu, Gaowen; Sebe, Nicu
2015-10-01
Recognizing human activities from videos is a fundamental research problem in computer vision. Recently, there has been a growing interest in analyzing human behavior from data collected with wearable cameras. First-person cameras continuously record several hours of their wearers' life. To cope with this vast amount of unlabeled and heterogeneous data, novel algorithmic solutions are required. In this paper, we propose a multitask clustering framework for activity of daily living analysis from visual data gathered from wearable cameras. Our intuition is that, even if the data are not annotated, it is possible to exploit the fact that the tasks of recognizing everyday activities of multiple individuals are related, since typically people perform the same actions in similar environments, e.g., people working in an office often read and write documents). In our framework, rather than clustering data from different users separately, we propose to look for clustering partitions which are coherent among related tasks. In particular, two novel multitask clustering algorithms, derived from a common optimization problem, are introduced. Our experimental evaluation, conducted both on synthetic data and on publicly available first-person vision data sets, shows that the proposed approach outperforms several single-task and multitask learning methods. PMID:26067371
Multi-robot Patrol via the Metropolis-Hastings Algorithm
Edwards, Matthew Ryan
2014-09-17
and weaknesses for a variety of approaches to the multi-robot patrolling problem, ranging from randomized and partitioning algorithms to approaches using heuristics and the concept of idleness. Objectives In this project, we propose the generation... and weak clusters of nodes, and 3) approximation to a distribution with strong and weak individual nodes (lines 12-20). These three Markov chain strategies are heavily influenced by the second input of Algorithm 4, which is the desired stationary...
Incremental Clustering Based on Swarm Intelligence Bo Liu , Jiuhui Pan Bob McKay
McKay, Robert Ian
, and maintaining clusters with a changing grid. The algorithm applies information entropy to model behaviors. Each agent computes the information entropy or pheromone concentration of the area surrounding it
Multi-model modeling and its application of urban sewage treatment based on clustering analysis
NASA Astrophysics Data System (ADS)
Ping, Y. U.
2015-07-01
For urban sewage treatment processes, a single model suffers from heavy burden calculation and poor accuracy. A modeling method founded on an advanced supervised k-means clustering algorithm is proposed. The method introduced in this paper is the cluster center initialization idea of CCIA algorithm into a classical k-means clustering algorithm is applied to corral the data into clusters or second clustering by judging a pre-set threshold value, and the least squares method is applied to construct the ARX sub-models, and the system model is then constructed by weighing all ARX sub-models. The proposed method is used to determine the ammonia concentration model for wastewater treatment system benchmark and the actual WWTP practice. Simulation and application research results show that the proposed method can be effectively used to fit the nonlinear characteristics of the system with high precision.
Cluster approach for quasicrystals
Jeong, H.; Steinhardt, P.J. )
1994-10-03
We consider a cluster picture'' to explain the structure and formation of quasicrystals. Quasicrystal ordering is attributed to a small set of low-energy atomic clusters which determine the state of minimum free energy. The nature of the ground state as [ital T][r arrow]0, depending upon the clusters and cluster energies, ranges from energetically stable and unique (as in Penrose tilings), to entropically stable and degenerate (as in random tilings).
Fast deterministic algorithm for EEE components classification
NASA Astrophysics Data System (ADS)
Kazakovtsev, L. A.; Antamoshkin, A. N.; Masich, I. S.
2015-10-01
Authors consider the problem of automatic classification of the electronic, electrical and electromechanical (EEE) components based on results of the test control. Electronic components of the same type used in a high- quality unit must be produced as a single production batch from a single batch of the raw materials. Data of the test control are used for splitting a shipped lot of the components into several classes representing the production batches. Methods such as k-means++ clustering or evolutionary algorithms combine local search and random search heuristics. The proposed fast algorithm returns a unique result for each data set. The result is comparatively precise. If the data processing is performed by the customer of the EEE components, this feature of the algorithm allows easy checking of the results by a producer or supplier.
Intergalactic Globular Clusters
Michael J. West; Patrick Cote; Henry C. Ferguson; Michael D. Gregg; Andres Jordan; Ronald O. Marzke; Nial R. Tanvir; Ted von Hippel
2003-09-23
We confirm and extend our previous detection of a population of intergalactic globular clusters in Abell 1185, and report the first discovery of an intergalactic globular cluster in the nearby Virgo cluster of galaxies. The numbers, colors and luminosities of these objects can place constraints on their origin, which in turn may yield new insights to the evolution of galaxies in dense environments.
Analyse d'algorithmes L'algorithme d'Euclide
Maume-Deschamps, Véronique
Analyse d'algorithmes L'algorithme d'Euclide Algorithmes de calcul de pgcd rapides Appendice Propriétés probabilistes de l'algorithme d'Euclide ; algorithmes rapides de calcul de pgcd. Véronique Maume Financière et d'Assurances (ISFA), Lyon 1. 1 / 70 #12;Analyse d'algorithmes L'algorithme d'Euclide