An Improved Fuzzy c-Means Clustering Algorithm Based on Shadowed Sets and PSO
Zhang, Jian; Shen, Ling
2014-01-01
To organize the wide variety of data sets automatically and acquire accurate classification, this paper presents a modified fuzzy c-means algorithm (SP-FCM) based on particle swarm optimization (PSO) and shadowed sets to perform feature clustering. SP-FCM introduces the global search property of PSO to deal with the problem of premature convergence of conventional fuzzy clustering, utilizes vagueness balance property of shadowed sets to handle overlapping among clusters, and models uncertainty in class boundaries. This new method uses Xie-Beni index as cluster validity and automatically finds the optimal cluster number within a specific range with cluster partitions that provide compact and well-separated clusters. Experiments show that the proposed approach significantly improves the clustering effect. PMID:25477953
NASA Astrophysics Data System (ADS)
Liu, Lifeng; Sun, Sam Zandong; Yu, Hongyu; Yue, Xingtong; Zhang, Dong
2016-06-01
Considering the fact that the fluid distribution in carbonate reservoir is very complicated and the existing fluid prediction methods are not able to produce ideal predicted results, this paper proposes a new fluid identification method in carbonate reservoir based on the modified Fuzzy C-Means (FCM) Clustering algorithm. Both initialization and globally optimum cluster center are produced by Chaotic Quantum Particle Swarm Optimization (CQPSO) algorithm, which can effectively avoid the disadvantage of sensitivity to initial values and easily falling into local convergence in the traditional FCM Clustering algorithm. Then, the modified algorithm is applied to fluid identification in the carbonate X area in Tarim Basin of China, and a mapping relation between fluid properties and pre-stack elastic parameters will be built in multi-dimensional space. It has been proven that this modified algorithm has a good ability of fuzzy cluster and its total coincidence rate of fluid prediction reaches 97.10%. Besides, the membership of different fluids can be accumulated to obtain respective probability, which can evaluate the uncertainty in fluid identification result.
NASA Astrophysics Data System (ADS)
Wang, Deguang; Han, Baochang; Huang, Ming
Computer forensics is the technology of applying computer technology to access, investigate and analysis the evidence of computer crime. It mainly include the process of determine and obtain digital evidence, analyze and take data, file and submit result. And the data analysis is the key link of computer forensics. As the complexity of real data and the characteristics of fuzzy, evidence analysis has been difficult to obtain the desired results. This paper applies fuzzy c-means clustering algorithm based on particle swarm optimization (FCMP) in computer forensics, and it can be more satisfactory results.
A Genetic Algorithm That Exchanges Neighboring Centers for Fuzzy c-Means Clustering
ERIC Educational Resources Information Center
Chahine, Firas Safwan
2012-01-01
Clustering algorithms are widely used in pattern recognition and data mining applications. Due to their computational efficiency, partitional clustering algorithms are better suited for applications with large datasets than hierarchical clustering algorithms. K-means is among the most popular partitional clustering algorithm, but has a major…
Study on 2D random medium inversion algorithm based on Fuzzy C-means Clustering theory
NASA Astrophysics Data System (ADS)
Xu, Z.; Zhu, P.; Gu, Y.; Yang, X.; Jiang, J.
2015-12-01
Abstract: In seismic exploration for metal deposits, the traditional seismic inversion method based on layered homogeneous medium theory seems difficult to inverse small scale inhomogeneity and spatial variation of the actual medium. The reason is that physical properties of actual medium are more likely random distribution rather than layered. Thus, it is necessary to investigate a random medium inversion algorithm. The velocity of 2D random medium can be described as a function of five parameters: the background velocity (V0), the standard deviation of velocity (σ), the horizontal and vertical autocorrelation lengths (A and B), and the autocorrelation angle (θ). In this study, we propose an inversion algorithm for random medium based on the Fuzzy C-means Clustering (FCM) theory, whose basic idea is that FCM is used to control the inversion process to move forward to the direction we desired by clustering the estimated parameters into groups. Our method can be divided into three steps: firstly, the three parameters (A, B, θ) are estimated from 2D post-stack seismic data using the non-stationary random medium parameter estimation method, and then the estimated parameters are clustered to different groups according to FCM; secondly, the initial random medium model is constructed with clustered groups and the rest two parameters (V0 and σ) obtained from the well logging data; at last, inversion of the random medium are conducted to obtain velocity, impedance and random medium parameters using the Conjugate Gradient Method. The inversion experiments of synthetic seismic data show that the velocity models inverted by our algorithm are close to the real velocity distribution and the boundary of different media can be distinguished clearly.Key words: random medium, inversion, FCM, parameter estimation
NASA Astrophysics Data System (ADS)
Abdul-Nasir, Aimi Salihah; Mashor, Mohd Yusoff; Halim, Nurul Hazwani Abd; Mohamed, Zeehaida
2015-05-01
Malaria is a life-threatening parasitic infectious disease that corresponds for nearly one million deaths each year. Due to the requirement of prompt and accurate diagnosis of malaria, the current study has proposed an unsupervised pixel segmentation based on clustering algorithm in order to obtain the fully segmented red blood cells (RBCs) infected with malaria parasites based on the thin blood smear images of P. vivax species. In order to obtain the segmented infected cell, the malaria images are first enhanced by using modified global contrast stretching technique. Then, an unsupervised segmentation technique based on clustering algorithm has been applied on the intensity component of malaria image in order to segment the infected cell from its blood cells background. In this study, cascaded moving k-means (MKM) and fuzzy c-means (FCM) clustering algorithms has been proposed for malaria slide image segmentation. After that, median filter algorithm has been applied to smooth the image as well as to remove any unwanted regions such as small background pixels from the image. Finally, seeded region growing area extraction algorithm has been applied in order to remove large unwanted regions that are still appeared on the image due to their size in which cannot be cleaned by using median filter. The effectiveness of the proposed cascaded MKM and FCM clustering algorithms has been analyzed qualitatively and quantitatively by comparing the proposed cascaded clustering algorithm with MKM and FCM clustering algorithms. Overall, the results indicate that segmentation using the proposed cascaded clustering algorithm has produced the best segmentation performances by achieving acceptable sensitivity as well as high specificity and accuracy values compared to the segmentation results provided by MKM and FCM algorithms.
Ma, Li; Li, Yang; Fan, Suohai; Fan, Runzhu
2015-01-01
Image segmentation plays an important role in medical image processing. Fuzzy c-means (FCM) clustering is one of the popular clustering algorithms for medical image segmentation. However, FCM has the problems of depending on initial clustering centers, falling into local optimal solution easily, and sensitivity to noise disturbance. To solve these problems, this paper proposes a hybrid artificial fish swarm algorithm (HAFSA). The proposed algorithm combines artificial fish swarm algorithm (AFSA) with FCM whose advantages of global optimization searching and parallel computing ability of AFSA are utilized to find a superior result. Meanwhile, Metropolis criterion and noise reduction mechanism are introduced to AFSA for enhancing the convergence rate and antinoise ability. The artificial grid graph and Magnetic Resonance Imaging (MRI) are used in the experiments, and the experimental results show that the proposed algorithm has stronger antinoise ability and higher precision. A number of evaluation indicators also demonstrate that the effect of HAFSA is more excellent than FCM and suppressed FCM (SFCM). PMID:26649068
Ma, Li; Li, Yang; Fan, Suohai; Fan, Runzhu
2015-01-01
Image segmentation plays an important role in medical image processing. Fuzzy c-means (FCM) clustering is one of the popular clustering algorithms for medical image segmentation. However, FCM has the problems of depending on initial clustering centers, falling into local optimal solution easily, and sensitivity to noise disturbance. To solve these problems, this paper proposes a hybrid artificial fish swarm algorithm (HAFSA). The proposed algorithm combines artificial fish swarm algorithm (AFSA) with FCM whose advantages of global optimization searching and parallel computing ability of AFSA are utilized to find a superior result. Meanwhile, Metropolis criterion and noise reduction mechanism are introduced to AFSA for enhancing the convergence rate and antinoise ability. The artificial grid graph and Magnetic Resonance Imaging (MRI) are used in the experiments, and the experimental results show that the proposed algorithm has stronger antinoise ability and higher precision. A number of evaluation indicators also demonstrate that the effect of HAFSA is more excellent than FCM and suppressed FCM (SFCM). PMID:26649068
NASA Astrophysics Data System (ADS)
Choi, Hon-Chit; Wen, Lingfeng; Eberl, Stefan; Feng, Dagan
2006-03-01
Dynamic Single Photon Emission Computed Tomography (SPECT) has the potential to quantitatively estimate physiological parameters by fitting compartment models to the tracer kinetics. The generalized linear least square method (GLLS) is an efficient method to estimate unbiased kinetic parameters and parametric images. However, due to the low sensitivity of SPECT, noisy data can cause voxel-wise parameter estimation by GLLS to fail. Fuzzy C-Mean (FCM) clustering and modified FCM, which also utilizes information from the immediate neighboring voxels, are proposed to improve the voxel-wise parameter estimation of GLLS. Monte Carlo simulations were performed to generate dynamic SPECT data with different noise levels and processed by general and modified FCM clustering. Parametric images were estimated by Logan and Yokoi graphical analysis and GLLS. The influx rate (K I), volume of distribution (V d) were estimated for the cerebellum, thalamus and frontal cortex. Our results show that (1) FCM reduces the bias and improves the reliability of parameter estimates for noisy data, (2) GLLS provides estimates of micro parameters (K I-k 4) as well as macro parameters, such as volume of distribution (Vd) and binding potential (BP I & BP II) and (3) FCM clustering incorporating neighboring voxel information does not improve the parameter estimates, but improves noise in the parametric images. These findings indicated that it is desirable for pre-segmentation with traditional FCM clustering to generate voxel-wise parametric images with GLLS from dynamic SPECT data.
NASA Astrophysics Data System (ADS)
Ayvaz, M. Tamer
2007-11-01
This study proposes an inverse solution algorithm through which both the aquifer parameters and the zone structure of these parameters can be determined based on a given set of observations on piezometric heads. In the zone structure identification problem fuzzy c-means ( FCM) clustering method is used. The association of the zone structure with the transmissivity distribution is accomplished through an optimization model. The meta-heuristic harmony search ( HS) algorithm, which is conceptualized using the musical process of searching for a perfect state of harmony, is used as an optimization technique. The optimum parameter zone structure is identified based on three criteria which are the residual error, parameter uncertainty, and structure discrimination. A numerical example given in the literature is solved to demonstrate the performance of the proposed algorithm. Also, a sensitivity analysis is performed to test the performance of the HS algorithm for different sets of solution parameters. Results indicate that the proposed solution algorithm is an effective way in the simultaneous identification of aquifer parameters and their corresponding zone structures.
Fuzzy C-Means Clustering and Energy Efficient Cluster Head Selection for Cooperative Sensor Network.
Bhatti, Dost Muhammad Saqib; Saeed, Nasir; Nam, Haewoon
2016-01-01
We propose a novel cluster based cooperative spectrum sensing algorithm to save the wastage of energy, in which clusters are formed using fuzzy c-means (FCM) clustering and a cluster head (CH) is selected based on a sensor's location within each cluster, its location with respect to fusion center (FC), its signal-to-noise ratio (SNR) and its residual energy. The sensing information of a single sensor is not reliable enough due to shadowing and fading. To overcome these issues, cooperative spectrum sensing schemes were proposed to take advantage of spatial diversity. For cooperative spectrum sensing, all sensors sense the spectrum and report the sensed energy to FC for the final decision. However, it increases the energy consumption of the network when a large number of sensors need to cooperate; in addition to that, the efficiency of the network is also reduced. The proposed algorithm makes the cluster and selects the CHs such that very little amount of network energy is consumed and the highest efficiency of the network is achieved. Using the proposed algorithm maximum probability of detection under an imperfect channel is accomplished with minimum energy consumption as compared to conventional clustering schemes. PMID:27618061
NASA Astrophysics Data System (ADS)
Akinin, M. V.; Akinina, N. V.; Klochkov, A. Y.; Nikiforov, M. B.; Sokolova, A. V.
2015-05-01
The report reviewed the algorithm fuzzy c-means, performs image segmentation, give an estimate of the quality of his work on the criterion of Xie-Beni, contain the results of experimental studies of the algorithm in the context of solving the problem of drawing up detailed two-dimensional maps with the use of unmanned aerial vehicles. According to the results of the experiment concluded that the possibility of applying the algorithm in problems of decoding images obtained as a result of aerial photography. The considered algorithm can significantly break the original image into a plurality of segments (clusters) in a relatively short period of time, which is achieved by modification of the original k-means algorithm to work in a fuzzy task.
Elazab, Ahmed; AbdulAzeem, Yousry M; Wu, Shiqian; Hu, Qingmao
2016-03-17
Brain tissue segmentation from magnetic resonance (MR) images is an importance task for clinical use. The segmentation process becomes more challenging in the presence of noise, grayscale inhomogeneity, and other image artifacts. In this paper, we propose a robust kernelized local information fuzzy C-means clustering algorithm (RKLIFCM). It incorporates local information into the segmentation process (both grayscale and spatial) for more homogeneous segmentation. In addition, the Gaussian radial basis kernel function is adopted as a distance metric to replace the standard Euclidean distance. The main advantages of the new algorithm are: efficient utilization of local grayscale and spatial information, robustness to noise, ability to preserve image details, free from any parameter initialization, and with high speed as it runs on image histogram. We compared the proposed algorithm with 7 soft clustering algorithms that run on both image histogram and image pixels to segment brain MR images. Experimental results demonstrate that the proposed RKLIFCM algorithm is able to overcome the influence of noise and achieve higher segmentation accuracy with low computational complexity. PMID:27257884
NASA Astrophysics Data System (ADS)
Kentel, Elcin
2010-05-01
Estimating future river flows is essential in water resources planning and management. Artificial neural network (ANN) models have been extensively utilized for rainfall-runoff modeling in the last decade. One of the major weaknesses of artificial neural network models is that they may fail to generate good estimates for extreme events, i.e. events that do not occur at all or often enough in the training set. If reliable estimates can be distinguished from unreliable ones, the former can be used with greater confidence in planning and management of the water resources. A fuzzy c-means algorithm is developed to cluster the estimates of the artificial neural networks into reliable and less-reliable river flow values (Kentel, 2009). The proposed algorithm is only tested for a single case (i.e. Güvenç River, Turkey) and produced promising results. In this study, applicability of the fuzzy c-means algorithm for different catchments in Turkey is tested. Three flow gauging stations are selected at four different catchments in mid and south Turkey. First, an ANN is developed for each gauging station; then fuzzy c-means algorithm is used together with the outputs of ANN models to test the success of the clustering algorithm in identifying input vectors that are susceptible to produce unreliable estimates. Results obtained for 12 gauging stations are used to identify the drawbacks of fuzzy c-means algorithm and to suggest modifications to improve the algorithm. Key words: Future river flow estimation; Artificial Neural Network; fuzzy c-means clustering Kentel, E. (2009) "Estimation of River Flow by Artificial Neural Networks and Identification of Input Vectors Susceptible to Producing Unreliable Flow Estimates," Journal of Hydrology, 375, 481-488.
Elazab, Ahmed; Wang, Changmiao; Jia, Fucang; Wu, Jianhuang; Li, Guanglin; Hu, Qingmao
2015-01-01
An adaptively regularized kernel-based fuzzy C-means clustering framework is proposed for segmentation of brain magnetic resonance images. The framework can be in the form of three algorithms for the local average grayscale being replaced by the grayscale of the average filter, median filter, and devised weighted images, respectively. The algorithms employ the heterogeneity of grayscales in the neighborhood and exploit this measure for local contextual information and replace the standard Euclidean distance with Gaussian radial basis kernel functions. The main advantages are adaptiveness to local context, enhanced robustness to preserve image details, independence of clustering parameters, and decreased computational costs. The algorithms have been validated against both synthetic and clinical magnetic resonance images with different types and levels of noises and compared with 6 recent soft clustering algorithms. Experimental results show that the proposed algorithms are superior in preserving image details and segmentation accuracy while maintaining a low computational complexity. PMID:26793269
Wang, Changmiao; Jia, Fucang; Wu, Jianhuang; Li, Guanglin
2015-01-01
An adaptively regularized kernel-based fuzzy C-means clustering framework is proposed for segmentation of brain magnetic resonance images. The framework can be in the form of three algorithms for the local average grayscale being replaced by the grayscale of the average filter, median filter, and devised weighted images, respectively. The algorithms employ the heterogeneity of grayscales in the neighborhood and exploit this measure for local contextual information and replace the standard Euclidean distance with Gaussian radial basis kernel functions. The main advantages are adaptiveness to local context, enhanced robustness to preserve image details, independence of clustering parameters, and decreased computational costs. The algorithms have been validated against both synthetic and clinical magnetic resonance images with different types and levels of noises and compared with 6 recent soft clustering algorithms. Experimental results show that the proposed algorithms are superior in preserving image details and segmentation accuracy while maintaining a low computational complexity. PMID:26793269
SPEQTACLE: An automated generalized fuzzy C-means algorithm for tumor delineation in PET
Lapuyade-Lahorgue, Jérôme; Visvikis, Dimitris; Hatt, Mathieu; Pradier, Olivier; Cheze Le Rest, Catherine
2015-10-15
Purpose: Accurate tumor delineation in positron emission tomography (PET) images is crucial in oncology. Although recent methods achieved good results, there is still room for improvement regarding tumors with complex shapes, low signal-to-noise ratio, and high levels of uptake heterogeneity. Methods: The authors developed and evaluated an original clustering-based method called spatial positron emission quantification of tumor—Automatic Lp-norm estimation (SPEQTACLE), based on the fuzzy C-means (FCM) algorithm with a generalization exploiting a Hilbertian norm to more accurately account for the fuzzy and non-Gaussian distributions of PET images. An automatic and reproducible estimation scheme of the norm on an image-by-image basis was developed. Robustness was assessed by studying the consistency of results obtained on multiple acquisitions of the NEMA phantom on three different scanners with varying acquisition parameters. Accuracy was evaluated using classification errors (CEs) on simulated and clinical images. SPEQTACLE was compared to another FCM implementation, fuzzy local information C-means (FLICM) and fuzzy locally adaptive Bayesian (FLAB). Results: SPEQTACLE demonstrated a level of robustness similar to FLAB (variability of 14% ± 9% vs 14% ± 7%, p = 0.15) and higher than FLICM (45% ± 18%, p < 0.0001), and improved accuracy with lower CE (14% ± 11%) over both FLICM (29% ± 29%) and FLAB (22% ± 20%) on simulated images. Improvement was significant for the more challenging cases with CE of 17% ± 11% for SPEQTACLE vs 28% ± 22% for FLAB (p = 0.009) and 40% ± 35% for FLICM (p < 0.0001). For the clinical cases, SPEQTACLE outperformed FLAB and FLICM (15% ± 6% vs 37% ± 14% and 30% ± 17%, p < 0.004). Conclusions: SPEQTACLE benefitted from the fully automatic estimation of the norm on a case-by-case basis. This promising approach will be extended to multimodal images and multiclass estimation in future developments.
Automatic online spike sorting with singular value decomposition and fuzzy C-mean clustering
2012-01-01
Background Understanding how neurons contribute to perception, motor functions and cognition requires the reliable detection of spiking activity of individual neurons during a number of different experimental conditions. An important problem in computational neuroscience is thus to develop algorithms to automatically detect and sort the spiking activity of individual neurons from extracellular recordings. While many algorithms for spike sorting exist, the problem of accurate and fast online sorting still remains a challenging issue. Results Here we present a novel software tool, called FSPS (Fuzzy SPike Sorting), which is designed to optimize: (i) fast and accurate detection, (ii) offline sorting and (iii) online classification of neuronal spikes with very limited or null human intervention. The method is based on a combination of Singular Value Decomposition for fast and highly accurate pre-processing of spike shapes, unsupervised Fuzzy C-mean, high-resolution alignment of extracted spike waveforms, optimal selection of the number of features to retain, automatic identification the number of clusters, and quantitative quality assessment of resulting clusters independent on their size. After being trained on a short testing data stream, the method can reliably perform supervised online classification and monitoring of single neuron activity. The generalized procedure has been implemented in our FSPS spike sorting software (available free for non-commercial academic applications at the address: http://www.spikesorting.com) using LabVIEW (National Instruments, USA). We evaluated the performance of our algorithm both on benchmark simulated datasets with different levels of background noise and on real extracellular recordings from premotor cortex of Macaque monkeys. The results of these tests showed an excellent accuracy in discriminating low-amplitude and overlapping spikes under strong background noise. The performance of our method is competitive with respect to
Tsantis, Stavros; Spiliopoulos, Stavros; Karnabatidis, Dimitrios; Skouroliakou, Aikaterini; Hazle, John D.; Kagadis, George C. E-mail: George.Kagadis@med.upatras.gr
2014-07-15
Purpose: Speckle suppression in ultrasound (US) images of various anatomic structures via a novel speckle noise reduction algorithm. Methods: The proposed algorithm employs an enhanced fuzzy c-means (EFCM) clustering and multiresolution wavelet analysis to distinguish edges from speckle noise in US images. The edge detection procedure involves a coarse-to-fine strategy with spatial and interscale constraints so as to classify wavelet local maxima distribution at different frequency bands. As an outcome, an edge map across scales is derived whereas the wavelet coefficients that correspond to speckle are suppressed in the inverse wavelet transform acquiring the denoised US image. Results: A total of 34 thyroid, liver, and breast US examinations were performed on a Logiq 9 US system. Each of these images was subjected to the proposed EFCM algorithm and, for comparison, to commercial speckle reduction imaging (SRI) software and another well-known denoising approach, Pizurica's method. The quantification of the speckle suppression performance in the selected set of US images was carried out via Speckle Suppression Index (SSI) with results of 0.61, 0.71, and 0.73 for EFCM, SRI, and Pizurica's methods, respectively. Peak signal-to-noise ratios of 35.12, 33.95, and 29.78 and edge preservation indices of 0.94, 0.93, and 0.86 were found for the EFCM, SIR, and Pizurica's method, respectively, demonstrating that the proposed method achieves superior speckle reduction performance and edge preservation properties. Based on two independent radiologists’ qualitative evaluation the proposed method significantly improved image characteristics over standard baseline B mode images, and those processed with the Pizurica's method. Furthermore, it yielded results similar to those for SRI for breast and thyroid images significantly better results than SRI for liver imaging, thus improving diagnostic accuracy in both superficial and in-depth structures. Conclusions: A new wavelet
Image watermarking using a dynamically weighted fuzzy c-means algorithm
NASA Astrophysics Data System (ADS)
Kang, Myeongsu; Ho, Linh Tran; Kim, Yongmin; Kim, Cheol Hong; Kim, Jong-Myon
2011-10-01
Digital watermarking has received extensive attention as a new method of protecting multimedia content from unauthorized copying. In this paper, we present a nonblind watermarking system using a proposed dynamically weighted fuzzy c-means (DWFCM) technique combined with discrete wavelet transform (DWT), discrete cosine transform (DCT), and singular value decomposition (SVD) techniques for copyright protection. The proposed scheme efficiently selects blocks in which the watermark is embedded using new membership values of DWFCM as the embedding strength. We evaluated the proposed algorithm in terms of robustness against various watermarking attacks and imperceptibility compared to other algorithms [DWT-DCT-based and DCT- fuzzy c-means (FCM)-based algorithms]. Experimental results indicate that the proposed algorithm outperforms other algorithms in terms of robustness against several types of attacks, such as noise addition (Gaussian noise, salt and pepper noise), rotation, Gaussian low-pass filtering, mean filtering, median filtering, Gaussian blur, image sharpening, histogram equalization, and JPEG compression. In addition, the proposed algorithm achieves higher values of peak signal-to-noise ratio (approximately 49 dB) and lower values of measure-singular value decomposition (5.8 to 6.6) than other algorithms.
Self-organization and clustering algorithms
NASA Technical Reports Server (NTRS)
Bezdek, James C.
1991-01-01
Kohonen's feature maps approach to clustering is often likened to the k or c-means clustering algorithms. Here, the author identifies some similarities and differences between the hard and fuzzy c-Means (HCM/FCM) or ISODATA algorithms and Kohonen's self-organizing approach. The author concludes that some differences are significant, but at the same time there may be some important unknown relationships between the two methodologies. Several avenues of research are proposed.
Segmentation of pomegranate MR images using spatial fuzzy c-means (SFCM) algorithm
NASA Astrophysics Data System (ADS)
Moradi, Ghobad; Shamsi, Mousa; Sedaaghi, M. H.; Alsharif, M. R.
2011-10-01
Segmentation is one of the fundamental issues of image processing and machine vision. It plays a prominent role in a variety of image processing applications. In this paper, one of the most important applications of image processing in MRI segmentation of pomegranate is explored. Pomegranate is a fruit with pharmacological properties such as being anti-viral and anti-cancer. Having a high quality product in hand would be critical factor in its marketing. The internal quality of the product is comprehensively important in the sorting process. The determination of qualitative features cannot be manually made. Therefore, the segmentation of the internal structures of the fruit needs to be performed as accurately as possible in presence of noise. Fuzzy c-means (FCM) algorithm is noise-sensitive and pixels with noise are classified inversely. As a solution, in this paper, the spatial FCM algorithm in pomegranate MR images' segmentation is proposed. The algorithm is performed with setting the spatial neighborhood information in FCM and modification of fuzzy membership function for each class. The segmentation algorithm results on the original and the corrupted Pomegranate MR images by Gaussian, Salt Pepper and Speckle noises show that the SFCM algorithm operates much more significantly than FCM algorithm. Also, after diverse steps of qualitative and quantitative analysis, we have concluded that the SFCM algorithm with 5×5 window size is better than the other windows.
NASA Astrophysics Data System (ADS)
Kesikoğlu, M. H.; Atasever, Ü. H.; Özkan, C.
2013-10-01
Change detection analyze means that according to observations made in different times, the process of defining the change detection occurring in nature or in the state of any objects or the ability of defining the quantity of temporal effects by using multitemporal data sets. There are lots of change detection techniques met in literature. It is possible to group these techniques under two main topics as supervised and unsupervised change detection. In this study, the aim is to define the land cover changes occurring in specific area of Kayseri with unsupervised change detection techniques by using Landsat satellite images belonging to different years which are obtained by the technique of remote sensing. While that process is being made, image differencing method is going to be applied to the images by following the procedure of image enhancement. After that, the method of Principal Component Analysis is going to be applied to the difference image obtained. To determine the areas that have and don't have changes, the image is grouped as two parts by Fuzzy C-Means Clustering method. For achieving these processes, firstly the process of image to image registration is completed. As a result of this, the images are being referred to each other. After that, gray scale difference image obtained is partitioned into 3 × 3 nonoverlapping blocks. With the method of principal component analysis, eigenvector space is gained and from here, principal components are reached. Finally, feature vector space consisting principal component is partitioned into two clusters using Fuzzy C-Means Clustering and after that change detection process has been done.
T1- and T2-weighted spatially constrained fuzzy c-means clustering for brain MRI segmentation
NASA Astrophysics Data System (ADS)
Despotović, Ivana; Goossens, Bart; Vansteenkiste, Ewout; Philips, Wilfried
2010-03-01
The segmentation of brain tissue in magnetic resonance imaging (MRI) plays an important role in clinical analysis and is useful for many applications including studying brain diseases, surgical planning and computer assisted diagnoses. In general, accurate tissue segmentation is a difficult task, not only because of the complicated structure of the brain and the anatomical variability between subjects, but also because of the presence of noise and low tissue contrasts in the MRI images, especially in neonatal brain images. Fuzzy clustering techniques have been widely used in automated image segmentation. However, since the standard fuzzy c-means (FCM) clustering algorithm does not consider any spatial information, it is highly sensitive to noise. In this paper, we present an extension of the FCM algorithm to overcome this drawback, by combining information from both T1-weighted (T1-w) and T2-weighted (T2-w) MRI scans and by incorporating spatial information. This new spatially constrained FCM (SCFCM) clustering algorithm preserves the homogeneity of the regions better than existing FCM techniques, which often have difficulties when tissues have overlapping intensity profiles. The performance of the proposed algorithm is tested on simulated and real adult MR brain images with different noise levels, as well as on neonatal MR brain images with the gestational age of 39 weeks. Experimental quantitative and qualitative segmentation results show that the proposed method is effective and more robust to noise than other FCM-based methods. Also, SCFCM appears as a very promising tool for complex and noisy image segmentation of the neonatal brain.
NASA Astrophysics Data System (ADS)
Rapstine, Thomas D.
Gravity gradiometry has been used as a geophysical tool to image salt structure in hydrocarbon exploration. The knowledge of the location, orientation, and spatial extent of salt bodies helps characterize possible petroleum prospects. Imaging around and underneath salt bodies can be challenging given the petrophysical properties and complicated geometry of salt. Methods for imaging beneath salt using seismic data exist but are often iterative and expensive, requiring a refinement of a velocity model at each iteration. Fortunately, the relatively strong density contrast between salt and background density structure pro- vides the opportunity for gravity gradiometry to be useful in exploration, especially when integrated with other geophysical data such as seismic. Quantitatively integrating multiple geophysical data is not trivial, but can improve the recovery of salt body geometry and petrophysical composition using inversion. This thesis provides two options for quantitatively integrating seismic, AGG, and petrophysical data that may aid the imaging of salt bodies. Both methods leverage and expand upon previously developed deterministic inversion methods. The inversion methods leverage seismically derived information, such as horizon slope and salt body interpretation, to constrain the inversion of airborne gravity gradiometry data (AGG) to arrive at a density contrast model. The first method involves constraining a top of salt inversion using slope in a seismic image. The second method expands fuzzy c-means (FCM) clustering inversion to include spatial control on clustering based on a seismically derived salt body interpretation. The effective- ness of the methods are illustrated on a 2D synthetic earth model derived from the SEAM Phase 1 salt model. Both methods show that constraining the inversion of AGG data using information derived from seismic images can improve the recovery of salt.
NASA Astrophysics Data System (ADS)
Polat, Kemal
2012-04-01
This study presents the application of fuzzy c-means (FCM) clustering-based feature weighting (FCMFW) for the detection of Parkinson's disease (PD). In the classification of PD dataset taken from University of California - Irvine machine learning database, practical values of the existing traditional and non-standard measures for distinguishing healthy people from people with PD by detecting dysphonia were applied to the input of FCMFW. The main aims of FCM clustering algorithm are both to transform from a linearly non-separable dataset to a linearly separable one and to increase the distinguishing performance between classes. The weighted PD dataset is presented to k-nearest neighbour (k-NN) classifier system. In the classification of PD, the various k-values in k-NN classifier were used and compared with each other. Also, the effects of k-values in k-NN classifier on the classification of Parkinson disease datasets have been investigated and the best k-value found. The experimental results have demonstrated that the combination of the proposed weighting method called FCMFW and k-NN classifier has obtained very promising results on the classification of PD.
NASA Astrophysics Data System (ADS)
Nasseri, Aynur; Jafar Mohammadzadeh, Mohammad; Hashem Tabatabaei Raeisi, S.
2015-04-01
This paper deals with the application of the ant colony algorithm (AC) to a seismic dataset from Dezful Embayment in the southwest region of Iran. The objective of the approach is to generate an accurate representation of faults and discontinuities to assist in pertinent matters such as well planning and field optimization. The AC analyzed all spatial discontinuities in the seismic attributes from which features were extracted. True fault information from the attributes was detected by many artificial ants, whereas noise and the remains of the reflectors were eliminated. Furthermore, the fracture enhancement procedure was conducted by three steps on seismic data of the area. In the first step several attributes such as chaos, variance/coherence and dip deviation were taken into account; the resulting maps indicate high-resolution contrast for the variance attribute. Subsequently, the enhancement of spatial discontinuities was performed and finally elimination of the noise and remains of non-faulting events was carried out by simulating the behavior of ant colonies. After considering stepwise attribute optimization, focusing on chaos and variance in particular, an attribute fusion was generated and used in the ant colony algorithm. The resulting map displayed the highest performance in feature detection along the main structural feature trend, confined to a NW-SE direction. Thus, the optimized attribute fusion might be used with greater confidence to map the structural feature network with more accuracy and resolution. In order to assess the performance of the AC in feature detection, and cross validate the reliability of the method used, fuzzy c-means clustering (FCMC) was employed for the same dataset. Comparing the maps illustrates the effectiveness and preference of the AC approach due to its high resolution contrast for structural feature detection compared to the FCMC method. Accordingly, 3D planes of discontinuity determined spatial distribution of fractures
NASA Astrophysics Data System (ADS)
Wang, Shilong; Xu, Yuru; Pang, Yongjie
2011-03-01
The S/N of an underwater image is low and has a fuzzy edge. If using traditional methods to process it directly, the result is not satisfying. Though the traditional fuzzy C-means algorithm can sometimes divide the image into object and background, its time-consuming computation is often an obstacle. The mission of the vision system of an autonomous underwater vehicle (AUV) is to rapidly and exactly deal with the information about the object in a complex environment for the AUV to use the obtained result to execute the next task. So, by using the statistical characteristics of the gray image histogram, a fast and effective fuzzy C-means underwater image segmentation algorithm was presented. With the weighted histogram modifying the fuzzy membership, the above algorithm can not only cut down on a large amount of data processing and storage during the computation process compared with the traditional algorithm, so as to speed up the efficiency of the segmentation, but also improve the quality of underwater image segmentation. Finally, particle swarm optimization (PSO) described by the sine function was introduced to the algorithm mentioned above. It made up for the shortcomings that the FCM algorithm can not get the global optimal solution. Thus, on the one hand, it considers the global impact and achieves the local optimal solution, and on the other hand, further greatly increases the computing speed. Experimental results indicate that the novel algorithm can reach a better segmentation quality and the processing time of each image is reduced. They enhance efficiency and satisfy the requirements of a highly effective, real-time AUV.
A relational Fuzzy C-Means algorithm for detecting protein spots in two-dimensional gel images.
Rashwan, Shaheera; Faheem, Talaat; Sarhan, Amany; Youssef, Bayumy A B
2010-01-01
Two-dimensional polyacrylamide gel electrophoresis of proteins is a robust and reproducible technique. It is the most widely used separation tool in proteomics. Current efforts in the field are directed at the development of tools for expanding the range of proteins accessible with two-dimensional gels. Proteomics was built around the two-dimensional gel. The idea that multiple proteins can be analyzed in parallel grew from two-dimensional gel maps. Proteomics researchers needed to identify interested protein spots by examining the gel. This is time consuming, labor extensive and error prone. It is desired that the computer can analyze the proteins automatically by first detecting, then quantifying the protein spots in the 2D gel images. This paper focuses on the protein spot detection and segmentation of 2D gel electrophoresis images. We present a new technique for segmentation of 2D gel images using the Fuzzy C-Means (FCM) algorithm and matching spots using the notion of fuzzy relations. Through the experimental results, the new algorithm was found out to detect protein spots more accurately, then the current known algorithms. PMID:20865504
NASA Astrophysics Data System (ADS)
Ghaffarian, Saman; Gökaşar, Ilgın
2016-01-01
This study presents an approach for the automatic detection of vehicles using very high-resolution images and road vector data. Initially, road vector data and aerial images are integrated to extract road regions. Then, the extracted road/street region is clustered using an automatic histogram-based fuzzy C-means algorithm, and edge pixels are detected using the Canny edge detector. In order to automatically detect vehicles, we developed a local perceptual grouping approach based on fusion of edge detection and clustering outputs. To provide the locality, an ellipse is generated using characteristics of the candidate clusters individually. Then, ratio of edge pixels to nonedge pixels in the corresponding ellipse is computed to distinguish the vehicles. Finally, a point-merging rule is conducted to merge the points that satisfy a predefined threshold and are supposed to denote the same vehicles. The experimental validation of the proposed method was carried out on six very high-resolution aerial images that illustrate two highways, two shadowed roads, a crowded narrow street, and a street in a dense urban area with crowded parked vehicles. The evaluation of the results shows that our proposed method performed 86% and 83% in overall correctness and completeness, respectively.
Tang, Jing Rui; Mat Isa, Nor Ashidi; Ch’ng, Ewe Seng
2015-01-01
Despite the effectiveness of Pap-smear test in reducing the mortality rate due to cervical cancer, the criteria of the reporting standard of the Pap-smear test are mostly qualitative in nature. This study addresses the issue on how to define the criteria in a more quantitative and definite term. A negative Pap-smear test result, i.e. negative for intraepithelial lesion or malignancy (NILM), is qualitatively defined to have evenly distributed, finely granular chromatin in the nuclei of cervical squamous cells. To quantify this chromatin pattern, this study employed Fuzzy C-Means clustering as the segmentation technique, enabling different degrees of chromatin segmentation to be performed on sample images of non-neoplastic squamous cells. From the simulation results, a model representing the chromatin distribution of non-neoplastic cervical squamous cell is constructed with the following quantitative characteristics: at the best representative sensitivity level 4 based on statistical analysis and human experts’ feedbacks, a nucleus of non-neoplastic squamous cell has an average of 67 chromatins with a total area of 10.827μm2; the average distance between the nearest chromatin pair is 0.508μm and the average eccentricity of the chromatin is 0.47. PMID:26560331
NASA Astrophysics Data System (ADS)
An, Yu; Liu, Jie; Ye, Jinzuo; Mao, Yamin; Yang, Xin; Jiang, Shixin; Chi, Chongwei; Tian, Jie
2015-03-01
As an important molecular imaging modality, fluorescence molecular imaging (FMI) has the advantages of high sensitivity, low cost and ease of use. By labeling the regions of interest with fluorophore, FMI can noninvasively obtain the distribution of fluorophore in-vivo. However, due to the fact that the spectrum of fluorescence is in the section of the visible light range, there are mass of autofluorescence on the surface of the bio-tissues, which is a major disturbing factor in FMI. Meanwhile, the high-level of dark current for charge-coupled device (CCD) camera and other influencing factor can also produce a lot of background noise. In this paper, a novel method for image denoising of FMI based on fuzzy C-Means clustering (FCM) is proposed, because the fluorescent signal is the major component of the fluorescence images, and the intensity of autofluorescence and other background signals is relatively lower than the fluorescence signal. First, the fluorescence image is smoothed by sliding-neighborhood operations to initially eliminate the noise. Then, the wavelet transform (WLT) is performed on the fluorescence images to obtain the major component of the fluorescent signals. After that, the FCM method is adopt to separate the major component and background of the fluorescence images. Finally, the proposed method was validated using the original data obtained by in vivo implanted fluorophore experiment, and the results show that our proposed method can effectively obtain the fluorescence signal while eliminate the background noise, which could increase the quality of fluorescence images.
Keller, Brad M.; Nathan, Diane L.; Wang Yan; Zheng Yuanjie; Gee, James C.; Conant, Emily F.; Kontos, Despina
2012-08-15
Purpose: The amount of fibroglandular tissue content in the breast as estimated mammographically, commonly referred to as breast percent density (PD%), is one of the most significant risk factors for developing breast cancer. Approaches to quantify breast density commonly focus on either semiautomated methods or visual assessment, both of which are highly subjective. Furthermore, most studies published to date investigating computer-aided assessment of breast PD% have been performed using digitized screen-film mammograms, while digital mammography is increasingly replacing screen-film mammography in breast cancer screening protocols. Digital mammography imaging generates two types of images for analysis, raw (i.e., 'FOR PROCESSING') and vendor postprocessed (i.e., 'FOR PRESENTATION'), of which postprocessed images are commonly used in clinical practice. Development of an algorithm which effectively estimates breast PD% in both raw and postprocessed digital mammography images would be beneficial in terms of direct clinical application and retrospective analysis. Methods: This work proposes a new algorithm for fully automated quantification of breast PD% based on adaptive multiclass fuzzy c-means (FCM) clustering and support vector machine (SVM) classification, optimized for the imaging characteristics of both raw and processed digital mammography images as well as for individual patient and image characteristics. Our algorithm first delineates the breast region within the mammogram via an automated thresholding scheme to identify background air followed by a straight line Hough transform to extract the pectoral muscle region. The algorithm then applies adaptive FCM clustering based on an optimal number of clusters derived from image properties of the specific mammogram to subdivide the breast into regions of similar gray-level intensity. Finally, a SVM classifier is trained to identify which clusters within the breast tissue are likely fibroglandular, which are then
Keller, Brad M.; Nathan, Diane L.; Wang, Yan; Zheng, Yuanjie; Gee, James C.; Conant, Emily F.; Kontos, Despina
2012-01-01
Purpose: The amount of fibroglandular tissue content in the breast as estimated mammographically, commonly referred to as breast percent density (PD%), is one of the most significant risk factors for developing breast cancer. Approaches to quantify breast density commonly focus on either semiautomated methods or visual assessment, both of which are highly subjective. Furthermore, most studies published to date investigating computer-aided assessment of breast PD% have been performed using digitized screen-film mammograms, while digital mammography is increasingly replacing screen-film mammography in breast cancer screening protocols. Digital mammography imaging generates two types of images for analysis, raw (i.e., “FOR PROCESSING”) and vendor postprocessed (i.e., “FOR PRESENTATION”), of which postprocessed images are commonly used in clinical practice. Development of an algorithm which effectively estimates breast PD% in both raw and postprocessed digital mammography images would be beneficial in terms of direct clinical application and retrospective analysis. Methods: This work proposes a new algorithm for fully automated quantification of breast PD% based on adaptive multiclass fuzzy c-means (FCM) clustering and support vector machine (SVM) classification, optimized for the imaging characteristics of both raw and processed digital mammography images as well as for individual patient and image characteristics. Our algorithm first delineates the breast region within the mammogram via an automated thresholding scheme to identify background air followed by a straight line Hough transform to extract the pectoral muscle region. The algorithm then applies adaptive FCM clustering based on an optimal number of clusters derived from image properties of the specific mammogram to subdivide the breast into regions of similar gray-level intensity. Finally, a SVM classifier is trained to identify which clusters within the breast tissue are likely fibroglandular, which
Cluster algorithms and computational complexity
NASA Astrophysics Data System (ADS)
Li, Xuenan
Cluster algorithms for the 2D Ising model with a staggered field have been studied and a new cluster algorithm for path sampling has been worked out. The complexity properties of Bak-Seppen model and the Growing network model have been studied by using the Computational Complexity Theory. The dynamic critical behavior of the two-replica cluster algorithm is studied. Several versions of the algorithm are applied to the two-dimensional, square lattice Ising model with a staggered field. The dynamic exponent for the full algorithm is found to be less than 0.5. It is found that odd translations of one replica with respect to the other together with global flips are essential for obtaining a small value of the dynamic exponent. The path sampling problem for the 1D Ising model is studied using both a local algorithm and a novel cluster algorithm. The local algorithm is extremely inefficient at low temperature, where the integrated autocorrelation time is found to be proportional to the fourth power of correlation length. The dynamic exponent of the cluster algorithm is found to be zero and therefore proved to be much more efficient than the local algorithm. The parallel computational complexity of the Bak-Sneppen evolution model is studied. It is shown that Bak-Sneppen histories can be generated by a massively parallel computer in a time that is polylog in the length of the history, which means that the logical depth of producing a Bak-Sneppen history is exponentially less than the length of the history. The parallel dynamics for generating Bak-Sneppen histories is contrasted to standard Bak-Sneppen dynamics. The parallel computational complexity of the Growing Network model is studied. The growth of the network with linear kernels is shown to be not complex and an algorithm with polylog parallel running time is found. The growth of the network with gamma ≥ 2 super-linear kernels can be realized by a randomized parallel algorithm with polylog expected running time.
NASA Astrophysics Data System (ADS)
Abdulbaqi, Hayder Saad; Jafri, Mohd Zubir Mat; Omar, Ahmad Fairuz; Mustafa, Iskandar Shahrim Bin; Abood, Loay Kadom
2015-04-01
Brain tumors, are an abnormal growth of tissues in the brain. They may arise in people of any age. They must be detected early, diagnosed accurately, monitored carefully, and treated effectively in order to optimize patient outcomes regarding both survival and quality of life. Manual segmentation of brain tumors from CT scan images is a challenging and time consuming task. Size and location accurate detection of brain tumor plays a vital role in the successful diagnosis and treatment of tumors. Brain tumor detection is considered a challenging mission in medical image processing. The aim of this paper is to introduce a scheme for tumor detection in CT scan images using two different techniques Hidden Markov Random Fields (HMRF) and Fuzzy C-means (FCM). The proposed method has been developed in this research in order to construct hybrid method between (HMRF) and threshold. These methods have been applied on 4 different patient data sets. The result of comparison among these methods shows that the proposed method gives good results for brain tissue detection, and is more robust and effective compared with (FCM) techniques.
Abdulbaqi, Hayder Saad; Jafri, Mohd Zubir Mat; Omar, Ahmad Fairuz; Mustafa, Iskandar Shahrim Bin; Abood, Loay Kadom
2015-04-24
Brain tumors, are an abnormal growth of tissues in the brain. They may arise in people of any age. They must be detected early, diagnosed accurately, monitored carefully, and treated effectively in order to optimize patient outcomes regarding both survival and quality of life. Manual segmentation of brain tumors from CT scan images is a challenging and time consuming task. Size and location accurate detection of brain tumor plays a vital role in the successful diagnosis and treatment of tumors. Brain tumor detection is considered a challenging mission in medical image processing. The aim of this paper is to introduce a scheme for tumor detection in CT scan images using two different techniques Hidden Markov Random Fields (HMRF) and Fuzzy C-means (FCM). The proposed method has been developed in this research in order to construct hybrid method between (HMRF) and threshold. These methods have been applied on 4 different patient data sets. The result of comparison among these methods shows that the proposed method gives good results for brain tissue detection, and is more robust and effective compared with (FCM) techniques.
HAMEDIAN, Amir Abbas; JAVID, Allahbakhsh; MOTESADDI ZARANDI, Saeed; RASHIDI, Yousef; MAJLESI, Monireh
2016-01-01
Background: Since the industrial revolution, the rate of industrialization and urbanization has increased dramatically. Regarding this issue, specific regions mostly located in developing countries have been confronted with serious problems, particularly environmental problems among which air pollution is of high importance. Methods: Eleven parameters, including CO, SO2, PM10, PM2.5, O3, NO2, benzene, toluene, ethyl-benzene, xylene, and 1,3-butadiene, have been accounted over a period of two years (2011–2012) from five monitoring stations located at Tehran, Iran, were assessed by using fuzzy inference system and fuzzy c-mean clustering. Results: These tools showed that the quality of criteria pollutants between the year 2011 and 2012 did not as much effect the public health as the other pollutants did. Conclusion: Using the air EPA AQI, the quality of air, and also the managerial plans required to improve the quality can be misled. PMID:27516999
Fuzzy-Kohonen-clustering neural network trained by genetic algorithm and fuzzy competition learning
NASA Astrophysics Data System (ADS)
Xie, Weixing; Li, Wenhua; Gao, Xinbo
1995-08-01
Kohonen networks are well known for clustering analysis. Classical Kohonen networks for hard c-means clustering (trained by winner-take-all learning) have some severe drawbacks. Fuzzy Kohonen networks (FKCNN) for fuzzy c-means clustering are trained by fuzzy competition learning, and can get better clustering results than the classical Kohonen networks. However, both winner-take-all and fuzzy competition learning algorithms are in essence local search techniques that search for the optimum by using a hill-climbing technique. Thus, they often fail in the search for the global optimum. In this paper we combine genetic algorithms (GAs) with fuzzy competition learning to train the FKCNN. Our experimental results show that the proposed GA/FC learning algorithm has much higher probabilities of finding the global optimal solutions than either the winner-take-all or the fuzzy competition learning.
NASA Astrophysics Data System (ADS)
Castro, Marcelo A.; Thomasson, David; Avila, Nilo A.; Hufton, Jennifer; Senseney, Justin; Johnson, Reed F.; Dyall, Julie
2013-03-01
Monkeypox virus is an emerging zoonotic pathogen that results in up to 10% mortality in humans. Knowledge of clinical manifestations and temporal progression of monkeypox disease is limited to data collected from rare outbreaks in remote regions of Central and West Africa. Clinical observations show that monkeypox infection resembles variola infection. Given the limited capability to study monkeypox disease in humans, characterization of the disease in animal models is required. A previous work focused on the identification of inflammatory patterns using PET/CT image modality in two non-human primates previously inoculated with the virus. In this work we extended techniques used in computer-aided detection of lung tumors to identify inflammatory lesions from monkeypox virus infection and their progression using CT images. Accurate estimation of partial volumes of lung lesions via segmentation is difficult because of poor discrimination between blood vessels, diseased regions, and outer structures. We used hard C-means algorithm in conjunction with landmark based registration to estimate the extent of monkeypox virus induced disease before inoculation and after disease progression. Automated estimation is in close agreement with manual segmentation.
Basic firefly algorithm for document clustering
NASA Astrophysics Data System (ADS)
Mohammed, Athraa Jasim; Yusof, Yuhanis; Husni, Husniza
2015-12-01
The Document clustering plays significant role in Information Retrieval (IR) where it organizes documents prior to the retrieval process. To date, various clustering algorithms have been proposed and this includes the K-means and Particle Swarm Optimization. Even though these algorithms have been widely applied in many disciplines due to its simplicity, such an approach tends to be trapped in a local minimum during its search for an optimal solution. To address the shortcoming, this paper proposes a Basic Firefly (Basic FA) algorithm to cluster text documents. The algorithm employs the Average Distance to Document Centroid (ADDC) as the objective function of the search. Experiments utilizing the proposed algorithm were conducted on the 20Newsgroups benchmark dataset. Results demonstrate that the Basic FA generates a more robust and compact clusters than the ones produced by K-means and Particle Swarm Optimization (PSO).
NASA Astrophysics Data System (ADS)
Zainuddin, Zarita; Lai, Kee Huong; Ong, Pauline
2013-04-01
Artificial neural networks (ANNs) are powerful mathematical models that are used to solve complex real world problems. Wavelet neural networks (WNNs), which were developed based on the wavelet theory, are a variant of ANNs. During the training phase of WNNs, several parameters need to be initialized; including the type of wavelet activation functions, translation vectors, and dilation parameter. The conventional k-means and fuzzy c-means clustering algorithms have been used to select the translation vectors. However, the solution vectors might get trapped at local minima. In this regard, the evolutionary harmony search algorithm, which is capable of searching for near-optimum solution vectors, both locally and globally, is introduced to circumvent this problem. In this paper, the conventional k-means and fuzzy c-means clustering algorithms were hybridized with the metaheuristic harmony search algorithm. In addition to obtaining the estimation of the global minima accurately, these hybridized algorithms also offer more than one solution to a particular problem, since many possible solution vectors can be generated and stored in the harmony memory. To validate the robustness of the proposed WNNs, the real world problem of epileptic seizure detection was presented. The overall classification accuracy from the simulation showed that the hybridized metaheuristic algorithms outperformed the standard k-means and fuzzy c-means clustering algorithms.
Farjam, Reza; Tsien, Christina I.; Lawrence, Theodore S.; Cao, Yue
2014-01-15
Purpose: To develop a pharmacokinetic modelfree framework to analyze the dynamic contrast enhanced magnetic resonance imaging (DCE-MRI) data for assessment of response of brain metastases to radiation therapy. Methods: Twenty patients with 45 analyzable brain metastases had MRI scans prior to whole brain radiation therapy (WBRT) and at the end of the 2-week therapy. The volumetric DCE images covering the whole brain were acquired on a 3T scanner with approximately 5 s temporal resolution and a total scan time of about 3 min. DCE curves from all voxels of the 45 brain metastases were normalized and then temporally aligned. A DCE matrix that is constructed from the aligned DCE curves of all voxels of the 45 lesions obtained prior to WBRT is processed by principal component analysis to generate the principal components (PCs). Then, the projection coefficient maps prior to and at the end of WBRT are created for each lesion. Next, a pattern recognition technique, based upon fuzzy-c-means clustering, is used to delineate the tumor subvolumes relating to the value of the significant projection coefficients. The relationship between changes in different tumor subvolumes and treatment response was evaluated to differentiate responsive from stable and progressive tumors. Performance of the PC-defined tumor subvolume was also evaluated by receiver operating characteristic (ROC) analysis in prediction of nonresponsive lesions and compared with physiological-defined tumor subvolumes. Results: The projection coefficient maps of the first three PCs contain almost all response-related information in DCE curves of brain metastases. The first projection coefficient, related to the area under DCE curves, is the major component to determine response while the third one has a complimentary role. In ROC analysis, the area under curve of 0.88 ± 0.05 and 0.86 ± 0.06 were achieved for the PC-defined and physiological-defined tumor subvolume in response assessment. Conclusions: The PC
Hierarchical link clustering algorithm in networks
NASA Astrophysics Data System (ADS)
Bodlaj, Jernej; Batagelj, Vladimir
2015-06-01
Hierarchical network clustering is an approach to find tightly and internally connected clusters (groups or communities) of nodes in a network based on its structure. Instead of nodes, it is possible to cluster links of the network. The sets of nodes belonging to clusters of links can overlap. While overlapping clusters of nodes are not always expected, they are natural in many applications. Using appropriate dissimilarity measures, we can complement the clustering strategy to consider, for example, the semantic meaning of links or nodes based on their properties. We propose a new hierarchical link clustering algorithm which in comparison to existing algorithms considers node and/or link properties (descriptions, attributes) of the input network alongside its structure using monotonic dissimilarity measures. The algorithm determines communities that form connected subnetworks (relational constraint) containing locally similar nodes with respect to their description. It is only implicitly based on the corresponding line graph of the input network, thus reducing its space and time complexities. We investigate both complexities analytically and statistically. Using provided dissimilarity measures, our algorithm can, in addition to the general overlapping community structure of input networks, uncover also related subregions inside these communities in a form of hierarchy. We demonstrate this ability on real-world and artificial network examples.
Hierarchical link clustering algorithm in networks.
Bodlaj, Jernej; Batagelj, Vladimir
2015-06-01
Hierarchical network clustering is an approach to find tightly and internally connected clusters (groups or communities) of nodes in a network based on its structure. Instead of nodes, it is possible to cluster links of the network. The sets of nodes belonging to clusters of links can overlap. While overlapping clusters of nodes are not always expected, they are natural in many applications. Using appropriate dissimilarity measures, we can complement the clustering strategy to consider, for example, the semantic meaning of links or nodes based on their properties. We propose a new hierarchical link clustering algorithm which in comparison to existing algorithms considers node and/or link properties (descriptions, attributes) of the input network alongside its structure using monotonic dissimilarity measures. The algorithm determines communities that form connected subnetworks (relational constraint) containing locally similar nodes with respect to their description. It is only implicitly based on the corresponding line graph of the input network, thus reducing its space and time complexities. We investigate both complexities analytically and statistically. Using provided dissimilarity measures, our algorithm can, in addition to the general overlapping community structure of input networks, uncover also related subregions inside these communities in a form of hierarchy. We demonstrate this ability on real-world and artificial network examples. PMID:26172761
An algorithm for spatial heirarchy clustering
NASA Technical Reports Server (NTRS)
Dejesusparada, N. (Principal Investigator); Velasco, F. R. D.
1981-01-01
A method for utilizing both spectral and spatial redundancy in compacting and preclassifying images is presented. In multispectral satellite images, a high correlation exists between neighboring image points which tend to occupy dense and restricted regions of the feature space. The image is divided into windows of the same size where the clustering is made. The classes obtained in several neighboring windows are clustered, and then again successively clustered until only one region corresponding to the whole image is obtained. By employing this algorithm only a few points are considered in each clustering, thus reducing computational effort. The method is illustrated as applied to LANDSAT images.
Performance Comparison Of Evolutionary Algorithms For Image Clustering
NASA Astrophysics Data System (ADS)
Civicioglu, P.; Atasever, U. H.; Ozkan, C.; Besdok, E.; Karkinli, A. E.; Kesikoglu, A.
2014-09-01
Evolutionary computation tools are able to process real valued numerical sets in order to extract suboptimal solution of designed problem. Data clustering algorithms have been intensively used for image segmentation in remote sensing applications. Despite of wide usage of evolutionary algorithms on data clustering, their clustering performances have been scarcely studied by using clustering validation indexes. In this paper, the recently proposed evolutionary algorithms (i.e., Artificial Bee Colony Algorithm (ABC), Gravitational Search Algorithm (GSA), Cuckoo Search Algorithm (CS), Adaptive Differential Evolution Algorithm (JADE), Differential Search Algorithm (DSA) and Backtracking Search Optimization Algorithm (BSA)) and some classical image clustering techniques (i.e., k-means, fcm, som networks) have been used to cluster images and their performances have been compared by using four clustering validation indexes. Experimental test results exposed that evolutionary algorithms give more reliable cluster-centers than classical clustering techniques, but their convergence time is quite long.
Parallel Clustering Algorithms for Structured AMR
Gunney, B T; Wissink, A M; Hysom, D A
2005-10-26
We compare several different parallel implementation approaches for the clustering operations performed during adaptive gridding operations in patch-based structured adaptive mesh refinement (SAMR) applications. Specifically, we target the clustering algorithm of Berger and Rigoutsos (BR91), which is commonly used in many SAMR applications. The baseline for comparison is a simplistic parallel extension of the original algorithm that works well for up to O(10{sup 2}) processors. Our goal is a clustering algorithm for machines of up to O(10{sup 5}) processors, such as the 64K-processor IBM BlueGene/Light system. We first present an algorithm that avoids the unneeded communications of the simplistic approach to improve the clustering speed by up to an order of magnitude. We then present a new task-parallel implementation to further reduce communication wait time, adding another order of magnitude of improvement. The new algorithms also exhibit more favorable scaling behavior for our test problems. Performance is evaluated on a number of large scale parallel computer systems, including a 16K-processor BlueGene/Light system.
Improved fuzzy clustering algorithms in segmentation of DC-enhanced breast MRI.
Kannan, S R; Ramathilagam, S; Devi, Pandiyarajan; Sathya, A
2012-02-01
Segmentation of medical images is a difficult and challenging problem due to poor image contrast and artifacts that result in missing or diffuse organ/tissue boundaries. Many researchers have applied various techniques however fuzzy c-means (FCM) based algorithms is more effective compared to other methods. The objective of this work is to develop some robust fuzzy clustering segmentation systems for effective segmentation of DCE - breast MRI. This paper obtains the robust fuzzy clustering algorithms by incorporating kernel methods, penalty terms, tolerance of the neighborhood attraction, additional entropy term and fuzzy parameters. The initial centers are obtained using initialization algorithm to reduce the computation complexity and running time of proposed algorithms. Experimental works on breast images show that the proposed algorithms are effective to improve the similarity measurement, to handle large amount of noise, to have better results in dealing the data corrupted by noise, and other artifacts. The clustering results of proposed methods are validated using Silhouette Method. PMID:20703716
Fusion and clustering algorithms for spatial data
NASA Astrophysics Data System (ADS)
Kuntala, Pavani
Spatial clustering is an approach for discovering groups of related data points in spatial data. Spatial clustering has attracted a lot of research attention due to various applications where it is needed. It holds practical importance in application domains such as geographic knowledge discovery, sensors, rare disease discovery, astronomy, remote sensing, and so on. The motivation for this work stems from the limitations of the existing spatial clustering methods. In most conventional spatial clustering algorithms, the similarity measurement mainly considers the geometric attributes. However, in many real applications, users are concerned about both the spatial and the non-spatial attributes. In conventional spatial clustering, the input data set is partitioned into several compact regions and data points that are similar to one another in their non-spatial attributes may be scattered over different regions, thus making the corresponding objective difficult to achieve. In this dissertation, a novel clustering methodology is proposed to explore the clustering problem within both spatial and non-spatial domains by employing a fusion-based approach. The goal is to optimize a given objective function in the spatial domain, while satisfying the constraint specified in the non- spatial attribute domain. Several experiments are conducted to provide insights into the proposed methodology. The algorithm first captures the spatial cores having the highest structure and then employs an iterative, heuristic mechanism to find the optimal number of spatial cores and non-spatial clusters that exist in the data. Such a fusion-based framework allows for the handling of data streams and provides a framework for comparing spatial clusters. The correctness and efficiency of the proposed clustering model is demonstrated on real world and synthetic data sets.
Genetic algorithm optimization of atomic clusters
Morris, J.R.; Deaven, D.M.; Ho, K.M.; Wang, C.Z.; Pan, B.C.; Wacker, J.G.; Turner, D.E. |
1996-12-31
The authors have been using genetic algorithms to study the structures of atomic clusters and related problems. This is a problem where local minima are easy to locate, but barriers between the many minima are large, and the number of minima prohibit a systematic search. They use a novel mating algorithm that preserves some of the geometrical relationship between atoms, in order to ensure that the resultant structures are likely to inherit the best features of the parent clusters. Using this approach, they have been able to find lower energy structures than had been previously obtained. Most recently, they have been able to turn around the building block idea, using optimized structures from the GA to learn about systematic structural trends. They believe that an effective GA can help provide such heuristic information, and (conversely) that such information can be introduced back into the algorithm to assist in the search process.
Classification of posture maintenance data with fuzzy clustering algorithms
NASA Technical Reports Server (NTRS)
Bezdek, James C.
1991-01-01
Sensory inputs from the visual, vestibular, and proprioreceptive systems are integrated by the central nervous system to maintain postural equilibrium. Sustained exposure to microgravity causes neurosensory adaptation during spaceflight, which results in decreased postural stability until readaptation occurs upon return to the terrestrial environment. Data which simulate sensory inputs under various conditions were collected in conjunction with JSC postural control studies using a Tilt-Translation Device (TTD). The University of West Florida proposed applying the Fuzzy C-Means Clustering (FCM) Algorithms to this data with a view towards identifying various states and stages. Data supplied by NASA/JSC were submitted to the FCM algorithms in an attempt to identify and characterize cluster substructure in a mixed ensemble of pre- and post-adaptational TTD data. Following several unsuccessful trials with FCM using a full 11 dimensional data set, a set of two channels (features) were found to enable FCM to separate pre- from post-adaptational TTD data. The main conclusions are that: (1) FCM seems able to separate pre- from post-TTD subject no. 2 on the one trial that was used, but only in certain subintervals of time; and (2) Channels 2 (right rear transducer force) and 8 (hip sway bar) contain better discrimination information than other supersets and combinations of the data that were tried so far.
NASA Astrophysics Data System (ADS)
Oh, J.; Kim, W.; Doh, S.; Yu, Y.; Shin, J.
2009-12-01
The Southern shelf of the Korean Peninsula has three potential sediment sources: China, Japan and Korea. However, dominance of clay and/or silt grained sediments causes ambiguity in the allocation of sediment sources and sedimentation features by conventional sedimentological approach. As an alternative analytic method, mineral magnetic techniques have been widely used to sediment provenances that are highly sensitive to micron-scale mineral fractions. In this study, various mineral magnetic parameters representing grain-size, magnetic concentration, and magnetic mineralogy were obtained from 480 sediment samples collected from 98 regularly spaced spots in the southern shelf of Korea. For the multivariate fuzzy cluster analysis, six parameters independent of magnetic mineral concentration were used. The interpretive routine suggests that the best statistical solution for the samples is four clusters. In addition, the solution shows a prominent spatial discrimination based on the magnetic data set. Samples characterized by relatively coarse and high-coercivity magnetic minerals are found in west of the study area (close to China, cluster 1). Clusters 2 and 3 include the samples showing a dominance of fine and low-coercivity ferrimagnetic minerals (center of the study area). Samples with close affinity to cluster 4 are found in northeast, which dominated by fine and high-coercivity minerals. Such longitudinal dichotomy in fuzzy clustering implies two possible sediment provenances: west in China and northeast in Korea. It is likely that Kuroshio Current flows are responsible for the spreads of finer grains in center of the study area (clusters 2 and 3).
Sparse subspace clustering: algorithm, theory, and applications.
Elhamifar, Ehsan; Vidal, René
2013-11-01
Many real-world problems deal with collections of high-dimensional data, such as images, videos, text, and web documents, DNA microarray data, and more. Often, such high-dimensional data lie close to low-dimensional structures corresponding to several classes or categories to which the data belong. In this paper, we propose and study an algorithm, called sparse subspace clustering, to cluster data points that lie in a union of low-dimensional subspaces. The key idea is that, among the infinitely many possible representations of a data point in terms of other points, a sparse representation corresponds to selecting a few points from the same subspace. This motivates solving a sparse optimization program whose solution is used in a spectral clustering framework to infer the clustering of the data into subspaces. Since solving the sparse optimization program is in general NP-hard, we consider a convex relaxation and show that, under appropriate conditions on the arrangement of the subspaces and the distribution of the data, the proposed minimization program succeeds in recovering the desired sparse representations. The proposed algorithm is efficient and can handle data points near the intersections of subspaces. Another key advantage of the proposed algorithm with respect to the state of the art is that it can deal directly with data nuisances, such as noise, sparse outlying entries, and missing entries, by incorporating the model of the data into the sparse optimization program. We demonstrate the effectiveness of the proposed algorithm through experiments on synthetic data as well as the two real-world problems of motion segmentation and face clustering. PMID:24051734
Cluster compression algorithm: A joint clustering/data compression concept
NASA Technical Reports Server (NTRS)
Hilbert, E. E.
1977-01-01
The Cluster Compression Algorithm (CCA), which was developed to reduce costs associated with transmitting, storing, distributing, and interpreting LANDSAT multispectral image data is described. The CCA is a preprocessing algorithm that uses feature extraction and data compression to more efficiently represent the information in the image data. The format of the preprocessed data enables simply a look-up table decoding and direct use of the extracted features to reduce user computation for either image reconstruction, or computer interpretation of the image data. Basically, the CCA uses spatially local clustering to extract features from the image data to describe spectral characteristics of the data set. In addition, the features may be used to form a sequence of scalar numbers that define each picture element in terms of the cluster features. This sequence, called the feature map, is then efficiently represented by using source encoding concepts. Various forms of the CCA are defined and experimental results are presented to show trade-offs and characteristics of the various implementations. Examples are provided that demonstrate the application of the cluster compression concept to multi-spectral images from LANDSAT and other sources.
Chaotic map clustering algorithm for EEG analysis
NASA Astrophysics Data System (ADS)
Bellotti, R.; De Carlo, F.; Stramaglia, S.
2004-03-01
The non-parametric chaotic map clustering algorithm has been applied to the analysis of electroencephalographic signals, in order to recognize the Huntington's disease, one of the most dangerous pathologies of the central nervous system. The performance of the method has been compared with those obtained through parametric algorithms, as K-means and deterministic annealing, and supervised multi-layer perceptron. While supervised neural networks need a training phase, performed by means of data tagged by the genetic test, and the parametric methods require a prior choice of the number of classes to find, the chaotic map clustering gives a natural evidence of the pathological class, without any training or supervision, thus providing a new efficient methodology for the recognition of patterns affected by the Huntington's disease.
A Cross Unequal Clustering Routing Algorithm for Sensor Network
NASA Astrophysics Data System (ADS)
Tong, Wang; Jiyi, Wu; He, Xu; Jinghua, Zhu; Munyabugingo, Charles
2013-08-01
In the routing protocol for wireless sensor network, the cluster size is generally fixed in clustering routing algorithm for wireless sensor network, which can easily lead to the "hot spot" problem. Furthermore, the majority of routing algorithms barely consider the problem of long distance communication between adjacent cluster heads that brings high energy consumption. Therefore, this paper proposes a new cross unequal clustering routing algorithm based on the EEUC algorithm. In order to solve the defects of EEUC algorithm, this algorithm calculating of competition radius takes the node's position and node's remaining energy into account to make the load of cluster heads more balanced. At the same time, cluster adjacent node is applied to transport data and reduce the energy-loss of cluster heads. Simulation experiments show that, compared with LEACH and EEUC, the proposed algorithm can effectively reduce the energy-loss of cluster heads and balance the energy consumption among all nodes in the network and improve the network lifetime
Improved Ant Colony Clustering Algorithm and Its Performance Study.
Gao, Wei
2016-01-01
Clustering analysis is used in many disciplines and applications; it is an important tool that descriptively identifies homogeneous groups of objects based on attribute values. The ant colony clustering algorithm is a swarm-intelligent method used for clustering problems that is inspired by the behavior of ant colonies that cluster their corpses and sort their larvae. A new abstraction ant colony clustering algorithm using a data combination mechanism is proposed to improve the computational efficiency and accuracy of the ant colony clustering algorithm. The abstraction ant colony clustering algorithm is used to cluster benchmark problems, and its performance is compared with the ant colony clustering algorithm and other methods used in existing literature. Based on similar computational difficulties and complexities, the results show that the abstraction ant colony clustering algorithm produces results that are not only more accurate but also more efficiently determined than the ant colony clustering algorithm and the other methods. Thus, the abstraction ant colony clustering algorithm can be used for efficient multivariate data clustering. PMID:26839533
Improved Ant Colony Clustering Algorithm and Its Performance Study
Gao, Wei
2016-01-01
Clustering analysis is used in many disciplines and applications; it is an important tool that descriptively identifies homogeneous groups of objects based on attribute values. The ant colony clustering algorithm is a swarm-intelligent method used for clustering problems that is inspired by the behavior of ant colonies that cluster their corpses and sort their larvae. A new abstraction ant colony clustering algorithm using a data combination mechanism is proposed to improve the computational efficiency and accuracy of the ant colony clustering algorithm. The abstraction ant colony clustering algorithm is used to cluster benchmark problems, and its performance is compared with the ant colony clustering algorithm and other methods used in existing literature. Based on similar computational difficulties and complexities, the results show that the abstraction ant colony clustering algorithm produces results that are not only more accurate but also more efficiently determined than the ant colony clustering algorithm and the other methods. Thus, the abstraction ant colony clustering algorithm can be used for efficient multivariate data clustering. PMID:26839533
A clustering routing algorithm based on improved ant colony clustering for wireless sensor networks
NASA Astrophysics Data System (ADS)
Xiao, Xiaoli; Li, Yang
Because of real wireless sensor network node distribution uniformity, this paper presents a clustering strategy based on the ant colony clustering algorithm (ACC-C). To reduce the energy consumption of the head near the base station and the whole network, The algorithm uses ant colony clustering on non-uniform clustering. The improve route optimal degree is presented to evaluate the performance of the chosen route. Simulation results show that, compared with other algorithms, like the LEACH algorithm and the improve particle cluster kind of clustering algorithm (PSC - C), the proposed approach is able to keep away from the node with less residual energy, which can improve the life of networks.
Cross-Clustering: A Partial Clustering Algorithm with Automatic Estimation of the Number of Clusters
Tellaroli, Paola; Bazzi, Marco; Donato, Michele; Brazzale, Alessandra R.; Drăghici, Sorin
2016-01-01
Four of the most common limitations of the many available clustering methods are: i) the lack of a proper strategy to deal with outliers; ii) the need for a good a priori estimate of the number of clusters to obtain reasonable results; iii) the lack of a method able to detect when partitioning of a specific data set is not appropriate; and iv) the dependence of the result on the initialization. Here we propose Cross-clustering (CC), a partial clustering algorithm that overcomes these four limitations by combining the principles of two well established hierarchical clustering algorithms: Ward’s minimum variance and Complete-linkage. We validated CC by comparing it with a number of existing clustering methods, including Ward’s and Complete-linkage. We show on both simulated and real datasets, that CC performs better than the other methods in terms of: the identification of the correct number of clusters, the identification of outliers, and the determination of real cluster memberships. We used CC to cluster samples in order to identify disease subtypes, and on gene profiles, in order to determine groups of genes with the same behavior. Results obtained on a non-biological dataset show that the method is general enough to be successfully used in such diverse applications. The algorithm has been implemented in the statistical language R and is freely available from the CRAN contributed packages repository. PMID:27015427
A Hybrid Monkey Search Algorithm for Clustering Analysis
Chen, Xin; Zhou, Yongquan; Luo, Qifang
2014-01-01
Clustering is a popular data analysis and data mining technique. The k-means clustering algorithm is one of the most commonly used methods. However, it highly depends on the initial solution and is easy to fall into local optimum solution. In view of the disadvantages of the k-means method, this paper proposed a hybrid monkey algorithm based on search operator of artificial bee colony algorithm for clustering analysis and experiment on synthetic and real life datasets to show that the algorithm has a good performance than that of the basic monkey algorithm for clustering analysis. PMID:24772039
Energy Aware Clustering Algorithms for Wireless Sensor Networks
NASA Astrophysics Data System (ADS)
Rakhshan, Noushin; Rafsanjani, Marjan Kuchaki; Liu, Chenglian
2011-09-01
The sensor nodes deployed in wireless sensor networks (WSNs) are extremely power constrained, so maximizing the lifetime of the entire networks is mainly considered in the design. In wireless sensor networks, hierarchical network structures have the advantage of providing scalable and energy efficient solutions. In this paper, we investigate different clustering algorithms for WSNs and also compare these clustering algorithms based on metrics such as clustering distribution, cluster's load balancing, Cluster Head's (CH) selection strategy, CH's role rotation, node mobility, clusters overlapping, intra-cluster communications, reliability, security and location awareness.
Color sorting algorithm based on K-means clustering algorithm
NASA Astrophysics Data System (ADS)
Zhang, BaoFeng; Huang, Qian
2009-11-01
In the process of raisin production, there were a variety of color impurities, which needs be removed effectively. A new kind of efficient raisin color-sorting algorithm was presented here. First, the technology of image processing basing on the threshold was applied for the image pre-processing, and then the gray-scale distribution characteristic of the raisin image was found. In order to get the chromatic aberration image and reduce some disturbance, we made the flame image subtraction that the target image data minus the background image data. Second, Haar wavelet filter was used to get the smooth image of raisins. According to the different colors and mildew, spots and other external features, the calculation was made to identify the characteristics of their images, to enable them to fully reflect the quality differences between the raisins of different types. After the processing above, the image were analyzed by K-means clustering analysis method, which can achieve the adaptive extraction of the statistic features, in accordance with which, the image data were divided into different categories, thereby the categories of abnormal colors were distinct. By the use of this algorithm, the raisins of abnormal colors and ones with mottles were eliminated. The sorting rate was up to 98.6%, and the ratio of normal raisins to sorted grains was less than one eighth.
NASA Astrophysics Data System (ADS)
Dekkers, M. J.; Heslop, D.; Herrero-Bervera, E.; Acton, G.; Krasa, D.
2014-12-01
Ocean Drilling Program (ODP)/Integrated ODP (IODP) Hole 1256D (6.44.1' N, 91.56.1' W) on the Cocos Plate occurs in 15.2 Ma oceanic crust generated by superfast seafloor spreading. Presently, it is the only drill hole that has sampled all three oceanic crust layers in a tectonically undisturbed setting. Here we interpret down-hole trends in several rock-magnetic parameters with fuzzy c-means cluster analysis, a multivariate statistical technique. The parameters include the magnetization ratio, the coercivity ratio, the coercive force, the low-field susceptibility, and the Curie temperature. By their combined, multivariate, analysis the effects of magmatic and hydrothermal processes can be evaluated. The optimal number of clusters - a key point in the analysis because there is no a priori information on this - was determined through a combination of approaches: by calculation of several cluster validity indices, by testing for coherent cluster distributions on non-linear-map plots, and importantly by testing for stability of the cluster solution from all possible starting points. Here, we consider a solution robust if the cluster allocation is independent of the starting configuration. The five-cluster solution appeared to be robust. Three clusters are distinguished in the extrusive segment of the Hole that express increasing hydrothermal alteration of the lavas. The sheeted dike and gabbro portions are characterized by two clusters, both with higher coercivities than in lava samples. Extensive alteration, however, can obliterate magnetic property differences between lavas, dikes, and gabbros. The imprint of thermochemical alteration on the iron-titanium oxides is only partially related to the porosity of the rocks. All clusters display rock magnetic characteristics in line with a stable NRM. This implies that the entire sampled sequence of ocean crust can contribute to marine magnetic anomalies. Determination of the absolute paleointensity with thermal techniques is
Multi-Parent Clustering Algorithms from Stochastic Grammar Data Models
NASA Technical Reports Server (NTRS)
Mjoisness, Eric; Castano, Rebecca; Gray, Alexander
1999-01-01
We introduce a statistical data model and an associated optimization-based clustering algorithm which allows data vectors to belong to zero, one or several "parent" clusters. For each data vector the algorithm makes a discrete decision among these alternatives. Thus, a recursive version of this algorithm would place data clusters in a Directed Acyclic Graph rather than a tree. We test the algorithm with synthetic data generated according to the statistical data model. We also illustrate the algorithm using real data from large-scale gene expression assays.
Incremental Clustering Algorithm For Earth Science Data Mining
Vatsavai, Raju
2009-01-01
Remote sensing data plays a key role in understanding the complex geographic phenomena. Clustering is a useful tool in discovering interesting patterns and structures within the multivariate geospatial data. One of the key issues in clustering is the specication of appropriate number of clusters, which is not obvious in many practical situations. In this paper we provide an extension of G-means algorithm which automatically learns the number of clusters present in the data and avoids over estimation of the number of clusters. Experimental evaluation on simulated and remotely sensed image data shows the effectiveness of our algorithm.
Clustering algorithms do not learn, but they can be learned
NASA Astrophysics Data System (ADS)
Brun, Marcel; Dougherty, Edward R.
2005-08-01
Pattern classification theory involves an error criterion, optimal classifiers, and a theory of learning. For clustering, there has historically been little theory; in particular, there has generally (but not always) been no learning. The key point is that clustering has not been grounded on a probabilistic theory. Recently, a clustering theory has been developed in the context of random sets. This paper discusses learning within that context, in particular, k- nearest-neighbor learning of clustering algorithms.
Clustering algorithms for Stokes space modulation format recognition.
Boada, Ricard; Borkowski, Robert; Monroy, Idelfonso Tafur
2015-06-15
Stokes space modulation format recognition (Stokes MFR) is a blind method enabling digital coherent receivers to infer modulation format information directly from a received polarization-division-multiplexed signal. A crucial part of the Stokes MFR is a clustering algorithm, which largely influences the performance of the detection process, particularly at low signal-to-noise ratios. This paper reports on an extensive study of six different clustering algorithms: k-means, expectation maximization, density-based DBSCAN and OPTICS, spectral clustering and maximum likelihood clustering, used for discriminating between dual polarization: BPSK, QPSK, 8-PSK, 8-QAM, and 16-QAM. We determine essential performance metrics for each clustering algorithm and modulation format under test: minimum required signal-to-noise ratio, detection accuracy and algorithm complexity. PMID:26193532
A systematic comparison of genome-scale clustering algorithms
2012-01-01
Background A wealth of clustering algorithms has been applied to gene co-expression experiments. These algorithms cover a broad range of approaches, from conventional techniques such as k-means and hierarchical clustering, to graphical approaches such as k-clique communities, weighted gene co-expression networks (WGCNA) and paraclique. Comparison of these methods to evaluate their relative effectiveness provides guidance to algorithm selection, development and implementation. Most prior work on comparative clustering evaluation has focused on parametric methods. Graph theoretical methods are recent additions to the tool set for the global analysis and decomposition of microarray co-expression matrices that have not generally been included in earlier methodological comparisons. In the present study, a variety of parametric and graph theoretical clustering algorithms are compared using well-characterized transcriptomic data at a genome scale from Saccharomyces cerevisiae. Methods For each clustering method under study, a variety of parameters were tested. Jaccard similarity was used to measure each cluster's agreement with every GO and KEGG annotation set, and the highest Jaccard score was assigned to the cluster. Clusters were grouped into small, medium, and large bins, and the Jaccard score of the top five scoring clusters in each bin were averaged and reported as the best average top 5 (BAT5) score for the particular method. Results Clusters produced by each method were evaluated based upon the positive match to known pathways. This produces a readily interpretable ranking of the relative effectiveness of clustering on the genes. Methods were also tested to determine whether they were able to identify clusters consistent with those identified by other clustering methods. Conclusions Validation of clusters against known gene classifications demonstrate that for this data, graph-based techniques outperform conventional clustering approaches, suggesting that further
The Enhanced Hoshen-Kopelman Algorithm for Cluster Analysis
NASA Astrophysics Data System (ADS)
Hoshen, Joseph
1997-08-01
In 1976 Hoshen and Kopelman(J. Hoshen and R. Kopelman, Phys. Rev. B, 14, 3438 (1976).) introduced a breakthrough algorithm, known today as the Hoshen-Kopelman algorithm, for cluster analysis. This algorithm revolutionized Monte Carlo cluster calculations in percolation theory as it enables analysis of very large lattices containing 10^11 or more sites. Initially the HK algorithm primary use was in the domain of pure and basic sciences. Later it began finding applications in diverse fields of technology and applied sciences. Example of such applications are two and three dimensional image analysis, composite material modeling, polymers, remote sensing, brain modeling and food processing. While the original HK algorithm provides only cluster size data for only one class of sites, the Enhanced HK (EHK) algorithm, presented in this paper, enables calculations of cluster spatial moments -- characteristics of cluster shapes -- for multiple classes of sites. These enhancements preserve the time and space complexities of the original HK algorithm, such that very large lattices could be still analyzed simultaneously in a single pass through the lattice for cluster sizes, classes and shapes.
The ordered clustered travelling salesman problem: a hybrid genetic algorithm.
Ahmed, Zakir Hussain
2014-01-01
The ordered clustered travelling salesman problem is a variation of the usual travelling salesman problem in which a set of vertices (except the starting vertex) of the network is divided into some prespecified clusters. The objective is to find the least cost Hamiltonian tour in which vertices of any cluster are visited contiguously and the clusters are visited in the prespecified order. The problem is NP-hard, and it arises in practical transportation and sequencing problems. This paper develops a hybrid genetic algorithm using sequential constructive crossover, 2-opt search, and a local search for obtaining heuristic solution to the problem. The efficiency of the algorithm has been examined against two existing algorithms for some asymmetric and symmetric TSPLIB instances of various sizes. The computational results show that the proposed algorithm is very effective in terms of solution quality and computational time. Finally, we present solution to some more symmetric TSPLIB instances. PMID:24701148
Efficient Cluster Algorithm for Spin Glasses in Any Space Dimension.
Zhu, Zheng; Ochoa, Andrew J; Katzgraber, Helmut G
2015-08-14
Spin systems with frustration and disorder are notoriously difficult to study, both analytically and numerically. While the simulation of ferromagnetic statistical mechanical models benefits greatly from cluster algorithms, these accelerated dynamics methods remain elusive for generic spin-glass-like systems. Here, we present a cluster algorithm for Ising spin glasses that works in any space dimension and speeds up thermalization by at least one order of magnitude at temperatures where thermalization is typically difficult. Our isoenergetic cluster moves are based on the Houdayer cluster algorithm for two-dimensional spin glasses and lead to a speedup over conventional state-of-the-art methods that increases with the system size. We illustrate the benefits of the isoenergetic cluster moves in two and three space dimensions, as well as the nonplanar chimera topology found in the D-Wave Inc. quantum annealing machine. PMID:26317743
Efficient Cluster Algorithm for Spin Glasses in Any Space Dimension
NASA Astrophysics Data System (ADS)
Zhu, Zheng; Ochoa, Andrew J.; Katzgraber, Helmut G.
2015-08-01
Spin systems with frustration and disorder are notoriously difficult to study, both analytically and numerically. While the simulation of ferromagnetic statistical mechanical models benefits greatly from cluster algorithms, these accelerated dynamics methods remain elusive for generic spin-glass-like systems. Here, we present a cluster algorithm for Ising spin glasses that works in any space dimension and speeds up thermalization by at least one order of magnitude at temperatures where thermalization is typically difficult. Our isoenergetic cluster moves are based on the Houdayer cluster algorithm for two-dimensional spin glasses and lead to a speedup over conventional state-of-the-art methods that increases with the system size. We illustrate the benefits of the isoenergetic cluster moves in two and three space dimensions, as well as the nonplanar chimera topology found in the D-Wave Inc. quantum annealing machine.
A Fast Implementation of the ISODATA Clustering Algorithm
NASA Technical Reports Server (NTRS)
Memarsadeghi, Nargess; Mount, David M.; Netanyahu, Nathan S.; LeMoigne, Jacqueline
2005-01-01
Clustering is central to many image processing and remote sensing applications. ISODATA is one of the most popular and widely used clustering methods in geoscience applications, but it can run slowly, particularly with large data sets. We present a more efficient approach to ISODATA clustering, which achieves better running times by storing the points in a kd-tree and through a modification of the way in which the algorithm estimates the dispersion of each cluster. We also present an approximate version of the algorithm which allows the user to further improve the running time, at the expense of lower fidelity in computing the nearest cluster center to each point. We provide both theoretical and empirical justification that our modified approach produces clusterings that are very similar to those produced by the standard ISODATA approach. We also provide empirical studies on both synthetic data and remotely sensed Landsat and MODIS images that show that our approach has significantly lower running times.
Efficient Record Linkage Algorithms Using Complete Linkage Clustering
Mamun, Abdullah-Al; Aseltine, Robert; Rajasekaran, Sanguthevar
2016-01-01
Data from different agencies share data of the same individuals. Linking these datasets to identify all the records belonging to the same individuals is a crucial and challenging problem, especially given the large volumes of data. A large number of available algorithms for record linkage are prone to either time inefficiency or low-accuracy in finding matches and non-matches among the records. In this paper we propose efficient as well as reliable sequential and parallel algorithms for the record linkage problem employing hierarchical clustering methods. We employ complete linkage hierarchical clustering algorithms to address this problem. In addition to hierarchical clustering, we also use two other techniques: elimination of duplicate records and blocking. Our algorithms use sorting as a sub-routine to identify identical copies of records. We have tested our algorithms on datasets with millions of synthetic records. Experimental results show that our algorithms achieve nearly 100% accuracy. Parallel implementations achieve almost linear speedups. Time complexities of these algorithms do not exceed those of previous best-known algorithms. Our proposed algorithms outperform previous best-known algorithms in terms of accuracy consuming reasonable run times. PMID:27124604
Clustering of Hadronic Showers with a Structural Algorithm
Charles, M.J.; /SLAC
2005-12-13
The internal structure of hadronic showers can be resolved in a high-granularity calorimeter. This structure is described in terms of simple components and an algorithm for reconstruction of hadronic clusters using these components is presented. Results from applying this algorithm to simulated hadronic Z-pole events in the SiD concept are discussed.
CCL: an algorithm for the efficient comparison of clusters
Hundt, R.; Schön, J. C.; Neelamraju, S.; Zagorac, J.; Jansen, M.
2013-01-01
The systematic comparison of the atomic structure of solids and clusters has become an important task in crystallography, chemistry, physics and materials science, in particular in the context of structure prediction and structure determination of nanomaterials. In this work, an efficient and robust algorithm for the comparison of cluster structures is presented, which is based on the mapping of the point patterns of the two clusters onto each other. This algorithm has been implemented as the module CCL in the structure visualization and analysis program KPLOT. PMID:23682193
A knowledge-based clustering algorithm driven by Gene Ontology.
Cheng, Jill; Cline, Melissa; Martin, John; Finkelstein, David; Awad, Tarif; Kulp, David; Siani-Rose, Michael A
2004-08-01
We have developed an algorithm for inferring the degree of similarity between genes by using the graph-based structure of Gene Ontology (GO). We applied this knowledge-based similarity metric to a clique-finding algorithm for detecting sets of related genes with biological classifications. We also combined it with an expression-based distance metric to produce a co-cluster analysis, which accentuates genes with both similar expression profiles and similar biological characteristics and identifies gene clusters that are more stable and biologically meaningful. These algorithms are demonstrated in the analysis of MPRO cell differentiation time series experiments. PMID:15468759
A modified density-based clustering algorithm and its implementation
NASA Astrophysics Data System (ADS)
Ban, Zhihua; Liu, Jianguo; Yuan, Lulu; Yang, Hua
2015-12-01
This paper presents an improved density-based clustering algorithm based on the paper of clustering by fast search and find of density peaks. A distance threshold is introduced for the purpose of economizing memory. In order to reduce the probability that two points share the same density value, similarity is utilized to define proximity measure. We have tested the modified algorithm on a large data set, several small data sets and shape data sets. It turns out that the proposed algorithm can obtain acceptable results and can be applied more wildly.
Sampling Within k-Means Algorithm to Cluster Large Datasets
Bejarano, Jeremy; Bose, Koushiki; Brannan, Tyler; Thomas, Anita; Adragni, Kofi; Neerchal, Nagaraj; Ostrouchov, George
2011-08-01
Due to current data collection technology, our ability to gather data has surpassed our ability to analyze it. In particular, k-means, one of the simplest and fastest clustering algorithms, is ill-equipped to handle extremely large datasets on even the most powerful machines. Our new algorithm uses a sample from a dataset to decrease runtime by reducing the amount of data analyzed. We perform a simulation study to compare our sampling based k-means to the standard k-means algorithm by analyzing both the speed and accuracy of the two methods. Results show that our algorithm is significantly more efficient than the existing algorithm with comparable accuracy. Further work on this project might include a more comprehensive study both on more varied test datasets as well as on real weather datasets. This is especially important considering that this preliminary study was performed on rather tame datasets. Also, these datasets should analyze the performance of the algorithm on varied values of k. Lastly, this paper showed that the algorithm was accurate for relatively low sample sizes. We would like to analyze this further to see how accurate the algorithm is for even lower sample sizes. We could find the lowest sample sizes, by manipulating width and confidence level, for which the algorithm would be acceptably accurate. In order for our algorithm to be a success, it needs to meet two benchmarks: match the accuracy of the standard k-means algorithm and significantly reduce runtime. Both goals are accomplished for all six datasets analyzed. However, on datasets of three and four dimension, as the data becomes more difficult to cluster, both algorithms fail to obtain the correct classifications on some trials. Nevertheless, our algorithm consistently matches the performance of the standard algorithm while becoming remarkably more efficient with time. Therefore, we conclude that analysts can use our algorithm, expecting accurate results in considerably less time.
Sharma, Ashok; Podolsky, Robert; Zhao, Jieping; McIndoe, Richard A.
2009-01-01
Motivation: As the number of publically available microarray experiments increases, the ability to analyze extremely large datasets across multiple experiments becomes critical. There is a requirement to develop algorithms which are fast and can cluster extremely large datasets without affecting the cluster quality. Clustering is an unsupervised exploratory technique applied to microarray data to find similar data structures or expression patterns. Because of the high input/output costs involved and large distance matrices calculated, most of the algomerative clustering algorithms fail on large datasets (30 000 + genes/200 + arrays). In this article, we propose a new two-stage algorithm which partitions the high-dimensional space associated with microarray data using hyperplanes. The first stage is based on the Balanced Iterative Reducing and Clustering using Hierarchies algorithm with the second stage being a conventional k-means clustering technique. This algorithm has been implemented in a software tool (HPCluster) designed to cluster gene expression data. We compared the clustering results using the two-stage hyperplane algorithm with the conventional k-means algorithm from other available programs. Because, the first stage traverses the data in a single scan, the performance and speed increases substantially. The data reduction accomplished in the first stage of the algorithm reduces the memory requirements allowing us to cluster 44 460 genes without failure and significantly decreases the time to complete when compared with popular k-means programs. The software was written in C# (.NET 1.1). Availability: The program is freely available and can be downloaded from http://www.amdcc.org/bioinformatics/bioinformatics.aspx. Contact: rmcindoe@mail.mcg.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:19261720
CACONET: Ant Colony Optimization (ACO) Based Clustering Algorithm for VANET
Bajwa, Khalid Bashir; Khan, Salabat; Chaudary, Nadeem Majeed; Akram, Adeel
2016-01-01
A vehicular ad hoc network (VANET) is a wirelessly connected network of vehicular nodes. A number of techniques, such as message ferrying, data aggregation, and vehicular node clustering aim to improve communication efficiency in VANETs. Cluster heads (CHs), selected in the process of clustering, manage inter-cluster and intra-cluster communication. The lifetime of clusters and number of CHs determines the efficiency of network. In this paper a Clustering algorithm based on Ant Colony Optimization (ACO) for VANETs (CACONET) is proposed. CACONET forms optimized clusters for robust communication. CACONET is compared empirically with state-of-the-art baseline techniques like Multi-Objective Particle Swarm Optimization (MOPSO) and Comprehensive Learning Particle Swarm Optimization (CLPSO). Experiments varying the grid size of the network, the transmission range of nodes, and number of nodes in the network were performed to evaluate the comparative effectiveness of these algorithms. For optimized clustering, the parameters considered are the transmission range, direction and speed of the nodes. The results indicate that CACONET significantly outperforms MOPSO and CLPSO. PMID:27149517
CACONET: Ant Colony Optimization (ACO) Based Clustering Algorithm for VANET.
Aadil, Farhan; Bajwa, Khalid Bashir; Khan, Salabat; Chaudary, Nadeem Majeed; Akram, Adeel
2016-01-01
A vehicular ad hoc network (VANET) is a wirelessly connected network of vehicular nodes. A number of techniques, such as message ferrying, data aggregation, and vehicular node clustering aim to improve communication efficiency in VANETs. Cluster heads (CHs), selected in the process of clustering, manage inter-cluster and intra-cluster communication. The lifetime of clusters and number of CHs determines the efficiency of network. In this paper a Clustering algorithm based on Ant Colony Optimization (ACO) for VANETs (CACONET) is proposed. CACONET forms optimized clusters for robust communication. CACONET is compared empirically with state-of-the-art baseline techniques like Multi-Objective Particle Swarm Optimization (MOPSO) and Comprehensive Learning Particle Swarm Optimization (CLPSO). Experiments varying the grid size of the network, the transmission range of nodes, and number of nodes in the network were performed to evaluate the comparative effectiveness of these algorithms. For optimized clustering, the parameters considered are the transmission range, direction and speed of the nodes. The results indicate that CACONET significantly outperforms MOPSO and CLPSO. PMID:27149517
Personalized PageRank Clustering: A graph clustering algorithm based on random walks
NASA Astrophysics Data System (ADS)
A. Tabrizi, Shayan; Shakery, Azadeh; Asadpour, Masoud; Abbasi, Maziar; Tavallaie, Mohammad Ali
2013-11-01
Graph clustering has been an essential part in many methods and thus its accuracy has a significant effect on many applications. In addition, exponential growth of real-world graphs such as social networks, biological networks and electrical circuits demands clustering algorithms with nearly-linear time and space complexity. In this paper we propose Personalized PageRank Clustering (PPC) that employs the inherent cluster exploratory property of random walks to reveal the clusters of a given graph. We combine random walks and modularity to precisely and efficiently reveal the clusters of a graph. PPC is a top-down algorithm so it can reveal inherent clusters of a graph more accurately than other nearly-linear approaches that are mainly bottom-up. It also gives a hierarchy of clusters that is useful in many applications. PPC has a linear time and space complexity and has been superior to most of the available clustering algorithms on many datasets. Furthermore, its top-down approach makes it a flexible solution for clustering problems with different requirements.
Functional clustering algorithm for the analysis of dynamic network data
NASA Astrophysics Data System (ADS)
Feldt, S.; Waddell, J.; Hetrick, V. L.; Berke, J. D.; Żochowski, M.
2009-05-01
We formulate a technique for the detection of functional clusters in discrete event data. The advantage of this algorithm is that no prior knowledge of the number of functional groups is needed, as our procedure progressively combines data traces and derives the optimal clustering cutoff in a simple and intuitive manner through the use of surrogate data sets. In order to demonstrate the power of this algorithm to detect changes in network dynamics and connectivity, we apply it to both simulated neural spike train data and real neural data obtained from the mouse hippocampus during exploration and slow-wave sleep. Using the simulated data, we show that our algorithm performs better than existing methods. In the experimental data, we observe state-dependent clustering patterns consistent with known neurophysiological processes involved in memory consolidation.
Performance impact of dynamic parallelism on different clustering algorithms
NASA Astrophysics Data System (ADS)
DiMarco, Jeffrey; Taufer, Michela
2013-05-01
In this paper, we aim to quantify the performance gains of dynamic parallelism. The newest version of CUDA, CUDA 5, introduces dynamic parallelism, which allows GPU threads to create new threads, without CPU intervention, and adapt to its data. This effectively eliminates the superfluous back and forth communication between the GPU and CPU through nested kernel computations. The change in performance will be measured using two well-known clustering algorithms that exhibit data dependencies: the K-means clustering and the hierarchical clustering. K-means has a sequential data dependence wherein iterations occur in a linear fashion, while the hierarchical clustering has a tree-like dependence that produces split tasks. Analyzing the performance of these data-dependent algorithms gives us a better understanding of the benefits or potential drawbacks of CUDA 5's new dynamic parallelism feature.
Development of clustering algorithms for Compressed Baryonic Matter experiment
NASA Astrophysics Data System (ADS)
Kozlov, G. E.; Ivanov, V. V.; Lebedev, A. A.; Vassiliev, Yu. O.
2015-05-01
A clustering problem for the coordinate detectors in the Compressed Baryonic Matter (CBM) experiment is discussed. Because of the high interaction rate and huge datasets to be dealt with, clustering algorithms are required to be fast and efficient and capable of processing events with high track multiplicity. At present there are two different approaches to the problem. In the first one each fired pad bears information about its charge, while in the second one a pad can or cannot be fired, thus rendering the separation of overlapping clusters a difficult task. To deal with the latter, two different clustering algorithms were developed, integrated into the CBMROOT software environment, and tested with various types of simulated events. Both of them are found to be highly efficient and accurate.
NCUBE - A clustering algorithm based on a discretized data space
NASA Technical Reports Server (NTRS)
Eigen, D. J.; Northouse, R. A.
1974-01-01
Cluster analysis involves the unsupervised grouping of data. The process provides an automatic procedure for generating known training samples for pattern classification. NCUBE, the clustering algorithm presented, is based upon the concept of imposing a gridwork on the data space. The NCUBE computer implementation of this concept provides an easily derived form of piecewise linear discrimination. This piecewise linear discrimination permits the separation of some types of data groups that are not linearly separable.
Fast clustering algorithm for codebook production in image vector quantization
NASA Astrophysics Data System (ADS)
Al-Otum, Hazem M.
2001-04-01
In this paper, a fast clustering algorithm (FCA) is proposed to be implemented in vector quantization codebook production. This algorithm gives the ability to avoid iterative averaging of vectors and is based on collecting vectors with similar or closely similar characters to produce corresponding clusters. FCA gives an increase in peak signal-to-noise ratio (PSNR) about 0.3 - 1.1 dB, over the LBG algorithm and reduces the computational cost for codebook production (10% - 60%) at different bit rates. Here, two FCA modifications are proposed: FCA with limited cluster size 1& (FCA-LCS1 and FCA-LCS2, respectively). FCA- LCS1 tends to subdivide large clusters into smaller ones while FCA-LCS2 reduces a predetermined threshold by a step to reach the required cluster size. The FCA-LCS1 and FCA- LCS2 give an increase in PSNR of about 0.9 - 1.0 and 0.9 - 1.1 dB, respectively, over the FCA algorithm, at the expense of about 15% - 25% and 18% - 28% increase in the output codebook size.
Particle flow reconstruction based on the directed tree clustering algorithm
Chakraborty, D.; Lima, J. G. R.; McIntosh, R.; Zutshi, V.
2006-10-27
We present the status of particle flow algorithm development at Northern Illinois University. A key element in our approach is the calorimeter-based directed tree clustering algorithm. We have attempted to identify and tackle the essential challenges and analyze the effect of several different approaches to the reconstruction of jet energies and the Z-boson mass. A number of possibilities have been studied, such as analog vs. digital energy measurement, hit density-based clustering and the use of single or multiple energy thresholds. We plan to use this PFA-based reconstruction to compare some of the proposed detector technologies and geometries.
Analysis of Network Clustering Algorithms and Cluster Quality Metrics at Scale
Kobourov, Stephen; Gallant, Mike; Börner, Katy
2016-01-01
Overview Notions of community quality underlie the clustering of networks. While studies surrounding network clustering are increasingly common, a precise understanding of the realtionship between different cluster quality metrics is unknown. In this paper, we examine the relationship between stand-alone cluster quality metrics and information recovery metrics through a rigorous analysis of four widely-used network clustering algorithms—Louvain, Infomap, label propagation, and smart local moving. We consider the stand-alone quality metrics of modularity, conductance, and coverage, and we consider the information recovery metrics of adjusted Rand score, normalized mutual information, and a variant of normalized mutual information used in previous work. Our study includes both synthetic graphs and empirical data sets of sizes varying from 1,000 to 1,000,000 nodes. Cluster Quality Metrics We find significant differences among the results of the different cluster quality metrics. For example, clustering algorithms can return a value of 0.4 out of 1 on modularity but score 0 out of 1 on information recovery. We find conductance, though imperfect, to be the stand-alone quality metric that best indicates performance on the information recovery metrics. Additionally, our study shows that the variant of normalized mutual information used in previous work cannot be assumed to differ only slightly from traditional normalized mutual information. Network Clustering Algorithms Smart local moving is the overall best performing algorithm in our study, but discrepancies between cluster evaluation metrics prevent us from declaring it an absolutely superior algorithm. Interestingly, Louvain performed better than Infomap in nearly all the tests in our study, contradicting the results of previous work in which Infomap was superior to Louvain. We find that although label propagation performs poorly when clusters are less clearly defined, it scales efficiently and accurately to large
A dynamic clustering algorithm in wireless sensor networks
NASA Astrophysics Data System (ADS)
Wang, Rui; Liang, Yan; Pan, Quan; Wang, Quan; Cheng, Yongmei
2005-11-01
It is essential to prolong the lifetime of wireless sensor networks (WSN) via effective cooperation of its sensor nodes. Here, a dynamic clustering algorithm, named DCA, is presented to optimally and dynamically select the micro-sensor nodes to construct a dynamic sensor cluster at each time based on the integrated performance index including information acquirement and energy consumption. In distributed target tracking with WSN, the DCA can avoid the problem of "too frequent cluster head (CH) switches", save more than 80% energy and remain almost same tracking accuracy, compared with the information-driven sensor querying (IDSQ).
A Task-parallel Clustering Algorithm for Structured AMR
Gunney, B N; Wissink, A M
2004-11-02
A new parallel algorithm, based on the Berger-Rigoutsos algorithm for clustering grid points into logically rectangular regions, is presented. The clustering operation is frequently performed in the dynamic gridding steps of structured adaptive mesh refinement (SAMR) calculations. A previous study revealed that although the cost of clustering is generally insignificant for smaller problems run on relatively few processors, the algorithm scaled inefficiently in parallel and its cost grows with problem size. Hence, it can become significant for large scale problems run on very large parallel machines, such as the new BlueGene system (which has {Omicron}(10{sup 4}) processors). We propose a new task-parallel algorithm designed to reduce communication wait times. Performance was assessed using dynamic SAMR re-gridding operations on up to 16K processors of currently available computers at Lawrence Livermore National Laboratory. The new algorithm was shown to be up to an order of magnitude faster than the baseline algorithm and had better scaling trends.
Six clustering algorithms applied to the WAIS-R: the problem of dissimilar cluster results.
Fraboni, M; Cooper, D
1989-11-01
Clusterings of the Wechsler Adult Intelligence Scale-Revised subtests were obtained from the application of six hierarchical clustering methods (N = 113). These sets of clusters were compared for similarities using the Rand index. The calculated indices suggested similarities of cluster group membership between the Complete Linkage and Centroid methods; Complete Linkage and Ward's methods; Centroid and Ward's methods; and Single Linkage and Average Linkage Between Groups methods. Cautious use of single clustering methods is implied, though the authors suggest some advantages of knowing specific similarities and differences. If between-method comparisons consistently reveal similar cluster membership, a choice could be made from those algorithms that tend to produce similar partitions, thereby enhancing cluster interpretation. PMID:2613904
Open cluster membership probability based on K-means clustering algorithm
NASA Astrophysics Data System (ADS)
El Aziz, Mohamed Abd; Selim, I. M.; Essam, A.
2016-05-01
In the field of galaxies images, the relative coordinate positions of each star with respect to all the other stars are adapted. Therefore the membership of star cluster will be adapted by two basic criterions, one for geometric membership and other for physical (photometric) membership. So in this paper, we presented a new method for the determination of open cluster membership based on K-means clustering algorithm. This algorithm allows us to efficiently discriminate the cluster membership from the field stars. To validate the method we applied it on NGC 188 and NGC 2266, membership stars in these clusters have been obtained. The color-magnitude diagram of the membership stars is significantly clearer and shows a well-defined main sequence and a red giant branch in NGC 188, which allows us to better constrain the cluster members and estimate their physical parameters. The membership probabilities have been calculated and compared to those obtained by the other methods. The results show that the K-means clustering algorithm can effectively select probable member stars in space without any assumption about the spatial distribution of stars in cluster or field. The similarity of our results is in a good agreement with results derived by previous works.
A Resampling Based Clustering Algorithm for Replicated Gene Expression Data.
Li, Han; Li, Chun; Hu, Jie; Fan, Xiaodan
2015-01-01
In gene expression data analysis, clustering is a fruitful exploratory technique to reveal the underlying molecular mechanism by identifying groups of co-expressed genes. To reduce the noise, usually multiple experimental replicates are performed. An integrative analysis of the full replicate data, instead of reducing the data to the mean profile, carries the promise of yielding more precise and robust clusters. In this paper, we propose a novel resampling based clustering algorithm for genes with replicated expression measurements. Assuming those replicates are exchangeable, we formulate the problem in the bootstrap framework, and aim to infer the consensus clustering based on the bootstrap samples of replicates. In our approach, we adopt the mixed effect model to accommodate the heterogeneous variances and implement a quasi-MCMC algorithm to conduct statistical inference. Experiments demonstrate that by taking advantage of the full replicate data, our algorithm produces more reliable clusters and has robust performance in diverse scenarios, especially when the data is subject to multiple sources of variance. PMID:26671802
The C4 clustering algorithm: Clusters of galaxies in the Sloan Digital Sky Survey
Miller, Christopher J.; Nichol, Robert; Reichart, Dan; Wechsler, Risa H.; Evrard, August; Annis, James; McKay, Timothy; Bahcall, Neta; Bernardi, Mariangela; Boehringer, Hans; Connolly, Andrew; Goto, Tomo; Kniazev, Alexie; Lamb, Donald; Postman, Marc; Schneider, Donald; Sheth, Ravi; Voges, Wolfgang; /Cerro-Tololo InterAmerican Obs. /Portsmouth U., ICG /North Carolina U. /Chicago U., Astron. Astrophys. Ctr. /Chicago U., EFI /Michigan U. /Fermilab /Princeton U. Observ. /Garching, Max Planck Inst., MPE /Pittsburgh U. /Tokyo U., ICRR /Baltimore, Space Telescope Sci. /Penn State U. /Chicago U. /Stavropol, Astrophys. Observ. /Heidelberg, Max Planck Inst. Astron. /INI, SAO
2005-03-01
We present the ''C4 Cluster Catalog'', a new sample of 748 clusters of galaxies identified in the spectroscopic sample of the Second Data Release (DR2) of the Sloan Digital Sky Survey (SDSS). The C4 cluster-finding algorithm identifies clusters as overdensities in a seven-dimensional position and color space, thus minimizing projection effects that have plagued previous optical cluster selection. The present C4 catalog covers {approx}2600 square degrees of sky and ranges in redshift from z = 0.02 to z = 0.17. The mean cluster membership is 36 galaxies (with redshifts) brighter than r = 17.7, but the catalog includes a range of systems, from groups containing 10 members to massive clusters with over 200 cluster members with redshifts. The catalog provides a large number of measured cluster properties including sky location, mean redshift, galaxy membership, summed r-band optical luminosity (L{sub r}), velocity dispersion, as well as quantitative measures of substructure and the surrounding large-scale environment. We use new, multi-color mock SDSS galaxy catalogs, empirically constructed from the {Lambda}CDM Hubble Volume (HV) Sky Survey output, to investigate the sensitivity of the C4 catalog to the various algorithm parameters (detection threshold, choice of passbands and search aperture), as well as to quantify the purity and completeness of the C4 cluster catalog. These mock catalogs indicate that the C4 catalog is {approx_equal}90% complete and 95% pure above M{sub 200} = 1 x 10{sup 14} h{sup -1}M{sub {circle_dot}} and within 0.03 {le} z {le} 0.12. Using the SDSS DR2 data, we show that the C4 algorithm finds 98% of X-ray identified clusters and 90% of Abell clusters within 0.03 {le} z {le} 0.12. Using the mock galaxy catalogs and the full HV dark matter simulations, we show that the L{sub r} of a cluster is a more robust estimator of the halo mass (M{sub 200}) than the galaxy line-of-sight velocity dispersion or the richness of the cluster. However, if we
Adaptive clustering algorithm for community detection in complex networks.
Ye, Zhenqing; Hu, Songnian; Yu, Jun
2008-10-01
Community structure is common in various real-world networks; methods or algorithms for detecting such communities in complex networks have attracted great attention in recent years. We introduced a different adaptive clustering algorithm capable of extracting modules from complex networks with considerable accuracy and robustness. In this approach, each node in a network acts as an autonomous agent demonstrating flocking behavior where vertices always travel toward their preferable neighboring groups. An optimal modular structure can emerge from a collection of these active nodes during a self-organization process where vertices constantly regroup. In addition, we show that our algorithm appears advantageous over other competing methods (e.g., the Newman-fast algorithm) through intensive evaluation. The applications in three real-world networks demonstrate the superiority of our algorithm to find communities that are parallel with the appropriate organization in reality. PMID:18999501
Coupled cluster algorithms for networks of shared memory parallel processors
NASA Astrophysics Data System (ADS)
Bentz, Jonathan L.; Olson, Ryan M.; Gordon, Mark S.; Schmidt, Michael W.; Kendall, Ricky A.
2007-05-01
As the popularity of using SMP systems as the building blocks for high performance supercomputers increases, so too increases the need for applications that can utilize the multiple levels of parallelism available in clusters of SMPs. This paper presents a dual-layer distributed algorithm, using both shared-memory and distributed-memory techniques to parallelize a very important algorithm (often called the "gold standard") used in computational chemistry, the single and double excitation coupled cluster method with perturbative triples, i.e. CCSD(T). The algorithm is presented within the framework of the GAMESS [M.W. Schmidt, K.K. Baldridge, J.A. Boatz, S.T. Elbert, M.S. Gordon, J.J. Jensen, S. Koseki, N. Matsunaga, K.A. Nguyen, S. Su, T.L. Windus, M. Dupuis, J.A. Montgomery, General atomic and molecular electronic structure system, J. Comput. Chem. 14 (1993) 1347-1363]. (General Atomic and Molecular Electronic Structure System) program suite and the Distributed Data Interface [M.W. Schmidt, G.D. Fletcher, B.M. Bode, M.S. Gordon, The distributed data interface in GAMESS, Comput. Phys. Comm. 128 (2000) 190]. (DDI), however, the essential features of the algorithm (data distribution, load-balancing and communication overhead) can be applied to more general computational problems. Timing and performance data for our dual-level algorithm is presented on several large-scale clusters of SMPs.
NASA Technical Reports Server (NTRS)
Lennington, R. K.; Johnson, J. K.
1979-01-01
An efficient procedure which clusters data using a completely unsupervised clustering algorithm and then uses labeled pixels to label the resulting clusters or perform a stratified estimate using the clusters as strata is developed. Three clustering algorithms, CLASSY, AMOEBA, and ISOCLS, are compared for efficiency. Three stratified estimation schemes and three labeling schemes are also considered and compared.
Biologically supervised hierarchical clustering algorithms for gene expression data.
Boratyn, Grzegorz M; Datta, Susmita; Datta, Somnath
2006-01-01
Cluster analysis has become a standard part of gene expression analysis. In this paper, we propose a novel semi-supervised approach that offers the same flexibility as that of a hierarchical clustering. Yet it utilizes, along with the experimental gene expression data, common biological information about different genes that is being complied at various public, Web accessible databases. We argue that such an approach is inherently superior than the standard unsupervised approach of grouping genes based on expression data alone. It is shown that our biologically supervised methods produce better clustering results than the corresponding unsupervised methods as judged by the distance from the model temporal profiles. R-codes of the clustering algorithm are available from the authors upon request. PMID:17947147
ABCluster: the artificial bee colony algorithm for cluster global optimization.
Zhang, Jun; Dolg, Michael
2015-10-01
Global optimization of cluster geometries is of fundamental importance in chemistry and an interesting problem in applied mathematics. In this work, we introduce a relatively new swarm intelligence algorithm, i.e. the artificial bee colony (ABC) algorithm proposed in 2005, to this field. It is inspired by the foraging behavior of a bee colony, and only three parameters are needed to control it. We applied it to several potential functions of quite different nature, i.e., the Coulomb-Born-Mayer, Lennard-Jones, Morse, Z and Gupta potentials. The benchmarks reveal that for long-ranged potentials the ABC algorithm is very efficient in locating the global minimum, while for short-ranged ones it is sometimes trapped into a local minimum funnel on a potential energy surface of large clusters. We have released an efficient, user-friendly, and free program "ABCluster" to realize the ABC algorithm. It is a black-box program for non-experts as well as experts and might become a useful tool for chemists to study clusters. PMID:26327507
Mapping cultivable land from satellite imagery with clustering algorithms
NASA Astrophysics Data System (ADS)
Arango, R. B.; Campos, A. M.; Combarro, E. F.; Canas, E. R.; Díaz, I.
2016-07-01
Open data satellite imagery provides valuable data for the planning and decision-making processes related with environmental domains. Specifically, agriculture uses remote sensing in a wide range of services, ranging from monitoring the health of the crops to forecasting the spread of crop diseases. In particular, this paper focuses on a methodology for the automatic delimitation of cultivable land by means of machine learning algorithms and satellite data. The method uses a partition clustering algorithm called Partitioning Around Medoids and considers the quality of the clusters obtained for each satellite band in order to evaluate which one better identifies cultivable land. The proposed method was tested with vineyards using as input the spectral and thermal bands of the Landsat 8 satellite. The experimental results show the great potential of this method for cultivable land monitoring from remote-sensed multispectral imagery.
Synchronous Firefly Algorithm for Cluster Head Selection in WSN.
Baskaran, Madhusudhanan; Sadagopan, Chitra
2015-01-01
Wireless Sensor Network (WSN) consists of small low-cost, low-power multifunctional nodes interconnected to efficiently aggregate and transmit data to sink. Cluster-based approaches use some nodes as Cluster Heads (CHs) and organize WSNs efficiently for aggregation of data and energy saving. A CH conveys information gathered by cluster nodes and aggregates/compresses data before transmitting it to a sink. However, this additional responsibility of the node results in a higher energy drain leading to uneven network degradation. Low Energy Adaptive Clustering Hierarchy (LEACH) offsets this by probabilistically rotating cluster heads role among nodes with energy above a set threshold. CH selection in WSN is NP-Hard as optimal data aggregation with efficient energy savings cannot be solved in polynomial time. In this work, a modified firefly heuristic, synchronous firefly algorithm, is proposed to improve the network performance. Extensive simulation shows the proposed technique to perform well compared to LEACH and energy-efficient hierarchical clustering. Simulations show the effectiveness of the proposed method in decreasing the packet loss ratio by an average of 9.63% and improving the energy efficiency of the network when compared to LEACH and EEHC. PMID:26495431
Synchronous Firefly Algorithm for Cluster Head Selection in WSN
Baskaran, Madhusudhanan; Sadagopan, Chitra
2015-01-01
Wireless Sensor Network (WSN) consists of small low-cost, low-power multifunctional nodes interconnected to efficiently aggregate and transmit data to sink. Cluster-based approaches use some nodes as Cluster Heads (CHs) and organize WSNs efficiently for aggregation of data and energy saving. A CH conveys information gathered by cluster nodes and aggregates/compresses data before transmitting it to a sink. However, this additional responsibility of the node results in a higher energy drain leading to uneven network degradation. Low Energy Adaptive Clustering Hierarchy (LEACH) offsets this by probabilistically rotating cluster heads role among nodes with energy above a set threshold. CH selection in WSN is NP-Hard as optimal data aggregation with efficient energy savings cannot be solved in polynomial time. In this work, a modified firefly heuristic, synchronous firefly algorithm, is proposed to improve the network performance. Extensive simulation shows the proposed technique to perform well compared to LEACH and energy-efficient hierarchical clustering. Simulations show the effectiveness of the proposed method in decreasing the packet loss ratio by an average of 9.63% and improving the energy efficiency of the network when compared to LEACH and EEHC. PMID:26495431
Advanced defect detection algorithm using clustering in ultrasonic NDE
NASA Astrophysics Data System (ADS)
Gongzhang, Rui; Gachagan, Anthony
2016-02-01
A range of materials used in industry exhibit scattering properties which limits ultrasonic NDE. Many algorithms have been proposed to enhance defect detection ability, such as the well-known Split Spectrum Processing (SSP) technique. Scattering noise usually cannot be fully removed and the remaining noise can be easily confused with real feature signals, hence becoming artefacts during the image interpretation stage. This paper presents an advanced algorithm to further reduce the influence of artefacts remaining in A-scan data after processing using a conventional defect detection algorithm. The raw A-scan data can be acquired from either traditional single transducer or phased array configurations. The proposed algorithm uses the concept of unsupervised machine learning to cluster segmental defect signals from pre-processed A-scans into different classes. The distinction and similarity between each class and the ensemble of randomly selected noise segments can be observed by applying a classification algorithm. Each class will then be labelled as `legitimate reflector' or `artefacts' based on this observation and the expected probability of defection (PoD) and probability of false alarm (PFA) determined. To facilitate data collection and validate the proposed algorithm, a 5MHz linear array transducer is used to collect A-scans from both austenitic steel and Inconel samples. Each pulse-echo A-scan is pre-processed using SSP and the subsequent application of the proposed clustering algorithm has provided an additional reduction to PFA while maintaining PoD for both samples compared with SSP results alone.
Multilayer cellular neural network and fuzzy C-mean classifiers: comparison and performance analysis
NASA Astrophysics Data System (ADS)
Trujillo San-Martin, Maite; Hlebarov, Vejen; Sadki, Mustapha
2004-11-01
Neural Networks and Fuzzy systems are considered two of the most important artificial intelligent algorithms which provide classification capabilities obtained through different learning schemas which capture knowledge and process it according to particular rule-based algorithms. These methods are especially suited to exploit the tolerance for uncertainty and vagueness in cognitive reasoning. By applying these methods with some relevant knowledge-based rules extracted using different data analysis tools, it is possible to obtain a robust classification performance for a wide range of applications. This paper will focus on non-destructive testing quality control systems, in particular, the study of metallic structures classification according to the corrosion time using a novel cellular neural network architecture, which will be explained in detail. Additionally, we will compare these results with the ones obtained using the Fuzzy C-means clustering algorithm and analyse both classifiers according to its classification capabilities.
Non-equilibrium relaxation analysis in cluster algorithms
NASA Astrophysics Data System (ADS)
Nonomura, Yoshihiko
2014-03-01
In Monte Carlo study of phase transitions, the critical slowing down has been a serious problem. In order to overcome this difficulty, two kinds of approaches have been proposed. One is the cluster algorithms, where global update scheme based on a percolation theory is introduced in order to refrain from the power-law behavior at the critical point. Another is the non-equilibrium relaxation method, where the power-law critical relaxation process is analyzed by the dynamical scaling theory in order to refrain from time-consuming equilibration. Then, the next step is to fuse these two approaches -- to investigate phase transitions with early-stage relaxation process of cluster algorithms. Since the dynamical scaling theory does not hold in cluster algorithms in principle, such attempt had been considered impossible. In the present talk we show that such fusion is actually possible using an empirical scaling form obtained from the 2D Ising models instead of the dynamical scaling theory. Applications to the q >= 3 Potts models, +/- J Ising models etc. will also be explained in the presentation.
An improved distance matrix computation algorithm for multicore clusters.
Al-Neama, Mohammed W; Reda, Naglaa M; Ghaleb, Fayed F M
2014-01-01
Distance matrix has diverse usage in different research areas. Its computation is typically an essential task in most bioinformatics applications, especially in multiple sequence alignment. The gigantic explosion of biological sequence databases leads to an urgent need for accelerating these computations. DistVect algorithm was introduced in the paper of Al-Neama et al. (in press) to present a recent approach for vectorizing distance matrix computing. It showed an efficient performance in both sequential and parallel computing. However, the multicore cluster systems, which are available now, with their scalability and performance/cost ratio, meet the need for more powerful and efficient performance. This paper proposes DistVect1 as highly efficient parallel vectorized algorithm with high performance for computing distance matrix, addressed to multicore clusters. It reformulates DistVect1 vectorized algorithm in terms of clusters primitives. It deduces an efficient approach of partitioning and scheduling computations, convenient to this type of architecture. Implementations employ potential of both MPI and OpenMP libraries. Experimental results show that the proposed method performs improvement of around 3-fold speedup upon SSE2. Further it also achieves speedups more than 9 orders of magnitude compared to the publicly available parallel implementation utilized in ClustalW-MPI. PMID:25013779
Comparison of cluster expansion fitting algorithms for interactions at surfaces
NASA Astrophysics Data System (ADS)
Herder, Laura M.; Bray, Jason M.; Schneider, William F.
2015-10-01
Cluster expansions (CEs) are Ising-type interaction models that are increasingly used to model interaction and ordering phenomena at surfaces, such as the adsorbate-adsorbate interactions that control coverage-dependent adsorption or surface-vacancy interactions that control surface reconstructions. CEs are typically fit to a limited set of data derived from density functional theory (DFT) calculations. The CE fitting process involves iterative selection of DFT data points to include in a fit set and selection of interaction clusters to include in the CE. Here we compare the performance of three CE fitting algorithms-the MIT Ab-initio Phase Stability code (MAPS, the default in ATAT software), a genetic algorithm (GA), and a steepest descent (SD) algorithm-against synthetic data. The synthetic data is encoded in model Hamiltonians of varying complexity motivated by the observed behavior of atomic adsorbates on a face-centered-cubic transition metal close-packed (111) surface. We compare the performance of the leave-one-out cross-validation score against the true fitting error available from knowledge of the hidden CEs. For these systems, SD achieves lowest overall fitting and prediction error independent of the underlying system complexity. SD also most accurately predicts cluster interaction energies without ignoring or introducing extra interactions into the CE. MAPS achieves good results in fewer iterations, while the GA performs least well for these particular problems.
ICANP2: Isoenergetic cluster algorithm for NP-complete Problems
NASA Astrophysics Data System (ADS)
Zhu, Zheng; Fang, Chao; Katzgraber, Helmut G.
NP-complete optimization problems with Boolean variables are of fundamental importance in computer science, mathematics and physics. Most notably, the minimization of general spin-glass-like Hamiltonians remains a difficult numerical task. There has been a great interest in designing efficient heuristics to solve these computationally difficult problems. Inspired by the rejection-free isoenergetic cluster algorithm developed for Ising spin glasses, we present a generalized cluster update that can be applied to different NP-complete optimization problems with Boolean variables. The cluster updates allow for a wide-spread sampling of phase space, thus speeding up optimization. By carefully tuning the pseudo-temperature (needed to randomize the configurations) of the problem, we show that the method can efficiently tackle problems on topologies with a large site-percolation threshold. We illustrate the ICANP2 heuristic on paradigmatic optimization problems, such as the satisfiability problem and the vertex cover problem.
NIC-based Reduction Algorithms for Large-scale Clusters
Petrini, F; Moody, A T; Fernandez, J; Frachtenberg, E; Panda, D K
2004-07-30
Efficient algorithms for reduction operations across a group of processes are crucial for good performance in many large-scale, parallel scientific applications. While previous algorithms limit processing to the host CPU, we utilize the programmable processors and local memory available on modern cluster network interface cards (NICs) to explore a new dimension in the design of reduction algorithms. In this paper, we present the benefits and challenges, design issues and solutions, analytical models, and experimental evaluations of a family of NIC-based reduction algorithms. Performance and scalability evaluations were conducted on the ASCI Linux Cluster (ALC), a 960-node, 1920-processor machine at Lawrence Livermore National Laboratory, which uses the Quadrics QsNet interconnect. We find NIC-based reductions on modern interconnects to be more efficient than host-based implementations in both scalability and consistency. In particular, at large-scale--1812 processes--NIC-based reductions of small integer and floating-point arrays provided respective speedups of 121% and 39% over the host-based, production-level MPI implementation.
Finding reproducible cluster partitions for the k-means algorithm
2013-01-01
K-means clustering is widely used for exploratory data analysis. While its dependence on initialisation is well-known, it is common practice to assume that the partition with lowest sum-of-squares (SSQ) total i.e. within cluster variance, is both reproducible under repeated initialisations and also the closest that k-means can provide to true structure, when applied to synthetic data. We show that this is generally the case for small numbers of clusters, but for values of k that are still of theoretical and practical interest, similar values of SSQ can correspond to markedly different cluster partitions. This paper extends stability measures previously presented in the context of finding optimal values of cluster number, into a component of a 2-d map of the local minima found by the k-means algorithm, from which not only can values of k be identified for further analysis but, more importantly, it is made clear whether the best SSQ is a suitable solution or whether obtaining a consistently good partition requires further application of the stability index. The proposed method is illustrated by application to five synthetic datasets replicating a real world breast cancer dataset with varying data density, and a large bioinformatics dataset. PMID:23369085
A decentralized fuzzy C-means-based energy-efficient routing protocol for wireless sensor networks.
Alia, Osama Moh'd
2014-01-01
Energy conservation in wireless sensor networks (WSNs) is a vital consideration when designing wireless networking protocols. In this paper, we propose a Decentralized Fuzzy Clustering Protocol, named DCFP, which minimizes total network energy dissipation to promote maximum network lifetime. The process of constructing the infrastructure for a given WSN is performed only once at the beginning of the protocol at a base station, which remains unchanged throughout the network's lifetime. In this initial construction step, a fuzzy C-means algorithm is adopted to allocate sensor nodes into their most appropriate clusters. Subsequently, the protocol runs its rounds where each round is divided into a CH-Election phase and a Data Transmission phase. In the CH-Election phase, the election of new cluster heads is done locally in each cluster where a new multicriteria objective function is proposed to enhance the quality of elected cluster heads. In the Data Transmission phase, the sensing and data transmission from each sensor node to their respective cluster head is performed and cluster heads in turn aggregate and send the sensed data to the base station. Simulation results demonstrate that the proposed protocol improves network lifetime, data delivery, and energy consumption compared to other well-known energy-efficient protocols. PMID:25162060
Dynamically Incremental K-means++ Clustering Algorithm Based on Fuzzy Rough Set Theory
NASA Astrophysics Data System (ADS)
Li, Wei; Wang, Rujing; Jia, Xiufang; Jiang, Qing
Being classic K-means++ clustering algorithm only for static data, dynamically incremental K-means++ clustering algorithm (DK-Means++) is presented based on fuzzy rough set theory in this paper. Firstly, in DK-Means++ clustering algorithm, the formula of similar degree is improved by weights computed by using of the important degree of attributes which are reduced on the basis of rough fuzzy set theory. Secondly, new data only need match granular which was clustered by K-means++ algorithm or seldom new data is clustered by classic K-means++ algorithm in global data. In this way, that all data is re-clustered each time in dynamic data set is avoided, so the efficiency of clustering is improved. Throughout our experiments showing, DK-Means++ algorithm can objectively and efficiently deal with clustering problem of dynamically incremental data.
A new detection algorithm for microcalcification clusters in mammographic screening
NASA Astrophysics Data System (ADS)
Xie, Weiying; Ma, Yide; Li, Yunsong
2015-05-01
A novel approach for microcalcification clusters detection is proposed. At the first time, we make a short analysis of mammographic images with microcalcification lesions to confirm these lesions have much greater gray values than normal regions. After summarizing the specific feature of microcalcification clusters in mammographic screening, we make more focus on preprocessing step including eliminating the background, image enhancement and eliminating the pectoral muscle. In detail, Chan-Vese Model is used for eliminating background. Then, we do the application of combining morphology method and edge detection method. After the AND operation and Sobel filter, we use Hough Transform, it can be seen that the result have outperformed for eliminating the pectoral muscle which is approximately the gray of microcalcification. Additionally, the enhancement step is achieved by morphology. We make effort on mammographic image preprocessing to achieve lower computational complexity. As well known, it is difficult to robustly achieve mammograms analysis due to low contrast between normal and lesion tissues, there are also much noise in such images. After a serious preprocessing algorithm, a method based on blob detection is performed to microcalcification clusters according their specific features. The proposed algorithm has employed Laplace operator to improve Difference of Gaussians (DoG) function in terms of low contrast images. A preliminary evaluation of the proposed method performs on a known public database namely MIAS, rather than synthetic images. The comparison experiments and Cohen's kappa coefficients all demonstrate that our proposed approach can potentially obtain better microcalcification clusters detection results in terms of accuracy, sensitivity and specificity.
Classification of posture maintenance data with fuzzy clustering algorithms
NASA Technical Reports Server (NTRS)
Bezdek, James C.
1992-01-01
Sensory inputs from the visual, vestibular, and proprioreceptive systems are integrated by the central nervous system to maintain postural equilibrium. Sustained exposure to microgravity causes neurosensory adaptation during spaceflight, which results in decreased postural stability until readaptation occurs upon return to the terrestrial environment. Data which simulate sensory inputs under various sensory organization test (SOT) conditions were collected in conjunction with Johnson Space Center postural control studies using a tilt-translation device (TTD). The University of West Florida applied the fuzzy c-meams (FCM) clustering algorithms to this data with a view towards identifying various states and stages of subjects experiencing such changes. Feature analysis, time step analysis, pooling data, response of the subjects, and the algorithms used are discussed.
A Fast Clustering Algorithm for Data with a Few Labeled Instances
Yang, Jinfeng; Xiao, Yong; Wang, Jiabing; Ma, Qianli; Shen, Yanhua
2015-01-01
The diameter of a cluster is the maximum intracluster distance between pairs of instances within the same cluster, and the split of a cluster is the minimum distance between instances within the cluster and instances outside the cluster. Given a few labeled instances, this paper includes two aspects. First, we present a simple and fast clustering algorithm with the following property: if the ratio of the minimum split to the maximum diameter (RSD) of the optimal solution is greater than one, the algorithm returns optimal solutions for three clustering criteria. Second, we study the metric learning problem: learn a distance metric to make the RSD as large as possible. Compared with existing metric learning algorithms, one of our metric learning algorithms is computationally efficient: it is a linear programming model rather than a semidefinite programming model used by most of existing algorithms. We demonstrate empirically that the supervision and the learned metric can improve the clustering quality. PMID:25861252
Mekhmoukh, Abdenour; Mokrani, Karim
2015-11-01
In this paper, a new image segmentation method based on Particle Swarm Optimization (PSO) and outlier rejection combined with level set is proposed. A traditional approach to the segmentation of Magnetic Resonance (MR) images is the Fuzzy C-Means (FCM) clustering algorithm. The membership function of this conventional algorithm is sensitive to the outlier and does not integrate the spatial information in the image. The algorithm is very sensitive to noise and in-homogeneities in the image, moreover, it depends on cluster centers initialization. To improve the outlier rejection and to reduce the noise sensitivity of conventional FCM clustering algorithm, a novel extended FCM algorithm for image segmentation is presented. In general, in the FCM algorithm the initial cluster centers are chosen randomly, with the help of PSO algorithm the clusters centers are chosen optimally. Our algorithm takes also into consideration the spatial neighborhood information. These a priori are used in the cost function to be optimized. For MR images, the resulting fuzzy clustering is used to set the initial level set contour. The results confirm the effectiveness of the proposed algorithm. PMID:26299609
jClustering, an open framework for the development of 4D clustering algorithms.
Mateos-Pérez, José María; García-Villalba, Carmen; Pascau, Javier; Desco, Manuel; Vaquero, Juan J
2013-01-01
We present jClustering, an open framework for the design of clustering algorithms in dynamic medical imaging. We developed this tool because of the difficulty involved in manually segmenting dynamic PET images and the lack of availability of source code for published segmentation algorithms. Providing an easily extensible open tool encourages publication of source code to facilitate the process of comparing algorithms and provide interested third parties with the opportunity to review code. The internal structure of the framework allows an external developer to implement new algorithms easily and quickly, focusing only on the particulars of the method being implemented and not on image data handling and preprocessing. This tool has been coded in Java and is presented as an ImageJ plugin in order to take advantage of all the functionalities offered by this imaging analysis platform. Both binary packages and source code have been published, the latter under a free software license (GNU General Public License) to allow modification if necessary. PMID:23990913
GX-Means: A model-based divide and merge algorithm for geospatial image clustering
Vatsavai, Raju; Symons, Christopher T; Chandola, Varun; Jun, Goo
2011-01-01
One of the practical issues in clustering is the specification of the appropriate number of clusters, which is not obvious when analyzing geospatial datasets, partly because they are huge (both in size and spatial extent) and high dimensional. In this paper we present a computationally efficient model-based split and merge clustering algorithm that incrementally finds model parameters and the number of clusters. Additionally, we attempt to provide insights into this problem and other data mining challenges that are encountered when clustering geospatial data. The basic algorithm we present is similar to the G-means and X-means algorithms; however, our proposed approach avoids certain limitations of these well-known clustering algorithms that are pertinent when dealing with geospatial data. We compare the performance of our approach with the G-means and X-means algorithms. Experimental evaluation on simulated data and on multispectral and hyperspectral remotely sensed image data demonstrates the effectiveness of our algorithm.
Dynamic Layered Dual-Cluster Heads Routing Algorithm Based on Krill Herd Optimization in UWSNs.
Jiang, Peng; Feng, Yang; Wu, Feng; Yu, Shanen; Xu, Huan
2016-01-01
Aimed at the limited energy of nodes in underwater wireless sensor networks (UWSNs) and the heavy load of cluster heads in clustering routing algorithms, this paper proposes a dynamic layered dual-cluster routing algorithm based on Krill Herd optimization in UWSNs. Cluster size is first decided by the distance between the cluster head nodes and sink node, and a dynamic layered mechanism is established to avoid the repeated selection of the same cluster head nodes. Using Krill Herd optimization algorithm selects the optimal and second optimal cluster heads, and its Lagrange model directs nodes to a high likelihood area. It ultimately realizes the functions of data collection and data transition. The simulation results show that the proposed algorithm can effectively decrease cluster energy consumption, balance the network energy consumption, and prolong the network lifetime. PMID:27589744
NASA Astrophysics Data System (ADS)
Park, Sang Ha; Lee, Seokjin; Sung, Koeng-Mo
Non-negative matrix factorization (NMF) is widely used for monaural musical sound source separation because of its efficiency and good performance. However, an additional clustering process is required because the musical sound mixture is separated into more signals than the number of musical tracks during NMF separation. In the conventional method, manual clustering or training-based clustering is performed with an additional learning process. Recently, a clustering algorithm based on the mel-frequency cepstrum coefficient (MFCC) was proposed for unsupervised clustering. However, MFCC clustering supplies limited information for clustering. In this paper, we propose various timbre features for unsupervised clustering and a clustering algorithm with these features. Simulation experiments are carried out using various musical sound mixtures. The results indicate that the proposed method improves clustering performance, as compared to conventional MFCC-based clustering.
Algorithm to extract the spanning clusters and calculate conductivity in strip geometries
NASA Astrophysics Data System (ADS)
Babalievski, F.
1995-06-01
I present an improved algorithm to solve the random resistor problem using a transfer-matrix technique. Preconditioning by spanning cluster extraction both reduces the size of the matrix and yields faster execution times when compared to previous algorithms.
Contributions to "k"-Means Clustering and Regression via Classification Algorithms
ERIC Educational Resources Information Center
Salman, Raied
2012-01-01
The dissertation deals with clustering algorithms and transforming regression problems into classification problems. The main contributions of the dissertation are twofold; first, to improve (speed up) the clustering algorithms and second, to develop a strict learning environment for solving regression problems as classification tasks by using…
User-Based Document Clustering by Redescribing Subject Descriptions with a Genetic Algorithm.
ERIC Educational Resources Information Center
Gordon, Michael D.
1991-01-01
Discussion of clustering of documents and queries in information retrieval systems focuses on the use of a genetic algorithm to adapt subject descriptions so that documents become more effective in matching relevant queries. Various types of clustering are explained, and simulation experiments used to test the genetic algorithm are described. (27…
Security clustering algorithm based on reputation in hierarchical peer-to-peer network
NASA Astrophysics Data System (ADS)
Chen, Mei; Luo, Xin; Wu, Guowen; Tan, Yang; Kita, Kenji
2013-03-01
For the security problems of the hierarchical P2P network (HPN), the paper presents a security clustering algorithm based on reputation (CABR). In the algorithm, we take the reputation mechanism for ensuring the security of transaction and use cluster for managing the reputation mechanism. In order to improve security, reduce cost of network brought by management of reputation and enhance stability of cluster, we select reputation, the historical average online time, and the network bandwidth as the basic factors of the comprehensive performance of node. Simulation results showed that the proposed algorithm improved the security, reduced the network overhead, and enhanced stability of cluster.
A Formal Algorithm for Verifying the Validity of Clustering Results Based on Model Checking
Huang, Shaobin; Cheng, Yuan; Lang, Dapeng; Chi, Ronghua; Liu, Guofeng
2014-01-01
The limitations in general methods to evaluate clustering will remain difficult to overcome if verifying the clustering validity continues to be based on clustering results and evaluation index values. This study focuses on a clustering process to analyze crisp clustering validity. First, we define the properties that must be satisfied by valid clustering processes and model clustering processes based on program graphs and transition systems. We then recast the analysis of clustering validity as the problem of verifying whether the model of clustering processes satisfies the specified properties with model checking. That is, we try to build a bridge between clustering and model checking. Experiments on several datasets indicate the effectiveness and suitability of our algorithms. Compared with traditional evaluation indices, our formal method can not only indicate whether the clustering results are valid but, in the case the results are invalid, can also detect the objects that have led to the invalidity. PMID:24608823
Parallelization of the Wolff single-cluster algorithm.
Kaupuzs, J; Rimsāns, J; Melnik, R V N
2010-02-01
A parallel [open multiprocessing (OpenMP)] implementation of the Wolff single-cluster algorithm has been developed and tested for the three-dimensional (3D) Ising model. The developed procedure is generalizable to other lattice spin models and its effectiveness depends on the specific application at hand. The applicability of the developed methodology is discussed in the context of the applications, where a sophisticated shuffling scheme is used to generate pseudorandom numbers of high quality, and an iterative method is applied to find the critical temperature of the 3D Ising model with a great accuracy. For the lattice with linear size L=1024, we have reached the speedup about 1.79 times on two processors and about 2.67 times on four processors, as compared to the serial code. According to our estimation, the speedup about three times on four processors is reachable for the O(n) models with n> or =2. Furthermore, the application of the developed OpenMP code allows us to simulate larger lattices due to greater operative (shared) memory available. PMID:20365669
Using Clustering Algorithms to Identify Brown Dwarf Characteristics
NASA Astrophysics Data System (ADS)
Choban, Caleb
2016-06-01
Brown dwarfs are stars that are not massive enough to sustain core hydrogen fusion, and thus fade and cool over time. The molecular composition of brown dwarf atmospheres can be determined by observing absorption features in their infrared spectrum, which can be quantified using spectral indices. Comparing these indices to one another, we can determine what kind of brown dwarf it is, and if it is young or metal-poor. We explored a new method for identifying these subgroups through the expectation-maximization machine learning clustering algorithm, which provides a quantitative and statistical way of identifying index pairs which separate rare populations. We specifically quantified two statistics, completeness and concentration, to identify the best index pairs. Starting with a training set, we defined selection regions for young, metal-poor and binary brown dwarfs, and tested these on a large sample of L dwarfs. We present the results of this analysis, and demonstrate that new objects in these classes can be found through these methods.
A heart disease recognition embedded system with fuzzy cluster algorithm.
de Carvalho, Helton Hugo; Moreno, Robson Luiz; Pimenta, Tales Cleber; Crepaldi, Paulo C; Cintra, Evaldo
2013-06-01
This article presents the viability analysis and the development of heart disease identification embedded system. It offers a time reduction on electrocardiogram - ECG signal processing by reducing the amount of data samples, without any significant loss. The goal of the developed system is the analysis of heart signals. The ECG signals are applied into the system that performs an initial filtering, and then uses a Gustafson-Kessel fuzzy clustering algorithm for the signal classification and correlation. The classification indicated common heart diseases such as angina, myocardial infarction and coronary artery diseases. The system uses the European electrocardiogram ST-T Database (EDB) as a reference for tests and evaluation. The results prove the system can perform the heart disease detection on a data set reduced from 213 to just 20 samples, thus providing a reduction to just 9.4% of the original set, while maintaining the same effectiveness. This system is validated in a Xilinx Spartan(®)-3A FPGA. The field programmable gate array (FPGA) implemented a Xilinx Microblaze(®) Soft-Core Processor running at a 50MHz clock rate. PMID:23394802
Clustering performance comparison using K-means and expectation maximization algorithms
Jung, Yong Gyu; Kang, Min Soo; Heo, Jun
2014-01-01
Clustering is an important means of data mining based on separating data categories by similar features. Unlike the classification algorithm, clustering belongs to the unsupervised type of algorithms. Two representatives of the clustering algorithms are the K-means and the expectation maximization (EM) algorithm. Linear regression analysis was extended to the category-type dependent variable, while logistic regression was achieved using a linear combination of independent variables. To predict the possibility of occurrence of an event, a statistical approach is used. However, the classification of all data by means of logistic regression analysis cannot guarantee the accuracy of the results. In this paper, the logistic regression analysis is applied to EM clusters and the K-means clustering method for quality assessment of red wine, and a method is proposed for ensuring the accuracy of the classification results. PMID:26019610
Sun, Liping; Luo, Yonglong; Ding, Xintao; Zhang, Ji
2014-01-01
An important component of a spatial clustering algorithm is the distance measure between sample points in object space. In this paper, the traditional Euclidean distance measure is replaced with innovative obstacle distance measure for spatial clustering under obstacle constraints. Firstly, we present a path searching algorithm to approximate the obstacle distance between two points for dealing with obstacles and facilitators. Taking obstacle distance as similarity metric, we subsequently propose the artificial immune clustering with obstacle entity (AICOE) algorithm for clustering spatial point data in the presence of obstacles and facilitators. Finally, the paper presents a comparative analysis of AICOE algorithm and the classical clustering algorithms. Our clustering model based on artificial immune system is also applied to the case of public facility location problem in order to establish the practical applicability of our approach. By using the clone selection principle and updating the cluster centers based on the elite antibodies, the AICOE algorithm is able to achieve the global optimum and better clustering effect. PMID:25435862
A highly efficient multi-core algorithm for clustering extremely large datasets
2010-01-01
Background In recent years, the demand for computational power in computational biology has increased due to rapidly growing data sets from microarray and other high-throughput technologies. This demand is likely to increase. Standard algorithms for analyzing data, such as cluster algorithms, need to be parallelized for fast processing. Unfortunately, most approaches for parallelizing algorithms largely rely on network communication protocols connecting and requiring multiple computers. One answer to this problem is to utilize the intrinsic capabilities in current multi-core hardware to distribute the tasks among the different cores of one computer. Results We introduce a multi-core parallelization of the k-means and k-modes cluster algorithms based on the design principles of transactional memory for clustering gene expression microarray type data and categorial SNP data. Our new shared memory parallel algorithms show to be highly efficient. We demonstrate their computational power and show their utility in cluster stability and sensitivity analysis employing repeated runs with slightly changed parameters. Computation speed of our Java based algorithm was increased by a factor of 10 for large data sets while preserving computational accuracy compared to single-core implementations and a recently published network based parallelization. Conclusions Most desktop computers and even notebooks provide at least dual-core processors. Our multi-core algorithms show that using modern algorithmic concepts, parallelization makes it possible to perform even such laborious tasks as cluster sensitivity and cluster number estimation on the laboratory computer. PMID:20370922
C-element: a new clustering algorithm to find high quality functional modules in PPI networks.
Ghasemi, Mahdieh; Rahgozar, Maseud; Bidkhori, Gholamreza; Masoudi-Nejad, Ali
2013-01-01
Graph clustering algorithms are widely used in the analysis of biological networks. Extracting functional modules in protein-protein interaction (PPI) networks is one such use. Most clustering algorithms whose focuses are on finding functional modules try either to find a clique like sub networks or to grow clusters starting from vertices with high degrees as seeds. These algorithms do not make any difference between a biological network and any other networks. In the current research, we present a new procedure to find functional modules in PPI networks. Our main idea is to model a biological concept and to use this concept for finding good functional modules in PPI networks. In order to evaluate the quality of the obtained clusters, we compared the results of our algorithm with those of some other widely used clustering algorithms on three high throughput PPI networks from Sacchromyces Cerevisiae, Homo sapiens and Caenorhabditis elegans as well as on some tissue specific networks. Gene Ontology (GO) analyses were used to compare the results of different algorithms. Each algorithm's result was then compared with GO-term derived functional modules. We also analyzed the effect of using tissue specific networks on the quality of the obtained clusters. The experimental results indicate that the new algorithm outperforms most of the others, and this improvement is more significant when tissue specific networks are used. PMID:24039752
A Special Local Clustering Algorithm for Identifying the Genes Associated With Alzheimer’s Disease
Pang, Chao-Yang; Hu, Wei; Hu, Ben-Qiong; Shi, Ying; Vanderburg, Charles R.; Rogers, Jack T.
2010-01-01
Clustering is the grouping of similar objects into a class. Local clustering feature refers to the phenomenon whereby one group of data is separated from another, and the data from these different groups are clustered locally. A compact class is defined as one cluster in which all similar elements cluster tightly within the cluster. Herein, the essence of the local clustering feature, revealed by mathematical manipulation, results in a novel clustering algorithm termed as the special local clustering (SLC) algorithm that was used to process gene microarray data related to Alzheimer’s disease (AD). SLC algorithm was able to group together genes with similar expression patterns and identify significantly varied gene expression values as isolated points. If a gene belongs to a compact class in control data and appears as an isolated point in incipient, moderate and/or severe AD gene microarray data, this gene is possibly associated with AD. Application of a clustering algorithm in disease-associated gene identification such as in AD is rarely reported. PMID:20089478
Deb, Suash; Yang, Xin-She
2014-01-01
Traditional K-means clustering algorithms have the drawback of getting stuck at local optima that depend on the random values of initial centroids. Optimization algorithms have their advantages in guiding iterative computation to search for global optima while avoiding local optima. The algorithms help speed up the clustering process by converging into a global optimum early with multiple search agents in action. Inspired by nature, some contemporary optimization algorithms which include Ant, Bat, Cuckoo, Firefly, and Wolf search algorithms mimic the swarming behavior allowing them to cooperatively steer towards an optimal objective within a reasonable time. It is known that these so-called nature-inspired optimization algorithms have their own characteristics as well as pros and cons in different applications. When these algorithms are combined with K-means clustering mechanism for the sake of enhancing its clustering quality by avoiding local optima and finding global optima, the new hybrids are anticipated to produce unprecedented performance. In this paper, we report the results of our evaluation experiments on the integration of nature-inspired optimization methods into K-means algorithms. In addition to the standard evaluation metrics in evaluating clustering quality, the extended K-means algorithms that are empowered by nature-inspired optimization methods are applied on image segmentation as a case study of application scenario. PMID:25202730
Fong, Simon; Deb, Suash; Yang, Xin-She; Zhuang, Yan
2014-01-01
Traditional K-means clustering algorithms have the drawback of getting stuck at local optima that depend on the random values of initial centroids. Optimization algorithms have their advantages in guiding iterative computation to search for global optima while avoiding local optima. The algorithms help speed up the clustering process by converging into a global optimum early with multiple search agents in action. Inspired by nature, some contemporary optimization algorithms which include Ant, Bat, Cuckoo, Firefly, and Wolf search algorithms mimic the swarming behavior allowing them to cooperatively steer towards an optimal objective within a reasonable time. It is known that these so-called nature-inspired optimization algorithms have their own characteristics as well as pros and cons in different applications. When these algorithms are combined with K-means clustering mechanism for the sake of enhancing its clustering quality by avoiding local optima and finding global optima, the new hybrids are anticipated to produce unprecedented performance. In this paper, we report the results of our evaluation experiments on the integration of nature-inspired optimization methods into K-means algorithms. In addition to the standard evaluation metrics in evaluating clustering quality, the extended K-means algorithms that are empowered by nature-inspired optimization methods are applied on image segmentation as a case study of application scenario. PMID:25202730
Block clustering based on difference of convex functions (DC) programming and DC algorithms.
Le, Hoai Minh; Le Thi, Hoai An; Dinh, Tao Pham; Huynh, Van Ngai
2013-10-01
We investigate difference of convex functions (DC) programming and the DC algorithm (DCA) to solve the block clustering problem in the continuous framework, which traditionally requires solving a hard combinatorial optimization problem. DC reformulation techniques and exact penalty in DC programming are developed to build an appropriate equivalent DC program of the block clustering problem. They lead to an elegant and explicit DCA scheme for the resulting DC program. Computational experiments show the robustness and efficiency of the proposed algorithm and its superiority over standard algorithms such as two-mode K-means, two-mode fuzzy clustering, and block classification EM. PMID:23777526
Ab initio study on (CO2)n clusters via electrostatics- and molecular tailoring-based algorithm
NASA Astrophysics Data System (ADS)
Jovan Jose, K. V.; Gadre, Shridhar R.
An algorithm based on molecular electrostatic potential (MESP) and molecular tailoring approach (MTA) for building energetically favorable molecular clusters is presented. This algorithm is tested on prototype (CO2)n clusters with n = 13, 20, and 25 to explore their structure, energetics, and properties. The most stable clusters in this series are seen to show more number of triangular motifs. Many-body energy decomposition analysis performed on the most stable clusters reveals that the 2-body is the major contributor (>96%) to the total interaction energy. Vibrational frequencies and molecular electrostatic potentials are also evaluated for these large clusters through MTA. The MTA-based MESPs of these clusters show a remarkably good agreement with the corresponding actual ones. The most intense MTA-based normal mode frequencies are in fair agreement with the actual ones for smaller clusters. These calculated asymmetric stretching frequencies are blue-shifted with reference to the CO2 monomer.
Chinese Text Clustering Algorithm Based k-means
NASA Astrophysics Data System (ADS)
Yao, Mingyu; Pi, Dechang; Cong, Xiangxiang
Text clustering is an important means and method in text mining. The process of Chinese text clustering based on k-means was emphasized, we found that new center of a cluster was easily effected by isolated text after some experiments. Average similarity of one cluster was used as a parameter, and multiplied it with a modulus between 0.75 and 1.25 to get the similarity threshold value, the texts whose similarity with original cluster center was greater than or equal to the threshold value ware collected as a candidate collection, then updated the cluster center with center of candidate collection. The experiments show that improved method averagely increased purity and F value about 10 percent over the original method.
A scalable and practical one-pass clustering algorithm for recommender system
NASA Astrophysics Data System (ADS)
Khalid, Asra; Ghazanfar, Mustansar Ali; Azam, Awais; Alahmari, Saad Ali
2015-12-01
KMeans clustering-based recommendation algorithms have been proposed claiming to increase the scalability of recommender systems. One potential drawback of these algorithms is that they perform training offline and hence cannot accommodate the incremental updates with the arrival of new data, making them unsuitable for the dynamic environments. From this line of research, a new clustering algorithm called One-Pass is proposed, which is a simple, fast, and accurate. We show empirically that the proposed algorithm outperforms K-Means in terms of recommendation and training time while maintaining a good level of accuracy.
NASA Technical Reports Server (NTRS)
Mach, Douglas M.; Christian, Hugh J.; Blakeslee, Richard; Boccippio, Dennis J.; Goodman, Steve J.; Boeck, William
2006-01-01
We describe the clustering algorithm used by the Lightning Imaging Sensor (LIS) and the Optical Transient Detector (OTD) for combining the lightning pulse data into events, groups, flashes, and areas. Events are single pixels that exceed the LIS/OTD background level during a single frame (2 ms). Groups are clusters of events that occur within the same frame and in adjacent pixels. Flashes are clusters of groups that occur within 330 ms and either 5.5 km (for LIS) or 16.5 km (for OTD) of each other. Areas are clusters of flashes that occur within 16.5 km of each other. Many investigators are utilizing the LIS/OTD flash data; therefore, we test how variations in the algorithms for the event group and group-flash clustering affect the flash count for a subset of the LIS data. We divided the subset into areas with low (1-3), medium (4-15), high (16-63), and very high (64+) flashes to see how changes in the clustering parameters affect the flash rates in these different sizes of areas. We found that as long as the cluster parameters are within about a factor of two of the current values, the flash counts do not change by more than about 20%. Therefore, the flash clustering algorithm used by the LIS and OTD sensors create flash rates that are relatively insensitive to reasonable variations in the clustering algorithms.
A new clustering algorithm for scanning electron microscope images
NASA Astrophysics Data System (ADS)
Yousef, Amr; Duraisamy, Prakash; Karim, Mohammad
2016-04-01
A scanning electron microscope (SEM) is a type of electron microscope that produces images of a sample by scanning it with a focused beam of electrons. The electrons interact with the sample atoms, producing various signals that are collected by detectors. The gathered signals contain information about the sample's surface topography and composition. The electron beam is generally scanned in a raster scan pattern, and the beam's position is combined with the detected signal to produce an image. The most common configuration for an SEM produces a single value per pixel, with the results usually rendered as grayscale images. The captured images may be produced with insufficient brightness, anomalous contrast, jagged edges, and poor quality due to low signal-to-noise ratio, grained topography and poor surface details. The segmentation of the SEM images is a tackling problems in the presence of the previously mentioned distortions. In this paper, we are stressing on the clustering of these type of images. In that sense, we evaluate the performance of the well-known unsupervised clustering and classification techniques such as connectivity based clustering (hierarchical clustering), centroid-based clustering, distribution-based clustering and density-based clustering. Furthermore, we propose a new spatial fuzzy clustering technique that works efficiently on this type of images and compare its results against these regular techniques in terms of clustering validation metrics.
A Novel Artificial Bee Colony Based Clustering Algorithm for Categorical Data
2015-01-01
Data with categorical attributes are ubiquitous in the real world. However, existing partitional clustering algorithms for categorical data are prone to fall into local optima. To address this issue, in this paper we propose a novel clustering algorithm, ABC-K-Modes (Artificial Bee Colony clustering based on K-Modes), based on the traditional k-modes clustering algorithm and the artificial bee colony approach. In our approach, we first introduce a one-step k-modes procedure, and then integrate this procedure with the artificial bee colony approach to deal with categorical data. In the search process performed by scout bees, we adopt the multi-source search inspired by the idea of batch processing to accelerate the convergence of ABC-K-Modes. The performance of ABC-K-Modes is evaluated by a series of experiments in comparison with that of the other popular algorithms for categorical data. PMID:25993469
A novel artificial bee colony based clustering algorithm for categorical data.
Ji, Jinchao; Pang, Wei; Zheng, Yanlin; Wang, Zhe; Ma, Zhiqiang
2015-01-01
Data with categorical attributes are ubiquitous in the real world. However, existing partitional clustering algorithms for categorical data are prone to fall into local optima. To address this issue, in this paper we propose a novel clustering algorithm, ABC-K-Modes (Artificial Bee Colony clustering based on K-Modes), based on the traditional k-modes clustering algorithm and the artificial bee colony approach. In our approach, we first introduce a one-step k-modes procedure, and then integrate this procedure with the artificial bee colony approach to deal with categorical data. In the search process performed by scout bees, we adopt the multi-source search inspired by the idea of batch processing to accelerate the convergence of ABC-K-Modes. The performance of ABC-K-Modes is evaluated by a series of experiments in comparison with that of the other popular algorithms for categorical data. PMID:25993469
Gardiner, Eleanor J; Gillet, Valerie J; Willett, Peter; Cosgrove, David A
2007-01-01
Chemical databases are routinely clustered, with the aim of grouping molecules which share similar structural features. Ideally, medicinal chemists are then able to browse a few representatives of the cluster in order to interpret the shared activity of the cluster members. However, when molecules are clustered using fingerprints, it may be difficult to decipher the structural commonalities which are present. Here, we seek to represent a cluster by means of a maximum common substructure based on the shared functionality of the cluster members. Previously, we have used reduced graphs, where each node corresponds to a generalized functional group, as topological molecular descriptors for virtual screening. In this work, we precluster a database using any clustering method. We then represent the molecules in a cluster as reduced graphs. By repeated application of a maximum common edge substructure (MCES) algorithm, we obtain one or more reduced graph cluster representatives. The sparsity of the reduced graphs means that the MCES calculations can be performed in real time. The reduced graph cluster representatives are readily interpretable in terms of functional activity and can be mapped directly back to the molecules to which they correspond, giving the chemist a rapid means of assessing potential activities contained within the cluster. Clusters of interest are then subject to a detailed R-group analysis using the same iterated MCES algorithm applied to the molecular graphs. PMID:17309248
NASA Astrophysics Data System (ADS)
Sun, Xu; Yang, Lina; Gao, Lianru; Zhang, Bing; Li, Shanshan; Li, Jun
2015-01-01
Center-oriented hyperspectral image clustering methods have been widely applied to hyperspectral remote sensing image processing; however, the drawbacks are obvious, including the over-simplicity of computing models and underutilized spatial information. In recent years, some studies have been conducted trying to improve this situation. We introduce the artificial bee colony (ABC) and Markov random field (MRF) algorithms to propose an ABC-MRF-cluster model to solve the problems mentioned above. In this model, a typical ABC algorithm framework is adopted in which cluster centers and iteration conditional model algorithm's results are considered as feasible solutions and objective functions separately, and MRF is modified to be capable of dealing with the clustering problem. Finally, four datasets and two indices are used to show that the application of ABC-cluster and ABC-MRF-cluster methods could help to obtain better image accuracy than conventional methods. Specifically, the ABC-cluster method is superior when used for a higher power of spectral discrimination, whereas the ABC-MRF-cluster method can provide better results when used for an adjusted random index. In experiments on simulated images with different signal-to-noise ratios, ABC-cluster and ABC-MRF-cluster showed good stability.
NASA Astrophysics Data System (ADS)
Gatos, Ilias; Tsantis, Stavros; Skouroliakou, Aikaterini; Theotokas, Ioannis; Zoumpoulis, Pavlos S.; Kagadis, George C.
2015-09-01
The aim of the present study is to determine an optimal elasticity cut-off value for discriminating Healthy from Pathological fibrotic patients by means of Fuzzy C-Means automatic segmentation and maximum participation cluster mean value employment in Shear Wave Elastography (SWE) images. The clinical dataset comprised 32 subjects (16 Healthy and 16 histological or Fibroscan verified Chronic Liver Disease). An experienced Radiologist performed SWE measurement placing a region of interest (ROI) on each subject's right liver lobe providing a SWE image for each patient. Subsequently Fuzzy C-Means clustering was performed on every SWE image utilizing 5 clusters. Mean Stiffness value and pixels number of each cluster were calculated. The mean stiffness value feature of the cluster with maximum pixels number was then fed as input for ROC analysis. The selected Mean Stiffness value feature an Area Under the Curve (AUC) of 0.8633 with Optimum Cut-off value of 7.5 kPa with sensitivity and specificity values of 0.8438 and 0.875 and balanced accuracy of 0.8594. Examiner's classification measurements exhibited sensitivity, specificity and balanced accuracy value of 0.8125 with 7.1 kPa cutoff value. A new promising automatic algorithm was implemented with more objective criteria of defining optimum elasticity cut-off values for discriminating fibrosis stages for SWE. More subjects are needed in order to define if this algorithm is an objective tool to outperform manual ROI selection.
A new clustering algorithm applicable to multispectral and polarimetric SAR images
NASA Technical Reports Server (NTRS)
Wong, Yiu-Fai; Posner, Edward C.
1993-01-01
We describe an application of a scale-space clustering algorithm to the classification of a multispectral and polarimetric SAR image of an agricultural site. After the initial polarimetric and radiometric calibration and noise cancellation, we extracted a 12-dimensional feature vector for each pixel from the scattering matrix. The clustering algorithm was able to partition a set of unlabeled feature vectors from 13 selected sites, each site corresponding to a distinct crop, into 13 clusters without any supervision. The cluster parameters were then used to classify the whole image. The classification map is much less noisy and more accurate than those obtained by hierarchical rules. Starting with every point as a cluster, the algorithm works by melting the system to produce a tree of clusters in the scale space. It can cluster data in any multidimensional space and is insensitive to variability in cluster densities, sizes and ellipsoidal shapes. This algorithm, more powerful than existing ones, may be useful for remote sensing for land use.
An Improved Clustering Algorithm of Tunnel Monitoring Data for Cloud Computing
Zhong, Luo; Tang, KunHao; Li, Lin; Yang, Guang; Ye, JingJing
2014-01-01
With the rapid development of urban construction, the number of urban tunnels is increasing and the data they produce become more and more complex. It results in the fact that the traditional clustering algorithm cannot handle the mass data of the tunnel. To solve this problem, an improved parallel clustering algorithm based on k-means has been proposed. It is a clustering algorithm using the MapReduce within cloud computing that deals with data. It not only has the advantage of being used to deal with mass data but also is more efficient. Moreover, it is able to compute the average dissimilarity degree of each cluster in order to clean the abnormal data. PMID:24982971
An Enhanced PSO-Based Clustering Energy Optimization Algorithm for Wireless Sensor Network.
Vimalarani, C; Subramanian, R; Sivanandam, S N
2016-01-01
Wireless Sensor Network (WSN) is a network which formed with a maximum number of sensor nodes which are positioned in an application environment to monitor the physical entities in a target area, for example, temperature monitoring environment, water level, monitoring pressure, and health care, and various military applications. Mostly sensor nodes are equipped with self-supported battery power through which they can perform adequate operations and communication among neighboring nodes. Maximizing the lifetime of the Wireless Sensor networks, energy conservation measures are essential for improving the performance of WSNs. This paper proposes an Enhanced PSO-Based Clustering Energy Optimization (EPSO-CEO) algorithm for Wireless Sensor Network in which clustering and clustering head selection are done by using Particle Swarm Optimization (PSO) algorithm with respect to minimizing the power consumption in WSN. The performance metrics are evaluated and results are compared with competitive clustering algorithm to validate the reduction in energy consumption. PMID:26881273
A Hybrid Algorithm for Clustering of Time Series Data Based on Affinity Search Technique
Aghabozorgi, Saeed; Ying Wah, Teh; Herawan, Tutut; Jalab, Hamid A.; Shaygan, Mohammad Amin; Jalali, Alireza
2014-01-01
Time series clustering is an important solution to various problems in numerous fields of research, including business, medical science, and finance. However, conventional clustering algorithms are not practical for time series data because they are essentially designed for static data. This impracticality results in poor clustering accuracy in several systems. In this paper, a new hybrid clustering algorithm is proposed based on the similarity in shape of time series data. Time series data are first grouped as subclusters based on similarity in time. The subclusters are then merged using the k-Medoids algorithm based on similarity in shape. This model has two contributions: (1) it is more accurate than other conventional and hybrid approaches and (2) it determines the similarity in shape among time series data with a low complexity. To evaluate the accuracy of the proposed model, the model is tested extensively using syntactic and real-world time series datasets. PMID:24982966
An Enhanced PSO-Based Clustering Energy Optimization Algorithm for Wireless Sensor Network
Vimalarani, C.; Subramanian, R.; Sivanandam, S. N.
2016-01-01
Wireless Sensor Network (WSN) is a network which formed with a maximum number of sensor nodes which are positioned in an application environment to monitor the physical entities in a target area, for example, temperature monitoring environment, water level, monitoring pressure, and health care, and various military applications. Mostly sensor nodes are equipped with self-supported battery power through which they can perform adequate operations and communication among neighboring nodes. Maximizing the lifetime of the Wireless Sensor networks, energy conservation measures are essential for improving the performance of WSNs. This paper proposes an Enhanced PSO-Based Clustering Energy Optimization (EPSO-CEO) algorithm for Wireless Sensor Network in which clustering and clustering head selection are done by using Particle Swarm Optimization (PSO) algorithm with respect to minimizing the power consumption in WSN. The performance metrics are evaluated and results are compared with competitive clustering algorithm to validate the reduction in energy consumption. PMID:26881273
NASA Astrophysics Data System (ADS)
Brenden, T. O.; Clark, R. D.; Wiley, M. J.; Seelbach, P. W.; Wang, L.
2005-05-01
Remote sensing and geographic information systems have made it possible to attribute variables for streams at increasingly detailed resolutions (e.g., individual river reaches). Nevertheless, management decisions still must be made at large scales because land and stream managers typically lack sufficient resources to manage on an individual reach basis. Managers thus require a method for identifying stream management units that are ecologically similar and that can be expected to respond similarly to management decisions. We have developed a spatially-constrained clustering algorithm that can merge neighboring river reaches with similar ecological characteristics into larger management units. The clustering algorithm is based on the Cluster Affinity Search Technique (CAST), which was developed for clustering gene expression data. Inputs to the clustering algorithm are the neighbor relationships of the reaches that comprise the digital river network, the ecological attributes of the reaches, and an affinity value, which identifies the minimum similarity for merging river reaches. In this presentation, we describe the clustering algorithm in greater detail and contrast its use with other methods (expert opinion, classification approach, regular clustering) for identifying management units using several Michigan watersheds as a backdrop.
An approximation polynomial-time algorithm for a sequence bi-clustering problem
NASA Astrophysics Data System (ADS)
Kel'manov, A. V.; Khamidullin, S. A.
2015-06-01
We consider a strongly NP-hard problem of partitioning a finite sequence of vectors in Euclidean space into two clusters using the criterion of the minimal sum of the squared distances from the elements of the clusters to the centers of the clusters. The center of one of the clusters is to be optimized and is determined as the mean value over all vectors in this cluster. The center of the other cluster is fixed at the origin. Moreover, the partition is such that the difference between the indices of two successive vectors in the first cluster is bounded above and below by prescribed constants. A 2-approximation polynomial-time algorithm is proposed for this problem.
Uy, D.L.
1996-02-01
An algorithm for detection and identification of image clusters or {open_quotes}blobs{close_quotes} based on color information for an autonomous mobile robot is developed. The input image data are first processed using a crisp color fuszzyfier, a binary smoothing filter, and a median filter. The processed image data is then inputed to the image clusters detection and identification program. The program employed the concept of {open_quotes}elastic rectangle{close_quotes}that stretches in such a way that the whole blob is finally enclosed in a rectangle. A C-program is develop to test the algorithm. The algorithm is tested only on image data of 8x8 sizes with different number of blobs in them. The algorithm works very in detecting and identifying image clusters.
A network-assisted co-clustering algorithm to discover cancer subtypes based on gene expression
2014-01-01
Background Cancer subtype information is critically important for understanding tumor heterogeneity. Existing methods to identify cancer subtypes have primarily focused on utilizing generic clustering algorithms (such as hierarchical clustering) to identify subtypes based on gene expression data. The network-level interaction among genes, which is key to understanding the molecular perturbations in cancer, has been rarely considered during the clustering process. The motivation of our work is to develop a method that effectively incorporates molecular interaction networks into the clustering process to improve cancer subtype identification. Results We have developed a new clustering algorithm for cancer subtype identification, called “network-assisted co-clustering for the identification of cancer subtypes” (NCIS). NCIS combines gene network information to simultaneously group samples and genes into biologically meaningful clusters. Prior to clustering, we assign weights to genes based on their impact in the network. Then a new weighted co-clustering algorithm based on a semi-nonnegative matrix tri-factorization is applied. We evaluated the effectiveness of NCIS on simulated datasets as well as large-scale Breast Cancer and Glioblastoma Multiforme patient samples from The Cancer Genome Atlas (TCGA) project. NCIS was shown to better separate the patient samples into clinically distinct subtypes and achieve higher accuracy on the simulated datasets to tolerate noise, as compared to consensus hierarchical clustering. Conclusions The weighted co-clustering approach in NCIS provides a unique solution to incorporate gene network information into the clustering process. Our tool will be useful to comprehensively identify cancer subtypes that would otherwise be obscured by cancer heterogeneity, using high-throughput and high-dimensional gene expression data. PMID:24491042
A Fast Density-Based Clustering Algorithm for Real-Time Internet of Things Stream
Ying Wah, Teh
2014-01-01
Data streams are continuously generated over time from Internet of Things (IoT) devices. The faster all of this data is analyzed, its hidden trends and patterns discovered, and new strategies created, the faster action can be taken, creating greater value for organizations. Density-based method is a prominent class in clustering data streams. It has the ability to detect arbitrary shape clusters, to handle outlier, and it does not need the number of clusters in advance. Therefore, density-based clustering algorithm is a proper choice for clustering IoT streams. Recently, several density-based algorithms have been proposed for clustering data streams. However, density-based clustering in limited time is still a challenging issue. In this paper, we propose a density-based clustering algorithm for IoT streams. The method has fast processing time to be applicable in real-time application of IoT devices. Experimental results show that the proposed approach obtains high quality results with low computation time on real and synthetic datasets. PMID:25110753
A fast density-based clustering algorithm for real-time Internet of Things stream.
Amini, Amineh; Saboohi, Hadi; Wah, Teh Ying; Herawan, Tutut
2014-01-01
Data streams are continuously generated over time from Internet of Things (IoT) devices. The faster all of this data is analyzed, its hidden trends and patterns discovered, and new strategies created, the faster action can be taken, creating greater value for organizations. Density-based method is a prominent class in clustering data streams. It has the ability to detect arbitrary shape clusters, to handle outlier, and it does not need the number of clusters in advance. Therefore, density-based clustering algorithm is a proper choice for clustering IoT streams. Recently, several density-based algorithms have been proposed for clustering data streams. However, density-based clustering in limited time is still a challenging issue. In this paper, we propose a density-based clustering algorithm for IoT streams. The method has fast processing time to be applicable in real-time application of IoT devices. Experimental results show that the proposed approach obtains high quality results with low computation time on real and synthetic datasets. PMID:25110753
Efficient cluster Monte Carlo algorithm for Ising spin glasses in more than two space dimensions
NASA Astrophysics Data System (ADS)
Ochoa, Andrew J.; Zhu, Zheng; Katzgraber, Helmut G.
2015-03-01
A cluster algorithm that speeds up slow dynamics in simulations of nonplanar Ising spin glasses away from criticality is urgently needed. In theory, the cluster algorithm proposed by Houdayer poses no advantage over local moves in systems with a percolation threshold below 50%, such as cubic lattices. However, we show that the frustration present in Ising spin glasses prevents the growth of system-spanning clusters at temperatures roughly below the characteristic energy scale J of the problem. Adding Houdayer cluster moves to simulations of Ising spin glasses for T ~ J produces a speedup that grows with the system size over conventional local moves. We show results for the nonplanar quasi-two-dimensional Chimera graph of the D-Wave Two quantum annealer, as well as conventional three-dimensional Ising spin glasses, where in both cases the addition of cluster moves speeds up thermalization visibly in the physically-interesting low temperature regime.
A Novel Coverage-Preserving Clustering Algorithm for Wireless Sensor Networks
NASA Astrophysics Data System (ADS)
Di, Xin
Sensing coverage is one of the crucial characteristics for wireless sensor networks. It has to be considered in the design of routing protocols. LEACH (Low Energy Adaptive Cluster Hierarchy) is a significant and representative routing protocol which organizes the sensing nodes by clustering. For LEACH, residual energy should be considered in order to overcome the inequality of energy dissipation rate. Considering the impact on these two factors of a network, we have proposed a coverage-preserving energy-based clustering algorithm (CEC), which is an improved LEACH. Through improving the threshold for cluster-head selection, CEC achieved more effective results than the other baseline protocols.
NASA Astrophysics Data System (ADS)
Ball, R. C.; Lee, J. R.
1996-03-01
We prove that a new, irreversible growth algorithm, Non-Deletion Reaction-Limited Cluster-cluster Aggregation (NDRLCA), produces equilibrium Branched Polymers, expected to exhibit Lattice Animal statistics [1]. We implement NDRLCA, off-lattice, as a computer simulation for embedding dimension d=2 and 3, obtaining values for critical exponents, fractal dimension D and cluster mass distribution exponent tau: d=2, D≈ 1.53± 0.05, tau = 1.09± 0.06; d=3, D=1.96± 0.04, tau =1.50± 0.04 in good agreement with theoretical LA values. The simulation results do not support recent suggestions [2] that BPs may be in the same universality class as percolation. We also obtain values for a model-dependent critical “fugacity”, z_c and investigate the finite-size effects of our simulation, quantifying notions of “inbreeding” that occur in this algorithm. Finally we use an extension of the NDRLCA proof to show that standard Reaction-Limited Cluster-cluster Aggregation is very unlikely to be in the same universality class as Branched Polymers/Lattice Animals unless the backnone dimension for the latter is considerably less than the published value.
An Effective Intrusion Detection Algorithm Based on Improved Semi-supervised Fuzzy Clustering
NASA Astrophysics Data System (ADS)
Li, Xueyong; Zhang, Baojian; Sun, Jiaxia; Yan, Shitao
An algorithm for intrusion detection based on improved evolutionary semi- supervised fuzzy clustering is proposed which is suited for situation that gaining labeled data is more difficulty than unlabeled data in intrusion detection systems. The algorithm requires a small number of labeled data only and a large number of unlabeled data and class labels information provided by labeled data is used to guide the evolution process of each fuzzy partition on unlabeled data, which plays the role of chromosome. This algorithm can deal with fuzzy label, uneasily plunges locally optima and is suited to implement on parallel architecture. Experiments show that the algorithm can improve classification accuracy and has high detection efficiency.
Two generalizations of Kohonen clustering
NASA Technical Reports Server (NTRS)
Bezdek, James C.; Pal, Nikhil R.; Tsao, Eric C. K.
1993-01-01
The relationship between the sequential hard c-means (SHCM), learning vector quantization (LVQ), and fuzzy c-means (FCM) clustering algorithms is discussed. LVQ and SHCM suffer from several major problems. For example, they depend heavily on initialization. If the initial values of the cluster centers are outside the convex hull of the input data, such algorithms, even if they terminate, may not produce meaningful results in terms of prototypes for cluster representation. This is due in part to the fact that they update only the winning prototype for every input vector. The impact and interaction of these two families with Kohonen's self-organizing feature mapping (SOFM), which is not a clustering method, but which often leads ideas to clustering algorithms is discussed. Then two generalizations of LVQ that are explicitly designed as clustering algorithms are presented; these algorithms are referred to as generalized LVQ = GLVQ; and fuzzy LVQ = FLVQ. Learning rules are derived to optimize an objective function whose goal is to produce 'good clusters'. GLVQ/FLVQ (may) update every node in the clustering net for each input vector. Neither GLVQ nor FLVQ depends upon a choice for the update neighborhood or learning rate distribution - these are taken care of automatically. Segmentation of a gray tone image is used as a typical application of these algorithms to illustrate the performance of GLVQ/FLVQ.
An efficient clustering algorithm for partitioning Y-short tandem repeats data
2012-01-01
Background Y-Short Tandem Repeats (Y-STR) data consist of many similar and almost similar objects. This characteristic of Y-STR data causes two problems with partitioning: non-unique centroids and local minima problems. As a result, the existing partitioning algorithms produce poor clustering results. Results Our new algorithm, called k-Approximate Modal Haplotypes (k-AMH), obtains the highest clustering accuracy scores for five out of six datasets, and produces an equal performance for the remaining dataset. Furthermore, clustering accuracy scores of 100% are achieved for two of the datasets. The k-AMH algorithm records the highest mean accuracy score of 0.93 overall, compared to that of other algorithms: k-Population (0.91), k-Modes-RVF (0.81), New Fuzzy k-Modes (0.80), k-Modes (0.76), k-Modes-Hybrid 1 (0.76), k-Modes-Hybrid 2 (0.75), Fuzzy k-Modes (0.74), and k-Modes-UAVM (0.70). Conclusions The partitioning performance of the k-AMH algorithm for Y-STR data is superior to that of other algorithms, owing to its ability to solve the non-unique centroids and local minima problems. Our algorithm is also efficient in terms of time complexity, which is recorded as O(km(n-k)) and considered to be linear. PMID:23039132
Clustering-based robust three-dimensional phase unwrapping algorithm.
Arevalillo-Herráez, Miguel; Burton, David R; Lalor, Michael J
2010-04-01
Relatively recent techniques that produce phase volumes have motivated the study of three-dimensional (3D) unwrapping algorithms that inherently incorporate the third dimension into the process. We propose a novel 3D unwrapping algorithm that can be considered to be a generalization of the minimum spanning tree (MST) approach. The technique combines characteristics of some of the most robust existing methods: it uses a quality map to guide the unwrapping process, a region growing mechanism to progressively unwrap the signal, and also cut surfaces to avoid error propagation. The approach has been evaluated in the context of noncontact measurement of dynamic objects, suggesting a better performance than MST-based approaches. PMID:20357860
A Game Theory Algorithm for Intra-Cluster Data Aggregation in a Vehicular Ad Hoc Network.
Chen, Yuzhong; Weng, Shining; Guo, Wenzhong; Xiong, Naixue
2016-01-01
Vehicular ad hoc networks (VANETs) have an important role in urban management and planning. The effective integration of vehicle information in VANETs is critical to traffic analysis, large-scale vehicle route planning and intelligent transportation scheduling. However, given the limitations in the precision of the output information of a single sensor and the difficulty of information sharing among various sensors in a highly dynamic VANET, effectively performing data aggregation in VANETs remains a challenge. Moreover, current studies have mainly focused on data aggregation in large-scale environments but have rarely discussed the issue of intra-cluster data aggregation in VANETs. In this study, we propose a multi-player game theory algorithm for intra-cluster data aggregation in VANETs by analyzing the competitive and cooperative relationships among sensor nodes. Several sensor-centric metrics are proposed to measure the data redundancy and stability of a cluster. We then study the utility function to achieve efficient intra-cluster data aggregation by considering both data redundancy and cluster stability. In particular, we prove the existence of a unique Nash equilibrium in the game model, and conduct extensive experiments to validate the proposed algorithm. Results demonstrate that the proposed algorithm has advantages over typical data aggregation algorithms in both accuracy and efficiency. PMID:26907272
A Game Theory Algorithm for Intra-Cluster Data Aggregation in a Vehicular Ad Hoc Network
Chen, Yuzhong; Weng, Shining; Guo, Wenzhong; Xiong, Naixue
2016-01-01
Vehicular ad hoc networks (VANETs) have an important role in urban management and planning. The effective integration of vehicle information in VANETs is critical to traffic analysis, large-scale vehicle route planning and intelligent transportation scheduling. However, given the limitations in the precision of the output information of a single sensor and the difficulty of information sharing among various sensors in a highly dynamic VANET, effectively performing data aggregation in VANETs remains a challenge. Moreover, current studies have mainly focused on data aggregation in large-scale environments but have rarely discussed the issue of intra-cluster data aggregation in VANETs. In this study, we propose a multi-player game theory algorithm for intra-cluster data aggregation in VANETs by analyzing the competitive and cooperative relationships among sensor nodes. Several sensor-centric metrics are proposed to measure the data redundancy and stability of a cluster. We then study the utility function to achieve efficient intra-cluster data aggregation by considering both data redundancy and cluster stability. In particular, we prove the existence of a unique Nash equilibrium in the game model, and conduct extensive experiments to validate the proposed algorithm. Results demonstrate that the proposed algorithm has advantages over typical data aggregation algorithms in both accuracy and efficiency. PMID:26907272
Node Non-Uniform Deployment Based on Clustering Algorithm for Underwater Sensor Networks.
Jiang, Peng; Liu, Jun; Wu, Feng
2015-01-01
A node non-uniform deployment based on clustering algorithm for underwater sensor networks (UWSNs) is proposed in this study. This algorithm is proposed because optimizing network connectivity rate and network lifetime is difficult for the existing node non-uniform deployment algorithms under the premise of improving the network coverage rate for UWSNs. A high network connectivity rate is achieved by determining the heterogeneous communication ranges of nodes during node clustering. Moreover, the concept of aggregate contribution degree is defined, and the nodes with lower aggregate contribution degrees are used to substitute the dying nodes to decrease the total movement distance of nodes and prolong the network lifetime. Simulation results show that the proposed algorithm can achieve a better network coverage rate and network connectivity rate, as well as decrease the total movement distance of nodes and prolong the network lifetime. PMID:26633408
Node Non-Uniform Deployment Based on Clustering Algorithm for Underwater Sensor Networks
Jiang, Peng; Liu, Jun; Wu, Feng
2015-01-01
A node non-uniform deployment based on clustering algorithm for underwater sensor networks (UWSNs) is proposed in this study. This algorithm is proposed because optimizing network connectivity rate and network lifetime is difficult for the existing node non-uniform deployment algorithms under the premise of improving the network coverage rate for UWSNs. A high network connectivity rate is achieved by determining the heterogeneous communication ranges of nodes during node clustering. Moreover, the concept of aggregate contribution degree is defined, and the nodes with lower aggregate contribution degrees are used to substitute the dying nodes to decrease the total movement distance of nodes and prolong the network lifetime. Simulation results show that the proposed algorithm can achieve a better network coverage rate and network connectivity rate, as well as decrease the total movement distance of nodes and prolong the network lifetime. PMID:26633408
The Development of FPGA-Based Pseudo-Iterative Clustering Algorithms
NASA Astrophysics Data System (ADS)
Drueke, Elizabeth; Fisher, Wade; Plucinski, Pawel
2016-03-01
The Large Hadron Collider (LHC) in Geneva, Switzerland, is set to undergo major upgrades in 2025 in the form of the High-Luminosity Large Hadron Collider (HL-LHC). In particular, several hardware upgrades are proposed to the ATLAS detector, one of the two general purpose detectors. These hardware upgrades include, but are not limited to, a new hardware-level clustering algorithm, to be performed by a field programmable gate array, or FPGA. In this study, we develop that clustering algorithm and compare the output to a Python-implemented topoclustering algorithm developed at the University of Oregon. Here, we present the agreement between the FPGA output and expected output, with particular attention to the time required by the FPGA to complete the algorithm and other limitations set by the FPGA itself.
An Efficient Method of Key-Frame Extraction Based on a Cluster Algorithm
Zhang, Qiang; Yu, Shao-Pei; Zhou, Dong-Sheng; Wei, Xiao-Peng
2013-01-01
This paper proposes a novel method of key-frame extraction for use with motion capture data. This method is based on an unsupervised cluster algorithm. First, the motion sequence is clustered into two classes by the similarity distance of the adjacent frames so that the thresholds needed in the next step can be determined adaptively. Second, a dynamic cluster algorithm called ISODATA is used to cluster all the frames and the frames nearest to the center of each class are automatically extracted as key-frames of the sequence. Unlike many other clustering techniques, the present improved cluster algorithm can automatically address different motion types without any need for specified parameters from users. The proposed method is capable of summarizing motion capture data reliably and efficiently. The present work also provides a meaningful comparison between the results of the proposed key-frame extraction technique and other previous methods. These results are evaluated in terms of metrics that measure reconstructed motion and the mean absolute error value, which are derived from the reconstructed data and the original data. PMID:24511336
Quantum cluster algorithm for frustrated Ising models in a transverse field
NASA Astrophysics Data System (ADS)
Biswas, Sounak; Rakala, Geet; Damle, Kedar
2016-06-01
Working within the stochastic series expansion framework, we introduce and characterize a plaquette-based quantum cluster algorithm for quantum Monte Carlo simulations of transverse field Ising models with frustrated Ising exchange interactions. As a demonstration of the capabilities of this algorithm, we show that a relatively small ferromagnetic next-nearest-neighbor coupling drives the transverse field Ising antiferromagnet on the triangular lattice from an antiferromagnetic three-sublattice ordered state at low temperature to a ferrimagnetic three-sublattice ordered state.
An Effective Tri-Clustering Algorithm Combining Expression Data with Gene Regulation Information
Li, Ao; Tuck, David
2009-01-01
Motivation Bi-clustering algorithms aim to identify sets of genes sharing similar expression patterns across a subset of conditions. However direct interpretation or prediction of gene regulatory mechanisms may be difficult as only gene expression data is used. Information about gene regulators may also be available, most commonly about which transcription factors may bind to the promoter region and thus control the expression level of a gene. Thus a method to integrate gene expression and gene regulation information is desirable for clustering and analyzing. Methods By incorporating gene regulatory information with gene expression data, we define regulated expression values (REV) as indicators of how a gene is regulated by a specific factor. Existing bi-clustering methods are extended to a three dimensional data space by developing a heuristic TRI-Clustering algorithm. An additional approach named Automatic Boundary Searching algorithm (ABS) is introduced to automatically determine the boundary threshold. Results Results based on incorporating ChIP-chip data representing transcription factor-gene interactions show that the algorithms are efficient and robust for detecting tri-clusters. Detailed analysis of the tri-cluster extracted from yeast sporulation REV data shows genes in this cluster exhibited significant differences during the middle and late stages. The implicated regulatory network was then reconstructed for further study of defined regulatory mechanisms. Topological and statistical analysis of this network demonstrated evidence of significant changes of TF activities during the different stages of yeast sporulation, and suggests this approach might be a general way to study regulatory networks undergoing transformations. PMID:19838334
An algorithm for point cluster generalization based on the Voronoi diagram
NASA Astrophysics Data System (ADS)
Yan, Haowen; Weibel, Robert
2008-08-01
This paper presents an algorithm for point cluster generalization. Four types of information, i.e. statistical, thematic, topological, and metric information are considered, and measures are selected to describe corresponding types of information quantitatively in the algorithm, i.e. the number of points for statistical information, the importance value for thematic information, the Voronoi neighbors for topological information, and the distribution range and relative local density for metric information. Based on these measures, an algorithm for point cluster generalization is developed. Firstly, point clusters are triangulated and a border polygon of the point clusters is obtained. By the border polygon, some pseudo points are added to the original point clusters to form a new point set and a range polygon that encloses all original points is constructed. Secondly, the Voronoi polygons of the new point set are computed in order to obtain the so-called relative local density of each point. Further, the selection probability of each point is computed using its relative local density and importance value, and then mark those will-be-deleted points as 'deleted' according to their selection probabilities and Voronoi neighboring relations. Thirdly, if the number of retained points does not satisfy that computed by the Radical Law, physically delete the points marked as 'deleted' forming a new point set, and the second step is repeated; else physically deleted pseudo points and the points marked as 'deleted', and the generalized point clusters are achieved. Owing to the use of the Voronoi diagram the algorithm is parameter free and fully automatic. As our experiments show, it can be used in the generalization of point features arranged in clusters such as thematic dot maps and control points on cartographic maps.
A seed expanding cluster algorithm for deriving upwelling areas on sea surface temperature images
NASA Astrophysics Data System (ADS)
Nascimento, Susana; Casca, Sérgio; Mirkin, Boris
2015-12-01
In this paper a novel clustering algorithm is proposed as a version of the seeded region growing (SRG) approach for the automatic recognition of coastal upwelling from sea surface temperature (SST) images. The new algorithm, one seed expanding cluster (SEC), takes advantage of the concept of approximate clustering due to Mirkin (1996, 2013) to derive a homogeneity criterion in the format of a product rather than the conventional difference between a pixel value and the mean of values over the region of interest. It involves a boundary-oriented pixel labeling so that the cluster growing is performed by expanding its boundary iteratively. The starting point is a cluster consisting of just one seed, the pixel with the coldest temperature. The baseline version of the SEC algorithm uses Otsu's thresholding method to fine-tune the homogeneity threshold. Unfortunately, this method does not always lead to a satisfactory solution. Therefore, we introduce a self-tuning version of the algorithm in which the homogeneity threshold is locally derived from the approximation criterion over a window around the pixel under consideration. The window serves as a boundary regularizer. These two unsupervised versions of the algorithm have been applied to a set of 28 SST images of the western coast of mainland Portugal, and compared against a supervised version fine-tuned by maximizing the F-measure with respect to manually labeled ground-truth maps. The areas built by the unsupervised versions of the SEC algorithm are significantly coincident over the ground-truth regions in the cases at which the upwelling areas consist of a single continuous fragment of the SST map.
Experimental realization of the Deutsch-Jozsa algorithm with a six-qubit cluster state
Vallone, Giuseppe; Donati, Gaia; Bruno, Natalia; Chiuri, Andrea; Mataloni, Paolo
2010-05-15
We describe an experimental realization of the Deutsch-Jozsa quantum algorithm to evaluate the properties of a two-bit Boolean function in the framework of one-way quantum computation. For this purpose, a two-photon six-qubit cluster state was engineered. Its peculiar topological structure is the basis of the original measurement pattern allowing the algorithm realization. The good agreement of the experimental results with the theoretical predictions, obtained at {approx}1 kHz success rate, demonstrates the correct implementation of the algorithm.
Lee, Chongdeuk; Jeong, Taegwon
2011-01-01
Clustering is an important mechanism that efficiently provides information for mobile nodes and improves the processing capacity of routing, bandwidth allocation, and resource management and sharing. Clustering algorithms can be based on such criteria as the battery power of nodes, mobility, network size, distance, speed and direction. Above all, in order to achieve good clustering performance, overhead should be minimized, allowing mobile nodes to join and leave without perturbing the membership of the cluster while preserving current cluster structure as much as possible. This paper proposes a Fuzzy Relevance-based Cluster head selection Algorithm (FRCA) to solve problems found in existing wireless mobile ad hoc sensor networks, such as the node distribution found in dynamic properties due to mobility and flat structures and disturbance of the cluster formation. The proposed mechanism uses fuzzy relevance to select the cluster head for clustering in wireless mobile ad hoc sensor networks. In the simulation implemented on the NS-2 simulator, the proposed FRCA is compared with algorithms such as the Cluster-based Routing Protocol (CBRP), the Weighted-based Adaptive Clustering Algorithm (WACA), and the Scenario-based Clustering Algorithm for Mobile ad hoc networks (SCAM). The simulation results showed that the proposed FRCA achieves better performance than that of the other existing mechanisms. PMID:22163905
Borodovsky, M; Peresetsky, A
1994-09-01
Non-homogeneous Markov chain models can represent biologically important regions of DNA sequences. The statistical pattern that is described by these models is usually weak and was found primarily because of strong biological indications. The general method for extracting similar patterns is presented in the current paper. The algorithm incorporates cluster analysis, multiple alignment and entropy minimization. The method was first tested using the set of DNA sequences produced by Markov chain generators. It was shown that artificial gene sequences, which initially have been randomly set up along the multiple alignment panels, are aligned according to the hidden triplet phase. Then the method was applied to real protein-coding sequences and the resulting alignment clearly indicated the triplet phase and produced the parameters of the optimal 3-periodic non-homogeneous Markov chain model. These Markov models were already employed in the GeneMark gene prediction algorithm, which is used in genome sequencing projects. The algorithm can also handle the case in which the sequences to be aligned reveal different statistical patterns, such as Escherichia coli protein-coding sequences belonging to Class II and Class III. The algorithm accepts a random mix of sequences from different classes, and is able to separate them into two groups (clusters), align each cluster separately, and define a non-homogeneous Markov chain model for each sequence cluster. PMID:7952897
Tame, M. S.; Kim, M. S.
2010-09-15
We show that fundamental versions of the Deutsch-Jozsa and Bernstein-Vazirani quantum algorithms can be performed using a small entangled cluster state resource of only six qubits. We then investigate the minimal resource states needed to demonstrate general n-qubit versions and a scalable method to produce them. For this purpose, we propose a versatile photonic on-chip setup.
Solving the depth of the repeated texture areas based on the clustering algorithm
NASA Astrophysics Data System (ADS)
Xiong, Zhang; Zhang, Jun; Tian, Jinwen
2015-12-01
The reconstruction of the 3D scene in the monocular stereo vision needs to get the depth of the field scenic points in the picture scene. But there will inevitably be error matching in the process of image matching, especially when there are a large number of repeat texture areas in the images, there will be lots of error matches. At present, multiple baseline stereo imaging algorithm is commonly used to eliminate matching error for repeated texture areas. This algorithm can eliminate the ambiguity correspond to common repetition texture. But this algorithm has restrictions on the baseline, and has low speed. In this paper, we put forward an algorithm of calculating the depth of the matching points in the repeat texture areas based on the clustering algorithm. Firstly, we adopt Gauss Filter to preprocess the images. Secondly, we segment the repeated texture regions in the images into image blocks by using spectral clustering segmentation algorithm based on super pixel and tag the image blocks. Then, match the two images and solve the depth of the image. Finally, the depth of the image blocks takes the median in all depth values of calculating point in the bock. So the depth of repeated texture areas is got. The results of a lot of image experiments show that the effect of our algorithm for calculating the depth of repeated texture areas is very good.
NEW MDS AND CLUSTERING BASED ALGORITHMS FOR PROTEIN MODEL QUALITY ASSESSMENT AND SELECTION
WANG, QINGGUO; SHANG, CHARLES; XU, DONG
2014-01-01
In protein tertiary structure prediction, assessing the quality of predicted models is an essential task. Over the past years, many methods have been proposed for the protein model quality assessment (QA) and selection problem. Despite significant advances, the discerning power of current methods is still unsatisfactory. In this paper, we propose two new algorithms, CC-Select and MDS-QA, based on multidimensional scaling and k-means clustering. For the model selection problem, CC-Select combines consensus with clustering techniques to select the best models from a given pool. Given a set of predicted models, CC-Select first calculates a consensus score for each structure based on its average pairwise structural similarity to other models. Then, similar structures are grouped into clusters using multidimensional scaling and clustering algorithms. In each cluster, the one with the highest consensus score is selected as a candidate model. For the QA problem, MDS-QA combines single-model scoring functions with consensus to determine more accurate assessment score for every model in a given pool. Using extensive benchmark sets of a large collection of predicted models, we compare the two algorithms with existing state-of-the-art quality assessment methods and show significant improvement. PMID:24808625
NEW MDS AND CLUSTERING BASED ALGORITHMS FOR PROTEIN MODEL QUALITY ASSESSMENT AND SELECTION.
Wang, Qingguo; Shang, Charles; Xu, Dong; Shang, Yi
2013-10-25
In protein tertiary structure prediction, assessing the quality of predicted models is an essential task. Over the past years, many methods have been proposed for the protein model quality assessment (QA) and selection problem. Despite significant advances, the discerning power of current methods is still unsatisfactory. In this paper, we propose two new algorithms, CC-Select and MDS-QA, based on multidimensional scaling and k-means clustering. For the model selection problem, CC-Select combines consensus with clustering techniques to select the best models from a given pool. Given a set of predicted models, CC-Select first calculates a consensus score for each structure based on its average pairwise structural similarity to other models. Then, similar structures are grouped into clusters using multidimensional scaling and clustering algorithms. In each cluster, the one with the highest consensus score is selected as a candidate model. For the QA problem, MDS-QA combines single-model scoring functions with consensus to determine more accurate assessment score for every model in a given pool. Using extensive benchmark sets of a large collection of predicted models, we compare the two algorithms with existing state-of-the-art quality assessment methods and show significant improvement. PMID:24808625
An effective trust-based recommendation method using a novel graph clustering algorithm
NASA Astrophysics Data System (ADS)
Moradi, Parham; Ahmadian, Sajad; Akhlaghian, Fardin
2015-10-01
Recommender systems are programs that aim to provide personalized recommendations to users for specific items (e.g. music, books) in online sharing communities or on e-commerce sites. Collaborative filtering methods are important and widely accepted types of recommender systems that generate recommendations based on the ratings of like-minded users. On the other hand, these systems confront several inherent issues such as data sparsity and cold start problems, caused by fewer ratings against the unknowns that need to be predicted. Incorporating trust information into the collaborative filtering systems is an attractive approach to resolve these problems. In this paper, we present a model-based collaborative filtering method by applying a novel graph clustering algorithm and also considering trust statements. In the proposed method first of all, the problem space is represented as a graph and then a sparsest subgraph finding algorithm is applied on the graph to find the initial cluster centers. Then, the proposed graph clustering algorithm is performed to obtain the appropriate users/items clusters. Finally, the identified clusters are used as a set of neighbors to recommend unseen items to the current active user. Experimental results based on three real-world datasets demonstrate that the proposed method outperforms several state-of-the-art recommender system methods.
Clustering of Resting State Networks
Lee, Megan H.; Hacker, Carl D.; Snyder, Abraham Z.; Corbetta, Maurizio; Zhang, Dongyang; Leuthardt, Eric C.; Shimony, Joshua S.
2012-01-01
Background The goal of the study was to demonstrate a hierarchical structure of resting state activity in the healthy brain using a data-driven clustering algorithm. Methodology/Principal Findings The fuzzy-c-means clustering algorithm was applied to resting state fMRI data in cortical and subcortical gray matter from two groups acquired separately, one of 17 healthy individuals and the second of 21 healthy individuals. Different numbers of clusters and different starting conditions were used. A cluster dispersion measure determined the optimal numbers of clusters. An inner product metric provided a measure of similarity between different clusters. The two cluster result found the task-negative and task-positive systems. The cluster dispersion measure was minimized with seven and eleven clusters. Each of the clusters in the seven and eleven cluster result was associated with either the task-negative or task-positive system. Applying the algorithm to find seven clusters recovered previously described resting state networks, including the default mode network, frontoparietal control network, ventral and dorsal attention networks, somatomotor, visual, and language networks. The language and ventral attention networks had significant subcortical involvement. This parcellation was consistently found in a large majority of algorithm runs under different conditions and was robust to different methods of initialization. Conclusions/Significance The clustering of resting state activity using different optimal numbers of clusters identified resting state networks comparable to previously obtained results. This work reinforces the observation that resting state networks are hierarchically organized. PMID:22792291
An X-Ray Spectral Classification Algorithm with Application to Young Stellar Clusters
NASA Astrophysics Data System (ADS)
Hojnacki, S. M.; Kastner, J. H.; Micela, G.; Feigelson, E. D.; LaLonde, S. M.
2007-04-01
A large volume of low signal-to-noise, multidimensional data is available from the CCD imaging spectrometers aboard the Chandra X-Ray Observatory and the X-Ray Multimirror Mission (XMM-Newton). To make progress analyzing this data, it is essential to develop methods to sort, classify, and characterize the vast library of X-ray spectra in a nonparametric fashion (complementary to current parametric model fits). We have developed a spectral classification algorithm that handles large volumes of data and operates independently of the requirement of spectral model fits. We use proven multivariate statistical techniques including principal component analysis and an ensemble classifier consisting of agglomerative hierarchical clustering and K-means clustering applied for the first time for spectral classification. The algorithm positions the sources in a multidimensional spectral sequence and then groups the ordered sources into clusters based on their spectra. These clusters appear more distinct for sources with harder observed spectra. The apparent diversity of source spectra is reduced to a three-dimensional locus in principal component space, with spectral outliers falling outside this locus. The algorithm was applied to a sample of 444 strong sources selected from the 1616 X-ray emitting sources detected in deep Chandra imaging spectroscopy of the Orion Nebula Cluster. Classes form sequences in NH, AV, and accretion activity indicators, demonstrating that the algorithm efficiently sorts the X-ray sources into a physically meaningful sequence. The algorithm also isolates important classes of very deeply embedded, active young stellar objects, and yields trends between X-ray spectral parameters and stellar parameters for the lowest mass, pre-main-sequence stars.
The RedGOLD cluster detection algorithm and its cluster candidate catalogue for the CFHT-LS W1
NASA Astrophysics Data System (ADS)
Licitra, Rossella; Mei, Simona; Raichoor, Anand; Erben, Thomas; Hildebrandt, Hendrik
2016-01-01
We present RedGOLD (Red-sequence Galaxy Overdensity cLuster Detector), a new optical/NIR galaxy cluster detection algorithm, and apply it to the CFHT-LS W1 field. RedGOLD searches for red-sequence galaxy overdensities while minimizing contamination from dusty star-forming galaxies. It imposes an Navarro-Frenk-White profile and calculates cluster detection significance and richness. We optimize these latter two parameters using both simulations and X-ray-detected cluster catalogues, and obtain a catalogue ˜80 per cent pure up to z ˜ 1, and ˜100 per cent (˜70 per cent) complete at z ≤ 0.6 (z ≲ 1) for galaxy clusters with M ≳ 1014 M⊙ at the CFHT-LS Wide depth. In the CFHT-LS W1, we detect 11 cluster candidates per deg2 out to z ˜ 1.1. When we optimize both completeness and purity, RedGOLD obtains a cluster catalogue with higher completeness and purity than other public catalogues, obtained using CFHT-LS W1 observations, for M ≳ 1014 M⊙. We use X-ray-detected cluster samples to extend the study of the X-ray temperature-optical richness relation to a lower mass threshold, and find a mass scatter at fixed richness of σlnM|λ = 0.39 ± 0.07 and σlnM|λ = 0.30 ± 0.13 for the Gozaliasl et al. and Mehrtens et al. samples. When considering similar mass ranges as previous work, we recover a smaller scatter in mass at fixed richness. We recover 93 per cent of the redMaPPer detections, and find that its richness estimates is on average ˜40-50 per cent larger than ours at z > 0.3. RedGOLD recovers X-ray cluster spectroscopic redshifts at better than 5 per cent up to z ˜ 1, and the centres within a few tens of arcseconds.
FctClus: A Fast Clustering Algorithm for Heterogeneous Information Networks.
Yang, Jing; Chen, Limin; Zhang, Jianpei
2015-01-01
It is important to cluster heterogeneous information networks. A fast clustering algorithm based on an approximate commute time embedding for heterogeneous information networks with a star network schema is proposed in this paper by utilizing the sparsity of heterogeneous information networks. First, a heterogeneous information network is transformed into multiple compatible bipartite graphs from the compatible point of view. Second, the approximate commute time embedding of each bipartite graph is computed using random mapping and a linear time solver. All of the indicator subsets in each embedding simultaneously determine the target dataset. Finally, a general model is formulated by these indicator subsets, and a fast algorithm is derived by simultaneously clustering all of the indicator subsets using the sum of the weighted distances for all indicators for an identical target object. The proposed fast algorithm, FctClus, is shown to be efficient and generalizable and exhibits high clustering accuracy and fast computation speed based on a theoretic analysis and experimental verification. PMID:26090857
A priori data-driven multi-clustered reservoir generation algorithm for echo state network.
Li, Xiumin; Zhong, Ling; Xue, Fangzheng; Zhang, Anguo
2015-01-01
Echo state networks (ESNs) with multi-clustered reservoir topology perform better in reservoir computing and robustness than those with random reservoir topology. However, these ESNs have a complex reservoir topology, which leads to difficulties in reservoir generation. This study focuses on the reservoir generation problem when ESN is used in environments with sufficient priori data available. Accordingly, a priori data-driven multi-cluster reservoir generation algorithm is proposed. The priori data in the proposed algorithm are used to evaluate reservoirs by calculating the precision and standard deviation of ESNs. The reservoirs are produced using the clustering method; only the reservoir with a better evaluation performance takes the place of a previous one. The final reservoir is obtained when its evaluation score reaches the preset requirement. The prediction experiment results obtained using the Mackey-Glass chaotic time series show that the proposed reservoir generation algorithm provides ESNs with extra prediction precision and increases the structure complexity of the network. Further experiments also reveal the appropriate values of the number of clusters and time window size to obtain optimal performance. The information entropy of the reservoir reaches the maximum when ESN gains the greatest precision. PMID:25875296
K-Boost: a scalable algorithm for high-quality clustering of microarray gene expression data.
Geraci, Filippo; Leoncini, Mauro; Montangero, Manuela; Pellegrini, Marco; Renda, M Elena
2009-06-01
Microarray technology for profiling gene expression levels is a popular tool in modern biological research. Applications range from tissue classification to the detection of metabolic networks, from drug discovery to time-critical personalized medicine. Given the increase in size and complexity of the data sets produced, their analysis is becoming problematic in terms of time/quality trade-offs. Clustering genes with similar expression profiles is a key initial step for subsequent manipulations and the increasing volumes of data to be analyzed requires methods that are at the same time efficient (completing an analysis in minutes rather than hours) and effective (identifying significant clusters with high biological correlations). In this paper, we propose K-Boost, a clustering algorithm based on a combination of the furthest-point-first (FPF) heuristic for solving the metric k-center problem, a stability-based method for determining the number of clusters, and a k-means-like cluster refinement. K-Boost runs in O (|N| x k) time, where N is the input matrix and k is the number of proposed clusters. Experiments show that this low complexity is usually coupled with a very good quality of the computed clusterings, which we measure using both internal and external criteria. Supporting data can be found as online Supplementary Material at www.liebertonline.com. PMID:19522668
NASA Astrophysics Data System (ADS)
Quintanilla-Domínguez, Joel; Ojeda-Magaña, Benjamín; Marcano-Cedeño, Alexis; Cortina-Januchs, María G.; Vega-Corona, Antonio; Andina, Diego
2011-12-01
A new method for detecting microcalcifications in regions of interest (ROIs) extracted from digitized mammograms is proposed. The top-hat transform is a technique based on mathematical morphology operations and, in this paper, is used to perform contrast enhancement of the mi-crocalcifications. To improve microcalcification detection, a novel image sub-segmentation approach based on the possibilistic fuzzy c-means algorithm is used. From the original ROIs, window-based features, such as the mean and standard deviation, were extracted; these features were used as an input vector in a classifier. The classifier is based on an artificial neural network to identify patterns belonging to microcalcifications and healthy tissue. Our results show that the proposed method is a good alternative for automatically detecting microcalcifications, because this stage is an important part of early breast cancer detection.
Cluster-Based Multipolling Sequencing Algorithm for Collecting RFID Data in Wireless LANs
NASA Astrophysics Data System (ADS)
Choi, Woo-Yong; Chatterjee, Mainak
2015-03-01
With the growing use of RFID (Radio Frequency Identification), it is becoming important to devise ways to read RFID tags in real time. Access points (APs) of IEEE 802.11-based wireless Local Area Networks (LANs) are being integrated with RFID networks that can efficiently collect real-time RFID data. Several schemes, such as multipolling methods based on the dynamic search algorithm and random sequencing, have been proposed. However, as the number of RFID readers associated with an AP increases, it becomes difficult for the dynamic search algorithm to derive the multipolling sequence in real time. Though multipolling methods can eliminate the polling overhead, we still need to enhance the performance of the multipolling methods based on random sequencing. To that extent, we propose a real-time cluster-based multipolling sequencing algorithm that drastically eliminates more than 90% of the polling overhead, particularly so when the dynamic search algorithm fails to derive the multipolling sequence in real time.
A genetic algorithmic approach to antenna null-steering using a cluster computer.
NASA Astrophysics Data System (ADS)
Recine, Greg; Cui, Hong-Liang
2001-06-01
We apply a genetic algorithm (GA) to the problem of electronically steering the maximums and nulls of an antenna array to desired positions (null toward enemy listener/jammer, max toward friendly listener/transmitter). The antenna pattern itself is computed using NEC2 which is called by the main GA program. Since a GA naturally lends itself to parallelization, this simulation was applied to our new twin 64-node cluster computers (Gemini). Design issues and uses of the Gemini cluster in our group are also discussed.
KD-tree based clustering algorithm for fast face recognition on large-scale data
NASA Astrophysics Data System (ADS)
Wang, Yuanyuan; Lin, Yaping; Yang, Junfeng
2015-07-01
This paper proposes an acceleration method for large-scale face recognition system. When dealing with a large-scale database, face recognition is time-consuming. In order to tackle this problem, we employ the k-means clustering algorithm to classify face data. Specifically, the data in each cluster are stored in the form of the kd-tree, and face feature matching is conducted with the kd-tree based nearest neighborhood search. Experiments on CAS-PEAL and self-collected database show the effectiveness of our proposed method.
K-Means Re-Clustering-Algorithmic Options with Quantifiable Performance Comparisons
Meyer, A W; Paglieroni, D; Asteneh, C
2002-12-17
This paper presents various architectural options for implementing a K-Means Re-Clustering algorithm suitable for unsupervised segmentation of hyperspectral images. Performance metrics are developed based upon quantitative comparisons of convergence rates and segmentation quality. A methodology for making these comparisons is developed and used to establish K values that produce the best segmentations with minimal processing requirements. Convergence rates depend on the initial choice of cluster centers. Consequently, this same methodology may be used to evaluate the effectiveness of different initialization techniques.
Fast randomized Hough transformation track initiation algorithm based on multi-scale clustering
NASA Astrophysics Data System (ADS)
Wan, Minjie; Gu, Guohua; Chen, Qian; Qian, Weixian; Wang, Pengcheng
2015-10-01
A fast randomized Hough transformation track initiation algorithm based on multi-scale clustering is proposed to overcome existing problems in traditional infrared search and track system(IRST) which cannot provide movement information of the initial target and select the threshold value of correlation automatically by a two-dimensional track association algorithm based on bearing-only information . Movements of all the targets are presumed to be uniform rectilinear motion throughout this new algorithm. Concepts of space random sampling, parameter space dynamic linking table and convergent mapping of image to parameter space are developed on the basis of fast randomized Hough transformation. Considering the phenomenon of peak value clustering due to shortcomings of peak detection itself which is built on threshold value method, accuracy can only be ensured on condition that parameter space has an obvious peak value. A multi-scale idea is added to the above-mentioned algorithm. Firstly, a primary association is conducted to select several alternative tracks by a low-threshold .Then, alternative tracks are processed by multi-scale clustering methods , through which accurate numbers and parameters of tracks are figured out automatically by means of transforming scale parameters. The first three frames are processed by this algorithm in order to get the first three targets of the track , and then two slightly different gate radius are worked out , mean value of which is used to be the global threshold value of correlation. Moreover, a new model for curvilinear equation correction is applied to the above-mentioned track initiation algorithm for purpose of solving the problem of shape distortion when a space three-dimensional curve is mapped to a two-dimensional bearing-only space. Using sideways-flying, launch and landing as examples to build models and simulate, the application of the proposed approach in simulation proves its effectiveness , accuracy , and adaptivity
Performance Analysis of Apriori Algorithm with Different Data Structures on Hadoop Cluster
NASA Astrophysics Data System (ADS)
Singh, Sudhakar; Garg, Rakhi; Mishra, P. K.
2015-10-01
Mining frequent itemsets from massive datasets is always being a most important problem of data mining. Apriori is the most popular and simplest algorithm for frequent itemset mining. To enhance the efficiency and scalability of Apriori, a number of algorithms have been proposed addressing the design of efficient data structures, minimizing database scan and parallel and distributed processing. MapReduce is the emerging parallel and distributed technology to process big datasets on Hadoop Cluster. To mine big datasets it is essential to re-design the data mining algorithm on this new paradigm. In this paper, we implement three variations of Apriori algorithm using data structures hash tree, trie and hash table trie i.e. trie with hash technique on MapReduce paradigm. We emphasize and investigate the significance of these three data structures for Apriori algorithm on Hadoop cluster, which has not been given attention yet. Experiments are carried out on both real life and synthetic datasets which shows that hash table trie data structures performs far better than trie and hash tree in terms of execution time. Moreover the performance in case of hash tree becomes worst.
Robustness of ‘cut and splice’ genetic algorithms in the structural optimization of atomic clusters
NASA Astrophysics Data System (ADS)
Froltsov, Vladimir A.; Reuter, Karsten
2009-05-01
We return to the geometry optimization problem of Lennard-Jones clusters to analyze the performance dependence of 'cut and splice' genetic algorithms (GAs) on the employed population size. We generally find that admixing twinning mutation moves leads to an improved robustness of the algorithm efficiency with respect to this a priori unknown technical parameter. The resulting very stable performance of the corresponding mutation + mating GA implementation over a wide range of population sizes is an important feature when addressing unknown systems with computationally involved first-principles based GA sampling.
User Activity Recognition in Smart Homes Using Pattern Clustering Applied to Temporal ANN Algorithm.
Bourobou, Serge Thomas Mickala; Yoo, Younghwan
2015-01-01
This paper discusses the possibility of recognizing and predicting user activities in the IoT (Internet of Things) based smart environment. The activity recognition is usually done through two steps: activity pattern clustering and activity type decision. Although many related works have been suggested, they had some limited performance because they focused only on one part between the two steps. This paper tries to find the best combination of a pattern clustering method and an activity decision algorithm among various existing works. For the first step, in order to classify so varied and complex user activities, we use a relevant and efficient unsupervised learning method called the K-pattern clustering algorithm. In the second step, the training of smart environment for recognizing and predicting user activities inside his/her personal space is done by utilizing the artificial neural network based on the Allen's temporal relations. The experimental results show that our combined method provides the higher recognition accuracy for various activities, as compared with other data mining classification algorithms. Furthermore, it is more appropriate for a dynamic environment like an IoT based smart home. PMID:26007738
User Activity Recognition in Smart Homes Using Pattern Clustering Applied to Temporal ANN Algorithm
Bourobou, Serge Thomas Mickala; Yoo, Younghwan
2015-01-01
This paper discusses the possibility of recognizing and predicting user activities in the IoT (Internet of Things) based smart environment. The activity recognition is usually done through two steps: activity pattern clustering and activity type decision. Although many related works have been suggested, they had some limited performance because they focused only on one part between the two steps. This paper tries to find the best combination of a pattern clustering method and an activity decision algorithm among various existing works. For the first step, in order to classify so varied and complex user activities, we use a relevant and efficient unsupervised learning method called the K-pattern clustering algorithm. In the second step, the training of smart environment for recognizing and predicting user activities inside his/her personal space is done by utilizing the artificial neural network based on the Allen’s temporal relations. The experimental results show that our combined method provides the higher recognition accuracy for various activities, as compared with other data mining classification algorithms. Furthermore, it is more appropriate for a dynamic environment like an IoT based smart home. PMID:26007738
Evaluation of particle clustering algorithms in the prediction of brownout dust clouds
NASA Astrophysics Data System (ADS)
Govindarajan, Bharath Madapusi
2011-07-01
A study of three Lagrangian particle clustering methods has been conducted with application to the problem of predicting brownout dust clouds that develop when rotorcraft land over surfaces covered with loose sediment. A significant impediment in performing such particle modeling simulations is the extremely large number of particles needed to obtain dust clouds of acceptable fidelity. Computing the motion of each and every individual sediment particle in a dust cloud (which can reach into tens of billions per cubic meter) is computationally prohibitive. The reported work involved the development of computationally efficient clustering algorithms that can be applied to the simulation of dilute gas-particle suspensions at low Reynolds numbers of the relative particle motion. The Gaussian distribution, k-means and Osiptsov's clustering methods were studied in detail to highlight the nuances of each method for a prototypical flow field that mimics the highly unsteady, two-phase vortical particle flow obtained when rotorcraft encounter brownout conditions. It is shown that although clustering algorithms can be problem dependent and have bounds of applicability, they offer the potential to significantly reduce computational costs while retaining the overall accuracy of a brownout dust cloud solution.
Big Data GPU-Driven Parallel Processing Spatial and Spatio-Temporal Clustering Algorithms
NASA Astrophysics Data System (ADS)
Konstantaras, Antonios; Skounakis, Emmanouil; Kilty, James-Alexander; Frantzeskakis, Theofanis; Maravelakis, Emmanuel
2016-04-01
Advances in graphics processing units' technology towards encompassing parallel architectures [1], comprised of thousands of cores and multiples of parallel threads, provide the foundation in terms of hardware for the rapid processing of various parallel applications regarding seismic big data analysis. Seismic data are normally stored as collections of vectors in massive matrices, growing rapidly in size as wider areas are covered, denser recording networks are being established and decades of data are being compiled together [2]. Yet, many processes regarding seismic data analysis are performed on each seismic event independently or as distinct tiles [3] of specific grouped seismic events within a much larger data set. Such processes, independent of one another can be performed in parallel narrowing down processing times drastically [1,3]. This research work presents the development and implementation of three parallel processing algorithms using Cuda C [4] for the investigation of potentially distinct seismic regions [5,6] present in the vicinity of the southern Hellenic seismic arc. The algorithms, programmed and executed in parallel comparatively, are the: fuzzy k-means clustering with expert knowledge [7] in assigning overall clusters' number; density-based clustering [8]; and a selves-developed spatio-temporal clustering algorithm encompassing expert [9] and empirical knowledge [10] for the specific area under investigation. Indexing terms: GPU parallel programming, Cuda C, heterogeneous processing, distinct seismic regions, parallel clustering algorithms, spatio-temporal clustering References [1] Kirk, D. and Hwu, W.: 'Programming massively parallel processors - A hands-on approach', 2nd Edition, Morgan Kaufman Publisher, 2013 [2] Konstantaras, A., Valianatos, F., Varley, M.R. and Makris, J.P.: 'Soft-Computing Modelling of Seismicity in the Southern Hellenic Arc', Geoscience and Remote Sensing Letters, vol. 5 (3), pp. 323-327, 2008 [3] Papadakis, S. and
A Fast Cluster Motif Finding Algorithm for ChIP-Seq Data Sets.
Zhang, Yipu; Wang, Ping
2015-01-01
New high-throughput technique ChIP-seq, coupling chromatin immunoprecipitation experiment with high-throughput sequencing technologies, has extended the identification of binding locations of a transcription factor to the genome-wide regions. However, the most existing motif discovery algorithms are time-consuming and limited to identify binding motifs in ChIP-seq data which normally has the significant characteristics of large scale data. In order to improve the efficiency, we propose a fast cluster motif finding algorithm, named as FCmotif, to identify the (l, d) motifs in large scale ChIP-seq data set. It is inspired by the emerging substrings mining strategy to find the enriched substrings and then searching the neighborhood instances to construct PWM and cluster motifs in different length. FCmotif is not following the OOPS model constraint and can find long motifs. The effectiveness of proposed algorithm has been proved by experiments on the ChIP-seq data sets from mouse ES cells. The whole detection of the real binding motifs and processing of the full size data of several megabytes finished in a few minutes. The experimental results show that FCmotif has advantageous to deal with the (l, d) motif finding in the ChIP-seq data; meanwhile it also demonstrates better performance than other current widely-used algorithms such as MEME, Weeder, ChIPMunk, and DREME. PMID:26236718
NASA Astrophysics Data System (ADS)
Plaza, Antonio; Chang, Chein-I.; Plaza, Javier; Valencia, David
2006-05-01
The incorporation of hyperspectral sensors aboard airborne/satellite platforms is currently producing a nearly continual stream of multidimensional image data, and this high data volume has soon introduced new processing challenges. The price paid for the wealth spatial and spectral information available from hyperspectral sensors is the enormous amounts of data that they generate. Several applications exist, however, where having the desired information calculated quickly enough for practical use is highly desirable. High computing performance of algorithm analysis is particularly important in homeland defense and security applications, in which swift decisions often involve detection of (sub-pixel) military targets (including hostile weaponry, camouflage, concealment, and decoys) or chemical/biological agents. In order to speed-up computational performance of hyperspectral imaging algorithms, this paper develops several fast parallel data processing techniques. Techniques include four classes of algorithms: (1) unsupervised classification, (2) spectral unmixing, and (3) automatic target recognition, and (4) onboard data compression. A massively parallel Beowulf cluster (Thunderhead) at NASA's Goddard Space Flight Center in Maryland is used to measure parallel performance of the proposed algorithms. In order to explore the viability of developing onboard, real-time hyperspectral data compression algorithms, a Xilinx Virtex-II field programmable gate array (FPGA) is also used in experiments. Our quantitative and comparative assessment of parallel techniques and strategies may help image analysts in selection of parallel hyperspectral algorithms for specific applications.
Dynamic scheduling study on engineering machinery of clusters using multi-agent system ant algorithm
NASA Astrophysics Data System (ADS)
Gao, Qiang; Wang, Hongli; Guo, Long; Xiang, Jianping
2005-12-01
In the process of road surface construction, dispatchers' scheduling was experiential and blindfold in some degree and static scheduling restricted the continuity of the construction. Serious problems such as labor holdup, material awaiting and scheduling delay could occur when the old scheduling technique was used. This paper presents ant colony algorithm based on MAS that has the abilities of intelligentized modeling and dynamic scheduling. MAS model deals with single agent's communication and corresponding in engineering machinery of clusters firstly, next we apply ant colony algorithm to solve dynamic scheduling in the plant. Ant colony algorithm can optimize the match of agents and make the system dynamic balance. The effectiveness of the proposed method is demonstrated with MATLAB simulations.
Karimi, Abbas; Afsharfarnia, Abbas; Zarafshan, Faraneh; Al-Haddad, S. A. R.
2014-01-01
The stability of clusters is a serious issue in mobile ad hoc networks. Low stability of clusters may lead to rapid failure of clusters, high energy consumption for reclustering, and decrease in the overall network stability in mobile ad hoc network. In order to improve the stability of clusters, weight-based clustering algorithms are utilized. However, these algorithms only use limited features of the nodes. Thus, they decrease the weight accuracy in determining node's competency and lead to incorrect selection of cluster heads. A new weight-based algorithm presented in this paper not only determines node's weight using its own features, but also considers the direct effect of feature of adjacent nodes. It determines the weight of virtual links between nodes and the effect of the weights on determining node's final weight. By using this strategy, the highest weight is assigned to the best choices for being the cluster heads and the accuracy of nodes selection increases. The performance of new algorithm is analyzed by using computer simulation. The results show that produced clusters have longer lifetime and higher stability. Mathematical simulation shows that this algorithm has high availability in case of failure. PMID:25114965
Nonclercq, Antoine; Foulon, Martine; Verheulpen, Denis; De Cock, Cathy; Buzatu, Marga; Mathys, Pierre; Van Bogaert, Patrick
2012-09-30
Visual quantification of interictal epileptiform activity is time consuming and requires a high level of expert's vigilance. This is especially true for overnight recordings of patient suffering from epileptic encephalopathy with continuous spike and waves during slow-wave sleep (CSWS) as they can show tens of thousands of spikes. Automatic spike detection would be attractive for this condition, but available algorithms have methodological limitations related to variation in spike morphology both between patients and within a single recording. We propose a fully automated method of interictal spike detection that adapts to interpatient and intrapatient variation in spike morphology. The algorithm works in five steps. (1) Spikes are detected using parameters suitable for highly sensitive detection. (2) Detected spikes are separated into clusters. (3) The number of clusters is automatically adjusted. (4) Centroids are used as templates for more specific spike detections, therefore adapting to the types of spike morphology. (5) Detected spikes are summed. The algorithm was evaluated on EEG samples from 20 children suffering from epilepsy with CSWS. When compared to the manual scoring of 3 EEG experts (3 records), the algorithm demonstrated similar performance since sensitivity and selectivity were 0.3% higher and 0.4% lower, respectively. The algorithm showed little difference compared to the manual scoring of another expert for the spike-and-wave index evaluation in 17 additional records (the mean absolute difference was 3.8%). This algorithm is therefore efficient for the count of interictal spikes and determination of a spike-and-wave index. PMID:22850558
A contour-line color layer separation algorithm based on fuzzy clustering and region growing
NASA Astrophysics Data System (ADS)
Liu, Tiange; Miao, Qiguang; Xu, Pengfei; Tong, Yubing; Song, Jianfeng; Xia, Ge; Yang, Yun; Zhai, Xiaojie
2016-03-01
The color layers of contour-lines separated from scanned topographic map are the basis of contour-line extraction, but it is difficult to separate them well due to the color aliasing and mixed color problems. This paper will focus us on contour-line color layer separation and presents a novel approach for it based on fuzzy clustering and Single-prototype Region Growing for Contour-line Layer (SRGCL). The purpose of this paper is to provide a solution for processing scanned topographic maps on which contour-lines are abundant and densely distributed, for example, in the condition similar to hilly areas and mountainous regions, the contour-lines always occupy the largest proportion in linear features and the contour-line separation is the most difficult task. The proposed approach includes steps as follows. First step, line features are extracted from the map to reduce the interference from area features in fuzzy clustering. Second step, fuzzy clustering algorithm is employed to obtain membership matrix of pixels in the line map. Third step, based on the membership matrix, we obtain the most-similar prototype and the second-similar prototype of each pixel as the indicators of the pixel in SRGCL. The spatial relationship and the fuzzy similarity of color features are used in SRGCL to overcome the inaccurate classification of ambiguous pixels. The procedure focusing on single contour-line layer will improve the accuracy of contour-line segmentation result of SRGCL relative to general segmentation methods. We verified the algorithm on several USGS historical maps, the experimental results show that our algorithm produces contour-line color layers with good continuity and few noises, which verifies the improvement in contour-line color layer separation of our algorithm relative to two general segmentation methods.
Effects of algorithm for diagnosis of active labour: cluster randomised trial
Hundley, Vanora; Dowding, Dawn; Bland, J Martin; McNamee, Paul; Greer, Ian; Styles, Maggie; Barnett, Carol A; Scotland, Graham; Niven, Catherine
2008-01-01
Objective To compare the effectiveness of an algorithm for diagnosis of active labour in primiparous women with standard care in terms of maternal and neonatal outcomes. Design Cluster randomised trial. Setting Maternity units in Scotland with at least 800 annual births. Participants 4503 women giving birth for the first time, in 14 maternity units. Seven experimental clusters collected data from a baseline sample of 1029 women and a post-implementation sample of 896 women. The seven control clusters had a baseline sample of 1291 women and a post-implementation sample of 1287 women. Intervention Use of an algorithm by midwives to assist their diagnosis of active labour, compared with standard care. Main outcomes Primary outcome: use of oxytocin for augmentation of labour. Secondary outcomes: medical interventions in labour, admission management, and birth outcome. Results No significant difference was found between groups in percentage use of oxytocin for augmentation of labour (experimental minus control, difference=0.3, 95% confidence interval −9.2 to 9.8; P=0.9) or in the use of medical interventions in labour. Women in the algorithm group were more likely to be discharged from the labour suite after their first labour assessment (difference=−19.2, −29.9 to −8.6; P=0.002) and to have more pre-labour admissions (0.29, 0.04 to 0.55; P=0.03). Conclusions Use of an algorithm to assist midwives with the diagnosis of active labour in primiparous women did not result in a reduction in oxytocin use or in medical intervention in spontaneous labour. Significantly more women in the experimental group were discharged home after their first labour ward assessment. Trial registration Current Controlled Trials ISRCTN00522952. PMID:19064606
Fuzzy Document Clustering Approach using WordNet Lexical Categories
NASA Astrophysics Data System (ADS)
Gharib, Tarek F.; Fouad, Mohammed M.; Aref, Mostafa M.
Text mining refers generally to the process of extracting interesting information and knowledge from unstructured text. This area is growing rapidly mainly because of the strong need for analysing the huge and large amount of textual data that reside on internal file systems and the Web. Text document clustering provides an effective navigation mechanism to organize this large amount of data by grouping their documents into a small number of meaningful classes. In this paper we proposed a fuzzy text document clustering approach using WordNet lexical categories and Fuzzy c-Means algorithm. Some experiments are performed to compare efficiency of the proposed approach with the recently reported approaches. Experimental results show that Fuzzy clustering leads to great performance results. Fuzzy c-means algorithm overcomes other classical clustering algorithms like k-means and bisecting k-means in both clustering quality and running time efficiency.
Crowded Cluster Cores: An Algorithm for Deblending in Dark Energy Survey Images
NASA Astrophysics Data System (ADS)
Zhang, Yuanyuan; McKay, Timothy A.; Bertin, Emmanuel; Jeltema, Tesla; Miller, Christopher J.; Rykoff, Eli; Song, Jeeseon
2015-11-01
Deep optical images are often crowded with overlapping objects. This is especially true in the cores of galaxy clusters, where images of dozens of galaxies may lie atop one another. Accurate measurements of cluster properties require deblending algorithms designed to automatically extract a list of individual objects and decide what fraction of the light in each pixel comes from each object. In this article, we introduce a new software tool called the Gradient And Interpolation based (GAIN) deblender. GAIN is used as a secondary deblender to improve the separation of overlapping objects in galaxy cluster cores in Dark Energy Survey images. It uses image intensity gradients and an interpolation technique originally developed to correct flawed digital images. This paper is dedicated to describing the algorithm of the GAIN deblender and its applications, but we additionally include modest tests of the software based on real Dark Energy Survey co-add images. GAIN helps to extract an unbiased photometry measurement for blended sources and improve detection completeness, while introducing few spurious detections. When applied to processed Dark Energy Survey data, GAIN serves as a useful quick fix when a high level of deblending is desired.
Crowded Cluster Cores. Algorithms for Deblending in Dark Energy Survey Images
Zhang, Yuanyuan; McKay, Timothy A.; Bertin, Emmanuel; Jeltema, Tesla; Miller, Christopher J.; Rykoff, Eli; Song, Jeeseon
2015-10-26
Deep optical images are often crowded with overlapping objects. We found that this is especially true in the cores of galaxy clusters, where images of dozens of galaxies may lie atop one another. Accurate measurements of cluster properties require deblending algorithms designed to automatically extract a list of individual objects and decide what fraction of the light in each pixel comes from each object. In this article, we introduce a new software tool called the Gradient And Interpolation based (GAIN) deblender. GAIN is used as a secondary deblender to improve the separation of overlapping objects in galaxy cluster cores in Dark Energy Survey images. It uses image intensity gradients and an interpolation technique originally developed to correct flawed digital images. Our paper is dedicated to describing the algorithm of the GAIN deblender and its applications, but we additionally include modest tests of the software based on real Dark Energy Survey co-add images. GAIN helps to extract an unbiased photometry measurement for blended sources and improve detection completeness, while introducing few spurious detections. When applied to processed Dark Energy Survey data, GAIN serves as a useful quick fix when a high level of deblending is desired.
Detection and clustering of features in aerial images by neuron network-based algorithm
NASA Astrophysics Data System (ADS)
Vozenilek, Vit
2015-12-01
The paper presents the algorithm for detection and clustering of feature in aerial photographs based on artificial neural networks. The presented approach is not focused on the detection of specific topographic features, but on the combination of general features analysis and their use for clustering and backward projection of clusters to aerial image. The basis of the algorithm is a calculation of the total error of the network and a change of weights of the network to minimize the error. A classic bipolar sigmoid was used for the activation function of the neurons and the basic method of backpropagation was used for learning. To verify that a set of features is able to represent the image content from the user's perspective, the web application was compiled (ASP.NET on the Microsoft .NET platform). The main achievements include the knowledge that man-made objects in aerial images can be successfully identified by detection of shapes and anomalies. It was also found that the appropriate combination of comprehensive features that describe the colors and selected shapes of individual areas can be useful for image analysis.
Clustering of tethered satellite system simulation data by an adaptive neuro-fuzzy algorithm
NASA Technical Reports Server (NTRS)
Mitra, Sunanda; Pemmaraju, Surya
1992-01-01
Recent developments in neuro-fuzzy systems indicate that the concepts of adaptive pattern recognition, when used to identify appropriate control actions corresponding to clusters of patterns representing system states in dynamic nonlinear control systems, may result in innovative designs. A modular, unsupervised neural network architecture, in which fuzzy learning rules have been embedded is used for on-line identification of similar states. The architecture and control rules involved in Adaptive Fuzzy Leader Clustering (AFLC) allow this system to be incorporated in control systems for identification of system states corresponding to specific control actions. We have used this algorithm to cluster the simulation data of Tethered Satellite System (TSS) to estimate the range of delta voltages necessary to maintain the desired length rate of the tether. The AFLC algorithm is capable of on-line estimation of the appropriate control voltages from the corresponding length error and length rate error without a priori knowledge of their membership functions and familarity with the behavior of the Tethered Satellite System.
Crowded Cluster Cores. Algorithms for Deblending in Dark Energy Survey Images
Zhang, Yuanyuan; McKay, Timothy A.; Bertin, Emmanuel; Jeltema, Tesla; Miller, Christopher J.; Rykoff, Eli; Song, Jeeseon
2015-10-26
Deep optical images are often crowded with overlapping objects. We found that this is especially true in the cores of galaxy clusters, where images of dozens of galaxies may lie atop one another. Accurate measurements of cluster properties require deblending algorithms designed to automatically extract a list of individual objects and decide what fraction of the light in each pixel comes from each object. In this article, we introduce a new software tool called the Gradient And Interpolation based (GAIN) deblender. GAIN is used as a secondary deblender to improve the separation of overlapping objects in galaxy cluster cores inmore » Dark Energy Survey images. It uses image intensity gradients and an interpolation technique originally developed to correct flawed digital images. Our paper is dedicated to describing the algorithm of the GAIN deblender and its applications, but we additionally include modest tests of the software based on real Dark Energy Survey co-add images. GAIN helps to extract an unbiased photometry measurement for blended sources and improve detection completeness, while introducing few spurious detections. When applied to processed Dark Energy Survey data, GAIN serves as a useful quick fix when a high level of deblending is desired.« less
QoE collaborative evaluation method based on fuzzy clustering heuristic algorithm.
Bao, Ying; Lei, Weimin; Zhang, Wei; Zhan, Yuzhuo
2016-01-01
At present, to realize or improve the quality of experience (QoE) is a major goal for network media transmission service, and QoE evaluation is the basis for adjusting the transmission control mechanism. Therefore, a kind of QoE collaborative evaluation method based on fuzzy clustering heuristic algorithm is proposed in this paper, which is concentrated on service score calculation at the server side. The server side collects network transmission quality of service (QoS) parameter, node location data, and user expectation value from client feedback information. Then it manages the historical data in database through the "big data" process mode, and predicts user score according to heuristic rules. On this basis, it completes fuzzy clustering analysis, and generates service QoE score and management message, which will be finally fed back to clients. Besides, this paper mainly discussed service evaluation generative rules, heuristic evaluation rules and fuzzy clustering analysis methods, and presents service-based QoE evaluation processes. The simulation experiments have verified the effectiveness of QoE collaborative evaluation method based on fuzzy clustering heuristic rules. PMID:27398281
Applications of colored petri net and genetic algorithms to cluster tool scheduling
NASA Astrophysics Data System (ADS)
Liu, Tung-Kuan; Kuo, Chih-Jen; Hsiao, Yung-Chin; Tsai, Jinn-Tsong; Chou, Jyh-Horng
2005-12-01
In this paper, we propose a method, which uses Coloured Petri Net (CPN) and genetic algorithm (GA) to obtain an optimal deadlock-free schedule and to solve re-entrant problem for the flexible process of the cluster tool. The process of the cluster tool for producing a wafer usually can be classified into three types: 1) sequential process, 2) parallel process, and 3) sequential parallel process. But these processes are not economical enough to produce a variety of wafers in small volume. Therefore, this paper will propose the flexible process where the operations of fabricating wafers are randomly arranged to achieve the best utilization of the cluster tool. However, the flexible process may have deadlock and re-entrant problems which can be detected by CPN. On the other hand, GAs have been applied to find the optimal schedule for many types of manufacturing processes. Therefore, we successfully integrate CPN and GAs to obtain an optimal schedule with the deadlock and re-entrant problems for the flexible process of the cluster tool.
Applying Social Networking and Clustering Algorithms to Galaxy Groups in ALFALFA
NASA Astrophysics Data System (ADS)
Bramson, Ali; Wilcots, E. M.
2012-01-01
Because most galaxies live in groups, and the environment in which it resides affects the evolution of a galaxy, it is crucial to develop tools to understand how galaxies are distributed within groups. At the same time we must understand how groups are distributed and connected in the larger scale structure of the Universe. I have applied a variety of networking techniques to assess the substructure of galaxy groups, including distance matrices, agglomerative hierarchical clustering algorithms and dendrograms. We use distance matrices to locate groupings spatially in 3-D. Dendrograms created from agglomerative hierarchical clustering results allow us to quantify connections between galaxies and galaxy groups. The shape of the dendrogram reveals if the group is spatially homogenous or clumpy. These techniques are giving us new insight into the structure and dynamical state of galaxy groups and large scale structure. We specifically apply these techniques to the ALFALFA survey of the Coma-Abell 1367 supercluster and its resident galaxy groups.
Burst detection in district metering areas using a data driven clustering algorithm.
Wu, Yipeng; Liu, Shuming; Wu, Xue; Liu, Youfei; Guan, Yisheng
2016-09-01
This paper describes a novel methodology for burst detection in a water distribution system. The proposed method has two stages. In the first stage, a clustering algorithm was employed for outlier detection, while the second stage identified the presence of bursts. An important feature of this method is that data analysis is carried out dependent on multiple flow meters whose measurements vary simultaneously in a district metering area (DMA). Moreover, the clustering-based method can automatically cope with non-stationary conditions in historical data; namely, the method has no prior data selection process. An example application of this method has been implemented to confirm that relatively large bursts (simulated by flushing) with short duration can be detected effectively. Noticeably, the method has a low false positive rate compared with previous studies and appearance of detected abnormal water usage consists with weather changes, showing great promise in real application to multi-inlet and multi-outlet DMAs. PMID:27176651
Development of a Genetic Algorithm to Automate Clustering of a Dependency Structure Matrix
NASA Technical Reports Server (NTRS)
Rogers, James L.; Korte, John J.; Bilardo, Vincent J.
2006-01-01
Much technology assessment and organization design data exists in Microsoft Excel spreadsheets. Tools are needed to put this data into a form that can be used by design managers to make design decisions. One need is to cluster data that is highly coupled. Tools such as the Dependency Structure Matrix (DSM) and a Genetic Algorithm (GA) can be of great benefit. However, no tool currently combines the DSM and a GA to solve the clustering problem. This paper describes a new software tool that interfaces a GA written as an Excel macro with a DSM in spreadsheet format. The results of several test cases are included to demonstrate how well this new tool works.
CLUSTAG & WCLUSTAG: Hierarchical Clustering Algorithms for Efficient Tag-SNP Selection
NASA Astrophysics Data System (ADS)
Ao, Sio-Iong
More than 6 million single nucleotide polymorphisms (SNPs) in the human genome have been genotyped by the HapMap project. Although only a pro portion of these SNPs are functional, all can be considered as candidate markers for indirect association studies to detect disease-related genetic variants. The complete screening of a gene or a chromosomal region is nevertheless an expensive undertak ing for association studies. A key strategy for improving the efficiency of association studies is to select a subset of informative SNPs, called tag SNPs, for analysis. In the chapter, hierarchical clustering algorithms have been proposed for efficient tag SNP selection.
Reduced-cost sparsity-exploiting algorithm for solving coupled-cluster equations.
Brabec, Jiri; Yang, Chao; Epifanovsky, Evgeny; Krylov, Anna I; Ng, Esmond
2016-05-01
We present an algorithm for reducing the computational work involved in coupled-cluster (CC) calculations by sparsifying the amplitude correction within a CC amplitude update procedure. We provide a theoretical justification for this approach, which is based on the convergence theory of inexact Newton iterations. We demonstrate by numerical examples that, in the simplest case of the CCD equations, we can sparsify the amplitude correction by setting, on average, roughly 90% nonzero elements to zeros without a major effect on the convergence of the inexact Newton iterations. PMID:26804120
Multispectral image classification of MRI data using an empirically-derived clustering algorithm
Horn, K.M.; Osbourn, G.C.; Bouchard, A.M.; Sanders, J.A. |
1998-08-01
Multispectral image analysis of magnetic resonance imaging (MRI) data has been performed using an empirically-derived clustering algorithm. This algorithm groups image pixels into distinct classes which exhibit similar response in the T{sub 2} 1st and 2nd-echo, and T{sub 1} (with ad without gadolinium) MRI images. The grouping is performed in an n-dimensional mathematical space; the n-dimensional volumes bounding each class define each specific tissue type. The classification results are rendered again in real-space by colored-coding each grouped class of pixels (associated with differing tissue types). This classification method is especially well suited for class volumes with complex boundary shapes, and is also expected to robustly detect abnormal tissue classes. The classification process is demonstrated using a three dimensional data set of MRI scans of a human brain tumor.
Mustapha, Ibrahim; Mohd Ali, Borhanuddin; Rasid, Mohd Fadlee A; Sali, Aduwati; Mohamad, Hafizal
2015-01-01
It is well-known that clustering partitions network into logical groups of nodes in order to achieve energy efficiency and to enhance dynamic channel access in cognitive radio through cooperative sensing. While the topic of energy efficiency has been well investigated in conventional wireless sensor networks, the latter has not been extensively explored. In this paper, we propose a reinforcement learning-based spectrum-aware clustering algorithm that allows a member node to learn the energy and cooperative sensing costs for neighboring clusters to achieve an optimal solution. Each member node selects an optimal cluster that satisfies pairwise constraints, minimizes network energy consumption and enhances channel sensing performance through an exploration technique. We first model the network energy consumption and then determine the optimal number of clusters for the network. The problem of selecting an optimal cluster is formulated as a Markov Decision Process (MDP) in the algorithm and the obtained simulation results show convergence, learning and adaptability of the algorithm to dynamic environment towards achieving an optimal solution. Performance comparisons of our algorithm with the Groupwise Spectrum Aware (GWSA)-based algorithm in terms of Sum of Square Error (SSE), complexity, network energy consumption and probability of detection indicate improved performance from the proposed approach. The results further reveal that an energy savings of 9% and a significant Primary User (PU) detection improvement can be achieved with the proposed approach. PMID:26287191
Mustapha, Ibrahim; Ali, Borhanuddin Mohd; Rasid, Mohd Fadlee A.; Sali, Aduwati; Mohamad, Hafizal
2015-01-01
It is well-known that clustering partitions network into logical groups of nodes in order to achieve energy efficiency and to enhance dynamic channel access in cognitive radio through cooperative sensing. While the topic of energy efficiency has been well investigated in conventional wireless sensor networks, the latter has not been extensively explored. In this paper, we propose a reinforcement learning-based spectrum-aware clustering algorithm that allows a member node to learn the energy and cooperative sensing costs for neighboring clusters to achieve an optimal solution. Each member node selects an optimal cluster that satisfies pairwise constraints, minimizes network energy consumption and enhances channel sensing performance through an exploration technique. We first model the network energy consumption and then determine the optimal number of clusters for the network. The problem of selecting an optimal cluster is formulated as a Markov Decision Process (MDP) in the algorithm and the obtained simulation results show convergence, learning and adaptability of the algorithm to dynamic environment towards achieving an optimal solution. Performance comparisons of our algorithm with the Groupwise Spectrum Aware (GWSA)-based algorithm in terms of Sum of Square Error (SSE), complexity, network energy consumption and probability of detection indicate improved performance from the proposed approach. The results further reveal that an energy savings of 9% and a significant Primary User (PU) detection improvement can be achieved with the proposed approach. PMID:26287191
Meanie3D - a mean-shift based, multivariate, multi-scale clustering and tracking algorithm
NASA Astrophysics Data System (ADS)
Simon, Jürgen-Lorenz; Malte, Diederich; Silke, Troemel
2014-05-01
Project OASE is the one of 5 work groups at the HErZ (Hans Ertel Centre for Weather Research), an ongoing effort by the German weather service (DWD) to further research at Universities concerning weather prediction. The goal of project OASE is to gain an object-based perspective on convective events by identifying them early in the onset of convective initiation and follow then through the entire lifecycle. The ability to follow objects in this fashion requires new ways of object definition and tracking, which incorporate all the available data sets of interest, such as Satellite imagery, weather Radar or lightning counts. The Meanie3D algorithm provides the necessary tool for this purpose. Core features of this new approach to clustering (object identification) and tracking are the ability to identify objects using the mean-shift algorithm applied to a multitude of variables (multivariate), as well as the ability to detect objects on various scales (multi-scale) using elements of Scale-Space theory. The algorithm works in 2D as well as 3D without modifications. It is an extension of a method well known from the field of computer vision and image processing, which has been tailored to serve the needs of the meteorological community. In spite of the special application to be demonstrated here (like convective initiation), the algorithm is easily tailored to provide clustering and tracking for a wide class of data sets and problems. In this talk, the demonstration is carried out on two of the OASE group's own composite sets. One is a 2D nationwide composite of Germany including C-Band Radar (2D) and Satellite information, the other a 3D local composite of the Bonn/Jülich area containing a high-resolution 3D X-Band Radar composite.
Gao, Ying; Wkram, Chris Hadri; Duan, Jiajie; Chou, Jarong
2015-01-01
In order to prolong the network lifetime, energy-efficient protocols adapted to the features of wireless sensor networks should be used. This paper explores in depth the nature of heterogeneous wireless sensor networks, and finally proposes an algorithm to address the problem of finding an effective pathway for heterogeneous clustering energy. The proposed algorithm implements cluster head selection according to the degree of energy attenuation during the network’s running and the degree of candidate nodes’ effective coverage on the whole network, so as to obtain an even energy consumption over the whole network for the situation with high degree of coverage. Simulation results show that the proposed clustering protocol has better adaptability to heterogeneous environments than existing clustering algorithms in prolonging the network lifetime. PMID:26690440
Farah, Ihsen; Nguyen, Thi Nguyet Que; Groh, Audrey; Guenot, Dominique; Jeannesson, Pierre; Gobinet, Cyril
2016-05-23
The coupling between Fourier-transform infrared (FTIR) imaging and unsupervised classification is effective in revealing the different structures of human tissues based on their specific biomolecular IR signatures; thus the spectral histology of the studied samples is achieved. However, the most widely applied clustering methods in spectral histology are local search algorithms, which converge to a local optimum, depending on initialization. Multiple runs of the techniques estimate multiple different solutions. Here, we propose a memetic algorithm, based on a genetic algorithm and a k-means clustering refinement, to perform optimal clustering. In addition, this approach was applied to the acquired FTIR images of normal human colon tissues originating from five patients. The results show the efficiency of the proposed memetic algorithm to achieve the optimal spectral histology of these samples, contrary to k-means. PMID:27110605
Chen, Deng-kai; Gu, Rong; Gu, Yu-feng; Yu, Sui-huai
2016-01-01
Consumers' Kansei needs reflect their perception about a product and always consist of a large number of adjectives. Reducing the dimension complexity of these needs to extract primary words not only enables the target product to be explicitly positioned, but also provides a convenient design basis for designers engaging in design work. Accordingly, this study employs a numerical design structure matrix (NDSM) by parameterizing a conventional DSM and integrating genetic algorithms to find optimum Kansei clusters. A four-point scale method is applied to assign link weights of every two Kansei adjectives as values of cells when constructing an NDSM. Genetic algorithms are used to cluster the Kansei NDSM and find optimum clusters. Furthermore, the process of the proposed method is presented. The details of the proposed approach are illustrated using an example of electronic scooter for Kansei needs clustering. The case study reveals that the proposed method is promising for clustering Kansei needs adjectives in product emotional design.
A Novel Method to Predict Genomic Islands Based on Mean Shift Clustering Algorithm.
de Brito, Daniel M; Maracaja-Coutinho, Vinicius; de Farias, Savio T; Batista, Leonardo V; do Rêgo, Thaís G
2016-01-01
Genomic Islands (GIs) are regions of bacterial genomes that are acquired from other organisms by the phenomenon of horizontal transfer. These regions are often responsible for many important acquired adaptations of the bacteria, with great impact on their evolution and behavior. Nevertheless, these adaptations are usually associated with pathogenicity, antibiotic resistance, degradation and metabolism. Identification of such regions is of medical and industrial interest. For this reason, different approaches for genomic islands prediction have been proposed. However, none of them are capable of predicting precisely the complete repertory of GIs in a genome. The difficulties arise due to the changes in performance of different algorithms in the face of the variety of nucleotide distribution in different species. In this paper, we present a novel method to predict GIs that is built upon mean shift clustering algorithm. It does not require any information regarding the number of clusters, and the bandwidth parameter is automatically calculated based on a heuristic approach. The method was implemented in a new user-friendly tool named MSGIP--Mean Shift Genomic Island Predictor. Genomes of bacteria with GIs discussed in other papers were used to evaluate the proposed method. The application of this tool revealed the same GIs predicted by other methods and also different novel unpredicted islands. A detailed investigation of the different features related to typical GI elements inserted in these new regions confirmed its effectiveness. Stand-alone and user-friendly versions for this new methodology are available at http://msgip.integrativebioinformatics.me. PMID:26731657
A Novel Method to Predict Genomic Islands Based on Mean Shift Clustering Algorithm
de Brito, Daniel M.; Maracaja-Coutinho, Vinicius; de Farias, Savio T.; Batista, Leonardo V.; do Rêgo, Thaís G.
2016-01-01
Genomic Islands (GIs) are regions of bacterial genomes that are acquired from other organisms by the phenomenon of horizontal transfer. These regions are often responsible for many important acquired adaptations of the bacteria, with great impact on their evolution and behavior. Nevertheless, these adaptations are usually associated with pathogenicity, antibiotic resistance, degradation and metabolism. Identification of such regions is of medical and industrial interest. For this reason, different approaches for genomic islands prediction have been proposed. However, none of them are capable of predicting precisely the complete repertory of GIs in a genome. The difficulties arise due to the changes in performance of different algorithms in the face of the variety of nucleotide distribution in different species. In this paper, we present a novel method to predict GIs that is built upon mean shift clustering algorithm. It does not require any information regarding the number of clusters, and the bandwidth parameter is automatically calculated based on a heuristic approach. The method was implemented in a new user-friendly tool named MSGIP—Mean Shift Genomic Island Predictor. Genomes of bacteria with GIs discussed in other papers were used to evaluate the proposed method. The application of this tool revealed the same GIs predicted by other methods and also different novel unpredicted islands. A detailed investigation of the different features related to typical GI elements inserted in these new regions confirmed its effectiveness. Stand-alone and user-friendly versions for this new methodology are available at http://msgip.integrativebioinformatics.me. PMID:26731657
KANTS: a stigmergic ant algorithm for cluster analysis and swarm art.
Fernandes, Carlos M; Mora, Antonio M; Merelo, Juan J; Rosa, Agostinho C
2014-06-01
KANTS is a swarm intelligence clustering algorithm inspired by the behavior of social insects. It uses stigmergy as a strategy for clustering large datasets and, as a result, displays a typical behavior of complex systems: self-organization and global patterns emerging from the local interaction of simple units. This paper introduces a simplified version of KANTS and describes recent experiments with the algorithm in the context of a contemporary artistic and scientific trend called swarm art, a type of generative art in which swarm intelligence systems are used to create artwork or ornamental objects. KANTS is used here for generating color drawings from the input data that represent real-world phenomena, such as electroencephalogram sleep data. However, the main proposal of this paper is an art project based on well-known abstract paintings, from which the chromatic values are extracted and used as input. Colors and shapes are therefore reorganized by KANTS, which generates its own interpretation of the original artworks. The project won the 2012 Evolutionary Art, Design, and Creativity Competition. PMID:23912505
A contiguity-enhanced k-means clustering algorithm for unsupervised multispectral image segmentation
Theiler, J.; Gisler, G.
1997-07-01
The recent and continuing construction of multi and hyper spectral imagers will provide detailed data cubes with information in both the spatial and spectral domain. This data shows great promise for remote sensing applications ranging from environmental and agricultural to national security interests. The reduction of this voluminous data to useful intermediate forms is necessary both for downlinking all those bits and for interpreting them. Smart onboard hardware is required, as well as sophisticated earth bound processing. A segmented image (in which the multispectral data in each pixel is classified into one of a small number of categories) is one kind of intermediate form which provides some measure of data compression. Traditional image segmentation algorithms treat pixels independently and cluster the pixels according only to their spectral information. This neglects the implicit spatial information that is available in the image. We will suggest a simple approach; a variant of the standard k-means algorithm which uses both spatial and spectral properties of the image. The segmented image has the property that pixels which are spatially contiguous are more likely to be in the same class than are random pairs of pixels. This property naturally comes at some cost in terms of the compactness of the clusters in the spectral domain, but we have found that the spatial contiguity and spectral compactness properties are nearly orthogonal, which means that we can make considerable improvements in the one with minimal loss in the other.
Analysis Clustering of Electricity Usage Profile Using K-Means Algorithm
NASA Astrophysics Data System (ADS)
Amri, Yasirli; Lailatul Fadhilah, Amanda; Fatmawati; Setiani, Novi; Rani, Septia
2016-01-01
Electricity is one of the most important needs for human life in many sectors. Demand for electricity will increase in line with population and economic growth. Adjustment of the amount of electricity production in specified time is important because the cost of storing electricity is expensive. For handling this problem, we need knowledge about the electricity usage pattern of clients. This pattern can be obtained by using clustering techniques. In this paper, clustering is used to obtain the similarity of electricity usage patterns in a specified time. We use K-Means algorithm to employ clustering on the dataset of electricity consumption from 370 clients that collected in a year. Result of this study, we obtained an interesting pattern that there is a big group of clients consume the lowest electric load in spring season, but in another group, the lowest electricity consumption occurred in winter season. From this result, electricity provider can make production planning in specified season based on pattern of electricity usage profile.
Clustering by Fuzzy Neural Gas and Evaluation of Fuzzy Clusters
Geweniger, Tina; Fischer, Lydia; Kaden, Marika; Lange, Mandy; Villmann, Thomas
2013-01-01
We consider some modifications of the neural gas algorithm. First, fuzzy assignments as known from fuzzy c-means and neighborhood cooperativeness as known from self-organizing maps and neural gas are combined to obtain a basic Fuzzy Neural Gas. Further, a kernel variant and a simulated annealing approach are derived. Finally, we introduce a fuzzy extension of the ConnIndex to obtain an evaluation measure for clusterings based on fuzzy vector quantization. PMID:24396342
Jiang, Peng; Xu, Yiming; Wu, Feng
2016-01-01
Existing move-restricted node self-deployment algorithms are based on a fixed node communication radius, evaluate the performance based on network coverage or the connectivity rate and do not consider the number of nodes near the sink node and the energy consumption distribution of the network topology, thereby degrading network reliability and the energy consumption balance. Therefore, we propose a distributed underwater node self-deployment algorithm. First, each node begins the uneven clustering based on the distance on the water surface. Each cluster head node selects its next-hop node to synchronously construct a connected path to the sink node. Second, the cluster head node adjusts its depth while maintaining the layout formed by the uneven clustering and then adjusts the positions of in-cluster nodes. The algorithm originally considers the network reliability and energy consumption balance during node deployment and considers the coverage redundancy rate of all positions that a node may reach during the node position adjustment. Simulation results show, compared to the connected dominating set (CDS) based depth computation algorithm, that the proposed algorithm can increase the number of the nodes near the sink node and improve network reliability while guaranteeing the network connectivity rate. Moreover, it can balance energy consumption during network operation, further improve network coverage rate and reduce energy consumption. PMID:26784193
Jiang, Peng; Xu, Yiming; Wu, Feng
2016-01-01
Existing move-restricted node self-deployment algorithms are based on a fixed node communication radius, evaluate the performance based on network coverage or the connectivity rate and do not consider the number of nodes near the sink node and the energy consumption distribution of the network topology, thereby degrading network reliability and the energy consumption balance. Therefore, we propose a distributed underwater node self-deployment algorithm. First, each node begins the uneven clustering based on the distance on the water surface. Each cluster head node selects its next-hop node to synchronously construct a connected path to the sink node. Second, the cluster head node adjusts its depth while maintaining the layout formed by the uneven clustering and then adjusts the positions of in-cluster nodes. The algorithm originally considers the network reliability and energy consumption balance during node deployment and considers the coverage redundancy rate of all positions that a node may reach during the node position adjustment. Simulation results show, compared to the connected dominating set (CDS) based depth computation algorithm, that the proposed algorithm can increase the number of the nodes near the sink node and improve network reliability while guaranteeing the network connectivity rate. Moreover, it can balance energy consumption during network operation, further improve network coverage rate and reduce energy consumption. PMID:26784193
Xue, Zhong; Shen, Dinggang; Li, Hai; Wong, Stephen
2010-01-01
The traditional fuzzy clustering algorithm and its extensions have been successfully applied in medical image segmentation. However, because of the variability of tissues and anatomical structures, the clustering results might be biased by the tissue population and intensity differences. For example, clustering-based algorithms tend to over-segment white matter tissues of MR brain images. To solve this problem, we introduce a tissue probability map constrained clustering algorithm and apply it to serial MR brain image segmentation, i.e., a series of 3-D MR brain images of the same subject at different time points. Using the new serial image segmentation algorithm in the framework of the CLASSIC framework, which iteratively segments the images and estimates the longitudinal deformations, we improved both accuracy and robustness for serial image computing, and at the mean time produced longitudinally consistent segmentation and stable measures. In the algorithm, the tissue probability maps consist of both the population-based and subject-specific segmentation priors. Experimental study using both simulated longitudinal MR brain data and the Alzheimer’s Disease Neuroimaging Initiative (ADNI) data confirmed that using both priors more accurate and robust segmentation results can be obtained. The proposed algorithm can be applied in longitudinal follow up studies of MR brain imaging with subtle morphological changes for neurological disorders. PMID:26566399
Falcon: neural fuzzy control and decision systems using FKP and PFKP clustering algorithms.
Tung, W L; Quek, C
2004-02-01
Neural fuzzy networks proposed in the literature can be broadly classified into two groups. The first group is essentially fuzzy systems with self-tuning capabilities and requires an initial rule base to be specified prior to training. The second group of neural fuzzy networks, on the other hand, is able to automatically formulate the fuzzy rules from the numerical training data. Examples are the Falcon-ART, and the POPFNN family of networks. A cluster analysis is first performed on the training data and the fuzzy rules are subsequently derived through the proper connections of these computed clusters. This correspondence proposes two new networks: Falcon-FKP and Falcon-PFKP. They are extensions of the Falcon-ART network, and aimed to overcome the shortcomings faced by the Falcon-ART network itself, i.e., poor classification ability when the classes of input data are very similar to each other, termination of training cycle depends heavily on a preset error parameter, the fuzzy rule base of the Falcon-ART network may not be consistent Nauck, there is no control over the number of fuzzy rules generated, and learning efficiency may deteriorate by using complementarily coded training data. These deficiencies are essentially inherent to the fuzzy ART, clustering technique employed by the Falcon-ART network. Hence, two clustering techniques--Fuzzy Kohonen Partitioning (FKP) and its pseudo variant PFKP, are synthesized with the basic Falcon structure to compute the fuzzy sets and to automatically derive the fuzzy rules from the training data. The resultant neural fuzzy networks are Falcon-FKP and Falcon-PFKP, respectively. These two proposed networks have a lean and efficient training algorithm and consistent fuzzy rule bases. Extensive simulations are conducted using the two networks and their performances are encouraging when benchmarked against other neural and neural fuzzy systems. PMID:15369109
`Inter-Arrival Time' Inspired Algorithm and its Application in Clustering and Molecular Phylogeny
NASA Astrophysics Data System (ADS)
Kolekar, Pandurang S.; Kale, Mohan M.; Kulkarni-Kale, Urmila
2010-10-01
Bioinformatics, being multidisciplinary field, involves applications of various methods from allied areas of Science for data mining using computational approaches. Clustering and molecular phylogeny is one of the key areas in Bioinformatics, which help in study of classification and evolution of organisms. Molecular phylogeny algorithms can be divided into distance based and character based methods. But most of these methods are dependent on pre-alignment of sequences and become computationally intensive with increase in size of data and hence demand alternative efficient approaches. `Inter arrival time distribution' (IATD) is a popular concept in the theory of stochastic system modeling but its potential in molecular data analysis has not been fully explored. The present study reports application of IATD in Bioinformatics for clustering and molecular phylogeny. The proposed method provides IATDs of nucleotides in genomic sequences. The distance function based on statistical parameters of IATDs is proposed and distance matrix thus obtained is used for the purpose of clustering and molecular phylogeny. The method is applied on a dataset of 3' non-coding region sequences (NCR) of Dengue virus type 3 (DENV-3), subtype III, reported in 2008. The phylogram thus obtained revealed the geographical distribution of DENV-3 isolates. Sri Lankan DENV-3 isolates were further observed to be clustered in two sub-clades corresponding to pre and post Dengue hemorrhagic fever emergence groups. These results are consistent with those reported earlier, which are obtained using pre-aligned sequence data as an input. These findings encourage applications of the IATD based method in molecular phylogenetic analysis in particular and data mining in general.
Ahmad, Amir
2016-01-01
The early diagnosis of breast cancer is an important step in a fight against the disease. Machine learning techniques have shown promise in improving our understanding of the disease. As medical datasets consist of data points which cannot be precisely assigned to a class, fuzzy methods have been useful for studying of these datasets. Sometimes breast cancer datasets are described by categorical features. Many fuzzy clustering algorithms have been developed for categorical datasets. However, in most of these methods Hamming distance is used to define the distance between the two categorical feature values. In this paper, we use a probabilistic distance measure for the distance computation among a pair of categorical feature values. Experiments demonstrate that the distance measure performs better than Hamming distance for Wisconsin breast cancer data. PMID:27022504
OpenACC programs of the Swendsen-Wang multi-cluster spin flip algorithm
NASA Astrophysics Data System (ADS)
Komura, Yukihiro
2015-12-01
We present sample OpenACC programs of the Swendsen-Wang multi-cluster spin flip algorithm. OpenACC is a directive-based programming model for accelerators without requiring modification to the underlying CPU code itself. In this paper, we deal with the classical spin models as with the sample CUDA programs (Komura and Okabe, 2014), that is, two-dimensional (2D) Ising model, three-dimensional (3D) Ising model, 2D Potts model, 3D Potts model, 2D XY model and 3D XY model. We explain the details of sample OpenACC programs and compare the performance of the present OpenACC implementations with that of the CUDA implementations for the 2D and 3D Ising models and the 2D and 3D XY models.
NASA Astrophysics Data System (ADS)
He, Tao; Sun, Yu-Jun; Xu, Ji-De; Wang, Xue-Jun; Hu, Chang-Ru
2014-01-01
Land use/cover (LUC) classification plays an important role in remote sensing and land change science. Because of the complexity of ground covers, LUC classification is still regarded as a difficult task. This study proposed a fusion algorithm, which uses support vector machines (SVM) and fuzzy k-means (FKM) clustering algorithms. The main scheme was divided into two steps. First, a clustering map was obtained from the original remote sensing image using FKM; simultaneously, a normalized difference vegetation index layer was extracted from the original image. Then, the classification map was generated by using an SVM classifier. Three different classification algorithms were compared, tested, and verified-parametric (maximum likelihood), nonparametric (SVM), and hybrid (unsupervised-supervised, fusion of SVM and FKM) classifiers, respectively. The proposed algorithm obtained the highest overall accuracy in our experiments.
Kanters, René P F; Donald, Kelling J
2014-12-01
A new flexible implementation of a genetic algorithm for locating unique low energy minima of isomers of clusters is described and tested. The strategy employed can be applied to molecular or atomic clusters and has a flexible input structure so that a system with several different elements can be built up from a set of individual atoms or from fragments made up of groups of atoms. This cluster program is tested on several systems, and the results are compared to computational and experimental data from previous studies. The quality of the algorithm for locating reliably the most competitive low energy structures of an assembly of atoms is examined for strongly bound Si-Li clusters, and ZnF2 clusters, and the more weakly interacting water trimers. The use of the nuclear repulsion energy as a duplication criterion, an increasing population size, and avoiding mutation steps without loss of efficacy are distinguishing features of the program. For the Si-Li clusters, a few new low energy minima are identified in the testing of the algorithm, and our results for the metal fluorides and water show very good agreement with the literature. PMID:26583254
NASA Astrophysics Data System (ADS)
Nguyen, Sy Dzung; Nguyen, Quoc Hung; Choi, Seung-Bok
2015-01-01
This paper presents a new algorithm for building an adaptive neuro-fuzzy inference system (ANFIS) from a training data set called B-ANFIS. In order to increase accuracy of the model, the following issues are executed. Firstly, a data merging rule is proposed to build and perform a data-clustering strategy. Subsequently, a combination of clustering processes in the input data space and in the joint input-output data space is presented. Crucial reason of this task is to overcome problems related to initialization and contradictory fuzzy rules, which usually happen when building ANFIS. The clustering process in the input data space is accomplished based on a proposed merging-possibilistic clustering (MPC) algorithm. The effectiveness of this process is evaluated to resume a clustering process in the joint input-output data space. The optimal parameters obtained after completion of the clustering process are used to build ANFIS. Simulations based on a numerical data, 'Daily Data of Stock A', and measured data sets of a smart damper are performed to analyze and estimate accuracy. In addition, convergence and robustness of the proposed algorithm are investigated based on both theoretical and testing approaches.
NASA Astrophysics Data System (ADS)
Bagheripour, Parisa; Asoodeh, Mojtaba
2013-12-01
Porosity, the void portion of reservoir rocks, determines the volume of hydrocarbon accumulation and has a great control on assessment and development of hydrocarbon reservoirs. Accurate determination of porosity from core analysis is highly cost, time, and labor intensive. Therefore, the mission of finding an accurate, fast and cheap way of determining porosity is unavoidable. On the other hand, conventional well log data, available in almost all wells contain invaluable implicit information about the porosity. Therefore, an intelligent system can explicate this information. Fuzzy logic is a powerful tool for handling geosciences problem which is associated with uncertainty. However, determination of the best fuzzy formulation is still an issue. This study purposes an improved strategy, called hybrid genetic algorithm-pattern search (GA-PS) technique, against the widely held subtractive clustering (SC) method for setting up fuzzy rules between core porosity and petrophysical logs. Hybrid GA-PS technique is capable of extracting optimal parameters for fuzzy clusters (membership functions) which consequently results in the best fuzzy formulation. Results indicate that GA-PS technique manipulates both mean and variance of Gaussian membership functions contrary to SC that only has a control on mean of Gaussian membership functions. A comparison between hybrid GA-PS technique and SC method confirmed the superiority of GA-PS technique in setting up fuzzy rules. The proposed strategy was successfully applied to one of the Iranian carbonate reservoir rocks.
Silva, Mateus X; Galvão, Breno R L; Belchior, Jadson C
2014-05-21
Genetic algorithm is employed to survey an empirical potential energy surface for small Na(x)K(y) clusters with x + y ≤ 15, providing initial conditions for electronic structure methods. The minima of such empirical potential are assessed and corrected using high level ab initio methods such as CCSD(T), CR-CCSD(T)-L and MP2, and benchmark results are obtained for specific cases. The results are the first calculations for such small alloy clusters and may serve as a reference for further studies. The validity and choice of a proper functional and basis set for DFT calculations are then explored using the benchmark data, where it was found that the usual DFT approach may fail to provide the correct qualitative result for specific systems. The best general agreement to the benchmark calculations is achieved with def2-TZVPP basis set with SVWN5 functional, although the LANL2DZ basis set (with effective core potential) and SVWN5 functional provided the most cost-effective results. PMID:24691391
NASA Astrophysics Data System (ADS)
Ashton, Douglas J.; Liu, Jiwen; Luijten, Erik; Wilding, Nigel B.
2010-11-01
Highly size-asymmetrical fluid mixtures arise in a variety of physical contexts, notably in suspensions of colloidal particles to which much smaller particles have been added in the form of polymers or nanoparticles. Conventional schemes for simulating models of such systems are hamstrung by the difficulty of relaxing the large species in the presence of the small one. Here we describe how the rejection-free geometrical cluster algorithm of Liu and Luijten [J. Liu and E. Luijten, Phys. Rev. Lett. 92, 035504 (2004)] can be embedded within a restricted Gibbs ensemble to facilitate efficient and accurate studies of fluid phase behavior of highly size-asymmetrical mixtures. After providing a detailed description of the algorithm, we summarize the bespoke analysis techniques of [Ashton et al., J. Chem. Phys. 132, 074111 (2010)] that permit accurate estimates of coexisting densities and critical-point parameters. We apply our methods to study the liquid-vapor phase diagram of a particular mixture of Lennard-Jones particles having a 10:1 size ratio. As the reservoir volume fraction of small particles is increased in the range of 0%-5%, the critical temperature decreases by approximately 50%, while the critical density drops by some 30%. These trends imply that in our system, adding small particles decreases the net attraction between large particles, a situation that contrasts with hard-sphere mixtures where an attractive depletion force occurs.
Study of cluster reconstruction and track fitting algorithms for CGEM-IT at BESIII
NASA Astrophysics Data System (ADS)
Guo, Yue; Wang, Liang-Liang; Ju, Xu-Dong; Wu, Ling-Hui; Xiu, Qing-Lei; Wang, Hai-Xia; Dong, Ming-Yi; Hu, Jing-Ran; Li, Wei-Dong; Li, Wei-Guo; Liu, Huai-Min; Qun, Ou-Yang; Shen, Xiao-Yan; Yuan, Ye; Zhang, Yao
2016-01-01
Considering the effects of aging on the existing Inner Drift Chamber (IDC) of BESIII, a GEM-based inner tracker, the Cylindrical-GEM Inner Tracker (CGEM-IT), is proposed to be designed and constructed as an upgrade candidate for the IDC. This paper introduces a full simulation package for the CGEM-IT with a simplified digitization model, and describes the development of software for cluster reconstruction and track fitting, using a track fitting algorithm based on the Kalman filter method. Preliminary results for the reconstruction algorithms which are obtained using a Monte Carlo sample of single muon events in the CGEM-IT, show that the CGEM-IT has comparable momentum resolution and transverse vertex resolution to the IDC, and a better z-direction resolution than the IDC. Supported by National Key Basic Research Program of China (2015CB856700), National Natural Science Foundation of China (11205184, 11205182) and Joint Funds of National Natural Science Foundation of China (U1232201)
NASA Astrophysics Data System (ADS)
Ma, Xiaoke; Gao, Lin
2011-05-01
The detection of community structure in complex networks is crucial since it provides insight into the substructures of the whole network. Spectral clustering algorithms that employ the eigenvalues and eigenvectors of an appropriate input matrix have been successfully applied in this field. Despite its empirical success in community detection, spectral clustering has been criticized for its inefficiency when dealing with large scale data sets. This is confirmed by the fact that the time complexity for spectral clustering is cubic with respect to the number of instances; even the memory efficient iterative eigensolvers, such as the power method, may converge slowly to the desired solutions. In efforts to improve the complexity and performance, many non-traditional spectral clustering algorithms have been proposed. Rather than using the real eigenvalues and eigenvectors as in the traditional methods, the non-traditional clusterings employ additional topological structure information characterized by the spectrum of a matrix associated with the network involved, such as the complex eigenvalues and their corresponding complex eigenvectors, eigenspaces and semi-supervised labels. However, to the best of our knowledge, no work has been devoted to comparison among these newly developed approaches. This is the main goal of this paper, through evaluating the effectiveness of these spectral algorithms against some benchmark networks. The experimental results demonstrate that the spectral algorithm based on the eigenspaces achieves the best performance but is the slowest algorithm; the semi-supervised spectral algorithm is the fastest but its performance largely depends on the prior knowledge; and the spectral method based on the complement network shows similar performance to the conventional ones.
Blood vessel extraction and optic disc removal using curvelet transform and kernel fuzzy c-means.
Kar, Sudeshna Sil; Maity, Santi P
2016-03-01
This paper proposes an automatic blood vessel extraction method on retinal images using matched filtering in an integrated system design platform that involves curvelet transform and kernel based fuzzy c-means. Since curvelet transform represents the lines, the edges and the curvatures very well and in compact form (by less number of coefficients) compared to other multi-resolution techniques, this paper uses curvelet transform for enhancement of the retinal vasculature. Matched filtering is then used to intensify the blood vessels' response which is further employed by kernel based fuzzy c-means algorithm that extracts the vessel silhouette from the background through non-linear mapping. For pathological images, in addition to matched filtering, Laplacian of Gaussian filter is also employed to distinguish the step and the ramp like signal from that of vessel structure. To test the efficacy of the proposed method, the algorithm has also been applied to images in presence of additive white Gaussian noise where the curvelet transform has been used for image denoising. Performance is evaluated on publicly available DRIVE, STARE and DIARETDB1 databases and is compared with the large number of existing blood vessel extraction methodologies. Simulation results demonstrate that the proposed method is very much efficient in detecting the long and the thick as well as the short and the thin vessels with an average accuracy of 96.16% for the DRIVE and 97.35% for the STARE database wherein the existing methods fail to extract the tiny and the thin vessels. PMID:26848729
NASA Technical Reports Server (NTRS)
Dasarathy, B. V.
1976-01-01
An algorithm is proposed for dimensionality reduction in the context of clustering techniques based on histogram analysis. The approach is based on an evaluation of the hills and valleys in the unidimensional histograms along the different features and provides an economical means of assessing the significance of the features in a nonparametric unsupervised data environment. The method has relevance to remote sensing applications.
Ansari, Elnaz Saberi; Eslahchi, Changiz; Pezeshk, Hamid; Sadeghi, Mehdi
2014-09-01
Decomposition of structural domains is an essential task in classifying protein structures, predicting protein function, and many other proteomics problems. As the number of known protein structures in PDB grows exponentially, the need for accurate automatic domain decomposition methods becomes more essential. In this article, we introduce a bottom-up algorithm for assigning protein domains using a graph theoretical approach. This algorithm is based on a center-based clustering approach. For constructing initial clusters, members of an independent dominating set for the graph representation of a protein are considered as the centers. A distance matrix is then defined for these clusters. To obtain final domains, these clusters are merged using the compactness principle of domains and a method similar to the neighbor-joining algorithm considering some thresholds. The thresholds are computed using a training set consisting of 50 protein chains. The algorithm is implemented using C++ language and is named ProDomAs. To assess the performance of ProDomAs, its results are compared with seven automatic methods, against five publicly available benchmarks. The results show that ProDomAs outperforms other methods applied on the mentioned benchmarks. The performance of ProDomAs is also evaluated against 6342 chains obtained from ASTRAL SCOP 1.71. ProDomAs is freely available at http://www.bioinf.cs.ipm.ir/software/prodomas. PMID:24596179
MixSim : An R Package for Simulating Data to Study Performance of Clustering Algorithms
Melnykov, Volodymyr; Chen, Wei-Chen; Maitra, Ranjan
2012-01-01
The R package MixSim is a new tool that allows simulating mixtures of Gaussian distributions with different levels of overlap between mixture components. Pairwise overlap, defined as a sum of two misclassification probabilities, measures the degree of interaction between components and can be readily employed to control the clustering complexity of datasets simulated from mixtures. These datasets can then be used for systematic performance investigation of clustering and finite mixture modeling algorithms. Among other capabilities of MixSim, there are computing the exact overlap for Gaussian mixtures, simulating Gaussian and non-Gaussian data, simulating outliers and noise variables, calculating various measures of agreement between two partitionings, and constructing parallel distribution plots for the graphical display of finite mixture models. All features of the package are illustrated in great detail. The utility of the package is highlighted through a small comparison study of several popular clustering algorithms.
Wang, Xueyi
2011-01-01
The k-nearest neighbors (k-NN) algorithm is a widely used machine learning method that finds nearest neighbors of a test object in a feature space. We present a new exact k-NN algorithm called kMkNN (k-Means for k-Nearest Neighbors) that uses the k-means clustering and the triangle inequality to accelerate the searching for nearest neighbors in a high dimensional space. The kMkNN algorithm has two stages. In the buildup stage, instead of using complex tree structures such as metric trees, kd-trees, or ball-tree, kMkNN uses a simple k-means clustering method to preprocess the training dataset. In the searching stage, given a query object, kMkNN finds nearest training objects starting from the nearest cluster to the query object and uses the triangle inequality to reduce the distance calculations. Experiments show that the performance of kMkNN is surprisingly good compared to the traditional k-NN algorithm and tree-based k-NN algorithms such as kd-trees and ball-trees. On a collection of 20 datasets with up to 106 records and 104 dimensions, kMkNN shows a 2-to 80-fold reduction of distance calculations and a 2- to 60-fold speedup over the traditional k-NN algorithm for 16 datasets. Furthermore, kMkNN performs significant better than a kd-tree based k-NN algorithm for all datasets and performs better than a ball-tree based k-NN algorithm for most datasets. The results show that kMkNN is effective for searching nearest neighbors in high dimensional spaces. PMID:22247818
Marchal, Rémi; Carbonnière, Philippe; Pouchan, Claude
2015-01-22
The study of atomic clusters has become an increasingly active area of research in the recent years because of the fundamental interest in studying a completely new area that can bridge the gap between atomic and solid state physics. Due to their specific properties, such compounds are of great interest in the field of nanotechnology [1,2]. Here, we would present our GSAM algorithm based on a DFT exploration of the PES to find the low lying isomers of such compounds. This algorithm includes the generation of an intial set of structure from which the most relevant are selected. Moreover, an optimization process, called raking optimization, able to discard step by step all the non physically reasonnable configurations have been implemented to reduce the computational cost of this algorithm. Structural properties of Ga{sub n}Asm clusters will be presented as an illustration of the method.
NASA Technical Reports Server (NTRS)
Dasarathy, B. V.
1976-01-01
Learning of discriminant hyperplanes in imperfectly supervised or unsupervised training sample sets with unreliably labeled samples along the fuzzy joint boundaries between sample clusters is discussed, with the discriminant hyperplane designed to be a least-squares fit to the unreliably labeled data points. (Samples along the fuzzy boundary jump back and forth from one cluster to the other in recursive cluster stabilization and are considered unreliably labeled.) Minimization of the distances of these unreliably labeled samples from the hyperplanes does not sacrifice the ability to discriminate between classes represented by reliably labeled subsets of samples. An equivalent unconstrained linear inequality problem is formulated and algorithms for its solution are indicated. Landsat earth sensing data were used in confirming the validity and computational feasibility of the approach, which should be useful in deriving discriminant hyperplanes separating clusters with fuzzy boundaries, given supervised training sample sets with unreliably labeled boundary samples.
An Improved Source-Scanning Algorithm for Locating Earthquake Clusters or Aftershock Sequences
NASA Astrophysics Data System (ADS)
Liao, Y.; Kao, H.; Hsu, S.
2010-12-01
The Source-scanning Algorithm (SSA) was originally introduced in 2004 to locate non-volcanic tremors. Its application was later expanded to the identification of earthquake rupture planes and the near-real-time detection and monitoring of landslides and mud/debris flows. In this study, we further improve SSA for the purpose of locating earthquake clusters or aftershock sequences when only a limited number of waveform observations are available. The main improvements include the application of a ground motion analyzer to separate P and S waves, the automatic determination of resolution based on the grid size and time step of the scanning process, and a modified brightness function to utilize constraints from multiple phases. Specifically, the improved SSA (named as ISSA) addresses two major issues related to locating earthquake clusters/aftershocks. The first one is the massive amount of both time and labour to locate a large number of seismic events manually. And the second one is to efficiently and correctly identify the same phase across the entire recording array when multiple events occur closely in time and space. To test the robustness of ISSA, we generate synthetic waveforms consisting of 3 separated events such that individual P and S phases arrive at different stations in different order, thus making correct phase picking nearly impossible. Using these very complicated waveforms as the input, the ISSA scans all model space for possible combination of time and location for the existence of seismic sources. The scanning results successfully associate various phases from each event at all stations, and correctly recover the input. To further demonstrate the advantage of ISSA, we apply it to the waveform data collected by a temporary OBS array for the aftershock sequence of an offshore earthquake southwest of Taiwan. The overall signal-to-noise ratio is inadequate for locating small events; and the precise arrival times of P and S phases are difficult to
Segmentation of SoHO/EIT Images using fuzzy clustering algorithms
NASA Astrophysics Data System (ADS)
Delouille, V.; Barra, V.; Hochedez, J.
2007-12-01
The study of the variability of the solar corona and the monitoring of its traditional regions (Coronal Holes, Quiet Sun and Active Regions) are of great importance in astrophysics as well as in view of the Space Weather and Space Climate applications. In this presentation, I will propose a multi-channel unsupervised spatially- constrained fuzzy clustering algorithm that automatically segments EUV solar images into Coronal Holes, Quiet Sun and Active Regions. The use of Fuzzy logic allows to manage the various noises present in the images and the imprecision in the definition of the above mentioned regions. The process is fast and automatic. It is applied to SoHO-EIT images taken from January 1997 till May 2005, spanning thus almost a full solar cycle. Results in terms of areas and intensity estimations are consistent with previous knowledge. The method reveal the rotational and other mid-term periodicities in the extracted time series across solar cycle 23. Further, such an approach paves the way to bridging observations between spatially resolved data from imaging telescopes and time series from radiometers. Time series resulting form the segmentation of EUV coronal images can indeed provide an essential component in the process of reconstructing the solar spectrum.
NASA Astrophysics Data System (ADS)
Huang, Zhipeng; Gao, Lihong; Wang, Yangwei; Wang, Fuchi
2016-06-01
The Johnson-Cook (J-C) constitutive model is widely used in the finite element simulation, as this model shows the relationship between stress and strain in a simple way. In this paper, a cluster global optimization algorithm is proposed to determine the J-C constitutive model parameters of materials. A set of assumed parameters is used for the accuracy verification of the procedure. The parameters of two materials (401 steel and 823 steel) are determined. Results show that the procedure is reliable and effective. The relative error between the optimized and assumed parameters is no more than 4.02%, and the relative error between the optimized and assumed stress is 0.2% × 10-5. The J-C constitutive parameters can be determined more precisely and quickly than the traditional manual procedure. Furthermore, all the parameters can be simultaneously determined using several curves under different experimental conditions. A strategy is also proposed to accurately determine the constitutive parameters.
Fleisch, Markus C.; Maxell, Christopher A.; Kuper, Claudia K.; Brown, Erika T.; Parvin, Bahram; Barcellos-Hoff, Mary-Helen; Costes,Sylvain V.
2006-03-08
Centrosomes are small organelles that organize the mitoticspindle during cell division and are also involved in cell shape andpolarity. Within epithelial tumors, such as breast cancer, and somehematological tumors, centrosome abnormalities (CA) are common, occurearly in disease etiology, and correlate with chromosomal instability anddisease stage. In situ quantification of CA by optical microscopy ishampered by overlap and clustering of these organelles, which appear asfocal structures. CA has been frequently associated with Tp53 status inpremalignant lesions and tumors. Here we describe an approach toaccurately quantify centrosomes in tissue sections and tumors.Considering proliferation and baseline amplification rate the resultingpopulation based ratio of centrosomes per nucleus allow the approximationof the proportion of cells with CA. Using this technique we show that20-30 percent of cells have amplified centrosomes in Tp53 null mammarytumors. Combining fluorescence detection, deconvolution microscopy and amathematical algorithm applied to a maximum intensity projection we showthat this approach is superior to traditional investigator based visualanalysis or threshold-based techniques.
NASA Astrophysics Data System (ADS)
Chang, Seongmin; Baek, Sungmin; Kim, Ki-Ook; Cho, Maenghyo
2015-06-01
A system identification method has been proposed to validate finite element models of complex structures using measured modal data. Finite element method is used for the system identification as well as the structural analysis. In perturbation methods, the perturbed system is expressed as a combination of the baseline structure and the related perturbations. The changes in dynamic responses are applied to determine the structural modifications so that the equilibrium may be satisfied in the perturbed system. In practical applications, the dynamic measurements are carried out on a limited number of accessible nodes and associated degrees of freedom. The equilibrium equation is, in principle, expressed in terms of the measured (master, primary) and unmeasured (slave, secondary) degrees of freedom. Only the specified degrees of freedom are included in the equation formulation for identification and the unspecified degrees of freedom are eliminated through the iterative improved reduction scheme. A large number of system parameters are included as the unknown variables in the system identification of large-scaled structures. The identification problem with large number of system parameters requires a large amount of computation time and resources. In the present study, a hierarchical clustering algorithm is applied to reduce the number of system parameters effectively. Numerical examples demonstrate that the proposed method greatly improves the accuracy and efficiency in the inverse problem of identification.
BoCluSt: Bootstrap Clustering Stability Algorithm for Community Detection
Garcia, Carlos
2016-01-01
The identification of modules or communities in sets of related variables is a key step in the analysis and modeling of biological systems. Procedures for this identification are usually designed to allow fast analyses of very large datasets and may produce suboptimal results when these sets are of a small to moderate size. This article introduces BoCluSt, a new, somewhat more computationally intensive, community detection procedure that is based on combining a clustering algorithm with a measure of stability under bootstrap resampling. Both computer simulation and analyses of experimental data showed that BoCluSt can outperform current procedures in the identification of multiple modules in data sets with a moderate number of variables. In addition, the procedure provides users with a null distribution of results to evaluate the support for the existence of community structure in the data. BoCluSt takes individual measures for a set of variables as input, and may be a valuable and robust exploratory tool of network analysis, as it provides 1) an estimation of the best partition of variables into modules, 2) a measure of the support for the existence of modular structures, and 3) an overall description of the whole structure, which may reveal hierarchical modular situations, in which modules are composed of smaller sub-modules. PMID:27258041
Cloud classification from satellite data using a fuzzy sets algorithm - A polar example
NASA Technical Reports Server (NTRS)
Key, J. R.; Maslanik, J. A.; Barry, R. G.
1989-01-01
Where spatial boundaries between phenomena are diffuse, classification methods which construct mutually exclusive clusters seem inappropriate. The Fuzzy c-means (FCM) algorithm assigns each observation to all clusters, with membership values as a function of distance to the cluster center. The FCM algorithm is applied to AVHRR data for the purpose of classifying polar clouds and surfaces. Careful analysis of the fuzzy sets can provide information on which spectral channels are best suited to the classification of particular features, and can help determine like areas of misclassification. General agreement in the resulting classes and cloud fraction was found between the FCM algorithm, a manual classification, and an unsupervised maximum likelihood classifier.
Cloud classification from satellite data using a fuzzy sets algorithm: A polar example
NASA Technical Reports Server (NTRS)
Key, J. R.; Maslanik, J. A.; Barry, R. G.
1988-01-01
Where spatial boundaries between phenomena are diffuse, classification methods which construct mutually exclusive clusters seem inappropriate. The Fuzzy c-means (FCM) algorithm assigns each observation to all clusters, with membership values as a function of distance to the cluster center. The FCM algorithm is applied to AVHRR data for the purpose of classifying polar clouds and surfaces. Careful analysis of the fuzzy sets can provide information on which spectral channels are best suited to the classification of particular features, and can help determine likely areas of misclassification. General agreement in the resulting classes and cloud fraction was found between the FCM algorithm, a manual classification, and an unsupervised maximum likelihood classifier.
NASA Astrophysics Data System (ADS)
Kauczor, Joanna; Norman, Patrick; Christiansen, Ove; Coriani, Sonia
2013-12-01
We present a reduced-space algorithm for solving the complex (damped) linear response equations required to compute the complex linear response function for the hierarchy of methods: coupled cluster singles, coupled cluster singles and iterative approximate doubles, and coupled cluster singles and doubles. The solver is the keystone element for the development of damped coupled cluster response methods for linear and nonlinear effects in resonant frequency regions.
An Algorithm for Testing Unidimensionality and Clustering Items in Rasch Measurement
ERIC Educational Resources Information Center
Debelak, Rudolf; Arendasy, Martin
2012-01-01
A new approach to identify item clusters fitting the Rasch model is described and evaluated using simulated and real data. The proposed method is based on hierarchical cluster analysis and constructs clusters of items that show a good fit to the Rasch model. It thus gives an estimate of the number of independent scales satisfying the postulates of…
Yang, Liu; Lu, Yinzhi; Zhong, Yuanchang; Wu, Xuegang; Yang, Simon X
2015-01-01
Energy resource limitation is a severe problem in traditional wireless sensor networks (WSNs) because it restricts the lifetime of network. Recently, the emergence of energy harvesting techniques has brought with them the expectation to overcome this problem. In particular, it is possible for a sensor node with energy harvesting abilities to work perpetually in an Energy Neutral state. In this paper, a Multi-hop Energy Neutral Clustering (MENC) algorithm is proposed to construct the optimal multi-hop clustering architecture in energy harvesting WSNs, with the goal of achieving perpetual network operation. All cluster heads (CHs) in the network act as routers to transmit data to base station (BS) cooperatively by a multi-hop communication method. In addition, by analyzing the energy consumption of intra- and inter-cluster data transmission, we give the energy neutrality constraints. Under these constraints, every sensor node can work in an energy neutral state, which in turn provides perpetual network operation. Furthermore, the minimum network data transmission cycle is mathematically derived using convex optimization techniques while the network information gathering is maximal. Simulation results show that our protocol can achieve perpetual network operation, so that the consistent data delivery is guaranteed. In addition, substantial improvements on the performance of network throughput are also achieved as compared to the famous traditional clustering protocol LEACH and recent energy harvesting aware clustering protocols. PMID:26712764
Yang, Liu; Lu, Yinzhi; Zhong, Yuanchang; Wu, Xuegang; Yang, Simon X.
2015-01-01
Energy resource limitation is a severe problem in traditional wireless sensor networks (WSNs) because it restricts the lifetime of network. Recently, the emergence of energy harvesting techniques has brought with them the expectation to overcome this problem. In particular, it is possible for a sensor node with energy harvesting abilities to work perpetually in an Energy Neutral state. In this paper, a Multi-hop Energy Neutral Clustering (MENC) algorithm is proposed to construct the optimal multi-hop clustering architecture in energy harvesting WSNs, with the goal of achieving perpetual network operation. All cluster heads (CHs) in the network act as routers to transmit data to base station (BS) cooperatively by a multi-hop communication method. In addition, by analyzing the energy consumption of intra- and inter-cluster data transmission, we give the energy neutrality constraints. Under these constraints, every sensor node can work in an energy neutral state, which in turn provides perpetual network operation. Furthermore, the minimum network data transmission cycle is mathematically derived using convex optimization techniques while the network information gathering is maximal. Simulation results show that our protocol can achieve perpetual network operation, so that the consistent data delivery is guaranteed. In addition, substantial improvements on the performance of network throughput are also achieved as compared to the famous traditional clustering protocol LEACH and recent energy harvesting aware clustering protocols. PMID:26712764
NASA Astrophysics Data System (ADS)
Valaparla, Sunil K.; Peng, Qi; Gao, Feng; Clarke, Geoffrey D.
2014-03-01
Accurate measurements of human body fat distribution are desirable because excessive body fat is associated with impaired insulin sensitivity, type 2 diabetes mellitus (T2DM) and cardiovascular disease. In this study, we hypothesized that the performance of water suppressed (WS) MRI is superior to non-water suppressed (NWS) MRI for volumetric assessment of abdominal subcutaneous (SAT), intramuscular (IMAT), visceral (VAT), and total (TAT) adipose tissues. We acquired T1-weighted images on a 3T MRI system (TIM Trio, Siemens), which was analyzed using semi-automated segmentation software that employs a fuzzy c-means (FCM) clustering algorithm. Sixteen contiguous axial slices, centered at the L4-L5 level of the abdomen, were acquired in eight T2DM subjects with water suppression (WS) and without (NWS). Histograms from WS images show improved separation of non-fatty tissue pixels from fatty tissue pixels, compared to NWS images. Paired t-tests of WS versus NWS showed a statistically significant lower volume of lipid in the WS images for VAT (145.3 cc less, p=0.006) and IMAT (305 cc less, p<0.001), but not SAT (14.1 cc more, NS). WS measurements of TAT also resulted in lower fat volumes (436.1 cc less, p=0.002). There is strong correlation between WS and NWS quantification methods for SAT measurements (r=0.999), but poorer correlation for VAT studies (r=0.845). These results suggest that NWS pulse sequences may overestimate adipose tissue volumes and that WS pulse sequences are more desirable due to the higher contrast generated between fatty and non-fatty tissues.
Sumithra, Subramaniam; Victoire, T. Aruldoss Albert
2015-01-01
Due to large dimension of clusters and increasing size of sensor nodes, finding the optimal route and cluster for large wireless sensor networks (WSN) seems to be highly complex and cumbersome. This paper proposes a new method to determine a reasonably better solution of the clustering and routing problem with the highest concern of efficient energy consumption of the sensor nodes for extending network life time. The proposed method is based on the Differential Evolution (DE) algorithm with an improvised search operator called Diversified Vicinity Procedure (DVP), which models a trade-off between energy consumption of the cluster heads and delay in forwarding the data packets. The obtained route using the proposed method from all the gateways to the base station is comparatively lesser in overall distance with less number of data forwards. Extensive numerical experiments demonstrate the superiority of the proposed method in managing energy consumption of the WSN and the results are compared with the other algorithms reported in the literature. PMID:26516635
Sumithra, Subramaniam; Victoire, T Aruldoss Albert
2015-01-01
Due to large dimension of clusters and increasing size of sensor nodes, finding the optimal route and cluster for large wireless sensor networks (WSN) seems to be highly complex and cumbersome. This paper proposes a new method to determine a reasonably better solution of the clustering and routing problem with the highest concern of efficient energy consumption of the sensor nodes for extending network life time. The proposed method is based on the Differential Evolution (DE) algorithm with an improvised search operator called Diversified Vicinity Procedure (DVP), which models a trade-off between energy consumption of the cluster heads and delay in forwarding the data packets. The obtained route using the proposed method from all the gateways to the base station is comparatively lesser in overall distance with less number of data forwards. Extensive numerical experiments demonstrate the superiority of the proposed method in managing energy consumption of the WSN and the results are compared with the other algorithms reported in the literature. PMID:26516635
NASA Astrophysics Data System (ADS)
Chaudhury, Pinaki; Bhattacharyya, S. P.
1999-03-01
It is demonstrated that Genetic Algorithm in a floating point realisation can be a viable tool for locating critical points on a multi-dimensional potential energy surface (PES). For small clusters, the standard algorithm works well. For bigger ones, the search for global minimum becomes more efficient when used in conjunction with coordinate stretching, and partitioning of the strings into a core part and an outer part which are alternately optimized The method works with equal facility for locating minima, local as well as global, and saddle points (SP) of arbitrary orders. The search for minima requires computation of the gradient vector, but not the Hessian, while that for SP's requires the information of the gradient vector and the Hessian, the latter only at some specific points on the path. The method proposed is tested on (i) a model 2-d PES (ii) argon clusters (Ar 4-Ar 30) in which argon atoms interact via Lennard-Jones potential, (iii) Ar mX, m=12 clusters where X may be a neutral atom or a cation. We also explore if the method could also be used to construct what may be called a stochastic representation of the reaction path on a given PES with reference to conformational changes in Ar n clusters.
NASA Technical Reports Server (NTRS)
Lennington, R. K.; Malek, H.
1978-01-01
A clustering method, CLASSY, was developed, which alternates maximum likelihood iteration with a procedure for splitting, combining, and eliminating the resulting statistics. The method maximizes the fit of a mixture of normal distributions to the observed first through fourth central moments of the data and produces an estimate of the proportions, means, and covariances in this mixture. The mathematical model which is the basic for CLASSY and the actual operation of the algorithm is described. Data comparing the performances of CLASSY and ISOCLS on simulated and actual LACIE data are presented.
Bhattacharya, Anindya; De, Rajat K
2010-08-01
Distance based clustering algorithms can group genes that show similar expression values under multiple experimental conditions. They are unable to identify a group of genes that have similar pattern of variation in their expression values. Previously we developed an algorithm called divisive correlation clustering algorithm (DCCA) to tackle this situation, which is based on the concept of correlation clustering. But this algorithm may also fail for certain cases. In order to overcome these situations, we propose a new clustering algorithm, called average correlation clustering algorithm (ACCA), which is able to produce better clustering solution than that produced by some others. ACCA is able to find groups of genes having more common transcription factors and similar pattern of variation in their expression values. Moreover, ACCA is more efficient than DCCA with respect to the time of execution. Like DCCA, we use the concept of correlation clustering concept introduced by Bansal et al. ACCA uses the correlation matrix in such a way that all genes in a cluster have the highest average correlation values with the genes in that cluster. We have applied ACCA and some well-known conventional methods including DCCA to two artificial and nine gene expression datasets, and compared the performance of the algorithms. The clustering results of ACCA are found to be more significantly relevant to the biological annotations than those of the other methods. Analysis of the results show the superiority of ACCA over some others in determining a group of genes having more common transcription factors and with similar pattern of variation in their expression profiles. Availability of the software: The software has been developed using C and Visual Basic languages, and can be executed on the Microsoft Windows platforms. The software may be downloaded as a zip file from http://www.isical.ac.in/~rajat. Then it needs to be installed. Two word files (included in the zip file) need to
Li, Weizhong [San Diego Supercomputer Center
2013-01-22
San Diego Supercomputer Center's Weizhong Li on "Effective Analysis of NGS Metagenomic Data with Ultra-fast Clustering Algorithms" at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.
Li, Weizhong
2011-10-12
San Diego Supercomputer Center's Weizhong Li on "Effective Analysis of NGS Metagenomic Data with Ultra-fast Clustering Algorithms" at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.
Chen, Wei-Chen; Maitra, Ranjan
2011-01-01
We propose a model-based approach for clustering time series regression data in an unsupervised machine learning framework to identify groups under the assumption that each mixture component follows a Gaussian autoregressive regression model of order p. Given the number of groups, the traditional maximum likelihood approach of estimating the parameters using the expectation-maximization (EM) algorithm can be employed, although it is computationally demanding. The somewhat fast tune to the EM folk song provided by the Alternating Expectation Conditional Maximization (AECM) algorithm can alleviate the problem to some extent. In this article, we develop an alternative partial expectation conditional maximization algorithm (APECM) that uses an additional data augmentation storage step to efficiently implement AECM for finite mixture models. Results on our simulation experiments show improved performance in both fewer numbers of iterations and computation time. The methodology is applied to the problem of clustering mutual funds data on the basis of their average annual per cent returns and in the presence of economic indicators.
Algorithmic Identification for Wings in Butterfly Diagrams.
NASA Astrophysics Data System (ADS)
Illarionov, E. A.; Sokolov, D. D.
2012-12-01
We investigate to what extent the wings of solar butterfly diagrams can be separated without an explicit usage of Hale's polarity law as well as the location of the solar equator. Two algorithms of cluster analysis, namely DBSCAN and C-means, have demonstrated their ability to separate the wings of contemporary butterfly diagrams based on the sunspot group density in the diagram only. Here we generalize the method for continuous tracers, give results concerning the migration velocities and presented clusters for 12 - 20 cycles.
Mass distributed clustering: a new algorithm for repeated measurements in gene expression data.
Matsumoto, Shinya; Aisaki, Ken-ichi; Kanno, Jun
2005-01-01
The availability of whole-genome sequence data and high-throughput techniques such as DNA microarray enable researchers to monitor the alteration of gene expression by a certain organ or tissue in a comprehensive manner. The quantity of gene expression data can be greater than 30,000 genes per one measurement, making data clustering methods for analysis essential. Biologists usually design experimental protocols so that statistical significance can be evaluated; often, they conduct experiments in triplicate to generate a mean and standard deviation. Existing clustering methods usually use these mean or median values, rather than the original data, and take significance into account by omitting data showing large standard deviations, which eliminates potentially useful information. We propose a clustering method that uses each of the triplicate data sets as a probability distribution function instead of pooling data points into a median or mean. This method permits truly unsupervised clustering of the data from DNA microarrays. PMID:16901101
NASA Astrophysics Data System (ADS)
Thanos, Konstantinos-Georgios; Thomopoulos, Stelios C. A.
2014-06-01
The study in this paper belongs to a more general research of discovering facial sub-clusters in different ethnicity face databases. These new sub-clusters along with other metadata (such as race, sex, etc.) lead to a vector for each face in the database where each vector component represents the likelihood of participation of a given face to each cluster. This vector is then used as a feature vector in a human identification and tracking system based on face and other biometrics. The first stage in this system involves a clustering method which evaluates and compares the clustering results of five different clustering algorithms (average, complete, single hierarchical algorithm, k-means and DIGNET), and selects the best strategy for each data collection. In this paper we present the comparative performance of clustering results of DIGNET and four clustering algorithms (average, complete, single hierarchical and k-means) on fabricated 2D and 3D samples, and on actual face images from various databases, using four different standard metrics. These metrics are the silhouette figure, the mean silhouette coefficient, the Hubert test Γ coefficient, and the classification accuracy for each clustering result. The results showed that, in general, DIGNET gives more trustworthy results than the other algorithms when the metrics values are above a specific acceptance threshold. However when the evaluation results metrics have values lower than the acceptance threshold but not too low (too low corresponds to ambiguous results or false results), then it is necessary for the clustering results to be verified by the other algorithms.
Abedini, Mohammad; Moradi, Mohammad H; Hosseinian, S M
2016-03-01
This paper proposes a novel method to address reliability and technical problems of microgrids (MGs) based on designing a number of self-adequate autonomous sub-MGs via adopting MGs clustering thinking. In doing so, a multi-objective optimization problem is developed where power losses reduction, voltage profile improvement and reliability enhancement are considered as the objective functions. To solve the optimization problem a hybrid algorithm, named HS-GA, is provided, based on genetic and harmony search algorithms, and a load flow method is given to model different types of DGs as droop controller. The performance of the proposed method is evaluated in two case studies. The results provide support for the performance of the proposed method. PMID:26767800
NASA Astrophysics Data System (ADS)
Cazade, Pierre-André; Zheng, Wenwei; Prada-Gracia, Diego; Berezovska, Ganna; Rao, Francesco; Clementi, Cecilia; Meuwly, Markus
2015-01-01
The ligand migration network for O2-diffusion in truncated Hemoglobin N is analyzed based on three different clustering schemes. For coordinate-based clustering, the conventional k-means and the kinetics-based Markov Clustering (MCL) methods are employed, whereas the locally scaled diffusion map (LSDMap) method is a collective-variable-based approach. It is found that all three methods agree well in their geometrical definition of the most important docking site, and all experimentally known docking sites are recovered by all three methods. Also, for most of the states, their population coincides quite favourably, whereas the kinetics of and between the states differs. One of the major differences between k-means and MCL clustering on the one hand and LSDMap on the other is that the latter finds one large primary cluster containing the Xe1a, IS1, and ENT states. This is related to the fact that the motion within the state occurs on similar time scales, whereas structurally the state is found to be quite diverse. In agreement with previous explicit atomistic simulations, the Xe3 pocket is found to be a highly dynamical site which points to its potential role as a hub in the network. This is also highlighted in the fact that LSDMap cannot identify this state. First passage time distributions from MCL clusterings using a one- (ligand-position) and two-dimensional (ligand-position and protein-structure) descriptor suggest that ligand- and protein-motions are coupled. The benefits and drawbacks of the three methods are discussed in a comparative fashion and highlight that depending on the questions at hand the best-performing method for a particular data set may differ.
Yin, Jiandong; Yang, Jiawen; Guo, Qiyong
2014-01-01
During dynamic susceptibility contrast-magnetic resonance imaging (DSC-MRI), it has been demonstrated that the arterial input function (AIF) can be obtained using fuzzy c-means (FCM) and k-means clustering methods. However, due to the dependence on the initial centers of clusters, both clustering methods have poor reproducibility between the calculation and recalculation steps. To address this problem, the present study developed an alternative clustering technique based on the agglomerative hierarchy (AH) method for AIF determination. The performance of AH method was evaluated using simulated data and clinical data based on comparisons with the two previously demonstrated clustering-based methods in terms of the detection accuracy, calculation reproducibility, and computational complexity. The statistical analysis demonstrated that, at the cost of a significantly longer execution time, AH method obtained AIFs more in line with the expected AIF, and it was perfectly reproducible at different time points. In our opinion, the disadvantage of AH method in terms of the execution time can be alleviated by introducing a professional high-performance workstation. The findings of this study support the feasibility of using AH clustering method for detecting the AIF automatically. PMID:24932638
Information bottleneck based incremental fuzzy clustering for large biomedical data.
Liu, Yongli; Wan, Xing
2016-08-01
Incremental fuzzy clustering combines advantages of fuzzy clustering and incremental clustering, and therefore is important in classifying large biomedical literature. Conventional algorithms, suffering from data sparsity and high-dimensionality, often fail to produce reasonable results and may even assign all the objects to a single cluster. In this paper, we propose two incremental algorithms based on information bottleneck, Single-Pass fuzzy c-means (spFCM-IB) and Online fuzzy c-means (oFCM-IB). These two algorithms modify conventional algorithms by considering different weights for each centroid and object and scoring mutual information loss to measure the distance between centroids and objects. spFCM-IB and oFCM-IB are used to group a collection of biomedical text abstracts from Medline database. Experimental results show that clustering performances of our approaches are better than such prominent counterparts as spFCM, spHFCM, oFCM and oHFCM, in terms of accuracy. PMID:27260783
KmL3D: a non-parametric algorithm for clustering joint trajectories.
Genolini, C; Pingault, J B; Driss, T; Côté, S; Tremblay, R E; Vitaro, F; Arnaud, C; Falissard, B
2013-01-01
In cohort studies, variables are measured repeatedly and can be considered as trajectories. A classic way to work with trajectories is to cluster them in order to detect the existence of homogeneous patterns of evolution. Since cohort studies usually measure a large number of variables, it might be interesting to study the joint evolution of several variables (also called joint-variable trajectories). To date, the only way to cluster joint-trajectories is to cluster each trajectory independently, then to cross the partitions obtained. This approach is unsatisfactory because it does not take into account a possible co-evolution of variable-trajectories. KmL3D is an R package that implements a version of k-means dedicated to clustering joint-trajectories. It provides facilities for the management of missing values, offers several quality criteria and its graphic interface helps the user to select the best partition. KmL3D can work with any number of joint-variable trajectories. In the restricted case of two joint trajectories, it proposes 3D tools to visualize the partitioning and then export 3D dynamic rotating-graphs to PDF format. PMID:23127283
NASA Astrophysics Data System (ADS)
Chen, Wen-Yuan; Liu, Chen-Chung
2006-01-01
The problems with binary watermarking schemes are that they have only a small amount of embeddable space and are not robust enough. We develop a slice-based large-cluster algorithm (SBLCA) to construct a robust watermarking scheme for binary images. In SBLCA, a small-amount cluster selection (SACS) strategy is used to search for a feasible slice in a large-cluster flappable-pixel decision (LCFPD) method, which is used to search for the best location for concealing a secret bit from a selected slice. This method has four major advantages over the others: (a) SBLCA has a simple and effective decision function to select appropriate concealment locations, (b) SBLCA utilizes a blind watermarking scheme without the original image in the watermark extracting process, (c) SBLCA uses slice-based shuffling capability to transfer the regular image into a hash state without remembering the state before shuffling, and finally, (d) SBLCA has enough embeddable space that every 64 pixels could accommodate a secret bit of the binary image. Furthermore, empirical results on test images reveal that our approach is a robust watermarking scheme for binary images.
Colorectal Cancer Staging Using Three Clustering Methods Based on Preoperative Clinical Findings.
Pourahmad, Saeedeh; Pourhashemi, Soudabeh; Mohammadianpanah, Mohammad
2016-01-01
Determination of the colorectal cancer stage is possible only after surgery based on pathology results. However, sometimes this may prove impossible. The aim of the present study was to determine colorectal cancer stage using three clustering methods based on preoperative clinical findings. All patients referred to the Colorectal Research Center of Shiraz University of Medical Sciences for colorectal cancer surgery during 2006 to 2014 were enrolled in the study. Accordingly, 117 cases participated. Three clustering algorithms were utilized including k-means, hierarchical and fuzzy c-means clustering methods. External validity measures such as sensitivity, specificity and accuracy were used for evaluation of the methods. The results revealed maximum accuracy and sensitivity values for the hierarchical and a maximum specificity value for the fuzzy c-means clustering methods. Furthermore, according to the internal validity measures for the present data set, the optimal number of clusters was two (silhouette coefficient) and the fuzzy c-means algorithm was more appropriate than the k-means clustering approach by increasing the number of clusters. PMID:26925686
Solution of facility location problem in Turkey by using fuzzy C-means method
NASA Astrophysics Data System (ADS)
Kocakaya, Mustafa Nabi; Türkakın, Osman Hürol
2013-10-01
Facility location problem is one of most frequent problems, which is encountered while deciding facility places such as factories, warehouses. There are various techniques developed to solve facility location problems. Fuzzy c-means method is one of the most usable techniques between them. In this study, optimum warehouse location for natural stone mines is found by using fuzzy c-means method.
Kandalla, Krishna; Subramoni, Hari; Vishnu, Abhinav; Panda, Dhabaleswar K.
2010-04-01
Modern high performance computing systems are being increasingly deployed in a hierarchical fashion with multi-core computing platforms forming the base of the hierarchy. These systems are usually comprised of multiple racks, with each rack consisting of a finite number of chassis, with each chassis having multiple compute nodes or blades, based on multi-core architectures. The networks are also hierarchical with multiple levels of switches. Message exchange operations between processes that belong to different racks involve multiple hops across different switches and this directly affects the performance of collective operations. In this paper, we take on the challenges involved in detecting the topology of large scale InfiniBand clusters and leveraging this knowledge to design efficient topology-aware algorithms for collective operations. We also propose a communication model to analyze the communication costs involved in collective operations on large scale supercomputing systems. We have analyzed the performance characteristics of two collectives, MPI_Gather and MPI_Scatter on such systems and we have proposed topology-aware algorithms for these operations. Our experimental results have shown that the proposed algorithms can improve the performance of these collective operations by almost 54% at the micro-benchmark level.
NASA Astrophysics Data System (ADS)
Inclan, Eric; Lassester, Jack; Geohegan, David; Yoon, Mina
Research in TiO2 materials is highly relevant to energy and device applications, however, precise control of their morphologies and characterization are still a grand challenge in the field. We developed and applied a hybrid optimization algorithm to explore configuration spaces of energetically metastable TiO2. Our approach was to minimize the total energy of TiO2 lusters in order to identify the energy landscape of plausible (TiO2)n (n = 1-100). The hybrid algorithm retained good agreement with a regression on structures published in literature up to n = 25. Using first-principles density functional theory, we analyze basic properties of the hybrid-algorithm generated TiO2 nanoparticles. Our results show the expected convergence to bulk material characteristics as the cluster size increases in that the band gap varies with respect to the size of the nanocluster. The nanoclusters trended toward compact, low surface area structures that share characteristics of the bulk, namely octahedral microstructures as the nanoclusters increased in size. Our study helps in better identifying and characterizing experimentally observed structures. This work is supported by the U.S. Department of Energy, Office of Science, Basic Energy Sciences, Materials Sciences and Engineering Division.
NASA Astrophysics Data System (ADS)
Khehra, Baljit Singh; Pharwaha, Amar Partap Singh
2016-06-01
Ductal carcinoma in situ (DCIS) is one type of breast cancer. Clusters of microcalcifications (MCCs) are symptoms of DCIS that are recognized by mammography. Selection of robust features vector is the process of selecting an optimal subset of features from a large number of available features in a given problem domain after the feature extraction and before any classification scheme. Feature selection reduces the feature space that improves the performance of classifier and decreases the computational burden imposed by using many features on classifier. Selection of an optimal subset of features from a large number of available features in a given problem domain is a difficult search problem. For n features, the total numbers of possible subsets of features are 2n. Thus, selection of an optimal subset of features problem belongs to the category of NP-hard problems. In this paper, an attempt is made to find the optimal subset of MCCs features from all possible subsets of features using genetic algorithm (GA), particle swarm optimization (PSO) and biogeography-based optimization (BBO). For simulation, a total of 380 benign and malignant MCCs samples have been selected from mammogram images of DDSM database. A total of 50 features extracted from benign and malignant MCCs samples are used in this study. In these algorithms, fitness function is correct classification rate of classifier. Support vector machine is used as a classifier. From experimental results, it is also observed that the performance of PSO-based and BBO-based algorithms to select an optimal subset of features for classifying MCCs as benign or malignant is better as compared to GA-based algorithm.
A reliable and energy-efficient wave routing algorithm based on cluster for sensor networks
NASA Astrophysics Data System (ADS)
Wu, Fei; Tan, Zhihu; Chen, Yu; Xie, Changsheng
2008-11-01
In the recent past, wireless sensor networks have found their way into a wide variety of applications and systems with vastly varying requirements and characteristics. The major limit to the sensor nodes is the limit of energy and computing capability. The major consumption of the nodes energy in sensor networks occurs in the communication between nodes or between the nodes and the base station. Many current protocols only pay attention to how to reduce the energy consumption, rather than the stability during the routing process. Aimed to solve this problem, the paper produces a new kind of reliable and energy-efficient wave routing algorithm. The results of the experimentation can indicate that this new algorithm can not only reduce the energy consumption, but also ensure the stability of the routing.
Delineation and quantitation of brain lesions by fuzzy clustering in positron emission tomography.
Boudraa, A E; Champier, J; Cinotti, L; Bordet, J C; Lavenne, F; Mallet, J J
1996-01-01
In this study, we investigate the application of the fuzzy clustering to the anatomical localization and quantitation of brain lesions in Positron Emission Tomography (PET) images. The method is based on the Fuzzy C-Means (FCM) algorithm. The algorithm segments the PET image data points into a given number of clusters. Each cluster is an homogeneous region of the brain (e.g. tumor). A feature vector is assigned to a cluster which has the highest membership degree. Having the label affected by the FCM algorithm to a cluster, one may easily compute the corresponding spatial localization, area and perimeter. Studies concerning the evolution of a tumor after different treatments in two patients are presented. PMID:8891420
NASA Astrophysics Data System (ADS)
Komura, Yukihiro; Okabe, Yutaka
2016-03-01
We present new versions of sample CUDA programs for the GPU computing of the Swendsen-Wang multi-cluster spin flip algorithm. In this update, we add the method of GPU-based cluster-labeling algorithm without the use of conventional iteration (Komura, 2015) to those programs. For high-precision calculations, we also add a random-number generator in the cuRAND library. Moreover, we fix several bugs and remove the extra usage of shared memory in the kernel functions.
Nagwani, Naresh Kumar; Deo, Shirish V
2014-01-01
Understanding of the compressive strength of concrete is important for activities like construction arrangement, prestressing operations, and proportioning new mixtures and for the quality assurance. Regression techniques are most widely used for prediction tasks where relationship between the independent variables and dependent (prediction) variable is identified. The accuracy of the regression techniques for prediction can be improved if clustering can be used along with regression. Clustering along with regression will ensure the more accurate curve fitting between the dependent and independent variables. In this work cluster regression technique is applied for estimating the compressive strength of the concrete and a novel state of the art is proposed for predicting the concrete compressive strength. The objective of this work is to demonstrate that clustering along with regression ensures less prediction errors for estimating the concrete compressive strength. The proposed technique consists of two major stages: in the first stage, clustering is used to group the similar characteristics concrete data and then in the second stage regression techniques are applied over these clusters (groups) to predict the compressive strength from individual clusters. It is found from experiments that clustering along with regression techniques gives minimum errors for predicting compressive strength of concrete; also fuzzy clustering algorithm C-means performs better than K-means algorithm. PMID:25374939
Nagwani, Naresh Kumar; Deo, Shirish V.
2014-01-01
Understanding of the compressive strength of concrete is important for activities like construction arrangement, prestressing operations, and proportioning new mixtures and for the quality assurance. Regression techniques are most widely used for prediction tasks where relationship between the independent variables and dependent (prediction) variable is identified. The accuracy of the regression techniques for prediction can be improved if clustering can be used along with regression. Clustering along with regression will ensure the more accurate curve fitting between the dependent and independent variables. In this work cluster regression technique is applied for estimating the compressive strength of the concrete and a novel state of the art is proposed for predicting the concrete compressive strength. The objective of this work is to demonstrate that clustering along with regression ensures less prediction errors for estimating the concrete compressive strength. The proposed technique consists of two major stages: in the first stage, clustering is used to group the similar characteristics concrete data and then in the second stage regression techniques are applied over these clusters (groups) to predict the compressive strength from individual clusters. It is found from experiments that clustering along with regression techniques gives minimum errors for predicting compressive strength of concrete; also fuzzy clustering algorithm C-means performs better than K-means algorithm. PMID:25374939
A new time dependent density functional algorithm for large systems and plasmons in metal clusters
Baseggio, Oscar; Fronzoni, Giovanna; Stener, Mauro
2015-07-14
A new algorithm to solve the Time Dependent Density Functional Theory (TDDFT) equations in the space of the density fitting auxiliary basis set has been developed and implemented. The method extracts the spectrum from the imaginary part of the polarizability at any given photon energy, avoiding the bottleneck of Davidson diagonalization. The original idea which made the present scheme very efficient consists in the simplification of the double sum over occupied-virtual pairs in the definition of the dielectric susceptibility, allowing an easy calculation of such matrix as a linear combination of constant matrices with photon energy dependent coefficients. The method has been applied to very different systems in nature and size (from H{sub 2} to [Au{sub 147}]{sup −}). In all cases, the maximum deviations found for the excitation energies with respect to the Amsterdam density functional code are below 0.2 eV. The new algorithm has the merit not only to calculate the spectrum at whichever photon energy but also to allow a deep analysis of the results, in terms of transition contribution maps, Jacob plasmon scaling factor, and induced density analysis, which have been all implemented.
Muhammad, Durreshahwar; Foret, Jessica; Brady, Siobhan M.; Ducoste, Joel J.; Tuck, James; Long, Terri A.; Williams, Cranos
2015-01-01
Time course transcriptome datasets are commonly used to predict key gene regulators associated with stress responses and to explore gene functionality. Techniques developed to extract causal relationships between genes from high throughput time course expression data are limited by low signal levels coupled with noise and sparseness in time points. We deal with these limitations by proposing the Cluster and Differential Alignment Algorithm (CDAA). This algorithm was designed to process transcriptome data by first grouping genes based on stages of activity and then using similarities in gene expression to predict influential connections between individual genes. Regulatory relationships are assigned based on pairwise alignment scores generated using the expression patterns of two genes and some inferred delay between the regulator and the observed activity of the target. We applied the CDAA to an iron deficiency time course microarray dataset to identify regulators that influence 7 target transcription factors known to participate in the Arabidopsis thaliana iron deficiency response. The algorithm predicted that 7 regulators previously unlinked to iron homeostasis influence the expression of these known transcription factors. We validated over half of predicted influential relationships using qRT-PCR expression analysis in mutant backgrounds. One predicted regulator-target relationship was shown to be a direct binding interaction according to yeast one-hybrid (Y1H) analysis. These results serve as a proof of concept emphasizing the utility of the CDAA for identifying unknown or missing nodes in regulatory cascades, providing the fundamental knowledge needed for constructing predictive gene regulatory networks. We propose that this tool can be used successfully for similar time course datasets to extract additional information and infer reliable regulatory connections for individual genes. PMID:26317202
Koryachko, Alexandr; Matthiadis, Anna; Muhammad, Durreshahwar; Foret, Jessica; Brady, Siobhan M; Ducoste, Joel J; Tuck, James; Long, Terri A; Williams, Cranos
2015-01-01
Time course transcriptome datasets are commonly used to predict key gene regulators associated with stress responses and to explore gene functionality. Techniques developed to extract causal relationships between genes from high throughput time course expression data are limited by low signal levels coupled with noise and sparseness in time points. We deal with these limitations by proposing the Cluster and Differential Alignment Algorithm (CDAA). This algorithm was designed to process transcriptome data by first grouping genes based on stages of activity and then using similarities in gene expression to predict influential connections between individual genes. Regulatory relationships are assigned based on pairwise alignment scores generated using the expression patterns of two genes and some inferred delay between the regulator and the observed activity of the target. We applied the CDAA to an iron deficiency time course microarray dataset to identify regulators that influence 7 target transcription factors known to participate in the Arabidopsis thaliana iron deficiency response. The algorithm predicted that 7 regulators previously unlinked to iron homeostasis influence the expression of these known transcription factors. We validated over half of predicted influential relationships using qRT-PCR expression analysis in mutant backgrounds. One predicted regulator-target relationship was shown to be a direct binding interaction according to yeast one-hybrid (Y1H) analysis. These results serve as a proof of concept emphasizing the utility of the CDAA for identifying unknown or missing nodes in regulatory cascades, providing the fundamental knowledge needed for constructing predictive gene regulatory networks. We propose that this tool can be used successfully for similar time course datasets to extract additional information and infer reliable regulatory connections for individual genes. PMID:26317202
A novel algorithm for detecting multiple covariance and clustering of biological sequences.
Shen, Wei; Li, Yan
2016-01-01
Single genetic mutations are always followed by a set of compensatory mutations. Thus, multiple changes commonly occur in biological sequences and play crucial roles in maintaining conformational and functional stability. Although many methods are available to detect single mutations or covariant pairs, detecting non-synchronous multiple changes at different sites in sequences remains challenging. Here, we develop a novel algorithm, named Fastcov, to identify multiple correlated changes in biological sequences using an independent pair model followed by a tandem model of site-residue elements based on inter-restriction thinking. Fastcov performed exceptionally well at harvesting co-pairs and detecting multiple covariant patterns. By 10-fold cross-validation using datasets of different scales, the characteristic patterns successfully classified the sequences into target groups with an accuracy of greater than 98%. Moreover, we demonstrated that the multiple covariant patterns represent co-evolutionary modes corresponding to the phylogenetic tree, and provide a new understanding of protein structural stability. In contrast to other methods, Fastcov provides not only a reliable and effective approach to identify covariant pairs but also more powerful functions, including multiple covariance detection and sequence classification, that are most useful for studying the point and compensatory mutations caused by natural selection, drug induction, environmental pressure, etc. PMID:27451921
A novel algorithm for detecting multiple covariance and clustering of biological sequences
Shen, Wei; Li, Yan
2016-01-01
Single genetic mutations are always followed by a set of compensatory mutations. Thus, multiple changes commonly occur in biological sequences and play crucial roles in maintaining conformational and functional stability. Although many methods are available to detect single mutations or covariant pairs, detecting non-synchronous multiple changes at different sites in sequences remains challenging. Here, we develop a novel algorithm, named Fastcov, to identify multiple correlated changes in biological sequences using an independent pair model followed by a tandem model of site-residue elements based on inter-restriction thinking. Fastcov performed exceptionally well at harvesting co-pairs and detecting multiple covariant patterns. By 10-fold cross-validation using datasets of different scales, the characteristic patterns successfully classified the sequences into target groups with an accuracy of greater than 98%. Moreover, we demonstrated that the multiple covariant patterns represent co-evolutionary modes corresponding to the phylogenetic tree, and provide a new understanding of protein structural stability. In contrast to other methods, Fastcov provides not only a reliable and effective approach to identify covariant pairs but also more powerful functions, including multiple covariance detection and sequence classification, that are most useful for studying the point and compensatory mutations caused by natural selection, drug induction, environmental pressure, etc. PMID:27451921
A Neural-Network Clustering-Based Algorithm for Privacy Preserving Data Mining
NASA Astrophysics Data System (ADS)
Tsiafoulis, S.; Zorkadis, V. C.; Karras, D. A.
The increasing use of fast and efficient data mining algorithms in huge collections of personal data, facilitated through the exponential growth of technology, in particular in the field of electronic data storage media and processing power, has raised serious ethical, philosophical and legal issues related to privacy protection. To cope with these concerns, several privacy preserving methodologies have been proposed, classified in two categories, methodologies that aim at protecting the sensitive data and those that aim at protecting the mining results. In our work, we focus on sensitive data protection and compare existing techniques according to their anonymity degree achieved, the information loss suffered and their performance characteristics. The ℓ-diversity principle is combined with k-anonymity concepts, so that background information can not be exploited to successfully attack the privacy of data subjects data refer to. Based on Kohonen Self Organizing Feature Maps (SOMs), we firstly organize data sets in subspaces according to their information theoretical distance to each other, then create the most relevant classes paying special attention to rare sensitive attribute values, and finally generalize attribute values to the minimum extend required so that both the data disclosure probability and the information loss are possibly kept negligible. Furthermore, we propose information theoretical measures for assessing the anonymity degree achieved and empirical tests to demonstrate it.
A new method based on Dempster-Shafer theory and fuzzy c-means for brain MRI segmentation
NASA Astrophysics Data System (ADS)
Liu, Jie; Lu, Xi; Li, Yunpeng; Chen, Xiaowu; Deng, Yong
2015-10-01
In this paper, a new method is proposed to decrease sensitiveness to motion noise and uncertainty in magnetic resonance imaging (MRI) segmentation especially when only one brain image is available. The method is approached with considering spatial neighborhood information by fusing the information of pixels with their neighbors with Dempster-Shafer (DS) theory. The basic probability assignment (BPA) of each single hypothesis is obtained from the membership function of applying fuzzy c-means (FCM) clustering to the gray levels of the MRI. Then multiple hypotheses are generated according to the single hypothesis. Then we update the objective pixel’s BPA by fusing the BPA of the objective pixel and those of its neighbors to get the final result. Some examples in MRI segmentation are demonstrated at the end of the paper, in which our method is compared with some previous methods. The results show that the proposed method is more effective than other methods in motion-blurred MRI segmentation.
Risk Mapping of Cutaneous Leishmaniasis via a Fuzzy C Means-based Neuro-Fuzzy Inference System
NASA Astrophysics Data System (ADS)
Akhavan, P.; Karimi, M.; Pahlavani, P.
2014-10-01
Finding pathogenic factors and how they are spread in the environment has become a global demand, recently. Cutaneous Leishmaniasis (CL) created by Leishmania is a special parasitic disease which can be passed on to human through phlebotomus of vector-born. Studies show that economic situation, cultural issues, as well as environmental and ecological conditions can affect the prevalence of this disease. In this study, Data Mining is utilized in order to predict CL prevalence rate and obtain a risk map. This case is based on effective environmental parameters on CL and a Neuro-Fuzzy system was also used. Learning capacity of Neuro-Fuzzy systems in neural network on one hand and reasoning power of fuzzy systems on the other, make it very efficient to use. In this research, in order to predict CL prevalence rate, an adaptive Neuro-fuzzy inference system with fuzzy inference structure of fuzzy C Means clustering was applied to determine the initial membership functions. Regarding to high incidence of CL in Ilam province, counties of Ilam, Mehran, and Dehloran have been examined and evaluated. The CL prevalence rate was predicted in 2012 by providing effective environmental map and topography properties including temperature, moisture, annual, rainfall, vegetation and elevation. Results indicate that the model precision with fuzzy C Means clustering structure rises acceptable RMSE values of both training and checking data and support our analyses. Using the proposed data mining technology, the pattern of disease spatial distribution and vulnerable areas become identifiable and the map can be used by experts and decision makers of public health as a useful tool in management and optimal decision-making.
Polat, Kemal
2012-08-01
In this paper, attribute weighting method based on the cluster centers with aim of increasing the discrimination between classes has been proposed and applied to nonlinear separable datasets including two medical datasets (mammographic mass dataset and bupa liver disorders dataset) and 2-D spiral dataset. The goals of this method are to gather the data points near to cluster center all together to transform from nonlinear separable datasets to linear separable dataset. As clustering algorithm, k-means clustering, fuzzy c-means clustering, and subtractive clustering have been used. The proposed attribute weighting methods are k-means clustering based attribute weighting (KMCBAW), fuzzy c-means clustering based attribute weighting (FCMCBAW), and subtractive clustering based attribute weighting (SCBAW) and used prior to classifier algorithms including C4.5 decision tree and adaptive neuro-fuzzy inference system (ANFIS). To evaluate the proposed method, the recall, precision value, true negative rate (TNR), G-mean1, G-mean2, f-measure, and classification accuracy have been used. The results have shown that the best attribute weighting method was the subtractive clustering based attribute weighting with respect to classification performance in the classification of three used datasets. PMID:21611787
An automatic image inpainting algorithm based on FCM.
Liu, Jiansheng; Liu, Hui; Qiao, Shangping; Yue, Guangxue
2014-01-01
There are many existing image inpainting algorithms in which the repaired area should be manually determined by users. Aiming at this drawback of the traditional image inpainting algorithms, this paper proposes an automatic image inpainting algorithm which automatically identifies the repaired area by fuzzy C-mean (FCM) algorithm. FCM algorithm classifies the image pixels into a number of categories according to the similarity principle, making the similar pixels clustering into the same category as possible. According to the provided gray value of the pixels to be inpainted, we calculate the category whose distance is the nearest to the inpainting area and this category is to be inpainting area, and then the inpainting area is restored by the TV model to realize image automatic inpainting. PMID:24516358
A graph-based watershed merging using fuzzy C-means and simulated annealing for image segmentation
NASA Astrophysics Data System (ADS)
Vadiveloo, Mogana; Abdullah, Rosni; Rajeswari, Mandava
2015-12-01
In this paper, we have addressed the issue of over-segmented regions produced in watershed by merging the regions using global feature. The global feature information is obtained from clustering the image in its feature space using Fuzzy C-Means (FCM) clustering. The over-segmented regions produced by performing watershed on the gradient of the image are then mapped to this global information in the feature space. Further to this, the global feature information is optimized using Simulated Annealing (SA). The optimal global feature information is used to derive the similarity criterion to merge the over-segmented watershed regions which are represented by the region adjacency graph (RAG). The proposed method has been tested on digital brain phantom simulated dataset to segment white matter (WM), gray matter (GM) and cerebrospinal fluid (CSF) soft tissues regions. The experiments showed that the proposed method performs statistically better, with average of 95.242% regions are merged, than the immersion watershed and average accuracy improvement of 8.850% in comparison with RAG-based immersion watershed merging using global and local features.
Tsuchiya, Mariko; Amano, Kojiro; Abe, Masaya; Seki, Misato; Hase, Sumitaka; Sato, Kengo; Sakakibara, Yasubumi
2016-01-01
Motivation: Deep sequencing of the transcripts of regulatory non-coding RNA generates footprints of post-transcriptional processes. After obtaining sequence reads, the short reads are mapped to a reference genome, and specific mapping patterns can be detected called read mapping profiles, which are distinct from random non-functional degradation patterns. These patterns reflect the maturation processes that lead to the production of shorter RNA sequences. Recent next-generation sequencing studies have revealed not only the typical maturation process of miRNAs but also the various processing mechanisms of small RNAs derived from tRNAs and snoRNAs. Results: We developed an algorithm termed SHARAKU to align two read mapping profiles of next-generation sequencing outputs for non-coding RNAs. In contrast with previous work, SHARAKU incorporates the primary and secondary sequence structures into an alignment of read mapping profiles to allow for the detection of common processing patterns. Using a benchmark simulated dataset, SHARAKU exhibited superior performance to previous methods for correctly clustering the read mapping profiles with respect to 5′-end processing and 3′-end processing from degradation patterns and in detecting similar processing patterns in deriving the shorter RNAs. Further, using experimental data of small RNA sequencing for the common marmoset brain, SHARAKU succeeded in identifying the significant clusters of read mapping profiles for similar processing patterns of small derived RNA families expressed in the brain. Availability and Implementation: The source code of our program SHARAKU is available at http://www.dna.bio.keio.ac.jp/sharaku/, and the simulated dataset used in this work is available at the same link. Accession code: The sequence data from the whole RNA transcripts in the hippocampus of the left brain used in this work is available from the DNA DataBank of Japan (DDBJ) Sequence Read Archive (DRA) under the accession number DRA
Winter, André; Rieger, Heiko; Vojta, Matthias; Bulla, Ralf
2009-01-23
A continuous time cluster algorithm for two-level systems coupled to a dissipative bosonic bath is presented and applied to the sub-Ohmic spin-boson model. When the power s of the spectral function Jomega proportional, variant omegas is smaller than 1/2, the critical exponents are found to be classical, mean-field like. Potential sources for the discrepancy with recent renormalization group predictions are traced back to the effect of a dangerously irrelevant variable. PMID:19257337
NASA Technical Reports Server (NTRS)
Werth, L. F. (Principal Investigator)
1981-01-01
Both the iterative self-organizing clustering system (ISOCLS) and the CLASSY algorithms were applied to forest and nonforest classes for one 1:24,000 quadrangle map of northern Idaho and the classification and mapping accuracies were evaluated with 1:30,000 color infrared aerial photography. Confusion matrices for the two clustering algorithms were generated and studied to determine which is most applicable to forest and rangeland inventories in future projects. In an unsupervised mode, ISOCLS requires many trial-and-error runs to find the proper parameters to separate desired information classes. CLASSY tells more in a single run concerning the classes that can be separated, shows more promise for forest stratification than ISOCLS, and shows more promise for consistency. One major drawback to CLASSY is that important forest and range classes that are smaller than a minimum cluster size will be combined with other classes. The algorithm requires so much computer storage that only data sets as small as a quadrangle can be used at one time.
Ghosh, Anupam; Dhara, Bibhas Chandra; De, Rajat K
2013-04-01
In this article, we compare the performance of 19 cluster validity indices, in identifying some possible genes mediating certain cancers, based on gene expression data. For the purpose of this comparison, we have developed a method. The proposed method involves cluster generation, selection of the best k-value or c-values, cluster identification, identifying the altered gene cluster, scoring an altered gene cluster and determining the best k-value or c-value exploring through biological repositories. The effectiveness of the method has been demonstrated on three gene expression data sets dealing with human lung cancer, colon cancer, and leukemia. Here, we have used three clustering algorithms, i.e., k-means, PAM and fuzzy c-means. We have used biochemical pathways related to these cancers and p-value statistics for validating the study. PMID:27481591
Sensitivity evaluation of dynamic speckle activity measurements using clustering methods
Etchepareborda, Pablo; Federico, Alejandro; Kaufmann, Guillermo H.
2010-07-01
We evaluate and compare the use of competitive neural networks, self-organizing maps, the expectation-maximization algorithm, K-means, and fuzzy C-means techniques as partitional clustering methods, when the sensitivity of the activity measurement of dynamic speckle images needs to be improved. The temporal history of the acquired intensity generated by each pixel is analyzed in a wavelet decomposition framework, and it is shown that the mean energy of its corresponding wavelet coefficients provides a suited feature space for clustering purposes. The sensitivity obtained by using the evaluated clustering techniques is also compared with the well-known methods of Konishi-Fujii, weighted generalized differences, and wavelet entropy. The performance of the partitional clustering approach is evaluated using simulated dynamic speckle patterns and also experimental data.
NASA Astrophysics Data System (ADS)
Nandy, Subhajit; Chaudhury, Pinaki; Bhattacharyya, S. P.
2010-06-01
We present a genetic algorithm based investigation of structural fragmentation in dicationic noble gas clusters, Arn+2, Krn+2, and Xen+2, where n denotes the size of the cluster. Dications are predicted to be stable above a threshold size of the cluster when positive charges are assumed to remain localized on two noble gas atoms and the Lennard-Jones potential along with bare Coulomb and ion-induced dipole interactions are taken into account for describing the potential energy surface. Our cutoff values are close to those obtained experimentally [P. Scheier and T. D. Mark, J. Chem. Phys. 11, 3056 (1987)] and theoretically [J. G. Gay and B. J. Berne, Phys. Rev. Lett. 49, 194 (1982)]. When the charges are allowed to be equally distributed over four noble gas atoms in the cluster and the nonpolarization interaction terms are allowed to remain unchanged, our method successfully identifies the size threshold for stability as well as the nature of the channels of dissociation as function of cluster size. In Arn2+, for example, fissionlike fragmentation is predicted for n =55 while for n =43, the predicted outcome is nonfission fragmentation in complete agreement with earlier work [Golberg et al., J. Chem. Phys. 100, 8277 (1994)].
Ganguly Neogi, Soumya; Chaudhury, Pinaki
2014-01-01
In this article, we propose a stochastic search-based method, namely genetic algorithm (GA) and simulated annealing (SA) in conjunction with density functional theory (DFT) to evaluate global and local minimum structures of (TiO2)n clusters with n = 1-12. Once the structures are established, we evaluate the infrared spectroscopic modes, cluster formation energy, vertical excitation energy, vertical ionization potential, vertical electron affinity, highest occupied molecular orbital (HOMO)-lowest unoccupied molecular orbital (LUMO) gaps, and so forth. We show that an initial determination of structure using stochastic techniques (GA/SA), also popularly known as natural algorithms as their working principle mimics certain natural processes, and following it up with density functional calculations lead to high-quality structures for these systems. We have shown that the clusters tend to form three-dimensional networks. We compare our results with the available experimental and theoretical results. The results obtained from SA/GA-DFT technique agree well with available theoretical and experimental data of literature. PMID:24272539
Wu, Shandong; Weinstein, Susan P.; Conant, Emily F.; Kontos, Despina
2013-12-15
Purpose: Breast magnetic resonance imaging (MRI) plays an important role in the clinical management of breast cancer. Studies suggest that the relative amount of fibroglandular (i.e., dense) tissue in the breast as quantified in MR images can be predictive of the risk for developing breast cancer, especially for high-risk women. Automated segmentation of the fibroglandular tissue and volumetric density estimation in breast MRI could therefore be useful for breast cancer risk assessment. Methods: In this work the authors develop and validate a fully automated segmentation algorithm, namely, an atlas-aided fuzzy C-means (FCM-Atlas) method, to estimate the volumetric amount of fibroglandular tissue in breast MRI. The FCM-Atlas is a 2D segmentation method working on a slice-by-slice basis. FCM clustering is first applied to the intensity space of each 2D MR slice to produce an initial voxelwise likelihood map of fibroglandular tissue. Then a prior learned fibroglandular tissue likelihood atlas is incorporated to refine the initial FCM likelihood map to achieve enhanced segmentation, from which the absolute volume of the fibroglandular tissue (|FGT|) and the relative amount (i.e., percentage) of the |FGT| relative to the whole breast volume (FGT%) are computed. The authors' method is evaluated by a representative dataset of 60 3D bilateral breast MRI scans (120 breasts) that span the full breast density range of the American College of Radiology Breast Imaging Reporting and Data System. The automated segmentation is compared to manual segmentation obtained by two experienced breast imaging radiologists. Segmentation performance is assessed by linear regression, Pearson's correlation coefficients, Student's pairedt-test, and Dice's similarity coefficients (DSC). Results: The inter-reader correlation is 0.97 for FGT% and 0.95 for |FGT|. When compared to the average of the two readers’ manual segmentation, the proposed FCM-Atlas method achieves a correlation ofr = 0
Wu, Shandong; Weinstein, Susan P.; Conant, Emily F.; Kontos, Despina
2013-01-01
Purpose: Breast magnetic resonance imaging (MRI) plays an important role in the clinical management of breast cancer. Studies suggest that the relative amount of fibroglandular (i.e., dense) tissue in the breast as quantified in MR images can be predictive of the risk for developing breast cancer, especially for high-risk women. Automated segmentation of the fibroglandular tissue and volumetric density estimation in breast MRI could therefore be useful for breast cancer risk assessment. Methods: In this work the authors develop and validate a fully automated segmentation algorithm, namely, an atlas-aided fuzzy C-means (FCM-Atlas) method, to estimate the volumetric amount of fibroglandular tissue in breast MRI. The FCM-Atlas is a 2D segmentation method working on a slice-by-slice basis. FCM clustering is first applied to the intensity space of each 2D MR slice to produce an initial voxelwise likelihood map of fibroglandular tissue. Then a prior learned fibroglandular tissue likelihood atlas is incorporated to refine the initial FCM likelihood map to achieve enhanced segmentation, from which the absolute volume of the fibroglandular tissue (|FGT|) and the relative amount (i.e., percentage) of the |FGT| relative to the whole breast volume (FGT%) are computed. The authors' method is evaluated by a representative dataset of 60 3D bilateral breast MRI scans (120 breasts) that span the full breast density range of the American College of Radiology Breast Imaging Reporting and Data System. The automated segmentation is compared to manual segmentation obtained by two experienced breast imaging radiologists. Segmentation performance is assessed by linear regression, Pearson's correlation coefficients, Student's paired t-test, and Dice's similarity coefficients (DSC). Results: The inter-reader correlation is 0.97 for FGT% and 0.95 for |FGT|. When compared to the average of the two readers’ manual segmentation, the proposed FCM-Atlas method achieves a correlation of r = 0
Luo, Jiawei; Wu, Juan
2015-01-01
Essential proteins provide valuable information for the development of biology and medical research from the system level. The accuracy of topological centrality only based methods is deeply affected by noise in the network. Therefore, exploring efficient methods for identifying essential proteins would be of great value. Using biological features to identify essential proteins is efficient in reducing the noise in PPI network. In this paper, based on the consideration that essential proteins evolve slowly and play a central role within a network, a new algorithm, named CED, is proposed. CED mainly employs gene expression level, protein complex information and edge clustering coefficient to predict essential proteins. The performance of CED is validated based on the yeast Protein-Protein Interaction (PPI) network obtained from DIP database and BioGRID database. The prediction accuracy of CED outperforms other seven algorithms when applied to the two databases. PMID:26510286
Not Available
1994-02-02
This report consists of three separate but related reports. They are (1) Human Resource Development, (2) Carbon-based Structural Materials Research Cluster, and (3) Data Parallel Algorithms for Scientific Computing. To meet the objectives of the Human Resource Development plan, the plan includes K--12 enrichment activities, undergraduate research opportunities for students at the state`s two Historically Black Colleges and Universities, graduate research through cluster assistantships and through a traineeship program targeted specifically to minorities, women and the disabled, and faculty development through participation in research clusters. One research cluster is the chemistry and physics of carbon-based materials. The objective of this cluster is to develop a self-sustaining group of researchers in carbon-based materials research within the institutions of higher education in the state of West Virginia. The projects will involve analysis of cokes, graphites and other carbons in order to understand the properties that provide desirable structural characteristics including resistance to oxidation, levels of anisotropy and structural characteristics of the carbons themselves. In the proposed cluster on parallel algorithms, research by four WVU faculty and three state liberal arts college faculty are: (1) modeling of self-organized critical systems by cellular automata; (2) multiprefix algorithms and fat-free embeddings; (3) offline and online partitioning of data computation; and (4) manipulating and rendering three dimensional objects. This cluster furthers the state Experimental Program to Stimulate Competitive Research plan by building on existing strengths at WVU in parallel algorithms.
Shenvi, Neil; van Aggelen, Helen; Yang, Yang; Yang, Weitao; Schwerdtfeger, Christine; Mazziotti, David
2013-08-01
Tensor hypercontraction is a method that allows the representation of a high-rank tensor as a product of lower-rank tensors. In this paper, we show how tensor hypercontraction can be applied to both the electron repulsion integral tensor and the two-particle excitation amplitudes used in the parametric 2-electron reduced density matrix (p2RDM) algorithm. Because only O(r) auxiliary functions are needed in both of these approximations, our overall algorithm can be shown to scale as O(r(4)), where r is the number of single-particle basis functions. We apply our algorithm to several small molecules, hydrogen chains, and alkanes to demonstrate its low formal scaling and practical utility. Provided we use enough auxiliary functions, we obtain accuracy similar to that of the standard p2RDM algorithm, somewhere between that of CCSD and CCSD(T). PMID:23927246
NASA Astrophysics Data System (ADS)
Myszkiewicz, Grzegorz; Meerts, W. Leo; Ratzer, Christian; Schmitt, Michael
2005-07-01
The structure of 4-methylphenol (p-cresol) and its binary water cluster have been elucidated by rotationally resolved laser-induced fluorescence spectroscopy. The electronic origins of the monomer and the cluster are split into four sub-bands by the internal rotation of the methyl group and of the hydroxy group in case of the monomer, and the water moiety in case of the cluster. From the rotational constants of the monomer the structure in the S1 state could be determined to be distorted quinoidally. The structure of the p-cresol-water cluster is determined to be trans linear, with a O-O hydrogen bond length of 290pm in the electronic ground state and of 285pm in the electronically excited state. The S1-state lifetime of p-cresol, p-cresol-d1, and the binary water cluster have been determined to be 1.6, 9.7, and 3.8ns, respectively.
Cazade, Pierre-André; Berezovska, Ganna; Meuwly, Markus; Zheng, Wenwei; Clementi, Cecilia; Prada-Gracia, Diego; Rao, Francesco
2015-01-14
The ligand migration network for O{sub 2}–diffusion in truncated Hemoglobin N is analyzed based on three different clustering schemes. For coordinate-based clustering, the conventional k–means and the kinetics-based Markov Clustering (MCL) methods are employed, whereas the locally scaled diffusion map (LSDMap) method is a collective-variable-based approach. It is found that all three methods agree well in their geometrical definition of the most important docking site, and all experimentally known docking sites are recovered by all three methods. Also, for most of the states, their population coincides quite favourably, whereas the kinetics of and between the states differs. One of the major differences between k–means and MCL clustering on the one hand and LSDMap on the other is that the latter finds one large primary cluster containing the Xe1a, IS1, and ENT states. This is related to the fact that the motion within the state occurs on similar time scales, whereas structurally the state is found to be quite diverse. In agreement with previous explicit atomistic simulations, the Xe3 pocket is found to be a highly dynamical site which points to its potential role as a hub in the network. This is also highlighted in the fact that LSDMap cannot identify this state. First passage time distributions from MCL clusterings using a one- (ligand-position) and two-dimensional (ligand-position and protein-structure) descriptor suggest that ligand- and protein-motions are coupled. The benefits and drawbacks of the three methods are discussed in a comparative fashion and highlight that depending on the questions at hand the best-performing method for a particular data set may differ.
Zare Hosseini, Zeinab; Mohammadzadeh, Mahdi
2016-01-01
The rapid growing of information technology (IT) motivates and makes competitive advantages in health care industry. Nowadays, many hospitals try to build a successful customer relationship management (CRM) to recognize target and potential patients, increase patient loyalty and satisfaction and finally maximize their profitability. Many hospitals have large data warehouses containing customer demographic and transactions information. Data mining techniques can be used to analyze this data and discover hidden knowledge of customers. This research develops an extended RFM model, namely RFML (added parameter: Length) based on health care services for a public sector hospital in Iran with the idea that there is contrast between patient and customer loyalty, to estimate customer life time value (CLV) for each patient. We used Two-step and K-means algorithms as clustering methods and Decision tree (CHAID) as classification technique to segment the patients to find out target, potential and loyal customers in order to implement strengthen CRM. Two approaches are used for classification: first, the result of clustering is considered as Decision attribute in classification process and second, the result of segmentation based on CLV value of patients (estimated by RFML) is considered as Decision attribute. Finally the results of CHAID algorithm show the significant hidden rules and identify existing patterns of hospital consumers. PMID:27610177
Naim, Iftekhar; Datta, Suprakash; Rebhahn, Jonathan; Cavenaugh, James S; Mosmann, Tim R; Sharma, Gaurav
2014-05-01
We present a model-based clustering method, SWIFT (Scalable Weighted Iterative Flow-clustering Technique), for digesting high-dimensional large-sized datasets obtained via modern flow cytometry into more compact representations that are well-suited for further automated or manual analysis. Key attributes of the method include the following: (a) the analysis is conducted in the multidimensional space retaining the semantics of the data, (b) an iterative weighted sampling procedure is utilized to maintain modest computational complexity and to retain discrimination of extremely small subpopulations (hundreds of cells from datasets containing tens of millions), and (c) a splitting and merging procedure is incorporated in the algorithm to preserve distinguishability between biologically distinct populations, while still providing a significant compaction relative to the original data. This article presents a detailed algorithmic description of SWIFT, outlining the application-driven motivations for the different design choices, a discussion of computational complexity of the different steps, and results obtained with SWIFT for synthetic data and relatively simple experimental data that allow validation of the desirable attributes. A companion paper (Part 2) highlights the use of SWIFT, in combination with additional computational tools, for more challenging biological problems. PMID:24677621
Zare Hosseini, Zeinab; Mohammadzadeh, Mahdi
2016-01-01
The rapid growing of information technology (IT) motivates and makes competitive advantages in health care industry. Nowadays, many hospitals try to build a successful customer relationship management (CRM) to recognize target and potential patients, increase patient loyalty and satisfaction and finally maximize their profitability. Many hospitals have large data warehouses containing customer demographic and transactions information. Data mining techniques can be used to analyze this data and discover hidden knowledge of customers. This research develops an extended RFM model, namely RFML (added parameter: Length) based on health care services for a public sector hospital in Iran with the idea that there is contrast between patient and customer loyalty, to estimate customer life time value (CLV) for each patient. We used Two-step and K-means algorithms as clustering methods and Decision tree (CHAID) as classification technique to segment the patients to find out target, potential and loyal customers in order to implement strengthen CRM. Two approaches are used for classification: first, the result of clustering is considered as Decision attribute in classification process and second, the result of segmentation based on CLV value of patients (estimated by RFML) is considered as Decision attribute. Finally the results of CHAID algorithm show the significant hidden rules and identify existing patterns of hospital consumers. PMID:27610177
Srivastava, Subodh; Sharma, Neeraj; Singh, S K; Srivastava, R
2014-07-01
In this paper, a combined approach for enhancement and segmentation of mammograms is proposed. In preprocessing stage, a contrast limited adaptive histogram equalization (CLAHE) method is applied to obtain the better contrast mammograms. After this, the proposed combined methods are applied. In the first step of the proposed approach, a two dimensional (2D) discrete wavelet transform (DWT) is applied to all the input images. In the second step, a proposed nonlinear complex diffusion based unsharp masking and crispening method is applied on the approximation coefficients of the wavelet transformed images to further highlight the abnormalities such as micro-calcifications, tumours, etc., to reduce the false positives (FPs). Thirdly, a modified fuzzy c-means (FCM) segmentation method is applied on the output of the second step. In the modified FCM method, the mutual information is proposed as a similarity measure in place of conventional Euclidian distance based dissimilarity measure for FCM segmentation. Finally, the inverse 2D-DWT is applied. The efficacy of the proposed unsharp masking and crispening method for image enhancement is evaluated in terms of signal-to-noise ratio (SNR) and that of the proposed segmentation method is evaluated in terms of random index (RI), global consistency error (GCE), and variation of information (VoI). The performance of the proposed segmentation approach is compared with the other commonly used segmentation approaches such as Otsu's thresholding, texture based, k-means, and FCM clustering as well as thresholding. From the obtained results, it is observed that the proposed segmentation approach performs better and takes lesser processing time in comparison to the standard FCM and other segmentation methods in consideration. PMID:25190996
Baldauf, Tobias; Smith, Robert E.; Seljak, Uros; Mandelbaum, Rachel
2010-03-15
The clustering of matter on cosmological scales is an essential probe for studying the physical origin and composition of our Universe. To date, most of the direct studies have focused on shear-shear weak lensing correlations, but it is also possible to extract the dark matter clustering by combining galaxy-clustering and galaxy-galaxy-lensing measurements. In order to extract the required information, one must relate the observable galaxy distribution to the underlying dark matter distribution. In this study we develop in detail a method that can constrain the dark matter correlation function from galaxy clustering and galaxy-galaxy-lensing measurements, by focusing on the correlation coefficient between the galaxy and matter overdensity fields. Our goal is to develop an estimator that maximally correlates the two. To generate a mock galaxy catalogue for testing purposes, we use the halo occupation distribution approach applied to a large ensemble of N-body simulations to model preexisting SDSS luminous red galaxy sample observations. Using this mock catalogue, we show that a direct comparison between the excess surface mass density measured by lensing and its corresponding galaxy clustering quantity is not optimal. We develop a new statistic that suppresses the small-scale contributions to these observations and show that this new statistic leads to a cross-correlation coefficient that is within a few percent of unity down to 5h{sup -1} Mpc. Furthermore, the residual incoherence between the galaxy and matter fields can be explained using a theoretical model for scale-dependent galaxy bias, giving us a final estimator that is unbiased to within 1%, so that we can reconstruct the dark matter clustering power spectrum at this accuracy up to k{approx}1h Mpc{sup -1}. We also perform a comprehensive study of other physical effects that can affect the analysis, such as redshift space distortions and differences in radial windows between galaxy clustering and weak
Information Clustering Based on Fuzzy Multisets.
ERIC Educational Resources Information Center
Miyamoto, Sadaaki
2003-01-01
Proposes a fuzzy multiset model for information clustering with application to information retrieval on the World Wide Web. Highlights include search engines; term clustering; document clustering; algorithms for calculating cluster centers; theoretical properties concerning clustering algorithms; and examples to show how the algorithms work.…
Muster: Massively Scalable Clustering
2010-05-20
Muster is a framework for scalable cluster analysis. It includes implementations of classic K-Medoids partitioning algorithms, as well as infrastructure for making these algorithms run scalably on very large systems. In particular, Muster contains algorithms such as CAPEK (described in reference 1) that are capable of clustering highly distributed data sets in-place on a hundred thousand or more processes.
Krejci, Adam; Hupp, Ted R.; Lexa, Matej; Vojtesek, Borivoj; Muller, Petr
2016-01-01
Motivation: Proteins often recognize their interaction partners on the basis of short linear motifs located in disordered regions on proteins’ surface. Experimental techniques that study such motifs use short peptides to mimic the structural properties of interacting proteins. Continued development of these methods allows for large-scale screening, resulting in vast amounts of peptide sequences, potentially containing information on multiple protein-protein interactions. Processing of such datasets is a complex but essential task for large-scale studies investigating protein-protein interactions. Results: The software tool presented in this article is able to rapidly identify multiple clusters of sequences carrying shared specificity motifs in massive datasets from various sources and generate multiple sequence alignments of identified clusters. The method was applied on a previously published smaller dataset containing distinct classes of ligands for SH3 domains, as well as on a new, an order of magnitude larger dataset containing epitopes for several monoclonal antibodies. The software successfully identified clusters of sequences mimicking epitopes of antibody targets, as well as secondary clusters revealing that the antibodies accept some deviations from original epitope sequences. Another test indicates that processing of even much larger datasets is computationally feasible. Availability and implementation: Hammock is published under GNU GPL v. 3 license and is freely available as a standalone program (from http://www.recamo.cz/en/software/hammock-cluster-peptides/) or as a tool for the Galaxy toolbox (from https://toolshed.g2.bx.psu.edu/view/hammock/hammock). The source code can be downloaded from https://github.com/hammock-dev/hammock/releases. Contact: muller@mou.cz Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26342231
Adaptive fuzzy leader clustering of complex data sets in pattern recognition
NASA Technical Reports Server (NTRS)
Newton, Scott C.; Pemmaraju, Surya; Mitra, Sunanda
1992-01-01
A modular, unsupervised neural network architecture for clustering and classification of complex data sets is presented. The adaptive fuzzy leader clustering (AFLC) architecture is a hybrid neural-fuzzy system that learns on-line in a stable and efficient manner. The initial classification is performed in two stages: a simple competitive stage and a distance metric comparison stage. The cluster prototypes are then incrementally updated by relocating the centroid positions from fuzzy C-means system equations for the centroids and the membership values. The AFLC algorithm is applied to the Anderson Iris data and laser-luminescent fingerprint image data. It is concluded that the AFLC algorithm successfully classifies features extracted from real data, discrete or continuous.
[Cluster analysis in biomedical researches].
Akopov, A S; Moskovtsev, A A; Dolenko, S A; Savina, G D
2013-01-01
Cluster analysis is one of the most popular methods for the analysis of multi-parameter data. The cluster analysis reveals the internal structure of the data, group the separate observations on the degree of their similarity. The review provides a definition of the basic concepts of cluster analysis, and discusses the most popular clustering algorithms: k-means, hierarchical algorithms, Kohonen networks algorithms. Examples are the use of these algorithms in biomedical research. PMID:24640781
Bio Inspired Swarm Algorithm for Tumor Detection in Digital Mammogram
NASA Astrophysics Data System (ADS)
Dheeba, J.; Selvi, Tamil
Microcalcification clusters in mammograms is the significant early sign of breast cancer. Individual clusters are difficult to detect and hence an automatic computer aided mechanism will help the radiologist in detecting the microcalcification clusters in an easy and efficient way. This paper presents a new classification approach for detection of microcalcification in digital mammogram using particle swarm optimization algorithm (PSO) based clustering technique. Fuzzy C-means clustering technique, well defined for clustering data sets are used in combination with the PSO. We adopt the particle swarm optimization to search the cluster center in the arbitrary data set automatically. PSO can search the best solution from the probability option of the Social-only model and Cognition-only model. This method is quite simple and valid, and it can avoid the minimum local value. The proposed classification approach is applied to a database of 322 dense mammographic images, originating from the MIAS database. Results shows that the proposed PSO-FCM approach gives better detection performance compared to conventional approaches.
NASA Astrophysics Data System (ADS)
Feng, Jian-xin; Tang, Jia-fu; Wang, Guang-xing
2007-04-01
On the basis of the analysis of clustering algorithm that had been proposed for MANET, a novel clustering strategy was proposed in this paper. With the trust defined by statistical hypothesis in probability theory and the cluster head selected by node trust and node mobility, this strategy can realize the function of the malicious nodes detection which was neglected by other clustering algorithms and overcome the deficiency of being incapable of implementing the relative mobility metric of corresponding nodes in the MOBIC algorithm caused by the fact that the receiving power of two consecutive HELLO packet cannot be measured. It's an effective solution to cluster MANET securely.
Ghorbanzadeh, Leila; Torshabi, Ahmad Esmaili; Nabipour, Jamshid Soltani; Arbatan, Moslem Ahmadi
2016-04-01
In image guided radiotherapy, in order to reach a prescribed uniform dose in dynamic tumors at thorax region while minimizing the amount of additional dose received by the surrounding healthy tissues, tumor motion must be tracked in real-time. Several correlation models have been proposed in recent years to provide tumor position information as a function of time in radiotherapy with external surrogates. However, developing an accurate correlation model is still a challenge. In this study, we proposed an adaptive neuro-fuzzy based correlation model that employs several data clustering algorithms for antecedent parameters construction to avoid over-fitting and to achieve an appropriate performance in tumor motion tracking compared with the conventional models. To begin, a comparative assessment is done between seven nuero-fuzzy correlation models each constructed using a unique data clustering algorithm. Then, each of the constructed models are combined within an adaptive sevenfold synthetic model since our tumor motion database has high degrees of variability and that each model has its intrinsic properties at motion tracking. In the proposed sevenfold synthetic model, best model is selected adaptively at pre-treatment. The model also updates the steps for each patient using an automatic model selectivity subroutine. We tested the efficacy of the proposed synthetic model on twenty patients (divided equally into two control and worst groups) treated with CyberKnife synchrony system. Compared to Cyberknife model, the proposed synthetic model resulted in 61.2% and 49.3% reduction in tumor tracking error in worst and control group, respectively. These results suggest that the proposed model selection program in our synthetic neuro-fuzzy model can significantly reduce tumor tracking errors. Numerical assessments confirmed that the proposed synthetic model is able to track tumor motion in real time with high accuracy during treatment. PMID:25765021
Naganawa, Mika; Kimura, Yuichi; Yano, Junichi; Mishina, Masahiro; Yanagisawa, Masao; Ishii, Kenji; Oda, Keiichi; Ishiwata, Kiichi
2008-03-01
The Logan plot is a powerful algorithm used to generate binding-potential images from dynamic positron emission tomography (PET) images in neuroreceptor studies. However, it requires arterial blood sampling and metabolite correction to provide an input function, and clinically it is preferable that this need for arterial blood sampling be obviated. Estimation of the input function with metabolite correction using an intersectional searching algorithm (ISA) has been proposed. The ISA seeks the input function from the intersection between the planes spanned by measured radioactivity curves in tissue and their cumulative integrals in data space. However, the ISA is sensitive to noise included in measured curves, and it often fails to estimate the input function. In this paper, we propose a robust estimation of the cumulative integral of the plasma time-activity curve (pTAC) using ISA (robust EPISA) to overcome noise issues. The EPISA reduces noise in the measured PET data using averaging and clustering that gathers radioactivity curves with similar kinetic parameters. We confirmed that a little noise made the estimation of the input function extremely difficult in the simulation. The robust EPISA was validated by application to eight real dynamic [(11)C]TMSX PET data sets used to visualize adenosine A(2A) receptors and four real dynamic [(11)C]PIB PET data sets used to visualize amyloid-beta plaque. Peripherally, the latter showed faster metabolism than the former. The clustering operation improved the signal-to-noise ratio for the PET data sufficiently to estimate the input function, and the calculated neuroreceptor images had a quality equivalent to that using measured pTACs after metabolite correction. Our proposed method noninvasively yields an alternative input function for Logan plots, allowing the Logan plot to be more useful in neuroreceptor studies. PMID:18187345
Discovering fuzzy clusters in databases using an evolutionary approach
NASA Astrophysics Data System (ADS)
Chung, Lewis L. H.; Chan, Keith C. C.; Leung, Henry
2000-04-01
In this paper, we present a fuzzy clustering technique for relational database for data mining task. Clustering task for data mining application can be performed more effective if the technique is able to handle both continuous- and discrete- valued data commonly found in real-life relational databases. However, many of fuzzy clustering techniques such as fuzzy c- means are developed only for continuous-valued data due to their distance measure defined in the Euclidean space. When attributes are also characterized by discrete-valued attribute, they are unable to perform their task. Besides, how to deal with fuzzy input data in addition to mixed continuous and discrete is not clearly discussed. Instead of using a distance measure for defining similarity between records, we propose a technique based on a genetic algorithm (GA). By representing a specific grouping of records in a chromosome and using an objective measure as a fitness measure to determine if such grouping is meaningful and interesting, our technique is able to handle continuous, discrete, and even fuzzy input data. Unlike many of the existing clustering techniques, which can only produce the result of grouping with no interpretation, our proposed algorithm is able to generate a set of rules describing the interestingness of the discovered clusters. This feature, in turn, eases the understandability of the discovered result.
Kidney segmentation in CT sequences using SKFCM and improved GrowCut algorithm
2015-01-01
Background Organ segmentation is an important step in computer-aided diagnosis and pathology detection. Accurate kidney segmentation in abdominal computed tomography (CT) sequences is an essential and crucial task for surgical planning and navigation in kidney tumor ablation. However, kidney segmentation in CT is a substantially challenging work because the intensity values of kidney parenchyma are similar to those of adjacent structures. Results In this paper, a coarse-to-fine method was applied to segment kidney from CT images, which consists two stages including rough segmentation and refined segmentation. The rough segmentation is based on a kernel fuzzy C-means algorithm with spatial information (SKFCM) algorithm and the refined segmentation is implemented with improved GrowCut (IGC) algorithm. The SKFCM algorithm introduces a kernel function and spatial constraint into fuzzy c-means clustering (FCM) algorithm. The IGC algorithm makes good use of the continuity of CT sequences in space which can automatically generate the seed labels and improve the efficiency of segmentation. The experimental results performed on the whole dataset of abdominal CT images have shown that the proposed method is accurate and efficient. The method provides a sensitivity of 95.46% with specificity of 99.82% and performs better than other related methods. Conclusions Our method achieves high accuracy in kidney segmentation and considerably reduces the time and labor required for contour delineation. In addition, the method can be expanded to 3D segmentation directly without modification. PMID:26356850
NASA Astrophysics Data System (ADS)
Turan, Muhammed K.; Sehirli, Eftal; Elen, Abdullah; Karas, Ismail R.
2015-07-01
Gel electrophoresis (GE) is one of the most used method to separate DNA, RNA, protein molecules according to size, weight and quantity parameters in many areas such as genetics, molecular biology, biochemistry, microbiology. The main way to separate each molecule is to find borders of each molecule fragment. This paper presents a software application that show columns edges of DNA fragments in 3 steps. In the first step the application obtains lane histograms of agarose gel electrophoresis images by doing projection based on x-axis. In the second step, it utilizes k-means clustering algorithm to classify point values of lane histogram such as left side values, right side values and undesired values. In the third step, column edges of DNA fragments is shown by using mean algorithm and mathematical processes to separate DNA fragments from the background in a fully automated way. In addition to this, the application presents locations of DNA fragments and how many DNA fragments exist on images captured by a scientific camera.
NASA Astrophysics Data System (ADS)
Bellugi, Dino; Milledge, David G.; Dietrich, William E.; Perron, J. Taylor; McKean, Jim
2015-12-01
Predicting shallow landslide size and location across landscapes is important for understanding landscape form and evolution and for hazard identification. We test a recently developed model that couples a search algorithm with 3-D slope stability analysis that predicts these two key attributes in an intensively studied landscape with a 10 year landslide inventory. We use process-based submodels to estimate soil depth, root strength, and pore pressure for a sequence of landslide-triggering rainstorms. We parameterize submodels with field measurements independently of the slope stability model, without calibrating predictions to observations. The model generally reproduces observed landslide size and location distributions, overlaps 65% of observed landslides, and of these predicts size to within factors of 2 and 1.5 in 55% and 28% of cases, respectively. Five percent of the landscape is predicted unstable, compared to 2% recorded landslide area. Missed landslides are not due to the search algorithm but to the formulation and parameterization of the slope stability model and inaccuracy of observed landslide maps. Our model does not improve location prediction relative to infinite-slope methods but predicts landslide size, improves process representation, and reduces reliance on effective parameters. Increasing rainfall intensity or root cohesion generally increases landslide size and shifts locations down hollow axes, while increasing cohesion restricts unstable locations to areas with deepest soils. Our findings suggest that shallow landslide abundance, location, and size are ultimately controlled by covarying topographic, material, and hydrologic properties. Estimating the spatiotemporal patterns of root strength, pore pressure, and soil depth across a landscape may be the greatest remaining challenge.
Chen, Zhijia; Zhu, Yuanchang; Di, Yanqiang; Feng, Shaochong
2015-01-01
In IaaS (infrastructure as a service) cloud environment, users are provisioned with virtual machines (VMs). To allocate resources for users dynamically and effectively, accurate resource demands predicting is essential. For this purpose, this paper proposes a self-adaptive prediction method using ensemble model and subtractive-fuzzy clustering based fuzzy neural network (ESFCFNN). We analyze the characters of user preferences and demands. Then the architecture of the prediction model is constructed. We adopt some base predictors to compose the ensemble model. Then the structure and learning algorithm of fuzzy neural network is researched. To obtain the number of fuzzy rules and the initial value of the premise and consequent parameters, this paper proposes the fuzzy c-means combined with subtractive clustering algorithm, that is, the subtractive-fuzzy clustering. Finally, we adopt different criteria to evaluate the proposed method. The experiment results show that the method is accurate and effective in predicting the resource demands. PMID:25691896
NASA Astrophysics Data System (ADS)
Wagstaff, Kiri L.
2012-03-01
On obtaining a new data set, the researcher is immediately faced with the challenge of obtaining a high-level understanding from the observations. What does a typical item look like? What are the dominant trends? How many distinct groups are included in the data set, and how is each one characterized? Which observable values are common, and which rarely occur? Which items stand out as anomalies or outliers from the rest of the data? This challenge is exacerbated by the steady growth in data set size [11] as new instruments push into new frontiers of parameter space, via improvements in temporal, spatial, and spectral resolution, or by the desire to "fuse" observations from different modalities and instruments into a larger-picture understanding of the same underlying phenomenon. Data clustering algorithms provide a variety of solutions for this task. They can generate summaries, locate outliers, compress data, identify dense or sparse regions of feature space, and build data models. It is useful to note up front that "clusters" in this context refer to groups of items within some descriptive feature space, not (necessarily) to "galaxy clusters" which are dense regions in physical space. The goal of this chapter is to survey a variety of data clustering methods, with an eye toward their applicability to astronomical data analysis. In addition to improving the individual researcher’s understanding of a given data set, clustering has led directly to scientific advances, such as the discovery of new subclasses of stars [14] and gamma-ray bursts (GRBs) [38]. All clustering algorithms seek to identify groups within a data set that reflect some observed, quantifiable structure. Clustering is traditionally an unsupervised approach to data analysis, in the sense that it operates without any direct guidance about which items should be assigned to which clusters. There has been a recent trend in the clustering literature toward supporting semisupervised or constrained
NASA Astrophysics Data System (ADS)
Komura, Yukihiro; Okabe, Yutaka
2014-03-01
We present sample CUDA programs for the GPU computing of the Swendsen-Wang multi-cluster spin flip algorithm. We deal with the classical spin models; the Ising model, the q-state Potts model, and the classical XY model. As for the lattice, both the 2D (square) lattice and the 3D (simple cubic) lattice are treated. We already reported the idea of the GPU implementation for 2D models (Komura and Okabe, 2012). We here explain the details of sample programs, and discuss the performance of the present GPU implementation for the 3D Ising and XY models. We also show the calculated results of the moment ratio for these models, and discuss phase transitions. Catalogue identifier: AERM_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AERM_v1_0.html Program obtainable from: CPC Program Library, Queen’s University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 5632 No. of bytes in distributed program, including test data, etc.: 14688 Distribution format: tar.gz Programming language: C, CUDA. Computer: System with an NVIDIA CUDA enabled GPU. Operating system: System with an NVIDIA CUDA enabled GPU. Classification: 23. External routines: NVIDIA CUDA Toolkit 3.0 or newer Nature of problem: Monte Carlo simulation of classical spin systems. Ising, q-state Potts model, and the classical XY model are treated for both two-dimensional and three-dimensional lattices. Solution method: GPU-based Swendsen-Wang multi-cluster spin flip Monte Carlo method. The CUDA implementation for the cluster-labeling is based on the work by Hawick et al. [1] and that by Kalentev et al. [2]. Restrictions: The system size is limited depending on the memory of a GPU. Running time: For the parameters used in the sample programs, it takes about a minute for each program. Of course, it depends on the system size, the number of Monte Carlo steps, etc. References: [1] K
NASA Astrophysics Data System (ADS)
Douglass, Michael; Bezak, Eva; Penfold, Scott
2015-04-01
The preliminary framework of a combined radiobiological model is developed and calibrated in the current work. The model simulates the production of individual cells forming a tumour, the spatial distribution of individual ionization events (using Geant4-DNA) and the stochastic biochemical repair of DNA double strand breaks (DSBs) leading to the prediction of survival or death of individual cells. In the current work, we expand upon a previously developed tumour generation and irradiation model to include a stochastic ionization damage clustering and DNA lesion repair model. The Geant4 code enabled the positions of each ionization event in the cells to be simulated and recorded for analysis. An algorithm was developed to cluster the ionization events in each cell into simple and complex double strand breaks. The two lesion kinetic (TLK) model was then adapted to predict DSB repair kinetics and the resultant cell survival curve. The parameters in the cell survival model were then calibrated using experimental cell survival data of V79 cells after low energy proton irradiation. A monolayer of V79 cells was simulated using the tumour generation code developed previously. The cells were then irradiated by protons with mean energies of 0.76 MeV and 1.9 MeV using a customized version of Geant4. By replicating the experimental parameters of a low energy proton irradiation experiment and calibrating the model with two sets of data, the model is now capable of predicting V79 cell survival after low energy (<2 MeV) proton irradiation for a custom set of input parameters. The novelty of this model is the realistic cellular geometry which can be irradiated using Geant4-DNA and the method in which the double strand breaks are predicted from clustering the spatial distribution of ionisation events. Unlike the original TLK model which calculates a tumour average cell survival probability, the cell survival probability is calculated for each cell in the geometric tumour model
NASA Astrophysics Data System (ADS)
Ward, W. O. C.; Wilkinson, P. B.; Chambers, J. E.; Oxby, L. S.; Bai, L.
2014-04-01
A novel method for the effective identification of bedrock subsurface elevation from electrical resistivity tomography images is described. Identifying subsurface boundaries in the topographic data can be difficult due to smoothness constraints used in inversion, so a statistical population-based approach is used that extends previous work in calculating isoresistivity surfaces. The analysis framework involves a procedure for guiding a clustering approach based on the fuzzy c-means algorithm. An approximation of resistivity distributions, found using kernel density estimation, was utilized as a means of guiding the cluster centroids used to classify data. A fuzzy method was chosen over hard clustering due to uncertainty in hard edges in the topography data, and a measure of clustering uncertainty was identified based on the reciprocal of cluster membership. The algorithm was validated using a direct comparison of known observed bedrock depths at two 3-D survey sites, using real-time GPS information of exposed bedrock by quarrying on one site, and borehole logs at the other. Results show similarly accurate detection as a leading isosurface estimation method, and the proposed algorithm requires significantly less user input and prior site knowledge. Furthermore, the method is effectively dimension-independent and will scale to data of increased spatial dimensions without a significant effect on the runtime. A discussion on the results by automated versus supervised analysis is also presented.
Adaptive Clustering of Hypermedia Documents.
ERIC Educational Resources Information Center
Johnson, Andrew; Fotouhi, Farshad
1996-01-01
Discussion of hypermedia systems focuses on a comparison of two types of adaptive algorithm (genetic algorithm and neural network) in clustering hypermedia documents. These clusters allow the user to index into the nodes to find needed information more quickly, since clustering is "personalized" based on the user's paths rather than representing…
Matlab Cluster Ensemble Toolbox
Sapio, Vincent De; Kegelmeyer, Philip
2009-04-27
This is a Matlab toolbox for investigating the application of cluster ensembles to data classification, with the objective of improving the accuracy and/or speed of clustering. The toolbox divides the cluster ensemble problem into four areas, providing functionality for each. These include, (1) synthetic data generation, (2) clustering to generate individual data partitions and similarity matrices, (3) consensus function generation and final clustering to generate ensemble data partitioning, and (4) implementation of accuracy metrics. With regard to data generation, Gaussian data of arbitrary dimension can be generated. The kcenters algorithm can then be used to generate individual data partitions by either, (a) subsampling the data and clustering each subsample, or by (b) randomly initializing the algorithm and generating a clustering for each initialization. In either case an overall similarity matrix can be computed using a consensus function operating on the individual similarity matrices. A final clustering can be performed and performance metrics are provided for evaluation purposes.
NASA Astrophysics Data System (ADS)
Hsu, Kuo-Hsien
2012-11-01
Formosat-2 image is a kind of high-spatial-resolution (2 meters GSD) remote sensing satellite data, which includes one panchromatic band and four multispectral bands (Blue, Green, Red, near-infrared). An essential sector in the daily processing of received Formosat-2 image is to estimate the cloud statistic of image using Automatic Cloud Coverage Assessment (ACCA) algorithm. The information of cloud statistic of image is subsequently recorded as an important metadata for image product catalog. In this paper, we propose an ACCA method with two consecutive stages: preprocessing and post-processing analysis. For pre-processing analysis, the un-supervised K-means classification, Sobel's method, thresholding method, non-cloudy pixels reexamination, and cross-band filter method are implemented in sequence for cloud statistic determination. For post-processing analysis, Box-Counting fractal method is implemented. In other words, the cloud statistic is firstly determined via pre-processing analysis, the correctness of cloud statistic of image of different spectral band is eventually cross-examined qualitatively and quantitatively via post-processing analysis. The selection of an appropriate thresholding method is very critical to the result of ACCA method. Therefore, in this work, We firstly conduct a series of experiments of the clustering-based and spatial thresholding methods that include Otsu's, Local Entropy(LE), Joint Entropy(JE), Global Entropy(GE), and Global Relative Entropy(GRE) method, for performance comparison. The result shows that Otsu's and GE methods both perform better than others for Formosat-2 image. Additionally, our proposed ACCA method by selecting Otsu's method as the threshoding method has successfully extracted the cloudy pixels of Formosat-2 image for accurate cloud statistic estimation.
Delineation of river bed-surface patches by clustering high-resolution spatial grain size data
NASA Astrophysics Data System (ADS)
Nelson, Peter A.; Bellugi, Dino; Dietrich, William E.
2014-01-01
The beds of gravel-bed rivers commonly display distinct sorting patterns, which at length scales of ~ 0.1 - 1 channel widths appear to form an organization of patches or facies. This paper explores alternatives to traditional visual facies mapping by investigating methods of patch delineation in which clustering analysis is applied to a high-resolution grid of spatial grain-size distributions (GSDs) collected during a flume experiment. Specifically, we examine four clustering techniques: 1) partitional clustering of grain-size distributions with the k-means algorithm (assigning each GSD to a type of patch based solely on its distribution characteristics), 2) spatially-constrained agglomerative clustering ("growing" patches by merging adjacent GSDs, thus generating a hierarchical structure of patchiness), 3) spectral clustering using Normalized Cuts (using the spatial distance between GSDs and the distribution characteristics to generate a matrix describing the similarity between all GSDs, and using the eigenvalues of this matrix to divide the bed into patches), and 4) fuzzy clustering with the fuzzy c-means algorithm (assigning each GSD a membership probability to every patch type). For each clustering method, we calculate metrics describing how well-separated cluster-average GSDs are and how patches are arranged in space. We use these metrics to compute optimal clustering parameters, to compare the clustering methods against each other, and to compare clustering results with patches mapped visually during the flume experiment.All clustering methods produced better-separated patch GSDs than the visually-delineated patches. Although they do not produce crisp cluster assignment, fuzzy algorithms provide useful information that can characterize the uncertainty of a location on the bed belonging to any particular type of patch, and they can be used to characterize zones of transition from one patch to another. The extent to which spatial information influences
Winters-Hilt, Stephen; Merat, Sam
2007-01-01
Background Support Vector Machines (SVMs) provide a powerful method for classification (supervised learning). Use of SVMs for clustering (unsupervised learning) is now being considered in a number of different ways. Results An SVM-based clustering algorithm is introduced that clusters data with no a priori knowledge of input classes. The algorithm initializes by first running a binary SVM classifier against a data set with each vector in the set randomly labelled, this is repeated until an initial convergence occurs. Once this initialization step is complete, the SVM confidence parameters for classification on each of the training instances can be accessed. The lowest confidence data (e.g., the worst of the mislabelled data) then has its' labels switched to the other class label. The SVM is then re-run on the data set (with partly re-labelled data) and is guaranteed to converge in this situation since it converged previously, and now it has fewer data points to carry with mislabelling penalties. This approach appears to limit exposure to the local minima traps that can occur with other approaches. Thus, the algorithm then improves on its weakly convergent result by SVM re-training after each re-labeling on the worst of the misclassified vectors – i.e., those feature vectors with confidence factor values beyond some threshold. The repetition of the above process improves the accuracy, here a measure of separability, until there are no misclassifications. Variations on this type of clustering approach are shown. Conclusion Non-parametric SVM-based clustering methods may allow for much improved performance over parametric approaches, particularly if they can be designed to inherit the strengths of their supervised SVM counterparts. PMID:18047717
NASA Astrophysics Data System (ADS)
Khateri, Parisa; Rad, Hamidreza Saligheh; Jafari, Amir Homayoun; Ay, Mohammad Reza
2014-01-01
Quantitative PET image reconstruction requires an accurate map of attenuation coefficients of the tissue under investigation at 511 keV (μ-map), and in order to correct the emission data for attenuation. The use of MRI-based attenuation correction (MRAC) has recently received lots of attention in the scientific literature. One of the major difficulties facing MRAC has been observed in the areas where bone and air collide, e.g. ethmoidal sinuses in the head area. Bone is intrinsically not detectable by conventional MRI, making it difficult to distinguish air from bone. Therefore, development of more versatile MR sequences to label the bone structure, e.g. ultra-short echo-time (UTE) sequences, certainly plays a significant role in novel methodological developments. However, long acquisition time and complexity of UTE sequences limit its clinical applications. To overcome this problem, we developed a novel combination of Short-TE (ShTE) pulse sequence to detect bone signal with a 2-point Dixon technique for water-fat discrimination, along with a robust image segmentation method based on fuzzy clustering C-means (FCM) to segment the head area into four classes of air, bone, soft tissue and adipose tissue. The imaging protocol was set on a clinical 3 T Tim Trio and also 1.5 T Avanto (Siemens Medical Solution, Erlangen, Germany) employing a triple echo time pulse sequence in the head area. The acquisition parameters were as follows: TE1/TE2/TE3=0.98/4.925/6.155 ms, TR=8 ms, FA=25 on the 3 T system, and TE1/TE2/TE3=1.1/2.38/4.76 ms, TR=16 ms, FA=18 for the 1.5 T system. The second and third echo-times belonged to the Dixon decomposition to distinguish soft and adipose tissues. To quantify accuracy, sensitivity and specificity of the bone segmentation algorithm, resulting classes of MR-based segmented bone were compared with the manual segmented one by our expert neuro-radiologist. Results for both 3 T and 1.5 T systems show that bone segmentation applied in several
Algorithms and Algorithmic Languages.
ERIC Educational Resources Information Center
Veselov, V. M.; Koprov, V. M.
This paper is intended as an introduction to a number of problems connected with the description of algorithms and algorithmic languages, particularly the syntaxes and semantics of algorithmic languages. The terms "letter, word, alphabet" are defined and described. The concept of the algorithm is defined and the relation between the algorithm and…
Haplotyping Problem, A Clustering Approach
Eslahchi, Changiz; Sadeghi, Mehdi; Pezeshk, Hamid; Kargar, Mehdi; Poormohammadi, Hadi
2007-09-06
Construction of two haplotypes from a set of Single Nucleotide Polymorphism (SNP) fragments is called haplotype reconstruction problem. One of the most popular computational model for this problem is Minimum Error Correction (MEC). Since MEC is an NP-hard problem, here we propose a novel heuristic algorithm based on clustering analysis in data mining for haplotype reconstruction problem. Based on hamming distance and similarity between two fragments, our iterative algorithm produces two clusters of fragments; then, in each iteration, the algorithm assigns a fragment to one of the clusters. Our results suggest that the algorithm has less reconstruction error rate in comparison with other algorithms.
NASA Astrophysics Data System (ADS)
Wang, Hesheng; Feyes, Denise; Mulvihill, John; Oleinick, Nancy; MacLennan, Gregory; Fei, Baowei
2007-03-01
We are investigating in vivo small animal imaging and analysis methods for the assessment of photodynamic therapy (PDT), an emerging therapeutic modality for cancer treatment. Multiple weighted MR images were acquired from tumor-bearing mice pre- and post-PDT and 24-hour after PDT. We developed an automatic image classification method to differentiate live, necrotic and intermediate tissues within the treated tumor on the MR images. We used a multiscale diffusion filter to process the MR images before classification. A multiscale fuzzy C-means (FCM) classification method was applied along the scales. The object function of the standard FCM was modified to allow multiscale classification processing where the result from a coarse scale is used to supervise the classification in the next scale. The multiscale fuzzy C-means (MFCM) method takes noise levels and partial volume effects into the classification processing. The method was validated by simulated MR images with various noise levels. For simulated data, the classification method achieved 96.0 +/- 1.1% overlap ratio. For real mouse MR images, the classification results of the treated tumors were validated by histologic images. The overlap ratios were 85.6 +/- 5.1%, 82.4 +/- 7.8% and 80.5 +/- 10.2% for the live, necrotic, and intermediate tissues, respectively. The MR imaging and the MFCM classification methods may provide a useful tool for the assessment of the tumor response to photodynamic therapy in vivo.
NASA Technical Reports Server (NTRS)
Hall, Lawrence O.; Bensaid, Amine M.; Clarke, Laurence P.; Velthuizen, Robert P.; Silbiger, Martin S.; Bezdek, James C.
1992-01-01
Magnetic resonance (MR) brain section images are segmented and then synthetically colored to give visual representations of the original data with three approaches: the literal and approximate fuzzy c-means unsupervised clustering algorithms and a supervised computational neural network, a dynamic multilayered perception trained with the cascade correlation learning algorithm. Initial clinical results are presented on both normal volunteers and selected patients with brain tumors surrounded by edema. Supervised and unsupervised segmentation techniques provide broadly similar results. Unsupervised fuzzy algorithms were visually observed to show better segmentation when compared with raw image data for volunteer studies. However, for a more complex segmentation problem with tumor/edema or cerebrospinal fluid boundary, where the tissues have similar MR relaxation behavior, inconsistency in rating among experts was observed.
Weigend, Florian
2014-10-01
Energy surfaces of metal clusters usually show a large variety of local minima. For homo-metallic species the energetically lowest can be found reliably with genetic algorithms, in combination with density functional theory without system-specific parameters. For mixed-metallic clusters this is much more difficult, as for a given arrangement of nuclei one has to find additionally the best of many possibilities of assigning different metal types to the individual positions. In the framework of electronic structure methods this second issue is treatable at comparably low cost at least for elements with similar atomic number by means of first-order perturbation theory, as shown previously [F. Weigend, C. Schrodt, and R. Ahlrichs, J. Chem. Phys. 121, 10380 (2004)]. In the present contribution the extension of a genetic algorithm with the re-assignment of atom types to atom sites is proposed and tested for the search of the global minima of PtHf12 and [LaPb7Bi7](4-). For both cases the (putative) global minimum is reliably found with the extended technique, which is not the case for the "pure" genetic algorithm. PMID:25296780
NASA Astrophysics Data System (ADS)
Weigend, Florian
2014-10-01
Energy surfaces of metal clusters usually show a large variety of local minima. For homo-metallic species the energetically lowest can be found reliably with genetic algorithms, in combination with density functional theory without system-specific parameters. For mixed-metallic clusters this is much more difficult, as for a given arrangement of nuclei one has to find additionally the best of many possibilities of assigning different metal types to the individual positions. In the framework of electronic structure methods this second issue is treatable at comparably low cost at least for elements with similar atomic number by means of first-order perturbation theory, as shown previously [F. Weigend, C. Schrodt, and R. Ahlrichs, J. Chem. Phys. 121, 10380 (2004)]. In the present contribution the extension of a genetic algorithm with the re-assignment of atom types to atom sites is proposed and tested for the search of the global minima of PtHf12 and [LaPb7Bi7]4-. For both cases the (putative) global minimum is reliably found with the extended technique, which is not the case for the "pure" genetic algorithm.
Weigend, Florian
2014-10-07
Energy surfaces of metal clusters usually show a large variety of local minima. For homo-metallic species the energetically lowest can be found reliably with genetic algorithms, in combination with density functional theory without system-specific parameters. For mixed-metallic clusters this is much more difficult, as for a given arrangement of nuclei one has to find additionally the best of many possibilities of assigning different metal types to the individual positions. In the framework of electronic structure methods this second issue is treatable at comparably low cost at least for elements with similar atomic number by means of first-order perturbation theory, as shown previously [F. Weigend, C. Schrodt, and R. Ahlrichs, J. Chem. Phys. 121, 10380 (2004)]. In the present contribution the extension of a genetic algorithm with the re-assignment of atom types to atom sites is proposed and tested for the search of the global minima of PtHf{sub 12} and [LaPb{sub 7}Bi{sub 7}]{sup 4−}. For both cases the (putative) global minimum is reliably found with the extended technique, which is not the case for the “pure” genetic algorithm.
Relation chain based clustering analysis
NASA Astrophysics Data System (ADS)
Zhang, Cheng-ning; Zhao, Ming-yang; Luo, Hai-bo
2011-08-01
Clustering analysis is currently one of well-developed branches in data mining technology which is supposed to find the hidden structures in the multidimensional space called feature or pattern space. A datum in the space usually possesses a vector form and the elements in the vector represent several specifically selected features. These features are often of efficiency to the problem oriented. Generally, clustering analysis goes into two divisions: one is based on the agglomerative clustering method, and the other one is based on divisive clustering method. The former refers to a bottom-up process which regards each datum as a singleton cluster while the latter refers to a top-down process which regards entire data as a cluster. As the collected literatures, it is noted that the divisive clustering is currently overwhelming both in application and research. Although some famous divisive clustering methods are designed and well developed, clustering problems are still far from being solved. The k - means algorithm is the original divisive clustering method which initially assigns some important index values, such as the clustering number and the initial clustering prototype positions, and that could not be reasonable in some certain occasions. More than the initial problem, the k - means algorithm may also falls into local optimum, clusters in a rigid way and is not available for non-Gaussian distribution. One can see that seeking for a good or natural clustering result, in fact, originates from the one's understanding of the concept of clustering. Thus, the confusion or misunderstanding of the definition of clustering always derives some unsatisfied clustering results. One should consider the definition deeply and seriously. This paper demonstrates the nature of clustering, gives the way of understanding clustering, discusses the methodology of designing a clustering algorithm, and proposes a new clustering method based on relation chains among 2D patterns. In
A diabetic retinopathy detection method using an improved pillar K-means algorithm.
Gogula, Susmitha Valli; Divakar, Ch; Satyanarayana, Ch; Rao, Allam Appa
2014-01-01
The paper presents a new approach for medical image segmentation. Exudates are a visible sign of diabetic retinopathy that is the major reason of vision loss in patients with diabetes. If the exudates extend into the macular area, blindness may occur. Automated detection of exudates will assist ophthalmologists in early diagnosis. This segmentation process includes a new mechanism for clustering the elements of high-resolution images in order to improve precision and reduce computation time. The system applies K-means clustering to the image segmentation after getting optimized by Pillar algorithm; pillars are constructed in such a way that they can withstand the pressure. Improved pillar algorithm can optimize the K-means clustering for image segmentation in aspects of precision and computation time. This evaluates the proposed approach for image segmentation by comparing with Kmeans and Fuzzy C-means in a medical image. Using this method, identification of dark spot in the retina becomes easier and the proposed algorithm is applied on diabetic retinal images of all stages to identify hard and soft exudates, where the existing pillar K-means is more appropriate for brain MRI images. This proposed system help the doctors to identify the problem in the early stage and can suggest a better drug for preventing further retinal damage. PMID:24516323
A diabetic retinopathy detection method using an improved pillar K-means algorithm
Gogula, Susmitha valli; Divakar, CH; Satyanarayana, CH; Rao, Allam Appa
2014-01-01
The paper presents a new approach for medical image segmentation. Exudates are a visible sign of diabetic retinopathy that is the major reason of vision loss in patients with diabetes. If the exudates extend into the macular area, blindness may occur. Automated detection of exudates will assist ophthalmologists in early diagnosis. This segmentation process includes a new mechanism for clustering the elements of high-resolution images in order to improve precision and reduce computation time. The system applies K-means clustering to the image segmentation after getting optimized by Pillar algorithm; pillars are constructed in such a way that they can withstand the pressure. Improved pillar algorithm can optimize the K-means clustering for image segmentation in aspects of precision and computation time. This evaluates the proposed approach for image segmentation by comparing with Kmeans and Fuzzy C-means in a medical image. Using this method, identification of dark spot in the retina becomes easier and the proposed algorithm is applied on diabetic retinal images of all stages to identify hard and soft exudates, where the existing pillar K-means is more appropriate for brain MRI images. This proposed system help the doctors to identify the problem in the early stage and can suggest a better drug for preventing further retinal damage. PMID:24516323
Unconventional methods for clustering
NASA Astrophysics Data System (ADS)
Kotyrba, Martin
2016-06-01
Cluster analysis or clustering is a task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters). It is the main task of exploratory data mining and a common technique for statistical data analysis used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, and bioinformatics. The topic of this paper is one of the modern methods of clustering namely SOM (Self Organising Map). The paper describes the theory needed to understand the principle of clustering and descriptions of algorithm used with clustering in our experiments.
Ugulu, Ilker; Aydin, Halil
2016-01-01
We propose an approach to clustering and visualization of students' cognitive structural models. We use the self-organizing map (SOM) combined with Ward's clustering to conduct cluster analysis. In the study carried out on 100 subjects, a conceptual understanding test consisting of open-ended questions was used as a data collection tool. The results of analyses indicated that students constructed the aliveness concept by associating it predominantly with human. Motion appeared as the most frequently associated term with the aliveness concept. The results suggest that the aliveness concept has been constructed using anthropocentric and animistic cognitive structures. In the next step, we used the data obtained from the conceptual understanding test for training the SOM. Consequently, we propose a visualization method about cognitive structure of the aliveness concept. PMID:26819579
Brightest Cluster Galaxy Identification
NASA Astrophysics Data System (ADS)
Leisman, Luke; Haarsma, D. B.; Sebald, D. A.; ACCEPT Team
2011-01-01
Brightest cluster galaxies (BCGs) play an important role in several fields of astronomical research. The literature includes many different methods and criteria for identifying the BCG in the cluster, such as choosing the brightest galaxy, the galaxy nearest the X-ray peak, or the galaxy with the most extended profile. Here we examine a sample of 75 clusters from the Archive of Chandra Cluster Entropy Profile Tables (ACCEPT) and the Sloan Digital Sky Survey (SDSS), measuring masked magnitudes and profiles for BCG candidates in each cluster. We first identified galaxies by hand; in 15% of clusters at least one team member selected a different galaxy than the others.We also applied 6 other identification methods to the ACCEPT sample; in 30% of clusters at least one of these methods selected a different galaxy than the other methods. We then developed an algorithm that weighs brightness, profile, and proximity to the X-ray peak and centroid. This algorithm incorporates the advantages of by-hand identification (weighing multiple properties) and automated selection (repeatable and consistent). The BCG population chosen by the algorithm is more uniform in its properties than populations selected by other methods, particularly in the relation between absolute magnitude (a proxy for galaxy mass) and average gas temperature (a proxy for cluster mass). This work supported by a Barry M. Goldwater Scholarship and a Sid Jansma Summer Research Fellowship.
A multiple-feature and multiple-kernel scene segmentation algorithm for humanoid robot.
Liu, Zhi; Xu, Shuqiong; Zhang, Yun; Chen, Chun Lung Philip
2014-11-01
This technical correspondence presents a multiple-feature and multiple-kernel support vector machine (MFMK-SVM) methodology to achieve a more reliable and robust segmentation performance for humanoid robot. The pixel wise intensity, gradient, and C1 SMF features are extracted via the local homogeneity model and Gabor filter, which would be used as inputs of MFMK-SVM model. It may provide multiple features of the samples for easier implementation and efficient computation of MFMK-SVM model. A new clustering method, which is called feature validity-interval type-2 fuzzy C-means (FV-IT2FCM) clustering algorithm, is proposed by integrating a type-2 fuzzy criterion in the clustering optimization process to improve the robustness and reliability of clustering results by the iterative optimization. Furthermore, the clustering validity is employed to select the training samples for the learning of the MFMK-SVM model. The MFMK-SVM scene segmentation method is able to fully take advantage of the multiple features of scene image and the ability of multiple kernels. Experiments on the BSDS dataset and real natural scene images demonstrate the superior performance of our proposed method. PMID:25248211
Symmetry Based Automatic Evolution of Clusters: A New Approach to Data Clustering
Vijendra, Singh; Laxman, Sahoo
2015-01-01
We present a multiobjective genetic clustering approach, in which data points are assigned to clusters based on new line symmetry distance. The proposed algorithm is called multiobjective line symmetry based genetic clustering (MOLGC). Two objective functions, first the Davies-Bouldin (DB) index and second the line symmetry distance based objective functions, are used. The proposed algorithm evolves near-optimal clustering solutions using multiple clustering criteria, without a priori knowledge of the actual number of clusters. The multiple randomized K dimensional (Kd) trees based nearest neighbor search is used to reduce the complexity of finding the closest symmetric points. Experimental results based on several artificial and real data sets show that proposed clustering algorithm can obtain optimal clustering solutions in terms of different cluster quality measures in comparison to existing SBKM and MOCK clustering algorithms. PMID:26339233
Slonim, Noam; Atwal, Gurinder Singh; Tkačik, Gašper; Bialek, William
2005-01-01
In an age of increasingly large data sets, investigators in many different disciplines have turned to clustering as a tool for data analysis and exploration. Existing clustering methods, however, typically depend on several nontrivial assumptions about the structure of data. Here, we reformulate the clustering problem from an information theoretic perspective that avoids many of these assumptions. In particular, our formulation obviates the need for defining a cluster “prototype,” does not require an a priori similarity metric, is invariant to changes in the representation of the data, and naturally captures nonlinear relations. We apply this approach to different domains and find that it consistently produces clusters that are more coherent than those extracted by existing algorithms. Finally, our approach provides a way of clustering based on collective notions of similarity rather than the traditional pairwise measures. PMID:16352721
Bayesian Decision Theoretical Framework for Clustering
ERIC Educational Resources Information Center
Chen, Mo
2011-01-01
In this thesis, we establish a novel probabilistic framework for the data clustering problem from the perspective of Bayesian decision theory. The Bayesian decision theory view justifies the important questions: what is a cluster and what a clustering algorithm should optimize. We prove that the spectral clustering (to be specific, the…
Khormali, Aminollah; Addeh, Jalil
2016-07-01
Unnatural patterns in the control charts can be associated with a specific set of assignable causes for process variation. Hence, pattern recognition is very useful in identifying the process problems. In this study, a multiclass SVM (SVM) based classifier is proposed because of the promising generalization capability of support vector machines. In the proposed method type-2 fuzzy c-means (T2FCM) clustering algorithm is used to make a SVM system more effective. The fuzzy support vector machine classifier suggested in this paper is composed of three main sub-networks: fuzzy classifier sub-network, SVM sub-network and optimization sub-network. In SVM training, the hyper-parameters plays a very important role in its recognition accuracy. Therefore, cuckoo optimization algorithm (COA) is proposed for selecting appropriate parameters of the classifier. Simulation results showed that the proposed system has very high recognition accuracy. PMID:27101724
NASA Astrophysics Data System (ADS)
Liu, Hanli; Pei, Tao; Zhou, Chenghu; Zhu, A.-Xing
2008-12-01
In order to enhance the spectral characteristics of features for clustering, in the experiment of wetland extraction in Sanjiang Plain, we use a series of approaches in preprocessing of the MODIS remote sensing data by considering eliminating interference caused by other features. First, by analysis of the spectral characteristics of data, we choose a set of multi-temporal and multi-spectral MODIS data in Sanjiang Plain for clustering. By building and applying mask, the water areas and woodland vegetation can be eliminated from the image data. Second, by Enhanced Lee filtering and Minimum Noise Fraction (MNF) transformation, the data can be denoised and the characteristics of wetland can be enhanced obviously. After the preprocessing of data, the fuzzy c-means clustering algorithm optimized by particle swarm algorithm (PSO-FCM) is utilized on the image data for the wetland extraction. The result of experiment shows that the accuracy of wetland extraction by means of PSO-FCM algorithm is reasonable and effective.
Matlab Cluster Ensemble Toolbox
2009-04-27
This is a Matlab toolbox for investigating the application of cluster ensembles to data classification, with the objective of improving the accuracy and/or speed of clustering. The toolbox divides the cluster ensemble problem into four areas, providing functionality for each. These include, (1) synthetic data generation, (2) clustering to generate individual data partitions and similarity matrices, (3) consensus function generation and final clustering to generate ensemble data partitioning, and (4) implementation of accuracy metrics. Withmore » regard to data generation, Gaussian data of arbitrary dimension can be generated. The kcenters algorithm can then be used to generate individual data partitions by either, (a) subsampling the data and clustering each subsample, or by (b) randomly initializing the algorithm and generating a clustering for each initialization. In either case an overall similarity matrix can be computed using a consensus function operating on the individual similarity matrices. A final clustering can be performed and performance metrics are provided for evaluation purposes.« less
Orbit Clustering Based on Transfer Cost
NASA Technical Reports Server (NTRS)
Gustafson, Eric D.; Arrieta-Camacho, Juan J.; Petropoulos, Anastassios E.
2013-01-01
We propose using cluster analysis to perform quick screening for combinatorial global optimization problems. The key missing component currently preventing cluster analysis from use in this context is the lack of a useable metric function that defines the cost to transfer between two orbits. We study several proposed metrics and clustering algorithms, including k-means and the expectation maximization algorithm. We also show that proven heuristic methods such as the Q-law can be modified to work with cluster analysis.
Clustering of Multi-Temporal Fully Polarimetric L-Band SAR Data for Agricultural Land Cover Mapping
NASA Astrophysics Data System (ADS)
Tamiminia, H.; Homayouni, S.; Safari, A.
2015-12-01
Recently, the unique capabilities of Polarimetric Synthetic Aperture Radar (PolSAR) sensors make them an important and efficient tool for natural resources and environmental applications, such as land cover and crop classification. The aim of this paper is to classify multi-temporal full polarimetric SAR data using kernel-based fuzzy C-means clustering method, over an agricultural region. This method starts with transforming input data into the higher dimensional space using kernel functions and then clustering them in the feature space. Feature space, due to its inherent properties, has the ability to take in account the nonlinear and complex nature of polarimetric data. Several SAR polarimetric features extracted using target decomposition algorithms. Features from Cloude-Pottier, Freeman-Durden and Yamaguchi algorithms used as inputs for the clustering. This method was applied to multi-temporal UAVSAR L-band images acquired over an agricultural area near Winnipeg, Canada, during June and July in 2012. The results demonstrate the efficiency of this approach with respect to the classical methods. In addition, using multi-temporal data in the clustering process helped to investigate the phenological cycle of plants and significantly improved the performance of agricultural land cover mapping.
ClusterViz: A Cytoscape APP for Cluster Analysis of Biological Network.
Wang, Jianxin; Zhong, Jiancheng; Chen, Gang; Li, Min; Wu, Fang-xiang; Pan, Yi
2015-01-01
Cluster analysis of biological networks is one of the most important approaches for identifying functional modules and predicting protein functions. Furthermore, visualization of clustering results is crucial to uncover the structure of biological networks. In this paper, ClusterViz, an APP of Cytoscape 3 for cluster analysis and visualization, has been developed. In order to reduce complexity and enable extendibility for ClusterViz, we designed the architecture of ClusterViz based on the framework of Open Services Gateway Initiative. According to the architecture, the implementation of ClusterViz is partitioned into three modules including interface of ClusterViz, clustering algorithms and visualization and export. ClusterViz fascinates the comparison of the results of different algorithms to do further related analysis. Three commonly used clustering algorithms, FAG-EC, EAGLE and MCODE, are included in the current version. Due to adopting the abstract interface of algorithms in module of the clustering algorithms, more clustering algorithms can be included for the future use. To illustrate usability of ClusterViz, we provided three examples with detailed steps from the important scientific articles, which show that our tool has helped several research teams do their research work on the mechanism of the biological networks. PMID:26357321
Join-Graph Propagation Algorithms
Mateescu, Robert; Kask, Kalev; Gogate, Vibhav; Dechter, Rina
2010-01-01
The paper investigates parameterized approximate message-passing schemes that are based on bounded inference and are inspired by Pearl's belief propagation algorithm (BP). We start with the bounded inference mini-clustering algorithm and then move to the iterative scheme called Iterative Join-Graph Propagation (IJGP), that combines both iteration and bounded inference. Algorithm IJGP belongs to the class of Generalized Belief Propagation algorithms, a framework that allowed connections with approximate algorithms from statistical physics and is shown empirically to surpass the performance of mini-clustering and belief propagation, as well as a number of other state-of-the-art algorithms on several classes of networks. We also provide insight into the accuracy of iterative BP and IJGP by relating these algorithms to well known classes of constraint propagation schemes. PMID:20740057
Convex Clustering: An Attractive Alternative to Hierarchical Clustering
Chen, Gary K.; Chi, Eric C.; Ranola, John Michael O.; Lange, Kenneth
2015-01-01
The primary goal in cluster analysis is to discover natural groupings of objects. The field of cluster analysis is crowded with diverse methods that make special assumptions about data and address different scientific aims. Despite its shortcomings in accuracy, hierarchical clustering is the dominant clustering method in bioinformatics. Biologists find the trees constructed by hierarchical clustering visually appealing and in tune with their evolutionary perspective. Hierarchical clustering operates on multiple scales simultaneously. This is essential, for instance, in transcriptome data, where one may be interested in making qualitative inferences about how lower-order relationships like gene modules lead to higher-order relationships like pathways or biological processes. The recently developed method of convex clustering preserves the visual appeal of hierarchical clustering while ameliorating its propensity to make false inferences in the presence of outliers and noise. The solution paths generated by convex clustering reveal relationships between clusters that are hidden by static methods such as k-means clustering. The current paper derives and tests a novel proximal distance algorithm for minimizing the objective function of convex clustering. The algorithm separates parameters, accommodates missing data, and supports prior information on relationships. Our program CONVEXCLUSTER incorporating the algorithm is implemented on ATI and nVidia graphics processing units (GPUs) for maximal speed. Several biological examples illustrate the strengths of convex clustering and the ability of the proximal distance algorithm to handle high-dimensional problems. CONVEXCLUSTER can be freely downloaded from the UCLA Human Genetics web site at http://www.genetics.ucla.edu/software/ PMID:25965340
City block distance and rough-fuzzy clustering for identification of co-expressed microRNAs.
Paul, Sushmita; Maji, Pradipta
2014-06-01
The microRNAs or miRNAs are short, endogenous RNAs having ability to regulate mRNA expression at the post-transcriptional level. Various studies have revealed that miRNAs tend to cluster on chromosomes. The members of a cluster that are in close proximity on chromosomes are highly likely to be processed as co-transcribed units. Therefore, a large proportion of miRNAs are co-expressed. Expression profiling of miRNAs generates a huge volume of data. Complicated networks of miRNA-mRNA interaction increase the challenges of comprehending and interpreting the resulting mass of data. In this regard, this paper presents a clustering algorithm in order to extract meaningful information from miRNA expression data. It judiciously integrates the merits of rough sets, fuzzy sets, the c-means algorithm, and the normalized range-normalized city block distance to discover co-expressed miRNA clusters. While the membership functions of fuzzy sets enable efficient handling of overlapping partitions in a noisy environment, the concept of lower and upper approximations of rough sets deals with uncertainty, vagueness, and incompleteness in cluster definition. The city block distance is used to compute the membership functions of fuzzy sets and to find initial partition of a data set, and therefore helps to handle minute differences between two miRNA expression profiles. The effectiveness of the proposed approach, along with a comparison with other related methods, is demonstrated for several miRNA expression data sets using different cluster validity indices. Moreover, the gene ontology is used to analyze the functional consistency and biological significance of generated miRNA clusters. PMID:24682049
Gong, Hui; Chen, Shangbin; Zhang, Bin; Ding, Wenxiang; Luo, Qingming; Li, Anan
2014-01-01
Characterizing cytoarchitecture is crucial for understanding brain functions and neural diseases. In neuroanatomy, it is an important task to accurately extract cell populations' centroids and contours. Recent advances have permitted imaging at single cell resolution for an entire mouse brain using the Nissl staining method. However, it is difficult to precisely segment numerous cells, especially those cells touching each other. As presented herein, we have developed an automated three-dimensional detection and segmentation method applied to the Nissl staining data, with the following two key steps: 1) concave points clustering to determine the seed points of touching cells; and 2) random walker segmentation to obtain cell contours. Also, we have evaluated the performance of our proposed method with several mouse brain datasets, which were captured with the micro-optical sectioning tomography imaging system, and the datasets include closely touching cells. Comparing with traditional detection and segmentation methods, our approach shows promising detection accuracy and high robustness. PMID:25111442
Grande, J A; Andújar, J M; Aroba, J; de la Torre, M L; Beltrán, R
2005-04-01
In the present work, Acid Mine Drainage (AMD) processes in the Chorrito Stream, which flows into the Cobica River (Iberian Pyrite Belt, Southwest Spain) are characterized by means of clustering techniques based on fuzzy logic. Also, pH behavior in contrast to precipitation is clearly explained, proving that the influence of rainfall inputs on the acidity and, as a result, on the metal load of a riverbed undergoing AMD processes highly depends on the moment when it occurs. In general, the riverbed dynamic behavior is the response to the sum of instant stimuli produced by isolated rainfall, the seasonal memory depending on the moment of the target hydrological year and, finally, the own inertia of the river basin, as a result of an accumulation process caused by age-long mining activity. PMID:15798799
NASA Astrophysics Data System (ADS)
Liu, Xiaoming; Mei, Ming; Liu, Jun; Hu, Wei
2015-12-01
Clustered microcalcifications (MCs) in mammograms are an important early sign of breast cancer in women. Their accurate detection is important in computer-aided detection (CADe). In this paper, we integrated the possibilistic fuzzy c-means (PFCM) clustering algorithm and weighted support vector machine (WSVM) for the detection of MC clusters in full-field digital mammograms (FFDM). For each image, suspicious MC regions are extracted with region growing and active contour segmentation. Then geometry and texture features are extracted for each suspicious MC, a mutual information-based supervised criterion is used to select important features, and PFCM is applied to cluster the samples into two clusters. Weights of the samples are calculated based on possibilities and typicality values from the PFCM, and the ground truth labels. A weighted nonlinear SVM is trained. During the test process, when an unknown image is presented, suspicious regions are located with the segmentation step, selected features are extracted, and the suspicious MC regions are classified as containing MC or not by the trained weighted nonlinear SVM. Finally, the MC regions are analyzed with spatial information to locate MC clusters. The proposed method is evaluated using a database of 410 clinical mammograms and compared with a standard unweighted support vector machine (SVM) classifier. The detection performance is evaluated using response receiver operating (ROC) curves and free-response receiver operating characteristic (FROC) curves. The proposed method obtained an area under the ROC curve of 0.8676, while the standard SVM obtained an area of 0.8268 for MC detection. For MC cluster detection, the proposed method obtained a high sensitivity of 92 % with a false-positive rate of 2.3 clusters/image, and it is also better than standard SVM with 4.7 false-positive clusters/image at the same sensitivity.
Classification of high resolution satellite images using spatial constraints-based fuzzy clustering
NASA Astrophysics Data System (ADS)
Singh, Pankaj Pratap; Garg, Rahul Dev
2014-01-01
A spatial constraints-based fuzzy clustering technique is introduced in the paper and the target application is classification of high resolution multispectral satellite images. This fuzzy-C-means (FCM) technique enhances the classification results with the help of a weighted membership function (Wmf). Initially, spatial fuzzy clustering (FC) is used to segment the targeted vegetation areas with the surrounding low vegetation areas, which include the information of spatial constraints (SCs). The performance of the FCM image segmentation is subject to appropriate initialization of Wmf and SC. It is able to evolve directly from the initial segmentation by spatial fuzzy clustering. The controlling parameters in fuzziness of the FCM approach, Wmf and SC, help to estimate the segmented road results, then the Stentiford thinning algorithm is used to estimate the road network from the classified results. Such improvements facilitate FCM method manipulation and lead to segmentation that is more robust. The results confirm its effectiveness for satellite image classification, which extracts useful information in suburban and urban areas. The proposed approach, spatial constraint-based fuzzy clustering with a weighted membership function (SCFCWmf), has been used to extract the information of healthy trees with vegetation and shadows showing elevated features in satellite images. The performance values of quality assessment parameters show a good degree of accuracy for segmented roads using the proposed hybrid SCFCWmf-MO (morphological operations) approach which also occluded nonroad parts.
Swarm Intelligence in Text Document Clustering
Cui, Xiaohui; Potok, Thomas E
2008-01-01
Social animals or insects in nature often exhibit a form of emergent collective behavior. The research field that attempts to design algorithms or distributed problem-solving devices inspired by the collective behavior of social insect colonies is called Swarm Intelligence. Compared to the traditional algorithms, the swarm algorithms are usually flexible, robust, decentralized and self-organized. These characters make the swarm algorithms suitable for solving complex problems, such as document collection clustering. The major challenge of today's information society is being overwhelmed with information on any topic they are searching for. Fast and high-quality document clustering algorithms play an important role in helping users to effectively navigate, summarize, and organize the overwhelmed information. In this chapter, we introduce three nature inspired swarm intelligence clustering approaches for document clustering analysis. These clustering algorithms use stochastic and heuristic principles discovered from observing bird flocks, fish schools and ant food forage.
Ghaheri, Salehe; Masoum, Saeed; Gholami, Ali
2016-01-15
Analysis of fragrance composition is very important for both the fragrance producers and consumers. Unraveling of fragrance formulation is necessary for quality control, competitor and trace analysis. Gas chromatography-mass spectrometry (GC-MS) has been introduced as the most appropriate analytical technique for this type of analysis, which is based on Kovats index and MS database. The most straightforward method to analyze a GC-MS dataset is to integrate those peaks that can be recognized by their mass profiles. But, because of common problems of chromatographic data such as spectral background, baseline offset and specially overlapped peaks, accurate quantitative and qualitative analysis could be failed. Some chemometric modeling techniques such as bilinear multivariate curve resolution (MCR) methods have been introduced to overcome these problems and obtained well resolved chromatographic profiles. The main drawback of these methods is rotational ambiguity or nonunique solution that is represented as area of feasible solutions (AFS). Polygonal inflation algorithm (PIA) is an automatic and simple to use algorithm for numerical computation of AFS. In this study, the extent of rotational ambiguity in curve resolution methods is calculated by MCR-BAND toolbox and the PIA. The ability of the PIA in resolving GC-MS data sets is evaluated by simulated GC-MS data in comparison with other popular curve resolution methods such as multivariate curve resolution alternative least square (MCR-ALS), multivariate curve resolution objective function minimization (MCR-FMIN) by different initial estimation methods and independent component analysis (ICA). In addition, two typical challenging area of total ion chromatogram (TIC) of commercial fragrances with overlapped peaks were analyzed by the PIA to investigate the possibility of peak deconvolution analysis. PMID:26711156
Time series clustering analysis of health-promoting behavior
NASA Astrophysics Data System (ADS)
Yang, Chi-Ta; Hung, Yu-Shiang; Deng, Guang-Feng
2013-10-01
Health promotion must be emphasized to achieve the World Health Organization goal of health for all. Since the global population is aging rapidly, ComCare elder health-promoting service was developed by the Taiwan Institute for Information Industry in 2011. Based on the Pender health promotion model, ComCare service offers five categories of health-promoting functions to address the everyday needs of seniors: nutrition management, social support, exercise management, health responsibility, stress management. To assess the overall ComCare service and to improve understanding of the health-promoting behavior of elders, this study analyzed health-promoting behavioral data automatically collected by the ComCare monitoring system. In the 30638 session records collected for 249 elders from January, 2012 to March, 2013, behavior patterns were identified by fuzzy c-mean time series clustering algorithm combined with autocorrelation-based representation schemes. The analysis showed that time series data for elder health-promoting behavior can be classified into four different clusters. Each type reveals different health-promoting needs, frequencies, function numbers and behaviors. The data analysis result can assist policymakers, health-care providers, and experts in medicine, public health, nursing and psychology and has been provided to Taiwan National Health Insurance Administration to assess the elder health-promoting behavior.
The applicability and effectiveness of cluster analysis
NASA Technical Reports Server (NTRS)
Ingram, D. S.; Actkinson, A. L.
1973-01-01
An insight into the characteristics which determine the performance of a clustering algorithm is presented. In order for the techniques which are examined to accurately cluster data, two conditions must be simultaneously satisfied. First the data must have a particular structure, and second the parameters chosen for the clustering algorithm must be correct. By examining the structure of the data from the Cl flight line, it is clear that no single set of parameters can be used to accurately cluster all the different crops. The effectiveness of either a noniterative or iterative clustering algorithm to accurately cluster data representative of the Cl flight line is questionable. Thus extensive a prior knowledge is required in order to use cluster analysis in its present form for applications like assisting in the definition of field boundaries and evaluating the homogeneity of a field. New or modified techniques are necessary for clustering to be a reliable tool.
NASA Technical Reports Server (NTRS)
Gregory, Kyle J.; Hill, Joanne E. (Editor); Black, J. Kevin; Baumgartner, Wayne H.; Jahoda, Keith
2016-01-01
A fundamental challenge in a spaceborne application of a gas-based Time Projection Chamber (TPC) for observation of X-ray polarization is handling the large amount of data collected. The TPC polarimeter described uses the APV-25 Application Specific Integrated Circuit (ASIC) to readout a strip detector. Two dimensional photoelectron track images are created with a time projection technique and used to determine the polarization of the incident X-rays. The detector produces a 128x30 pixel image per photon interaction with each pixel registering 12 bits of collected charge. This creates challenging requirements for data storage and downlink bandwidth with only a modest incidence of photons and can have a significant impact on the overall mission cost. An approach is described for locating and isolating the photoelectron track within the detector image, yielding a much smaller data product, typically between 8x8 pixels and 20x20 pixels. This approach is implemented using a Microsemi RT-ProASIC3-3000 Field-Programmable Gate Array (FPGA), clocked at 20 MHz and utilizing 10.7k logic gates (14% of FPGA), 20 Block RAMs (17% of FPGA), and no external RAM. Results will be presented, demonstrating successful photoelectron track cluster detection with minimal impact to detector dead-time.
NASA Astrophysics Data System (ADS)
Gregory, Kyle J.; Hill, Joanne E.; Black, J. Kevin; Baumgartner, Wayne H.; Jahoda, Keith
2016-05-01
A fundamental challenge in a spaceborne application of a gas-based Time Projection Chamber (TPC) for observation of X-ray polarization is handling the large amount of data collected. The TPC polarimeter described uses the APV-25 Application Specific Integrated Circuit (ASIC) to readout a strip detector. Two dimensional photo- electron track images are created with a time projection technique and used to determine the polarization of the incident X-rays. The detector produces a 128x30 pixel image per photon interaction with each pixel registering 12 bits of collected charge. This creates challenging requirements for data storage and downlink bandwidth with only a modest incidence of photons and can have a significant impact on the overall mission cost. An approach is described for locating and isolating the photoelectron track within the detector image, yielding a much smaller data product, typically between 8x8 pixels and 20x20 pixels. This approach is implemented using a Microsemi RT-ProASIC3-3000 Field-Programmable Gate Array (FPGA), clocked at 20 MHz and utilizing 10.7k logic gates (14% of FPGA), 20 Block RAMs (17% of FPGA), and no external RAM. Results will be presented, demonstrating successful photoelectron track cluster detection with minimal impact to detector dead-time.
Fully Automated Complementary DNA Microarray Segmentation using a Novel Fuzzy-based Algorithm
Saberkari, Hamidreza; Bahrami, Sheyda; Shamsi, Mousa; Amoshahy, Mohammad Javad; Ghavifekr, Habib Badri; Sedaaghi, Mohammad Hossein
2015-01-01
DNA microarray is a powerful approach to study simultaneously, the expression of 1000 of genes in a single experiment. The average value of the fluorescent intensity could be calculated in a microarray experiment. The calculated intensity values are very close in amount to the levels of expression of a particular gene. However, determining the appropriate position of every spot in microarray images is a main challenge, which leads to the accurate classification of normal and abnormal (cancer) cells. In this paper, first a preprocessing approach is performed to eliminate the noise and artifacts available in microarray cells using the nonlinear anisotropic diffusion filtering method. Then, the coordinate center of each spot is positioned utilizing the mathematical morphology operations. Finally, the position of each spot is exactly determined through applying a novel hybrid model based on the principle component analysis and the spatial fuzzy c-means clustering (SFCM) algorithm. Using a Gaussian kernel in SFCM algorithm will lead to improving the quality in complementary DNA microarray segmentation. The performance of the proposed algorithm has been evaluated on the real microarray images, which is available in Stanford Microarray Databases. Results illustrate that the accuracy of microarray cells segmentation in the proposed algorithm reaches to 100% and 98% for noiseless/noisy cells, respectively. PMID:26284175
Muetterties, Earl L.
1980-05-01
Metal cluster chemistry is one of the most rapidly developing areas of inorganic and organometallic chemistry. Prior to 1960 only a few metal clusters were well characterized. However, shortly after the early development of boron cluster chemistry, the field of metal cluster chemistry began to grow at a very rapid rate and a structural and a qualitative theoretical understanding of clusters came quickly. Analyzed here is the chemistry and the general significance of clusters with particular emphasis on the cluster research within my group. The importance of coordinately unsaturated, very reactive metal clusters is the major subject of discussion.
Li, Ke; Liu, Yi; Wang, Quanxin; Wu, Yalei; Song, Shimin; Sun, Yi; Liu, Tengchong; Wang, Jun; Li, Yang; Du, Shaoyi
2015-01-01
This paper proposes a novel multi-label classification method for resolving the spacecraft electrical characteristics problems which involve many unlabeled test data processing, high-dimensional features, long computing time and identification of slow rate. Firstly, both the fuzzy c-means (FCM) offline clustering and the principal component feature extraction algorithms are applied for the feature selection process. Secondly, the approximate weighted proximal support vector machine (WPSVM) online classification algorithms is used to reduce the feature dimension and further improve the rate of recognition for electrical characteristics spacecraft. Finally, the data capture contribution method by using thresholds is proposed to guarantee the validity and consistency of the data selection. The experimental results indicate that the method proposed can obtain better data features of the spacecraft electrical characteristics, improve the accuracy of identification and shorten the computing time effectively. PMID:26544549
Li, Ke; Liu, Yi; Wang, Quanxin; Wu, Yalei; Song, Shimin; Sun, Yi; Liu, Tengchong; Wang, Jun; Li, Yang; Du, Shaoyi
2015-01-01
This paper proposes a novel multi-label classification method for resolving the spacecraft electrical characteristics problems which involve many unlabeled test data processing, high-dimensional features, long computing time and identification of slow rate. Firstly, both the fuzzy c-means (FCM) offline clustering and the principal component feature extraction algorithms are applied for the feature selection process. Secondly, the approximate weighted proximal support vector machine (WPSVM) online classification algorithms is used to reduce the feature dimension and further improve the rate of recognition for electrical characteristics spacecraft. Finally, the data capture contribution method by using thresholds is proposed to guarantee the validity and consistency of the data selection. The experimental results indicate that the method proposed can obtain better data features of the spacecraft electrical characteristics, improve the accuracy of identification and shorten the computing time effectively. PMID:26544549
Dynamic Trajectory Extraction from Stereo Vision Using Fuzzy Clustering
NASA Astrophysics Data System (ADS)
Onishi, Masaki; Yoda, Ikushi
In recent years, many human tracking researches have been proposed in order to analyze human dynamic trajectory. These researches are general technology applicable to various fields, such as customer purchase analysis in a shopping environment and safety control in a (railroad) crossing. In this paper, we present a new approach for tracking human positions by stereo image. We use the framework of two-stepped clustering with k-means method and fuzzy clustering to detect human regions. In the initial clustering, k-means method makes middle clusters from objective features extracted by stereo vision at high speed. In the last clustering, c-means fuzzy method cluster middle clusters based on attributes into human regions. Our proposed method can be correctly clustered by expressing ambiguity using fuzzy clustering, even when many people are close to each other. The validity of our technique was evaluated with the experiment of trajectories extraction of doctors and nurses in an emergency room of a hospital.
A GMBCG galaxy cluster catalog of 55,880 rich clusters from SDSS DR7
Hao, Jiangang; McKay, Timothy A.; Koester, Benjamin P.; Rykoff, Eli S.; Rozo, Eduardo; Annis, James; Wechsler, Risa H.; Evrard, August; Siegel, Seth R.; Becker, Matthew; Busha, Michael; /Fermilab /Michigan U. /Chicago U., Astron. Astrophys. Ctr. /UC, Santa Barbara /KICP, Chicago /KIPAC, Menlo Park /SLAC /Caltech /Brookhaven
2010-08-01
We present a large catalog of optically selected galaxy clusters from the application of a new Gaussian Mixture Brightest Cluster Galaxy (GMBCG) algorithm to SDSS Data Release 7 data. The algorithm detects clusters by identifying the red sequence plus Brightest Cluster Galaxy (BCG) feature, which is unique for galaxy clusters and does not exist among field galaxies. Red sequence clustering in color space is detected using an Error Corrected Gaussian Mixture Model. We run GMBCG on 8240 square degrees of photometric data from SDSS DR7 to assemble the largest ever optical galaxy cluster catalog, consisting of over 55,000 rich clusters across the redshift range from 0.1 < z < 0.55. We present Monte Carlo tests of completeness and purity and perform cross-matching with X-ray clusters and with the maxBCG sample at low redshift. These tests indicate high completeness and purity across the full redshift range for clusters with 15 or more members.
A GMBCG Galaxy Cluster Catalog of 55,424 Rich Clusters from SDSS DR7
Hao, Jiangang; McKay, Timothy A.; Koester, Benjamin P.; Rykoff, Eli S.; Rozo, Eduardo; Annis, James; Wechsler, Risa H.; Evrard, August; Siegel, Seth R.; Becker, Matthew; Busha, Michael; Gerdes, David; Johnston, David E.; Sheldon, Erin; /Brookhaven
2011-08-22
We present a large catalog of optically selected galaxy clusters from the application of a new Gaussian Mixture Brightest Cluster Galaxy (GMBCG) algorithm to SDSS Data Release 7 data. The algorithm detects clusters by identifying the red sequence plus Brightest Cluster Galaxy (BCG) feature, which is unique for galaxy clusters and does not exist among field galaxies. Red sequence clustering in color space is detected using an Error Corrected Gaussian Mixture Model. We run GMBCG on 8240 square degrees of photometric data from SDSS DR7 to assemble the largest ever optical galaxy cluster catalog, consisting of over 55,000 rich clusters across the redshift range from 0.1 < z < 0.55. We present Monte Carlo tests of completeness and purity and perform cross-matching with X-ray clusters and with the maxBCG sample at low redshift. These tests indicate high completeness and purity across the full redshift range for clusters with 15 or more members.
GPU-based Multilevel Clustering.
Chiosa, Iurie; Kolb, Andreas
2010-04-01
The processing power of parallel co-processors like the Graphics Processing Unit (GPU) are dramatically increasing. However, up until now only a few approaches have been presented to utilize this kind of hardware for mesh clustering purposes. In this paper we introduce a Multilevel clustering technique designed as a parallel algorithm and solely implemented on the GPU. Our formulation uses the spatial coherence present in the cluster optimization and hierarchical cluster merging to significantly reduce the number of comparisons in both parts . Our approach provides a fast, high quality and complete clustering analysis. Furthermore, based on the original concept we present a generalization of the method to data clustering. All advantages of the meshbased techniques smoothly carry over to the generalized clustering approach. Additionally, this approach solves the problem of the missing topological information inherent to general data clustering and leads to a Local Neighbors k-means algorithm. We evaluate both techniques by applying them to Centroidal Voronoi Diagram (CVD) based clustering. Compared to classical approaches, our techniques generate results with at least the same clustering quality. Our technique proves to scale very well, currently being limited only by the available amount of graphics memory. PMID:20421676
A compilation of jet finding algorithms
Flaugher, B.; Meier, K.
1992-12-31
Technical descriptions of jet finding algorithms currently in use in p{anti p} collider experiments (CDF, UA1, UA2), e{sup +}e{sup {minus}} experiments and Monte-Carlo event generators (LUND programs, ISAJET) have been collected. For the hadron collider experiments, the clustering methods fall into two categories: cone algorithms and nearest-neighbor algorithms. In addition, UA2 has employed a combination of both methods for some analysis. While there are clearly differences between the cone and nearest-neighbor algorithms, the authors have found that there are also differences among the cone algorithms in the details of how the centroid of a cone cluster is located and how the E{sub T} and P{sub T} of the jet are defined. The most commonly used jet algorithm in electron-positron experiments is the JADE-type cluster algorithm. Five various incarnations of this approach have been described.
NASA Technical Reports Server (NTRS)
Wang, Lui; Bayer, Steven E.
1991-01-01
Genetic algorithms are mathematical, highly parallel, adaptive search procedures (i.e., problem solving methods) based loosely on the processes of natural genetics and Darwinian survival of the fittest. Basic genetic algorithms concepts are introduced, genetic algorithm applications are introduced, and results are presented from a project to develop a software tool that will enable the widespread use of genetic algorithm technology.
NASA Astrophysics Data System (ADS)
Owen, R. C.; Honrath, R. E.; Merrill, J.
2003-12-01
The use of cluster analysis to group atmospheric trajectories according to similar flow paths has become a common tool in atmospheric studies. Many methods are available to conduct a cluster analysis. However, the dependence of the resulting clusters upon the specific clustering method chosen has not been fully characterized. Specifically, the use of hierarchical versus non-hierarchical clustering algorithms has received little focus. This study presents the results of two cluster analyses: one using the hierarchical clustering algorithm average linkage, and one using the non-hierarchical clustering algorithm k-means. These results demonstrate the sensitivity of this cluster analysis to the use of a hierarchical method versus a non-hierarchical method. In addition, this study analyzes methods for dealing with the vertical component of trajectories during the clustering process. The analyses were performed using a 40-year set of trajectories to the PICO-NARE station, located atop Pico Mountain in the Azores Islands in the central North Atlantic.
Sanfilippo, Antonio P.; Calapristi, Augustin J.; Crow, Vernon L.; Hetzler, Elizabeth G.; Turner, Alan E.
2004-05-26
We present an approach to the disambiguation of cluster labels that capitalizes on the notion of semantic similarity to assign WordNet senses to cluster labels. The approach provides interesting insights on how document clustering can provide the basis for developing a novel approach to word sense disambiguation.
Discriminative clustering via extreme learning machine.
Huang, Gao; Liu, Tianchi; Yang, Yan; Lin, Zhiping; Song, Shiji; Wu, Cheng
2015-10-01
Discriminative clustering is an unsupervised learning framework which introduces the discriminative learning rule of supervised classification into clustering. The underlying assumption is that a good partition (clustering) of the data should yield high discrimination, namely, the partitioned data can be easily classified by some classification algorithms. In this paper, we propose three discriminative clustering approaches based on Extreme Learning Machine (ELM). The first algorithm iteratively trains weighted ELM (W-ELM) classifier to gradually maximize the data discrimination. The second and third methods are both built on Fisher's Linear Discriminant Analysis (LDA); but one approach adopts alternative optimization, while the other leverages kernel k-means. We show that the proposed algorithms can be easily implemented, and yield competitive clustering accuracy on real world data sets compared to state-of-the-art clustering methods. PMID:26143036
Image segmentation using fuzzy LVQ clustering networks
NASA Technical Reports Server (NTRS)
Tsao, Eric Chen-Kuo; Bezdek, James C.; Pal, Nikhil R.
1992-01-01
In this note we formulate image segmentation as a clustering problem. Feature vectors extracted from a raw image are clustered into subregions, thereby segmenting the image. A fuzzy generalization of a Kohonen learning vector quantization (LVQ) which integrates the Fuzzy c-Means (FCM) model with the learning rate and updating strategies of the LVQ is used for this task. This network, which segments images in an unsupervised manner, is thus related to the FCM optimization problem. Numerical examples on photographic and magnetic resonance images are given to illustrate this approach to image segmentation.
Chalmers, Eric; Le, Jonathan; Sukhdeep, Dulai; Watt, Joe; Andersen, John; Lou, Edmond
2014-01-01
When children walk on their toes for no known reason, the condition is called Idiopathic Toe Walking (ITW). Assessing the true severity of ITW can be difficult because children can alter their gait while under observation in clinic. The ability to monitor the foot angle during daily life outside of clinic may improve the assessment of ITW. A foot-worn, battery-powered inertial sensing device has been designed to monitor patients' foot angle during daily activities. The monitor includes a 3-axis accelerometer, 2-axis gyroscope, and a low-power microcontroller. The device is necessarily small, with limited battery capacity and processing power. Therefore a high-accuracy but low-complexity inertial sensing algorithm is needed. This paper compares several low-complexity algorithms' aptitude for foot-angle measurement: accelerometer-only measurement, finite impulse response (FIR) and infinite impulse response (IIR) complementary filtering, and a new dynamic predict-correct style algorithm developed using fuzzy c-means clustering. A total of 11 subjects each walked 20 m with the inertial sensing device fixed to one foot; 10 m with normal gait and 10 m simulating toe walking. A cross-validation scheme was used to obtain a low-bias estimate of each algorithm's angle measurement accuracy. The new predict-correct algorithm achieved the lowest angle measurement error: <5° mean error during normal and toe walking. The IIR complementary filtering algorithm achieved almost-as good accuracy with less computational complexity. These two algorithms seem to have good aptitude for the foot-angle measurement problem, and would be good candidates for use in a long-term monitoring device for toe-walking assessment. PMID:24050952
Detecting alternative graph clusterings.
Mandala, Supreet; Kumara, Soundar; Yao, Tao
2012-07-01
The problem of graph clustering or community detection has enjoyed a lot of attention in complex networks literature. A quality function, modularity, quantifies the strength of clustering and on maximization yields sensible partitions. However, in most real world networks, there are an exponentially large number of near-optimal partitions with some being very different from each other. Therefore, picking an optimal clustering among the alternatives does not provide complete information about network topology. To tackle this problem, we propose a graph perturbation scheme which can be used to identify an ensemble of near-optimal and diverse clusterings. We establish analytical properties of modularity function under the perturbation which ensures diversity. Our approach is algorithm independent and therefore can leverage any of the existing modularity maximizing algorithms. We numerically show that our methodology can systematically identify very different partitions on several existing data sets. The knowledge of diverse partitions sheds more light into the topological organization and helps gain a more complete understanding of the underlying complex network. PMID:23005495
Detecting alternative graph clusterings
NASA Astrophysics Data System (ADS)
Mandala, Supreet; Kumara, Soundar; Yao, Tao
2012-07-01
The problem of graph clustering or community detection has enjoyed a lot of attention in complex networks literature. A quality function, modularity, quantifies the strength of clustering and on maximization yields sensible partitions. However, in most real world networks, there are an exponentially large number of near-optimal partitions with some being very different from each other. Therefore, picking an optimal clustering among the alternatives does not provide complete information about network topology. To tackle this problem, we propose a graph perturbation scheme which can be used to identify an ensemble of near-optimal and diverse clusterings. We establish analytical properties of modularity function under the perturbation which ensures diversity. Our approach is algorithm independent and therefore can leverage any of the existing modularity maximizing algorithms. We numerically show that our methodology can systematically identify very different partitions on several existing data sets. The knowledge of diverse partitions sheds more light into the topological organization and helps gain a more complete understanding of the underlying complex network.
Partially supervised speaker clustering.
Tang, Hao; Chu, Stephen Mingyu; Hasegawa-Johnson, Mark; Huang, Thomas S
2012-05-01
model-based distance metrics, 2) our advocated use of the cosine distance metric yields consistent increases in the speaker clustering performance as compared to the commonly used euclidean distance metric, 3) our partially supervised speaker clustering concept and strategies significantly improve the speaker clustering performance over the baselines, and 4) our proposed LSDA algorithm further leads to state-of-the-art speaker clustering performance. PMID:21844626
Hierarchical clustering using mutual information
NASA Astrophysics Data System (ADS)
Kraskov, A.; Stögbauer, H.; Andrzejak, R. G.; Grassberger, P.
2005-04-01
We present a conceptually simple method for hierarchical clustering of data called mutual information clustering (MIC) algorithm. It uses mutual information (MI) as a similarity measure and exploits its grouping property: The MI between three objects X, Y, and Z is equal to the sum of the MI between X and Y, plus the MI between Z and the combined object (XY). We use this both in the Shannon (probabilistic) version of information theory and in the Kolmogorov (algorithmic) version. We apply our method to the construction of phylogenetic trees from mitochondrial DNA sequences and to the output of independent components analysis (ICA) as illustrated with the ECG of a pregnant woman.
Analyzing geographic clustered response
Merrill, D.W.; Selvin, S.; Mohr, M.S.
1991-08-01
In the study of geographic disease clusters, an alternative to traditional methods based on rates is to analyze case locations on a transformed map in which population density is everywhere equal. Although the analyst's task is thereby simplified, the specification of the density equalizing map projection (DEMP) itself is not simple and continues to be the subject of considerable research. Here a new DEMP algorithm is described, which avoids some of the difficulties of earlier approaches. The new algorithm (a) avoids illegal overlapping of transformed polygons; (b) finds the unique solution that minimizes map distortion; (c) provides constant magnification over each map polygon; (d) defines a continuous transformation over the entire map domain; (e) defines an inverse transformation; (f) can accept optional constraints such as fixed boundaries; and (g) can use commercially supported minimization software. Work is continuing to improve computing efficiency and improve the algorithm. 21 refs., 15 figs., 2 tabs.
AMIC@: All MIcroarray Clusterings @ once
Geraci, Filippo; Pellegrini, Marco; Renda, M. Elena
2008-01-01
The AMIC@ Web Server offers a light-weight multi-method clustering engine for microarray gene-expression data. AMIC@ is a highly interactive tool that stresses user-friendliness and robustness by adopting AJAX technology, thus allowing an effective interleaved execution of different clustering algorithms and inspection of results. Among the salient features AMIC@ offers, there are: (i) automatic file format detection, (ii) suggestions on the number of clusters using a variant of the stability-based method of Tibshirani et al. (iii) intuitive visual inspection of the data via heatmaps and (iv) measurements of the clustering quality using cluster homogeneity. Large data sets can be processed efficiently by selecting algorithms (such as FPF-SB and k-Boost), specifically designed for this purpose. In case of very large data sets, the user can opt for a batch-mode use of the system by means of the Clustering wizard that runs all algorithms at once and delivers the results via email. AMIC@ is freely available and open to all users with no login requirement at the following URL http://bioalgo.iit.cnr.it/amica. PMID:18477631
AMIC@: All MIcroarray Clusterings @ once.
Geraci, Filippo; Pellegrini, Marco; Renda, M Elena
2008-07-01
The AMIC@ Web Server offers a light-weight multi-method clustering engine for microarray gene-expression data. AMIC@ is a highly interactive tool that stresses user-friendliness and robustness by adopting AJAX technology, thus allowing an effective interleaved execution of different clustering algorithms and inspection of results. Among the salient features AMIC@ offers, there are: (i) automatic file format detection, (ii) suggestions on the number of clusters using a variant of the stability-based method of Tibshirani et al. (iii) intuitive visual inspection of the data via heatmaps and (iv) measurements of the clustering quality using cluster homogeneity. Large data sets can be processed efficiently by selecting algorithms (such as FPF-SB and k-Boost), specifically designed for this purpose. In case of very large data sets, the user can opt for a batch-mode use of the system by means of the Clustering wizard that runs all algorithms at once and delivers the results via email. AMIC@ is freely available and open to all users with no login requirement at the following URL http://bioalgo.iit.cnr.it/amica. PMID:18477631
Dhane, Dhiraj Manohar; Krishna, Vishal; Achar, Arun; Bar, Chittaranjan; Sanyal, Kunal; Chakraborty, Chandan
2016-09-01
Chronic lower extremity wound is a complicated disease condition of localized injury to skin and its tissues which have plagued many elders worldwide. The ulcer assessment and management is expensive and is burden on health establishment. Currently accurate wound evaluation remains a tedious task as it rely on visual inspection. This paper propose a new method for wound-area detection, using images digitally captured by a hand-held, optical camera. The strategy proposed involves spectral approach for clustering, based on the affinity matrix. The spectral clustering (SC) involves construction of similarity matrix of Laplacian based on Ng-Jorden-Weiss algorithm. Starting with a quadratic method, wound photographs were pre-processed for color homogenization. The first-order statistics filter was then applied to extract spurious regions. The filter was selected based on the performance, evaluated on four quality metrics. Then, the spectral method was used on the filtered images for effective segmentation. The segmented regions were post-processed using morphological operators. The performance of spectral segmentation was confirmed by ground-truth pictures labeled by dermatologists. The SC results were additionally compared with the results of k-means and Fuzzy C-Means (FCM) clustering algorithms. The SC approach on a set of 105 images, effectively delineated targeted wound beds yielding a segmentation accuracy of 86.73 %, positive predictive values of 91.80 %, and a sensitivity of 89.54 %. This approach shows the robustness of tool for ulcer perimeter measurement and healing progression. The article elucidates its potential to be incorporated in patient facing medical systems targeting a rapid clinical assistance. PMID:27520612
Systolic architecture for heirarchical clustering
Ku, L.C.
1984-01-01
Several hierarchical clustering methods (including single-linkage complete-linkage, centroid, and absolute overlap methods) are reviewed. The absolute overlap clustering method is selected for the design of systolic architecture mainly due to its simplicity. Two versions of systolic architectures for the absolute overlap hierarchical clustering algorithm are proposed: one-dimensional version that leads to the development of a two dimensional version which fully takes advantage of the underlying data structure of the problems. The two dimensional systolic architecture can achieve a time complexity of O(m + n) in comparison with the conventional computer implementation of a time complexity of O(m/sup 2*/n).
A new Growing Neural Gas for clustering data streams.
Ghesmoune, Mohammed; Lebbah, Mustapha; Azzag, Hanene
2016-06-01
Clustering data streams is becoming the most efficient way to cluster a massive dataset. This task requires a process capable of partitioning observations continuously with restrictions of memory and time. In this paper we present a new algorithm, called G-Stream, for clustering data streams by making one pass over the data. G-Stream is based on growing neural gas, that allows us to discover clusters of arbitrary shapes without any assumptions on the number of clusters. By using a reservoir, and applying a fading function, the quality of clustering is improved. The performance of the proposed algorithm is evaluated on public datasets. PMID:26997530
Heterogeneous Tensor Decomposition for Clustering via Manifold Optimization.
Sun, Yanfeng; Gao, Junbin; Hong, Xia; Mishra, Bamdev; Yin, Baocai
2016-03-01
Tensor clustering is an important tool that exploits intrinsically rich structures in real-world multiarray or Tensor datasets. Often in dealing with those datasets, standard practice is to use subspace clustering that is based on vectorizing multiarray data. However, vectorization of tensorial data does not exploit complete structure information. In this paper, we propose a subspace clustering algorithm without adopting any vectorization process. Our approach is based on a novel heterogeneous Tucker decomposition model taking into account cluster membership information. We propose a new clustering algorithm that alternates between different modes of the proposed heterogeneous tensor model. All but the last mode have closed-form updates. Updating the last mode reduces to optimizing over the multinomial manifold for which we investigate second order Riemannian geometry and propose a trust-region algorithm. Numerical experiments show that our proposed algorithm compete effectively with state-of-the-art clustering algorithms that are based on tensor factorization. PMID:27046492
Automatic Clustering Using Multi-objective Particle Swarm and Simulated Annealing
Abubaker, Ahmad; Baharum, Adam; Alrefaei, Mahmoud
2015-01-01
This paper puts forward a new automatic clustering algorithm based on Multi-Objective Particle Swarm Optimization and Simulated Annealing, “MOPSOSA”. The proposed algorithm is capable of automatic clustering which is appropriate for partitioning datasets to a suitable number of clusters. MOPSOSA combines the features of the multi-objective based particle swarm optimization (PSO) and the Multi-Objective Simulated Annealing (MOSA). Three cluster validity indices were optimized simultaneously to establish the suitable number of clusters and the appropriate clustering for a dataset. The first cluster validity index is centred on Euclidean distance, the second on the point symmetry distance, and the last cluster validity index is based on short distance. A number of algorithms have been compared with the MOPSOSA algorithm in resolving clustering problems by determining the actual number of clusters and optimal clustering. Computational experiments were carried out to study fourteen artificial and five real life datasets. PMID:26132309
Feature Clustering for Accelerating Parallel Coordinate Descent
Scherrer, Chad; Tewari, Ambuj; Halappanavar, Mahantesh; Haglin, David J.
2012-12-06
We demonstrate an approach for accelerating calculation of the regularization path for L1 sparse logistic regression problems. We show the benefit of feature clustering as a preconditioning step for parallel block-greedy coordinate descent algorithms.
Adaptive Fuzzy Consensus Clustering Framework for Clustering Analysis of Cancer Data.
Yu, Zhiwen; Chen, Hantao; You, Jane; Liu, Jiming; Wong, Hau-San; Han, Guoqiang; Li, Le
2015-01-01
Performing clustering analysis is one of the important research topics in cancer discovery using gene expression profiles, which is crucial in facilitating the successful diagnosis and treatment of cancer. While there are quite a number of research works which perform tumor clustering, few of them considers how to incorporate fuzzy theory together with an optimization process into a consensus clustering framework to improve the performance of clustering analysis. In this paper, we first propose a random double clustering based cluster ensemble framework (RDCCE) to perform tumor clustering based on gene expression data. Specifically, RDCCE generates a set of representative features using a randomly selected clustering algorithm in the ensemble, and then assigns samples to their corresponding clusters based on the grouping results. In addition, we also introduce the random double clustering based fuzzy cluster ensemble framework (RDCFCE), which is designed to improve the performance of RDCCE by integrating the newly proposed fuzzy extension model into the ensemble framework. RDCFCE adopts the normalized cut algorithm as the consensus function to summarize the fuzzy matrices generated by the fuzzy extension models, partition the consensus matrix, and obtain the final result. Finally, adaptive RDCFCE (A-RDCFCE) is proposed to optimize RDCFCE and improve the performance of RDCFCE further by adopting a self-evolutionary process (SEPP) for the parameter set. Experiments on real cancer gene expression profiles indicate that RDCFCE and A-RDCFCE works well on these data sets, and outperform most of the state-of-the-art tumor clustering algorithms. PMID:26357330
The SMART CLUSTER METHOD - adaptive earthquake cluster analysis and declustering
NASA Astrophysics Data System (ADS)
Schaefer, Andreas; Daniell, James; Wenzel, Friedemann
2016-04-01
Earthquake declustering is an essential part of almost any statistical analysis of spatial and temporal properties of seismic activity with usual applications comprising of probabilistic seismic hazard assessments (PSHAs) and earthquake prediction methods. The nature of earthquake clusters and subsequent declustering of earthquake catalogues plays a crucial role in determining the magnitude-dependent earthquake return period and its respective spatial variation. Various methods have been developed to address this issue from other researchers. These have differing ranges of complexity ranging from rather simple statistical window methods to complex epidemic models. This study introduces the smart cluster method (SCM), a new methodology to identify earthquake clusters, which uses an adaptive point process for spatio-temporal identification. Hereby, an adaptive search algorithm for data point clusters is adopted. It uses the earthquake density in the spatio-temporal neighbourhood of each event to adjust the search properties. The identified clusters are subsequently analysed to determine directional anisotropy, focussing on a strong correlation along the rupture plane and adjusts its search space with respect to directional properties. In the case of rapid subsequent ruptures like the 1992 Landers sequence or the 2010/2011 Darfield-Christchurch events, an adaptive classification procedure is applied to disassemble subsequent ruptures which may have been grouped into an individual cluster using near-field searches, support vector machines and temporal splitting. The steering parameters of the search behaviour are linked to local earthquake properties like magnitude of completeness, earthquake density and Gutenberg-Richter parameters. The method is capable of identifying and classifying earthquake clusters in space and time. It is tested and validated using earthquake data from California and New Zealand. As a result of the cluster identification process, each event in
Constrained Clustering With Imperfect Oracles.
Zhu, Xiatian; Loy, Chen Change; Gong, Shaogang
2016-06-01
While clustering is usually an unsupervised operation, there are circumstances where we have access to prior belief that pairs of samples should (or should not) be assigned with the same cluster. Constrained clustering aims to exploit this prior belief as constraint (or weak supervision) to influence the cluster formation so as to obtain a data structure more closely resembling human perception. Two important issues remain open: 1) how to exploit sparse constraints effectively and 2) how to handle ill-conditioned/noisy constraints generated by imperfect oracles. In this paper, we present a novel pairwise similarity measure framework to address the above issues. Specifically, in contrast to existing constrained clustering approaches that blindly rely on all features for constraint propagation, our approach searches for neighborhoods driven by discriminative feature selection for more effective constraint diffusion. Crucially, we formulate a novel approach to handling the noisy constraint problem, which has been unrealistically ignored in the constrained clustering literature. Extensive comparative results show that our method is superior to the state-of-the-art constrained clustering approaches and can generally benefit existing pairwise similarity-based data clustering algorithms, such as spectral clustering and affinity propagation. PMID:25622327
SMART: unique splitting-while-merging framework for gene clustering.
Fa, Rui; Roberts, David J; Nandi, Asoke K
2014-01-01
Successful clustering algorithms are highly dependent on parameter settings. The clustering performance degrades significantly unless parameters are properly set, and yet, it is difficult to set these parameters a priori. To address this issue, in this paper, we propose a unique splitting-while-merging clustering framework, named "splitting merging awareness tactics" (SMART), which does not require any a priori knowledge of either the number of clusters or even the possible range of this number. Unlike existing self-splitting algorithms, which over-cluster the dataset to a large number of clusters and then merge some similar clusters, our framework has the ability to split and merge clusters automatically during the process and produces the the most reliable clustering results, by intrinsically integrating many clustering techniques and tasks. The SMART framework is implemented with two distinct clustering paradigms in two algorithms: competitive learning and finite mixture model. Nevertheless, within the proposed SMART framework, many other algorithms can be derived for different clustering paradigms. The minimum message length algorithm is integrated into the framework as the clustering selection criterion. The usefulness of the SMART framework and its algorithms is tested in demonstration datasets and simulated gene expression datasets. Moreover, two real microarray gene expression datasets are studied using this approach. Based on the performance of many metrics, all numerical results show that SMART is superior to compared existing self-splitting algorithms and traditional algorithms. Three main properties of the proposed SMART framework are summarized as: (1) needing no parameters dependent on the respective dataset or a priori knowledge about the datasets, (2) extendible to many different applications, (3) offering superior performance compared with counterpart algorithms. PMID:24714159
Clustering for unsupervised fault diagnosis in nuclear turbine shut-down transients
NASA Astrophysics Data System (ADS)
Baraldi, Piero; Di Maio, Francesco; Rigamonti, Marco; Zio, Enrico; Seraoui, Redouane
2015-06-01
Empirical methods for fault diagnosis usually entail a process of supervised training based on a set of examples of signal evolutions "labeled" with the corresponding, known classes of fault. However, in practice, the signals collected during plant operation may be, very often, "unlabeled", i.e., the information on the corresponding type of occurred fault is not available. To cope with this practical situation, in this paper we develop a methodology for the identification of transient signals showing similar characteristics, under the conjecture that operational/faulty transient conditions of the same type lead to similar behavior in the measured signals evolution. The methodology is founded on a feature extraction procedure, which feeds a spectral clustering technique, embedding the unsupervised fuzzy C-means (FCM) algorithm, which evaluates the functional similarity among the different operational/faulty transients. A procedure for validating the plausibility of the obtained clusters is also propounded based on physical considerations. The methodology is applied to a real industrial case, on the basis of 148 shut-down transients of a Nuclear Power Plant (NPP) steam turbine.
On evaluating clustering procedures for use in classification
NASA Technical Reports Server (NTRS)
Pore, M. D.; Moritz, T. E.; Register, D. T.; Yao, S. S.; Eppler, W. G. (Principal Investigator)
1979-01-01
The problem of evaluating clustering algorithms and their respective computer programs for use in a preprocessing step for classification is addressed. In clustering for classification the probability of correct classification is suggested as the ultimate measure of accuracy on training data. A means of implementing this criterion and a measure of cluster purity are discussed. Examples are given. A procedure for cluster labeling that is based on cluster purity and sample size is presented.
Convalescing Cluster Configuration Using a Superlative Framework
Sabitha, R.; Karthik, S.
2015-01-01
Competent data mining methods are vital to discover knowledge from databases which are built as a result of enormous growth of data. Various techniques of data mining are applied to obtain knowledge from these databases. Data clustering is one such descriptive data mining technique which guides in partitioning data objects into disjoint segments. K-means algorithm is a versatile algorithm among the various approaches used in data clustering. The algorithm and its diverse adaptation methods suffer certain problems in their performance. To overcome these issues a superlative algorithm has been proposed in this paper to perform data clustering. The specific feature of the proposed algorithm is discretizing the dataset, thereby improving the accuracy of clustering, and also adopting the binary search initialization method to generate cluster centroids. The generated centroids are fed as input to K-means approach which iteratively segments the data objects into respective clusters. The clustered results are measured for accuracy and validity. Experiments conducted by testing the approach on datasets from the UC Irvine Machine Learning Repository evidently show that the accuracy and validity measure is higher than the other two approaches, namely, simple K-means and Binary Search method. Thus, the proposed approach proves that discretization process will improve the efficacy of descriptive data mining tasks. PMID:26543895
Processing of rock core microtomography images: Using seven different machine learning algorithms
NASA Astrophysics Data System (ADS)
Chauhan, Swarup; Rühaak, Wolfram; Khan, Faisal; Enzmann, Frieder; Mielke, Philipp; Kersten, Michael; Sass, Ingo
2016-01-01
The abilities of machine learning algorithms to process X-ray microtomographic rock images were determined. The study focused on the use of unsupervised, supervised, and ensemble clustering techniques, to segment X-ray computer microtomography rock images and to estimate the pore spaces and pore size diameters in the rocks. The unsupervised k-means technique gave the fastest processing time and the supervised least squares support vector machine technique gave the slowest processing time. Multiphase assemblages of solid phases (minerals and finely grained minerals) and the pore phase were found on visual inspection of the images. In general, the accuracy in terms of porosity values and pore size distribution was found to be strongly affected by the feature vectors selected. Relative porosity average value of 15.92±1.77% retrieved from all the seven machine learning algorithm is in very good agreement with the experimental results of 17±2%, obtained using gas pycnometer. Of the supervised techniques, the least square support vector machine technique is superior to feed forward artificial neural network because of its ability to identify a generalized pattern. In the ensemble classification techniques boosting technique converged faster compared to bragging technique. The k-means technique outperformed the fuzzy c-means and self-organized maps techniques in terms of accuracy and speed.
NASA Technical Reports Server (NTRS)
1999-01-01
Penetrating 25,000 light-years of obscuring dust and myriad stars, NASA's Hubble Space Telescope has provided the clearest view yet of one of the largest young clusters of stars inside our Milky Way galaxy, located less than 100 light-years from the very center of the Galaxy. Having the equivalent mass greater than 10,000 stars like our sun, the monster cluster is ten times larger than typical young star clusters scattered throughout our Milky Way. It is destined to be ripped apart in just a few million years by gravitational tidal forces in the galaxy's core. But in its brief lifetime it shines more brightly than any other star cluster in the Galaxy. Quintuplet Cluster is 4 million years old. It has stars on the verge of blowing up as supernovae. It is the home of the brightest star seen in the galaxy, called the Pistol star. This image was taken in infrared light by Hubble's NICMOS camera in September 1997. The false colors correspond to infrared wavelengths. The galactic center stars are white, the red stars are enshrouded in dust or behind dust, and the blue stars are foreground stars between us and the Milky Way's center. The cluster is hidden from direct view behind black dust clouds in the constellation Sagittarius. If the cluster could be seen from earth it would appear to the naked eye as a 3rd magnitude star, 1/6th of a full moon's diameter apart.
NASA Astrophysics Data System (ADS)
Abrams, Daniel S.
This thesis describes several new quantum algorithms. These include a polynomial time algorithm that uses a quantum fast Fourier transform to find eigenvalues and eigenvectors of a Hamiltonian operator, and that can be applied in cases (commonly found in ab initio physics and chemistry problems) for which all known classical algorithms require exponential time. Fast algorithms for simulating many body Fermi systems are also provided in both first and second quantized descriptions. An efficient quantum algorithm for anti-symmetrization is given as well as a detailed discussion of a simulation of the Hubbard model. In addition, quantum algorithms that calculate numerical integrals and various characteristics of stochastic processes are described. Two techniques are given, both of which obtain an exponential speed increase in comparison to the fastest known classical deterministic algorithms and a quadratic speed increase in comparison to classical Monte Carlo (probabilistic) methods. I derive a simpler and slightly faster version of Grover's mean algorithm, show how to apply quantum counting to the problem, develop some variations of these algorithms, and show how both (apparently distinct) approaches can be understood from the same unified framework. Finally, the relationship between physics and computation is explored in some more depth, and it is shown that computational complexity theory depends very sensitively on physical laws. In particular, it is shown that nonlinear quantum mechanics allows for the polynomial time solution of NP-complete and #P oracle problems. Using the Weinberg model as a simple example, the explicit construction of the necessary gates is derived from the underlying physics. Nonlinear quantum algorithms are also presented using Polchinski type nonlinearities which do not allow for superluminal communication. (Copies available exclusively from MIT Libraries, Rm. 14- 0551, Cambridge, MA 02139-4307. Ph. 617-253-5668; Fax 617-253-1690.)
NASA Astrophysics Data System (ADS)
Miller, Christopher J. Miller
2012-03-01
There are many examples of clustering in astronomy. Stars in our own galaxy are often seen as being gravitationally bound into tight globular or open clusters. The Solar System's Trojan asteroids cluster at the gravitational Langrangian in front of Jupiter’s orbit. On the largest of scales, we find gravitationally bound clusters of galaxies, the Virgo cluster (in the constellation of Virgo at a distance of ˜50 million light years) being a prime nearby example. The Virgo cluster subtends an angle of nearly 8◦ on the sky and is known to contain over a thousand member galaxies. Galaxy clusters play an important role in our understanding of theUniverse. Clusters exist at peaks in the three-dimensional large-scale matter density field. Their sky (2D) locations are easy to detect in astronomical imaging data and their mean galaxy redshifts (redshift is related to the third spatial dimension: distance) are often better (spectroscopically) and cheaper (photometrically) when compared with the entire galaxy population in large sky surveys. Photometric redshift (z) [Photometric techniques use the broad band filter magnitudes of a galaxy to estimate the redshift. Spectroscopic techniques use the galaxy spectra and emission/absorption line features to measure the redshift] determinations of galaxies within clusters are accurate to better than delta_z = 0.05 [7] and when studied as a cluster population, the central galaxies form a line in color-magnitude space (called the the E/S0 ridgeline and visible in Figure 16.3) that contains galaxies with similar stellar populations [15]. The shape of this E/S0 ridgeline enables astronomers to measure the cluster redshift to within delta_z = 0.01 [23]. The most accurate cluster redshift determinations come from spectroscopy of the member galaxies, where only a fraction of the members need to be spectroscopically observed [25,42] to get an accurate redshift to the whole system. If light traces mass in the Universe, then the locations
Clustering PPI data by combining FA and SHC method
2015-01-01
Clustering is one of main methods to identify functional modules from protein-protein interaction (PPI) data. Nevertheless traditional clustering methods may not be effective for clustering PPI data. In this paper, we proposed a novel method for clustering PPI data by combining firefly algorithm (FA) and synchronization-based hierarchical clustering (SHC) algorithm. Firstly, the PPI data are preprocessed via spectral clustering (SC) which transforms the high-dimensional similarity matrix into a low dimension matrix. Then the SHC algorithm is used to perform clustering. In SHC algorithm, hierarchical clustering is achieved by enlarging the neighborhood radius of synchronized objects continuously, while the hierarchical search is very difficult to find the optimal neighborhood radius of synchronization and the efficiency is not high. So we adopt the firefly algorithm to determine the optimal threshold of the neighborhood radius of synchronization automatically. The proposed algorithm is tested on the MIPS PPI dataset. The results show that our proposed algorithm is better than the traditional algorithms in precision, recall and f-measure value. PMID:25707632
Clustering Binary Data in the Presence of Masking Variables
ERIC Educational Resources Information Center
Brusco, Michael J.
2004-01-01
A number of important applications require the clustering of binary data sets. Traditional nonhierarchical cluster analysis techniques, such as the popular K-means algorithm, can often be successfully applied to these data sets. However, the presence of masking variables in a data set can impede the ability of the K-means algorithm to recover the…
ERIC Educational Resources Information Center
Pottawattamie County School System, Council Bluffs, IA.
The 15 occupational clusters (transportation, fine arts and humanities, communications and media, personal service occupations, construction, hospitality and recreation, health occupations, marine science occupations, consumer and homemaking-related occupations, agribusiness and natural resources, environment, public service, business and office…
Sobel, E.; Lange, K.; O`Connell, J.R.
1996-12-31
Haplotyping is the logical process of inferring gene flow in a pedigree based on phenotyping results at a small number of genetic loci. This paper formalizes the haplotyping problem and suggests four algorithms for haplotype reconstruction. These algorithms range from exhaustive enumeration of all haplotype vectors to combinatorial optimization by simulated annealing. Application of the algorithms to published genetic analyses shows that manual haplotyping is often erroneous. Haplotyping is employed in screening pedigrees for phenotyping errors and in positional cloning of disease genes from conserved haplotypes in population isolates. 26 refs., 6 figs., 3 tabs.
Donchev, Todor I.; Petrov, Ivan G.
2011-05-31
Described herein is an apparatus and a method for producing atom clusters based on a gas discharge within a hollow cathode. The hollow cathode includes one or more walls. The one or more walls define a sputtering chamber within the hollow cathode and include a material to be sputtered. A hollow anode is positioned at an end of the sputtering chamber, and atom clusters are formed when a gas discharge is generated between the hollow anode and the hollow cathode.
Clustering Acoustic Segments Using Multi-Stage Agglomerative Hierarchical Clustering
2015-01-01
Agglomerative hierarchical clustering becomes infeasible when applied to large datasets due to its O(N2) storage requirements. We present a multi-stage agglomerative hierarchical clustering (MAHC) approach aimed at large datasets of speech segments. The algorithm is based on an iterative divide-and-conquer strategy. The data is first split into independent subsets, each of which is clustered separately. Thus reduces the storage required for sequential implementations, and allows concurrent computation on parallel computing hardware. The resultant clusters are merged and subsequently re-divided into subsets, which are passed to the following iteration. We show that MAHC can match and even surpass the performance of the exact implementation when applied to datasets of speech segments. PMID:26517376
Partition signed social networks via clustering dynamics
NASA Astrophysics Data System (ADS)
Wu, Jianshe; Zhang, Long; Li, Yong; Jiao, Yang
2016-02-01
Inspired by the dynamics phenomenon occurred in social networks, the WJJLGS model is modified to imitate the clustering dynamics of signed social networks. Analyses show that the clustering dynamics of the model can be applied to partition signed social networks. Traditionally, blockmodel is applied to partition signed networks. In this paper, a detailed dynamics-based algorithm for signed social networks (DBAS) is presented. Simulations on several typical real-world and illustrative networks that have been analyzed by the blockmodel verify the correctness of the proposed algorithm. The efficiency of the algorithm is verified on large scale synthetic networks.
DEDICATED FILTER FOR DEFECTS CLUSTERING IN RADIOGRAPHIC IMAGE
Sikora, R.; Swiadek, K.; Chady, T.
2009-03-03
Defect clusters such as linear or clustered porosity are in some cases even more important than single flaws. This paper presents two methods of defect clustering and algorithm for calculation of distances between flaws in digital radiographic image. Dedicated lookup table based filter is used for calculation of distances between objects in the specified range. For defect clustering two functions were developed. First one is based on MMD (Minimum Mean Distance) algorithm. Second one uses hierarchical procedures for clustering defects of various types, shapes and size.
Wu, Jianfa; Peng, Dahao; Li, Zhuping; Zhao, Li; Ling, Huanzhang
2015-01-01
To effectively and accurately detect and classify network intrusion data, this paper introduces a general regression neural network (GRNN) based on the artificial immune algorithm with elitist strategies (AIAE). The elitist archive and elitist crossover were combined with the artificial immune algorithm (AIA) to produce the AIAE-GRNN algorithm, with the aim of improving its adaptivity and accuracy. In this paper, the mean square errors (MSEs) were considered the affinity function. The AIAE was used to optimize the smooth factors of the GRNN; then, the optimal smooth factor was solved and substituted into the trained GRNN. Thus, the intrusive data were classified. The paper selected a GRNN that was separately optimized using a genetic algorithm (GA), particle swarm optimization (PSO), and fuzzy C-mean clustering (FCM) to enable a comparison of these approaches. As shown in the results, the AIAE-GRNN achieves a higher classification accuracy than PSO-GRNN, but the running time of AIAE-GRNN is long, which was proved first. FCM and GA-GRNN were eliminated because of their deficiencies in terms of accuracy and convergence. To improve the running speed, the paper adopted principal component analysis (PCA) to reduce the dimensions of the intrusive data. With the reduction in dimensionality, the PCA-AIAE-GRNN decreases in accuracy less and has better convergence than the PCA-PSO-GRNN, and the running speed of the PCA-AIAE-GRNN was relatively improved. The experimental results show that the AIAE-GRNN has a higher robustness and accuracy than the other algorithms considered and can thus be used to classify the intrusive data. PMID:25807466
Reactive Collision Avoidance Algorithm
NASA Technical Reports Server (NTRS)
Scharf, Daniel; Acikmese, Behcet; Ploen, Scott; Hadaegh, Fred
2010-01-01
The reactive collision avoidance (RCA) algorithm allows a spacecraft to find a fuel-optimal trajectory for avoiding an arbitrary number of colliding spacecraft in real time while accounting for acceleration limits. In addition to spacecraft, the technology can be used for vehicles that can accelerate in any direction, such as helicopters and submersibles. In contrast to existing, passive algorithms that simultaneously design trajectories for a cluster of vehicles working to achieve a common goal, RCA is implemented onboard spacecraft only when an imminent collision is detected, and then plans a collision avoidance maneuver for only that host vehicle, thus preventing a collision in an off-nominal situation for which passive algorithms cannot. An example scenario for such a situation might be when a spacecraft in the cluster is approaching another one, but enters safe mode and begins to drift. Functionally, the RCA detects colliding spacecraft, plans an evasion trajectory by solving the Evasion Trajectory Problem (ETP), and then recovers after the collision is avoided. A direct optimization approach was used to develop the algorithm so it can run in real time. In this innovation, a parameterized class of avoidance trajectories is specified, and then the optimal trajectory is found by searching over the parameters. The class of trajectories is selected as bang-off-bang as motivated by optimal control theory. That is, an avoiding spacecraft first applies full acceleration in a constant direction, then coasts, and finally applies full acceleration to stop. The parameter optimization problem can be solved offline and stored as a look-up table of values. Using a look-up table allows the algorithm to run in real time. Given a colliding spacecraft, the properties of the collision geometry serve as indices of the look-up table that gives the optimal trajectory. For multiple colliding spacecraft, the set of trajectories that avoid all spacecraft is rapidly searched on
Bipartite graph partitioning and data clustering
Zha, Hongyuan; He, Xiaofeng; Ding, Chris; Gu, Ming; Simon, Horst D.
2001-05-07
Many data types arising from data mining applications can be modeled as bipartite graphs, examples include terms and documents in a text corpus, customers and purchasing items in market basket analysis and reviewers and movies in a movie recommender system. In this paper, the authors propose a new data clustering method based on partitioning the underlying biopartite graph. The partition is constructed by minimizing a normalized sum of edge weights between unmatched pairs of vertices of the bipartite graph. They show that an approximate solution to the minimization problem can be obtained by computing a partial singular value decomposition (SVD) of the associated edge weight matrix of the bipartite graph. They point out the connection of their clustering algorithm to correspondence analysis used in multivariate analysis. They also briefly discuss the issue of assigning data objects to multiple clusters. In the experimental results, they apply their clustering algorithm to the problem of document clustering to illustrate its effectiveness and efficiency.
An agglomerative hierarchical clustering approach to visualisation in Bayesian clustering problems
Dawson, Kevin J.; Belkhir, Khalid
2009-01-01
Clustering problems (including the clustering of individuals into outcrossing populations, hybrid generations, full-sib families and selfing lines) have recently received much attention in population genetics. In these clustering problems, the parameter of interest is a partition of the set of sampled individuals, - the sample partition. In a fully Bayesian approach to clustering problems of this type, our knowledge about the sample partition is represented by a probability distribution on the space of possible sample partitions. Since the number of possible partitions grows very rapidly with the sample size, we can not visualise this probability distribution in its entirety, unless the sample is very small. As a solution to this visualisation problem, we recommend using an agglomerative hierarchical clustering algorithm, which we call the exact linkage algorithm. This algorithm is a special case of the maximin clustering algorithm that we introduced previously. The exact linkage algorithm is now implemented in our software package Partition View. The exact linkage algorithm takes the posterior co-assignment probabilities as input, and yields as output a rooted binary tree, - or more generally, a forest of such trees. Each node of this forest defines a set of individuals, and the node height is the posterior co-assignment probability of this set. This provides a useful visual representation of the uncertainty associated with the assignment of individuals to categories. It is also a useful starting point for a more detailed exploration of the posterior distribution in terms of the co-assignment probabilities. PMID:19337306
Large-scale metagenomic sequence clustering on map-reduce clusters.
Yang, Xiao; Zola, Jaroslaw; Aluru, Srinivas
2013-02-01
Taxonomic clustering of species from millions of DNA fragments sequenced from their genomes is an important and frequently arising problem in metagenomics. In this paper, we present a parallel algorithm for taxonomic clustering of large metagenomic samples with support for overlapping clusters. We develop sketching techniques, akin to those created for web document clustering, to deduce significant similarities between pairs of sequences without resorting to expensive all vs. all comparison. We formulate the metagenomic classification problem as that of maximal quasi-clique enumeration in the resulting similarity graph, at multiple levels of the hierarchy as prescribed by different similarity thresholds. We cast execution of the underlying algorithmic steps as applications of the map-reduce framework to achieve a cloud ready implementation. We show that the resulting framework can produce high quality clustering of metagenomic samples consisting of millions of reads, in reasonable time limits, when executed on a modest size cluster. PMID:23427983
Learner Typologies Development Using OIndex and Data Mining Based Clustering Techniques
ERIC Educational Resources Information Center
Luan, Jing
2004-01-01
This explorative data mining project used distance based clustering algorithm to study 3 indicators, called OIndex, of student behavioral data and stabilized at a 6-cluster scenario following an exhaustive explorative study of 4, 5, and 6 cluster scenarios produced by K-Means and TwoStep algorithms. Using principles in data mining, the study…
A New Elliptical Grid Clustering Method
NASA Astrophysics Data System (ADS)
Guansheng, Zheng
A new base on grid clustering method is presented in this paper. This new method first does unsupervised learning on the high dimensions data. This paper proposed a grid-based approach to clustering. It maps the data onto a multi-dimensional space and applies a linear transformation to the feature space instead of to the objects themselves and then approach a grid-clustering method. Unlike the conventional methods, it uses a multidimensional hyper-eclipse grid cell. Some case studies and ideas how to use the algorithms are described. The experimental results show that EGC can discover abnormity shapes of clusters.
Clustering of High Throughput Gene Expression Data
Pirim, Harun; Ekşioğlu, Burak; Perkins, Andy; Yüceer, Çetin
2012-01-01
High throughput biological data need to be processed, analyzed, and interpreted to address problems in life sciences. Bioinformatics, computational biology, and systems biology deal with biological problems using computational methods. Clustering is one of the methods used to gain insight into biological processes, particularly at the genomics level. Clearly, clustering can be used in many areas of biological data analysis. However, this paper presents a review of the current clustering algorithms designed especially for analyzing gene expression data. It is also intended to introduce one of the main problems in bioinformatics - clustering gene expression data - to the operations research community. PMID:23144527
Matlab Cluster Ensemble Toolbox v. 1.0
2009-04-27
This is a Matlab toolbox for investigating the application of cluster ensembles to data classification, with the objective of improving the accuracy and/or speed of clustering. The toolbox divides the cluster ensemble problem into four areas, providing functionality for each. These include, (1) synthetic data generation, (2) clustering to generate individual data partitions and similarity matrices, (3) consensus function generation and final clustering to generate ensemble data partitioning, and (4) implementation of accuracy metrics. With regard to data generation, Gaussian data of arbitrary dimension can be generated. The kcenters algorithm can then be used to generate individual data partitions by either, (a) subsampling the data and clustering each subsample, or by (b) randomly initializing the algorithm and generating a clustering for each initialization. In either case an overall similarity matrix can be computed using a consensus function operating on the individual similarity matrices. A final clustering can be performed and performance metrics are provided for evaluation purposes.
A Short Survey of Document Structure Similarity Algorithms
Buttler, D
2004-02-27
This paper provides a brief survey of document structural similarity algorithms, including the optimal Tree Edit Distance algorithm and various approximation algorithms. The approximation algorithms include the simple weighted tag similarity algorithm, Fourier transforms of the structure, and a new application of the shingle technique to structural similarity. We show three surprising results. First, the Fourier transform technique proves to be the least accurate of any of approximation algorithms, while also being slowest. Second, optimal Tree Edit Distance algorithms may not be the best technique for clustering pages from different sites. Third, the simplest approximation to structure may be the most effective and efficient mechanism for many applications.
Large-Scale Multi-Dimensional Document Clustering on GPU Clusters
Cui, Xiaohui; Mueller, Frank; Zhang, Yongpeng; Potok, Thomas E
2010-01-01
Document clustering plays an important role in data mining systems. Recently, a flocking-based document clustering algorithm has been proposed to solve the problem through simulation resembling the flocking behavior of birds in nature. This method is superior to other clustering algorithms, including k-means, in the sense that the outcome is not sensitive to the initial state. One limitation of this approach is that the algorithmic complexity is inherently quadratic in the number of documents. As a result, execution time becomes a bottleneck with large number of documents. In this paper, we assess the benefits of exploiting the computational power of Beowulf-like clusters equipped with contemporary Graphics Processing Units (GPUs) as a means to significantly reduce the runtime of flocking-based document clustering. Our framework scales up to over one million documents processed simultaneously in a sixteennode GPU cluster. Results are also compared to a four-node cluster with higher-end GPUs. On these clusters, we observe 30X-50X speedups, which demonstrates the potential of GPU clusters to efficiently solve massive data mining problems. Such speedups combined with the scalability potential and accelerator-based parallelization are unique in the domain of document-based data mining, to the best of our knowledge.
Spectral clustering for TRUS images
Mohamed, Samar S; Salama, Magdy MA
2007-01-01
Background Identifying the location and the volume of the prostate is important for ultrasound-guided prostate brachytherapy. Prostate volume is also important for prostate cancer diagnosis. Manual outlining of the prostate border is able to determine the prostate volume accurately, however, it is time consuming and tedious. Therefore, a number of investigations have been devoted to designing algorithms that are suitable for segmenting the prostate boundary in ultrasound images. The most popular method is the deformable model (snakes), a method that involves designing an energy function and then optimizing this function. The snakes algorithm usually requires either an initial contour or some points on the prostate boundary to be estimated close enough to the original boundary which is considered a drawback to this powerful method. Methods The proposed spectral clustering segmentation algorithm is built on a totally different foundation that doesn't involve any function design or optimization. It also doesn't need any contour or any points on the boundary to be estimated. The proposed algorithm depends mainly on graph theory techniques. Results Spectral clustering is used in this paper for both prostate gland segmentation from the background and internal gland segmentation. The obtained segmented images were compared to the expert radiologist segmented images. The proposed algorithm obtained excellent gland segmentation results with 93% average overlap areas. It is also able to internally segment the gland where the segmentation showed consistency with the cancerous regions identified by the expert radiologist. Conclusion The proposed spectral clustering segmentation algorithm obtained fast excellent estimates that can give rough prostate volume and location as well as internal gland segmentation without any user interaction. PMID:17359549
Scalable Parallel Density-based Clustering and Applications
NASA Astrophysics Data System (ADS)
Patwary, Mostofa Ali
2014-04-01
Recently, density-based clustering algorithms (DBSCAN and OPTICS) have gotten significant attention of the scientific community due to their unique capability of discovering arbitrary shaped clusters and eliminating noise data. These algorithms have several applications, which require high performance computing, including finding halos and subhalos (clusters) from massive cosmology data in astrophysics, analyzing satellite images, X-ray crystallography, and anomaly detection. However, parallelization of these algorithms are extremely challenging as they exhibit inherent sequential data access order, unbalanced workload resulting in low parallel efficiency. To break the data access sequentiality and to achieve high parallelism, we develop new parallel algorithms, both for DBSCAN and OPTICS, designed using graph algorithmic techniques. For example, our parallel DBSCAN algorithm exploits the similarities between DBSCAN and computing connected components. Using datasets containing up to a billion floating point numbers, we show that our parallel density-based clustering algorithms significantly outperform the existing algorithms, achieving speedups up to 27.5 on 40 cores on shared memory architecture and speedups up to 5,765 using 8,192 cores on distributed memory architecture. In our experiments, we found that while achieving the scalability, our algorithms produce clustering results with comparable quality to the classical algorithms.
Retro: concept-based clustering of biomedical topical sets
Yeganova, Lana; Kim, Won; Kim, Sun; Wilbur, W. John
2014-01-01
Motivation: Clustering methods can be useful for automatically grouping documents into meaningful clusters, improving human comprehension of a document collection. Although there are clustering algorithms that can achieve the goal for relatively large document collections, they do not always work well for small and homogenous datasets. Methods: In this article, we present Retro—a novel clustering algorithm that extracts meaningful clusters along with concise and descriptive titles from small and homogenous document collections. Unlike common clustering approaches, our algorithm predicts cluster titles before clustering. It relies on the hypergeometric distribution model to discover key phrases, and generates candidate clusters by assigning documents to these phrases. Further, the statistical significance of candidate clusters is tested using supervised learning methods, and a multiple testing correction technique is used to control the overall quality of clustering. Results: We test our system on five disease datasets from OMIM® and evaluate the results based on MeSH® term assignments. We further compare our method with several baseline and state-of-the-art methods, including K-means, expectation maximization, latent Dirichlet allocation-based clustering, Lingo, OPTIMSRC and adapted GK-means. The experimental results on the 20-Newsgroup and ODP-239 collections demonstrate that our method is successful at extracting significant clusters and is superior to existing methods in terms of quality of clusters. Finally, we apply our system to a collection of 6248 topical sets from the HomoloGene® database, a resource in PubMed®. Empirical evaluation confirms the method is useful for small homogenous datasets in producing meaningful clusters with descriptive titles. Availability and implementation: A web-based demonstration of the algorithm applied to a collection of sets from the HomoloGene database is available at http://www.ncbi.nlm.nih.gov/CBBresearch/Wilbur/IRET/CLUSTERING
NASA Technical Reports Server (NTRS)
Barth, Timothy J.; Lomax, Harvard
1987-01-01
The past decade has seen considerable activity in algorithm development for the Navier-Stokes equations. This has resulted in a wide variety of useful new techniques. Some examples for the numerical solution of the Navier-Stokes equations are presented, divided into two parts. One is devoted to the incompressible Navier-Stokes equations, and the other to the compressible form.
Robust Face Clustering Via Tensor Decomposition.
Cao, Xiaochun; Wei, Xingxing; Han, Yahong; Lin, Dongdai
2015-11-01
Face clustering is a key component either in image managements or video analysis. Wild human faces vary with the poses, expressions, and illumination changes. All kinds of noises, like block occlusions, random pixel corruptions, and various disguises may also destroy the consistency of faces referring to the same person. This motivates us to develop a robust face clustering algorithm that is less sensitive to these noises. To retain the underlying structured information within facial images, we use tensors to represent faces, and then accomplish the clustering task based on the tensor data. The proposed algorithm is called robust tensor clustering (RTC), which firstly finds a lower-rank approximation of the original tensor data using a L1 norm optimization function. Because L1 norm does not exaggerate the effect of noises compared with L2 norm, the minimization of the L1 norm approximation function makes RTC robust. Then, we compute high-order singular value decomposition of this approximate tensor to obtain the final clustering results. Different from traditional algorithms solving the approximation function with a greedy strategy, we utilize a nongreedy strategy to obtain a better solution. Experiments conducted on the benchmark facial datasets and gait sequences demonstrate that RTC has better performance than the state-of-the-art clustering algorithms and is more robust to noises. PMID:25546869
Multiple Manifold Clustering Using Curvature Constrained Path
Babaeian, Amir; Bayestehtashk, Alireza; Bandarabadi, Mojtaba
2015-01-01
The problem of multiple surface clustering is a challenging task, particularly when the surfaces intersect. Available methods such as Isomap fail to capture the true shape of the surface near by the intersection and result in incorrect clustering. The Isomap algorithm uses shortest path between points. The main draw back of the shortest path algorithm is due to the lack of curvature constrained where causes to have a path between points on different surfaces. In this paper we tackle this problem by imposing a curvature constraint to the shortest path algorithm used in Isomap. The algorithm chooses several landmark nodes at random and then checks whether there is a curvature constrained path between each landmark node and every other node in the neighborhood graph. We build a binary feature vector for each point where each entry represents the connectivity of that point to a particular landmark. Then the binary feature vectors could be used as a input of conventional clustering algorithm such as hierarchical clustering. We apply our method to simulated and some real datasets and show, it performs comparably to the best methods such as K-manifold and spectral multi-manifold clustering. PMID:26375819
Java implementation of Class Association Rule algorithms
2007-08-30
Java implementation of three Class Association Rule mining algorithms, NETCAR, CARapriori, and clustering based rule mining. NETCAR algorithm is a novel algorithm developed by Makio Tamura. The algorithm is discussed in a paper: UCRL-JRNL-232466-DRAFT, and would be published in a peer review scientific journal. The software is used to extract combinations of genes relevant with a phenotype from a phylogenetic profile and a phenotype profile. The phylogenetic profiles is represented by a binary matrix andmore » a phenotype profile is represented by a binary vector. The present application of this software will be in genome analysis, however, it could be applied more generally.« less
Java implementation of Class Association Rule algorithms
Tamura, Makio
2007-08-30
Java implementation of three Class Association Rule mining algorithms, NETCAR, CARapriori, and clustering based rule mining. NETCAR algorithm is a novel algorithm developed by Makio Tamura. The algorithm is discussed in a paper: UCRL-JRNL-232466-DRAFT, and would be published in a peer review scientific journal. The software is used to extract combinations of genes relevant with a phenotype from a phylogenetic profile and a phenotype profile. The phylogenetic profiles is represented by a binary matrix and a phenotype profile is represented by a binary vector. The present application of this software will be in genome analysis, however, it could be applied more generally.
An evolutionary and visual framework for clustering of DNA microarray data.
Castellanos-Garzón, José A; Díaz, Fernando
2013-01-01
This paper presents a case study to show the competence of our evolutionary and visual framework for cluster analysis of DNA microarray data. The proposed framework joins a genetic algorithm for hierarchical clustering with a set of visual components of cluster tasks given by a tool. The cluster visualization tool allows us to display different views of clustering results as a means of cluster visual validation. The results of the genetic algorithm for clustering have shown that it can find better solutions than the other methods for the selected data set. Thus, this shows the reliability of the proposed framework. PMID:24231146
A fast meteor detection algorithm
NASA Astrophysics Data System (ADS)
Gural, P.
2016-01-01
A low latency meteor detection algorithm for use with fast steering mirrors had been previously developed to track and telescopically follow meteors in real-time (Gural, 2007). It has been rewritten as a generic clustering and tracking software module for meteor detection that meets both the demanding throughput requirements of a Raspberry Pi while also maintaining a high probability of detection. The software interface is generalized to work with various forms of front-end video pre-processing approaches and provides a rich product set of parameterized line detection metrics. Discussion will include the Maximum Temporal Pixel (MTP) compression technique as a fast thresholding option for feeding the detection module, the detection algorithm trade for maximum processing throughput, details on the clustering and tracking methodology, processing products, performance metrics, and a general interface description.
Joint clustering of protein interaction networks through Markov random walk
2014-01-01
Biological networks obtained by high-throughput profiling or human curation are typically noisy. For functional module identification, single network clustering algorithms may not yield accurate and robust results. In order to borrow information across multiple sources to alleviate such problems due to data quality, we propose a new joint network clustering algorithm ASModel in this paper. We construct an integrated network to combine network topological information based on protein-protein interaction (PPI) datasets and homological information introduced by constituent similarity between proteins across networks. A novel random walk strategy on the integrated network is developed for joint network clustering and an optimization problem is formulated by searching for low conductance sets defined on the derived transition matrix of the random walk, which fuses both topology and homology information. The optimization problem of joint clustering is solved by a derived spectral clustering algorithm. Network clustering using several state-of-the-art algorithms has been implemented to both PPI networks within the same species (two yeast PPI networks and two human PPI networks) and those from different species (a yeast PPI network and a human PPI network). Experimental results demonstrate that ASModel outperforms the existing single network clustering algorithms as well as another recent joint clustering algorithm in terms of complex prediction and Gene Ontology (GO) enrichment analysis. PMID:24565376
Joint clustering of protein interaction networks through Markov random walk.
Wang, Yijie; Qian, Xiaoning
2014-01-01
Biological networks obtained by high-throughput profiling or human curation are typically noisy. For functional module identification, single network clustering algorithms may not yield accurate and robust results. In order to borrow information across multiple sources to alleviate such problems due to data quality, we propose a new joint network clustering algorithm ASModel in this paper. We construct an integrated network to combine network topological information based on protein-protein interaction (PPI) datasets and homological information introduced by constituent similarity between proteins across networks. A novel random walk strategy on the integrated network is developed for joint network clustering and an optimization problem is formulated by searching for low conductance sets defined on the derived transition matrix of the random walk, which fuses both topology and homology information. The optimization problem of joint clustering is solved by a derived spectral clustering algorithm. Network clustering using several state-of-the-art algorithms has been implemented to both PPI networks within the same species (two yeast PPI networks and two human PPI networks) and those from different species (a yeast PPI network and a human PPI network). Experimental results demonstrate that ASModel outperforms the existing single network clustering algorithms as well as another recent joint clustering algorithm in terms of complex prediction and Gene Ontology (GO) enrichment analysis. PMID:24565376
Clustering of Variables for Mixed Data
NASA Astrophysics Data System (ADS)
Saracco, J.; Chavent, M.
2016-05-01
This chapter presents clustering of variables which aim is to lump together strongly related variables. The proposed approach works on a mixed data set, i.e. on a data set which contains numerical variables and categorical variables. Two algorithms of clustering of variables are described: a hierarchical clustering and a k-means type clustering. A brief description of PCAmix method (that is a principal component analysis for mixed data) is provided, since the calculus of the synthetic variables summarizing the obtained clusters of variables is based on this multivariate method. Finally, the R packages ClustOfVar and PCAmixdata are illustrated on real mixed data. The PCAmix and ClustOfVar approaches are first used for dimension reduction (step 1) before applying in step 2 a standard clustering method to obtain groups of individuals.
Using clustering for document reconstruction
NASA Astrophysics Data System (ADS)
Ukovich, Anna; Zacchigna, Alessandra; Ramponi, Giovanni; Schoier, Gabriella
2006-02-01
In the forensics and investigative science fields there may arise the need of reconstructing documents which have been destroyed by means of a shredder. In a computer-based reconstruction, the pieces are described by numerical features, which represent the visual content of the strips. Usually, the pieces of different pages have been mixed. We propose an approach for the reconstruction which performs a first clustering on the strips to ease the successive matching, be it manual (with the help of a computer) or automatic. A number of features, extracted by means of image processing algorithms, have been selected for this aim. The results show the effectiveness of the features and of the proposed clustering algorithm.
MODEL-BASED CLUSTERING FOR CLASSIFICATION OF AQUATIC SYSTEMS AND DIAGNOSIS OF ECOLOGICAL STRESS
Clustering approaches were developed using the classification likelihood, the mixture likelihood, and also using a randomization approach with a model index. Using a clustering approach based on the mixture and classification likelihoods, we have developed an algorithm that...
Cluster analysis of gene expression data based on self-splitting and merging competitive learning.
Wu, Shuanhu; Liew, Alan Wee-Chung; Yan, Hong; Yang, Mengsu
2004-03-01
Cluster analysis of gene expression data from a cDNA microarray is useful for identifying biologically relevant groups of genes. However, finding the natural clusters in the data and estimating the correct number of clusters are still two largely unsolved problems. In this paper, we propose a new clustering framework that is able to address both these problems. By using the one-prototype-take-one-cluster (OPTOC) competitive learning paradigm, the proposed algorithm can find natural clusters in the input data, and the clustering solution is not sensitive to initialization. In order to estimate the number of distinct clusters in the data, we propose a cluster splitting and merging strategy. We have applied the new algorithm to simulated gene expression data for which the correct distribution of genes over clusters is known a priori. The results show that the proposed algorithm can find natural clusters and give the correct number of clusters. The algorithm has also been tested on real gene expression changes during yeast cell cycle, for which the fundamental patterns of gene expression and assignment of genes to clusters are well understood from numerous previous studies. Comparative studies with several clustering algorithms illustrate the effectiveness of our method. PMID:15055797
Collaborative Clustering for Sensor Networks
NASA Technical Reports Server (NTRS)
Wagstaff. Loro :/; Green Jillian; Lane, Terran
2011-01-01
Traditionally, nodes in a sensor network simply collect data and then pass it on to a centralized node that archives, distributes, and possibly analyzes the data. However, analysis at the individual nodes could enable faster detection of anomalies or other interesting events, as well as faster responses such as sending out alerts or increasing the data collection rate. There is an additional opportunity for increased performance if individual nodes can communicate directly with their neighbors. Previously, a method was developed by which machine learning classification algorithms could collaborate to achieve high performance autonomously (without requiring human intervention). This method worked for supervised learning algorithms, in which labeled data is used to train models. The learners collaborated by exchanging labels describing the data. The new advance enables clustering algorithms, which do not use labeled data, to also collaborate. This is achieved by defining a new language for collaboration that uses pair-wise constraints to encode useful information for other learners. These constraints specify that two items must, or cannot, be placed into the same cluster. Previous work has shown that clustering with these constraints (in isolation) already improves performance. In the problem formulation, each learner resides at a different node in the sensor network and makes observations (collects data) independently of the other learners. Each learner clusters its data and then selects a pair of items about which it is uncertain and uses them to query its neighbors. The resulting feedback (a must and cannot constraint from each neighbor) is combined by the learner into a consensus constraint, and it then reclusters its data while incorporating the new constraint. A strategy was also proposed for cleaning the resulting constraint sets, which may contain conflicting constraints; this improves performance significantly. This approach has been applied to collaborative
Clustering with Missing Values: No Imputation Required
NASA Technical Reports Server (NTRS)
Wagstaff, Kiri
2004-01-01
Clustering algorithms can identify groups in large data sets, such as star catalogs and hyperspectral images. In general, clustering methods cannot analyze items that have missing data values. Common solutions either fill in the missing values (imputation) or ignore the missing data (marginalization). Imputed values are treated as just as reliable as the truly observed data, but they are only as good as the assumptions used to create them. In contrast, we present a method for encoding partially observed features as a set of supplemental soft constraints and introduce the KSC algorithm, which incorporates constraints into the clustering process. In experiments on artificial data and data from the Sloan Digital Sky Survey, we show that soft constraints are an effective way to enable clustering with missing values.
Graph Theory In Protein Sequence Clustering And Tertiary Structural Matching
NASA Astrophysics Data System (ADS)
Abdullah, Rosni; Rashid, Nur'Aini Abdul; Othman, Fazilah
2008-01-01
The principle of graph theory which has been widely used in computer networks is now being adopted for work in protein clustering, protein structural matching, and protein folding and modeling. In this work, we present two case studies on the use of graph theory for protein clustering and tertiary structural matching. In protein clustering, we extended a clustering algorithm based on a maximal clique while in the protein tertiary structural matching we explored the bipartite graph matching algorithm. The results obtained in both the case studies will be presented.
Firefly Algorithm for Structural Search.
Avendaño-Franco, Guillermo; Romero, Aldo H
2016-07-12
The problem of computational structure prediction of materials is approached using the firefly (FF) algorithm. Starting from the chemical composition and optionally using prior knowledge of similar structures, the FF method is able to predict not only known stable structures but also a variety of novel competitive metastable structures. This article focuses on the strengths and limitations of the algorithm as a multimodal global searcher. The algorithm has been implemented in software package PyChemia ( https://github.com/MaterialsDiscovery/PyChemia ), an open source python library for materials analysis. We present applications of the method to van der Waals clusters and crystal structures. The FF method is shown to be competitive when compared to other population-based global searchers. PMID:27232694
Dynamically weighted clustering with noise set
Shen, Yijing; Sun, Wei; Li, Ker-Chau
2010-01-01
Motivation: Various clustering methods have been applied to microarray gene expression data for identifying genes with similar expression profiles. As the biological annotation data accumulated, more and more genes have been organized into functional categories. Functionally related genes may be regulated by common cellular signals, thus likely to be co-expressed. Consequently, utilizing the rapidly increasing functional annotation resources such as Gene Ontology (GO) to improve the performance of clustering methods is of great interest. On the opposite side of clustering, there are genes that have distinct expression profiles and do not co-express with other genes. Identification of these scattered genes could enhance the performance of clustering methods. Results: We developed a new clustering algorithm, Dynamically Weighted Clustering with Noise set (DWCN), which makes use of gene annotation information and allows for a set of scattered genes, the noise set, to be left out of the main clusters. We tested the DWCN method and contrasted its results with those obtained using several common clustering techniques on a simulated dataset as well as on two public datasets: the Stanford yeast cell-cycle gene expression data, and a gene expression dataset for a group of genetically different yeast segregants. Conclusion: Our method produces clusters with more consistent functional annotations and more coherent expression patterns than existing clustering techniques. Contact: yshen@stat.ucla.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:20007256
Segmentation of MRI Brain Images with an Improved Harmony Searching Algorithm.
Yang, Zhang; Shufan, Ye; Li, Guo; Weifeng, Ding
2016-01-01
The harmony searching (HS) algorithm is a kind of optimization search algorithm currently applied in many practical problems. The HS algorithm constantly revises variables in the harmony database and the probability of different values that can be used to complete iteration convergence to achieve the optimal effect. Accordingly, this study proposed a modified algorithm to improve the efficiency of the algorithm. First, a rough set algorithm was employed to improve the convergence and accuracy of the HS algorithm. Then, the optimal value was obtained using the improved HS algorithm. The optimal value of convergence was employed as the initial value of the fuzzy clustering algorithm for segmenting magnetic resonance imaging (MRI) brain images. Experimental results showed that the improved HS algorithm attained better convergence and more accurate results than those of the original HS algorithm. In our study, the MRI image segmentation effect of the improved algorithm was superior to that of the original fuzzy clustering method. PMID:27403428
Segmentation of MRI Brain Images with an Improved Harmony Searching Algorithm
Yang, Zhang; Li, Guo; Weifeng, Ding
2016-01-01
The harmony searching (HS) algorithm is a kind of optimization search algorithm currently applied in many practical problems. The HS algorithm constantly revises variables in the harmony database and the probability of different values that can be used to complete iteration convergence to achieve the optimal effect. Accordingly, this study proposed a modified algorithm to improve the efficiency of the algorithm. First, a rough set algorithm was employed to improve the convergence and accuracy of the HS algorithm. Then, the optimal value was obtained using the improved HS algorithm. The optimal value of convergence was employed as the initial value of the fuzzy clustering algorithm for segmenting magnetic resonance imaging (MRI) brain images. Experimental results showed that the improved HS algorithm attained better convergence and more accurate results than those of the original HS algorithm. In our study, the MRI image segmentation effect of the improved algorithm was superior to that of the original fuzzy clustering method. PMID:27403428
Reducing the Time Requirement of k-Means Algorithm
Osamor, Victor Chukwudi; Adebiyi, Ezekiel Femi; Oyelade, Jelilli Olarenwaju; Doumbia, Seydou
2012-01-01
Traditional k-means and most k-means variants are still computationally expensive for large datasets, such as microarray data, which have large datasets with large dimension size d. In k-means clustering, we are given a set of n data points in d-dimensional space Rd and an integer k. The problem is to determine a set of k points in Rd, called centers, so as to minimize the mean squared distance from each data point to its nearest center. In this work, we develop a novel k-means algorithm, which is simple but more efficient than the traditional k-means and the recent enhanced k-means. Our new algorithm is based on the recently established relationship between principal component analysis and the k-means clustering. We provided the correctness proof for this algorithm. Results obtained from testing the algorithm on three biological data and six non-biological data (three of these data are real, while the other three are simulated) also indicate that our algorithm is empirically faster than other known k-means algorithms. We assessed the quality of our algorithm clusters against the clusters of a known structure using the Hubert-Arabie Adjusted Rand index (ARIHA). We found that when k is close to d, the quality is good (ARIHA>0.8) and when k is not close to d, the quality of our new k-means algorithm is excellent (ARIHA>0.9). In this paper, emphases are on the reduction of the time requirement of the k-means algorithm and its application to microarray data due to the desire to create a tool for clustering and malaria research. However, the new clustering algorithm can be used for other clustering needs as long as an appropriate measure of distance between the centroids and the members is used. This has been demonstrated in this work on six non-biological data. PMID:23239974
Fuzzy and hard clustering analysis for thyroid disease.
Azar, Ahmad Taher; El-Said, Shaimaa Ahmed; Hassanien, Aboul Ella
2013-07-01
Thyroid hormones produced by the thyroid gland help regulation of the body's metabolism. A variety of methods have been proposed in the literature for thyroid disease classification. As far as we know, clustering techniques have not been used in thyroid diseases data set so far. This paper proposes a comparison between hard and fuzzy clustering algorithms for thyroid diseases data set in order to find the optimal number of clusters. Different scalar validity measures are used in comparing the performances of the proposed clustering systems. To demonstrate the performance of each algorithm, the feature values that represent thyroid disease are used as input for the system. Several runs are carried out and recorded with a different number of clusters being specified for each run (between 2 and 11), so as to establish the optimum number of clusters. To find the optimal number of clusters, the so-called elbow criterion is applied. The experimental results revealed that for all algorithms, the elbow was located at c=3. The clustering results for all algorithms are then visualized by the Sammon mapping method to find a low-dimensional (normally 2D or 3D) representation of a set of points distributed in a high dimensional pattern space. At the end of this study, some recommendations are formulated to improve determining the actual number of clusters present in the data set. PMID:23357404
Deterministic algorithm with agglomerative heuristic for location problems
NASA Astrophysics Data System (ADS)
Kazakovtsev, L.; Stupina, A.
2015-10-01
Authors consider the clustering problem solved with the k-means method and p-median problem with various distance metrics. The p-median problem and the k-means problem as its special case are most popular models of the location theory. They are implemented for solving problems of clustering and many practically important logistic problems such as optimal factory or warehouse location, oil or gas wells, optimal drilling for oil offshore, steam generators in heavy oil fields. Authors propose new deterministic heuristic algorithm based on ideas of the Information Bottleneck Clustering and genetic algorithms with greedy heuristic. In this paper, results of running new algorithm on various data sets are given in comparison with known deterministic and stochastic methods. New algorithm is shown to be significantly faster than the Information Bottleneck Clustering method having analogous preciseness.
Lotz, Martin K.; Otsuki, Shuhei; Grogan, Shawn P.; Sah, Robert; Terkeltaub, Robert; D’Lima, Darryl
2010-01-01
The formation of new cell clusters is a histological hallmark of arthritic cartilage but the biology of clusters and their role in disease are poorly understood. This is the first comprehensive review of clinical and experimental conditions associated with cluster formation. Genes and proteins that are expressed in cluster cells, the cellular origin of the clusters, mechanisms that lead to cluster formation and the role of cluster cells in pathogenesis are discussed. PMID:20506158
Weighted voting-based consensus clustering for chemical structure databases.
Saeed, Faisal; Ahmed, Ali; Shamsir, Mohd Shahir; Salim, Naomie
2014-06-01
The cluster-based compound selection is used in the lead identification process of drug discovery and design. Many clustering methods have been used for chemical databases, but there is no clustering method that can obtain the best results under all circumstances. However, little attention has been focused on the use of combination methods for chemical structure clustering, which is known as consensus clustering. Recently, consensus clustering has been used in many areas including bioinformatics, machine learning and information theory. This process can improve the robustness, stability, consistency and novelty of clustering. For chemical databases, different consensus clustering methods have been used including the co-association matrix-based, graph-based, hypergraph-based and voting-based methods. In this paper, a weighted cumulative voting-based aggregation algorithm (W-CVAA) was developed. The MDL Drug Data Report (MDDR) benchmark chemical dataset was used in the experiments and represented by the AlogP and ECPF_4 descriptors. The results from the clustering methods were evaluated by the ability of the clustering to separate biologically active molecules in each cluster from inactive ones using different criteria, and the effectiveness of the consensus clustering was compared to that of Ward's method, which is the current standard clustering method in chemoinformatics. This study indicated that weighted voting-based consensus clustering can overcome the limitations of the existing voting-based methods and improve the effectiveness of combining multiple clusterings of chemical structures. PMID:24830925
Clustering Short Time-Series Microarray
NASA Astrophysics Data System (ADS)
Ping, Loh Wei; Hasan, Yahya Abu
2008-01-01
Most microarray analyses are carried out on static gene expressions. However, the dynamical study of microarrays has lately gained more attention. Most researches on time-series microarray emphasize on the bioscience and medical aspects but few from the numerical aspect. This study attempts to analyze short time-series microarray mathematically using STEM clustering tool which formally preprocess data followed by clustering. We next introduce the Circular Mould Distance (CMD) algorithm with combinations of both preprocessing and clustering analysis. Both methods are subsequently compared in terms of efficiencies.
Plumix: Generating mass segregated star clusters
NASA Astrophysics Data System (ADS)
Šubr, Ladislav
2012-06-01
Plumix is a small package for generating mass segregated star clusters. Its output can be directly used as input initial conditions for NBODY4 or NBODY6 code. Mass segregation stands as one of the most robust features of the dynamical evolution of self-gravitating star clusters. We formulate parametrized models of mass segregated star clusters in virial equilibrium. To this purpose we introduce mean inter-particle potentials for statistically described unsegregated systems and suggest a single-parameter generalization of its form which gives a mass-segregated state. Plumix is a numerical C-code generating the cluster according the algorithm given for construction of appropriate star cluster models. Their stability over several crossing-times is verified by following the evolution by means of direct N-body integration.
Some Basic Elements in Clustering and Classification
NASA Astrophysics Data System (ADS)
Grégoire, G.
2016-05-01
This chapter deals with basic tools useful in clustering and classification and present some commonly used approaches for these two problems. Since several chapters in these proceedings are devoted to approaches to deal with classification, we give more attention in this chapter to clustering issues. We are first concerned with notions of distances or dissimilarities between objects we are to group in clusters. Then based on these inter-objects distances we define distances between sets of objects, such as single linkage, complete linkage or Ward distance. Three clustering algorithms are presented with some details and compared: Kmeans, Ascendant Hierarchical and DBSCAN algorithms. The comparison between partitions and the issue of choosing the correct number of clusters are investigated and the proposed procedures are tested on two data sets. We emphasize the fact that the results provided by the numerous indices available in the literature for selecting the number of clusters is largely depending upon the shape and the dispersion we are assuming for these clusters. Finally the last section is devoted to classification. Some basic notions such as training sets, test sets and cross-validation are discussed. Two particular approaches are detailed, the K-nearest neighbors method and the logistic regression, and comparisons with LDA (Linear Discriminant Analysis) and QDA (Quadratic Discriminant Analysis) are analyzed.
Dynamical Mass Measurements of Contaminated Galaxy Clusters Using Machine Learning
NASA Astrophysics Data System (ADS)
Ntampaka, Michelle; Trac, Hy; Sutherland, Dougal; Fromenteau, Sebastien; Poczos, Barnabas; Schneider, Jeff
2016-01-01
Galaxy clusters are a rich source of information for examining fundamental astrophysical processes and cosmological parameters, however, employing clusters as cosmological probes requires accurate mass measurements derived from cluster observables. We study dynamical mass measurements of galaxy clusters contaminated by interlopers, and show that a modern machine learning (ML) algorithm can predict masses by better than a factor of two compared to a standard scaling relation approach. We create a mock catalog from Multidark's publicly-available N-body MDPL1 simulation where a simple cylindrical cut around the cluster center allows interlopers to contaminate the clusters. In the standard approach, we use a power law scaling relation to infer cluster mass from galaxy line of sight (LOS) velocity dispersion. The presence of interlopers in the catalog produces a wide, flat fractional mass error distribution, with width = 2.13. We employ the Support Distribution Machine (SDM) class of algorithms to learn from distributions of data to predict single values. Applied to distributions of galaxy observables such as LOS velocity and projected distance from the cluster center, SDM yields better than a factor-of-two improvement (width = 0.67). Remarkably, SDM applied to contaminated clusters is better able to recover masses than even a scaling relation approach applied to uncontaminated clusters. We show that the SDM method more accurately reproduces the cluster mass function, making it a valuable tool for employing cluster observations to evaluate cosmological models.
ICAP - An Interactive Cluster Analysis Procedure for analyzing remotely sensed data
NASA Technical Reports Server (NTRS)
Wharton, S. W.; Turner, B. J.
1981-01-01
An Interactive Cluster Analysis Procedure (ICAP) was developed to derive classifier training statistics from remotely sensed data. ICAP differs from conventional clustering algorithms by allowing the analyst to optimize the cluster configuration by inspection, rather than by manipulating process parameters. Control of the clustering process alternates between the algorithm, which creates new centroids and forms clusters, and the analyst, who can evaluate and elect to modify the cluster structure. Clusters can be deleted, or lumped together pairwise, or new centroids can be added. A summary of the cluster statistics can be requested to facilitate cluster manipulation. The principal advantage of this approach is that it allows prior information (when available) to be used directly in the analysis, since the analyst interacts with ICAP in a straightforward manner, using basic terms with which he is more likely to be familiar. Results from testing ICAP showed that an informed use of ICAP can improve classification, as compared to an existing cluster analysis procedure.
SEQOPTICS: a protein sequence clustering system
Chen, Yonghui; Reilly, Kevin D; Sprague, Alan P; Guan, Zhijie
2006-01-01
Background Protein sequence clustering has been widely used as a part of the analysis of protein structure and function. In most cases single linkage or graph-based clustering algorithms have been applied. OPTICS (Ordering Points To Identify the Clustering Structure) is an attractive approach due to its emphasis on visualization of results and support for interactive work, e.g., in choosing parameters. However, OPTICS has not been used, as far as we know, for protein sequence clustering. Results In this paper, a system of clustering proteins, SEQOPTICS (SEQuence clustering with OPTICS) is demonstrated. The system is implemented with Smith-Waterman as protein distance measurement and OPTICS at its core to perform protein sequence clustering. SEQOPTICS is tested with four data sets from different data sources. Visualization of the sequence clustering structure is demonstrated as well. Conclusion The system was evaluated by comparison with other existing methods. Analysis of the results demonstrates that SEQOPTICS performs better based on some evaluation criteria including Jaccard coefficient, Precision, and Recall. It is a promising protein sequence clustering method with future possible improvement on parallel computing and other protein distance measurements. PMID:17217502
Value-balanced agglomerative connectivity clustering
NASA Astrophysics Data System (ADS)
Gupta, Gunjan K.; Ghosh, Joydeep
2001-03-01
In this paper we propose a new clustering framework for transactional data-sets involving large numbers of customers and products. Such transactional data pose particular issues such as very high dimensionality (greater than 10,000), and sparse categorical entries, that have been dealt with more effectively using a graph-based approach to clustering such as ROCK. But large transactional data raises certain other issues such as how to compare diverse products (e.g. milk vs. cars) cluster balancing and outlier removal, that need to be addressed. We first propose a new similarity measure that takes the value of the goods purchased into account, and form a value-based graph representation based on this similarity measure. A novel value-based balancing criterion that allows the user to control the balancing of clusters, is then defined. This balancing criterion is integrated with a value-based goodness measure for merging two clusters in an agglomerative clustering routine. Since graph-based clustering algorithms are very sensitive to outliers, we also propose a fast, effective and simple outlier detection and removal method based on under-clustering or over- partitioning. The performance of the proposed clustering framework is compared with leading graph-theoretic approaches such as ROCK and METIS.
Genetic-Annealing Algorithm in Grid Environment for Scheduling Problems
NASA Astrophysics Data System (ADS)
Cruz-Chávez, Marco Antonio; Rodríguez-León, Abelardo; Ávila-Melgar, Erika Yesenia; Juárez-Pérez, Fredy; Cruz-Rosales, Martín H.; Rivera-López, Rafael
This paper presents a parallel hybrid evolutionary algorithm executed in a grid environment. The algorithm executes local searches using simulated annealing within a Genetic Algorithm to solve the job shop scheduling problem. Experimental results of the algorithm obtained in the "Tarantula MiniGrid" are shown. Tarantula was implemented by linking two clusters from different geographic locations in Mexico (Morelos-Veracruz). The technique used to link the two clusters and configure the Tarantula MiniGrid is described. The effects of latency in communication between the two clusters are discussed. It is shown that the evolutionary algorithm presented is more efficient working in Grid environments because it can carry out major exploration and exploitation of the solution space.
Winlaw, Manda; De Sterck, Hans; Sanders, Geoffrey
2015-10-26
In very simple terms a network can be de ned as a collection of points joined together by lines. Thus, networks can be used to represent connections between entities in a wide variety of elds including engi- neering, science, medicine, and sociology. Many large real-world networks share a surprising number of properties, leading to a strong interest in model development research and techniques for building synthetic networks have been developed, that capture these similarities and replicate real-world graphs. Modeling these real-world networks serves two purposes. First, building models that mimic the patterns and prop- erties of real networks helps to understand the implications of these patterns and helps determine which patterns are important. If we develop a generative process to synthesize real networks we can also examine which growth processes are plausible and which are not. Secondly, high-quality, large-scale network data is often not available, because of economic, legal, technological, or other obstacles [7]. Thus, there are many instances where the systems of interest cannot be represented by a single exemplar network. As one example, consider the eld of cybersecurity, where systems require testing across diverse threat scenarios and validation across diverse network structures. In these cases, where there is no single exemplar network, the systems must instead be modeled as a collection of networks in which the variation among them may be just as important as their common features. By developing processes to build synthetic models, so-called graph generators, we can build synthetic networks that capture both the essential features of a system and realistic variability. Then we can use such synthetic graphs to perform tasks such as simulations, analysis, and decision making. We can also use synthetic graphs to performance test graph analysis algorithms, including clustering algorithms and anomaly detection algorithms.
ASteCA: Automated Stellar Cluster Analysis
NASA Astrophysics Data System (ADS)
Perren, G. I.; Vázquez, R. A.; Piatti, A. E.
2015-04-01
We present the Automated Stellar Cluster Analysis package (ASteCA), a suit of tools designed to fully automate the standard tests applied on stellar clusters to determine their basic parameters. The set of functions included in the code make use of positional and photometric data to obtain precise and objective values for a given cluster's center coordinates, radius, luminosity function and integrated color magnitude, as well as characterizing through a statistical estimator its probability of being a true physical cluster rather than a random overdensity of field stars. ASteCA incorporates a Bayesian field star decontamination algorithm capable of assigning membership probabilities using photometric data alone. An isochrone fitting process based on the generation of synthetic clusters from theoretical isochrones and selection of the best fit through a genetic algorithm is also present, which allows ASteCA to provide accurate estimates for a cluster's metallicity, age, extinction and distance values along with its uncertainties. To validate the code we applied it on a large set of over 400 synthetic MASSCLEAN clusters with varying degrees of field star contamination as well as a smaller set of 20 observed Milky Way open clusters (Berkeley 7, Bochum 11, Czernik 26, Czernik 30, Haffner 11, Haffner 19, NGC 133, NGC 2236, NGC 2264, NGC 2324, NGC 2421, NGC 2627, NGC 6231, NGC 6383, NGC 6705, Ruprecht 1, Tombaugh 1, Trumpler 1, Trumpler 5 and Trumpler 14) studied in the literature. The results show that ASteCA is able to recover cluster parameters with an acceptable precision even for those clusters affected by substantial field star contamination. ASteCA is written in Python and is made available as an open source code which can be downloaded ready to be used from its official site.
Localized Ambient Solidity Separation Algorithm Based Computer User Segmentation
Sun, Xiao; Zhang, Tongda; Chai, Yueting; Liu, Yi
2015-01-01
Most of popular clustering methods typically have some strong assumptions of the dataset. For example, the k-means implicitly assumes that all clusters come from spherical Gaussian distributions which have different means but the same covariance. However, when dealing with datasets that have diverse distribution shapes or high dimensionality, these assumptions might not be valid anymore. In order to overcome this weakness, we proposed a new clustering algorithm named localized ambient solidity separation (LASS) algorithm, using a new isolation criterion called centroid distance. Compared with other density based isolation criteria, our proposed centroid distance isolation criterion addresses the problem caused by high dimensionality and varying density. The experiment on a designed two-dimensional benchmark dataset shows that our proposed LASS algorithm not only inherits the advantage of the original dissimilarity increments clustering method to separate naturally isolated clusters but also can identify the clusters which are adjacent, overlapping, and under background noise. Finally, we compared our LASS algorithm with the dissimilarity increments clustering method on a massive computer user dataset with over two million records that contains demographic and behaviors information. The results show that LASS algorithm works extremely well on this computer user dataset and can gain more knowledge from it. PMID:26221133
Localized Ambient Solidity Separation Algorithm Based Computer User Segmentation.
Sun, Xiao; Zhang, Tongda; Chai, Yueting; Liu, Yi
2015-01-01
Most of popular clustering methods typically have some strong assumptions of the dataset. For example, the k-means implicitly assumes that all clusters come from spherical Gaussian distributions which have different means but the same covariance. However, when dealing with datasets that have diverse distribution shapes or high dimensionality, these assumptions might not be valid anymore. In order to overcome this weakness, we proposed a new clustering algorithm named localized ambient solidity separation (LASS) algorithm, using a new isolation criterion called centroid distance. Compared with other density based isolation criteria, our proposed centroid distance isolation criterion addresses the problem caused by high dimensionality and varying density. The experiment on a designed two-dimensional benchmark dataset shows that our proposed LASS algorithm not only inherits the advantage of the original dissimilarity increments clustering method to separate naturally isolated clusters but also can identify the clusters which are adjacent, overlapping, and under background noise. Finally, we compared our LASS algorithm with the dissimilarity increments clustering method on a massive computer user dataset with over two million records that contains demographic and behaviors information. The results show that LASS algorithm works extremely well on this computer user dataset and can gain more knowledge from it. PMID:26221133
Huang, Wei; Oh, Sung-Kwun; Pedrycz, Witold
2014-12-01
In this study, we propose Hybrid Radial Basis Function Neural Networks (HRBFNNs) realized with the aid of fuzzy clustering method (Fuzzy C-Means, FCM) and polynomial neural networks. Fuzzy clustering used to form information granulation is employed to overcome a possible curse of dimensionality, while the polynomial neural network is utilized to build local models. Furthermore, genetic algorithm (GA) is exploited here to optimize the essential design parameters of the model (including fuzzification coefficient, the number of input polynomial fuzzy neurons (PFNs), and a collection of the specific subset of input PFNs) of the network. To reduce dimensionality of the input space, principal component analysis (PCA) is considered as a sound preprocessing vehicle. The performance of the HRBFNNs is quantified through a series of experiments, in which we use several modeling benchmarks of different levels of complexity (different number of input variables and the number of available data). A comparative analysis reveals that the proposed HRBFNNs exhibit higher accuracy in comparison to the accuracy produced by some models reported previously in the literature. PMID:25233483
A novel iris segmentation algorithm based on small eigenvalue analysis
NASA Astrophysics Data System (ADS)
Harish, B. S.; Aruna Kumar, S. V.; Guru, D. S.; Ngo, Minh Ngoc
2015-12-01
In this paper, a simple and robust algorithm is proposed for iris segmentation. The proposed method consists of two steps. In first step, iris and pupil is segmented using Robust Spatial Kernel FCM (RSKFCM) algorithm. RSKFCM is based on traditional Fuzzy-c-Means (FCM) algorithm, which incorporates spatial information and uses kernel metric as distance measure. In second step, small eigenvalue transformation is applied to localize iris boundary. The transformation is based on statistical and geometrical properties of the small eigenvalue of the covariance matrix of a set of edge pixels. Extensive experimentations are carried out on standard benchmark iris dataset (viz. CASIA-IrisV4 and UBIRIS.v2). We compared our proposed method with existing iris segmentation methods. Our proposed method has the least time complexity of O(n(i+p)) . The result of the experiments emphasizes that the proposed algorithm outperforms the existing iris segmentation methods.
Validating clustering of molecular dynamics simulations using polymer models
2011-01-01
Background Molecular dynamics (MD) simulation is a powerful technique for sampling the meta-stable and transitional conformations of proteins and other biomolecules. Computational data clustering has emerged as a useful, automated technique for extracting conformational states from MD simulation data. Despite extensive application, relatively little work has been done to determine if the clustering algorithms are actually extracting useful information. A primary goal of this paper therefore is to provide such an understanding through a detailed analysis of data clustering applied to a series of increasingly complex biopolymer models. Results We develop a novel series of models using basic polymer theory that have intuitive, clearly-defined dynamics and exhibit the essential properties that we are seeking to identify in MD simulations of real biomolecules. We then apply spectral clustering, an algorithm particularly well-suited for clustering polymer structures, to our models and MD simulations of several intrinsically disordered proteins. Clustering results for the polymer models provide clear evidence that the meta-stable and transitional conformations are detected by the algorithm. The results for the polymer models also help guide the analysis of the disordered protein simulations by comparing and contrasting the statistical properties of the extracted clusters. Conclusions We have developed a framework for validating the performance and utility of clustering algorithms for studying molecular biopolymer simulations that utilizes several analytic and dynamic polymer models which exhibit well-behaved dynamics including: meta-stable states, transition states, helical structures, and stochastic dynamics. We show that spectral clustering is robust to anomalies introduced by structural alignment and that different structural classes of intrinsically disordered proteins can be reliably discriminated from the clustering results. To our knowledge, our framework is the
CLUM: a cluster program for analyzing microarray data.
Irigoien, I; Fernandez, E; Vives, S; Arenas, C
2008-08-01
Microarray technology is increasingly being applied in biological and medical research to address a wide range of problems. Cluster analysis has proven to be a very useful tool for investigating the structure of microarray data. This paper presents a program for clustering microarray data, which is based on the so call path-distance. The algorithm gives in each step a partition in two clusters and no prior assumptions on the structure of clusters are required. It assigns each object (gene or sample) to only one cluster and gives the global optimum for the function that quantifies the adequacy of a given partition of the sample into k clusters. The program was tested on experimental data sets, showing the robustness of the algorithm. PMID:18825964
BioCluster: Tool for Identification and Clustering of Enterobacteriaceae Based on Biochemical Data
Abdullah, Ahmed; Sabbir Alam, S.M.; Sultana, Munawar; Hossain, M. Anwar
2015-01-01
Presumptive identification of different Enterobacteriaceae species is routinely achieved based on biochemical properties. Traditional practice includes manual comparison of each biochemical property of the unknown sample with known reference samples and inference of its identity based on the maximum similarity pattern with the known samples. This process is labor-intensive, time-consuming, error-prone, and subjective. Therefore, automation of sorting and similarity in calculation would be advantageous. Here we present a MATLAB-based graphical user interface (GUI) tool named BioCluster. This tool was designed for automated clustering and identification of Enterobacteriaceae based on biochemical test results. In this tool, we used two types of algorithms, i.e., traditional hierarchical clustering (HC) and the Improved Hierarchical Clustering (IHC), a modified algorithm that was developed specifically for the clustering and identification of Enterobacteriaceae species. IHC takes into account the variability in result of 1–47 biochemical tests within this Enterobacteriaceae family. This tool also provides different options to optimize the clustering in a user-friendly way. Using computer-generated synthetic data and some real data, we have demonstrated that BioCluster has high accuracy in clustering and identifying enterobacterial species based on biochemical test data. This tool can be freely downloaded at http://microbialgen.du.ac.bd/biocluster/. PMID:26216453
The mathematical modeling of grouping the dipole water clusters
NASA Astrophysics Data System (ADS)
Shaidurov, Vladimir; Kornienko, Viktoria; Vyatkin, Alexander
2016-08-01
In the present paper, a physical-mathematical model and a computational algorithm implementing the model are proposed to study the behavior of particles having an electric dipole moment in an external electric field. Computational experiments demonstrate the orientation dynamics of water clusters with the increase of the generated field. The dipole properties of some water clusters were previously determined using Hyperchem program.
Gennclus: New Models for General Nonhierarchical Clustering Analysis.
ERIC Educational Resources Information Center
Desarbo, Wayne S.
1982-01-01
A general class of nonhierarchical clustering models and associated algorithms for fitting them are presented. These models generalize the Shepard-Arabie Additive clusters model. Two applications are given and extensions to three-way models, nonmetric analyses, and other model specifications are provided. (Author/JKS)
Application of Simulated Annealing to Clustering Tuples in Databases.
ERIC Educational Resources Information Center
Bell, D. A.; And Others
1990-01-01
Investigates the value of applying principles derived from simulated annealing to clustering tuples in database design, and compares this technique with a graph-collapsing clustering method. It is concluded that, while the new method does give superior results, the expense involved in algorithm run time is prohibitive. (24 references) (CLB)
Graphical Evaluation of Hierarchical Clustering Schemes. Technical Report No. 1.
ERIC Educational Resources Information Center
Halff, Henry M.
Graphical methods for evaluating the fit of Johnson's hierarchical clustering schemes are presented together with an example. These evaluation methods examine the extent to which the clustering algorithm can minimize the overlap of the distributions of intracluster and intercluster distances. (Author)
A Distributed Flocking Approach for Information Stream Clustering Analysis
Cui, Xiaohui; Potok, Thomas E
2006-01-01
Intelligence analysts are currently overwhelmed with the amount of information streams generated everyday. There is a lack of comprehensive tool that can real-time analyze the information streams. Document clustering analysis plays an important role in improving the accuracy of information retrieval. However, most clustering technologies can only be applied for analyzing the static document collection because they normally require a large amount of computation resource and long time to get accurate result. It is very difficult to cluster a dynamic changed text information streams on an individual computer. Our early research has resulted in a dynamic reactive flock clustering algorithm which can continually refine the clustering result and quickly react to the change of document contents. This character makes the algorithm suitable for cluster analyzing dynamic changed document information, such as text information stream. Because of the decentralized character of this algorithm, a distributed approach is a very natural way to increase the clustering speed of the algorithm. In this paper, we present a distributed multi-agent flocking approach for the text information stream clustering and discuss the decentralized architectures and communication schemes for load balance and status information synchronization in this approach.
Fontana, W.
1990-12-13
In this paper complex adaptive systems are defined by a self- referential loop in which objects encode functions that act back on these objects. A model for this loop is presented. It uses a simple recursive formal language, derived from the lambda-calculus, to provide a semantics that maps character strings into functions that manipulate symbols on strings. The interaction between two functions, or algorithms, is defined naturally within the language through function composition, and results in the production of a new function. An iterated map acting on sets of functions and a corresponding graph representation are defined. Their properties are useful to discuss the behavior of a fixed size ensemble of randomly interacting functions. This function gas'', or Turning gas'', is studied under various conditions, and evolves cooperative interaction patterns of considerable intricacy. These patterns adapt under the influence of perturbations consisting in the addition of new random functions to the system. Different organizations emerge depending on the availability of self-replicators.
Spectral Imaging of Galaxy Clusters with Planck
NASA Astrophysics Data System (ADS)
Bourdin, H.; Mazzotta, P.; Rasia, E.
2015-12-01
The Sunyaev-Zeldovich (SZ) effect is a promising tool for detecting the presence of hot gas out to the galaxy cluster peripheries. We developed a spectral imaging algorithm dedicated to the SZ observations of nearby galaxy clusters with Planck, with the aim of revealing gas density anisotropies related to the filamentary accretion of materials, or pressure discontinuities induced by the propagation of shock fronts. To optimize an unavoidable trade-off between angular resolution and precision of the SZ flux measurements, the algorithm performs a multi-scale analysis of the SZ maps as well as of other extended components, such as the cosmic microwave background (CMB) anisotropies and the Galactic thermal dust. The demixing of the SZ signal is tackled through kernel-weighted likelihood maximizations. The CMB anisotropies are further analyzed through a wavelet analysis, while the Galactic foregrounds and SZ maps are analyzed via a curvelet analysis that best preserves their anisotropic details. The algorithm performance has been tested against mock observations of galaxy clusters obtained by simulating the Planck High Frequency Instrument and by pointing at a few characteristic positions in the sky. These tests suggest that Planck should easily allow us to detect filaments in the cluster peripheries and detect large-scale shocks in colliding galaxy clusters that feature favorable geometry.
Overlapping clusters for distributed computation.
Mirrokni, Vahab; Andersen, Reid; Gleich, David F.
2010-11-01
Scalable, distributed algorithms must address communication problems. We investigate overlapping clusters, or vertex partitions that intersect, for graph computations. This setup stores more of the graph than required but then affords the ease of implementation of vertex partitioned algorithms. Our hope is that this technique allows us to reduce communication in a computation on a distributed graph. The motivation above draws on recent work in communication avoiding algorithms. Mohiyuddin et al. (SC09) design a matrix-powers kernel that gives rise to an overlapping partition. Fritzsche et al. (CSC2009) develop an overlapping clustering for a Schwarz method. Both techniques extend an initial partitioning with overlap. Our procedure generates overlap directly. Indeed, Schwarz methods are commonly used to capitalize on overlap. Elsewhere, overlapping communities (Ahn et al, Nature 2009; Mishra et al. WAW2007) are now a popular model of structure in social networks. These have long been studied in statistics (Cole and Wishart, CompJ 1970). We present two types of results: (i) an estimated swapping probability {rho}{infinity}; and (ii) the communication volume of a parallel PageRank solution (link-following {alpha} = 0.85) using an additive Schwarz method. The volume ratio is the amount of extra storage for the overlap (2 means we store the graph twice). Below, as the ratio increases, the swapping probability and PageRank communication volume decreases.