NASA Astrophysics Data System (ADS)
Liu, Lifeng; Sun, Sam Zandong; Yu, Hongyu; Yue, Xingtong; Zhang, Dong
2016-06-01
Because fluid distribution in carbonate reservoirs is very complicated and existing fluid prediction methods do not produce satisfactory results, this paper proposes a new fluid identification method for carbonate reservoirs based on a modified Fuzzy C-Means (FCM) clustering algorithm. Both the initialization and the globally optimal cluster centers are produced by the Chaotic Quantum Particle Swarm Optimization (CQPSO) algorithm, which effectively avoids the traditional FCM algorithm's sensitivity to initial values and its tendency to converge to local optima. The modified algorithm is then applied to fluid identification in the carbonate X area in the Tarim Basin of China, and a mapping relation between fluid properties and pre-stack elastic parameters is built in multi-dimensional space. The results show that the modified algorithm clusters well and that its total coincidence rate of fluid prediction reaches 97.10%. In addition, the memberships of different fluids can be accumulated to obtain their respective probabilities, which can be used to evaluate the uncertainty of the fluid identification result.
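For orientation, the baseline FCM iteration that this and many of the following abstracts modify can be sketched as below. This is a minimal illustration, not the CQPSO-initialized method of the paper: the function name `fuzzy_c_means`, the fuzzifier default `m=2.0`, and the random initialization are assumptions made for the example. The random starting memberships are exactly where the sensitivity to initial values arises.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Plain FCM sketch. Returns (centers, U); U[i, k] is the membership of sample i in cluster k."""
    rng = np.random.default_rng(seed)
    # Random initial memberships, normalized so each row sums to 1 (assumed init scheme).
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        # Centers are membership-weighted means of the samples.
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Euclidean distance from every sample to every center.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)  # guard against division by zero
        # Membership update: u_ik proportional to d_ik^(-2/(m-1)).
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U
```

Each row of `U` sums to 1, so the memberships can be read as soft assignments; `U.argmax(axis=1)` gives a hard labeling when one is needed.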
A Genetic Algorithm That Exchanges Neighboring Centers for Fuzzy c-Means Clustering
ERIC Educational Resources Information Center
Chahine, Firas Safwan
2012-01-01
Clustering algorithms are widely used in pattern recognition and data mining applications. Due to their computational efficiency, partitional clustering algorithms are better suited for applications with large datasets than hierarchical clustering algorithms. K-means is among the most popular partitional clustering algorithms, but has a major…
NASA Astrophysics Data System (ADS)
Abdul-Nasir, Aimi Salihah; Mashor, Mohd Yusoff; Halim, Nurul Hazwani Abd; Mohamed, Zeehaida
2015-05-01
Malaria is a life-threatening parasitic infectious disease that accounts for nearly one million deaths each year. Because malaria requires prompt and accurate diagnosis, this study proposes an unsupervised pixel segmentation method based on clustering to obtain fully segmented red blood cells (RBCs) infected with malaria parasites from thin blood smear images of the P. vivax species. To obtain the segmented infected cells, the malaria images are first enhanced using a modified global contrast stretching technique. An unsupervised clustering-based segmentation technique is then applied to the intensity component of the malaria image to separate the infected cells from the blood cell background. In this study, cascaded moving k-means (MKM) and fuzzy c-means (FCM) clustering algorithms are proposed for malaria slide image segmentation. Next, a median filter is applied to smooth the image and to remove unwanted regions such as small background pixels. Finally, a seeded region growing area extraction algorithm is applied to remove large unwanted regions that remain in the image because they are too large to be cleaned by the median filter. The effectiveness of the proposed cascaded MKM and FCM clustering algorithms is analyzed qualitatively and quantitatively by comparison with the individual MKM and FCM clustering algorithms. Overall, the results indicate that the proposed cascaded clustering algorithm produces the best segmentation performance, achieving acceptable sensitivity as well as high specificity and accuracy compared with the segmentation results of the MKM and FCM algorithms.
Ma, Li; Li, Yang; Fan, Suohai; Fan, Runzhu
2015-01-01
Image segmentation plays an important role in medical image processing. Fuzzy c-means (FCM) clustering is one of the popular clustering algorithms for medical image segmentation. However, FCM depends on the initial cluster centers, easily falls into local optima, and is sensitive to noise. To solve these problems, this paper proposes a hybrid artificial fish swarm algorithm (HAFSA). The proposed algorithm combines the artificial fish swarm algorithm (AFSA) with FCM, using the global optimization search and parallel computing ability of AFSA to find a superior result. Meanwhile, the Metropolis criterion and a noise reduction mechanism are introduced into AFSA to enhance the convergence rate and antinoise ability. An artificial grid graph and Magnetic Resonance Imaging (MRI) images are used in the experiments, and the experimental results show that the proposed algorithm has stronger antinoise ability and higher precision. A number of evaluation indicators also demonstrate that HAFSA outperforms FCM and suppressed FCM (SFCM). PMID:26649068
NASA Astrophysics Data System (ADS)
Choi, Hon-Chit; Wen, Lingfeng; Eberl, Stefan; Feng, Dagan
2006-03-01
Dynamic Single Photon Emission Computed Tomography (SPECT) has the potential to quantitatively estimate physiological parameters by fitting compartment models to the tracer kinetics. The generalized linear least square method (GLLS) is an efficient method to estimate unbiased kinetic parameters and parametric images. However, due to the low sensitivity of SPECT, noisy data can cause voxel-wise parameter estimation by GLLS to fail. Fuzzy C-Mean (FCM) clustering and modified FCM, which also utilizes information from the immediate neighboring voxels, are proposed to improve the voxel-wise parameter estimation of GLLS. Monte Carlo simulations were performed to generate dynamic SPECT data with different noise levels, which were processed by general and modified FCM clustering. Parametric images were estimated by Logan and Yokoi graphical analysis and GLLS. The influx rate (K I) and volume of distribution (Vd) were estimated for the cerebellum, thalamus and frontal cortex. Our results show that (1) FCM reduces the bias and improves the reliability of parameter estimates for noisy data, (2) GLLS provides estimates of micro parameters (K I-k 4) as well as macro parameters, such as volume of distribution (Vd) and binding potential (BP I & BP II) and (3) FCM clustering incorporating neighboring voxel information does not improve the parameter estimates, but reduces noise in the parametric images. These findings indicate that pre-segmentation with traditional FCM clustering is desirable for generating voxel-wise parametric images with GLLS from dynamic SPECT data.
NASA Astrophysics Data System (ADS)
Schröter, Ingmar; Paasche, Hendik; Dietrich, Peter; Wollschläger, Ute
2014-05-01
Soil moisture is a key variable of the hydrological cycle. For example, it controls the partitioning of rainfall into a runoff and an infiltration component and modulates physical, chemical and biological processes within the soil. For a better understanding of these processes, knowledge about the spatio-temporal distribution of soil moisture is indispensable. For the field to the small catchment scale with survey areas up to a few square kilometres, there are numerous new and innovative ground-based and remote sensing technologies available which have great potential to provide temporal information about soil moisture patterns. The aim of this work is to design an optimal soil moisture monitoring program for a low-mountain catchment in central Germany. In a first step, the fuzzy c-means clustering technique (Paasche et al., 2006) was used to identify structure-relevant patterns in a set of different terrain attributes derived from a DEM. Based on these patterns, optimal measurement locations were identified to conduct in-situ soil moisture measurements. To consider different wetting and drying states in the catchment, several TDR measurement campaigns were conducted from April to October 2013. The TDR measurements have been integrated with the structure-relevant patterns obtained by the fuzzy cluster analysis to regionally predict soil moisture. In this study, we outline the conceptual framework of this integrative approach and present first results from field measurements. The results of the project are expected to improve the monitoring and understanding of small catchment-scale hydrological processes and to contribute to a better representation of soil moisture dynamics in physically-based, hydrological models operating at the field to the small catchment scale. Reference: Paasche, H., J. Tronicke, K. Holliger, A.G. Green, and H. Maurer (2006): Integration of diverse physical-property models: Subsurface zonation and petrophysical parameter estimation based on fuzzy
Fuzzy C-Means Clustering and Energy Efficient Cluster Head Selection for Cooperative Sensor Network
Bhatti, Dost Muhammad Saqib; Saeed, Nasir; Nam, Haewoon
2016-01-01
We propose a novel cluster-based cooperative spectrum sensing algorithm to reduce energy wastage, in which clusters are formed using fuzzy c-means (FCM) clustering and a cluster head (CH) is selected based on a sensor’s location within each cluster, its location with respect to the fusion center (FC), its signal-to-noise ratio (SNR) and its residual energy. The sensing information of a single sensor is not reliable enough due to shadowing and fading. To overcome these issues, cooperative spectrum sensing schemes were proposed to take advantage of spatial diversity. In cooperative spectrum sensing, all sensors sense the spectrum and report the sensed energy to the FC for the final decision. However, this increases the energy consumption of the network when a large number of sensors need to cooperate, and it also reduces the efficiency of the network. The proposed algorithm forms the clusters and selects the CHs such that very little network energy is consumed and the highest network efficiency is achieved. With the proposed algorithm, the maximum probability of detection under an imperfect channel is achieved with minimum energy consumption compared to conventional clustering schemes. PMID:27618061
Zhang, Ji-fu; Li, Xin; Yang, Hai-feng
2012-05-01
Discretization of continuous numerical attributes is an important research topic in the preprocessing of celestial spectrum data. For the characteristic lines of celestial spectra, a soft discretization algorithm is presented that uses improved fuzzy C-means clustering. First, candidate fuzzy cluster centers of the characteristic line are chosen using the density values of the sample data, which improves the algorithm's anti-noise ability. Second, the parameters of the fuzzy clustering are dynamically adjusted using the compatibility of the decision table as the criterion, so that the optimal discretization of the characteristic line is achieved. Finally, experiments on three SDSS celestial spectrum data sets (high-redshift quasars, late-type stars and quasars) show that the algorithm achieves a higher correct recognition rate.
Classification of FTIR cancer data using wavelets and fuzzy C-means clustering
NASA Astrophysics Data System (ADS)
Bai, Li; Liu, Yihui
2005-11-01
A feature extraction method based on wavelets for Fourier Transform Infrared (FTIR) cancer data analysis is presented in this paper. A set of low-frequency wavelet basis functions is used to represent the FTIR data in order to reduce the data dimension and remove noise. The fuzzy C-means algorithm is used to classify the data. Experiments are conducted to compare classification performance using wavelet features and the original FTIR data provided by the Derby City General Hospital in the UK. The experiments show that only 30 wavelet features are needed to represent 901 wave numbers of the FTIR data and still produce good clustering results.
Elazab, Ahmed; AbdulAzeem, Yousry M; Wu, Shiqian; Hu, Qingmao
2016-03-17
Brain tissue segmentation from magnetic resonance (MR) images is an important task for clinical use. The segmentation process becomes more challenging in the presence of noise, grayscale inhomogeneity, and other image artifacts. In this paper, we propose a robust kernelized local information fuzzy C-means clustering algorithm (RKLIFCM). It incorporates local information (both grayscale and spatial) into the segmentation process for more homogeneous segmentation. In addition, the Gaussian radial basis kernel function is adopted as a distance metric to replace the standard Euclidean distance. The main advantages of the new algorithm are efficient utilization of local grayscale and spatial information, robustness to noise, the ability to preserve image details, freedom from any parameter initialization, and high speed, since it runs on the image histogram. We compared the proposed algorithm with 7 soft clustering algorithms that run on both image histograms and image pixels to segment brain MR images. Experimental results demonstrate that the proposed RKLIFCM algorithm is able to overcome the influence of noise and achieve higher segmentation accuracy with low computational complexity. PMID:27257884
Elazab, Ahmed; Wang, Changmiao; Jia, Fucang; Wu, Jianhuang; Li, Guanglin; Hu, Qingmao
2015-01-01
An adaptively regularized kernel-based fuzzy C-means clustering framework is proposed for segmentation of brain magnetic resonance images. The framework takes the form of three algorithms in which the local average grayscale is replaced by the grayscale of the average-filtered, median-filtered, and devised weighted images, respectively. The algorithms measure the heterogeneity of grayscales in the neighborhood, exploit this measure as local contextual information, and replace the standard Euclidean distance with Gaussian radial basis kernel functions. The main advantages are adaptiveness to local context, enhanced robustness to preserve image details, independence of clustering parameters, and decreased computational costs. The algorithms have been validated against both synthetic and clinical magnetic resonance images with different types and levels of noise and compared with 6 recent soft clustering algorithms. Experimental results show that the proposed algorithms are superior in preserving image details and segmentation accuracy while maintaining a low computational complexity. PMID:26793269
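The replacement of Euclidean distance by a Gaussian radial basis kernel, mentioned in this abstract and the previous one, can be illustrated with the standard kernel-induced distance. For a kernel with K(x, x) = 1, the squared distance in feature space is K(x, x) + K(v, v) - 2K(x, v) = 2(1 - K(x, v)). The sketch below shows only this distance, not the regularization or local-information terms of the papers; the function names and the bandwidth `sigma` are assumptions for the example.

```python
import numpy as np

def rbf_kernel(x, v, sigma=1.0):
    """Gaussian radial basis kernel K(x, v) = exp(-||x - v||^2 / sigma^2)."""
    diff = np.asarray(x, dtype=float) - np.asarray(v, dtype=float)
    return np.exp(-np.sum(diff ** 2, axis=-1) / sigma ** 2)

def kernel_distance_sq(x, v, sigma=1.0):
    """Squared kernel-induced distance 2 * (1 - K(x, v)); bounded above by 2."""
    return 2.0 * (1.0 - rbf_kernel(x, v, sigma))
```

Because this distance saturates at 2 however far a sample lies from a center, a single outlying intensity pulls the cluster center far less than it would under squared Euclidean distance, which is one common explanation for the noise robustness of kernelized FCM variants.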
Parastar, Hadi; Bazrafshan, Alisina
2016-03-18
Fuzzy C-means clustering (FCM) is proposed as a promising method for the clustering of chromatographic fingerprints of complex samples, such as essential oils. As an example, secondary metabolites of 14 citrus leaf samples are extracted and analyzed by gas chromatography-mass spectrometry (GC-MS). The obtained chromatographic fingerprints are divided into the desired number of chromatographic regions. Because chromatographic problems such as elution time shift and peak overlap can significantly affect the clustering results, each chromatographic region is analyzed using multivariate curve resolution-alternating least squares (MCR-ALS) to address these problems. Then, the resolved elution profiles are used to build a new data matrix, based on the peak areas of the pure components, which is clustered by FCM. The FCM clustering parameters (i.e., the fuzziness coefficient and the number of clusters) are optimized by two different methods: partial least squares (PLS) as a conventional method, and minimization of the FCM objective function as our new idea. The results showed that minimization of the FCM objective function is an easier and better way to optimize the FCM clustering parameters. Then, the optimized FCM clustering algorithm is used to cluster samples and variables in order to determine the similarities and dissimilarities among samples and to find discriminant secondary metabolites in each cluster (chemotype). Finally, the FCM clustering results are compared with those of principal component analysis (PCA), hierarchical cluster analysis (HCA) and Kohonen maps. The results confirmed that FCM outperforms these frequently used clustering algorithms.
SPEQTACLE: An automated generalized fuzzy C-means algorithm for tumor delineation in PET
Lapuyade-Lahorgue, Jérôme; Visvikis, Dimitris; Hatt, Mathieu; Pradier, Olivier; Cheze Le Rest, Catherine
2015-10-15
Purpose: Accurate tumor delineation in positron emission tomography (PET) images is crucial in oncology. Although recent methods achieved good results, there is still room for improvement regarding tumors with complex shapes, low signal-to-noise ratio, and high levels of uptake heterogeneity. Methods: The authors developed and evaluated an original clustering-based method called spatial positron emission quantification of tumor—Automatic Lp-norm estimation (SPEQTACLE), based on the fuzzy C-means (FCM) algorithm with a generalization exploiting a Hilbertian norm to more accurately account for the fuzzy and non-Gaussian distributions of PET images. An automatic and reproducible estimation scheme of the norm on an image-by-image basis was developed. Robustness was assessed by studying the consistency of results obtained on multiple acquisitions of the NEMA phantom on three different scanners with varying acquisition parameters. Accuracy was evaluated using classification errors (CEs) on simulated and clinical images. SPEQTACLE was compared to another FCM implementation, fuzzy local information C-means (FLICM) and fuzzy locally adaptive Bayesian (FLAB). Results: SPEQTACLE demonstrated a level of robustness similar to FLAB (variability of 14% ± 9% vs 14% ± 7%, p = 0.15) and higher than FLICM (45% ± 18%, p < 0.0001), and improved accuracy with lower CE (14% ± 11%) over both FLICM (29% ± 29%) and FLAB (22% ± 20%) on simulated images. Improvement was significant for the more challenging cases with CE of 17% ± 11% for SPEQTACLE vs 28% ± 22% for FLAB (p = 0.009) and 40% ± 35% for FLICM (p < 0.0001). For the clinical cases, SPEQTACLE outperformed FLAB and FLICM (15% ± 6% vs 37% ± 14% and 30% ± 17%, p < 0.004). Conclusions: SPEQTACLE benefitted from the fully automatic estimation of the norm on a case-by-case basis. This promising approach will be extended to multimodal images and multiclass estimation in future developments.
Automatic online spike sorting with singular value decomposition and fuzzy C-mean clustering
2012-01-01
Background: Understanding how neurons contribute to perception, motor functions and cognition requires the reliable detection of spiking activity of individual neurons during a number of different experimental conditions. An important problem in computational neuroscience is thus to develop algorithms to automatically detect and sort the spiking activity of individual neurons from extracellular recordings. While many algorithms for spike sorting exist, the problem of accurate and fast online sorting still remains a challenging issue. Results: Here we present a novel software tool, called FSPS (Fuzzy SPike Sorting), which is designed to optimize: (i) fast and accurate detection, (ii) offline sorting and (iii) online classification of neuronal spikes with very limited or null human intervention. The method is based on a combination of Singular Value Decomposition for fast and highly accurate pre-processing of spike shapes, unsupervised Fuzzy C-mean, high-resolution alignment of extracted spike waveforms, optimal selection of the number of features to retain, automatic identification of the number of clusters, and quantitative quality assessment of the resulting clusters independent of their size. After being trained on a short testing data stream, the method can reliably perform supervised online classification and monitoring of single neuron activity. The generalized procedure has been implemented in our FSPS spike sorting software (available free for non-commercial academic applications at the address: http://www.spikesorting.com) using LabVIEW (National Instruments, USA). We evaluated the performance of our algorithm both on benchmark simulated datasets with different levels of background noise and on real extracellular recordings from premotor cortex of Macaque monkeys. The results of these tests showed an excellent accuracy in discriminating low-amplitude and overlapping spikes under strong background noise. The performance of our method is competitive with respect to
Tsantis, Stavros; Spiliopoulos, Stavros; Karnabatidis, Dimitrios; Skouroliakou, Aikaterini; Hazle, John D.; Kagadis, George C.
2014-07-15
Purpose: Speckle suppression in ultrasound (US) images of various anatomic structures via a novel speckle noise reduction algorithm. Methods: The proposed algorithm employs an enhanced fuzzy c-means (EFCM) clustering and multiresolution wavelet analysis to distinguish edges from speckle noise in US images. The edge detection procedure involves a coarse-to-fine strategy with spatial and interscale constraints so as to classify the wavelet local maxima distribution at different frequency bands. As an outcome, an edge map across scales is derived, whereas the wavelet coefficients that correspond to speckle are suppressed in the inverse wavelet transform, yielding the denoised US image. Results: A total of 34 thyroid, liver, and breast US examinations were performed on a Logiq 9 US system. Each of these images was subjected to the proposed EFCM algorithm and, for comparison, to commercial speckle reduction imaging (SRI) software and another well-known denoising approach, Pizurica's method. The quantification of the speckle suppression performance in the selected set of US images was carried out via the Speckle Suppression Index (SSI), with results of 0.61, 0.71, and 0.73 for the EFCM, SRI, and Pizurica's methods, respectively. Peak signal-to-noise ratios of 35.12, 33.95, and 29.78 and edge preservation indices of 0.94, 0.93, and 0.86 were found for the EFCM, SRI, and Pizurica's methods, respectively, demonstrating that the proposed method achieves superior speckle reduction performance and edge preservation properties. Based on two independent radiologists’ qualitative evaluation, the proposed method significantly improved image characteristics over standard baseline B mode images and those processed with Pizurica's method. Furthermore, it yielded results similar to those of SRI for breast and thyroid images and significantly better results than SRI for liver imaging, thus improving diagnostic accuracy in both superficial and in-depth structures. Conclusions: A new wavelet
An incremental clustering algorithm based on Mahalanobis distance
NASA Astrophysics Data System (ADS)
Aik, Lim Eng; Choon, Tan Wee
2014-12-01
The classical fuzzy c-means clustering algorithm is insufficient for clustering non-spherical or elliptically distributed datasets. This paper replaces the Euclidean distance of classical fuzzy c-means clustering with the Mahalanobis distance and applies the Mahalanobis distance to incremental learning for its merits. A Mahalanobis distance based fuzzy incremental clustering learning algorithm is proposed. Experimental results show that the algorithm not only is an effective remedy for this defect of the fuzzy c-means algorithm but also increases training accuracy.
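The effect of swapping Euclidean for Mahalanobis distance can be seen in a few lines. The sketch below computes the squared Mahalanobis distance (x - v)^T S^{-1} (x - v) for a batch of samples; it illustrates the distance only, not the incremental algorithm of the paper, and the function name and the use of a single given covariance matrix `cov` are assumptions for the example.

```python
import numpy as np

def mahalanobis_sq(X, center, cov):
    """Squared Mahalanobis distance of each row of X from `center` under covariance `cov`."""
    diff = np.asarray(X, dtype=float) - np.asarray(center, dtype=float)
    inv_cov = np.linalg.inv(cov)
    # Row-wise quadratic form diff_i^T inv_cov diff_i.
    return np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
```

With an elongated covariance such as `diag(4, 1)`, the points `(2, 0)` and `(0, 1)` are equally far from the origin in Mahalanobis terms although their Euclidean distances differ by a factor of two, which is why elliptical clusters are handled better. With the identity covariance the function reduces to the squared Euclidean distance.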
Image watermarking using a dynamically weighted fuzzy c-means algorithm
NASA Astrophysics Data System (ADS)
Kang, Myeongsu; Ho, Linh Tran; Kim, Yongmin; Kim, Cheol Hong; Kim, Jong-Myon
2011-10-01
Digital watermarking has received extensive attention as a new method of protecting multimedia content from unauthorized copying. In this paper, we present a nonblind watermarking system using a proposed dynamically weighted fuzzy c-means (DWFCM) technique combined with discrete wavelet transform (DWT), discrete cosine transform (DCT), and singular value decomposition (SVD) techniques for copyright protection. The proposed scheme efficiently selects blocks in which the watermark is embedded using new membership values of DWFCM as the embedding strength. We evaluated the proposed algorithm in terms of robustness against various watermarking attacks and imperceptibility compared to other algorithms [DWT-DCT-based and DCT-fuzzy c-means (FCM)-based algorithms]. Experimental results indicate that the proposed algorithm outperforms other algorithms in terms of robustness against several types of attacks, such as noise addition (Gaussian noise, salt and pepper noise), rotation, Gaussian low-pass filtering, mean filtering, median filtering, Gaussian blur, image sharpening, histogram equalization, and JPEG compression. In addition, the proposed algorithm achieves higher values of peak signal-to-noise ratio (approximately 49 dB) and lower values of measure-singular value decomposition (5.8 to 6.6) than other algorithms.
Segmentation of pomegranate MR images using spatial fuzzy c-means (SFCM) algorithm
NASA Astrophysics Data System (ADS)
Moradi, Ghobad; Shamsi, Mousa; Sedaaghi, M. H.; Alsharif, M. R.
2011-10-01
Segmentation is one of the fundamental issues of image processing and machine vision. It plays a prominent role in a variety of image processing applications. In this paper, an important application of image processing, the segmentation of pomegranate MR images, is explored. Pomegranate is a fruit with pharmacological properties such as being anti-viral and anti-cancer. A high-quality product is a critical factor in its marketing, and the internal quality of the product is particularly important in the sorting process. Qualitative features cannot be determined manually. Therefore, the segmentation of the internal structures of the fruit needs to be performed as accurately as possible in the presence of noise. The fuzzy c-means (FCM) algorithm is sensitive to noise, and noisy pixels are misclassified. As a solution, this paper proposes the spatial FCM (SFCM) algorithm for the segmentation of pomegranate MR images. The algorithm incorporates spatial neighborhood information into FCM and modifies the fuzzy membership function for each class. The segmentation results on the original pomegranate MR images and on images corrupted by Gaussian, salt-and-pepper and speckle noise show that the SFCM algorithm performs considerably better than the FCM algorithm. Also, after several steps of qualitative and quantitative analysis, we conclude that the SFCM algorithm with a 5×5 window performs better than with the other window sizes.
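The abstract does not give the exact form of the spatial modification, so the sketch below assumes one widely used formulation: each membership is reweighted by the sum of the same-class memberships in a square window around the pixel, then renormalized across classes. The function name `spatial_membership`, the exponents `p` and `q`, and the edge padding are all assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def spatial_membership(U, window=5, p=1, q=1):
    """Reweight memberships U of shape (c, H, W) by their neighborhood sums (assumed SFCM form)."""
    U = np.asarray(U, dtype=float)
    r = window // 2
    _, H, W = U.shape
    # Replicate border pixels so every pixel has a full window.
    padded = np.pad(U, ((0, 0), (r, r), (r, r)), mode="edge")
    # Spatial function h: sum of same-class memberships over the window.
    h = np.zeros_like(U)
    for dy in range(window):
        for dx in range(window):
            h += padded[:, dy:dy + H, dx:dx + W]
    weighted = (U ** p) * (h ** q)
    # Renormalize so memberships at each pixel sum to 1 over the classes.
    return weighted / weighted.sum(axis=0, keepdims=True)
```

In a homogeneous region the reweighting leaves the ranking of classes unchanged, while an isolated noisy pixel is pulled toward the class that dominates its neighbors. The window size trades noise suppression against loss of fine detail, which is the trade-off the window-size comparison in the abstract addresses.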
Self-organization and clustering algorithms
NASA Technical Reports Server (NTRS)
Bezdek, James C.
1991-01-01
Kohonen's feature maps approach to clustering is often likened to the k or c-means clustering algorithms. Here, the author identifies some similarities and differences between the hard and fuzzy c-Means (HCM/FCM) or ISODATA algorithms and Kohonen's self-organizing approach. The author concludes that some differences are significant, but at the same time there may be some important unknown relationships between the two methodologies. Several avenues of research are proposed.
NASA Astrophysics Data System (ADS)
Kesikoğlu, M. H.; Atasever, Ü. H.; Özkan, C.
2013-10-01
Change detection analysis is the process of identifying changes in nature or in the state of objects from observations made at different times, or of quantifying temporal effects using multitemporal data sets. Many change detection techniques appear in the literature. They can be grouped under two main headings: supervised and unsupervised change detection. The aim of this study is to determine the land cover changes in a specific area of Kayseri with unsupervised change detection techniques, using remotely sensed Landsat satellite images from different years. First, image-to-image registration is performed so that the images are referenced to each other. After image enhancement, the image differencing method is applied to the images. The resulting gray-scale difference image is partitioned into 3 × 3 nonoverlapping blocks. Principal Component Analysis is applied to obtain the eigenvector space, from which the principal components are derived. Finally, the feature vector space consisting of the principal components is partitioned into two clusters, changed and unchanged, using Fuzzy C-Means Clustering, which completes the change detection process.
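The feature-extraction part of this pipeline (differencing, 3 × 3 nonoverlapping blocks, PCA) can be sketched as follows. This is a simplified illustration under stated assumptions: the images are already co-registered single-band arrays, features are computed per block rather than per pixel, and the function name `block_pca_features` and the choice of two retained components are hypothetical. The resulting feature rows would then be split into two clusters by FCM.

```python
import numpy as np

def block_pca_features(img_a, img_b, block=3, n_components=2):
    """Project nonoverlapping blocks of the absolute difference image onto its top principal components."""
    diff = np.abs(img_a.astype(float) - img_b.astype(float))
    H, W = diff.shape
    H, W = H - H % block, W - W % block  # drop incomplete border blocks
    # One row per block, block * block difference values per row.
    blocks = (diff[:H, :W]
              .reshape(H // block, block, W // block, block)
              .swapaxes(1, 2)
              .reshape(-1, block * block))
    centered = blocks - blocks.mean(axis=0)
    # Eigen-decomposition of the covariance matrix gives the eigenvector space.
    eigvals, eigvecs = np.linalg.eigh(np.cov(centered, rowvar=False))
    top = eigvecs[:, np.argsort(eigvals)[::-1][:n_components]]
    return centered @ top
```

Blocks are returned in row-major order, so the cluster labels assigned to the feature rows can be reshaped to `(H // block, W // block)` to form a coarse change map.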
T1- and T2-weighted spatially constrained fuzzy c-means clustering for brain MRI segmentation
NASA Astrophysics Data System (ADS)
Despotović, Ivana; Goossens, Bart; Vansteenkiste, Ewout; Philips, Wilfried
2010-03-01
The segmentation of brain tissue in magnetic resonance imaging (MRI) plays an important role in clinical analysis and is useful for many applications including studying brain diseases, surgical planning and computer assisted diagnoses. In general, accurate tissue segmentation is a difficult task, not only because of the complicated structure of the brain and the anatomical variability between subjects, but also because of the presence of noise and low tissue contrasts in the MRI images, especially in neonatal brain images. Fuzzy clustering techniques have been widely used in automated image segmentation. However, since the standard fuzzy c-means (FCM) clustering algorithm does not consider any spatial information, it is highly sensitive to noise. In this paper, we present an extension of the FCM algorithm to overcome this drawback, by combining information from both T1-weighted (T1-w) and T2-weighted (T2-w) MRI scans and by incorporating spatial information. This new spatially constrained FCM (SCFCM) clustering algorithm preserves the homogeneity of the regions better than existing FCM techniques, which often have difficulties when tissues have overlapping intensity profiles. The performance of the proposed algorithm is tested on simulated and real adult MR brain images with different noise levels, as well as on neonatal MR brain images with the gestational age of 39 weeks. Experimental quantitative and qualitative segmentation results show that the proposed method is effective and more robust to noise than other FCM-based methods. Also, SCFCM appears as a very promising tool for complex and noisy image segmentation of the neonatal brain.
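For reference, the standard FCM that the abstract above extends alternates a membership update and a center update, neither of which uses spatial information. The sketch below shows this baseline on scalar intensities only; it is not the SCFCM method, and the function names, the fuzzifier default `m=2.0`, and the fixed iteration count are assumptions made for illustration.

```python
def memberships_for(x, centers, m, eps=1e-12):
    """Fuzzy memberships of scalar x in each cluster (they sum to 1)."""
    # u_k is proportional to d_k^(-2/(m-1)); eps avoids division by zero
    # when x coincides with a center.
    inv = [(abs(x - c) + eps) ** (-2.0 / (m - 1.0)) for c in centers]
    total = sum(inv)
    return [w / total for w in inv]


def fcm_1d(values, centers, m=2.0, iters=50):
    """Standard fuzzy c-means on scalar data, with no spatial term."""
    centers = list(centers)
    for _ in range(iters):
        u = [memberships_for(x, centers, m) for x in values]
        # Each center becomes the mean of the data weighted by u^m.
        centers = [
            sum(row[k] ** m * x for row, x in zip(u, values))
            / sum(row[k] ** m for row in u)
            for k in range(len(centers))
        ]
    # Memberships recomputed against the final centers.
    u = [memberships_for(x, centers, m) for x in values]
    return centers, u
```

Because each pixel is treated independently of its neighbours, a single noisy intensity can flip its membership, which is the sensitivity to noise the abstract refers to.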
NASA Astrophysics Data System (ADS)
Rapstine, Thomas D.
Gravity gradiometry has been used as a geophysical tool to image salt structure in hydrocarbon exploration. The knowledge of the location, orientation, and spatial extent of salt bodies helps characterize possible petroleum prospects. Imaging around and underneath salt bodies can be challenging given the petrophysical properties and complicated geometry of salt. Methods for imaging beneath salt using seismic data exist but are often iterative and expensive, requiring a refinement of a velocity model at each iteration. Fortunately, the relatively strong density contrast between salt and background density structure provides the opportunity for gravity gradiometry to be useful in exploration, especially when integrated with other geophysical data such as seismic. Quantitatively integrating multiple geophysical data is not trivial, but can improve the recovery of salt body geometry and petrophysical composition using inversion. This thesis provides two options for quantitatively integrating seismic, AGG, and petrophysical data that may aid the imaging of salt bodies. Both methods leverage and expand upon previously developed deterministic inversion methods. The inversion methods leverage seismically derived information, such as horizon slope and salt body interpretation, to constrain the inversion of airborne gravity gradiometry (AGG) data to arrive at a density contrast model. The first method involves constraining a top of salt inversion using slope in a seismic image. The second method expands fuzzy c-means (FCM) clustering inversion to include spatial control on clustering based on a seismically derived salt body interpretation. The effectiveness of the methods is illustrated on a 2D synthetic earth model derived from the SEAM Phase 1 salt model. Both methods show that constraining the inversion of AGG data using information derived from seismic images can improve the recovery of salt.
NASA Astrophysics Data System (ADS)
Polat, Kemal
2012-04-01
This study presents the application of fuzzy c-means (FCM) clustering-based feature weighting (FCMFW) for the detection of Parkinson's disease (PD). In the classification of PD dataset taken from University of California - Irvine machine learning database, practical values of the existing traditional and non-standard measures for distinguishing healthy people from people with PD by detecting dysphonia were applied to the input of FCMFW. The main aims of FCM clustering algorithm are both to transform from a linearly non-separable dataset to a linearly separable one and to increase the distinguishing performance between classes. The weighted PD dataset is presented to k-nearest neighbour (k-NN) classifier system. In the classification of PD, the various k-values in k-NN classifier were used and compared with each other. Also, the effects of k-values in k-NN classifier on the classification of Parkinson disease datasets have been investigated and the best k-value found. The experimental results have demonstrated that the combination of the proposed weighting method called FCMFW and k-NN classifier has obtained very promising results on the classification of PD.
NASA Astrophysics Data System (ADS)
Nasseri, Aynur; Jafar Mohammadzadeh, Mohammad; Hashem Tabatabaei Raeisi, S.
2015-04-01
This paper deals with the application of the ant colony algorithm (AC) to a seismic dataset from Dezful Embayment in the southwest region of Iran. The objective of the approach is to generate an accurate representation of faults and discontinuities to assist in pertinent matters such as well planning and field optimization. The AC analyzed all spatial discontinuities in the seismic attributes from which features were extracted. True fault information from the attributes was detected by many artificial ants, whereas noise and the remains of the reflectors were eliminated. Furthermore, the fracture enhancement procedure was conducted by three steps on seismic data of the area. In the first step several attributes such as chaos, variance/coherence and dip deviation were taken into account; the resulting maps indicate high-resolution contrast for the variance attribute. Subsequently, the enhancement of spatial discontinuities was performed and finally elimination of the noise and remains of non-faulting events was carried out by simulating the behavior of ant colonies. After considering stepwise attribute optimization, focusing on chaos and variance in particular, an attribute fusion was generated and used in the ant colony algorithm. The resulting map displayed the highest performance in feature detection along the main structural feature trend, confined to a NW-SE direction. Thus, the optimized attribute fusion might be used with greater confidence to map the structural feature network with more accuracy and resolution. In order to assess the performance of the AC in feature detection, and cross validate the reliability of the method used, fuzzy c-means clustering (FCMC) was employed for the same dataset. Comparing the maps illustrates the effectiveness and preference of the AC approach due to its high resolution contrast for structural feature detection compared to the FCMC method. Accordingly, 3D planes of discontinuity determined spatial distribution of fractures
Khan, Javed; Malik, Aamir Saeed; Kamel, Nidal; Dass, Sarat Chandra; Affandi, Azura Mohd
2015-01-01
Segmentation is a basic and important step in digital image analysis and understanding. Segmentation of acne lesions in the visual spectrum of light is very challenging due to factors such as varying skin tones due to ethnicity, camera calibration and the lighting conditions. In this approach the color image is transformed into various color spaces. The image is decomposed into the specified number of homogeneous regions based on the similarity of color using the fuzzy C-means clustering technique. Features are extracted for each cluster and average values of these features are calculated. A new objective function is defined that selects the cluster holding the lesion pixels based on the average value of cluster features. In this study segmentation results are generated in four color spaces (RGB, rgb, YIQ, I1I2I3) and two individual color components (I3, Q). The number of clusters is varied from 2 to 6. The experiment was carried out on fifty images of acne patients. The performance of the proposed technique is measured in terms of the three most commonly used metrics: sensitivity, specificity, and accuracy. The best results were obtained for the Q and I3 color components of the YIQ and I1I2I3 color spaces with the number of clusters equal to three. These color components show robustness against non-uniform illumination and maximize the gap between the lesion and skin color.
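The three metrics named in the abstract above have standard definitions in terms of true/false positives and negatives. A minimal sketch, assuming the segmentation and the ground truth are given as flat sequences of 0/1 pixel labels (the function name and this representation are illustrative, not taken from the paper):

```python
def segmentation_metrics(predicted, truth):
    """Sensitivity, specificity and accuracy for binary pixel labels."""
    tp = fp = tn = fn = 0
    for p, t in zip(predicted, truth):
        if p and t:
            tp += 1          # lesion pixel correctly detected
        elif p and not t:
            fp += 1          # skin pixel wrongly marked as lesion
        elif not p and t:
            fn += 1          # lesion pixel missed
        else:
            tn += 1          # skin pixel correctly rejected
    total = tp + fp + tn + fn
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    return sensitivity, specificity, accuracy
```

Sensitivity rewards finding lesion pixels, specificity rewards leaving skin pixels alone, and accuracy mixes the two in proportion to class sizes, which is why all three are usually reported together.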
Basic cluster compression algorithm
NASA Technical Reports Server (NTRS)
Hilbert, E. E.; Lee, J.
1980-01-01
Feature extraction and data compression of LANDSAT data is accomplished by BCCA program which reduces costs associated with transmitting, storing, distributing, and interpreting multispectral image data. Algorithm uses spatially local clustering to extract features from image data to describe spectral characteristics of data set. Approach requires only simple repetitive computations, and parallel processing can be used for very high data rates. Program is written in FORTRAN IV for batch execution and has been implemented on SEL 32/55.
NASA Astrophysics Data System (ADS)
Ghaffarian, Saman; Gökaşar, Ilgın
2016-01-01
This study presents an approach for the automatic detection of vehicles using very high-resolution images and road vector data. Initially, road vector data and aerial images are integrated to extract road regions. Then, the extracted road/street region is clustered using an automatic histogram-based fuzzy C-means algorithm, and edge pixels are detected using the Canny edge detector. In order to automatically detect vehicles, we developed a local perceptual grouping approach based on fusion of edge detection and clustering outputs. To provide the locality, an ellipse is generated using characteristics of the candidate clusters individually. Then, the ratio of edge pixels to nonedge pixels in the corresponding ellipse is computed to distinguish the vehicles. Finally, a point-merging rule is applied to merge the points that satisfy a predefined threshold and are supposed to denote the same vehicles. The experimental validation of the proposed method was carried out on six very high-resolution aerial images that illustrate two highways, two shadowed roads, a crowded narrow street, and a street in a dense urban area with crowded parked vehicles. The evaluation of the results shows that our proposed method achieved 86% and 83% in overall correctness and completeness, respectively.
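The edge-to-nonedge ratio inside an ellipse, as described in the abstract above, can be sketched as follows. This is an illustrative simplification: the ellipse is assumed to be axis-aligned, the edge map is a list of rows of 0/1 values, and the function name and parameters are hypothetical. How the paper derives the ellipse from each candidate cluster is not specified in the abstract.

```python
def edge_ratio_in_ellipse(edge_map, cy, cx, a, b):
    """Ratio of edge to nonedge pixels inside an axis-aligned ellipse.

    (cy, cx) is the ellipse centre in row/column coordinates, a the
    semi-axis along columns (x) and b the semi-axis along rows (y).
    """
    edge = non_edge = 0
    for y, row in enumerate(edge_map):
        for x, is_edge in enumerate(row):
            # Standard ellipse membership test.
            if ((x - cx) / a) ** 2 + ((y - cy) / b) ** 2 <= 1.0:
                if is_edge:
                    edge += 1
                else:
                    non_edge += 1
    # No nonedge pixels in the ellipse: report an infinite ratio.
    return edge / non_edge if non_edge else float("inf")
```

A candidate would then be accepted as a vehicle when this ratio passes some threshold; the threshold value is not given in the abstract.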
NASA Astrophysics Data System (ADS)
An, Yu; Liu, Jie; Ye, Jinzuo; Mao, Yamin; Yang, Xin; Jiang, Shixin; Chi, Chongwei; Tian, Jie
2015-03-01
As an important molecular imaging modality, fluorescence molecular imaging (FMI) has the advantages of high sensitivity, low cost and ease of use. By labeling the regions of interest with fluorophore, FMI can noninvasively obtain the distribution of fluorophore in vivo. However, because the fluorescence spectrum lies within the visible light range, there is a large amount of autofluorescence on the surface of the bio-tissues, which is a major disturbing factor in FMI. Meanwhile, the high level of dark current in the charge-coupled device (CCD) camera and other influencing factors can also produce a lot of background noise. In this paper, a novel method for image denoising of FMI based on fuzzy C-means clustering (FCM) is proposed, because the fluorescent signal is the major component of the fluorescence images, and the intensity of autofluorescence and other background signals is relatively lower than that of the fluorescence signal. First, the fluorescence image is smoothed by sliding-neighborhood operations to initially eliminate the noise. Then, the wavelet transform (WLT) is performed on the fluorescence images to obtain the major component of the fluorescent signals. After that, the FCM method is adopted to separate the major component and background of the fluorescence images. Finally, the proposed method was validated using the original data obtained from an in vivo implanted fluorophore experiment, and the results show that the proposed method can effectively obtain the fluorescence signal while eliminating the background noise, which could increase the quality of fluorescence images.
Keller, Brad M.; Nathan, Diane L.; Wang Yan; Zheng Yuanjie; Gee, James C.; Conant, Emily F.; Kontos, Despina
2012-08-15
Purpose: The amount of fibroglandular tissue content in the breast as estimated mammographically, commonly referred to as breast percent density (PD%), is one of the most significant risk factors for developing breast cancer. Approaches to quantify breast density commonly focus on either semiautomated methods or visual assessment, both of which are highly subjective. Furthermore, most studies published to date investigating computer-aided assessment of breast PD% have been performed using digitized screen-film mammograms, while digital mammography is increasingly replacing screen-film mammography in breast cancer screening protocols. Digital mammography imaging generates two types of images for analysis, raw (i.e., 'FOR PROCESSING') and vendor postprocessed (i.e., 'FOR PRESENTATION'), of which postprocessed images are commonly used in clinical practice. Development of an algorithm which effectively estimates breast PD% in both raw and postprocessed digital mammography images would be beneficial in terms of direct clinical application and retrospective analysis. Methods: This work proposes a new algorithm for fully automated quantification of breast PD% based on adaptive multiclass fuzzy c-means (FCM) clustering and support vector machine (SVM) classification, optimized for the imaging characteristics of both raw and processed digital mammography images as well as for individual patient and image characteristics. Our algorithm first delineates the breast region within the mammogram via an automated thresholding scheme to identify background air followed by a straight line Hough transform to extract the pectoral muscle region. The algorithm then applies adaptive FCM clustering based on an optimal number of clusters derived from image properties of the specific mammogram to subdivide the breast into regions of similar gray-level intensity. Finally, a SVM classifier is trained to identify which clusters within the breast tissue are likely fibroglandular, which are then
NASA Astrophysics Data System (ADS)
Moulik, P.; Tiampo, K. F.
2009-05-01
The information on three-dimensional geometry as well as the identification of active fault segments is critical to our assessment of seismic risks. Numerical modeling of the aftershock locations, times and magnitudes is also crucial to characterize a fault zone. In this study, a pattern recognition technique based on the Fuzzy C-means clustering algorithm (Bezdek, 1981) is proposed to allow each earthquake to be associated with different fault segments. The spatial covariance tensor for each cluster and the associated earthquakes are used to find optimal anisotropic clusters and designate them as faults, similar to the OADC method (Ouillon et al., 2008). The location, size and orientation of the reconstructed fault segments are characterized using a fuzzy covariance matrix (Gustafson and Kessel, 1978). The output consists of a set of distinct fault segments along with the associated earthquakes at different fuzzy membership grades (Zadeh, 1965). A resultant matrix consists of the fuzzy membership grade for different earthquakes and corresponding fault segments specifying their degree of association with values from zero to one. The spatial distribution of earthquakes of different magnitudes and membership grades for a fault segment is incorporated in an anisotropic spatial kernel which characterizes the aftershock density at a distance vector in the ETAS model (Kagan and Knopoff, 1987; Ogata, 1988). An optimal spatio-temporal distribution of aftershocks is obtained for each fault segment without considering a priori distributions such as Gaussian or power law (Helmstetter et al., 2006; Helmstetter and Sornette, 2002). The model is tested on the aftershock sequence from the Denali, 2002 earthquake in Alaska and the fault reconstruction results compared with the known faults in the area. Therefore, a new method to incorporate the anisotropic nature of aftershock diffusion along with the reconstruction of fault networks from seismicity catalogs is formulated in
NASA Astrophysics Data System (ADS)
Abdulbaqi, Hayder Saad; Jafri, Mohd Zubir Mat; Omar, Ahmad Fairuz; Mustafa, Iskandar Shahrim Bin; Abood, Loay Kadom
2015-04-01
Brain tumors are an abnormal growth of tissues in the brain. They may arise in people of any age. They must be detected early, diagnosed accurately, monitored carefully, and treated effectively in order to optimize patient outcomes regarding both survival and quality of life. Manual segmentation of brain tumors from CT scan images is a challenging and time-consuming task. Accurate detection of the size and location of a brain tumor plays a vital role in the successful diagnosis and treatment of tumors. Brain tumor detection is considered a challenging task in medical image processing. The aim of this paper is to introduce a scheme for tumor detection in CT scan images using two different techniques: Hidden Markov Random Fields (HMRF) and Fuzzy C-means (FCM). The proposed method has been developed in this research in order to construct a hybrid method combining HMRF and thresholding. These methods have been applied to 4 different patient data sets. The comparison among these methods shows that the proposed method gives good results for brain tissue detection, and is more robust and effective than the FCM technique.
Abdulbaqi, Hayder Saad; Jafri, Mohd Zubir Mat; Omar, Ahmad Fairuz; Mustafa, Iskandar Shahrim Bin; Abood, Loay Kadom
2015-04-24
Brain tumors are an abnormal growth of tissues in the brain. They may arise in people of any age. They must be detected early, diagnosed accurately, monitored carefully, and treated effectively in order to optimize patient outcomes regarding both survival and quality of life. Manual segmentation of brain tumors from CT scan images is a challenging and time-consuming task. Accurate detection of the size and location of a brain tumor plays a vital role in the successful diagnosis and treatment of tumors. Brain tumor detection is considered a challenging task in medical image processing. The aim of this paper is to introduce a scheme for tumor detection in CT scan images using two different techniques: Hidden Markov Random Fields (HMRF) and Fuzzy C-means (FCM). The proposed method has been developed in this research in order to construct a hybrid method combining HMRF and thresholding. These methods have been applied to 4 different patient data sets. The comparison among these methods shows that the proposed method gives good results for brain tissue detection, and is more robust and effective than the FCM technique.
Rosen, C; Yuan, Z
2001-01-01
In this paper a methodology for integrated multivariate monitoring and control of biological wastewater treatment plants during extreme events is presented. To monitor the process, on-line dynamic principal component analysis (PCA) is performed on the process data to extract the principal components that represent the underlying mechanisms of the process. Fuzzy c-means (FCM) clustering is used to classify the operational state. Performing clustering on scores from PCA solves computational problems as well as increases robustness due to noise attenuation. The class-membership information from FCM is used to derive adequate control set points for the local control loops. The methodology is illustrated by a simulation study of a biological wastewater treatment plant, on which disturbances of various types are imposed. The results show that the methodology can be used to determine and co-ordinate control actions in order to shift the control objective and improve the effluent quality.
Effective FCM noise clustering algorithms in medical images.
Kannan, S R; Devi, R; Ramathilagam, S; Takezawa, K
2013-02-01
The main motivation of this paper is to introduce a class of robust non-Euclidean distance measures for the original data space, to derive new objective functions and thus cluster the non-Euclidean structures in data, enhancing the robustness of the original clustering algorithms to noise and outliers. The new objective functions of the proposed algorithms are realized by incorporating the noise clustering concept into the entropy-based fuzzy C-means algorithm with a suitable noise distance, which is employed to take the information about noisy data into account in the clustering process. This paper obtains initial cluster prototypes using a prototype initialization method, so that the final result can be reached in fewer iterations. To evaluate the performance of the proposed methods in reducing the noise level, experimental work has been carried out with a synthetic image corrupted by Gaussian noise. The superiority of the proposed methods has been examined through an experimental study on medical images. The experimental results show that the proposed algorithms perform significantly better than the standard existing algorithms. The accurate classification percentage of the proposed fuzzy C-means segmentation method is obtained using the silhouette validity index.
NASA Astrophysics Data System (ADS)
Sadr, Ali; Momtaz, Amirkeyvan
2012-01-01
Clustering is one of the image-processing methods used in non-destructive testing (NDT). As one of the initializing parameters, most clustering algorithms, like fuzzy C means (FCM), Iterative self-organization data analysis (ISODATA), K-means, and their derivatives, require the number of clusters. This paper proposes an algorithm for clustering the pixels in C-scan images without any initializing parameters. In this state-of-the-art method, an image is sampled based on the rosette pattern and according to the pattern characteristics, and extracted samples are clustered and then the number of clusters is determined. The centroids of the classes are computed by means of a method used to calculate the distribution function. Based on different data sets, the results show that the algorithm improves the clustering capability by 92.93% and 91.93% in comparison with FCM and K-means algorithms, respectively. Moreover, when dealing with high-resolution data sets, the efficiency of the algorithm in terms of cluster detection and run time improves considerably.
HAMEDIAN, Amir Abbas; JAVID, Allahbakhsh; MOTESADDI ZARANDI, Saeed; RASHIDI, Yousef; MAJLESI, Monireh
2016-01-01
Background: Since the industrial revolution, the rate of industrialization and urbanization has increased dramatically. As a result, specific regions, mostly located in developing countries, have been confronted with serious problems, particularly environmental problems, among which air pollution is of high importance. Methods: Eleven parameters, including CO, SO2, PM10, PM2.5, O3, NO2, benzene, toluene, ethyl-benzene, xylene, and 1,3-butadiene, measured over a period of two years (2011–2012) at five monitoring stations located in Tehran, Iran, were assessed using a fuzzy inference system and fuzzy c-means clustering. Results: These tools showed that the criteria pollutants between 2011 and 2012 did not affect public health as much as the other pollutants did. Conclusion: When the EPA AQI is used, the assessment of air quality, and the managerial plans required to improve it, can be misled. PMID:27516999
Hesitant fuzzy agglomerative hierarchical clustering algorithms
NASA Astrophysics Data System (ADS)
Zhang, Xiaolu; Xu, Zeshui
2015-02-01
Recently, hesitant fuzzy sets (HFSs) have been studied by many researchers as a powerful tool to describe and deal with uncertain data, but relatively few studies focus on the clustering analysis of HFSs. In this paper, we propose a novel hesitant fuzzy agglomerative hierarchical clustering algorithm for HFSs. The algorithm considers each of the given HFSs as a unique cluster in the first stage, and then compares each pair of the HFSs by utilising the weighted Hamming distance or the weighted Euclidean distance. The two clusters with the smallest distance are merged. The procedure is then repeated until the desired number of clusters is achieved. Moreover, we extend the algorithm to cluster interval-valued hesitant fuzzy sets, and finally illustrate the effectiveness of our clustering algorithms by experimental results.
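The merge loop described in the abstract above can be sketched with ordinary numeric vectors standing in for HFSs. This is a simplified illustration under stated assumptions: real hesitant fuzzy elements hold sets of membership values and need their own distance and aggregation operators, which the abstract does not detail. Here each cluster is represented by the mean of its members (an assumption), and all function names are hypothetical.

```python
def weighted_hamming(a, b, weights):
    """Weighted Hamming distance between two equal-length vectors."""
    return sum(w * abs(x - y) for w, x, y in zip(weights, a, b))


def agglomerate(points, weights, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    Returns clusters as lists of indices into points.
    """
    clusters = [[i] for i in range(len(points))]

    def center(members):
        # Cluster representative: component-wise mean (assumed).
        dim = len(points[0])
        return [sum(points[i][d] for i in members) / len(members)
                for d in range(dim)]

    while len(clusters) > max(1, n_clusters):
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = weighted_hamming(center(clusters[i]),
                                     center(clusters[j]), weights)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters
```

Swapping `weighted_hamming` for a weighted Euclidean distance changes only the distance function; the merge loop stays the same.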
NASA Astrophysics Data System (ADS)
Castro, Marcelo A.; Thomasson, David; Avila, Nilo A.; Hufton, Jennifer; Senseney, Justin; Johnson, Reed F.; Dyall, Julie
2013-03-01
Monkeypox virus is an emerging zoonotic pathogen that results in up to 10% mortality in humans. Knowledge of clinical manifestations and temporal progression of monkeypox disease is limited to data collected from rare outbreaks in remote regions of Central and West Africa. Clinical observations show that monkeypox infection resembles variola infection. Given the limited capability to study monkeypox disease in humans, characterization of the disease in animal models is required. A previous work focused on the identification of inflammatory patterns using PET/CT image modality in two non-human primates previously inoculated with the virus. In this work we extended techniques used in computer-aided detection of lung tumors to identify inflammatory lesions from monkeypox virus infection and their progression using CT images. Accurate estimation of partial volumes of lung lesions via segmentation is difficult because of poor discrimination between blood vessels, diseased regions, and outer structures. We used hard C-means algorithm in conjunction with landmark based registration to estimate the extent of monkeypox virus induced disease before inoculation and after disease progression. Automated estimation is in close agreement with manual segmentation.
Basic firefly algorithm for document clustering
NASA Astrophysics Data System (ADS)
Mohammed, Athraa Jasim; Yusof, Yuhanis; Husni, Husniza
2015-12-01
Document clustering plays a significant role in Information Retrieval (IR), where it organizes documents prior to the retrieval process. To date, various clustering algorithms have been proposed, including K-means and Particle Swarm Optimization. Even though these algorithms have been widely applied in many disciplines due to their simplicity, they tend to become trapped in a local minimum during the search for an optimal solution. To address this shortcoming, this paper proposes a Basic Firefly (Basic FA) algorithm to cluster text documents. The algorithm employs the Average Distance to Document Centroid (ADDC) as the objective function of the search. Experiments utilizing the proposed algorithm were conducted on the 20Newsgroups benchmark dataset. Results demonstrate that the Basic FA generates more robust and compact clusters than the ones produced by K-means and Particle Swarm Optimization (PSO).
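The ADDC objective mentioned in the abstract above is commonly computed as the mean, over clusters, of the average distance from each document to its cluster centroid. A minimal sketch under that reading: the function name, the use of Euclidean distance (cosine distance is also common for text), and the dense-vector document representation are assumptions not confirmed by the abstract.

```python
import math


def addc(clusters):
    """Average Distance of Documents to their Cluster centroid.

    clusters is a list of non-empty clusters, each a list of
    equal-length document vectors. Lower values mean more compact clusters.
    """
    per_cluster = []
    for docs in clusters:
        dim = len(docs[0])
        # Centroid: component-wise mean of the cluster's documents.
        centroid = [sum(doc[d] for doc in docs) / len(docs)
                    for d in range(dim)]
        # Average distance from each document to its centroid.
        per_cluster.append(
            sum(math.dist(doc, centroid) for doc in docs) / len(docs))
    return sum(per_cluster) / len(per_cluster)
```

A firefly (or any other search agent) encoding a set of centroids can then be scored by assigning documents to their nearest centroid and evaluating this value.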
Farjam, Reza; Tsien, Christina I.; Lawrence, Theodore S.; Cao, Yue
2014-01-15
Purpose: To develop a pharmacokinetic model-free framework to analyze the dynamic contrast enhanced magnetic resonance imaging (DCE-MRI) data for assessment of response of brain metastases to radiation therapy. Methods: Twenty patients with 45 analyzable brain metastases had MRI scans prior to whole brain radiation therapy (WBRT) and at the end of the 2-week therapy. The volumetric DCE images covering the whole brain were acquired on a 3T scanner with approximately 5 s temporal resolution and a total scan time of about 3 min. DCE curves from all voxels of the 45 brain metastases were normalized and then temporally aligned. A DCE matrix constructed from the aligned DCE curves of all voxels of the 45 lesions obtained prior to WBRT is processed by principal component analysis to generate the principal components (PCs). Then, the projection coefficient maps prior to and at the end of WBRT are created for each lesion. Next, a pattern recognition technique, based upon fuzzy-c-means clustering, is used to delineate the tumor subvolumes relating to the value of the significant projection coefficients. The relationship between changes in different tumor subvolumes and treatment response was evaluated to differentiate responsive from stable and progressive tumors. Performance of the PC-defined tumor subvolume was also evaluated by receiver operating characteristic (ROC) analysis in prediction of nonresponsive lesions and compared with physiological-defined tumor subvolumes. Results: The projection coefficient maps of the first three PCs contain almost all response-related information in DCE curves of brain metastases. The first projection coefficient, related to the area under DCE curves, is the major component to determine response while the third one has a complementary role. In ROC analysis, areas under the curve of 0.88 ± 0.05 and 0.86 ± 0.06 were achieved for the PC-defined and physiological-defined tumor subvolumes, respectively, in response assessment. Conclusions: The PC
An algorithm for spatial hierarchy clustering
NASA Technical Reports Server (NTRS)
Dejesusparada, N. (Principal Investigator); Velasco, F. R. D.
1981-01-01
A method for utilizing both spectral and spatial redundancy in compacting and preclassifying images is presented. In multispectral satellite images, a high correlation exists between neighboring image points which tend to occupy dense and restricted regions of the feature space. The image is divided into windows of the same size where the clustering is made. The classes obtained in several neighboring windows are clustered, and then again successively clustered until only one region corresponding to the whole image is obtained. By employing this algorithm only a few points are considered in each clustering, thus reducing computational effort. The method is illustrated as applied to LANDSAT images.
Parallel Clustering Algorithms for Structured AMR
Gunney, B T; Wissink, A M; Hysom, D A
2005-10-26
We compare several different parallel implementation approaches for the clustering operations performed during adaptive gridding operations in patch-based structured adaptive mesh refinement (SAMR) applications. Specifically, we target the clustering algorithm of Berger and Rigoutsos (BR91), which is commonly used in many SAMR applications. The baseline for comparison is a simplistic parallel extension of the original algorithm that works well for up to O(10^2) processors. Our goal is a clustering algorithm for machines of up to O(10^5) processors, such as the 64K-processor IBM BlueGene/Light system. We first present an algorithm that avoids the unneeded communications of the simplistic approach to improve the clustering speed by up to an order of magnitude. We then present a new task-parallel implementation to further reduce communication wait time, adding another order of magnitude of improvement. The new algorithms also exhibit more favorable scaling behavior for our test problems. Performance is evaluated on a number of large scale parallel computer systems, including a 16K-processor BlueGene/Light system.
Performance Comparison Of Evolutionary Algorithms For Image Clustering
NASA Astrophysics Data System (ADS)
Civicioglu, P.; Atasever, U. H.; Ozkan, C.; Besdok, E.; Karkinli, A. E.; Kesikoglu, A.
2014-09-01
Evolutionary computation tools are able to process real-valued numerical sets in order to extract a suboptimal solution of the designed problem. Data clustering algorithms have been intensively used for image segmentation in remote sensing applications. Despite the wide usage of evolutionary algorithms in data clustering, their clustering performance has scarcely been studied using clustering validation indexes. In this paper, recently proposed evolutionary algorithms (i.e., Artificial Bee Colony Algorithm (ABC), Gravitational Search Algorithm (GSA), Cuckoo Search Algorithm (CS), Adaptive Differential Evolution Algorithm (JADE), Differential Search Algorithm (DSA) and Backtracking Search Optimization Algorithm (BSA)) and some classical image clustering techniques (i.e., k-means, FCM, SOM networks) have been used to cluster images, and their performances have been compared using four clustering validation indexes. Experimental test results showed that evolutionary algorithms give more reliable cluster centers than classical clustering techniques, but their convergence time is quite long.
Ergen, Burhan
2014-01-01
This paper proposes two edge detection methods for medical images by integrating the advantages of Gabor wavelet transform (GWT) and unsupervised clustering algorithms. The GWT is used to enhance the edge information in an image while suppressing noise. Following this, the k-means and Fuzzy c-means (FCM) clustering algorithms are used to convert a gray level image into a binary image. The proposed methods are tested using medical images obtained through Computed Tomography (CT) and Magnetic Resonance Imaging (MRI) devices, and a phantom image. The results prove that the proposed methods are successful for edge detection, even in noisy cases.
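The clustering step described above, converting a gray-level image into a binary one, can be sketched with k-means and k = 2 on pixel intensities. This is only an illustrative sketch: the Gabor wavelet enhancement is omitted, the function name `two_means_threshold` and the sample values are hypothetical, and it is not the author's implementation.

```python
def two_means_threshold(pixels, iterations=50):
    """Cluster 1-D gray levels with k-means (k = 2).

    Returns (mask, (dark_center, bright_center)); mask[i] is 1 when
    pixels[i] is assigned to the brighter cluster.
    """
    lo, hi = float(min(pixels)), float(max(pixels))  # initial centers (assumed choice)
    for _ in range(iterations):
        # With two centers in 1-D, nearest-center assignment is a
        # threshold at the midpoint between the centers.
        mid = (lo + hi) / 2.0
        dark = [p for p in pixels if p <= mid]
        bright = [p for p in pixels if p > mid]
        if not dark or not bright:
            break  # degenerate image (e.g. constant intensity)
        new_lo = sum(dark) / len(dark)
        new_hi = sum(bright) / len(bright)
        if new_lo == lo and new_hi == hi:
            break  # centers stopped moving
        lo, hi = new_lo, new_hi
    mid = (lo + hi) / 2.0
    return [1 if p > mid else 0 for p in pixels], (lo, hi)


# Hypothetical flattened gray-level values.
mask, centers = two_means_threshold([10, 12, 11, 200, 205, 198, 9, 202])
```

A fuzzy c-means variant would replace the hard assignment with membership weights and threshold the memberships afterwards.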
Noise-enhanced clustering and competitive learning algorithms.
Osoba, Osonde; Kosko, Bart
2013-01-01
Noise can provably speed up convergence in many centroid-based clustering algorithms. This includes the popular k-means clustering algorithm. The clustering noise benefit follows from the general noise benefit for the expectation-maximization algorithm because many clustering algorithms are special cases of the expectation-maximization algorithm. Simulations show that noise also speeds up convergence in stochastic unsupervised competitive learning, supervised competitive learning, and differential competitive learning.
Classification of posture maintenance data with fuzzy clustering algorithms
NASA Technical Reports Server (NTRS)
Bezdek, James C.
1991-01-01
Sensory inputs from the visual, vestibular, and proprioceptive systems are integrated by the central nervous system to maintain postural equilibrium. Sustained exposure to microgravity causes neurosensory adaptation during spaceflight, which results in decreased postural stability until readaptation occurs upon return to the terrestrial environment. Data which simulate sensory inputs under various conditions were collected in conjunction with JSC postural control studies using a Tilt-Translation Device (TTD). The University of West Florida proposed applying the Fuzzy C-Means Clustering (FCM) Algorithms to this data with a view towards identifying various states and stages. Data supplied by NASA/JSC were submitted to the FCM algorithms in an attempt to identify and characterize cluster substructure in a mixed ensemble of pre- and post-adaptational TTD data. Following several unsuccessful trials with FCM using a full 11-dimensional data set, a set of two channels (features) was found to enable FCM to separate pre- from post-adaptational TTD data. The main conclusions are that: (1) FCM seems able to separate pre- from post-TTD subject no. 2 on the one trial that was used, but only in certain subintervals of time; and (2) Channels 2 (right rear transducer force) and 8 (hip sway bar) contain better discrimination information than other supersets and combinations of the data that were tried so far.
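For readers unfamiliar with FCM, the standard alternating update (memberships, then weighted centers) can be sketched on two-feature data as follows. This is a minimal sketch assuming the textbook formulation with fuzzifier m = 2 and Euclidean distance; the sample points and initial centers are hypothetical and unrelated to the TTD data.

```python
import math


def fcm(points, centers, m=2.0, iterations=100, tol=1e-9):
    """Textbook fuzzy c-means.

    Returns (centers, memberships), where memberships[k][i] is the degree
    to which point k belongs to cluster i, as computed in the last iteration.
    """
    centers = [tuple(c) for c in centers]
    exponent = 2.0 / (m - 1.0)
    dim = len(points[0])
    memberships = []
    for _ in range(iterations):
        # Membership update: u_ik = 1 / sum_j (d_ik / d_jk)^(2/(m-1)).
        memberships = []
        for x in points:
            dists = [math.dist(x, c) for c in centers]
            if 0.0 in dists:
                # Point coincides with a center: full membership there.
                hit = dists.index(0.0)
                row = [1.0 if i == hit else 0.0 for i in range(len(centers))]
            else:
                row = [1.0 / sum((d_i / d_j) ** exponent for d_j in dists)
                       for d_i in dists]
            memberships.append(row)
        # Center update: mean of the points weighted by u_ik^m.
        new_centers = []
        for i, old in enumerate(centers):
            weights = [row[i] ** m for row in memberships]
            total = sum(weights)
            if total == 0.0:
                new_centers.append(old)  # no point supports this cluster
                continue
            new_centers.append(tuple(
                sum(w * x[d] for w, x in zip(weights, points)) / total
                for d in range(dim)))
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships


# Hypothetical two-channel samples forming two groups.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
final_centers, u = fcm(pts, centers=[(0, 0), (10, 10)])
```

Each row of `u` sums to 1, so a point between two groups receives split membership rather than a hard label.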
Cluster compression algorithm: A joint clustering/data compression concept
NASA Technical Reports Server (NTRS)
Hilbert, E. E.
1977-01-01
The Cluster Compression Algorithm (CCA), which was developed to reduce costs associated with transmitting, storing, distributing, and interpreting LANDSAT multispectral image data, is described. The CCA is a preprocessing algorithm that uses feature extraction and data compression to more efficiently represent the information in the image data. The format of the preprocessed data enables simple look-up table decoding and direct use of the extracted features to reduce user computation for either image reconstruction or computer interpretation of the image data. Basically, the CCA uses spatially local clustering to extract features from the image data to describe spectral characteristics of the data set. In addition, the features may be used to form a sequence of scalar numbers that define each picture element in terms of the cluster features. This sequence, called the feature map, is then efficiently represented by using source encoding concepts. Various forms of the CCA are defined and experimental results are presented to show trade-offs and characteristics of the various implementations. Examples are provided that demonstrate the application of the cluster compression concept to multi-spectral images from LANDSAT and other sources.
Chaotic map clustering algorithm for EEG analysis
NASA Astrophysics Data System (ADS)
Bellotti, R.; De Carlo, F.; Stramaglia, S.
2004-03-01
The non-parametric chaotic map clustering algorithm has been applied to the analysis of electroencephalographic signals in order to recognize Huntington's disease, one of the most dangerous pathologies of the central nervous system. The performance of the method has been compared with that obtained through parametric algorithms, such as K-means and deterministic annealing, and a supervised multi-layer perceptron. While supervised neural networks need a training phase, performed by means of data tagged by the genetic test, and the parametric methods require a prior choice of the number of classes to find, chaotic map clustering gives natural evidence of the pathological class, without any training or supervision, thus providing a new efficient methodology for the recognition of patterns affected by Huntington's disease.
Efficient Fuzzy C-Means Architecture for Image Segmentation
Li, Hui-Ya; Hwang, Wen-Jyi; Chang, Chia-Yen
2011-01-01
This paper presents a novel VLSI architecture for image segmentation. The architecture is based on the fuzzy c-means algorithm with spatial constraint for reducing the misclassification rate. In the architecture, the usual iterative operations for updating the membership matrix and cluster centroid are merged into one single updating process to avoid the large storage requirement. In addition, an efficient pipelined circuit is used for the updating process to accelerate computation. Experimental results show that the proposed circuit is an effective alternative for real-time image segmentation with low area cost and low misclassification rate. PMID:22163980
Accelerating Fuzzy-C Means Using an Estimated Subsample Size
Parker, Jonathon K.; Hall, Lawrence O.
2015-01-01
Many algorithms designed to accelerate the Fuzzy c-Means (FCM) clustering algorithm randomly sample the data. Typically, no statistical method is used to estimate the subsample size, despite the impact subsample sizes have on speed and quality. This paper introduces two new accelerated algorithms, GOFCM and MSERFCM, that use a statistical method to estimate the subsample size. GOFCM, a variant of SPFCM, also leverages progressive sampling. MSERFCM, a variant of rseFCM, gains a speedup from improved initialization. A general, novel stopping criterion for accelerated clustering is introduced. The new algorithms are compared to FCM and four accelerated variants of FCM. GOFCM's speedup was 4-47 times that of FCM and faster than SPFCM on each of the six datasets used in experiments. For five of the datasets, partitions were within 1% of those of FCM. MSERFCM's speedup was 5-26 times that of FCM and produced partitions within 3% of those of FCM on all datasets. A unique dataset, consisting of plankton images, exposed the strengths and weaknesses of many of the algorithms tested. It is shown that the new stopping criterion is effective in speeding up algorithms such as SPFCM and the final partitions are very close to those of FCM. PMID:26617455
Cross-Clustering: A Partial Clustering Algorithm with Automatic Estimation of the Number of Clusters
Tellaroli, Paola; Bazzi, Marco; Donato, Michele; Brazzale, Alessandra R.; Drăghici, Sorin
2016-01-01
Four of the most common limitations of the many available clustering methods are: i) the lack of a proper strategy to deal with outliers; ii) the need for a good a priori estimate of the number of clusters to obtain reasonable results; iii) the lack of a method able to detect when partitioning of a specific data set is not appropriate; and iv) the dependence of the result on the initialization. Here we propose Cross-clustering (CC), a partial clustering algorithm that overcomes these four limitations by combining the principles of two well established hierarchical clustering algorithms: Ward’s minimum variance and Complete-linkage. We validated CC by comparing it with a number of existing clustering methods, including Ward’s and Complete-linkage. We show on both simulated and real datasets, that CC performs better than the other methods in terms of: the identification of the correct number of clusters, the identification of outliers, and the determination of real cluster memberships. We used CC to cluster samples in order to identify disease subtypes, and on gene profiles, in order to determine groups of genes with the same behavior. Results obtained on a non-biological dataset show that the method is general enough to be successfully used in such diverse applications. The algorithm has been implemented in the statistical language R and is freely available from the CRAN contributed packages repository. PMID:27015427
Improved Ant Colony Clustering Algorithm and Its Performance Study.
Gao, Wei
2016-01-01
Clustering analysis is used in many disciplines and applications; it is an important tool that descriptively identifies homogeneous groups of objects based on attribute values. The ant colony clustering algorithm is a swarm-intelligent method used for clustering problems that is inspired by the behavior of ant colonies that cluster their corpses and sort their larvae. A new abstraction ant colony clustering algorithm using a data combination mechanism is proposed to improve the computational efficiency and accuracy of the ant colony clustering algorithm. The abstraction ant colony clustering algorithm is used to cluster benchmark problems, and its performance is compared with the ant colony clustering algorithm and other methods used in existing literature. Based on similar computational difficulties and complexities, the results show that the abstraction ant colony clustering algorithm produces results that are not only more accurate but also more efficiently determined than the ant colony clustering algorithm and the other methods. Thus, the abstraction ant colony clustering algorithm can be used for efficient multivariate data clustering. PMID:26839533
Parallelization of Edge Detection Algorithm using MPI on Beowulf Cluster
NASA Astrophysics Data System (ADS)
Haron, Nazleeni; Amir, Ruzaini; Aziz, Izzatdin A.; Jung, Low Tan; Shukri, Siti Rohkmah
In this paper, we present the design of a parallel Sobel edge detection algorithm using Foster's methodology. The parallel algorithm is implemented using the MPI message passing library and a master/slave scheme. Every processor performs the same sequential algorithm but on a different part of the image. Experimental results obtained on a Beowulf cluster are presented to demonstrate the performance of the parallel algorithm.
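The data decomposition described above (same sequential Sobel kernel, different part of the image per processor) can be illustrated without MPI. The sketch below is an assumption-laden stand-in: a plain loop plays the role of the worker processes, each strip carries one extra "halo" row per side so that results at strip boundaries match the sequential result, and the function names are hypothetical. In an actual MPI program the strips would be scattered by the master and the results gathered back.

```python
def sobel(image):
    """Sobel gradient magnitude (|gx| + |gy|); border pixels are left at 0."""
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            gx = (image[r - 1][c + 1] + 2 * image[r][c + 1] + image[r + 1][c + 1]
                  - image[r - 1][c - 1] - 2 * image[r][c - 1] - image[r + 1][c - 1])
            gy = (image[r + 1][c - 1] + 2 * image[r + 1][c] + image[r + 1][c + 1]
                  - image[r - 1][c - 1] - 2 * image[r - 1][c] - image[r - 1][c + 1])
            out[r][c] = abs(gx) + abs(gy)
    return out


def sobel_by_strips(image, workers):
    """Split rows into `workers` strips, run sobel() on each, stitch the results."""
    h = len(image)
    base, extra = divmod(h, workers)
    result, start = [], 0
    for rank in range(workers):  # stands in for the worker processes
        rows = base + (1 if rank < extra else 0)
        end = start + rows
        top = max(start - 1, 0)      # halo row above, if any
        bottom = min(end + 1, h)     # halo row below, if any
        edges = sobel(image[top:bottom])
        # Keep only the rows this worker owns; halo rows are discarded.
        result.extend(edges[start - top:start - top + rows])
        start = end
    return result


# Hypothetical 6x5 image with a vertical step edge.
step = [[0, 0, 10, 10, 10] for _ in range(6)]
edges = sobel_by_strips(step, workers=3)
```

The halo rows are the only data that neighbouring strips share, which is why this decomposition needs little communication.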
A hybrid monkey search algorithm for clustering analysis.
Chen, Xin; Zhou, Yongquan; Luo, Qifang
2014-01-01
Clustering is a popular data analysis and data mining technique. The k-means clustering algorithm is one of the most commonly used methods. However, it depends heavily on the initial solution and easily falls into a local optimum. In view of these disadvantages of the k-means method, this paper proposes a hybrid monkey algorithm based on the search operator of the artificial bee colony algorithm for clustering analysis. Experiments on synthetic and real-life datasets show that the algorithm performs better than the basic monkey algorithm for clustering analysis.
NASA Astrophysics Data System (ADS)
Dekkers, M. J.; Heslop, D.; Herrero-Bervera, E.; Acton, G.; Krasa, D.
2014-12-01
Ocean Drilling Program (ODP)/Integrated ODP (IODP) Hole 1256D (6°44.1' N, 91°56.1' W) on the Cocos Plate occurs in 15.2 Ma oceanic crust generated by superfast seafloor spreading. Presently, it is the only drill hole that has sampled all three oceanic crust layers in a tectonically undisturbed setting. Here we interpret down-hole trends in several rock-magnetic parameters with fuzzy c-means cluster analysis, a multivariate statistical technique. The parameters include the magnetization ratio, the coercivity ratio, the coercive force, the low-field susceptibility, and the Curie temperature. By their combined, multivariate, analysis the effects of magmatic and hydrothermal processes can be evaluated. The optimal number of clusters - a key point in the analysis because there is no a priori information on this - was determined through a combination of approaches: by calculation of several cluster validity indices, by testing for coherent cluster distributions on non-linear-map plots, and importantly by testing for stability of the cluster solution from all possible starting points. Here, we consider a solution robust if the cluster allocation is independent of the starting configuration. The five-cluster solution appeared to be robust. Three clusters are distinguished in the extrusive segment of the Hole that express increasing hydrothermal alteration of the lavas. The sheeted dike and gabbro portions are characterized by two clusters, both with higher coercivities than in lava samples. Extensive alteration, however, can obliterate magnetic property differences between lavas, dikes, and gabbros. The imprint of thermochemical alteration on the iron-titanium oxides is only partially related to the porosity of the rocks. All clusters display rock magnetic characteristics in line with a stable NRM. This implies that the entire sampled sequence of ocean crust can contribute to marine magnetic anomalies. Determination of the absolute paleointensity with thermal techniques is
A novel clustering algorithm inspired by membrane computing.
Peng, Hong; Luo, Xiaohui; Gao, Zhisheng; Wang, Jun; Pei, Zheng
2015-01-01
P systems are a class of distributed parallel computing models. This paper presents a novel clustering algorithm, called the membrane clustering algorithm, which is inspired by the mechanism of a tissue-like P system with a loop structure of cells. The objects of the cells express the candidate centers of clusters and are evolved by the evolution rules. Based on the loop membrane structure, the communication rules realize a local neighborhood topology, which helps the coevolution of the objects and improves the diversity of objects in the system. The tissue-like P system can effectively search for the optimal partitioning with the help of its parallel computing advantage. The proposed clustering algorithm is evaluated on four artificial data sets and six real-life data sets. Experimental results show that the proposed clustering algorithm is superior to or competitive with the k-means algorithm and several evolutionary clustering algorithms recently reported in the literature.
A Novel Clustering Algorithm Inspired by Membrane Computing
Peng, Hong; Luo, Xiaohui; Gao, Zhisheng; Wang, Jun; Pei, Zheng
2015-01-01
P systems are a class of distributed parallel computing models. This paper presents a novel clustering algorithm, called the membrane clustering algorithm, which is inspired by the mechanism of a tissue-like P system with a loop structure of cells. The objects of the cells express the candidate centers of clusters and are evolved by the evolution rules. Based on the loop membrane structure, the communication rules realize a local neighborhood topology, which helps the coevolution of the objects and improves the diversity of objects in the system. The tissue-like P system can effectively search for the optimal partitioning with the help of its parallel computing advantage. The proposed clustering algorithm is evaluated on four artificial data sets and six real-life data sets. Experimental results show that the proposed clustering algorithm is superior to or competitive with the k-means algorithm and several evolutionary clustering algorithms recently reported in the literature. PMID:25874264
Color sorting algorithm based on K-means clustering algorithm
NASA Astrophysics Data System (ADS)
Zhang, BaoFeng; Huang, Qian
2009-11-01
In raisin production, a variety of color impurities occur and need to be removed effectively. A new, efficient raisin color-sorting algorithm is presented here. First, threshold-based image processing was applied for image pre-processing, and the gray-scale distribution characteristic of the raisin image was found. In order to obtain the chromatic aberration image and reduce disturbance, we performed frame subtraction, in which the background image data were subtracted from the target image data. Second, a Haar wavelet filter was used to obtain a smoothed image of the raisins. Features based on color, mildew, spots and other external characteristics were then computed so that they fully reflect the quality differences between raisins of different types. After this processing, the images were analyzed by the K-means clustering method, which achieves adaptive extraction of the statistical features; according to these features, the image data were divided into different categories, so that the categories of abnormal colors became distinct. Using this algorithm, raisins of abnormal colors and ones with mottles were eliminated. The sorting rate was up to 98.6%, and the ratio of normal raisins to sorted grains was less than one eighth.
Multi-Parent Clustering Algorithms from Stochastic Grammar Data Models
NASA Technical Reports Server (NTRS)
Mjolsness, Eric; Castano, Rebecca; Gray, Alexander
1999-01-01
We introduce a statistical data model and an associated optimization-based clustering algorithm which allows data vectors to belong to zero, one or several "parent" clusters. For each data vector the algorithm makes a discrete decision among these alternatives. Thus, a recursive version of this algorithm would place data clusters in a Directed Acyclic Graph rather than a tree. We test the algorithm with synthetic data generated according to the statistical data model. We also illustrate the algorithm using real data from large-scale gene expression assays.
A Flocking Based algorithm for Document Clustering Analysis
Cui, Xiaohui; Gao, Jinzhu; Potok, Thomas E
2006-01-01
Social animals or insects in nature often exhibit a form of emergent collective behavior known as flocking. In this paper, we present a novel Flocking based approach for document clustering analysis. Our Flocking clustering algorithm uses stochastic and heuristic principles discovered from observing bird flocks or fish schools. Unlike other partition clustering algorithms such as K-means, the Flocking based algorithm does not require initial partitional seeds. The algorithm generates a clustering of a given set of data through the embedding of the high-dimensional data items on a two-dimensional grid for easy clustering result retrieval and visualization. Inspired by the self-organized behavior of bird flocks, we represent each document object with a flock boid. The simple local rules followed by each flock boid result in the entire document flock generating complex global behaviors, which eventually result in a clustering of the documents. We evaluate the efficiency of our algorithm with both a synthetic dataset and a real document collection that includes 100 news articles collected from the Internet. Our results show that the Flocking clustering algorithm achieves better performance compared to the K-means and the Ant clustering algorithm for real document clustering.
A systematic comparison of genome-scale clustering algorithms
2012-01-01
Background: A wealth of clustering algorithms has been applied to gene co-expression experiments. These algorithms cover a broad range of approaches, from conventional techniques such as k-means and hierarchical clustering, to graphical approaches such as k-clique communities, weighted gene co-expression networks (WGCNA) and paraclique. Comparison of these methods to evaluate their relative effectiveness provides guidance to algorithm selection, development and implementation. Most prior work on comparative clustering evaluation has focused on parametric methods. Graph theoretical methods are recent additions to the tool set for the global analysis and decomposition of microarray co-expression matrices that have not generally been included in earlier methodological comparisons. In the present study, a variety of parametric and graph theoretical clustering algorithms are compared using well-characterized transcriptomic data at a genome scale from Saccharomyces cerevisiae. Methods: For each clustering method under study, a variety of parameters were tested. Jaccard similarity was used to measure each cluster's agreement with every GO and KEGG annotation set, and the highest Jaccard score was assigned to the cluster. Clusters were grouped into small, medium, and large bins, and the Jaccard scores of the top five scoring clusters in each bin were averaged and reported as the best average top 5 (BAT5) score for the particular method. Results: Clusters produced by each method were evaluated based upon the positive match to known pathways. This produces a readily interpretable ranking of the relative effectiveness of clustering on the genes. Methods were also tested to determine whether they were able to identify clusters consistent with those identified by other clustering methods. Conclusions: Validation of clusters against known gene classifications demonstrates that for this data, graph-based techniques outperform conventional clustering approaches, suggesting that further
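The scoring described in the Methods can be sketched as follows: each cluster receives the highest Jaccard similarity over all annotation sets, and the top five cluster scores are averaged. This is a hedged illustration only. The gene and pathway names are hypothetical, the helper names are not from the study, and the binning into small, medium and large clusters is omitted because the abstract does not give the size thresholds.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two collections of gene IDs."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_annotation_score(cluster, annotation_sets):
    """Return (annotation_name, score) for the best-matching annotation set."""
    return max(
        ((name, jaccard(cluster, genes)) for name, genes in annotation_sets.items()),
        key=lambda item: item[1],
    )


def average_top(scores, n=5):
    """Average of the n highest scores (fewer if fewer are available)."""
    top = sorted(scores, reverse=True)[:n]
    return sum(top) / len(top) if top else 0.0


# Hypothetical cluster and annotation sets.
annotations = {"pathA": {"g1", "g2", "g3", "g4"}, "pathB": {"g3", "g9"}}
name, score = best_annotation_score({"g1", "g2", "g3"}, annotations)
```

Applying `average_top` separately to the scores within each size bin would give one averaged score per bin, in the spirit of the BAT5 measure.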
The Enhanced Hoshen-Kopelman Algorithm for Cluster Analysis
NASA Astrophysics Data System (ADS)
Hoshen, Joseph
1997-08-01
In 1976 Hoshen and Kopelman (J. Hoshen and R. Kopelman, Phys. Rev. B, 14, 3438 (1976)) introduced a breakthrough algorithm, known today as the Hoshen-Kopelman algorithm, for cluster analysis. This algorithm revolutionized Monte Carlo cluster calculations in percolation theory as it enables analysis of very large lattices containing 10^11 or more sites. Initially the HK algorithm's primary use was in the domain of pure and basic sciences. Later it began finding applications in diverse fields of technology and applied sciences. Examples of such applications are two- and three-dimensional image analysis, composite material modeling, polymers, remote sensing, brain modeling and food processing. While the original HK algorithm provides cluster size data for only one class of sites, the Enhanced HK (EHK) algorithm, presented in this paper, enables calculations of cluster spatial moments -- characteristics of cluster shapes -- for multiple classes of sites. These enhancements preserve the time and space complexities of the original HK algorithm, such that very large lattices can still be analyzed in a single pass through the lattice for cluster sizes, classes and shapes.
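The core idea of the original HK algorithm, labeling clusters during a raster scan while merging labels with a union-find structure, can be sketched for a 2-D square lattice as below. This is a simplified illustration and not the EHK algorithm of the paper: it handles a single class of sites, uses a second pass to resolve labels and count sizes, and computes no spatial moments. The function name and sample lattice are hypothetical.

```python
def hoshen_kopelman(grid):
    """Label 4-connected clusters of occupied (truthy) sites.

    Returns (labels, sizes): labels is a lattice of root labels (0 = empty)
    and sizes maps each root label to its cluster size.
    """
    parent = [0]  # parent[label]; index 0 is reserved for empty sites

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    rows, cols = len(grid), len(grid[0])
    labels = [[0] * cols for _ in range(rows)]
    # Raster scan: only the upper and left neighbours have been visited.
    for r in range(rows):
        for c in range(cols):
            if not grid[r][c]:
                continue
            up = labels[r - 1][c] if r > 0 else 0
            left = labels[r][c - 1] if c > 0 else 0
            if not up and not left:
                parent.append(len(parent))  # start a new cluster
                labels[r][c] = len(parent) - 1
            elif up and left:
                root_up, root_left = find(up), find(left)
                root = min(root_up, root_left)
                parent[max(root_up, root_left)] = root  # merge the clusters
                labels[r][c] = root
            else:
                labels[r][c] = find(up or left)
    # Resolution pass: replace provisional labels by roots and count sizes.
    sizes = {}
    for r in range(rows):
        for c in range(cols):
            if labels[r][c]:
                root = find(labels[r][c])
                labels[r][c] = root
                sizes[root] = sizes.get(root, 0) + 1
    return labels, sizes


# Hypothetical 4x4 occupancy lattice.
lattice = [[1, 0, 1, 1],
           [1, 0, 0, 1],
           [0, 0, 1, 1],
           [1, 0, 0, 0]]
labels, sizes = hoshen_kopelman(lattice)
```

The scan needs only the current and previous rows of labels, which is what makes very large lattices tractable; this sketch keeps the full label lattice for clarity.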
Efficient Cluster Algorithm for Spin Glasses in Any Space Dimension
NASA Astrophysics Data System (ADS)
Zhu, Zheng; Ochoa, Andrew J.; Katzgraber, Helmut G.
2015-08-01
Spin systems with frustration and disorder are notoriously difficult to study, both analytically and numerically. While the simulation of ferromagnetic statistical mechanical models benefits greatly from cluster algorithms, these accelerated dynamics methods remain elusive for generic spin-glass-like systems. Here, we present a cluster algorithm for Ising spin glasses that works in any space dimension and speeds up thermalization by at least one order of magnitude at temperatures where thermalization is typically difficult. Our isoenergetic cluster moves are based on the Houdayer cluster algorithm for two-dimensional spin glasses and lead to a speedup over conventional state-of-the-art methods that increases with the system size. We illustrate the benefits of the isoenergetic cluster moves in two and three space dimensions, as well as the nonplanar chimera topology found in the D-Wave Inc. quantum annealing machine.
A space-time cluster algorithm for stochastic processes.
Gulbahce, N.
2003-01-01
We introduce a space-time cluster algorithm that will generate histories of stochastic processes. Michael Zimmer introduced a space-time Monte Carlo (MC) algorithm for stochastic classical dynamics and applied it to simulate the Ising model with Glauber dynamics. Following his steps, we extended Brower and Tamayo's embedded φ^4 dynamics to space and time. We believe our algorithm can be applied to more general stochastic systems. Why space-time? To be able to study nonequilibrium systems, we need to know the probability of the 'history' of a nonequilibrium state. Histories are the entire space-time configurations. Cluster algorithms, first introduced by Swendsen and Wang (SW), are useful to overcome critical slowing down. Brower and Tamayo have mapped continuous field variables to Ising spins, and have grown and flipped SW clusters to gain speed. Our algorithm is an extended version of theirs to space and time.
A Fast Implementation of the ISODATA Clustering Algorithm
NASA Technical Reports Server (NTRS)
Memarsadeghi, Nargess; Mount, David M.; Netanyahu, Nathan S.; LeMoigne, Jacqueline
2005-01-01
Clustering is central to many image processing and remote sensing applications. ISODATA is one of the most popular and widely used clustering methods in geoscience applications, but it can run slowly, particularly with large data sets. We present a more efficient approach to ISODATA clustering, which achieves better running times by storing the points in a kd-tree and through a modification of the way in which the algorithm estimates the dispersion of each cluster. We also present an approximate version of the algorithm which allows the user to further improve the running time, at the expense of lower fidelity in computing the nearest cluster center to each point. We provide both theoretical and empirical justification that our modified approach produces clusterings that are very similar to those produced by the standard ISODATA approach. We also provide empirical studies on both synthetic data and remotely sensed Landsat and MODIS images that show that our approach has significantly lower running times.
Efficient Record Linkage Algorithms Using Complete Linkage Clustering
Mamun, Abdullah-Al; Aseltine, Robert; Rajasekaran, Sanguthevar
2016-01-01
Datasets from different agencies often contain records of the same individuals. Linking these datasets to identify all the records belonging to the same individuals is a crucial and challenging problem, especially given the large volumes of data. Many of the available algorithms for record linkage are prone to either time inefficiency or low accuracy in finding matches and non-matches among the records. In this paper we propose efficient as well as reliable sequential and parallel algorithms for the record linkage problem employing hierarchical clustering methods. We employ complete linkage hierarchical clustering algorithms to address this problem. In addition to hierarchical clustering, we also use two other techniques: elimination of duplicate records and blocking. Our algorithms use sorting as a sub-routine to identify identical copies of records. We have tested our algorithms on datasets with millions of synthetic records. Experimental results show that our algorithms achieve nearly 100% accuracy. Parallel implementations achieve almost linear speedups. Time complexities of these algorithms do not exceed those of previous best-known algorithms. Our proposed algorithms outperform previous best-known algorithms in terms of accuracy while consuming reasonable run times. PMID:27124604
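The three ingredients named above (sort-based elimination of identical records, blocking, and complete linkage clustering) can be combined in a small sketch. Everything specific here is an assumption made for illustration: records are plain strings, the distance is edit distance, blocks are formed by the first character, and the threshold is 1. The abstract does not state the distance, blocking key or threshold used by the authors.

```python
from itertools import groupby


def edit_distance(a, b):
    """Levenshtein distance via the standard row-by-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def complete_linkage(items, threshold):
    """Merge clusters while the largest pairwise distance stays <= threshold."""
    clusters = [[x] for x in items]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Complete linkage: distance between the farthest members.
                d = max(edit_distance(a, b)
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        if best[0] > threshold:
            break
        _, i, j = best
        clusters[i] += clusters.pop(j)
    return clusters


def link_records(records, threshold=1):
    """Deduplicate by sorting, block by first character, cluster each block."""
    unique = [key for key, _ in groupby(sorted(records))]
    linked = []
    for _, block in groupby(unique, key=lambda s: s[:1]):
        linked.extend(complete_linkage(list(block), threshold))
    return linked


# Hypothetical records.
groups = link_records(["smith john", "smith jon", "smith john", "smyth john",
                       "taylor ann", "tailor ann", "brown bob"])
```

Because complete linkage uses the farthest pair, "smyth john" is not merged with the cluster containing "smith jon" (their distance is 2), even though it is within distance 1 of "smith john".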
Clustering of Hadronic Showers with a Structural Algorithm
Charles, M.J. (SLAC)
2005-12-13
The internal structure of hadronic showers can be resolved in a high-granularity calorimeter. This structure is described in terms of simple components and an algorithm for reconstruction of hadronic clusters using these components is presented. Results from applying this algorithm to simulated hadronic Z-pole events in the SiD concept are discussed.
CCL: an algorithm for the efficient comparison of clusters
Hundt, R.; Schön, J. C.; Neelamraju, S.; Zagorac, J.; Jansen, M.
2013-01-01
The systematic comparison of the atomic structure of solids and clusters has become an important task in crystallography, chemistry, physics and materials science, in particular in the context of structure prediction and structure determination of nanomaterials. In this work, an efficient and robust algorithm for the comparison of cluster structures is presented, which is based on the mapping of the point patterns of the two clusters onto each other. This algorithm has been implemented as the module CCL in the structure visualization and analysis program KPLOT. PMID:23682193
A modified density-based clustering algorithm and its implementation
NASA Astrophysics Data System (ADS)
Ban, Zhihua; Liu, Jianguo; Yuan, Lulu; Yang, Hua
2015-12-01
This paper presents an improved density-based clustering algorithm built on the method of clustering by fast search and find of density peaks. A distance threshold is introduced to economize memory. In order to reduce the probability that two points share the same density value, similarity is used to define the proximity measure. We have tested the modified algorithm on a large data set, several small data sets and shape data sets. The results show that the proposed algorithm obtains acceptable results and can be applied more widely.
Measuring Constraint-Set Utility for Partitional Clustering Algorithms
NASA Technical Reports Server (NTRS)
Davidson, Ian; Wagstaff, Kiri L.; Basu, Sugato
2006-01-01
Clustering with constraints is an active area of machine learning and data mining research. Previous empirical work has convincingly shown that adding constraints to clustering improves the performance of a variety of algorithms. However, in most of these experiments, results are averaged over different randomly chosen constraint sets from a given set of labels, thereby masking interesting properties of individual sets. We demonstrate that constraint sets vary significantly in how useful they are for constrained clustering; some constraint sets can actually decrease algorithm performance. We create two quantitative measures, informativeness and coherence, that can be used to identify useful constraint sets. We show that these measures can also help explain differences in performance for four particular constrained clustering algorithms.
A Geometric Clustering Algorithm with Applications to Structural Data
Xu, Shutan; Zou, Shuxue
2015-01-01
An important feature of structural data, especially those from structural determination and protein-ligand docking programs, is that their distribution could be mostly uniform. Traditional clustering algorithms developed specifically for nonuniformly distributed data may not be adequate for their classification. Here we present a geometric partitional algorithm that could be applied to both uniformly and nonuniformly distributed data. The algorithm is a top-down approach that recursively selects the outliers as the seeds to form new clusters until all the structures within a cluster satisfy a classification criterion. The algorithm has been evaluated on a diverse set of real structural data and six sets of test data. The results show that it is superior to the previous algorithms for the clustering of structural data and is similar to or better than them for the classification of the test data. The algorithm should be especially useful for the identification of the best but minor clusters and for speeding up an iterative process widely used in NMR structure determination. PMID:25517067
Symmetric nonnegative matrix factorization: algorithms and applications to probabilistic clustering.
He, Zhaoshui; Xie, Shengli; Zdunek, Rafal; Zhou, Guoxu; Cichocki, Andrzej
2011-12-01
Nonnegative matrix factorization (NMF) is an unsupervised learning method useful in various applications including image processing and semantic analysis of documents. This paper focuses on symmetric NMF (SNMF), which is a special case of NMF decomposition. Three parallel multiplicative update algorithms that directly use level 3 basic linear algebra subprograms (BLAS) are developed for this problem. First, by minimizing the Euclidean distance, a multiplicative update algorithm is proposed, and its convergence under mild conditions is proved. Based on it, we further propose another two fast parallel methods: the α-SNMF and β-SNMF algorithms. All of them are easy to implement. These algorithms are applied to probabilistic clustering. We demonstrate their effectiveness for facial image clustering, document categorization, and pattern clustering in gene expression.
Sampling Within k-Means Algorithm to Cluster Large Datasets
Bejarano, Jeremy; Bose, Koushiki; Brannan, Tyler; Thomas, Anita; Adragni, Kofi; Neerchal, Nagaraj; Ostrouchov, George
2011-08-01
Due to current data collection technology, our ability to gather data has surpassed our ability to analyze it. In particular, k-means, one of the simplest and fastest clustering algorithms, is ill-equipped to handle extremely large datasets on even the most powerful machines. Our new algorithm uses a sample from a dataset to decrease runtime by reducing the amount of data analyzed. We perform a simulation study to compare our sampling-based k-means to the standard k-means algorithm by analyzing both the speed and accuracy of the two methods. Results show that our algorithm is significantly more efficient than the existing algorithm with comparable accuracy. Further work on this project might include a more comprehensive study both on more varied test datasets as well as on real weather datasets. This is especially important considering that this preliminary study was performed on rather tame datasets. Also, such a study should analyze the performance of the algorithm on varied values of k. Lastly, this paper showed that the algorithm was accurate for relatively low sample sizes. We would like to analyze this further to see how accurate the algorithm is for even lower sample sizes. We could find the lowest sample sizes, by manipulating width and confidence level, for which the algorithm would be acceptably accurate. In order for our algorithm to be a success, it needs to meet two benchmarks: match the accuracy of the standard k-means algorithm and significantly reduce runtime. Both goals are accomplished for all six datasets analyzed. However, on datasets of three and four dimensions, as the data becomes more difficult to cluster, both algorithms fail to obtain the correct classifications on some trials. Nevertheless, our algorithm consistently matches the performance of the standard algorithm while becoming remarkably more efficient with time. Therefore, we conclude that analysts can use our algorithm, expecting accurate results in considerably less time.
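The core idea, fitting the centres on a random sample and then assigning every point to its nearest centre, can be sketched as follows. This is a minimal sketch under stated assumptions rather than the authors' procedure: the sample size is passed in directly instead of being derived from a width and confidence level, the farthest-first initialisation is a choice made here to keep the example deterministic, and the data are synthetic.

```python
import math
import random

def nearest(p, centers):
    """Index of the centre closest to point p."""
    return min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))

def mean(group):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(xs) / len(group) for xs in zip(*group))

def farthest_first(points, k):
    """Deterministic initialisation (an assumption): spread the initial centres apart."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers

def kmeans(points, k, iters=100):
    """Plain Lloyd iterations: assign points, recompute means, stop when stable."""
    centers = farthest_first(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        updated = [mean(g) if g else c for g, c in zip(groups, centers)]
        if updated == centers:
            break
        centers = updated
    return centers

def sampled_kmeans(points, k, sample_size, seed=0):
    """Fit centres on a random sample only, then label the full dataset once."""
    sample = random.Random(seed).sample(points, sample_size)
    centers = kmeans(sample, k)
    labels = [nearest(p, centers) for p in points]
    return centers, labels

# Synthetic, well-separated data: two Gaussian blobs of 500 points each
gen = random.Random(1)
data = [(gen.gauss(0, 0.5), gen.gauss(0, 0.5)) for _ in range(500)]
data += [(gen.gauss(10, 0.5), gen.gauss(10, 0.5)) for _ in range(500)]
centers, labels = sampled_kmeans(data, k=2, sample_size=100)
```

Only the final labelling pass touches all 1000 points; the iterative part runs on the 100 sampled points, which is where the runtime saving would come from.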
CACONET: Ant Colony Optimization (ACO) Based Clustering Algorithm for VANET.
Aadil, Farhan; Bajwa, Khalid Bashir; Khan, Salabat; Chaudary, Nadeem Majeed; Akram, Adeel
2016-01-01
A vehicular ad hoc network (VANET) is a wirelessly connected network of vehicular nodes. A number of techniques, such as message ferrying, data aggregation, and vehicular node clustering aim to improve communication efficiency in VANETs. Cluster heads (CHs), selected in the process of clustering, manage inter-cluster and intra-cluster communication. The lifetime of clusters and number of CHs determines the efficiency of network. In this paper a Clustering algorithm based on Ant Colony Optimization (ACO) for VANETs (CACONET) is proposed. CACONET forms optimized clusters for robust communication. CACONET is compared empirically with state-of-the-art baseline techniques like Multi-Objective Particle Swarm Optimization (MOPSO) and Comprehensive Learning Particle Swarm Optimization (CLPSO). Experiments varying the grid size of the network, the transmission range of nodes, and number of nodes in the network were performed to evaluate the comparative effectiveness of these algorithms. For optimized clustering, the parameters considered are the transmission range, direction and speed of the nodes. The results indicate that CACONET significantly outperforms MOPSO and CLPSO. PMID:27149517
A survey of fuzzy clustering algorithms for pattern recognition. II.
Baraldi, A; Blonda, P
1999-01-01
For pt.I see ibid., p.775-85. In part I an equivalence between the concepts of fuzzy clustering and soft competitive learning in clustering algorithms is proposed on the basis of the existing literature. Moreover, a set of functional attributes is selected for use as dictionary entries in the comparison of clustering algorithms. In this paper, five clustering algorithms taken from the literature are reviewed, assessed and compared on the basis of the selected properties of interest. These clustering models are (1) self-organizing map (SOM); (2) fuzzy learning vector quantization (FLVQ); (3) fuzzy adaptive resonance theory (fuzzy ART); (4) growing neural gas (GNG); (5) fully self-organizing simplified adaptive resonance theory (FOSART). Although our theoretical comparison is fairly simple, it yields observations that may appear paradoxical. First, only FLVQ, fuzzy ART, and FOSART exploit concepts derived from fuzzy set theory (e.g., relative and/or absolute fuzzy membership functions). Secondly, only SOM, FLVQ, GNG, and FOSART employ soft competitive learning mechanisms, which are affected by asymptotic misbehaviors in the case of FLVQ, i.e., only SOM, GNG, and FOSART are considered effective fuzzy clustering algorithms. PMID:18252358
A Novel Complex Networks Clustering Algorithm Based on the Core Influence of Nodes
Dai, Bin; Xie, Zhongyu
2014-01-01
In complex networks, cluster structure, identified by the heterogeneity of nodes, has become a common and important topological property. Network clustering methods are thus significant for the study of complex networks. Currently, many typical clustering algorithms have weaknesses such as inaccuracy and slow convergence. In this paper, we propose a clustering algorithm that calculates the core influence of nodes. The clustering process is a simulation of the process of cluster formation in sociology. The algorithm detects the nodes with core influence through their betweenness centrality, and builds the cluster's core structure by discriminant functions. Next, the algorithm obtains the final cluster structure by clustering the rest of the nodes in the network with an optimization method. Experiments on different datasets show that the clustering accuracy of this algorithm is superior to that of the classical clustering algorithm (the Fast-Newman algorithm). It also clusters faster and helps reveal the real cluster structure of complex networks precisely. PMID:24741359
Functional clustering algorithm for the analysis of dynamic network data
NASA Astrophysics Data System (ADS)
Feldt, S.; Waddell, J.; Hetrick, V. L.; Berke, J. D.; Żochowski, M.
2009-05-01
We formulate a technique for the detection of functional clusters in discrete event data. The advantage of this algorithm is that no prior knowledge of the number of functional groups is needed, as our procedure progressively combines data traces and derives the optimal clustering cutoff in a simple and intuitive manner through the use of surrogate data sets. In order to demonstrate the power of this algorithm to detect changes in network dynamics and connectivity, we apply it to both simulated neural spike train data and real neural data obtained from the mouse hippocampus during exploration and slow-wave sleep. Using the simulated data, we show that our algorithm performs better than existing methods. In the experimental data, we observe state-dependent clustering patterns consistent with known neurophysiological processes involved in memory consolidation.
Analysis of Network Clustering Algorithms and Cluster Quality Metrics at Scale
Kobourov, Stephen; Gallant, Mike; Börner, Katy
2016-01-01
Overview: Notions of community quality underlie the clustering of networks. While studies surrounding network clustering are increasingly common, a precise understanding of the relationship between different cluster quality metrics is unknown. In this paper, we examine the relationship between stand-alone cluster quality metrics and information recovery metrics through a rigorous analysis of four widely-used network clustering algorithms: Louvain, Infomap, label propagation, and smart local moving. We consider the stand-alone quality metrics of modularity, conductance, and coverage, and we consider the information recovery metrics of adjusted Rand score, normalized mutual information, and a variant of normalized mutual information used in previous work. Our study includes both synthetic graphs and empirical data sets of sizes varying from 1,000 to 1,000,000 nodes. Cluster Quality Metrics: We find significant differences among the results of the different cluster quality metrics. For example, clustering algorithms can return a value of 0.4 out of 1 on modularity but score 0 out of 1 on information recovery. We find conductance, though imperfect, to be the stand-alone quality metric that best indicates performance on the information recovery metrics. Additionally, our study shows that the variant of normalized mutual information used in previous work cannot be assumed to differ only slightly from traditional normalized mutual information. Network Clustering Algorithms: Smart local moving is the overall best performing algorithm in our study, but discrepancies between cluster evaluation metrics prevent us from declaring it an absolutely superior algorithm. Interestingly, Louvain performed better than Infomap in nearly all the tests in our study, contradicting the results of previous work in which Infomap was superior to Louvain. We find that although label propagation performs poorly when clusters are less clearly defined, it scales efficiently and accurately to large
A new fuzzy C-means method for magnetic resonance image brain segmentation
NASA Astrophysics Data System (ADS)
Altameem, Torki; Zanaty, E. A.; Tolba, Amr
2015-10-01
In this paper, we introduce a new fuzzy c-means (FCM) method in order to improve the segmentation of magnetic resonance images (MRIs). The proposed method combines the FCM and possibilistic c-means (PCM) functions using a weighted Gaussian function. The weighted Gaussian function is given to indicate the spatial influence of the neighbouring pixels on the central pixel. The parameters of weighting coefficients are automatically determined in the implementation using the Gaussian function for every pixel in the image. The proposed method is realised by modifying the objective function of the PCM algorithm to produce memberships and possibilities simultaneously, along with the usual point prototypes or cluster centres for each cluster. The membership values can be interpreted as degrees of possibility of the points belonging to the classes, that is, the compatibilities of the points with the class prototypes to overcome the coincident clusters problem of PCM. The efficiency of the proposed algorithm is demonstrated by extensive segmentation experiments using MRIs and comparison with other state-of-the-art algorithms. In the proposed method, the effect of noise is controlled by incorporating the possibility (typicality) function in addition to the membership function. Consideration of these constraints can greatly control the noise in the image as shown in our experiments.
A Task-parallel Clustering Algorithm for Structured AMR
Gunney, B N; Wissink, A M
2004-11-02
A new parallel algorithm, based on the Berger-Rigoutsos algorithm for clustering grid points into logically rectangular regions, is presented. The clustering operation is frequently performed in the dynamic gridding steps of structured adaptive mesh refinement (SAMR) calculations. A previous study revealed that although the cost of clustering is generally insignificant for smaller problems run on relatively few processors, the algorithm scaled inefficiently in parallel and its cost grows with problem size. Hence, it can become significant for large scale problems run on very large parallel machines, such as the new BlueGene system (which has $O(10^4)$ processors). We propose a new task-parallel algorithm designed to reduce communication wait times. Performance was assessed using dynamic SAMR re-gridding operations on up to 16K processors of currently available computers at Lawrence Livermore National Laboratory. The new algorithm was shown to be up to an order of magnitude faster than the baseline algorithm and had better scaling trends.
Open cluster membership probability based on K-means clustering algorithm
NASA Astrophysics Data System (ADS)
El Aziz, Mohamed Abd; Selim, I. M.; Essam, A.
2016-08-01
In galaxy images, the relative coordinate positions of each star with respect to all the other stars are adapted. The membership of a star cluster is therefore determined by two basic criteria, one for geometric membership and the other for physical (photometric) membership. In this paper, we present a new method for the determination of open cluster membership based on the K-means clustering algorithm. This algorithm allows us to efficiently discriminate the cluster members from the field stars. To validate the method, we applied it to NGC 188 and NGC 2266 and obtained the member stars of these clusters. The color-magnitude diagram of the member stars is significantly clearer and shows a well-defined main sequence and a red giant branch in NGC 188, which allows us to better constrain the cluster members and estimate their physical parameters. The membership probabilities have been calculated and compared to those obtained by other methods. The results show that the K-means clustering algorithm can effectively select probable member stars in space without any assumption about the spatial distribution of stars in the cluster or field. Our results are in good agreement with results derived in previous works.
The C4 clustering algorithm: Clusters of galaxies in the Sloan Digital Sky Survey
Miller, Christopher J.; Nichol, Robert; Reichart, Dan; Wechsler, Risa H.; Evrard, August; Annis, James; McKay, Timothy; Bahcall, Neta; Bernardi, Mariangela; Boehringer, Hans; Connolly, Andrew; Goto, Tomo; Kniazev, Alexie; Lamb, Donald; Postman, Marc; Schneider, Donald; Sheth, Ravi; Voges, Wolfgang; /Cerro-Tololo InterAmerican Obs. /Portsmouth U., ICG /North Carolina U. /Chicago U., Astron. Astrophys. Ctr. /Chicago U., EFI /Michigan U. /Fermilab /Princeton U. Observ. /Garching, Max Planck Inst., MPE /Pittsburgh U. /Tokyo U., ICRR /Baltimore, Space Telescope Sci. /Penn State U. /Chicago U. /Stavropol, Astrophys. Observ. /Heidelberg, Max Planck Inst. Astron. /INI, SAO
2005-03-01
We present the "C4 Cluster Catalog", a new sample of 748 clusters of galaxies identified in the spectroscopic sample of the Second Data Release (DR2) of the Sloan Digital Sky Survey (SDSS). The C4 cluster-finding algorithm identifies clusters as overdensities in a seven-dimensional position and color space, thus minimizing projection effects that have plagued previous optical cluster selection. The present C4 catalog covers approximately 2600 square degrees of sky and ranges in redshift from $z = 0.02$ to $z = 0.17$. The mean cluster membership is 36 galaxies (with redshifts) brighter than $r = 17.7$, but the catalog includes a range of systems, from groups containing 10 members to massive clusters with over 200 cluster members with redshifts. The catalog provides a large number of measured cluster properties including sky location, mean redshift, galaxy membership, summed r-band optical luminosity ($L_r$), velocity dispersion, as well as quantitative measures of substructure and the surrounding large-scale environment. We use new, multi-color mock SDSS galaxy catalogs, empirically constructed from the $\Lambda$CDM Hubble Volume (HV) Sky Survey output, to investigate the sensitivity of the C4 catalog to the various algorithm parameters (detection threshold, choice of passbands and search aperture), as well as to quantify the purity and completeness of the C4 cluster catalog. These mock catalogs indicate that the C4 catalog is $\simeq 90\%$ complete and 95% pure above $M_{200} = 1 \times 10^{14}\,h^{-1} M_{\odot}$ and within $0.03 \le z \le 0.12$. Using the SDSS DR2 data, we show that the C4 algorithm finds 98% of X-ray identified clusters and 90% of Abell clusters within $0.03 \le z \le 0.12$. Using the mock galaxy catalogs and the full HV dark matter simulations, we show that the $L_r$ of a cluster is a more robust estimator of the halo mass ($M_{200}$) than the galaxy line-of-sight velocity dispersion or the richness of the cluster. However, if we
Adaptive clustering algorithm for community detection in complex networks.
Ye, Zhenqing; Hu, Songnian; Yu, Jun
2008-10-01
Community structure is common in various real-world networks; methods or algorithms for detecting such communities in complex networks have attracted great attention in recent years. We introduced a different adaptive clustering algorithm capable of extracting modules from complex networks with considerable accuracy and robustness. In this approach, each node in a network acts as an autonomous agent demonstrating flocking behavior where vertices always travel toward their preferable neighboring groups. An optimal modular structure can emerge from a collection of these active nodes during a self-organization process where vertices constantly regroup. In addition, we show that our algorithm appears advantageous over other competing methods (e.g., the Newman-fast algorithm) through intensive evaluation. The applications in three real-world networks demonstrate the superiority of our algorithm to find communities that are parallel with the appropriate organization in reality. PMID:18999501
Coupled cluster algorithms for networks of shared memory parallel processors
NASA Astrophysics Data System (ADS)
Bentz, Jonathan L.; Olson, Ryan M.; Gordon, Mark S.; Schmidt, Michael W.; Kendall, Ricky A.
2007-05-01
As the popularity of using SMP systems as the building blocks for high performance supercomputers increases, so too does the need for applications that can utilize the multiple levels of parallelism available in clusters of SMPs. This paper presents a dual-layer distributed algorithm, using both shared-memory and distributed-memory techniques to parallelize a very important algorithm (often called the "gold standard") used in computational chemistry, the single and double excitation coupled cluster method with perturbative triples, i.e. CCSD(T). The algorithm is presented within the framework of the GAMESS (General Atomic and Molecular Electronic Structure System) program suite [M.W. Schmidt, K.K. Baldridge, J.A. Boatz, S.T. Elbert, M.S. Gordon, J.J. Jensen, S. Koseki, N. Matsunaga, K.A. Nguyen, S. Su, T.L. Windus, M. Dupuis, J.A. Montgomery, General atomic and molecular electronic structure system, J. Comput. Chem. 14 (1993) 1347-1363] and the Distributed Data Interface (DDI) [M.W. Schmidt, G.D. Fletcher, B.M. Bode, M.S. Gordon, The distributed data interface in GAMESS, Comput. Phys. Comm. 128 (2000) 190]; however, the essential features of the algorithm (data distribution, load-balancing and communication overhead) can be applied to more general computational problems. Timing and performance data for our dual-level algorithm are presented on several large-scale clusters of SMPs.
NASA Technical Reports Server (NTRS)
Lennington, R. K.; Johnson, J. K.
1979-01-01
An efficient procedure which clusters data using a completely unsupervised clustering algorithm and then uses labeled pixels to label the resulting clusters or perform a stratified estimate using the clusters as strata is developed. Three clustering algorithms, CLASSY, AMOEBA, and ISOCLS, are compared for efficiency. Three stratified estimation schemes and three labeling schemes are also considered and compared.
Improved Gravitation Field Algorithm and Its Application in Hierarchical Clustering
Zheng, Ming; Sun, Ying; Liu, Gui-xia; Zhou, You; Zhou, Chun-guang
2012-01-01
Background: Gravitation field algorithm (GFA) is a new optimization algorithm which is based on an imitation of natural phenomena. GFA performs well both in searching for a global minimum and for multiple minima in computational biology. But GFA needs to be improved to increase its efficiency, and modified so that it can be applied to some discrete data problems in systems biology. Method: An improved GFA called IGFA is proposed in this paper. Two parts were improved in IGFA. The first is the rule of random division, which is a reasonable strategy and shortens the running time. The other is the rotation factor, which can improve the accuracy of IGFA. To apply IGFA to hierarchical clustering, the initial part and the movement operator were modified. Results: Two kinds of experiments were used to test IGFA, and IGFA was applied to hierarchical clustering. The global minimum experiment was run with IGFA, GFA, GA (genetic algorithm) and SA (simulated annealing). The multi-minima experiment was run with IGFA and GFA. The results of the two experiments were compared with each other and demonstrated the efficiency of IGFA. IGFA is better than GFA in both accuracy and running time. For hierarchical clustering, IGFA is used to optimize the smallest distance of gene pairs, and the results were compared with GA and SA, singular-linkage clustering, and UPGMA. The efficiency of IGFA is demonstrated. PMID:23173043
ABCluster: the artificial bee colony algorithm for cluster global optimization.
Zhang, Jun; Dolg, Michael
2015-10-01
Global optimization of cluster geometries is of fundamental importance in chemistry and an interesting problem in applied mathematics. In this work, we introduce a relatively new swarm intelligence algorithm, i.e. the artificial bee colony (ABC) algorithm proposed in 2005, to this field. It is inspired by the foraging behavior of a bee colony, and only three parameters are needed to control it. We applied it to several potential functions of quite different nature, i.e., the Coulomb-Born-Mayer, Lennard-Jones, Morse, Z and Gupta potentials. The benchmarks reveal that for long-ranged potentials the ABC algorithm is very efficient in locating the global minimum, while for short-ranged ones it is sometimes trapped into a local minimum funnel on a potential energy surface of large clusters. We have released an efficient, user-friendly, and free program "ABCluster" to realize the ABC algorithm. It is a black-box program for non-experts as well as experts and might become a useful tool for chemists to study clusters. PMID:26327507
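To make the optimization target concrete, the sketch below evaluates the standard Lennard-Jones pair potential, one of the potentials named in the abstract, summed over all atom pairs of a cluster. A global optimizer such as ABC would search over the coordinates for the lowest value of a function like this. The function name, the reduced units (epsilon = sigma = 1), and the test geometries are assumptions for illustration; this is not code from ABCluster.

```python
import math
from itertools import combinations

def lennard_jones_energy(coords, epsilon=1.0, sigma=1.0):
    """Total energy: sum over pairs of 4*eps*[(sigma/r)^12 - (sigma/r)^6]."""
    energy = 0.0
    for a, b in combinations(coords, 2):
        r = math.dist(a, b)
        sr6 = (sigma / r) ** 6
        energy += 4.0 * epsilon * (sr6 * sr6 - sr6)
    return energy

# The pair potential has its minimum (-epsilon) at r = 2^(1/6) * sigma
r_min = 2 ** (1 / 6)
dimer = [(0.0, 0.0, 0.0), (r_min, 0.0, 0.0)]
# Equilateral triangle with side r_min: three pairs, each at the pair minimum
trimer = dimer + [(r_min / 2, r_min * math.sqrt(3) / 2, 0.0)]

e_dimer = lennard_jones_energy(dimer)
e_trimer = lennard_jones_energy(trimer)
```

In reduced units the dimer energy is -1 and the equilateral trimer energy is -3, since every pair in those geometries sits at the pair minimum.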
Vetoed jet clustering: the mass-jump algorithm
NASA Astrophysics Data System (ADS)
Stoll, Martin
2015-04-01
A new class of jet clustering algorithms is introduced. A criterion inspired by successful mass-drop taggers is applied that prevents the recombination of two hard prongs if their combined jet mass is substantially larger than the masses of the separate prongs. This "mass jump" veto effectively results in jets with variable radii in dense environments. Differences to existing methods are investigated. It is shown for boosted top quarks that the new algorithm has beneficial properties which can lead to improved tagging purity.
Areibi, Shawki; Yang, Zhen
2004-01-01
Combining global and local search is a strategy used by many successful hybrid optimization approaches. Memetic Algorithms (MAs) are Evolutionary Algorithms (EAs) that apply some sort of local search to further improve the fitness of individuals in the population. Memetic Algorithms have been shown to be very effective in solving many hard combinatorial optimization problems. This paper provides a forum for identifying and exploring the key issues that affect the design and application of Memetic Algorithms. The approach combines a hierarchical design technique, Genetic Algorithms, constructive techniques and advanced local search to solve VLSI circuit layout in the form of circuit partitioning and placement. Results obtained indicate that Memetic Algorithms based on local search, clustering and good initial solutions improve solution quality on average by 35% for the VLSI circuit partitioning problem and 54% for the VLSI standard cell placement problem. PMID:15355604
Mapping cultivable land from satellite imagery with clustering algorithms
NASA Astrophysics Data System (ADS)
Arango, R. B.; Campos, A. M.; Combarro, E. F.; Canas, E. R.; Díaz, I.
2016-07-01
Open data satellite imagery provides valuable data for the planning and decision-making processes related with environmental domains. Specifically, agriculture uses remote sensing in a wide range of services, ranging from monitoring the health of the crops to forecasting the spread of crop diseases. In particular, this paper focuses on a methodology for the automatic delimitation of cultivable land by means of machine learning algorithms and satellite data. The method uses a partition clustering algorithm called Partitioning Around Medoids and considers the quality of the clusters obtained for each satellite band in order to evaluate which one better identifies cultivable land. The proposed method was tested with vineyards using as input the spectral and thermal bands of the Landsat 8 satellite. The experimental results show the great potential of this method for cultivable land monitoring from remote-sensed multispectral imagery.
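Partitioning Around Medoids, the clustering algorithm named in this abstract, represents each cluster by one of its own data points and improves the choice by swapping medoids with non-medoids whenever that lowers the total distance. The sketch below is a simplified swap-only variant under stated assumptions: the naive initialisation (the first k points) replaces PAM's greedy build phase, and the single-band pixel values are hypothetical.

```python
def total_cost(points, medoids, dist):
    """Sum of each point's distance to its closest medoid."""
    return sum(min(dist(p, points[m]) for m in medoids) for p in points)

def pam(points, k, dist):
    """Swap-based k-medoids: accept any medoid/non-medoid swap that lowers the cost."""
    medoids = list(range(k))  # naive initialisation (assumption, not PAM's build phase)
    best = total_cost(points, medoids, dist)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for cand in range(len(points)):
                if cand in medoids:
                    continue
                trial = medoids[:i] + [cand] + medoids[i + 1:]
                cost = total_cost(points, trial, dist)
                if cost < best:
                    medoids, best, improved = trial, cost, True
    labels = [min(range(k), key=lambda c: dist(p, points[medoids[c]])) for p in points]
    return medoids, labels

# Hypothetical pixel values from one band: three dark and three bright pixels
band = [10, 12, 11, 80, 82, 81]
medoids, labels = pam(band, k=2, dist=lambda a, b: abs(a - b))
medoid_values = sorted(band[m] for m in medoids)
```

Because medoids are actual observations and only pairwise distances are needed, the same routine works with any dissimilarity function passed as `dist`.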
Synchronous Firefly Algorithm for Cluster Head Selection in WSN
Baskaran, Madhusudhanan; Sadagopan, Chitra
2015-01-01
Wireless Sensor Network (WSN) consists of small low-cost, low-power multifunctional nodes interconnected to efficiently aggregate and transmit data to sink. Cluster-based approaches use some nodes as Cluster Heads (CHs) and organize WSNs efficiently for aggregation of data and energy saving. A CH conveys information gathered by cluster nodes and aggregates/compresses data before transmitting it to a sink. However, this additional responsibility of the node results in a higher energy drain leading to uneven network degradation. Low Energy Adaptive Clustering Hierarchy (LEACH) offsets this by probabilistically rotating cluster heads role among nodes with energy above a set threshold. CH selection in WSN is NP-Hard as optimal data aggregation with efficient energy savings cannot be solved in polynomial time. In this work, a modified firefly heuristic, synchronous firefly algorithm, is proposed to improve the network performance. Extensive simulation shows the proposed technique to perform well compared to LEACH and energy-efficient hierarchical clustering. Simulations show the effectiveness of the proposed method in decreasing the packet loss ratio by an average of 9.63% and improving the energy efficiency of the network when compared to LEACH and EEHC. PMID:26495431
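The probabilistic rotation that LEACH uses as the baseline here can be sketched with the election threshold as it is usually stated for LEACH: in round r, an eligible node becomes cluster head with probability T = p / (1 - p * (r mod 1/p)), where p is the desired fraction of cluster heads. This sketch illustrates that baseline rule only, not the synchronous firefly method proposed in the paper; the function names, node count, seed, and the omission of any energy check are assumptions.

```python
import random

def leach_threshold(p, round_no):
    """Election threshold T for nodes that have not yet served in the current period."""
    period = round(1 / p)
    return p / (1 - p * (round_no % period))

def elect_cluster_heads(node_ids, eligible, p, round_no, rng):
    """Each eligible node draws a uniform number and becomes head if it falls below T."""
    t = leach_threshold(p, round_no)
    return [n for n in node_ids if n in eligible and rng.random() < t]

# Simulate one full period of 1/p = 10 rounds for 20 hypothetical nodes
rng = random.Random(42)
nodes = list(range(20))
eligible = set(nodes)
history = []
for r in range(10):
    heads = elect_cluster_heads(nodes, eligible, 0.1, r, rng)
    eligible -= set(heads)  # a node serves at most once per period
    history.extend(heads)
```

The threshold rises towards 1 as the period progresses, so any node that has not yet served is elected by the last round; this is how the rule spreads the extra energy cost of being a cluster head across the network.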
ICANP2: Isoenergetic cluster algorithm for NP-complete Problems
NASA Astrophysics Data System (ADS)
Zhu, Zheng; Fang, Chao; Katzgraber, Helmut G.
NP-complete optimization problems with Boolean variables are of fundamental importance in computer science, mathematics and physics. Most notably, the minimization of general spin-glass-like Hamiltonians remains a difficult numerical task. There has been a great interest in designing efficient heuristics to solve these computationally difficult problems. Inspired by the rejection-free isoenergetic cluster algorithm developed for Ising spin glasses, we present a generalized cluster update that can be applied to different NP-complete optimization problems with Boolean variables. The cluster updates allow for a wide-spread sampling of phase space, thus speeding up optimization. By carefully tuning the pseudo-temperature (needed to randomize the configurations) of the problem, we show that the method can efficiently tackle problems on topologies with a large site-percolation threshold. We illustrate the ICANP2 heuristic on paradigmatic optimization problems, such as the satisfiability problem and the vertex cover problem.
A decentralized fuzzy C-means-based energy-efficient routing protocol for wireless sensor networks.
Alia, Osama Moh'd
2014-01-01
Energy conservation in wireless sensor networks (WSNs) is a vital consideration when designing wireless networking protocols. In this paper, we propose a Decentralized Fuzzy Clustering Protocol, named DCFP, which minimizes total network energy dissipation to promote maximum network lifetime. The process of constructing the infrastructure for a given WSN is performed only once at the beginning of the protocol at a base station, and the infrastructure remains unchanged throughout the network's lifetime. In this initial construction step, a fuzzy C-means algorithm is adopted to allocate sensor nodes into their most appropriate clusters. Subsequently, the protocol runs its rounds, where each round is divided into a CH-Election phase and a Data Transmission phase. In the CH-Election phase, the election of new cluster heads is done locally in each cluster, where a new multicriteria objective function is proposed to enhance the quality of elected cluster heads. In the Data Transmission phase, each sensor node senses and transmits data to its respective cluster head, and cluster heads in turn aggregate and send the sensed data to the base station. Simulation results demonstrate that the proposed protocol improves network lifetime, data delivery, and energy consumption compared to other well-known energy-efficient protocols. PMID:25162060
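The initial construction step relies on fuzzy C-means, which can be sketched with the standard membership and center updates. The node coordinates, the initial centers, and the fuzzifier m = 2 below are illustrative assumptions; the protocol's multicriteria cluster-head election is not shown because the abstract does not define it.

```python
import math


def memberships(points, centers, m=2.0):
    """Standard FCM membership: u_ij = 1 / sum_k (d_ij / d_kj)^(2/(m-1))."""
    rows = []
    for x in points:
        # A tiny floor avoids division by zero when a point sits on a center.
        d = [max(math.dist(x, c), 1e-12) for c in centers]
        rows.append([1.0 / sum((di / dk) ** (2.0 / (m - 1.0)) for dk in d)
                     for di in d])
    return rows


def fcm(points, centers, m=2.0, iters=50):
    """Alternate membership and center updates for a fixed number of steps."""
    dim = len(points[0])
    for _ in range(iters):
        u = memberships(points, centers, m)
        new_centers = []
        for i in range(len(centers)):
            w = [row[i] ** m for row in u]
            new_centers.append(tuple(
                sum(wj * x[a] for wj, x in zip(w, points)) / sum(w)
                for a in range(dim)))
        centers = new_centers
    return centers, memberships(points, centers, m)


# Hypothetical sensor-node coordinates forming two groups.
nodes = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centers, u = fcm(nodes, [(0.0, 0.0), (10.0, 10.0)])
```

Each row of `u` sums to one, so a node's "most appropriate cluster" can be read off as the index of its largest membership.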
NIC-based Reduction Algorithms for Large-scale Clusters
Petrini, F; Moody, A T; Fernandez, J; Frachtenberg, E; Panda, D K
2004-07-30
Efficient algorithms for reduction operations across a group of processes are crucial for good performance in many large-scale, parallel scientific applications. While previous algorithms limit processing to the host CPU, we utilize the programmable processors and local memory available on modern cluster network interface cards (NICs) to explore a new dimension in the design of reduction algorithms. In this paper, we present the benefits and challenges, design issues and solutions, analytical models, and experimental evaluations of a family of NIC-based reduction algorithms. Performance and scalability evaluations were conducted on the ASCI Linux Cluster (ALC), a 960-node, 1920-processor machine at Lawrence Livermore National Laboratory, which uses the Quadrics QsNet interconnect. We find NIC-based reductions on modern interconnects to be more efficient than host-based implementations in both scalability and consistency. In particular, at large scale (1812 processes), NIC-based reductions of small integer and floating-point arrays provided respective speedups of 121% and 39% over the host-based, production-level MPI implementation.
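For readers unfamiliar with reduction operations, the following single-process sketch shows the pairwise (binary-tree) combining pattern that many reduction algorithms follow. It is only a conceptual illustration: the function name is hypothetical, and it does not model the NIC-based designs, the Quadrics hardware, or MPI.

```python
from operator import add


def tree_reduce(values, op):
    """Combine values pairwise, level by level, and report how many
    combining steps (tree levels) were needed."""
    vals = list(values)
    steps = 0
    while len(vals) > 1:
        vals = [op(vals[i], vals[i + 1]) if i + 1 < len(vals) else vals[i]
                for i in range(0, len(vals), 2)]
        steps += 1
    return vals[0], steps


# Eight per-process contributions are reduced in three levels.
total, levels = tree_reduce(range(1, 9), add)
```

The number of levels grows logarithmically with the number of participants, which is why per-step latency, the quantity a NIC-side implementation targets, matters at large process counts.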
Robust growing neural gas algorithm with application in cluster analysis.
Qin, A K; Suganthan, P N
2004-01-01
We propose a novel robust clustering algorithm within the Growing Neural Gas (GNG) framework, called Robust Growing Neural Gas (RGNG) network. The Matlab codes are available from . By incorporating several robust strategies, such as outlier resistant scheme, adaptive modulation of learning rates and cluster repulsion method into the traditional GNG framework, the proposed RGNG network possesses better robustness properties. The RGNG is insensitive to initialization, input sequence ordering and the presence of outliers. Furthermore, the RGNG network can automatically determine the optimal number of clusters by seeking the extreme value of the Minimum Description Length (MDL) measure during network growing process. The resulting center positions of the optimal number of clusters represented by prototype vectors are close to the actual ones irrespective of the existence of outliers. Topology relationships among these prototypes can also be established. Experimental results have shown the superior performance of our proposed method over the original GNG incorporating MDL method, called GNG-M, in static data clustering tasks on both artificial and UCI data sets. PMID:15555857
Mammographic images segmentation based on chaotic map clustering algorithm
2014-01-01
Background: This work investigates the applicability of a novel clustering approach to the segmentation of mammographic digital images. The chaotic map clustering algorithm is used to group together similar subsets of image pixels, resulting in a medically meaningful partition of the mammogram. Methods: The image is divided into pixel subsets characterized by a set of conveniently chosen features, and each of the corresponding points in the feature space is associated with a map. A mutual coupling strength between the maps, depending on the distance between the associated feature space points, is subsequently introduced. The simulated evolution of the system of maps through chaotic dynamics leads to its natural partitioning, which corresponds to a particular segmentation scheme of the initial mammographic image. Results: The system provides a high recognition rate for small mass lesions (about 94% correctly segmented inside the breast) and reproduces the shape of regions with denser micro-calcifications in about 2/3 of the cases, while being less effective at identifying larger mass lesions. Conclusions: Due to the particularities of mammographic images, the chaotic map clustering algorithm should not be used as the sole method of segmentation. Rather, its joint use with other segmentation techniques could increase segmentation performance and provide extra information for subsequent analysis stages such as the classification of the segmented ROI. PMID:24666766
Sweeney, Timothy E; Chen, Albert C; Gevaert, Olivier
2015-11-19
In order to discover new subsets (clusters) of a data set, researchers often use algorithms that perform unsupervised clustering, namely, the algorithmic separation of a dataset into some number of distinct clusters. Deciding whether a particular separation (or number of clusters, K) is correct is a sort of 'dark art', with multiple techniques available for assessing the validity of unsupervised clustering algorithms. Here, we present a new technique for unsupervised clustering that uses multiple clustering algorithms, multiple validity metrics, and progressively bigger subsets of the data to produce an intuitive 3D map of cluster stability that can help determine the optimal number of clusters in a data set, a technique we call COmbined Mapping of Multiple clUsteriNg ALgorithms (COMMUNAL). COMMUNAL locally optimizes algorithms and validity measures for the data being used. We show its application to simulated data with a known K, and then apply this technique to several well-known cancer gene expression datasets, showing that COMMUNAL provides new insights into clustering behavior and stability in all tested cases. COMMUNAL is shown to be a useful tool for determining K in complex biological datasets, and is freely available as a package for R.
Dynamically Incremental K-means++ Clustering Algorithm Based on Fuzzy Rough Set Theory
NASA Astrophysics Data System (ADS)
Li, Wei; Wang, Rujing; Jia, Xiufang; Jiang, Qing
Because the classic K-means++ clustering algorithm handles only static data, a dynamically incremental K-means++ clustering algorithm (DK-Means++) based on fuzzy rough set theory is presented in this paper. First, in the DK-Means++ clustering algorithm, the similarity formula is improved with weights computed from the importance of the attributes, which are reduced on the basis of rough fuzzy set theory. Second, new data need only be matched against the granules already clustered by the K-means++ algorithm; only occasionally is new data clustered by the classic K-means++ algorithm over the global data. This avoids re-clustering all data each time the dynamic data set changes, so clustering efficiency is improved. Our experiments show that the DK-Means++ algorithm can objectively and efficiently deal with the clustering of dynamically incremental data.
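The classic K-means++ step that DK-Means++ builds on is its seeding rule: each new center is drawn with probability proportional to the squared distance to the nearest center chosen so far. The sketch below shows only that standard rule, with hypothetical names and toy data; the fuzzy-rough attribute weighting and the incremental matching of new data are not specified in the abstract and are not reproduced.

```python
import math
import random


def kmeanspp_seeds(points, k, rng):
    """Standard K-means++ seeding with D^2 weighting."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Squared distance from each point to its nearest chosen center;
        # already chosen points get weight 0 and cannot be drawn again.
        d2 = [min(math.dist(p, c) ** 2 for c in centers) for p in points]
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers


pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
seeds = kmeanspp_seeds(pts, 3, random.Random(1))
```

Seeding this way tends to spread the initial centers across the data, which is the property an incremental variant would want to preserve when only a few new points arrive.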
Gravitation field algorithm and its application in gene cluster
2010-01-01
Background: Searching for optima is one of the most challenging tasks in clustering genes from available experimental data or given functions. SA, GA, PSO and other similar efficient global optimization methods are used by biotechnologists. All these algorithms are based on the imitation of natural phenomena. Results: This paper proposes a novel search optimization algorithm called the Gravitation Field Algorithm (GFA), which is derived from the Solar Nebular Disk Model (SNDM) of planetary formation, a well-known theory in astronomy. GFA simulates the gravitation field and outperforms GA and SA on some multimodal function optimization problems. GFA can also be applied to unimodal functions. GFA clusters the dataset from the Gene Expression Omnibus well. Conclusions: The mathematical proof demonstrates that GFA converges to the global optimum with probability 1 under three conditions for mass functions of one independent variable. In addition to these results, the fundamental optimization concept in this paper is used to analyze how SA and GA affect the global search and the inherent defects in SA and GA. Some results and source code (in Matlab) are publicly available at http://ccst.jlu.edu.cn/CSBG/GFA. PMID:20854683
Novel similarity-based clustering algorithm for grouping broadcast news
NASA Astrophysics Data System (ADS)
Ibrahimov, Oktay V.; Sethi, Ishwar K.; Dimitrova, Nevenka
2002-03-01
The goal of the current paper is to introduce a novel clustering algorithm designed for grouping transcribed textual documents obtained from audio and video segments. Since audio transcripts are normally highly erroneous documents, one of the major challenges at the text processing stage is to reduce the negative impact of errors introduced at the speech recognition stage. Other difficulties come from the nature of conversational speech. In the paper we describe the main difficulties of spoken documents and suggest an approach for restricting their negative effects. We also present a clustering algorithm that groups transcripts on the basis of the informative closeness of documents. To carry out such partitioning we give an intuitive definition of the informative field of a transcript and use it in our algorithm. To assess the informative closeness of the transcripts, we apply a Chi-square similarity measure, which is also described in the paper. Our experiments with the Chi-square similarity measure showed its robustness and high efficacy. In particular, the performance analysis carried out for Chi-square and three other similarity measures (Cosine, Dice, and Jaccard) showed that Chi-square is more robust to the specific features of spoken documents.
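The three baseline measures named in the comparison have standard set-based forms, sketched below for documents represented as sets of terms. The set representation and the toy transcripts are assumptions for illustration; the paper's Chi-square measure is not defined in the abstract and is therefore not implemented here.

```python
import math


def jaccard(a, b):
    """|A ∩ B| / |A ∪ B| for term sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def dice(a, b):
    """2 |A ∩ B| / (|A| + |B|) for term sets."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def cosine(a, b):
    """|A ∩ B| / sqrt(|A| |B|), the cosine of binary term vectors."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


# Two hypothetical transcripts sharing two of their three terms.
t1 = {"storm", "coast", "evacuation"}
t2 = {"storm", "coast", "forecast"}
scores = (jaccard(t1, t2), dice(t1, t2), cosine(t1, t2))
```

All three depend only on term overlap, so recognition errors that replace a term reduce the score directly, which is the sensitivity the abstract's robustness comparison is concerned with.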
A new detection algorithm for microcalcification clusters in mammographic screening
NASA Astrophysics Data System (ADS)
Xie, Weiying; Ma, Yide; Li, Yunsong
2015-05-01
A novel approach for detecting microcalcification clusters is proposed. First, we briefly analyze mammographic images with microcalcification lesions to confirm that these lesions have much greater gray values than normal regions. After summarizing the specific features of microcalcification clusters in mammographic screening, we focus on the preprocessing step, which includes eliminating the background, enhancing the image, and eliminating the pectoral muscle. In detail, the Chan-Vese model is used to eliminate the background. Then, we combine a morphology method with an edge detection method. After the AND operation and a Sobel filter, we apply the Hough transform; the result is effective at eliminating the pectoral muscle, whose gray level is close to that of microcalcifications. Additionally, the enhancement step is achieved by morphology. We put effort into mammographic image preprocessing to achieve lower computational complexity. As is well known, robust mammogram analysis is difficult because of the low contrast between normal and lesion tissues and the considerable noise in such images. After this thorough preprocessing, a method based on blob detection is applied to microcalcification clusters according to their specific features. The proposed algorithm employs the Laplace operator to improve the Difference of Gaussians (DoG) function for low-contrast images. A preliminary evaluation of the proposed method is performed on a known public database, MIAS, rather than on synthetic images. The comparison experiments and Cohen's kappa coefficients demonstrate that the proposed approach can potentially obtain better microcalcification cluster detection results in terms of accuracy, sensitivity and specificity.
Classification of posture maintenance data with fuzzy clustering algorithms
NASA Technical Reports Server (NTRS)
Bezdek, James C.
1992-01-01
Sensory inputs from the visual, vestibular, and proprioceptive systems are integrated by the central nervous system to maintain postural equilibrium. Sustained exposure to microgravity causes neurosensory adaptation during spaceflight, which results in decreased postural stability until readaptation occurs upon return to the terrestrial environment. Data which simulate sensory inputs under various sensory organization test (SOT) conditions were collected in conjunction with Johnson Space Center postural control studies using a tilt-translation device (TTD). The University of West Florida applied the fuzzy c-means (FCM) clustering algorithms to this data with a view towards identifying various states and stages of subjects experiencing such changes. Feature analysis, time step analysis, pooling data, response of the subjects, and the algorithms used are discussed.
jClustering, an Open Framework for the Development of 4D Clustering Algorithms
Mateos-Pérez, José María; García-Villalba, Carmen; Pascau, Javier; Desco, Manuel; Vaquero, Juan J.
2013-01-01
We present jClustering, an open framework for the design of clustering algorithms in dynamic medical imaging. We developed this tool because of the difficulty involved in manually segmenting dynamic PET images and the lack of availability of source code for published segmentation algorithms. Providing an easily extensible open tool encourages publication of source code to facilitate the process of comparing algorithms and provide interested third parties with the opportunity to review code. The internal structure of the framework allows an external developer to implement new algorithms easily and quickly, focusing only on the particulars of the method being implemented and not on image data handling and preprocessing. This tool has been coded in Java and is presented as an ImageJ plugin in order to take advantage of all the functionalities offered by this imaging analysis platform. Both binary packages and source code have been published, the latter under a free software license (GNU General Public License) to allow modification if necessary. PMID:23990913
Inconsistent Denoising and Clustering Algorithms for Amplicon Sequence Data.
Koskinen, Kaisa; Auvinen, Petri; Björkroth, K Johanna; Hultman, Jenni
2015-08-01
Natural microbial communities have been studied for decades using the 16S rRNA gene as a marker. In recent years, the application of second-generation sequencing technologies has revolutionized our understanding of the structure and function of microbial communities in complex environments. Using these highly parallel techniques, a detailed description of community characteristics is constructed, and even the rare biosphere can be detected. The new approaches carry numerous advantages and lack many features that skewed the results of traditional techniques, but we still face serious bias and a lack of reliable comparability between results. Here, we contrasted publicly available amplicon sequence data analysis algorithms by using two different data sets, one with a defined clone-based structure and one from a well-studied food spoilage community. We aimed to assess which software and parameters produce results that best resemble the benchmark community, how large the differences between methods are, and whether these differences are statistically significant. The results suggest that commonly accepted denoising and clustering methods used in different combinations produce significantly different outcomes: the clustering method greatly affects the number of operational taxonomic units (OTUs), while the denoising algorithm has more influence on taxonomic affiliations. The magnitude of the OTU number difference was up to 40-fold, and the disparity between results seemed highly dependent on the community structure and diversity. Statistically significant differences in taxonomies between methods were seen even at phylum level. However, the application of an effective denoising method seemed to even out the differences produced by clustering. PMID:25525895
Dynamic Layered Dual-Cluster Heads Routing Algorithm Based on Krill Herd Optimization in UWSNs
Jiang, Peng; Feng, Yang; Wu, Feng; Yu, Shanen; Xu, Huan
2016-01-01
To address the limited energy of nodes in underwater wireless sensor networks (UWSNs) and the heavy load of cluster heads in clustering routing algorithms, this paper proposes a dynamic layered dual-cluster routing algorithm based on Krill Herd optimization in UWSNs. Cluster size is first decided by the distance between the cluster head nodes and the sink node, and a dynamic layered mechanism is established to avoid the repeated selection of the same cluster head nodes. The Krill Herd optimization algorithm selects the optimal and second optimal cluster heads, and its Lagrange model directs nodes to a high likelihood area. It ultimately realizes the functions of data collection and data transmission. The simulation results show that the proposed algorithm can effectively decrease cluster energy consumption, balance the network energy consumption, and prolong the network lifetime. PMID:27589744
NASA Astrophysics Data System (ADS)
Park, Sang Ha; Lee, Seokjin; Sung, Koeng-Mo
Non-negative matrix factorization (NMF) is widely used for monaural musical sound source separation because of its efficiency and good performance. However, an additional clustering process is required because the musical sound mixture is separated into more signals than the number of musical tracks during NMF separation. In the conventional method, manual clustering or training-based clustering is performed with an additional learning process. Recently, a clustering algorithm based on the mel-frequency cepstrum coefficient (MFCC) was proposed for unsupervised clustering. However, MFCC clustering supplies limited information for clustering. In this paper, we propose various timbre features for unsupervised clustering and a clustering algorithm with these features. Simulation experiments are carried out using various musical sound mixtures. The results indicate that the proposed method improves clustering performance, as compared to conventional MFCC-based clustering.
Contributions to "k"-Means Clustering and Regression via Classification Algorithms
ERIC Educational Resources Information Center
Salman, Raied
2012-01-01
The dissertation deals with clustering algorithms and transforming regression problems into classification problems. The main contributions of the dissertation are twofold; first, to improve (speed up) the clustering algorithms and second, to develop a strict learning environment for solving regression problems as classification tasks by using…
User-Based Document Clustering by Redescribing Subject Descriptions with a Genetic Algorithm.
ERIC Educational Resources Information Center
Gordon, Michael D.
1991-01-01
Discussion of clustering of documents and queries in information retrieval systems focuses on the use of a genetic algorithm to adapt subject descriptions so that documents become more effective in matching relevant queries. Various types of clustering are explained, and simulation experiments used to test the genetic algorithm are described. (27…
On the impact of dissimilarity measure in k-modes clustering algorithm.
Ng, Michael K; Li, Mark Junjie; Huang, Joshua Zhexue; He, Zengyou
2007-03-01
This correspondence describes extensions to the k-modes algorithm for clustering categorical data. By modifying a simple matching dissimilarity measure for categorical objects, a heuristic approach was developed in [4], [12] which allows the use of the k-modes paradigm to obtain a cluster with strong intrasimilarity and to efficiently cluster large categorical data sets. The main aim of this paper is to rigorously derive the updating formula of the k-modes clustering algorithm with the new dissimilarity measure and the convergence of the algorithm under the optimization framework. PMID:17224620
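For context, the baseline k-modes ingredients that the correspondence modifies can be sketched as follows: the simple matching dissimilarity, the per-attribute mode of a cluster, and nearest-mode assignment. The function names and the toy records are hypothetical, and the new dissimilarity measure and its derived updating formula are not given in the abstract, so only the unmodified baseline is shown.

```python
from collections import Counter


def simple_matching(x, y):
    """Number of attributes on which two categorical records differ."""
    return sum(a != b for a, b in zip(x, y))


def cluster_mode(records):
    """Per-attribute most frequent category, i.e. the cluster's mode."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*records))


def assign(records, modes):
    """Index of the nearest mode for each record."""
    return [min(range(len(modes)), key=lambda i: simple_matching(r, modes[i]))
            for r in records]


cluster = [("red", "small", "round"),
           ("red", "small", "square"),
           ("red", "large", "round")]
mode = cluster_mode(cluster)
labels = assign([("blue", "large", "round")],
                [mode, ("blue", "large", "square")])
```

A k-modes iteration alternates `assign` and `cluster_mode` in the same way k-means alternates assignment and mean updates, with modes replacing means for categorical data.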
Security clustering algorithm based on reputation in hierarchical peer-to-peer network
NASA Astrophysics Data System (ADS)
Chen, Mei; Luo, Xin; Wu, Guowen; Tan, Yang; Kita, Kenji
2013-03-01
To address the security problems of the hierarchical P2P network (HPN), the paper presents a security clustering algorithm based on reputation (CABR). In the algorithm, a reputation mechanism ensures the security of transactions, and clusters are used to manage the reputation mechanism. In order to improve security, reduce the network cost of reputation management, and enhance cluster stability, we select reputation, the historical average online time, and the network bandwidth as the basic factors of a node's comprehensive performance. Simulation results showed that the proposed algorithm improved security, reduced network overhead, and enhanced cluster stability.
NASA Technical Reports Server (NTRS)
Lambeck, P. F.; Rice, D. P.
1976-01-01
Signature extension is intended to increase the space-time range over which a set of training statistics can be used to classify data without significant loss of recognition accuracy. A first cluster matching algorithm MASC (Multiplicative and Additive Signature Correction) was developed at the Environmental Research Institute of Michigan to test the concept of using associations between training and recognition area cluster statistics to define an average signature transformation. A more recent signature extension module CROP-A (Cluster Regression Ordered on Principal Axis) has shown evidence of making significant associations between training and recognition area cluster statistics, with the clusters to be matched being selected automatically by the algorithm.
A fast readout algorithm for Cluster Counting/Timing drift chambers on a FPGA board
NASA Astrophysics Data System (ADS)
Cappelli, L.; Creti, P.; Grancagnolo, F.; Pepino, A.; Tassielli, G.
2013-08-01
A fast readout algorithm for Cluster Counting and Timing purposes has been implemented and tested on a Virtex 6 core FPGA board. The algorithm analyses and stores data coming from a Helium based drift tube instrumented by 1 GSPS fADC and represents the outcome of balancing between cluster identification efficiency and high speed performance. The algorithm can be implemented in electronics boards serving multiple fADC channels as an online preprocessing stage for drift chamber signals.
Parallelization of the Wolff single-cluster algorithm.
Kaupuzs, J; Rimsāns, J; Melnik, R V N
2010-02-01
A parallel [open multiprocessing (OpenMP)] implementation of the Wolff single-cluster algorithm has been developed and tested for the three-dimensional (3D) Ising model. The developed procedure is generalizable to other lattice spin models and its effectiveness depends on the specific application at hand. The applicability of the developed methodology is discussed in the context of the applications, where a sophisticated shuffling scheme is used to generate pseudorandom numbers of high quality, and an iterative method is applied to find the critical temperature of the 3D Ising model with great accuracy. For the lattice with linear size L=1024, we reached a speedup of about 1.79 times on two processors and about 2.67 times on four processors, as compared to the serial code. According to our estimate, a speedup of about three times on four processors is reachable for the O(n) models with n≥2. Furthermore, the developed OpenMP code allows us to simulate larger lattices because of the greater operative (shared) memory available.
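A serial Wolff single-cluster update can be sketched in a few lines. The version below is for a two-dimensional Ising lattice with periodic boundaries, chosen for brevity; the lattice size, the coupling J = 1, and the function name are assumptions, and the sketch does not attempt the OpenMP parallelization or the shuffled random number generator described in the abstract.

```python
import math
import random


def wolff_step(spins, beta, rng, J=1.0):
    """One Wolff update on an L x L Ising lattice: grow a cluster of
    aligned spins from a random seed, adding each aligned neighbor with
    probability 1 - exp(-2 beta J), and flip the cluster as it grows."""
    L = len(spins)
    p_add = 1.0 - math.exp(-2.0 * beta * J)
    i, j = rng.randrange(L), rng.randrange(L)
    seed_spin = spins[i][j]
    spins[i][j] = -seed_spin          # flipping marks the site as visited
    stack = [(i, j)]
    size = 1
    while stack:
        x, y = stack.pop()
        neighbors = (((x + 1) % L, y), ((x - 1) % L, y),
                     (x, (y + 1) % L), (x, (y - 1) % L))
        for nx, ny in neighbors:
            if spins[nx][ny] == seed_spin and rng.random() < p_add:
                spins[nx][ny] = -seed_spin
                stack.append((nx, ny))
                size += 1
    return size


# At very low temperature the cluster spans the whole ordered lattice.
lattice = [[1] * 4 for _ in range(4)]
cluster_size = wolff_step(lattice, beta=50.0, rng=random.Random(0))
```

The cluster growth is inherently sequential through the shared stack, which gives some intuition for why the reported speedups (1.79 on two processors, 2.67 on four) fall short of linear scaling.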
Parallelization of the Wolff single-cluster algorithm
NASA Astrophysics Data System (ADS)
Kaupužs, J.; Rimšāns, J.; Melnik, R. V. N.
2010-02-01
A parallel [open multiprocessing (OpenMP)] implementation of the Wolff single-cluster algorithm has been developed and tested for the three-dimensional (3D) Ising model. The developed procedure is generalizable to other lattice spin models and its effectiveness depends on the specific application at hand. The applicability of the developed methodology is discussed in the context of the applications, where a sophisticated shuffling scheme is used to generate pseudorandom numbers of high quality, and an iterative method is applied to find the critical temperature of the 3D Ising model with great accuracy. For the lattice with linear size L=1024, we reached a speedup of about 1.79 times on two processors and about 2.67 times on four processors, as compared to the serial code. According to our estimate, a speedup of about three times on four processors is reachable for the O(n) models with n≥2. Furthermore, the developed OpenMP code allows us to simulate larger lattices because of the greater operative (shared) memory available.
Using Clustering Algorithms to Identify Brown Dwarf Characteristics
NASA Astrophysics Data System (ADS)
Choban, Caleb
2016-06-01
Brown dwarfs are stars that are not massive enough to sustain core hydrogen fusion, and thus fade and cool over time. The molecular composition of brown dwarf atmospheres can be determined by observing absorption features in their infrared spectrum, which can be quantified using spectral indices. Comparing these indices to one another, we can determine what kind of brown dwarf it is, and if it is young or metal-poor. We explored a new method for identifying these subgroups through the expectation-maximization machine learning clustering algorithm, which provides a quantitative and statistical way of identifying index pairs which separate rare populations. We specifically quantified two statistics, completeness and concentration, to identify the best index pairs. Starting with a training set, we defined selection regions for young, metal-poor and binary brown dwarfs, and tested these on a large sample of L dwarfs. We present the results of this analysis, and demonstrate that new objects in these classes can be found through these methods.
A vector reconstruction based clustering algorithm particularly for large-scale text collection.
Liu, Ming; Wu, Chong; Chen, Lei
2015-03-01
With the rapid evolution of internet technology, internet users face a large amount of textual data every day. Organizing texts into categories can help users extract useful information from large-scale text collections. Clustering is one of the most promising tools for categorizing texts because it is unsupervised. Unfortunately, most traditional clustering algorithms lose quality on large-scale text collections, mainly because of the high-dimensional vector space and the semantic similarity among texts. To cluster large-scale text collections effectively and efficiently, this paper puts forward a vector reconstruction based clustering algorithm. Only the features that can represent the cluster are preserved in the cluster's representative vector. The algorithm alternately repeats two sub-processes until it converges. One is the partial tuning sub-process, where each feature's weight is fine-tuned by an iterative process similar to the self-organizing-mapping (SOM) algorithm. To accelerate clustering, an intersection based similarity measurement and its corresponding neuron adjustment function are proposed and implemented in this sub-process. The other is the overall tuning sub-process, where the features are reallocated among different clusters. In this sub-process, the features that do not help represent the cluster are removed from the cluster's representative vector. Experimental results on three text collections (two small-scale and one large-scale) demonstrate that our algorithm achieves high-quality performance on both small-scale and large-scale text collections.
Karayiannis, Nicolaos B; Randolph-Gips, Mary M
2005-03-01
This paper presents the development of soft clustering and learning vector quantization (LVQ) algorithms that rely on a weighted norm to measure the distance between the feature vectors and their prototypes. The development of LVQ and clustering algorithms is based on the minimization of a reformulation function under the constraint that the generalized mean of the norm weights be constant. According to the proposed formulation, the norm weights can be computed from the data in an iterative fashion together with the prototypes. An error analysis provides some guidelines for selecting the parameter involved in the definition of the generalized mean in terms of the feature variances. The algorithms produced from this formulation are easy to implement and they are almost as fast as clustering algorithms relying on the Euclidean norm. An experimental evaluation on four data sets indicates that the proposed algorithms outperform consistently clustering algorithms relying on the Euclidean norm and they are strong competitors to non-Euclidean algorithms which are computationally more demanding.
Clustering performance comparison using K-means and expectation maximization algorithms
Jung, Yong Gyu; Kang, Min Soo; Heo, Jun
2014-01-01
Clustering is an important means of data mining based on separating data categories by similar features. Unlike the classification algorithm, clustering belongs to the unsupervised type of algorithms. Two representatives of the clustering algorithms are the K-means and the expectation maximization (EM) algorithm. Linear regression analysis was extended to the category-type dependent variable, while logistic regression was achieved using a linear combination of independent variables. To predict the possibility of occurrence of an event, a statistical approach is used. However, the classification of all data by means of logistic regression analysis cannot guarantee the accuracy of the results. In this paper, the logistic regression analysis is applied to EM clusters and the K-means clustering method for quality assessment of red wine, and a method is proposed for ensuring the accuracy of the classification results. PMID:26019610
Sun, Liping; Luo, Yonglong; Ding, Xintao; Zhang, Ji
2014-01-01
An important component of a spatial clustering algorithm is the distance measure between sample points in object space. In this paper, the traditional Euclidean distance measure is replaced with innovative obstacle distance measure for spatial clustering under obstacle constraints. Firstly, we present a path searching algorithm to approximate the obstacle distance between two points for dealing with obstacles and facilitators. Taking obstacle distance as similarity metric, we subsequently propose the artificial immune clustering with obstacle entity (AICOE) algorithm for clustering spatial point data in the presence of obstacles and facilitators. Finally, the paper presents a comparative analysis of AICOE algorithm and the classical clustering algorithms. Our clustering model based on artificial immune system is also applied to the case of public facility location problem in order to establish the practical applicability of our approach. By using the clone selection principle and updating the cluster centers based on the elite antibodies, the AICOE algorithm is able to achieve the global optimum and better clustering effect.
A highly efficient multi-core algorithm for clustering extremely large datasets
2010-01-01
Background In recent years, the demand for computational power in computational biology has increased due to rapidly growing data sets from microarray and other high-throughput technologies. This demand is likely to increase. Standard algorithms for analyzing data, such as cluster algorithms, need to be parallelized for fast processing. Unfortunately, most approaches for parallelizing algorithms rely largely on network communication protocols that connect and require multiple computers. One answer to this problem is to utilize the intrinsic capabilities of current multi-core hardware to distribute the tasks among the different cores of one computer. Results We introduce a multi-core parallelization of the k-means and k-modes cluster algorithms, based on the design principles of transactional memory, for clustering gene expression microarray type data and categorical SNP data. Our new shared memory parallel algorithms prove to be highly efficient. We demonstrate their computational power and show their utility in cluster stability and sensitivity analysis employing repeated runs with slightly changed parameters. Computation speed of our Java based algorithm was increased by a factor of 10 for large data sets while preserving computational accuracy compared to single-core implementations and a recently published network based parallelization. Conclusions Most desktop computers and even notebooks provide at least dual-core processors. Our multi-core algorithms show that, using modern algorithmic concepts, parallelization makes it possible to perform even such laborious tasks as cluster sensitivity and cluster number estimation on the laboratory computer. PMID:20370922
A Special Local Clustering Algorithm for Identifying the Genes Associated With Alzheimer’s Disease
Pang, Chao-Yang; Hu, Wei; Hu, Ben-Qiong; Shi, Ying; Vanderburg, Charles R.; Rogers, Jack T.
2010-01-01
Clustering is the grouping of similar objects into a class. Local clustering feature refers to the phenomenon whereby one group of data is separated from another, and the data from these different groups are clustered locally. A compact class is defined as one cluster in which all similar elements cluster tightly within the cluster. Herein, the essence of the local clustering feature, revealed by mathematical manipulation, results in a novel clustering algorithm termed as the special local clustering (SLC) algorithm that was used to process gene microarray data related to Alzheimer’s disease (AD). SLC algorithm was able to group together genes with similar expression patterns and identify significantly varied gene expression values as isolated points. If a gene belongs to a compact class in control data and appears as an isolated point in incipient, moderate and/or severe AD gene microarray data, this gene is possibly associated with AD. Application of a clustering algorithm in disease-associated gene identification such as in AD is rarely reported. PMID:20089478
Fong, Simon; Deb, Suash; Yang, Xin-She; Zhuang, Yan
2014-01-01
Traditional K-means clustering algorithms have the drawback of getting stuck at local optima that depend on the random values of initial centroids. Optimization algorithms have their advantages in guiding iterative computation to search for global optima while avoiding local optima. The algorithms help speed up the clustering process by converging into a global optimum early with multiple search agents in action. Inspired by nature, some contemporary optimization algorithms which include Ant, Bat, Cuckoo, Firefly, and Wolf search algorithms mimic the swarming behavior allowing them to cooperatively steer towards an optimal objective within a reasonable time. It is known that these so-called nature-inspired optimization algorithms have their own characteristics as well as pros and cons in different applications. When these algorithms are combined with K-means clustering mechanism for the sake of enhancing its clustering quality by avoiding local optima and finding global optima, the new hybrids are anticipated to produce unprecedented performance. In this paper, we report the results of our evaluation experiments on the integration of nature-inspired optimization methods into K-means algorithms. In addition to the standard evaluation metrics in evaluating clustering quality, the extended K-means algorithms that are empowered by nature-inspired optimization methods are applied on image segmentation as a case study of application scenario. PMID:25202730
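The dependence on initial centroids described in this abstract can be shown with a small sketch. This is plain Lloyd-style K-means in pure Python, not the hybrid method of the paper; the function names `sq_dist` and `kmeans`, the four-point data set, and both starting positions are assumptions chosen for illustration.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(p, centroids):
    """Index of the centroid closest to point p (first one wins on ties)."""
    return min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))


def kmeans(points, centroids, iters=100):
    """Plain Lloyd iterations from the given (hypothetical) initial centroids.

    Returns (centroids, labels, sse), where sse is the within-cluster
    sum of squared distances.
    """
    centroids = [tuple(c) for c in centroids]
    for _ in range(iters):
        labels = [nearest(p, centroids) for p in points]
        updated = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                # New centroid is the coordinate-wise mean of the members.
                updated.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                # Keep an empty cluster's centroid where it was.
                updated.append(old)
        if updated == centroids:
            break
        centroids = updated
    labels = [nearest(p, centroids) for p in points]
    sse = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, sse


# Four corners of a wide rectangle; the natural split is left vs. right.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]
_, _, sse_good = kmeans(points, [(0, 0.5), (4, 0.5)])  # starts near the optimum
_, _, sse_bad = kmeans(points, [(2, 0), (2, 1)])       # stuck in a top/bottom split
print(sse_good, sse_bad)
```

The second start converges immediately to a top/bottom split whose error is much larger than that of the left/right split. This is the kind of local optimum that a global search over candidate centroids, such as the nature-inspired methods evaluated in the paper, is meant to avoid.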
Block clustering based on difference of convex functions (DC) programming and DC algorithms.
Le, Hoai Minh; Le Thi, Hoai An; Dinh, Tao Pham; Huynh, Van Ngai
2013-10-01
We investigate difference of convex functions (DC) programming and the DC algorithm (DCA) to solve the block clustering problem in the continuous framework, which traditionally requires solving a hard combinatorial optimization problem. DC reformulation techniques and exact penalty in DC programming are developed to build an appropriate equivalent DC program of the block clustering problem. They lead to an elegant and explicit DCA scheme for the resulting DC program. Computational experiments show the robustness and efficiency of the proposed algorithm and its superiority over standard algorithms such as two-mode K-means, two-mode fuzzy clustering, and block classification EM.
Huang, Xiaohui; Ye, Yunming; Zhang, Haijun
2014-08-01
Kmeans-type clustering aims at partitioning a data set into clusters such that the objects in a cluster are compact and the objects in different clusters are well separated. However, most kmeans-type clustering algorithms rely on only intracluster compactness while overlooking intercluster separation. In this paper, a series of new clustering algorithms by extending the existing kmeans-type algorithms is proposed by integrating both intracluster compactness and intercluster separation. First, a set of new objective functions for clustering is developed. Based on these objective functions, the corresponding updating rules for the algorithms are then derived analytically. The properties and performances of these algorithms are investigated on several synthetic and real-life data sets. Experimental studies demonstrate that our proposed algorithms outperform the state-of-the-art kmeans-type clustering algorithms with respect to four metrics: accuracy, RandIndex, Fscore, and normal mutual information.
Doostparast Torshizi, Abolfazl; Fazel Zarandi, Mohammad Hossein
2015-09-01
This paper considers microarray gene expression data clustering using a novel two stage meta-heuristic algorithm based on the concept of α-planes in general type-2 fuzzy sets. The main aim of this research is to present a powerful data clustering approach capable of dealing with highly uncertain environments. In this regard, first, a new objective function using α-planes for general type-2 fuzzy c-means clustering algorithm is represented. Then, based on the philosophy of the meta-heuristic optimization framework 'Simulated Annealing', a two stage optimization algorithm is proposed. The first stage of the proposed approach is devoted to the annealing process accompanied by its proposed perturbation mechanisms. After termination of the first stage, its output is inserted to the second stage where it is checked with other possible local optima through a heuristic algorithm. The output of this stage is then re-entered to the first stage until no better solution is obtained. The proposed approach has been evaluated using several synthesized datasets and three microarray gene expression datasets. Extensive experiments demonstrate the capabilities of the proposed approach compared with some of the state-of-the-art techniques in the literature.
A scalable and practical one-pass clustering algorithm for recommender system
NASA Astrophysics Data System (ADS)
Khalid, Asra; Ghazanfar, Mustansar Ali; Azam, Awais; Alahmari, Saad Ali
2015-12-01
KMeans clustering-based recommendation algorithms have been proposed with the claim that they increase the scalability of recommender systems. One potential drawback of these algorithms is that they perform training offline and hence cannot accommodate incremental updates as new data arrive, making them unsuitable for dynamic environments. From this line of research, a new clustering algorithm called One-Pass is proposed, which is simple, fast, and accurate. We show empirically that the proposed algorithm outperforms K-Means in terms of recommendation and training time while maintaining a good level of accuracy.
Ju, Ying; Zhang, Songming; Ding, Ningxiang; Zeng, Xiangxiang; Zhang, Xingyi
2016-01-01
The field of complex network clustering has gained considerable attention in recent years. In this study, a multi-objective evolutionary algorithm based on membranes is proposed to solve the network clustering problem. The population is divided evenly among different membrane structures. The evolutionary algorithm is carried out within the membrane structures. The population is eliminated by the vector of membranes. In the proposed method, two evaluation objectives, termed Kernel J-means and Ratio Cut, are minimized. Extensive experimental comparison with state-of-the-art algorithms shows that the proposed algorithm is effective and promising. PMID:27670156
NASA Technical Reports Server (NTRS)
Mach, Douglas M.; Christian, Hugh J.; Blakeslee, Richard; Boccippio, Dennis J.; Goodman, Steve J.; Boeck, William
2006-01-01
We describe the clustering algorithm used by the Lightning Imaging Sensor (LIS) and the Optical Transient Detector (OTD) for combining the lightning pulse data into events, groups, flashes, and areas. Events are single pixels that exceed the LIS/OTD background level during a single frame (2 ms). Groups are clusters of events that occur within the same frame and in adjacent pixels. Flashes are clusters of groups that occur within 330 ms and either 5.5 km (for LIS) or 16.5 km (for OTD) of each other. Areas are clusters of flashes that occur within 16.5 km of each other. Many investigators are utilizing the LIS/OTD flash data; therefore, we test how variations in the algorithms for the event-group and group-flash clustering affect the flash count for a subset of the LIS data. We divided the subset into areas with low (1-3), medium (4-15), high (16-63), and very high (64+) flash counts to see how changes in the clustering parameters affect the flash rates in these different sizes of areas. We found that as long as the cluster parameters are within about a factor of two of the current values, the flash counts do not change by more than about 20%. Therefore, the flash clustering algorithm used by the LIS and OTD sensors creates flash rates that are relatively insensitive to reasonable variations in the clustering algorithms.
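The group-to-flash step can be sketched as follows. This is a simplified illustration, not the operational LIS/OTD code: it assumes each group is reduced to a time in milliseconds and a planar position in kilometres, and it links groups transitively (single linkage via union-find), which is an assumption about how "within 330 ms and 5.5 km of each other" is applied. The function name `cluster_flashes` and the sample data are hypothetical.

```python
import math


def cluster_flashes(groups, max_dt_ms=330.0, max_dist_km=5.5):
    """Assign a flash id to each group.

    groups: list of (t_ms, x_km, y_km) tuples (assumed representation).
    Two groups are linked when they are within both thresholds; flashes are
    the connected components of that link graph.
    """
    parent = list(range(len(groups)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(groups)):
        ti, xi, yi = groups[i]
        for j in range(i + 1, len(groups)):
            tj, xj, yj = groups[j]
            close_in_time = abs(ti - tj) <= max_dt_ms
            close_in_space = math.hypot(xi - xj, yi - yj) <= max_dist_km
            if close_in_time and close_in_space:
                parent[find(i)] = find(j)

    # Relabel roots as consecutive flash ids in order of first appearance.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(len(groups))]


groups = [(0, 0, 0), (100, 3, 0), (200, 6, 0), (1000, 0, 0), (50, 20, 0)]
labels = cluster_flashes(groups)                      # LIS thresholds
tight = cluster_flashes(groups, max_dist_km=2.75)     # distance threshold halved
print(len(set(labels)), len(set(tight)))
```

Rerunning with scaled thresholds, as in the last lines, mirrors in miniature the sensitivity test the abstract describes; the flash counts from this toy data say nothing about the real sensors.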
A new clustering algorithm for scanning electron microscope images
NASA Astrophysics Data System (ADS)
Yousef, Amr; Duraisamy, Prakash; Karim, Mohammad
2016-04-01
A scanning electron microscope (SEM) is a type of electron microscope that produces images of a sample by scanning it with a focused beam of electrons. The electrons interact with the sample atoms, producing various signals that are collected by detectors. The gathered signals contain information about the sample's surface topography and composition. The electron beam is generally scanned in a raster scan pattern, and the beam's position is combined with the detected signal to produce an image. The most common configuration for an SEM produces a single value per pixel, with the results usually rendered as grayscale images. The captured images may suffer from insufficient brightness, anomalous contrast, jagged edges, and poor quality due to low signal-to-noise ratio, grained topography and poor surface details. Segmenting SEM images is a challenging problem in the presence of these distortions. In this paper, we focus on the clustering of this type of image. To that end, we evaluate the performance of well-known unsupervised clustering and classification techniques such as connectivity-based clustering (hierarchical clustering), centroid-based clustering, distribution-based clustering and density-based clustering. Furthermore, we propose a new spatial fuzzy clustering technique that works efficiently on this type of image and compare its results against these regular techniques in terms of clustering validation metrics.
NASA Astrophysics Data System (ADS)
Gatos, Ilias; Tsantis, Stavros; Skouroliakou, Aikaterini; Theotokas, Ioannis; Zoumpoulis, Pavlos S.; Kagadis, George C.
2015-09-01
The aim of the present study is to determine an optimal elasticity cut-off value for discriminating Healthy from Pathological fibrotic patients by means of Fuzzy C-Means automatic segmentation and maximum participation cluster mean value employment in Shear Wave Elastography (SWE) images. The clinical dataset comprised 32 subjects (16 Healthy and 16 with histologically or Fibroscan verified Chronic Liver Disease). An experienced Radiologist performed the SWE measurement, placing a region of interest (ROI) on each subject's right liver lobe and providing a SWE image for each patient. Subsequently, Fuzzy C-Means clustering was performed on every SWE image utilizing 5 clusters. The Mean Stiffness value and the number of pixels of each cluster were calculated. The mean stiffness value of the cluster with the maximum number of pixels was then fed as input for ROC analysis. The selected Mean Stiffness feature yielded an Area Under the Curve (AUC) of 0.8633 with an Optimum Cut-off value of 7.5 kPa, sensitivity and specificity values of 0.8438 and 0.875, and a balanced accuracy of 0.8594. The examiner's classification measurements exhibited sensitivity, specificity and balanced accuracy values of 0.8125 with a 7.1 kPa cut-off value. A new promising automatic algorithm was implemented with more objective criteria for defining optimum elasticity cut-off values for discriminating fibrosis stages in SWE. More subjects are needed in order to determine whether this algorithm is an objective tool that outperforms manual ROI selection.
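As a quick check on the reported figures, balanced accuracy is the mean of sensitivity and specificity. The worked arithmetic below uses only the values quoted in the abstract.

```latex
\[
\text{balanced accuracy} = \frac{\text{sensitivity} + \text{specificity}}{2}
\]
\[
\text{automatic (7.5 kPa):}\quad \frac{0.8438 + 0.875}{2} = 0.8594
\qquad
\text{examiner (7.1 kPa):}\quad \frac{0.8125 + 0.8125}{2} = 0.8125
\]
```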
A Novel Artificial Bee Colony Based Clustering Algorithm for Categorical Data
2015-01-01
Data with categorical attributes are ubiquitous in the real world. However, existing partitional clustering algorithms for categorical data are prone to fall into local optima. To address this issue, in this paper we propose a novel clustering algorithm, ABC-K-Modes (Artificial Bee Colony clustering based on K-Modes), based on the traditional k-modes clustering algorithm and the artificial bee colony approach. In our approach, we first introduce a one-step k-modes procedure, and then integrate this procedure with the artificial bee colony approach to deal with categorical data. In the search process performed by scout bees, we adopt the multi-source search inspired by the idea of batch processing to accelerate the convergence of ABC-K-Modes. The performance of ABC-K-Modes is evaluated by a series of experiments in comparison with that of the other popular algorithms for categorical data. PMID:25993469
NASA Astrophysics Data System (ADS)
Wu, Xia; Wu, Genhua
2014-08-01
Geometrical optimization of atomic clusters is performed by a development of adaptive immune optimization algorithm (AIOA) with dynamic lattice searching (DLS) operation (AIOA-DLS method). By a cycle of construction and searching of the dynamic lattice (DL), DLS algorithm rapidly makes the clusters more regular and greatly reduces the potential energy. DLS can thus be used as an operation acting on the new individuals after mutation operation in AIOA to improve the performance of the AIOA. The AIOA-DLS method combines the merit of evolutionary algorithm and idea of dynamic lattice. The performance of the proposed method is investigated in the optimization of Lennard-Jones clusters within 250 atoms and silver clusters described by many-body Gupta potential within 150 atoms. Results reported in the literature are reproduced, and the motif of Ag61 cluster is found to be stacking-fault face-centered cubic, whose energy is lower than that of previously obtained icosahedron.
NASA Astrophysics Data System (ADS)
Sun, Xu; Yang, Lina; Gao, Lianru; Zhang, Bing; Li, Shanshan; Li, Jun
2015-01-01
Center-oriented hyperspectral image clustering methods have been widely applied to hyperspectral remote sensing image processing; however, the drawbacks are obvious, including the over-simplicity of computing models and underutilized spatial information. In recent years, some studies have been conducted trying to improve this situation. We introduce the artificial bee colony (ABC) and Markov random field (MRF) algorithms to propose an ABC-MRF-cluster model to solve the problems mentioned above. In this model, a typical ABC algorithm framework is adopted in which cluster centers and iteration conditional model algorithm's results are considered as feasible solutions and objective functions separately, and MRF is modified to be capable of dealing with the clustering problem. Finally, four datasets and two indices are used to show that the application of ABC-cluster and ABC-MRF-cluster methods could help to obtain better image accuracy than conventional methods. Specifically, the ABC-cluster method is superior when used for a higher power of spectral discrimination, whereas the ABC-MRF-cluster method can provide better results when used for an adjusted random index. In experiments on simulated images with different signal-to-noise ratios, ABC-cluster and ABC-MRF-cluster showed good stability.
NASA Astrophysics Data System (ADS)
Zhang, Xian-Kun; Tian, Xue; Li, Ya-Nan; Song, Chen
2014-08-01
The label propagation algorithm (LPA) is a graph-based semi-supervised learning algorithm that can predict the information of unlabeled nodes from a few labeled nodes. It is a community detection method in the field of complex networks. The algorithm is easy to implement, has low complexity, and performs remarkably well, so it is widely applied in various fields. However, the randomness of the label propagation leads to poor robustness, and the classification result is unstable. This paper proposes an LPA based on the edge clustering coefficient. Each node in the network updates its label from the neighbor node whose edge clustering coefficient is the highest, rather than from a random neighbor node, which effectively restrains the random spread of labels. The experimental results show that the LPA based on the edge clustering coefficient improves the stability and accuracy of the algorithm.
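The update rule described above can be sketched as follows. The abstract does not define the edge clustering coefficient, so this sketch assumes the common triangle-based form, (number of triangles containing the edge + 1) divided by min(k_u - 1, k_v - 1). It also assumes a coefficient of 0 when that denominator is 0, smallest-id tie-breaking, and an iteration cap; the function names are hypothetical and none of this is claimed to be the paper's exact procedure.

```python
def edge_clustering_coefficient(adj, u, v):
    """Triangle-based edge clustering coefficient of edge (u, v).

    adj maps each node to the set of its neighbours. Edges whose denominator
    would be zero (an endpoint of degree 1) get 0.0 here, which is an
    assumption made so that leaf nodes never dominate the choice.
    """
    denom = min(len(adj[u]) - 1, len(adj[v]) - 1)
    if denom <= 0:
        return 0.0
    triangles = len(adj[u] & adj[v])  # common neighbours close a triangle
    return (triangles + 1) / denom


def lpa_ecc(adj, max_iters=100):
    """Label propagation where each node copies the label of the neighbour
    joined to it by the edge with the highest clustering coefficient."""
    labels = {node: node for node in adj}
    for _ in range(max_iters):  # cap guards against label oscillation
        changed = False
        for u in sorted(adj):
            if not adj[u]:
                continue
            # max() keeps the first maximum, so ties go to the smallest id.
            best = max(sorted(adj[u]),
                       key=lambda v: edge_clustering_coefficient(adj, u, v))
            if labels[u] != labels[best]:
                labels[u] = labels[best]
                changed = True
        if not changed:
            break
    return labels


# Two triangles joined by the single bridge edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
labels = lpa_ecc(adj)
print(labels)
```

Because the choice of neighbour is deterministic, repeated runs on the same graph give the same communities, which is the stability property the abstract emphasizes.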
A fast general-purpose clustering algorithm based on FPGAs for high-throughput data processing
NASA Astrophysics Data System (ADS)
Annovi, A.; Beretta, M.
2010-05-01
We present a fast general-purpose algorithm for high-throughput clustering of data "with a two-dimensional organization". The algorithm is designed to be implemented with FPGAs or custom electronics. The key feature is a processing time that scales linearly with the amount of data to be processed. This means that clustering can be performed in pipeline with the readout, without suffering from combinatorial delays due to looping multiple times through all the data. This feature makes the algorithm especially well suited for problems where the data have high density, e.g. in the case of tracking devices working under high-luminosity conditions such as those of the LHC or super-LHC. The algorithm is organized in two steps: the first step (core) clusters the data; the second step analyzes each cluster of data to extract the desired information. The current algorithm is developed as a clustering device for modern high-energy physics pixel detectors. However, the algorithm has a much broader field of applications. In fact, its core does not specifically rely on the kind of data or detector it is working for, while the second step can and should be tailored for a given application. For example, in the case of spatial measurement with silicon pixel detectors, the second step performs the center of charge calculation. Applications can thus be foreseen to other detectors and other scientific fields ranging from HEP calorimeters to medical imaging. An additional advantage of this two-step approach is that the typical clustering related calculations (second step) are separated from the combinatorial complications of clustering. This separation simplifies the design of the second step and enables it to perform sophisticated calculations, achieving offline quality in online applications. The algorithm is general purpose in the sense that only minimal assumptions on the kind of clustering to be performed are made.
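The two-step organization can be illustrated in software. This sketch is not the FPGA design and does not reproduce its pipelined, linear-time behaviour; it only shows the separation between a generic clustering core and an application-specific analysis step. The 8-neighbour adjacency rule, the dictionary representation of hits, and the function names are assumptions.

```python
def cluster_hits(hits):
    """Step 1 (core): group adjacent pixels into clusters.

    hits: dict mapping (row, col) -> charge (assumed representation).
    Pixels are adjacent if they touch, including diagonally.
    Returns a list of clusters, each a list of (row, col) tuples.
    """
    unseen = set(hits)
    clusters = []
    while unseen:
        seed = unseen.pop()
        stack, cluster = [seed], [seed]
        while stack:
            r, c = stack.pop()
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    neighbour = (r + dr, c + dc)
                    if neighbour in unseen:
                        unseen.remove(neighbour)
                        stack.append(neighbour)
                        cluster.append(neighbour)
        clusters.append(cluster)
    return clusters


def center_of_charge(cluster, hits):
    """Step 2 (application-specific): charge-weighted mean position.

    Returns (row, col, total_charge) for one cluster.
    """
    total = sum(hits[p] for p in cluster)
    row = sum(p[0] * hits[p] for p in cluster) / total
    col = sum(p[1] * hits[p] for p in cluster) / total
    return row, col, total


hits = {(0, 0): 1.0, (0, 1): 3.0, (5, 5): 2.0}
results = sorted(center_of_charge(c, hits) for c in cluster_hits(hits))
print(results)
```

Swapping `center_of_charge` for another per-cluster function leaves `cluster_hits` untouched, which is the design point the abstract makes about tailoring only the second step.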
Learning assignment order of instances for the constrained K-means clustering algorithm.
Hong, Yi; Kwong, Sam
2009-04-01
The sensitivity of the constrained K-means clustering algorithm (Cop-Kmeans) to the assignment order of instances is studied, and a novel assignment order learning method for Cop-Kmeans, termed as clustering Uncertainty-based Assignment order Learning Algorithm (UALA), is proposed in this paper. The main idea of UALA is to rank all instances in the data set according to their clustering uncertainties calculated by using the ensembles of multiple clustering algorithms. Experimental results on several real data sets with artificial instance-level constraints demonstrate that UALA can identify a good assignment order of instances for Cop-Kmeans. In addition, the effects of ensemble sizes on the performance of UALA are analyzed, and the generalization property of Cop-Kmeans is also studied.
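The order sensitivity studied here can be seen in a small sketch of a Cop-Kmeans-style assignment pass, in which each instance goes to the nearest centroid that violates no instance-level constraint with instances already assigned. This illustrates only the assignment step, on one-dimensional data with fixed centroids; it is not UALA, and the function names and data are hypothetical.

```python
def partner(i, pair):
    """Return the other member of a constraint pair involving i, else None."""
    a, b = pair
    if a == i:
        return b
    if b == i:
        return a
    return None


def violates(i, cluster, assigned, must_link, cannot_link):
    """True if putting instance i into cluster breaks a constraint with an
    instance that has already been assigned."""
    for pair in must_link:
        other = partner(i, pair)
        if other is not None and other in assigned and assigned[other] != cluster:
            return True
    for pair in cannot_link:
        other = partner(i, pair)
        if other is not None and other in assigned and assigned[other] == cluster:
            return True
    return False


def constrained_assign(points, centroids, order, must_link, cannot_link):
    """One assignment pass over 1-D points in the given order.

    Returns {instance index: cluster index}, or None if some instance has no
    feasible cluster.
    """
    assigned = {}
    for i in order:
        ranked = sorted(range(len(centroids)),
                        key=lambda j: abs(points[i] - centroids[j]))
        for j in ranked:
            if not violates(i, j, assigned, must_link, cannot_link):
                assigned[i] = j
                break
        else:
            return None
    return assigned


points = [0.0, 0.6, 1.0]
centroids = [0.0, 1.0]
cannot_link = [(1, 2)]  # instances 1 and 2 may not share a cluster
first = constrained_assign(points, centroids, [1, 2, 0], [], cannot_link)
second = constrained_assign(points, centroids, [2, 1, 0], [], cannot_link)
print(first, second)
```

The two orders produce different partitions from the same data, centroids, and constraints, because whichever of the two constrained instances is processed first claims the nearer cluster.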
Ju, Chunhua
2013-01-01
Although there are many good collaborative recommendation methods, it is still a challenge to increase the accuracy and diversity of these methods to fulfill users' preferences. In this paper, we propose a novel collaborative filtering recommendation approach based on K-means clustering algorithm. In the process of clustering, we use artificial bee colony (ABC) algorithm to overcome the local optimal problem caused by K-means. After that we adopt the modified cosine similarity to compute the similarity between users in the same clusters. Finally, we generate recommendation results for the corresponding target users. Detailed numerical analysis on a benchmark dataset MovieLens and a real-world dataset indicates that our new collaborative filtering approach based on users clustering algorithm outperforms many other recommendation methods. PMID:24381525
An Enhanced PSO-Based Clustering Energy Optimization Algorithm for Wireless Sensor Network
Vimalarani, C.; Subramanian, R.; Sivanandam, S. N.
2016-01-01
A Wireless Sensor Network (WSN) is a network formed from a large number of sensor nodes positioned in an application environment to monitor physical entities in a target area, for example in temperature, water level, and pressure monitoring, health care, and various military applications. Sensor nodes are mostly equipped with self-supported battery power, through which they perform their operations and communicate with neighboring nodes. To maximize the lifetime of WSNs, energy conservation measures are essential for improving their performance. This paper proposes an Enhanced PSO-Based Clustering Energy Optimization (EPSO-CEO) algorithm for Wireless Sensor Networks in which clustering and cluster head selection are done using the Particle Swarm Optimization (PSO) algorithm so as to minimize the power consumption in the WSN. The performance metrics are evaluated and the results are compared with a competitive clustering algorithm to validate the reduction in energy consumption. PMID:26881273
A hybrid algorithm for clustering of time series data based on affinity search technique.
Aghabozorgi, Saeed; Ying Wah, Teh; Herawan, Tutut; Jalab, Hamid A; Shaygan, Mohammad Amin; Jalali, Alireza
2014-01-01
Time series clustering is an important solution to various problems in numerous fields of research, including business, medical science, and finance. However, conventional clustering algorithms are not practical for time series data because they are essentially designed for static data. This impracticality results in poor clustering accuracy in several systems. In this paper, a new hybrid clustering algorithm is proposed based on the similarity in shape of time series data. Time series data are first grouped as subclusters based on similarity in time. The subclusters are then merged using the k-Medoids algorithm based on similarity in shape. This model has two contributions: (1) it is more accurate than other conventional and hybrid approaches and (2) it determines the similarity in shape among time series data with a low complexity. To evaluate the accuracy of the proposed model, the model is tested extensively using synthetic and real-world time series datasets.
A randomized algorithm for two-cluster partition of a set of vectors
NASA Astrophysics Data System (ADS)
Kel'manov, A. V.; Khandeev, V. I.
2015-02-01
A randomized algorithm is substantiated for the strongly NP-hard problem of partitioning a finite set of vectors of Euclidean space into two clusters of given sizes according to the minimum sum-of-squared-distances criterion. It is assumed that the centroid of one of the clusters is to be optimized and is determined as the mean value over all vectors in this cluster. The centroid of the other cluster is fixed at the origin. For an established parameter value, the algorithm finds an approximate solution of the problem in time that is linear in the space dimension and the input size of the problem for given values of the relative error and failure probability. The conditions are established under which the algorithm is asymptotically exact and runs in time that is linear in the space dimension and quadratic in the input size of the problem.
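The criterion described in this abstract can be written out as follows. This is only a sketch of the objective: the symbols (the input set Y of N vectors in q-dimensional space, the cluster C, and its prescribed size M) are notation assumed here for illustration and are not taken from the record.

```latex
% Assumed notation: \mathcal{Y} = \{y_1,\dots,y_N\} \subset \mathbb{R}^q is the input set,
% \mathcal{C} \subseteq \mathcal{Y} is the cluster whose centroid is optimized, |\mathcal{C}| = M is given.
\[
  S(\mathcal{C}) \;=\; \sum_{y \in \mathcal{C}} \bigl\| y - \bar{y}(\mathcal{C}) \bigr\|^{2}
  \;+\; \sum_{y \in \mathcal{Y} \setminus \mathcal{C}} \| y \|^{2}
  \;\longrightarrow\; \min_{\mathcal{C} \subseteq \mathcal{Y},\ |\mathcal{C}| = M},
  \qquad
  \bar{y}(\mathcal{C}) \;=\; \frac{1}{|\mathcal{C}|} \sum_{y \in \mathcal{C}} y .
\]
```

The first sum measures scatter around the optimized centroid; the second measures scatter around the origin, which is the fixed center of the other cluster.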
Lin, Nan; Jiang, Junhai; Guo, Shicheng; Xiong, Momiao
2015-01-01
Due to the advancement in sensor technology, the growing volume of medical image data makes it possible to visualize the anatomical changes in biological tissues. As a consequence, medical images have the potential to enhance the diagnosis of disease, the prediction of clinical outcomes and the characterization of disease progression. But in the meantime, the growing data dimensions pose great methodological and computational challenges for the representation and selection of features in image cluster analysis. To address these challenges, we first extend functional principal component analysis (FPCA) from one dimension to two dimensions to fully capture the spatial variation of the image signals. The image signals contain a large number of redundant features which provide no additional information for clustering analysis. The widely used methods for removing the irrelevant features are sparse clustering algorithms using a lasso-type penalty to select the features. However, the accuracy of clustering using a lasso-type penalty depends on the selection of the penalty parameters and the threshold value. In practice, they are difficult to determine. Recently, randomized algorithms have received a great deal of attention in big data analysis. This paper presents a randomized algorithm for accurate feature selection in image clustering analysis. The proposed method is applied to both the liver and kidney cancer histology image data from the TCGA database. The results demonstrate that the randomized feature selection method coupled with functional principal component analysis substantially outperforms the current sparse clustering algorithms in image cluster analysis. PMID:26196383
An approximation polynomial-time algorithm for a sequence bi-clustering problem
NASA Astrophysics Data System (ADS)
Kel'manov, A. V.; Khamidullin, S. A.
2015-06-01
We consider a strongly NP-hard problem of partitioning a finite sequence of vectors in Euclidean space into two clusters using the criterion of the minimal sum of the squared distances from the elements of the clusters to the centers of the clusters. The center of one of the clusters is to be optimized and is determined as the mean value over all vectors in this cluster. The center of the other cluster is fixed at the origin. Moreover, the partition is such that the difference between the indices of two successive vectors in the first cluster is bounded above and below by prescribed constants. A 2-approximation polynomial-time algorithm is proposed for this problem.
Uy, D.L.
1996-02-01
An algorithm for detection and identification of image clusters or "blobs" based on color information for an autonomous mobile robot is developed. The input image data are first processed using a crisp color fuzzifier, a binary smoothing filter, and a median filter. The processed image data are then input to the image cluster detection and identification program. The program employs the concept of an "elastic rectangle" that stretches in such a way that the whole blob is finally enclosed in a rectangle. A C program is developed to test the algorithm. The algorithm is tested only on image data of size 8x8 with different numbers of blobs in them. The algorithm works very well in detecting and identifying image clusters.
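The "elastic rectangle" idea can be illustrated with a small sketch. The record does not give the actual procedure (which was written in C), so the following Python is an assumption-laden illustration only: it assumes a binary image, 4-connectivity, and a breadth-first traversal, and the function name `blob_rectangles` is hypothetical.

```python
from collections import deque


def blob_rectangles(image):
    """Return one bounding rectangle (top, left, bottom, right) per blob.

    `image` is a list of rows of 0/1 values. Pixels with value 1 that touch
    horizontally or vertically are assumed to belong to the same blob. The
    rectangle starts at a seed pixel and is stretched every time a newly
    visited blob pixel falls outside it.
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    rectangles = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            top, left, bottom, right = r, c, r, c  # degenerate rectangle at the seed
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                # Stretch the rectangle so that it encloses the current pixel.
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and image[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            rectangles.append((top, left, bottom, right))
    return rectangles
```

On an 8x8 image the function returns one rectangle per blob, in row-major order of each blob's first pixel.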
A fast density-based clustering algorithm for real-time Internet of Things stream.
Amini, Amineh; Saboohi, Hadi; Wah, Teh Ying; Herawan, Tutut
2014-01-01
Data streams are continuously generated over time from Internet of Things (IoT) devices. The faster all of this data is analyzed, its hidden trends and patterns discovered, and new strategies created, the faster action can be taken, creating greater value for organizations. Density-based methods are a prominent class in clustering data streams. They can detect arbitrarily shaped clusters and handle outliers, and they do not need the number of clusters in advance. Therefore, a density-based clustering algorithm is a proper choice for clustering IoT streams. Recently, several density-based algorithms have been proposed for clustering data streams. However, density-based clustering in limited time is still a challenging issue. In this paper, we propose a density-based clustering algorithm for IoT streams. The method has a fast processing time, making it applicable in real-time applications of IoT devices. Experimental results show that the proposed approach obtains high quality results with low computation time on real and synthetic datasets.
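To make the properties listed in this abstract concrete (arbitrary cluster shapes, outlier handling, no preset number of clusters), here is a minimal batch DBSCAN sketch. It is not the streaming algorithm proposed in the paper, whose details the record does not give; DBSCAN is swapped in as a well-known density-based method, and the parameter names `eps` and `min_pts` follow common usage rather than the source.

```python
def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise.

    A point is a core point if at least `min_pts` points (itself included)
    lie within distance `eps`. Clusters grow outward from core points.
    """
    def neighbors(i):
        return [
            j for j in range(len(points))
            if sum((a - b) ** 2 for a, b in zip(points[i], points[j])) <= eps * eps
        ]

    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisionally noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        stack = [j for j in seeds if j != i]
        while stack:
            j = stack.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:
                stack.extend(reachable)  # j is a core point, so keep expanding
    return labels
```

The number of clusters is never passed in; it emerges from `eps` and `min_pts`, and isolated points end up labeled -1.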
A Fast Density-Based Clustering Algorithm for Real-Time Internet of Things Stream
Ying Wah, Teh
2014-01-01
Data streams are continuously generated over time from Internet of Things (IoT) devices. The faster all of this data is analyzed, its hidden trends and patterns discovered, and new strategies created, the faster action can be taken, creating greater value for organizations. Density-based methods are a prominent class in clustering data streams. They can detect arbitrarily shaped clusters and handle outliers, and they do not need the number of clusters in advance. Therefore, a density-based clustering algorithm is a proper choice for clustering IoT streams. Recently, several density-based algorithms have been proposed for clustering data streams. However, density-based clustering in limited time is still a challenging issue. In this paper, we propose a density-based clustering algorithm for IoT streams. The method has a fast processing time, making it applicable in real-time applications of IoT devices. Experimental results show that the proposed approach obtains high quality results with low computation time on real and synthetic datasets. PMID:25110753
A Community Detection Algorithm Based on Topology Potential and Spectral Clustering
Wang, Zhixiao; Chen, Zhaotong; Zhao, Ya; Chen, Shaoda
2014-01-01
Community detection is of great value for complex networks in understanding their inherent law and predicting their behavior. Spectral clustering algorithms have been successfully applied in community detection. Methods of this kind have two inadequacies: one is that the input matrices they use cannot provide sufficient structural information for community detection and the other is that they cannot necessarily derive the proper community number from the ladder distribution of eigenvector elements. In order to solve these problems, this paper puts forward a novel community detection algorithm based on topology potential and spectral clustering. The new algorithm constructs the normalized Laplacian matrix with nodes' topology potential, which contains rich structural information of the network. In addition, the new algorithm can automatically get the optimal community number from the local maximum potential nodes. Experimental results showed that the new algorithm gave excellent performance on artificial networks and real-world networks and outperformed other community detection methods. PMID:25147846
NASA Astrophysics Data System (ADS)
Ball, R. C.; Lee, J. R.
1996-03-01
We prove that a new, irreversible growth algorithm, Non-Deletion Reaction-Limited Cluster-cluster Aggregation (NDRLCA), produces equilibrium Branched Polymers, expected to exhibit Lattice Animal statistics [1]. We implement NDRLCA, off-lattice, as a computer simulation for embedding dimension d=2 and 3, obtaining values for critical exponents, fractal dimension D and cluster mass distribution exponent tau: for d=2, D ≈ 1.53 ± 0.05 and tau = 1.09 ± 0.06; for d=3, D = 1.96 ± 0.04 and tau = 1.50 ± 0.04, in good agreement with theoretical LA values. The simulation results do not support recent suggestions [2] that BPs may be in the same universality class as percolation. We also obtain values for a model-dependent critical “fugacity”, z_c, and investigate the finite-size effects of our simulation, quantifying notions of “inbreeding” that occur in this algorithm. Finally we use an extension of the NDRLCA proof to show that standard Reaction-Limited Cluster-cluster Aggregation is very unlikely to be in the same universality class as Branched Polymers/Lattice Animals unless the backbone dimension for the latter is considerably less than the published value.
Fuzzy-rough supervised attribute clustering algorithm and classification of microarray data.
Maji, Pradipta
2011-02-01
One of the major tasks with gene expression data is to find groups of coregulated genes whose collective expression is strongly associated with sample categories. In this regard, a new clustering algorithm, termed as fuzzy-rough supervised attribute clustering (FRSAC), is proposed to find such groups of genes. The proposed algorithm is based on the theory of fuzzy-rough sets, which directly incorporates the information of sample categories into the gene clustering process. A new quantitative measure is introduced based on fuzzy-rough sets that incorporates the information of sample categories to measure the similarity among genes. The proposed algorithm is based on measuring the similarity between genes using the new quantitative measure, whereby redundancy among the genes is removed. The clusters are refined incrementally based on sample categories. The effectiveness of the proposed FRSAC algorithm, along with a comparison with existing supervised and unsupervised gene selection and clustering algorithms, is demonstrated on six cancer and two arthritis data sets based on the class separability index and predictive accuracy of the naive Bayes' classifier, the K-nearest neighbor rule, and the support vector machine. PMID:20542768
Two generalizations of Kohonen clustering
NASA Technical Reports Server (NTRS)
Bezdek, James C.; Pal, Nikhil R.; Tsao, Eric C. K.
1993-01-01
The relationship between the sequential hard c-means (SHCM), learning vector quantization (LVQ), and fuzzy c-means (FCM) clustering algorithms is discussed. LVQ and SHCM suffer from several major problems. For example, they depend heavily on initialization. If the initial values of the cluster centers are outside the convex hull of the input data, such algorithms, even if they terminate, may not produce meaningful results in terms of prototypes for cluster representation. This is due in part to the fact that they update only the winning prototype for every input vector. The impact and interaction of these two families with Kohonen's self-organizing feature mapping (SOFM), which is not a clustering method but which often lends ideas to clustering algorithms, is discussed. Then two generalizations of LVQ that are explicitly designed as clustering algorithms are presented; these algorithms are referred to as generalized LVQ = GLVQ; and fuzzy LVQ = FLVQ. Learning rules are derived to optimize an objective function whose goal is to produce 'good clusters'. GLVQ/FLVQ (may) update every node in the clustering net for each input vector. Neither GLVQ nor FLVQ depends upon a choice for the update neighborhood or learning rate distribution - these are taken care of automatically. Segmentation of a gray tone image is used as a typical application of these algorithms to illustrate the performance of GLVQ/FLVQ.
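The initialization problem mentioned in this abstract follows directly from the winner-only update. The sketch below shows a generic winner-take-all step of the kind used by sequential hard c-means and LVQ-style clustering. The function name `winner_update` and the fixed learning rate are assumptions for illustration, and the GLVQ/FLVQ learning rules themselves are not reproduced because the record does not state them.

```python
def winner_update(prototypes, x, learning_rate):
    """Move only the nearest prototype toward input x (in place).

    Returns the index of the winning prototype. All other prototypes are
    left untouched, which is the behavior the abstract identifies as a
    source of sensitivity to initialization.
    """
    def sq_dist(p):
        return sum((pi - xi) ** 2 for pi, xi in zip(p, x))

    winner = min(range(len(prototypes)), key=lambda i: sq_dist(prototypes[i]))
    prototypes[winner] = [
        pi + learning_rate * (xi - pi) for pi, xi in zip(prototypes[winner], x)
    ]
    return winner


if __name__ == "__main__":
    data = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    # The second prototype starts far outside the convex hull of the data.
    prototypes = [[0.0, 0.0], [100.0, 100.0]]
    for _ in range(10):
        for x in data:
            winner_update(prototypes, x, learning_rate=0.1)
    print(prototypes[1])  # still at its initial position: it never wins
```

Because the far-away prototype is never the nearest one, it is never updated and ends up representing no data, which is the failure mode that updating every node is meant to avoid.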
A New-Fangled FES-k-Means Clustering Algorithm for Disease Discovery and Visual Analytics.
Oyana, Tonny J
2010-01-01
The central purpose of this study is to further evaluate the quality of the performance of a new algorithm. The study provides additional evidence on this algorithm, which was designed to increase the overall efficiency of the original k-means clustering technique: the Fast, Efficient, and Scalable k-means algorithm (FES-k-means). The FES-k-means algorithm uses a hybrid approach that comprises the k-d tree data structure that enhances the nearest neighbor query, the original k-means algorithm, and an adaptation rate proposed by Mashor. This algorithm was tested using two real datasets and one synthetic dataset. It was employed twice on all three datasets: once on data trained by the innovative MIL-SOM method and then on the actual untrained data in order to evaluate its competence. This two-step approach of data training prior to clustering provides a solid foundation for knowledge discovery and data mining, otherwise unclaimed by clustering methods alone. The benefits of this method are that it produces clusters similar to the original k-means method at a much faster rate as shown by runtime comparison data; and it provides efficient analysis of large geospatial data with implications for disease mechanism discovery. From a disease mechanism discovery perspective, it is hypothesized that the linear-like pattern of elevated blood lead levels discovered in the city of Chicago may be spatially linked to the city's water service lines.
Statistical physics based heuristic clustering algorithms with an application to econophysics
NASA Astrophysics Data System (ADS)
Baldwin, Lucia Liliana
Three new approaches to the clustering of data sets are presented. They are heuristic methods and represent forms of unsupervised (non-parametric) clustering. Applied to an unknown set of data these methods automatically determine the number of clusters and their location using no a priori assumptions. All are based on analogies with different physical phenomena. The first technique, named the Percolation Clustering Algorithm, embodies a novel variation on the nearest-neighbor algorithm focusing on the connectivity between sample points. Exploiting the equivalence with a percolation process, this algorithm considers data points to be surrounded by expanding hyperspheres, which bond when they touch each other. Once a sequence of joined spheres spans an entire cluster, percolation occurs and the cluster size remains constant until it merges with a neighboring cluster. The second procedure, named Nucleation and Growth Clustering, exploits the analogy with nucleation and growth which occurs in island formation during epitaxial growth of solids. The original data points are nucleation centers, around which aggregation will occur. Additional "ad-data" that are introduced into the sample space interact with the data points and stick if located within a threshold distance. These "ad-data" are used as a tool to facilitate the detection of clusters. The third method, named Discrete Deposition Clustering Algorithm, constrains deposition to occur on a grid, which has the advantage of computational efficiency as opposed to the continuous deposition used in the previous method. The original data form the vertices of a sparse graph and the deposition sites are defined to be the middle points of this graph's edges. Ad-data are introduced on the deposition sites and the system is allowed to evolve in a self-organizing regime. This allows the simulation of a phase transition and by monitoring the specific heat capacity of the system one can mark out a "natural" criterion for
An efficient clustering algorithm for partitioning Y-short tandem repeats data
2012-01-01
Background: Y-Short Tandem Repeats (Y-STR) data consist of many similar and almost similar objects. This characteristic of Y-STR data causes two problems with partitioning: non-unique centroids and local minima problems. As a result, the existing partitioning algorithms produce poor clustering results. Results: Our new algorithm, called k-Approximate Modal Haplotypes (k-AMH), obtains the highest clustering accuracy scores for five out of six datasets, and produces an equal performance for the remaining dataset. Furthermore, clustering accuracy scores of 100% are achieved for two of the datasets. The k-AMH algorithm records the highest mean accuracy score of 0.93 overall, compared to that of other algorithms: k-Population (0.91), k-Modes-RVF (0.81), New Fuzzy k-Modes (0.80), k-Modes (0.76), k-Modes-Hybrid 1 (0.76), k-Modes-Hybrid 2 (0.75), Fuzzy k-Modes (0.74), and k-Modes-UAVM (0.70). Conclusions: The partitioning performance of the k-AMH algorithm for Y-STR data is superior to that of other algorithms, owing to its ability to solve the non-unique centroids and local minima problems. Our algorithm is also efficient in terms of time complexity, which is recorded as O(km(n-k)) and considered to be linear. PMID:23039132
Fuzzy C-means classification for corrosion evolution of steel images
NASA Astrophysics Data System (ADS)
Trujillo, Maite; Sadki, Mustapha
2004-05-01
An unavoidable problem of metal structures is their exposure to rust degradation during their operational life. Thus, the surfaces need to be assessed in order to avoid potential catastrophes. There is considerable interest in the use of patch repair strategies which minimize the project costs. However, to operate such strategies with confidence in the long useful life of the repair, it is essential that the condition of the existing coatings and the steel substrate can be accurately quantified and classified. This paper describes the application of fuzzy set theory to the classification of steel surfaces according to the steel rust time. We propose a semi-automatic technique to obtain image clustering using the Fuzzy C-means (FCM) algorithm and we analyze two kinds of data to study the classification performance. Firstly, we investigate the use of raw image pixels without any pre-processing methods and neighborhood pixels. Secondly, we apply Gaussian noise to the images with different standard deviations to study the tolerance of the FCM method to Gaussian noise. The noisy images simulate the possible perturbations of the images due to the weather or rust deposits on the steel surfaces during typical on-site acquisition procedures.
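The standard FCM iteration applied to raw pixel intensities, and the kind of Gaussian-noise perturbation described above, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' pipeline: pixels are treated as a flat list of gray levels, the fuzzifier is m = 2, centers are initialized evenly across the intensity range, and the names `fcm_1d`, `fcm_memberships` and `add_gaussian_noise` are hypothetical.

```python
import random


def fcm_memberships(values, centers, m):
    """Standard FCM membership: u_ik = 1 / sum_j (d_ik / d_jk)^(2/(m-1))."""
    exponent = 2.0 / (m - 1.0)
    rows = []
    for x in values:
        dists = [abs(x - v) for v in centers]
        if min(dists) == 0.0:
            # x coincides with a center: assign it fully to that cluster.
            hit = dists.index(0.0)
            rows.append([1.0 if i == hit else 0.0 for i in range(len(centers))])
        else:
            rows.append(
                [1.0 / sum((d_i / d_j) ** exponent for d_j in dists) for d_i in dists]
            )
    return rows


def fcm_1d(values, c=2, m=2.0, iterations=50):
    """Fuzzy c-means on scalar intensities; assumes the values are not all equal."""
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * (i + 0.5) / c for i in range(c)]
    for _ in range(iterations):
        u = fcm_memberships(values, centers, m)
        # Each center is the mean of the data weighted by membership^m.
        centers = [
            sum(row[i] ** m * x for row, x in zip(u, values))
            / sum(row[i] ** m for row in u)
            for i in range(c)
        ]
    return centers, fcm_memberships(values, centers, m)


def add_gaussian_noise(values, sigma, seed=0):
    """Perturb each intensity with zero-mean Gaussian noise of std `sigma`."""
    rng = random.Random(seed)
    return [x + rng.gauss(0.0, sigma) for x in values]
```

Running `fcm_1d` on the clean values and again on `add_gaussian_noise(values, sigma)` for increasing `sigma` gives a simple way to see how far the centers and memberships drift as noise grows.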
Clustering WHO-ART terms using semantic distance and machine learning algorithms.
Iavindrasana, Jimison; Bousquet, Cedric; Degoulet, Patrice; Jaulent, Marie-Christine
2006-01-01
WHO-ART was developed by the WHO collaborating centre for international drug monitoring in order to code adverse drug reactions. We assume that computation of semantic distance between WHO-ART terms may be an efficient way to group related medical conditions in the WHO database in order to improve signal detection. Our objective was to develop a method for clustering WHO-ART terms according to the proximity of their meanings. Our material comprises 758 WHO-ART terms. A formal definition was acquired for each term as a list of elementary concepts belonging to SNOMED international axes and characterized by modifier terms in some cases. Clustering was implemented as a terminology service on a J2EE server. Two different unsupervised machine learning algorithms (K-Means, Pvclust) clustered WHO-ART terms according to a semantic distance operator previously described. Pvclust grouped 51% of WHO-ART terms. K-Means grouped 100% of WHO-ART terms but 25% of clusters were heterogeneous with k = 180 clusters and 6% of clusters were heterogeneous with k = 32 clusters. Clustering algorithms associated with semantic distance could suggest potential groupings of WHO-ART terms that need validation according to the user's requirements.
Plot enchaining algorithm: a novel approach for clustering flocks of birds
NASA Astrophysics Data System (ADS)
Büyükaksoy Kaplan, Gülay; Lana, Adnan
2014-06-01
In this study, an intuitive way of tracking flocks of birds is proposed and compared to a simple cluster-seeking algorithm on real radar observations. For a group of targets such as a flock of birds, there is no need to track each target individually. Instead, a cluster can be used to represent closely spaced tracks of a possible group. Considering a group of targets as a single target for tracking provides significant performance improvement with almost no loss of information.
A Game Theory Algorithm for Intra-Cluster Data Aggregation in a Vehicular Ad Hoc Network.
Chen, Yuzhong; Weng, Shining; Guo, Wenzhong; Xiong, Naixue
2016-01-01
Vehicular ad hoc networks (VANETs) have an important role in urban management and planning. The effective integration of vehicle information in VANETs is critical to traffic analysis, large-scale vehicle route planning and intelligent transportation scheduling. However, given the limitations in the precision of the output information of a single sensor and the difficulty of information sharing among various sensors in a highly dynamic VANET, effectively performing data aggregation in VANETs remains a challenge. Moreover, current studies have mainly focused on data aggregation in large-scale environments but have rarely discussed the issue of intra-cluster data aggregation in VANETs. In this study, we propose a multi-player game theory algorithm for intra-cluster data aggregation in VANETs by analyzing the competitive and cooperative relationships among sensor nodes. Several sensor-centric metrics are proposed to measure the data redundancy and stability of a cluster. We then study the utility function to achieve efficient intra-cluster data aggregation by considering both data redundancy and cluster stability. In particular, we prove the existence of a unique Nash equilibrium in the game model, and conduct extensive experiments to validate the proposed algorithm. Results demonstrate that the proposed algorithm has advantages over typical data aggregation algorithms in both accuracy and efficiency. PMID:26907272
A Game Theory Algorithm for Intra-Cluster Data Aggregation in a Vehicular Ad Hoc Network.
Chen, Yuzhong; Weng, Shining; Guo, Wenzhong; Xiong, Naixue
2016-02-19
Vehicular ad hoc networks (VANETs) have an important role in urban management and planning. The effective integration of vehicle information in VANETs is critical to traffic analysis, large-scale vehicle route planning and intelligent transportation scheduling. However, given the limitations in the precision of the output information of a single sensor and the difficulty of information sharing among various sensors in a highly dynamic VANET, effectively performing data aggregation in VANETs remains a challenge. Moreover, current studies have mainly focused on data aggregation in large-scale environments but have rarely discussed the issue of intra-cluster data aggregation in VANETs. In this study, we propose a multi-player game theory algorithm for intra-cluster data aggregation in VANETs by analyzing the competitive and cooperative relationships among sensor nodes. Several sensor-centric metrics are proposed to measure the data redundancy and stability of a cluster. We then study the utility function to achieve efficient intra-cluster data aggregation by considering both data redundancy and cluster stability. In particular, we prove the existence of a unique Nash equilibrium in the game model, and conduct extensive experiments to validate the proposed algorithm. Results demonstrate that the proposed algorithm has advantages over typical data aggregation algorithms in both accuracy and efficiency.
Node Non-Uniform Deployment Based on Clustering Algorithm for Underwater Sensor Networks.
Jiang, Peng; Liu, Jun; Wu, Feng
2015-12-01
A node non-uniform deployment based on clustering algorithm for underwater sensor networks (UWSNs) is proposed in this study. This algorithm is proposed because optimizing network connectivity rate and network lifetime is difficult for the existing node non-uniform deployment algorithms under the premise of improving the network coverage rate for UWSNs. A high network connectivity rate is achieved by determining the heterogeneous communication ranges of nodes during node clustering. Moreover, the concept of aggregate contribution degree is defined, and the nodes with lower aggregate contribution degrees are used to substitute the dying nodes to decrease the total movement distance of nodes and prolong the network lifetime. Simulation results show that the proposed algorithm can achieve a better network coverage rate and network connectivity rate, as well as decrease the total movement distance of nodes and prolong the network lifetime.
Node Non-Uniform Deployment Based on Clustering Algorithm for Underwater Sensor Networks.
Jiang, Peng; Liu, Jun; Wu, Feng
2015-01-01
A node non-uniform deployment based on clustering algorithm for underwater sensor networks (UWSNs) is proposed in this study. This algorithm is proposed because optimizing network connectivity rate and network lifetime is difficult for the existing node non-uniform deployment algorithms under the premise of improving the network coverage rate for UWSNs. A high network connectivity rate is achieved by determining the heterogeneous communication ranges of nodes during node clustering. Moreover, the concept of aggregate contribution degree is defined, and the nodes with lower aggregate contribution degrees are used to substitute the dying nodes to decrease the total movement distance of nodes and prolong the network lifetime. Simulation results show that the proposed algorithm can achieve a better network coverage rate and network connectivity rate, as well as decrease the total movement distance of nodes and prolong the network lifetime. PMID:26633408
The Development of FPGA-Based Pseudo-Iterative Clustering Algorithms
NASA Astrophysics Data System (ADS)
Drueke, Elizabeth; Fisher, Wade; Plucinski, Pawel
2016-03-01
The Large Hadron Collider (LHC) in Geneva, Switzerland, is set to undergo major upgrades in 2025 in the form of the High-Luminosity Large Hadron Collider (HL-LHC). In particular, several hardware upgrades are proposed to the ATLAS detector, one of the two general purpose detectors. These hardware upgrades include, but are not limited to, a new hardware-level clustering algorithm, to be performed by a field programmable gate array, or FPGA. In this study, we develop that clustering algorithm and compare the output to a Python-implemented topoclustering algorithm developed at the University of Oregon. Here, we present the agreement between the FPGA output and expected output, with particular attention to the time required by the FPGA to complete the algorithm and other limitations set by the FPGA itself.
An Efficient Algorithm for Clustering of Large-Scale Mass Spectrometry Data
Saeed, Fahad; Pisitkun, Trairak; Knepper, Mark A.; Hoffert, Jason D.
2012-01-01
High-throughput spectrometers are capable of producing data sets containing thousands of spectra for a single biological sample. These data sets contain a substantial amount of redundancy from peptides that may get selected multiple times in a LC-MS/MS experiment. In this paper, we present an efficient algorithm, CAMS (Clustering Algorithm for Mass Spectra) for clustering mass spectrometry data which increases both the sensitivity and confidence of spectral assignment. CAMS utilizes a novel metric, called F-set, that allows accurate identification of the spectra that are similar. A graph theoretic framework is defined that allows the use of F-set metric efficiently for accurate cluster identifications. The accuracy of the algorithm is tested on real HCD and CID data sets with varying amounts of peptides. Our experiments show that the proposed algorithm is able to cluster spectra with very high accuracy in a reasonable amount of time for large spectral data sets. Thus, the algorithm is able to decrease the computational time by compressing the data sets while increasing the throughput of the data by interpreting low S/N spectra. PMID:23471471
An efficient method of key-frame extraction based on a cluster algorithm.
Zhang, Qiang; Yu, Shao-Pei; Zhou, Dong-Sheng; Wei, Xiao-Peng
2013-12-18
This paper proposes a novel method of key-frame extraction for use with motion capture data. This method is based on an unsupervised cluster algorithm. First, the motion sequence is clustered into two classes by the similarity distance of the adjacent frames so that the thresholds needed in the next step can be determined adaptively. Second, a dynamic cluster algorithm called ISODATA is used to cluster all the frames and the frames nearest to the center of each class are automatically extracted as key-frames of the sequence. Unlike many other clustering techniques, the present improved cluster algorithm can automatically address different motion types without any need for specified parameters from users. The proposed method is capable of summarizing motion capture data reliably and efficiently. The present work also provides a meaningful comparison between the results of the proposed key-frame extraction technique and other previous methods. These results are evaluated in terms of metrics that measure reconstructed motion and the mean absolute error value, which are derived from the reconstructed data and the original data.
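The final selection step described above (take the frame nearest to the center of each class) can be sketched as follows. This is only an illustration under stated assumptions: the cluster labels are taken as given rather than produced by ISODATA, each frame is reduced to a hypothetical numeric feature vector, and the name `key_frames` is not from the paper.

```python
import math

def key_frames(frames, labels):
    """For each cluster label, return the index of the frame nearest to the cluster mean.

    frames: list of equal-length numeric feature vectors (an assumed representation).
    labels: cluster label per frame, assumed to come from a prior clustering step.
    """
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)

    result = {}
    for lab, members in clusters.items():
        dim = len(frames[members[0]])
        # Cluster center = component-wise mean of the member frames.
        centre = [sum(frames[i][d] for i in members) / len(members) for d in range(dim)]
        # Key-frame = member frame with the smallest Euclidean distance to the center.
        result[lab] = min(members, key=lambda i: math.dist(frames[i], centre))
    return result

frames = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0), (11.0, 0.0), (15.0, 0.0)]
labels = [0, 0, 0, 1, 1, 1]
print(key_frames(frames, labels))
```

Returning frame indices rather than the frames themselves keeps the key-frames tied to their position in the motion sequence.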
Quantum cluster algorithm for frustrated Ising models in a transverse field
NASA Astrophysics Data System (ADS)
Biswas, Sounak; Rakala, Geet; Damle, Kedar
2016-06-01
Working within the stochastic series expansion framework, we introduce and characterize a plaquette-based quantum cluster algorithm for quantum Monte Carlo simulations of transverse field Ising models with frustrated Ising exchange interactions. As a demonstration of the capabilities of this algorithm, we show that a relatively small ferromagnetic next-nearest-neighbor coupling drives the transverse field Ising antiferromagnet on the triangular lattice from an antiferromagnetic three-sublattice ordered state at low temperature to a ferrimagnetic three-sublattice ordered state.
An algorithm for point cluster generalization based on the Voronoi diagram
NASA Astrophysics Data System (ADS)
Yan, Haowen; Weibel, Robert
2008-08-01
This paper presents an algorithm for point cluster generalization. Four types of information (statistical, thematic, topological, and metric) are considered, and a measure is selected to describe each type quantitatively: the number of points for statistical information, the importance value for thematic information, the Voronoi neighbors for topological information, and the distribution range and relative local density for metric information. Based on these measures, an algorithm for point cluster generalization is developed. First, the point clusters are triangulated and a border polygon of the point clusters is obtained. Using the border polygon, pseudo points are added to the original point clusters to form a new point set, and a range polygon that encloses all original points is constructed. Second, the Voronoi polygons of the new point set are computed to obtain the relative local density of each point. The selection probability of each point is then computed from its relative local density and importance value, and points to be deleted are marked as 'deleted' according to their selection probabilities and Voronoi neighboring relations. Third, if the number of retained points does not satisfy that computed by the Radical Law, the points marked as 'deleted' are physically deleted to form a new point set and the second step is repeated; otherwise, the pseudo points and the points marked as 'deleted' are physically deleted, and the generalized point clusters are obtained. Owing to the use of the Voronoi diagram, the algorithm is parameter-free and fully automatic. As our experiments show, it can be used in the generalization of point features arranged in clusters, such as thematic dot maps and control points on cartographic maps.
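The abstract does not spell out the Radical Law. In its basic form (Töpfer's Radical Law) the number of objects kept on the target map equals the number on the source map multiplied by the square root of the ratio of the scale denominators. The sketch below assumes this basic form; the paper may use a variant with extra constants, and the function name is hypothetical.

```python
import math

def radical_law_count(n_source, source_scale_denominator, target_scale_denominator):
    """Basic Radical Law: n_target = n_source * sqrt(S_source / S_target).

    The scale denominators are the numbers after '1:' (e.g. 10000 for 1:10,000).
    Rounding to the nearest integer is an assumption made for this illustration.
    """
    ratio = source_scale_denominator / target_scale_denominator
    return round(n_source * math.sqrt(ratio))

# Generalizing 200 points from 1:10,000 to 1:40,000 keeps about half of them.
print(radical_law_count(200, 10_000, 40_000))
```

In the iterative scheme described above, a count of this kind would serve as the stopping target for the number of retained points.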
A seed expanding cluster algorithm for deriving upwelling areas on sea surface temperature images
NASA Astrophysics Data System (ADS)
Nascimento, Susana; Casca, Sérgio; Mirkin, Boris
2015-12-01
In this paper a novel clustering algorithm is proposed as a version of the seeded region growing (SRG) approach for the automatic recognition of coastal upwelling from sea surface temperature (SST) images. The new algorithm, one seed expanding cluster (SEC), takes advantage of the concept of approximate clustering due to Mirkin (1996, 2013) to derive a homogeneity criterion in the format of a product rather than the conventional difference between a pixel value and the mean of values over the region of interest. It involves a boundary-oriented pixel labeling so that the cluster growing is performed by expanding its boundary iteratively. The starting point is a cluster consisting of just one seed, the pixel with the coldest temperature. The baseline version of the SEC algorithm uses Otsu's thresholding method to fine-tune the homogeneity threshold. Unfortunately, this method does not always lead to a satisfactory solution. Therefore, we introduce a self-tuning version of the algorithm in which the homogeneity threshold is locally derived from the approximation criterion over a window around the pixel under consideration. The window serves as a boundary regularizer. These two unsupervised versions of the algorithm have been applied to a set of 28 SST images of the western coast of mainland Portugal, and compared against a supervised version fine-tuned by maximizing the F-measure with respect to manually labeled ground-truth maps. The areas built by the unsupervised versions of the SEC algorithm are significantly coincident over the ground-truth regions in the cases at which the upwelling areas consist of a single continuous fragment of the SST map.
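For orientation, the conventional seeded region growing that the abstract contrasts with SEC can be sketched as below: start from the coldest pixel and add a neighboring pixel when its value is within a threshold of the current region mean. This is the conventional difference criterion only, not the product-form SEC criterion, which the abstract does not define. The grid, the threshold, and the function name are illustrative assumptions.

```python
from collections import deque

def grow_from_coldest(sst, threshold):
    """Grow a 4-connected region from the coldest pixel of a 2-D grid.

    A neighbor joins the region when |value - current region mean| <= threshold
    (the conventional SRG homogeneity test, not the SEC criterion).
    """
    rows, cols = len(sst), len(sst[0])
    seed = min(((r, c) for r in range(rows) for c in range(cols)),
               key=lambda rc: sst[rc[0]][rc[1]])
    region = {seed}
    total = sst[seed[0]][seed[1]]      # running sum, used for the region mean
    frontier = deque([seed])           # boundary pixels still to be expanded
    while frontier:
        r, c = frontier.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in region:
                mean = total / len(region)
                if abs(sst[nr][nc] - mean) <= threshold:
                    region.add((nr, nc))
                    total += sst[nr][nc]
                    frontier.append((nr, nc))
    return region

sst = [[14.0, 14.5, 18.0],
       [14.2, 17.9, 18.2],
       [18.1, 18.3, 18.4]]
print(sorted(grow_from_coldest(sst, threshold=1.0)))
```

The fixed global threshold here is the part that the baseline SEC tunes with Otsu's method and the self-tuning SEC replaces with a locally derived value.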
A Clustering Algorithm for Liver Lesion Segmentation of Diffusion-Weighted MR Images
Jha, Abhinav K.; Rodríguez, Jeffrey J.; Stephen, Renu M.; Stopeck, Alison T.
2010-01-01
In diffusion-weighted magnetic resonance imaging, accurate segmentation of liver lesions in the diffusion-weighted images is required for computation of the apparent diffusion coefficient (ADC) of the lesion, the parameter that serves as an indicator of lesion response to therapy. However, the segmentation problem is challenging due to low SNR, fuzzy boundaries and speckle and motion artifacts. We propose a clustering algorithm that incorporates spatial information and a geometric constraint to solve this issue. We show that our algorithm provides improved accuracy compared to existing segmentation algorithms. PMID:21151837
Lee, Chongdeuk; Jeong, Taegwon
2011-01-01
Clustering is an important mechanism that efficiently provides information for mobile nodes and improves the processing capacity of routing, bandwidth allocation, and resource management and sharing. Clustering algorithms can be based on such criteria as the battery power of nodes, mobility, network size, distance, speed and direction. Above all, in order to achieve good clustering performance, overhead should be minimized, allowing mobile nodes to join and leave without perturbing the membership of the cluster while preserving current cluster structure as much as possible. This paper proposes a Fuzzy Relevance-based Cluster head selection Algorithm (FRCA) to solve problems found in existing wireless mobile ad hoc sensor networks, such as the node distribution found in dynamic properties due to mobility and flat structures and disturbance of the cluster formation. The proposed mechanism uses fuzzy relevance to select the cluster head for clustering in wireless mobile ad hoc sensor networks. In the simulation implemented on the NS-2 simulator, the proposed FRCA is compared with algorithms such as the Cluster-based Routing Protocol (CBRP), the Weighted-based Adaptive Clustering Algorithm (WACA), and the Scenario-based Clustering Algorithm for Mobile ad hoc networks (SCAM). The simulation results showed that the proposed FRCA achieves better performance than that of the other existing mechanisms. PMID:22163905
Tame, M. S.; Kim, M. S.
2010-09-15
We show that fundamental versions of the Deutsch-Jozsa and Bernstein-Vazirani quantum algorithms can be performed using a small entangled cluster state resource of only six qubits. We then investigate the minimal resource states needed to demonstrate general n-qubit versions and a scalable method to produce them. For this purpose, we propose a versatile photonic on-chip setup.
NEW MDS AND CLUSTERING BASED ALGORITHMS FOR PROTEIN MODEL QUALITY ASSESSMENT AND SELECTION.
Wang, Qingguo; Shang, Charles; Xu, Dong; Shang, Yi
2013-10-25
In protein tertiary structure prediction, assessing the quality of predicted models is an essential task. Over the past years, many methods have been proposed for the protein model quality assessment (QA) and selection problem. Despite significant advances, the discerning power of current methods is still unsatisfactory. In this paper, we propose two new algorithms, CC-Select and MDS-QA, based on multidimensional scaling and k-means clustering. For the model selection problem, CC-Select combines consensus with clustering techniques to select the best models from a given pool. Given a set of predicted models, CC-Select first calculates a consensus score for each structure based on its average pairwise structural similarity to other models. Then, similar structures are grouped into clusters using multidimensional scaling and clustering algorithms. In each cluster, the one with the highest consensus score is selected as a candidate model. For the QA problem, MDS-QA combines single-model scoring functions with consensus to determine more accurate assessment score for every model in a given pool. Using extensive benchmark sets of a large collection of predicted models, we compare the two algorithms with existing state-of-the-art quality assessment methods and show significant improvement. PMID:24808625
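The consensus-then-cluster selection described for CC-Select can be sketched roughly as follows. The sketch assumes a precomputed symmetric matrix of pairwise structural similarities and cluster labels that are already available (the paper obtains them via multidimensional scaling and clustering); the function names and the toy numbers are hypothetical.

```python
def consensus_scores(similarity):
    """Consensus score of each model = its average similarity to all other models."""
    n = len(similarity)
    return [sum(similarity[i][j] for j in range(n) if j != i) / (n - 1)
            for i in range(n)]

def pick_candidates(similarity, labels):
    """Within each cluster, select the model with the highest consensus score."""
    scores = consensus_scores(similarity)
    best = {}
    for idx, lab in enumerate(labels):
        if lab not in best or scores[idx] > scores[best[lab]]:
            best[lab] = idx
    return best

# Toy pairwise similarity matrix for four predicted models (made-up values).
sim = [[1.0, 0.9, 0.8, 0.2],
       [0.9, 1.0, 0.6, 0.3],
       [0.8, 0.6, 1.0, 0.4],
       [0.2, 0.3, 0.4, 1.0]]
print(pick_candidates(sim, labels=[0, 0, 1, 1]))
```

Selecting one candidate per cluster, rather than the top-scoring models overall, keeps structurally distinct alternatives in the candidate set.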
An effective trust-based recommendation method using a novel graph clustering algorithm
NASA Astrophysics Data System (ADS)
Moradi, Parham; Ahmadian, Sajad; Akhlaghian, Fardin
2015-10-01
Recommender systems are programs that aim to provide personalized recommendations to users for specific items (e.g. music, books) in online sharing communities or on e-commerce sites. Collaborative filtering methods are important and widely accepted types of recommender systems that generate recommendations based on the ratings of like-minded users. On the other hand, these systems confront several inherent issues such as data sparsity and cold start problems, caused by fewer ratings against the unknowns that need to be predicted. Incorporating trust information into the collaborative filtering systems is an attractive approach to resolve these problems. In this paper, we present a model-based collaborative filtering method by applying a novel graph clustering algorithm and also considering trust statements. In the proposed method first of all, the problem space is represented as a graph and then a sparsest subgraph finding algorithm is applied on the graph to find the initial cluster centers. Then, the proposed graph clustering algorithm is performed to obtain the appropriate users/items clusters. Finally, the identified clusters are used as a set of neighbors to recommend unseen items to the current active user. Experimental results based on three real-world datasets demonstrate that the proposed method outperforms several state-of-the-art recommender system methods.
BMI optimization by using parallel UNDX real-coded genetic algorithm with Beowulf cluster
NASA Astrophysics Data System (ADS)
Handa, Masaya; Kawanishi, Michihiro; Kanki, Hiroshi
2007-12-01
This paper deals with a global optimization algorithm for Bilinear Matrix Inequalities (BMIs) based on the Unimodal Normal Distribution Crossover (UNDX) GA. First, by analyzing the structure of the BMIs, the existence of typical difficult structures is confirmed. Then, to improve the performance of the algorithm, and based on the results of the problem structure analysis and consideration of the characteristic properties of BMIs, we propose an algorithm that uses a primary search direction with relaxed Linear Matrix Inequality (LMI) convex estimation. Moreover, for these algorithms we propose two types of evaluation methods for GA individuals based on LMI calculation that take the characteristic properties of BMIs further into account. In addition, to reduce computational time, we propose a parallelization of the RCGA algorithm using the Master-Worker paradigm with cluster computing techniques.
NASA Astrophysics Data System (ADS)
Bevilacqua, A.; Campanini, R.; Lanconelli, N.
We have developed a method for the detection of clusters of microcalcifications in digital mammograms. Here, we present a genetic algorithm used to optimize the choice of the parameters in the detection scheme. The optimization has allowed the improvement of the performance, the detailed study of the influence of the various parameters on the performance and an accurate investigation of the behavior of the detection method on unknown cases. We reach a sensitivity of 96.2% with 0.7 false positive clusters per image on the Nijmegen database; we are also able to identify the most significant parameters. In addition, we have examined the feasibility of a distributed genetic algorithm implemented on a non-dedicated Cluster Of Workstations. We get very good results both in terms of quality and efficiency.
Dong, Feng; Pierpaoli, Elena; Gunn, James E.; Wechsler, Risa H.
2007-10-29
We present a modified adaptive matched filter algorithm designed to identify clusters of galaxies in wide-field imaging surveys such as the Sloan Digital Sky Survey. The cluster-finding technique is fully adaptive to imaging surveys with spectroscopic coverage, multicolor photometric redshifts, no redshift information at all, and any combination of these within one survey. It works with high efficiency in multi-band imaging surveys where photometric redshifts can be estimated with well-understood error distributions. Tests of the algorithm on realistic mock SDSS catalogs suggest that the detected sample is $\sim 85\%$ complete and over 90% pure for clusters with masses above $1.0 \times 10^{14}\,h^{-1}\,M_\odot$ and redshifts up to $z = 0.45$. The errors of estimated cluster redshifts from the maximum likelihood method are shown to be small (typically less than 0.01) over the whole redshift range with photometric redshift errors typical of those found in the Sloan survey. Inside the spherical radius corresponding to a galaxy overdensity of $\Delta = 200$, we find the derived cluster richness $\Lambda_{200}$ to be a roughly linear indicator of its virial mass $M_{200}$, which well recovers the relation between total luminosity and cluster mass of the input simulation.
A clustering algorithm for sample data based on environmental pollution characteristics
NASA Astrophysics Data System (ADS)
Chen, Mei; Wang, Pengfei; Chen, Qiang; Wu, Jiadong; Chen, Xiaoyun
2015-04-01
Environmental pollution has become an issue of serious international concern in recent years. Among the receptor-oriented pollution models, CMB, PMF, UNMIX, and PCA are widely used as source apportionment models. To improve the accuracy of source apportionment and classify the sample data for these models, this study proposes an easy-to-use, high-dimensional EPC algorithm that not only organizes all of the sample data into different groups according to the similarities in pollution characteristics such as pollution sources and concentrations but also simultaneously detects outliers. The main clustering process consists of selecting the first unlabelled point as the cluster centre, then assigning each data point in the sample dataset to its most similar cluster centre according to both the user-defined threshold and the value of similarity function in each iteration, and finally modifying the clusters using a method similar to k-Means. The validity and accuracy of the algorithm are tested using both real and synthetic datasets, which makes the EPC algorithm practical and effective for appropriately classifying sample data for source apportionment models and helpful for better understanding and interpreting the sources of pollution.
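The three stages named in the abstract (seed a cluster with the first unlabelled point, assign points to the most similar center subject to a threshold, then refine in a k-means-like way) resemble a leader-style clustering pass. The sketch below is a loose illustration of that idea, not the EPC algorithm itself: it assumes Euclidean distance in place of the paper's similarity function, performs a single refinement step, and omits outlier detection. All names are hypothetical.

```python
import math

def threshold_cluster(points, threshold):
    """Leader-style clustering followed by one k-means-like center update.

    A point joins its nearest existing center if that center lies within
    `threshold`; otherwise the point becomes the center of a new cluster.
    """
    centres, labels = [], []
    for p in points:
        if centres:
            k = min(range(len(centres)), key=lambda i: math.dist(p, centres[i]))
            if math.dist(p, centres[k]) <= threshold:
                labels.append(k)
                continue
        centres.append(p)                  # first unlabelled point seeds a cluster
        labels.append(len(centres) - 1)

    # Refinement: move each center to the mean of the points assigned to it.
    refined = []
    for k in range(len(centres)):
        members = [p for p, lab in zip(points, labels) if lab == k]
        refined.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
    return labels, refined

points = [(0.0, 0.0), (0.5, 0.0), (10.0, 10.0), (10.0, 11.0), (0.0, 1.0)]
labels, centres = threshold_cluster(points, threshold=2.0)
print(labels, centres)
```

Because the number of clusters follows from the threshold rather than being fixed in advance, the result depends on both the threshold and the order of the input points.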
A priori data-driven multi-clustered reservoir generation algorithm for echo state network.
Li, Xiumin; Zhong, Ling; Xue, Fangzheng; Zhang, Anguo
2015-01-01
Echo state networks (ESNs) with multi-clustered reservoir topology perform better in reservoir computing and robustness than those with random reservoir topology. However, these ESNs have a complex reservoir topology, which leads to difficulties in reservoir generation. This study focuses on the reservoir generation problem when ESN is used in environments with sufficient priori data available. Accordingly, a priori data-driven multi-cluster reservoir generation algorithm is proposed. The priori data in the proposed algorithm are used to evaluate reservoirs by calculating the precision and standard deviation of ESNs. The reservoirs are produced using the clustering method; only the reservoir with a better evaluation performance takes the place of a previous one. The final reservoir is obtained when its evaluation score reaches the preset requirement. The prediction experiment results obtained using the Mackey-Glass chaotic time series show that the proposed reservoir generation algorithm provides ESNs with extra prediction precision and increases the structure complexity of the network. Further experiments also reveal the appropriate values of the number of clusters and time window size to obtain optimal performance. The information entropy of the reservoir reaches the maximum when ESN gains the greatest precision. PMID:25875296
FctClus: A Fast Clustering Algorithm for Heterogeneous Information Networks.
Yang, Jing; Chen, Limin; Zhang, Jianpei
2015-01-01
It is important to cluster heterogeneous information networks. A fast clustering algorithm based on an approximate commute time embedding for heterogeneous information networks with a star network schema is proposed in this paper by utilizing the sparsity of heterogeneous information networks. First, a heterogeneous information network is transformed into multiple compatible bipartite graphs from the compatible point of view. Second, the approximate commute time embedding of each bipartite graph is computed using random mapping and a linear time solver. All of the indicator subsets in each embedding simultaneously determine the target dataset. Finally, a general model is formulated by these indicator subsets, and a fast algorithm is derived by simultaneously clustering all of the indicator subsets using the sum of the weighted distances for all indicators for an identical target object. The proposed fast algorithm, FctClus, is shown to be efficient and generalizable and exhibits high clustering accuracy and fast computation speed based on a theoretic analysis and experimental verification. PMID:26090857
NASA Astrophysics Data System (ADS)
Quintanilla-Domínguez, Joel; Ojeda-Magaña, Benjamín; Marcano-Cedeño, Alexis; Cortina-Januchs, María G.; Vega-Corona, Antonio; Andina, Diego
2011-12-01
A new method for detecting microcalcifications in regions of interest (ROIs) extracted from digitized mammograms is proposed. The top-hat transform is a technique based on mathematical morphology operations and, in this paper, is used to perform contrast enhancement of the microcalcifications. To improve microcalcification detection, a novel image sub-segmentation approach based on the possibilistic fuzzy c-means algorithm is used. From the original ROIs, window-based features, such as the mean and standard deviation, were extracted; these features were used as an input vector in a classifier. The classifier is based on an artificial neural network to identify patterns belonging to microcalcifications and healthy tissue. Our results show that the proposed method is a good alternative for automatically detecting microcalcifications, because this stage is an important part of early breast cancer detection.
Cluster-Based Multipolling Sequencing Algorithm for Collecting RFID Data in Wireless LANs
NASA Astrophysics Data System (ADS)
Choi, Woo-Yong; Chatterjee, Mainak
2015-03-01
With the growing use of RFID (Radio Frequency Identification), it is becoming important to devise ways to read RFID tags in real time. Access points (APs) of IEEE 802.11-based wireless Local Area Networks (LANs) are being integrated with RFID networks that can efficiently collect real-time RFID data. Several schemes, such as multipolling methods based on the dynamic search algorithm and random sequencing, have been proposed. However, as the number of RFID readers associated with an AP increases, it becomes difficult for the dynamic search algorithm to derive the multipolling sequence in real time. Though multipolling methods can eliminate the polling overhead, we still need to enhance the performance of the multipolling methods based on random sequencing. To that end, we propose a real-time cluster-based multipolling sequencing algorithm that eliminates more than 90% of the polling overhead, particularly when the dynamic search algorithm fails to derive the multipolling sequence in real time.
A genetic algorithmic approach to antenna null-steering using a cluster computer.
NASA Astrophysics Data System (ADS)
Recine, Greg; Cui, Hong-Liang
2001-06-01
We apply a genetic algorithm (GA) to the problem of electronically steering the maximums and nulls of an antenna array to desired positions (null toward enemy listener/jammer, max toward friendly listener/transmitter). The antenna pattern itself is computed using NEC2 which is called by the main GA program. Since a GA naturally lends itself to parallelization, this simulation was applied to our new twin 64-node cluster computers (Gemini). Design issues and uses of the Gemini cluster in our group are also discussed.
K-Means Re-Clustering-Algorithmic Options with Quantifiable Performance Comparisons
Meyer, A W; Paglieroni, D; Asteneh, C
2002-12-17
This paper presents various architectural options for implementing a K-Means Re-Clustering algorithm suitable for unsupervised segmentation of hyperspectral images. Performance metrics are developed based upon quantitative comparisons of convergence rates and segmentation quality. A methodology for making these comparisons is developed and used to establish K values that produce the best segmentations with minimal processing requirements. Convergence rates depend on the initial choice of cluster centers. Consequently, this same methodology may be used to evaluate the effectiveness of different initialization techniques.
KD-tree based clustering algorithm for fast face recognition on large-scale data
NASA Astrophysics Data System (ADS)
Wang, Yuanyuan; Lin, Yaping; Yang, Junfeng
2015-07-01
This paper proposes an acceleration method for large-scale face recognition systems. When dealing with a large-scale database, face recognition is time-consuming. To tackle this problem, we employ the k-means clustering algorithm to classify face data. Specifically, the data in each cluster are stored in the form of a kd-tree, and face feature matching is conducted with kd-tree based nearest neighbor search. Experiments on CAS-PEAL and a self-collected database show the effectiveness of the proposed method.
What to Do When K-Means Clustering Fails: A Simple yet Principled Alternative Algorithm
Baig, Fahd; Little, Max A.
2016-01-01
The K-means algorithm is one of the most popular clustering algorithms in current use as it is relatively fast yet simple to understand and deploy in practice. Nevertheless, its use entails certain restrictive assumptions about the data, the negative consequences of which are not always immediately apparent, as we demonstrate. While more flexible algorithms have been developed, their widespread use has been hindered by their computational and technical complexity. Motivated by these considerations, we present a flexible alternative to K-means that relaxes most of the assumptions, whilst remaining almost as fast and simple. This novel algorithm which we call MAP-DP (maximum a-posteriori Dirichlet process mixtures), is statistically rigorous as it is based on nonparametric Bayesian Dirichlet process mixture modeling. This approach allows us to overcome most of the limitations imposed by K-means. The number of clusters K is estimated from the data instead of being fixed a-priori as in K-means. In addition, while K-means is restricted to continuous data, the MAP-DP framework can be applied to many kinds of data, for example, binary, count or ordinal data. Also, it can efficiently separate outliers from the data. This additional flexibility does not incur a significant computational overhead compared to K-means with MAP-DP convergence typically achieved in the order of seconds for many practical problems. Finally, in contrast to K-means, since the algorithm is based on an underlying statistical model, the MAP-DP framework can deal with missing data and enables model testing such as cross validation in a principled way. We demonstrate the simplicity and effectiveness of this algorithm on the health informatics problem of clinical sub-typing in a cluster of diseases known as parkinsonism. PMID:27669525
Fast randomized Hough transformation track initiation algorithm based on multi-scale clustering
NASA Astrophysics Data System (ADS)
Wan, Minjie; Gu, Guohua; Chen, Qian; Qian, Weixian; Wang, Pengcheng
2015-10-01
A fast randomized Hough transformation track initiation algorithm based on multi-scale clustering is proposed to overcome existing problems in traditional infrared search and track (IRST) systems, which cannot provide movement information of the initial target or select the correlation threshold automatically with a two-dimensional track association algorithm based on bearing-only information. Movements of all the targets are presumed to be uniform rectilinear motion throughout this new algorithm. Concepts of space random sampling, parameter space dynamic linking table and convergent mapping of image to parameter space are developed on the basis of fast randomized Hough transformation. Because peak detection built on a threshold method suffers from peak value clustering, accuracy can only be ensured on condition that the parameter space has an obvious peak value. A multi-scale idea is therefore added to the above-mentioned algorithm. First, a primary association is conducted to select several alternative tracks with a low threshold. Then, the alternative tracks are processed by multi-scale clustering methods, through which the accurate numbers and parameters of tracks are determined automatically by transforming the scale parameters. The first three frames are processed by this algorithm to get the first three targets of the track, and then two slightly different gate radii are worked out, the mean value of which is used as the global threshold value of correlation. Moreover, a new model for curvilinear equation correction is applied to the above-mentioned track initiation algorithm to solve the problem of shape distortion when a three-dimensional space curve is mapped to a two-dimensional bearing-only space. Simulations using sideways-flying, launch and landing models demonstrate the effectiveness, accuracy, and adaptivity of the proposed approach.
Performance Analysis of Apriori Algorithm with Different Data Structures on Hadoop Cluster
NASA Astrophysics Data System (ADS)
Singh, Sudhakar; Garg, Rakhi; Mishra, P. K.
2015-10-01
Mining frequent itemsets from massive datasets has always been one of the most important problems in data mining. Apriori is the most popular and simplest algorithm for frequent itemset mining. To enhance the efficiency and scalability of Apriori, a number of algorithms have been proposed that address the design of efficient data structures, minimization of database scans, and parallel and distributed processing. MapReduce is the emerging parallel and distributed technology for processing big datasets on a Hadoop cluster. To mine big datasets it is essential to re-design data mining algorithms for this new paradigm. In this paper, we implement three variations of the Apriori algorithm on the MapReduce paradigm using the data structures hash tree, trie and hash table trie, i.e. a trie with a hashing technique. We emphasize and investigate the significance of these three data structures for the Apriori algorithm on a Hadoop cluster, which has not yet been given attention. Experiments are carried out on both real-life and synthetic datasets and show that the hash table trie performs far better than the trie and the hash tree in terms of execution time. Moreover, the hash tree gives the worst performance.
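The level-wise candidate counting that the abstract describes can be sketched on a single machine as follows. This is a minimal illustration, not the authors' MapReduce implementation: the function names (`build_trie`, `count_in_trie`, `apriori`) are hypothetical, items are assumed to be mutually comparable, and the trie is built from nested Python dicts, so each node's children sit in a hash table, which is only loosely analogous to the "hash table trie" named above.

```python
from itertools import combinations

_COUNT = object()  # sentinel key for the support counter stored at a leaf


def build_trie(candidates):
    """Store sorted candidate itemsets (tuples) in a trie of nested dicts."""
    root = {}
    for cand in candidates:
        node = root
        for item in cand:
            node = node.setdefault(item, {})
        node[_COUNT] = 0
    return root


def count_in_trie(node, transaction, start=0):
    """Increment every candidate in the trie that is a subset of the transaction."""
    if _COUNT in node:
        node[_COUNT] += 1
    for i in range(start, len(transaction)):
        child = node.get(transaction[i])
        if child is not None:
            count_in_trie(child, transaction, i + 1)


def collect(node, prefix, min_support, out):
    """Gather candidates whose support count reaches min_support."""
    for key, child in node.items():
        if key is _COUNT:
            if child >= min_support:
                out[prefix] = child
        else:
            collect(child, prefix + (key,), min_support, out)


def apriori(transactions, min_support):
    """Return {itemset tuple: support count} for all frequent itemsets."""
    transactions = [tuple(sorted(set(t))) for t in transactions]
    candidates = [(i,) for i in sorted({i for t in transactions for i in t})]
    frequent, k = {}, 1
    while candidates:
        trie = build_trie(candidates)
        for t in transactions:
            count_in_trie(trie, t)
        level = {}
        collect(trie, (), min_support, level)
        frequent.update(level)
        # Join frequent k-itemsets sharing a (k-1)-prefix, then prune any
        # candidate that has an infrequent k-subset.
        candidates = []
        for a, b in combinations(sorted(level), 2):
            if a[:-1] == b[:-1]:
                cand = a + (b[-1],)
                if all(s in level for s in combinations(cand, k)):
                    candidates.append(cand)
        k += 1
    return frequent
```

In a MapReduce setting the counting loop would typically run in the mappers over a split of the transactions, with the reducers summing the per-split counts; that distribution step is omitted here.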
Rondina, Gustavo G; Da Silva, Juarez L F
2013-09-23
Suggestions for improving the Basin-Hopping Monte Carlo (BHMC) algorithm for unbiased global optimization of clusters and nanoparticles are presented. The traditional basin-hopping exploration scheme with Monte Carlo sampling is improved by bringing together novel strategies and techniques employed in different global optimization methods, however, with the care of keeping the underlying algorithm of BHMC unchanged. The improvements include a total of eleven local and nonlocal trial operators tailored for clusters and nanoparticles that allow an efficient exploration of the potential energy surface, two different strategies (static and dynamic) of operator selection, and a filter operator to handle unphysical solutions. In order to assess the efficiency of our strategies, we applied our implementation to several classes of systems, including Lennard-Jones and Sutton-Chen clusters with up to 147 and 148 atoms, respectively, a set of Lennard-Jones nanoparticles with sizes ranging from 200 to 1500 atoms, binary Lennard-Jones clusters with up to 100 atoms, (AgPd)55 alloy clusters described by the Sutton-Chen potential, and aluminum clusters with up to 30 atoms described within the density functional theory framework. Using unbiased global search our implementation was able to reproduce successfully the great majority of all published results for the systems considered and in many cases with more efficiency than the standard BHMC. We were also able to locate previously unknown global minimum structures for some of the systems considered. This revised BHMC method is a valuable tool for aiding theoretical investigations leading to a better understanding of atomic structures of clusters and nanoparticles. PMID:23957311
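The underlying basin-hopping loop (perturb, minimize locally, accept or reject with a Metropolis test on the minimized energies) can be sketched as below. This is a toy one-dimensional illustration under stated assumptions, not the revised BHMC of the paper: the objective `tilted_wave`, the crude gradient-descent minimizer, and all parameter values are hypothetical, and a single uniform random displacement stands in for the eleven trial operators the authors describe.

```python
import math
import random


def tilted_wave(x):
    """Hypothetical 1D energy landscape with several local minima."""
    return 0.1 * x * x + math.sin(3.0 * x)


def local_minimize(f, x, step=1e-2, iters=500, h=1e-6):
    """Crude gradient descent using a central-difference gradient."""
    for _ in range(iters):
        grad = (f(x + h) - f(x - h)) / (2.0 * h)
        x -= step * grad
    return x


def basin_hopping(f, x0, n_hops=100, jump=1.0, temperature=1.0, seed=0):
    """Return (best_x, best_energy) found by a basic basin-hopping walk."""
    rng = random.Random(seed)
    x = local_minimize(f, x0)
    e = f(x)
    best_x, best_e = x, e
    for _ in range(n_hops):
        # Trial move: random displacement followed by local minimization.
        trial = local_minimize(f, x + rng.uniform(-jump, jump))
        e_trial = f(trial)
        # Metropolis acceptance applied to the minimized energies.
        if e_trial <= e or rng.random() < math.exp(-(e_trial - e) / temperature):
            x, e = trial, e_trial
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e
```

For atomic clusters the scalar `x` would become the full coordinate array and `f` the potential energy (for example Lennard-Jones), with a proper minimizer in place of the gradient-descent loop.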
Robustness of ‘cut and splice’ genetic algorithms in the structural optimization of atomic clusters
NASA Astrophysics Data System (ADS)
Froltsov, Vladimir A.; Reuter, Karsten
2009-05-01
We return to the geometry optimization problem of Lennard-Jones clusters to analyze the performance dependence of 'cut and splice' genetic algorithms (GAs) on the employed population size. We generally find that admixing twinning mutation moves leads to an improved robustness of the algorithm efficiency with respect to this a priori unknown technical parameter. The resulting very stable performance of the corresponding mutation + mating GA implementation over a wide range of population sizes is an important feature when addressing unknown systems with computationally involved first-principles based GA sampling.
Ishii, Satoshi; Kadota, Koji; Senoo, Keishi
2009-09-01
DNA fingerprinting analyses such as amplified ribosomal DNA restriction analysis (ARDRA), repetitive extragenic palindromic PCR (rep-PCR), ribosomal intergenic spacer analysis (RISA), and denaturing gradient gel electrophoresis (DGGE) are frequently used in various fields of microbiology. The major difficulty in DNA fingerprinting data analysis is the alignment of multiple peak sets. We report here an R program for a clustering-based peak alignment algorithm, and its application to the analysis of various DNA fingerprinting data, such as ARDRA, rep-PCR, RISA, and DGGE data. The results obtained by our clustering algorithm and by BioNumerics software showed high similarity. Since several R packages have been established to statistically analyze various biological data, the distance matrix obtained by our R program can be used for subsequent statistical analyses, some of which were not previously performed but are useful in DNA fingerprinting studies.
User Activity Recognition in Smart Homes Using Pattern Clustering Applied to Temporal ANN Algorithm.
Bourobou, Serge Thomas Mickala; Yoo, Younghwan
2015-01-01
This paper discusses the possibility of recognizing and predicting user activities in an IoT (Internet of Things) based smart environment. Activity recognition is usually done in two steps: activity pattern clustering and activity type decision. Although many related works have been proposed, their performance was limited because they focused on only one of the two steps. This paper tries to find the best combination of a pattern clustering method and an activity decision algorithm among various existing works. For the first step, in order to classify such varied and complex user activities, we use a relevant and efficient unsupervised learning method called the K-pattern clustering algorithm. In the second step, the smart environment is trained to recognize and predict user activities inside the user's personal space by means of an artificial neural network based on Allen's temporal relations. The experimental results show that our combined method provides higher recognition accuracy for various activities than other data mining classification algorithms. Furthermore, it is more appropriate for a dynamic environment such as an IoT-based smart home. PMID:26007738
Big Data GPU-Driven Parallel Processing Spatial and Spatio-Temporal Clustering Algorithms
NASA Astrophysics Data System (ADS)
Konstantaras, Antonios; Skounakis, Emmanouil; Kilty, James-Alexander; Frantzeskakis, Theofanis; Maravelakis, Emmanuel
2016-04-01
Advances in graphics processing unit technology towards parallel architectures [1], comprising thousands of cores and multiple parallel threads, provide the hardware foundation for the rapid processing of various parallel applications in seismic big data analysis. Seismic data are normally stored as collections of vectors in massive matrices, growing rapidly in size as wider areas are covered, denser recording networks are established and decades of data are compiled together [2]. Yet many processes in seismic data analysis are performed on each seismic event independently or on distinct tiles [3] of specific grouped seismic events within a much larger data set. Such processes, being independent of one another, can be performed in parallel, narrowing down processing times drastically [1,3]. This research work presents the development and implementation of three parallel processing algorithms using Cuda C [4] for the investigation of potentially distinct seismic regions [5,6] present in the vicinity of the southern Hellenic seismic arc. The algorithms, programmed and executed in parallel for comparison, are: fuzzy k-means clustering with expert knowledge [7] in assigning the overall number of clusters; density-based clustering [8]; and a self-developed spatio-temporal clustering algorithm encompassing expert [9] and empirical knowledge [10] for the specific area under investigation. Indexing terms: GPU parallel programming, Cuda C, heterogeneous processing, distinct seismic regions, parallel clustering algorithms, spatio-temporal clustering References [1] Kirk, D. and Hwu, W.: 'Programming massively parallel processors - A hands-on approach', 2nd Edition, Morgan Kaufman Publisher, 2013 [2] Konstantaras, A., Valianatos, F., Varley, M.R. and Makris, J.P.: 'Soft-Computing Modelling of Seismicity in the Southern Hellenic Arc', Geoscience and Remote Sensing Letters, vol. 5 (3), pp. 323-327, 2008 [3] Papadakis, S. and
A Fast Cluster Motif Finding Algorithm for ChIP-Seq Data Sets
Zhang, Yipu; Wang, Ping
2015-01-01
The new high-throughput technique ChIP-seq, which couples chromatin immunoprecipitation experiments with high-throughput sequencing technologies, has extended the identification of binding locations of a transcription factor to genome-wide regions. However, most existing motif discovery algorithms are time-consuming and limited in their ability to identify binding motifs in ChIP-seq data, which are normally characterized by their large scale. In order to improve efficiency, we propose a fast cluster motif finding algorithm, named FCmotif, to identify (l, d) motifs in large-scale ChIP-seq data sets. It is inspired by the emerging substrings mining strategy: it finds the enriched substrings and then searches the neighborhood instances to construct PWMs and cluster motifs of different lengths. FCmotif does not follow the OOPS model constraint and can find long motifs. The effectiveness of the proposed algorithm has been demonstrated by experiments on ChIP-seq data sets from mouse ES cells. The whole detection of the real binding motifs and the processing of the full-size data of several megabytes finished in a few minutes. The experimental results show that FCmotif is advantageous for (l, d) motif finding in ChIP-seq data; it also demonstrates better performance than other widely used algorithms such as MEME, Weeder, ChIPMunk, and DREME. PMID:26236718
NASA Astrophysics Data System (ADS)
Plaza, Antonio; Chang, Chein-I.; Plaza, Javier; Valencia, David
2006-05-01
The incorporation of hyperspectral sensors aboard airborne/satellite platforms is currently producing a nearly continual stream of multidimensional image data, and this high data volume has quickly introduced new processing challenges. The price paid for the wealth of spatial and spectral information available from hyperspectral sensors is the enormous amount of data that they generate. Several applications exist, however, where having the desired information calculated quickly enough for practical use is highly desirable. High computing performance is particularly important in homeland defense and security applications, in which swift decisions often involve detection of (sub-pixel) military targets (including hostile weaponry, camouflage, concealment, and decoys) or chemical/biological agents. In order to speed up the computational performance of hyperspectral imaging algorithms, this paper develops several fast parallel data processing techniques. The techniques cover four classes of algorithms: (1) unsupervised classification, (2) spectral unmixing, (3) automatic target recognition, and (4) onboard data compression. A massively parallel Beowulf cluster (Thunderhead) at NASA's Goddard Space Flight Center in Maryland is used to measure the parallel performance of the proposed algorithms. In order to explore the viability of developing onboard, real-time hyperspectral data compression algorithms, a Xilinx Virtex-II field programmable gate array (FPGA) is also used in experiments. Our quantitative and comparative assessment of parallel techniques and strategies may help image analysts select parallel hyperspectral algorithms for specific applications.
Parallel OSEM Reconstruction Algorithm for Fully 3-D SPECT on a Beowulf Cluster.
Rong, Zhou; Tianyu, Ma; Yongjie, Jin
2005-01-01
In order to improve the computation speed of the ordered subset expectation maximization (OSEM) algorithm for fully 3-D single photon emission computed tomography (SPECT) reconstruction, an experimental Beowulf-type cluster was built and several parallel reconstruction schemes were described. We implemented a single-program-multiple-data (SPMD) parallel 3-D OSEM reconstruction algorithm based on the message passing interface (MPI) and tested it with combinations of different numbers of processors and different voxel grid sizes in reconstruction (64×64×64 and 128×128×128). The performance of the parallelization was evaluated in terms of the speedup factor and parallel efficiency. This parallel implementation methodology is expected to help make fully 3-D OSEM algorithms more feasible in clinical SPECT studies.
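The abstract does not state how the two metrics were computed. Under the common textbook definitions, which are assumed here rather than taken from the paper, with T_1 the run time on one processor and T_p the run time on p processors, they read:

```latex
S(p) = \frac{T_1}{T_p}, \qquad
E(p) = \frac{S(p)}{p} = \frac{T_1}{p\,T_p}
```

As a purely hypothetical illustration, a reconstruction that takes 100 s on one processor and 20 s on eight processors would have S(8) = 5 and E(8) = 0.625.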
Karimi, Abbas; Afsharfarnia, Abbas; Zarafshan, Faraneh; Al-Haddad, S. A. R.
2014-01-01
The stability of clusters is a serious issue in mobile ad hoc networks. Low cluster stability may lead to rapid failure of clusters, high energy consumption for reclustering, and a decrease in the overall stability of the mobile ad hoc network. In order to improve the stability of clusters, weight-based clustering algorithms are utilized. However, these algorithms use only limited features of the nodes. Thus, they decrease the accuracy of the weight in determining a node's competency and lead to incorrect selection of cluster heads. The new weight-based algorithm presented in this paper not only determines a node's weight using its own features, but also considers the direct effect of the features of adjacent nodes. It determines the weight of virtual links between nodes and the effect of these weights on a node's final weight. With this strategy, the highest weight is assigned to the best candidates for cluster head and the accuracy of node selection increases. The performance of the new algorithm is analyzed by computer simulation. The results show that the produced clusters have a longer lifetime and higher stability. Mathematical simulation shows that this algorithm has high availability in case of failure. PMID:25114965
Detection and clustering of features in aerial images by neuron network-based algorithm
NASA Astrophysics Data System (ADS)
Vozenilek, Vit
2015-12-01
The paper presents an algorithm for the detection and clustering of features in aerial photographs based on artificial neural networks. The presented approach is not focused on the detection of specific topographic features, but on the combination of general feature analysis and its use for clustering and backward projection of clusters onto the aerial image. The basis of the algorithm is a calculation of the total error of the network and a change of the network weights to minimize the error. A classic bipolar sigmoid was used as the activation function of the neurons and the basic method of backpropagation was used for learning. To verify that a set of features is able to represent the image content from the user's perspective, a web application was built (ASP.NET on the Microsoft .NET platform). The main findings include that man-made objects in aerial images can be successfully identified by detection of shapes and anomalies. It was also found that an appropriate combination of comprehensive features that describe the colors and selected shapes of individual areas can be useful for image analysis.
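For reference, the bipolar sigmoid mentioned above is usually written as f(x) = 2 / (1 + e^(-x)) - 1, which maps inputs to (-1, 1) and is identical to tanh(x / 2). The sketch below assumes this standard form with unit slope (the abstract does not give the exact parameterization) and also shows the derivative that backpropagation needs; the function names are hypothetical.

```python
import math


def bipolar_sigmoid(x):
    """Bipolar sigmoid 2 / (1 + exp(-x)) - 1, computed stably as tanh(x / 2)."""
    return math.tanh(x / 2.0)


def bipolar_sigmoid_deriv(x):
    """Derivative expressed through the output y: f'(x) = 0.5 * (1 + y) * (1 - y)."""
    y = bipolar_sigmoid(x)
    return 0.5 * (1.0 + y) * (1.0 - y)
```

Expressing the derivative through the output is convenient in backpropagation because the activation computed in the forward pass can be reused in the backward pass.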
Clustering of tethered satellite system simulation data by an adaptive neuro-fuzzy algorithm
NASA Technical Reports Server (NTRS)
Mitra, Sunanda; Pemmaraju, Surya
1992-01-01
Recent developments in neuro-fuzzy systems indicate that the concepts of adaptive pattern recognition, when used to identify appropriate control actions corresponding to clusters of patterns representing system states in dynamic nonlinear control systems, may result in innovative designs. A modular, unsupervised neural network architecture in which fuzzy learning rules have been embedded is used for on-line identification of similar states. The architecture and control rules involved in Adaptive Fuzzy Leader Clustering (AFLC) allow this system to be incorporated in control systems for identification of system states corresponding to specific control actions. We have used this algorithm to cluster the simulation data of the Tethered Satellite System (TSS) to estimate the range of delta voltages necessary to maintain the desired length rate of the tether. The AFLC algorithm is capable of on-line estimation of the appropriate control voltages from the corresponding length error and length rate error without a priori knowledge of their membership functions or familiarity with the behavior of the Tethered Satellite System.
Crowded Cluster Cores. Algorithms for Deblending in Dark Energy Survey Images
Zhang, Yuanyuan; McKay, Timothy A.; Bertin, Emmanuel; Jeltema, Tesla; Miller, Christopher J.; Rykoff, Eli; Song, Jeeseon
2015-10-26
Deep optical images are often crowded with overlapping objects. We found that this is especially true in the cores of galaxy clusters, where images of dozens of galaxies may lie atop one another. Accurate measurements of cluster properties require deblending algorithms designed to automatically extract a list of individual objects and decide what fraction of the light in each pixel comes from each object. In this article, we introduce a new software tool called the Gradient And Interpolation based (GAIN) deblender. GAIN is used as a secondary deblender to improve the separation of overlapping objects in galaxy cluster cores in Dark Energy Survey images. It uses image intensity gradients and an interpolation technique originally developed to correct flawed digital images. Our paper is dedicated to describing the algorithm of the GAIN deblender and its applications, but we additionally include modest tests of the software based on real Dark Energy Survey co-add images. GAIN helps to extract an unbiased photometry measurement for blended sources and improve detection completeness, while introducing few spurious detections. When applied to processed Dark Energy Survey data, GAIN serves as a useful quick fix when a high level of deblending is desired.
An improved scheduling algorithm for 3D cluster rendering with platform LSF
NASA Astrophysics Data System (ADS)
Xu, Wenli; Zhu, Yi; Zhang, Liping
2013-10-01
High-quality photorealistic rendering of 3D models needs powerful computing systems. To meet this demand, techniques for highly efficient management of cluster resources are developing fast. This paper aims to improve the efficiency of 3D rendering tasks on a cluster. It focuses on a dynamic feedback load balance (DFLB) algorithm, the working principle of the load sharing facility (LSF), and optimization of the external scheduler plug-in. The algorithm can be applied in the match and allocation phases of a scheduling cycle. Candidate hosts are prepared in sequence in the match phase, and the scheduler makes allocation decisions for each job in the allocation phase. With the dynamic mechanism, a new weight is assigned to each candidate host for rearrangement, and the most suitable host is dispatched for rendering. A new plug-in module for this algorithm has been designed and integrated into the internal scheduler. Simulation experiments demonstrate that the improved plug-in module is superior to the default one for rendering tasks. It can help avoid load imbalance among servers, increase system throughput and improve system utilization.
ERIC Educational Resources Information Center
Xu, Beijie; Recker, Mimi; Qi, Xiaojun; Flann, Nicholas; Ye, Lei
2013-01-01
This article examines clustering as an educational data mining method. In particular, two clustering algorithms, the widely used K-means and the model-based Latent Class Analysis, are compared, using usage data from an educational digital library service, the Instructional Architect (IA.usu.edu). Using a multi-faceted approach and multiple data…
Development of a Genetic Algorithm to Automate Clustering of a Dependency Structure Matrix
NASA Technical Reports Server (NTRS)
Rogers, James L.; Korte, John J.; Bilardo, Vincent J.
2006-01-01
Much technology assessment and organization design data exists in Microsoft Excel spreadsheets. Tools are needed to put this data into a form that can be used by design managers to make design decisions. One need is to cluster data that is highly coupled. Tools such as the Dependency Structure Matrix (DSM) and a Genetic Algorithm (GA) can be of great benefit. However, no tool currently combines the DSM and a GA to solve the clustering problem. This paper describes a new software tool that interfaces a GA written as an Excel macro with a DSM in spreadsheet format. The results of several test cases are included to demonstrate how well this new tool works.
NASA Astrophysics Data System (ADS)
Pluchino, A.; Rapisarda, A.; Latora, V.
2008-10-01
We have recently introduced [Phys. Rev. E 75, 045102(R) (2007); AIP Conference Proceedings 965, 2007, p. 323] an efficient method for the detection and identification of modules in complex networks, based on the de-synchronization properties (dynamical clustering) of phase oscillators. In this paper we apply the dynamical clustering technique to the identification of communities of marine organisms living in the Chesapeake Bay food web. We show that our algorithm is able to perform a very reliable classification of the real communities existing in this ecosystem by using different kinds of dynamical oscillators. We also compare our results with those of other methods for the detection of community structures in complex networks.
Chirplet Clustering Algorithm for Black Hole Coalescence Signatures in Gravitational Wave Detectors
NASA Astrophysics Data System (ADS)
Nemtzow, Zachary; Chassande-Mottin, Eric; Mohapatra, Satyanarayan R. P.; Cadonati, Laura
2012-03-01
Within this decade, gravitational waves will become new astrophysical messengers with which we can learn about our universe. Gravitational wave emission from the coalescence of massive bodies is projected to be a promising source for the next generation of gravitational wave detectors: advanced LIGO and advanced Virgo. We describe a method for the detection of binary black hole coalescences using a chirplet template bank, Chirplet Omega. By appropriately clustering the sine-Gaussian pixels with linearly varying frequency that the algorithm uses to decompose the data, the signal-to-noise ratio (SNR) of events extended in time can be significantly increased. We present such a clustering method and discuss its impact on the performance and detectability of binary black hole coalescences in ground-based gravitational wave interferometers.
CLUSTAG & WCLUSTAG: Hierarchical Clustering Algorithms for Efficient Tag-SNP Selection
NASA Astrophysics Data System (ADS)
Ao, Sio-Iong
More than 6 million single nucleotide polymorphisms (SNPs) in the human genome have been genotyped by the HapMap project. Although only a proportion of these SNPs are functional, all can be considered as candidate markers for indirect association studies to detect disease-related genetic variants. The complete screening of a gene or a chromosomal region is nevertheless an expensive undertaking for association studies. A key strategy for improving the efficiency of association studies is to select a subset of informative SNPs, called tag SNPs, for analysis. In the chapter, hierarchical clustering algorithms have been proposed for efficient tag SNP selection.
Wang, Wei; Song, Wei-Guo; Liu, Shi-Xing; Zhang, Yong-Ming; Zheng, Hong-Yang; Tian, Wei
2011-04-01
An improved method for detecting cloud that combines K-means clustering and the multi-spectral threshold approach is described. On the basis of landmark spectrum analysis, MODIS data are first categorized into two major types by the K-means method. The first class includes clouds, smoke and snow, and the second class includes vegetation, water and land. Then multi-spectral threshold detection is applied to the first class to eliminate interference such as smoke and snow. The method is tested with MODIS data acquired at different times under different underlying surface conditions. Visual inspection of the results showed that the algorithm can effectively detect small areas of cloud pixels and exclude the interference of the underlying surface, which provides a good foundation for the subsequent fire detection step.
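The initial two-class split can be illustrated with a plain K-means (Lloyd's algorithm) sketch. Everything specific here is an assumption for illustration: the two-band "reflectance" tuples in the usage note are made-up values rather than MODIS data, the function names are hypothetical, and the initial centers are supplied by the caller instead of being chosen from landmark spectra as the abstract suggests.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, centers, n_iter=50):
    """Lloyd's algorithm; returns (labels, centers) for tuples of floats."""
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        labels = [
            min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centers.append(old)  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers
```

For example, `kmeans([(0.8, 0.9), (0.9, 0.85), (0.1, 0.2), (0.15, 0.1)], [(1.0, 1.0), (0.0, 0.0)])` separates the two bright pixels from the two dark ones; a threshold test would then be applied only to the bright class.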
Multispectral image classification of MRI data using an empirically-derived clustering algorithm
Horn, K.M.; Osbourn, G.C.; Bouchard, A.M.; Sanders, J.A.
1998-08-01
Multispectral image analysis of magnetic resonance imaging (MRI) data has been performed using an empirically-derived clustering algorithm. This algorithm groups image pixels into distinct classes which exhibit similar response in the T2 first- and second-echo and T1 (with and without gadolinium) MRI images. The grouping is performed in an n-dimensional mathematical space; the n-dimensional volumes bounding each class define each specific tissue type. The classification results are rendered again in real space by color-coding each grouped class of pixels (associated with differing tissue types). This classification method is especially well suited for class volumes with complex boundary shapes, and is also expected to robustly detect abnormal tissue classes. The classification process is demonstrated using a three-dimensional data set of MRI scans of a human brain tumor.
Meanie3D - a mean-shift based, multivariate, multi-scale clustering and tracking algorithm
NASA Astrophysics Data System (ADS)
Simon, Jürgen-Lorenz; Malte, Diederich; Silke, Troemel
2014-05-01
Project OASE is one of five work groups at the HErZ (Hans Ertel Centre for Weather Research), an ongoing effort by the German weather service (DWD) to further research at universities concerning weather prediction. The goal of project OASE is to gain an object-based perspective on convective events by identifying them early in the onset of convective initiation and following them through the entire lifecycle. The ability to follow objects in this fashion requires new ways of object definition and tracking, which incorporate all the available data sets of interest, such as satellite imagery, weather radar or lightning counts. The Meanie3D algorithm provides the necessary tool for this purpose. Core features of this new approach to clustering (object identification) and tracking are the ability to identify objects using the mean-shift algorithm applied to a multitude of variables (multivariate), as well as the ability to detect objects on various scales (multi-scale) using elements of scale-space theory. The algorithm works in 2D as well as 3D without modifications. It is an extension of a method well known from the field of computer vision and image processing, which has been tailored to serve the needs of the meteorological community. Despite the special application demonstrated here (convective initiation), the algorithm is easily tailored to provide clustering and tracking for a wide class of data sets and problems. In this talk, the demonstration is carried out on two of the OASE group's own composite sets. One is a 2D nationwide composite of Germany including C-band radar (2D) and satellite information, the other a 3D local composite of the Bonn/Jülich area containing a high-resolution 3D X-band radar composite.
Mustapha, Ibrahim; Mohd Ali, Borhanuddin; Rasid, Mohd Fadlee A; Sali, Aduwati; Mohamad, Hafizal
2015-08-13
It is well-known that clustering partitions network into logical groups of nodes in order to achieve energy efficiency and to enhance dynamic channel access in cognitive radio through cooperative sensing. While the topic of energy efficiency has been well investigated in conventional wireless sensor networks, the latter has not been extensively explored. In this paper, we propose a reinforcement learning-based spectrum-aware clustering algorithm that allows a member node to learn the energy and cooperative sensing costs for neighboring clusters to achieve an optimal solution. Each member node selects an optimal cluster that satisfies pairwise constraints, minimizes network energy consumption and enhances channel sensing performance through an exploration technique. We first model the network energy consumption and then determine the optimal number of clusters for the network. The problem of selecting an optimal cluster is formulated as a Markov Decision Process (MDP) in the algorithm and the obtained simulation results show convergence, learning and adaptability of the algorithm to dynamic environment towards achieving an optimal solution. Performance comparisons of our algorithm with the Groupwise Spectrum Aware (GWSA)-based algorithm in terms of Sum of Square Error (SSE), complexity, network energy consumption and probability of detection indicate improved performance from the proposed approach. The results further reveal that an energy savings of 9% and a significant Primary User (PU) detection improvement can be achieved with the proposed approach. PMID:26287191
Fong, Simon
2012-01-01
Voice biometrics has a long history in biosecurity applications such as verification and identification based on characteristics of the human voice. Voice classification, another application that plays an important role in grouping unlabelled voice samples, has not been widely studied. Lately, voice classification has been found useful in phone monitoring and in classifying speakers' gender, ethnicity and emotional states, among other tasks. In this paper, a collection of computational algorithms is proposed to support voice classification; the algorithms are a combination of hierarchical clustering, dynamic time warping, discrete wavelet transform, and decision tree. The proposed algorithms are relatively more transparent and interpretable than the existing ones, although many techniques such as Artificial Neural Networks, Support Vector Machines, and Hidden Markov Models (which inherently function like a black box) have been applied to voice verification and voice identification. Two datasets, one generated synthetically and the other collected empirically from a past voice recognition experiment, are used to verify and demonstrate the effectiveness of our proposed voice classification algorithm. PMID:22619492
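One ingredient named in this abstract can be sketched briefly. Assuming the time-series alignment step refers to classic dynamic time warping (DTW), the function below computes a DTW distance between two one-dimensional sequences; such pairwise distances could then feed a hierarchical clustering step. The function name and the sample sequences are hypothetical and not taken from the paper.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a[i-1] is matched again (b stretches)
                cost[i][j - 1],      # b[j-1] is matched again (a stretches)
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]

# The same contour spoken at two different speeds aligns with zero cost
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

The quadratic table is the simplest formulation; for long recordings a windowed variant is commonly used to bound the cost, which this sketch leaves out.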
Gao, Ying; Wkram, Chris Hadri; Duan, Jiajie; Chou, Jarong
2015-12-10
In order to prolong the network lifetime, energy-efficient protocols adapted to the features of wireless sensor networks should be used. This paper explores in depth the nature of heterogeneous wireless sensor networks, and finally proposes an algorithm to address the problem of finding an effective pathway for heterogeneous clustering energy. The proposed algorithm implements cluster head selection according to the degree of energy attenuation during the network's running and the degree of candidate nodes' effective coverage on the whole network, so as to obtain an even energy consumption over the whole network for the situation with high degree of coverage. Simulation results show that the proposed clustering protocol has better adaptability to heterogeneous environments than existing clustering algorithms in prolonging the network lifetime. PMID:26690440
Farah, Ihsen; Nguyen, Thi Nguyet Que; Groh, Audrey; Guenot, Dominique; Jeannesson, Pierre; Gobinet, Cyril
2016-05-23
The coupling between Fourier-transform infrared (FTIR) imaging and unsupervised classification is effective in revealing the different structures of human tissues based on their specific biomolecular IR signatures; thus the spectral histology of the studied samples is achieved. However, the most widely applied clustering methods in spectral histology are local search algorithms, which converge to a local optimum, depending on initialization. Multiple runs of the techniques estimate multiple different solutions. Here, we propose a memetic algorithm, based on a genetic algorithm and a k-means clustering refinement, to perform optimal clustering. In addition, this approach was applied to the acquired FTIR images of normal human colon tissues originating from five patients. The results show the efficiency of the proposed memetic algorithm to achieve the optimal spectral histology of these samples, contrary to k-means. PMID:27110605
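The combination described in this abstract (a genetic algorithm whose individuals are refined by k-means) can be sketched as follows. This is a generic memetic scheme under stated assumptions, not the authors' algorithm: the population size, the uniform crossover, the mutation rate of 0.2, and all function names are choices made for the example.

```python
import random

def sq_dist(p, q):
    return sum((x - y) ** 2 for x, y in zip(p, q))

def sse(points, centers):
    """Sum of squared distances from each point to its nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)

def kmeans_step(points, centers):
    """One Lloyd iteration: assign points to centers, then recompute the centers."""
    groups = [[] for _ in centers]
    for p in points:
        k = min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
        groups[k].append(p)
    return [
        tuple(sum(col) / len(g) for col in zip(*g)) if g else ctr
        for g, ctr in zip(groups, centers)
    ]

def memetic_kmeans(points, k, pop_size=8, generations=20, seed=0):
    rng = random.Random(seed)
    # each individual is a list of k candidate centers drawn from the data
    pop = [rng.sample(points, k) for _ in range(pop_size)]
    for _ in range(generations):
        pop = [kmeans_step(points, ind) for ind in pop]  # local refinement
        pop.sort(key=lambda ind: sse(points, ind))       # fittest first
        parents = pop[: pop_size // 2]                   # elitist selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            if rng.random() < 0.2:                            # occasional mutation
                child[rng.randrange(k)] = rng.choice(points)
            children.append(child)
        pop = parents + children
    return min(pop, key=lambda ind: sse(points, ind))

# Hypothetical two-band "spectra" forming two groups
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
best = memetic_kmeans(data, k=2)
print(round(sse(data, best), 3))
```

The global search over the population reduces the dependence on any single initialization, while the k-means step keeps every surviving individual locally refined.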
Yang, Yan-Pu; Chen, Deng-Kai; Gu, Rong; Gu, Yu-Feng; Yu, Sui-Huai
2016-01-01
Consumers' Kansei needs reflect their perception about a product and always consist of a large number of adjectives. Reducing the dimension complexity of these needs to extract primary words not only enables the target product to be explicitly positioned, but also provides a convenient design basis for designers engaging in design work. Accordingly, this study employs a numerical design structure matrix (NDSM) by parameterizing a conventional DSM and integrating genetic algorithms to find optimum Kansei clusters. A four-point scale method is applied to assign link weights of every two Kansei adjectives as values of cells when constructing an NDSM. Genetic algorithms are used to cluster the Kansei NDSM and find optimum clusters. Furthermore, the process of the proposed method is presented. The details of the proposed approach are illustrated using an example of electronic scooter for Kansei needs clustering. The case study reveals that the proposed method is promising for clustering Kansei needs adjectives in product emotional design. PMID:27630709
A Novel Method to Predict Genomic Islands Based on Mean Shift Clustering Algorithm.
de Brito, Daniel M; Maracaja-Coutinho, Vinicius; de Farias, Savio T; Batista, Leonardo V; do Rêgo, Thaís G
2016-01-01
Genomic Islands (GIs) are regions of bacterial genomes that are acquired from other organisms by the phenomenon of horizontal transfer. These regions are often responsible for many important acquired adaptations of the bacteria, with great impact on their evolution and behavior. Nevertheless, these adaptations are usually associated with pathogenicity, antibiotic resistance, degradation and metabolism. Identification of such regions is of medical and industrial interest. For this reason, different approaches for genomic islands prediction have been proposed. However, none of them are capable of predicting precisely the complete repertory of GIs in a genome. The difficulties arise due to the changes in performance of different algorithms in the face of the variety of nucleotide distribution in different species. In this paper, we present a novel method to predict GIs that is built upon mean shift clustering algorithm. It does not require any information regarding the number of clusters, and the bandwidth parameter is automatically calculated based on a heuristic approach. The method was implemented in a new user-friendly tool named MSGIP--Mean Shift Genomic Island Predictor. Genomes of bacteria with GIs discussed in other papers were used to evaluate the proposed method. The application of this tool revealed the same GIs predicted by other methods and also different novel unpredicted islands. A detailed investigation of the different features related to typical GI elements inserted in these new regions confirmed its effectiveness. Stand-alone and user-friendly versions for this new methodology are available at http://msgip.integrativebioinformatics.me. PMID:26731657
KANTS: a stigmergic ant algorithm for cluster analysis and swarm art.
Fernandes, Carlos M; Mora, Antonio M; Merelo, Juan J; Rosa, Agostinho C
2014-06-01
KANTS is a swarm intelligence clustering algorithm inspired by the behavior of social insects. It uses stigmergy as a strategy for clustering large datasets and, as a result, displays a typical behavior of complex systems: self-organization and global patterns emerging from the local interaction of simple units. This paper introduces a simplified version of KANTS and describes recent experiments with the algorithm in the context of a contemporary artistic and scientific trend called swarm art, a type of generative art in which swarm intelligence systems are used to create artwork or ornamental objects. KANTS is used here for generating color drawings from the input data that represent real-world phenomena, such as electroencephalogram sleep data. However, the main proposal of this paper is an art project based on well-known abstract paintings, from which the chromatic values are extracted and used as input. Colors and shapes are therefore reorganized by KANTS, which generates its own interpretation of the original artworks. The project won the 2012 Evolutionary Art, Design, and Creativity Competition.
A contiguity-enhanced k-means clustering algorithm for unsupervised multispectral image segmentation
Theiler, J.; Gisler, G.
1997-07-01
The recent and continuing construction of multi and hyper spectral imagers will provide detailed data cubes with information in both the spatial and spectral domain. This data shows great promise for remote sensing applications ranging from environmental and agricultural to national security interests. The reduction of this voluminous data to useful intermediate forms is necessary both for downlinking all those bits and for interpreting them. Smart onboard hardware is required, as well as sophisticated earth bound processing. A segmented image (in which the multispectral data in each pixel is classified into one of a small number of categories) is one kind of intermediate form which provides some measure of data compression. Traditional image segmentation algorithms treat pixels independently and cluster the pixels according only to their spectral information. This neglects the implicit spatial information that is available in the image. We will suggest a simple approach; a variant of the standard k-means algorithm which uses both spatial and spectral properties of the image. The segmented image has the property that pixels which are spatially contiguous are more likely to be in the same class than are random pairs of pixels. This property naturally comes at some cost in terms of the compactness of the clusters in the spectral domain, but we have found that the spatial contiguity and spectral compactness properties are nearly orthogonal, which means that we can make considerable improvements in the one with minimal loss in the other.
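One simple way to combine spectral and spatial information in a k-means style loop is sketched below. The abstract does not spell out the authors' variant, so the cost used here (squared spectral distance plus a penalty `beta` for each 4-neighbor carrying a different label) is an assumption chosen for illustration, and a single-band image is used to keep the example short.

```python
def contiguity_kmeans(image, centers, beta=0.5, iters=10):
    """image: 2D list of floats; centers: initial class means (list of floats)."""
    rows, cols = len(image), len(image[0])
    centers = list(centers)
    classes = range(len(centers))
    # start from the plain spectral (nearest-center) assignment
    labels = [[min(classes, key=lambda k: (v - centers[k]) ** 2) for v in row]
              for row in image]
    for _ in range(iters):
        for r in range(rows):
            for c in range(cols):
                neigh = [labels[rr][cc]
                         for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                         if 0 <= rr < rows and 0 <= cc < cols]

                def cost(k):
                    # spectral misfit plus a penalty per disagreeing neighbor
                    disagree = sum(1 for n in neigh if n != k)
                    return (image[r][c] - centers[k]) ** 2 + beta * disagree

                labels[r][c] = min(classes, key=cost)
        # recompute class means from the current labelling
        for k in classes:
            members = [image[r][c] for r in range(rows) for c in range(cols)
                       if labels[r][c] == k]
            if members:
                centers[k] = sum(members) / len(members)
    return labels, centers

# Hypothetical 3x3 image: dark region on the left, bright column on the right,
# and one ambiguous pixel (0.6) surrounded mostly by dark pixels
img = [[0.0, 0.0, 1.0],
       [0.0, 0.6, 1.0],
       [0.0, 0.0, 1.0]]
labels, means = contiguity_kmeans(img, [0.0, 1.0], beta=0.5)
print(labels)
```

With `beta=0` the loop reduces to ordinary k-means and the ambiguous pixel is assigned on spectral value alone; a positive `beta` trades some spectral compactness for spatial contiguity, which is the trade-off the abstract describes.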
Analysis Clustering of Electricity Usage Profile Using K-Means Algorithm
NASA Astrophysics Data System (ADS)
Amri, Yasirli; Lailatul Fadhilah, Amanda; Fatmawati; Setiani, Novi; Rani, Septia
2016-01-01
Electricity is one of the most important needs of human life in many sectors. Demand for electricity will increase in line with population and economic growth. Adjusting the amount of electricity produced at a given time is important because storing electricity is expensive. To handle this problem, we need knowledge of clients' electricity usage patterns. These patterns can be obtained using clustering techniques. In this paper, clustering is used to find similarities among electricity usage patterns over a specified time. We apply the K-Means algorithm to a dataset of electricity consumption from 370 clients collected over one year. We obtained an interesting pattern: one large group of clients consumes the lowest electric load in spring, whereas in another group the lowest electricity consumption occurs in winter. Based on this result, an electricity provider can plan production for a given season according to the electricity usage profiles.
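A minimal sketch of this kind of analysis is given below, assuming each client is summarised by four seasonal totals that are normalised to shares before clustering. The figures, the normalisation step and the choice of initial centers are hypothetical; the abstract does not describe the actual preprocessing.

```python
def normalize(profile):
    """Convert absolute seasonal loads to shares, so clients of any size compare."""
    total = sum(profile)
    return [v / total for v in profile]

def kmeans(profiles, centers, iters=20):
    """Plain Lloyd's algorithm; `centers` holds the initial cluster centers."""
    centers = [list(c) for c in centers]
    labels = []
    for _ in range(iters):
        # assignment step: nearest center by squared Euclidean distance
        labels = [
            min(range(len(centers)),
                key=lambda k: sum((x - c) ** 2 for x, c in zip(p, centers[k])))
            for p in profiles
        ]
        # update step: mean profile of each cluster (an empty cluster keeps its center)
        new_centers = []
        for k, old in enumerate(centers):
            members = [p for p, lab in zip(profiles, labels) if lab == k]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(old)
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers

# Hypothetical seasonal loads per client: [winter, spring, summer, autumn]
clients = [[30, 10, 30, 30], [60, 20, 60, 60],   # lowest load in spring
           [10, 30, 30, 30], [20, 60, 60, 60]]   # lowest load in winter
profiles = [normalize(c) for c in clients]
labels, centers = kmeans(profiles, [profiles[0], profiles[2]])
print(labels)
```

Normalising first means clients are grouped by the shape of their seasonal profile rather than by their overall consumption level.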
Jiang, Peng; Xu, Yiming; Wu, Feng
2016-01-14
Existing move-restricted node self-deployment algorithms are based on a fixed node communication radius, evaluate the performance based on network coverage or the connectivity rate and do not consider the number of nodes near the sink node and the energy consumption distribution of the network topology, thereby degrading network reliability and the energy consumption balance. Therefore, we propose a distributed underwater node self-deployment algorithm. First, each node begins the uneven clustering based on the distance on the water surface. Each cluster head node selects its next-hop node to synchronously construct a connected path to the sink node. Second, the cluster head node adjusts its depth while maintaining the layout formed by the uneven clustering and then adjusts the positions of in-cluster nodes. The algorithm originally considers the network reliability and energy consumption balance during node deployment and considers the coverage redundancy rate of all positions that a node may reach during the node position adjustment. Simulation results show, compared to the connected dominating set (CDS) based depth computation algorithm, that the proposed algorithm can increase the number of the nodes near the sink node and improve network reliability while guaranteeing the network connectivity rate. Moreover, it can balance energy consumption during network operation, further improve network coverage rate and reduce energy consumption. PMID:26784193
'Inter-Arrival Time' Inspired Algorithm and its Application in Clustering and Molecular Phylogeny
NASA Astrophysics Data System (ADS)
Kolekar, Pandurang S.; Kale, Mohan M.; Kulkarni-Kale, Urmila
2010-10-01
Bioinformatics, being a multidisciplinary field, applies methods from allied areas of science to data mining using computational approaches. Clustering and molecular phylogeny are key areas in bioinformatics that help in the study of the classification and evolution of organisms. Molecular phylogeny algorithms can be divided into distance-based and character-based methods. However, most of these methods depend on pre-alignment of sequences and become computationally intensive as the data grow, and hence demand alternative, efficient approaches. The 'inter-arrival time distribution' (IATD) is a popular concept in the theory of stochastic system modeling, but its potential in molecular data analysis has not been fully explored. The present study reports the application of IATD in bioinformatics for clustering and molecular phylogeny. The proposed method provides IATDs of nucleotides in genomic sequences. A distance function based on statistical parameters of the IATDs is proposed, and the distance matrix thus obtained is used for clustering and molecular phylogeny. The method is applied to a dataset of 3' non-coding region (NCR) sequences of Dengue virus type 3 (DENV-3), subtype III, reported in 2008. The phylogram thus obtained revealed the geographical distribution of DENV-3 isolates. Sri Lankan DENV-3 isolates were further observed to cluster in two sub-clades corresponding to the groups before and after the emergence of Dengue hemorrhagic fever. These results are consistent with those reported earlier, which were obtained using pre-aligned sequence data as input. These findings encourage applications of the IATD-based method in molecular phylogenetic analysis in particular and data mining in general.
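The alignment-free idea can be illustrated with a short sketch. Here the inter-arrival time of a nucleotide is taken to be the gap between successive positions at which it occurs, each sequence is summarised by the mean and variance of those gaps for A, C, G and T, and two sequences are compared by the Euclidean distance between the summaries. The abstract does not state which statistical parameters or distance function the authors use, so both are assumptions, as are the function names.

```python
import math

def inter_arrival_times(seq, symbol):
    """Gaps between successive occurrences of `symbol` in `seq`."""
    positions = [i for i, ch in enumerate(seq) if ch == symbol]
    return [b - a for a, b in zip(positions, positions[1:])]

def iatd_features(seq, alphabet="ACGT"):
    """Mean and variance of the inter-arrival times of each nucleotide."""
    feats = []
    for s in alphabet:
        gaps = inter_arrival_times(seq, s)
        if gaps:
            mean = sum(gaps) / len(gaps)
            var = sum((g - mean) ** 2 for g in gaps) / len(gaps)
        else:
            mean = var = 0.0  # symbol occurs fewer than twice
        feats += [mean, var]
    return feats

def iatd_distance(s1, s2):
    """Euclidean distance between the IATD summaries of two sequences."""
    return math.dist(iatd_features(s1), iatd_features(s2))

print(inter_arrival_times("ACGACGA", "A"))
print(iatd_distance("ACGTACGT", "AACCGGTT"))
```

Because the summaries have a fixed length regardless of sequence length, the pairwise distance matrix can be built without aligning the sequences first, and then passed to any distance-based tree or clustering method.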
Blood vessel extraction and optic disc removal using curvelet transform and kernel fuzzy c-means.
Kar, Sudeshna Sil; Maity, Santi P
2016-03-01
This paper proposes an automatic blood vessel extraction method for retinal images using matched filtering in an integrated system design platform that involves the curvelet transform and kernel-based fuzzy c-means. Since the curvelet transform represents lines, edges and curvatures very well and in compact form (with fewer coefficients) compared to other multi-resolution techniques, this paper uses the curvelet transform to enhance the retinal vasculature. Matched filtering is then used to intensify the blood vessels' response, which is further processed by a kernel-based fuzzy c-means algorithm that extracts the vessel silhouette from the background through non-linear mapping. For pathological images, in addition to matched filtering, a Laplacian of Gaussian filter is also employed to distinguish step-like and ramp-like signals from the vessel structure. To test the efficacy of the proposed method, the algorithm has also been applied to images in the presence of additive white Gaussian noise, where the curvelet transform has been used for image denoising. Performance is evaluated on the publicly available DRIVE, STARE and DIARETDB1 databases and is compared with a large number of existing blood vessel extraction methodologies. Simulation results demonstrate that the proposed method is highly effective in detecting long and thick as well as short and thin vessels, with an average accuracy of 96.16% for the DRIVE and 97.35% for the STARE database, whereas the existing methods fail to extract the tiny and thin vessels. PMID:26848729
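The clustering step can be illustrated with a common formulation of kernel fuzzy c-means that uses a Gaussian kernel, in which memberships are proportional to (1 - K(x, v))^(-1/(m-1)) and centers are kernel-weighted means. Whether the paper uses exactly this formulation is not stated in the abstract, so the sketch below is an assumption; it operates on scalar intensities, and the parameter values and names are hypothetical.

```python
import math

def kernel_fcm(values, centers, m=2.0, sigma=0.5, iters=50):
    """Kernel fuzzy c-means on scalar values with a Gaussian kernel."""
    centers = list(centers)
    c, n = len(centers), len(values)
    for _ in range(iters):
        # K[i][k]: kernel similarity between center i and value k
        K = [[math.exp(-((x - v) ** 2) / sigma ** 2) for x in values] for v in centers]
        U = [[0.0] * n for _ in range(c)]
        for k in range(n):
            # small epsilon guards against division by zero when x equals a center
            inv = [(1.0 - K[i][k] + 1e-12) ** (-1.0 / (m - 1.0)) for i in range(c)]
            total = sum(inv)
            for i in range(c):
                U[i][k] = inv[i] / total  # memberships of value k sum to 1
        # center update: mean weighted by membership^m times kernel similarity
        centers = [
            sum(U[i][k] ** m * K[i][k] * values[k] for k in range(n))
            / sum(U[i][k] ** m * K[i][k] for k in range(n))
            for i in range(c)
        ]
    return U, centers

# Hypothetical filter responses: background near 0.1, vessel pixels near 1.0
responses = [0.0, 0.1, 0.2, 0.9, 1.0, 1.1]
U, centers = kernel_fcm(responses, [0.0, 1.0])
print([round(v, 2) for v in centers])
```

Each pixel receives a graded membership in every class rather than a hard label, so a final vessel map would still need a thresholding or maximum-membership rule, which is left out here.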
NASA Astrophysics Data System (ADS)
Nguyen, Sy Dzung; Nguyen, Quoc Hung; Choi, Seung-Bok
2015-01-01
This paper presents a new algorithm, called B-ANFIS, for building an adaptive neuro-fuzzy inference system (ANFIS) from a training data set. To increase the accuracy of the model, the following steps are carried out. First, a data merging rule is proposed to build and perform a data-clustering strategy. Subsequently, a combination of clustering processes in the input data space and in the joint input-output data space is presented. The main purpose of this step is to overcome problems related to initialization and contradictory fuzzy rules, which usually arise when building an ANFIS. The clustering process in the input data space is accomplished using a proposed merging-possibilistic clustering (MPC) algorithm. The effectiveness of this process is evaluated to resume a clustering process in the joint input-output data space. The optimal parameters obtained after completion of the clustering process are used to build the ANFIS. Simulations based on a numerical data set, 'Daily Data of Stock A', and measured data sets of a smart damper are performed to analyze and estimate accuracy. In addition, the convergence and robustness of the proposed algorithm are investigated using both theoretical and testing approaches.
Adham, Manal T; Bentley, Peter J
2016-08-01
This paper proposes and evaluates a solution to the truck redistribution problem prominent in London's Santander Cycle scheme. Due to the complexity of this NP-hard combinatorial optimisation problem, no efficient optimisation techniques are known to solve the problem exactly. This motivates our use of the heuristic Artificial Ecosystem Algorithm (AEA) to find good solutions in a reasonable amount of time. The AEA is designed to take advantage of highly distributed computer architectures and adapt to changing problems. In the AEA a problem is first decomposed into its relative sub-components; they then evolve solution building blocks that fit together to form a single optimal solution. Three variants of the AEA centred on evaluating clustering methods are presented: the baseline AEA, the community-based AEA which groups stations according to journey flows, and the Adaptive AEA which actively modifies clusters to cater for changes in demand. We applied these AEA variants to the redistribution problem prominent in bike share schemes (BSS). The AEA variants are empirically evaluated using historical data from Santander Cycles to validate the proposed approach and prove its potential effectiveness.
Silva, Mateus X; Galvão, Breno R L; Belchior, Jadson C
2014-05-21
A genetic algorithm is employed to survey an empirical potential energy surface for small Na(x)K(y) clusters with x + y ≤ 15, providing initial conditions for electronic structure methods. The minima of this empirical potential are assessed and corrected using high-level ab initio methods such as CCSD(T), CR-CCSD(T)-L and MP2, and benchmark results are obtained for specific cases. The results are the first calculations for such small alloy clusters and may serve as a reference for further studies. The validity and choice of a proper functional and basis set for DFT calculations are then explored using the benchmark data, where it was found that the usual DFT approach may fail to provide the correct qualitative result for specific systems. The best general agreement with the benchmark calculations is achieved with the def2-TZVPP basis set and the SVWN5 functional, although the LANL2DZ basis set (with effective core potential) and the SVWN5 functional provided the most cost-effective results. PMID:24691391
NASA Astrophysics Data System (ADS)
Bagheripour, Parisa; Asoodeh, Mojtaba
2013-12-01
Porosity, the void portion of reservoir rocks, determines the volume of hydrocarbon accumulation and strongly controls the assessment and development of hydrocarbon reservoirs. Accurate determination of porosity from core analysis is highly cost-, time-, and labor-intensive. Therefore, finding an accurate, fast, and cheap way of determining porosity is essential. Conventional well log data, available in almost all wells, contain invaluable implicit information about porosity, and an intelligent system can make this information explicit. Fuzzy logic is a powerful tool for handling geoscience problems that are associated with uncertainty. However, determining the best fuzzy formulation is still an open issue. This study proposes an improved strategy, a hybrid genetic algorithm-pattern search (GA-PS) technique, as an alternative to the widely used subtractive clustering (SC) method for setting up fuzzy rules between core porosity and petrophysical logs. The hybrid GA-PS technique is capable of extracting optimal parameters for fuzzy clusters (membership functions), which consequently results in the best fuzzy formulation. Results indicate that the GA-PS technique adjusts both the mean and the variance of the Gaussian membership functions, in contrast to SC, which only controls the mean. A comparison between the hybrid GA-PS technique and the SC method confirmed the superiority of the GA-PS technique in setting up fuzzy rules. The proposed strategy was successfully applied to one of the Iranian carbonate reservoir rocks.
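As an illustration of why control over both the mean and the variance matters, the sketch below evaluates a Gaussian membership function for two fuzzy sets that share a center but differ in spread. This is a minimal sketch, not the authors' implementation: the function name `gaussian_membership`, the use of the standard deviation `sigma` (the square root of the variance), and the sample values are all assumptions made for illustration.

```python
import math

def gaussian_membership(x, mean, sigma):
    """Degree of membership (0..1) of x in a fuzzy set with a Gaussian shape.

    mean  : center of the fuzzy set
    sigma : spread (standard deviation, i.e. square root of the variance)
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2)

# Hypothetical log reading and two candidate fuzzy sets with the same mean.
reading = 2.45
narrow = gaussian_membership(reading, mean=2.40, sigma=0.02)
wide = gaussian_membership(reading, mean=2.40, sigma=0.10)
print(narrow, wide)  # the narrow set assigns a much lower membership
```

With the mean fixed, only a change in spread can raise or lower the membership of the same reading, which is the extra degree of freedom the abstract attributes to GA-PS.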
A Multiple-Label Guided Clustering Algorithm for Historical Document Dating and Localization.
He, Sheng; Samara, Petros; Burgers, Jan; Schomaker, Lambert
2016-11-01
It is of essential importance for historians to know the date and place of origin of the documents they study. It would be a huge advancement for historical scholars if it would be possible to automatically estimate the geographical and temporal provenance of a handwritten document by inferring them from the handwriting style of such a document. We propose a multiple-label guided clustering algorithm to discover the correlations between the concrete low-level visual elements in historical documents and abstract labels, such as date and location. First, a novel descriptor, called histogram of orientations of handwritten strokes, is proposed to extract and describe the visual elements, which is built on a scale-invariant polar-feature space. In addition, the multi-label self-organizing map (MLSOM) is proposed to discover the correlations between the low-level visual elements and their labels in a single framework. Our proposed MLSOM can be used to predict the labels directly. Moreover, the MLSOM can also be considered as a pre-structured clustering method to build a codebook, which contains more discriminative information on date and geography. The experimental results on the medieval paleographic scale data set demonstrate that our method achieves state-of-the-art results. PMID:27576248
Anticipation versus adaptation in Evolutionary Algorithms: The case of Non-Stationary Clustering
NASA Astrophysics Data System (ADS)
González, A. I.; Graña, M.; D'Anjou, A.; Torrealdea, F. J.
1998-07-01
From the technological point of view, it is usually more important to ensure the ability to react promptly to changing environmental conditions than to try to forecast them. Evolutionary Algorithms were proposed initially to drive the adaptation of complex systems to varying or uncertain environments. In the general setting, the adaptive-anticipatory dilemma reduces itself to the placement of the interaction with the environment in the computational schema. Adaptation consists of the estimation of the proper parameters from present data in order to react to a present environment situation. Anticipation consists of the estimation from present data in order to react to a future environment situation. This duality is expressed in the Evolutionary Computation paradigm by the precise location of the consideration of present data in the computation of the individuals' fitness function. In this paper we consider several instances of Evolutionary Algorithms applied to a precise problem and perform an experiment that tests their response as anticipative and adaptive mechanisms. The non-stationary problem considered is that of Non-Stationary Clustering, more precisely the adaptive Color Quantization of image sequences. The experiment illustrates our ideas and gives some quantitative results that may support the proposition of the Evolutionary Computation paradigm for other tasks that require interaction with a Non-Stationary environment.
Ashton, Douglas J; Liu, Jiwen; Luijten, Erik; Wilding, Nigel B
2010-11-21
Highly size-asymmetrical fluid mixtures arise in a variety of physical contexts, notably in suspensions of colloidal particles to which much smaller particles have been added in the form of polymers or nanoparticles. Conventional schemes for simulating models of such systems are hamstrung by the difficulty of relaxing the large species in the presence of the small one. Here we describe how the rejection-free geometrical cluster algorithm of Liu and Luijten [J. Liu and E. Luijten, Phys. Rev. Lett. 92, 035504 (2004)] can be embedded within a restricted Gibbs ensemble to facilitate efficient and accurate studies of fluid phase behavior of highly size-asymmetrical mixtures. After providing a detailed description of the algorithm, we summarize the bespoke analysis techniques of [Ashton et al., J. Chem. Phys. 132, 074111 (2010)] that permit accurate estimates of coexisting densities and critical-point parameters. We apply our methods to study the liquid-vapor phase diagram of a particular mixture of Lennard-Jones particles having a 10:1 size ratio. As the reservoir volume fraction of small particles is increased in the range of 0%-5%, the critical temperature decreases by approximately 50%, while the critical density drops by some 30%. These trends imply that in our system, adding small particles decreases the net attraction between large particles, a situation that contrasts with hard-sphere mixtures where an attractive depletion force occurs.
Study of cluster reconstruction and track fitting algorithms for CGEM-IT at BESIII
NASA Astrophysics Data System (ADS)
Guo, Yue; Wang, Liang-Liang; Ju, Xu-Dong; Wu, Ling-Hui; Xiu, Qing-Lei; Wang, Hai-Xia; Dong, Ming-Yi; Hu, Jing-Ran; Li, Wei-Dong; Li, Wei-Guo; Liu, Huai-Min; Qun, Ou-Yang; Shen, Xiao-Yan; Yuan, Ye; Zhang, Yao
2016-01-01
Considering the effects of aging on the existing Inner Drift Chamber (IDC) of BESIII, a GEM-based inner tracker, the Cylindrical-GEM Inner Tracker (CGEM-IT), is proposed to be designed and constructed as an upgrade candidate for the IDC. This paper introduces a full simulation package for the CGEM-IT with a simplified digitization model, and describes the development of software for cluster reconstruction and track fitting, using a track fitting algorithm based on the Kalman filter method. Preliminary results for the reconstruction algorithms which are obtained using a Monte Carlo sample of single muon events in the CGEM-IT, show that the CGEM-IT has comparable momentum resolution and transverse vertex resolution to the IDC, and a better z-direction resolution than the IDC. Supported by National Key Basic Research Program of China (2015CB856700), National Natural Science Foundation of China (11205184, 11205182) and Joint Funds of National Natural Science Foundation of China (U1232201)
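For readers unfamiliar with the Kalman filter method mentioned above, the following is a deliberately simplified scalar sketch of its predict/update cycle. It is not the CGEM-IT track fitting code: real track fits use multi-dimensional state vectors and detector-specific models, and the function names, noise values, and measurements here are assumptions chosen only to show the mechanics.

```python
def kalman_predict(x, P, Q):
    """Prediction for a static scalar state: the estimate is unchanged,
    and its variance grows by the process noise Q."""
    return x, P + Q

def kalman_update(x, P, z, R):
    """Combine the prediction (x, variance P) with a measurement z (variance R)."""
    K = P / (P + R)              # Kalman gain: weight given to the measurement
    x_new = x + K * (z - x)      # corrected estimate
    P_new = (1.0 - K) * P        # reduced uncertainty after the update
    return x_new, P_new

# Hypothetical measurements of a single quantity, processed one by one.
estimate, variance = 0.0, 1.0
for measurement in [2.0, 1.0, 1.5]:
    estimate, variance = kalman_predict(estimate, variance, Q=0.0)
    estimate, variance = kalman_update(estimate, variance, measurement, R=1.0)
print(estimate, variance)
```

Each update shrinks the variance, so later measurements shift the estimate less than earlier ones.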
Marchal, Rémi; Carbonnière, Philippe; Pouchan, Claude
2015-01-22
The study of atomic clusters has become an increasingly active area of research in recent years because of the fundamental interest in studying a completely new area that can bridge the gap between atomic and solid state physics. Due to their specific properties, such compounds are of great interest in the field of nanotechnology [1,2]. Here, we present our GSAM algorithm, based on a DFT exploration of the PES, to find the low-lying isomers of such compounds. This algorithm includes the generation of an initial set of structures from which the most relevant are selected. Moreover, an optimization process, called raking optimization, able to discard step by step all the physically unreasonable configurations has been implemented to reduce the computational cost of this algorithm. Structural properties of Ga(n)As(m) clusters will be presented as an illustration of the method.
A heuristic method for finding the optimal number of clusters with application in medical data.
Bayati, Hamidreza; Davoudi, Heydar; Fatemizadeh, Emad
2008-01-01
In this paper, a heuristic method for determining the optimal number of clusters is proposed. Four clustering algorithms, namely K-means, Growing Neural Gas, a Simulated Annealing based technique, and Fuzzy C-means, are used in conjunction with three well-known cluster validity indices, namely the Davies-Bouldin index, the Calinski-Harabasz index, and the Maulik-Bandyopadhyay index, in addition to the proposed index. Our simulations evaluate the capability of these indices on several artificial and medical data sets. PMID:19163761
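To show how a validity index can rank candidate partitions, here is a minimal sketch of the Davies-Bouldin index using only the Python standard library. It covers one of the standard indices named above, not the index proposed in the paper; the helper names and the toy data are assumptions. Lower values indicate more compact, better separated clusters, and the sketch assumes no two cluster centroids coincide.

```python
import math

def centroid(points):
    """Component-wise mean of a non-empty list of equal-length points."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]

def davies_bouldin(clusters):
    """Davies-Bouldin index for a list of clusters (each a list of points)."""
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    centers = [centroid(c) for c in clusters]
    # Scatter: mean distance from a cluster's points to its centroid.
    scatter = [sum(math.dist(p, ctr) for p in c) / len(c)
               for c, ctr in zip(clusters, centers)]
    total = 0.0
    for i in range(len(clusters)):
        # Worst-case similarity of cluster i to any other cluster.
        total += max((scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
                     for j in range(len(clusters)) if j != i)
    return total / len(clusters)

# Toy data: the same four points grouped sensibly and badly.
good = [[(0, 0), (0, 1)], [(5, 5), (5, 6)]]
bad = [[(0, 0), (5, 5)], [(0, 1), (5, 6)]]
print(davies_bouldin(good), davies_bouldin(bad))
```

In a model-selection loop, the same function would be evaluated on the partitions produced for each candidate number of clusters, and the count with the lowest index would be preferred.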
NASA Astrophysics Data System (ADS)
Huang, Zhipeng; Gao, Lihong; Wang, Yangwei; Wang, Fuchi
2016-06-01
The Johnson-Cook (J-C) constitutive model is widely used in finite element simulation, as this model shows the relationship between stress and strain in a simple way. In this paper, a cluster global optimization algorithm is proposed to determine the J-C constitutive model parameters of materials. A set of assumed parameters is used for the accuracy verification of the procedure. The parameters of two materials (401 steel and 823 steel) are determined. Results show that the procedure is reliable and effective. The relative error between the optimized and assumed parameters is no more than 4.02%, and the relative error between the optimized and assumed stress is 0.2% × 10⁻⁵. The J-C constitutive parameters can be determined more precisely and quickly than with the traditional manual procedure. Furthermore, all the parameters can be simultaneously determined using several curves under different experimental conditions. A strategy is also proposed to accurately determine the constitutive parameters.
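For reference, the commonly cited form of the J-C flow stress is σ = (A + B·εⁿ)(1 + C·ln(ε̇/ε̇₀))(1 − T*ᵐ), with T* = (T − T_room)/(T_melt − T_room). The sketch below evaluates that expression; it is not the optimization procedure from the paper. The parameter values are placeholders rather than fitted values for 401 or 823 steel, and clamping T* to the range 0..1 is an assumption made to keep the power well defined.

```python
import math

def johnson_cook_stress(strain, strain_rate, temp, A, B, n, C, m,
                        ref_strain_rate=1.0, temp_room=293.0, temp_melt=1800.0):
    """Flow stress from the standard Johnson-Cook form.

    A, B, n : strain hardening parameters
    C       : strain-rate sensitivity
    m       : thermal softening exponent
    """
    hardening = A + B * strain ** n
    rate_term = 1.0 + C * math.log(strain_rate / ref_strain_rate)
    t_star = (temp - temp_room) / (temp_melt - temp_room)
    t_star = min(max(t_star, 0.0), 1.0)   # assumption: clamp homologous temperature
    return hardening * rate_term * (1.0 - t_star ** m)

# Placeholder parameters (illustrative only, units of MPa for A and B).
params = dict(A=500.0, B=300.0, n=0.3, C=0.01, m=1.0)
print(johnson_cook_stress(0.1, 1000.0, 293.0, **params))
```

A parameter identification procedure such as the one described would search over A, B, n, C, and m to minimize the mismatch between this function and measured stress-strain curves.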
BoCluSt: Bootstrap Clustering Stability Algorithm for Community Detection
Garcia, Carlos
2016-01-01
The identification of modules or communities in sets of related variables is a key step in the analysis and modeling of biological systems. Procedures for this identification are usually designed to allow fast analyses of very large datasets and may produce suboptimal results when these sets are of a small to moderate size. This article introduces BoCluSt, a new, somewhat more computationally intensive, community detection procedure that is based on combining a clustering algorithm with a measure of stability under bootstrap resampling. Both computer simulation and analyses of experimental data showed that BoCluSt can outperform current procedures in the identification of multiple modules in data sets with a moderate number of variables. In addition, the procedure provides users with a null distribution of results to evaluate the support for the existence of community structure in the data. BoCluSt takes individual measures for a set of variables as input, and may be a valuable and robust exploratory tool of network analysis, as it provides 1) an estimation of the best partition of variables into modules, 2) a measure of the support for the existence of modular structures, and 3) an overall description of the whole structure, which may reveal hierarchical modular situations, in which modules are composed of smaller sub-modules. PMID:27258041
NASA Astrophysics Data System (ADS)
Huang, Zhipeng; Gao, Lihong; Wang, Yangwei; Wang, Fuchi
2016-09-01
The Johnson-Cook (J-C) constitutive model is widely used in finite element simulation, as this model shows the relationship between stress and strain in a simple way. In this paper, a cluster global optimization algorithm is proposed to determine the J-C constitutive model parameters of materials. A set of assumed parameters is used for the accuracy verification of the procedure. The parameters of two materials (401 steel and 823 steel) are determined. Results show that the procedure is reliable and effective. The relative error between the optimized and assumed parameters is no more than 4.02%, and the relative error between the optimized and assumed stress is 0.2% × 10⁻⁵. The J-C constitutive parameters can be determined more precisely and quickly than with the traditional manual procedure. Furthermore, all the parameters can be simultaneously determined using several curves under different experimental conditions. A strategy is also proposed to accurately determine the constitutive parameters.
Fleisch, Markus C.; Maxell, Christopher A.; Kuper, Claudia K.; Brown, Erika T.; Parvin, Bahram; Barcellos-Hoff, Mary-Helen; Costes, Sylvain V.
2006-03-08
Centrosomes are small organelles that organize the mitotic spindle during cell division and are also involved in cell shape and polarity. Within epithelial tumors, such as breast cancer, and some hematological tumors, centrosome abnormalities (CA) are common, occur early in disease etiology, and correlate with chromosomal instability and disease stage. In situ quantification of CA by optical microscopy is hampered by overlap and clustering of these organelles, which appear as focal structures. CA has been frequently associated with Tp53 status in premalignant lesions and tumors. Here we describe an approach to accurately quantify centrosomes in tissue sections and tumors. Considering proliferation and baseline amplification rate, the resulting population-based ratio of centrosomes per nucleus allows the approximation of the proportion of cells with CA. Using this technique we show that 20-30 percent of cells have amplified centrosomes in Tp53 null mammary tumors. Combining fluorescence detection, deconvolution microscopy and a mathematical algorithm applied to a maximum intensity projection, we show that this approach is superior to traditional investigator-based visual analysis or threshold-based techniques.
Cloud classification from satellite data using a fuzzy sets algorithm: A polar example
NASA Technical Reports Server (NTRS)
Key, J. R.; Maslanik, J. A.; Barry, R. G.
1988-01-01
Where spatial boundaries between phenomena are diffuse, classification methods which construct mutually exclusive clusters seem inappropriate. The Fuzzy c-means (FCM) algorithm assigns each observation to all clusters, with membership values as a function of distance to the cluster center. The FCM algorithm is applied to AVHRR data for the purpose of classifying polar clouds and surfaces. Careful analysis of the fuzzy sets can provide information on which spectral channels are best suited to the classification of particular features, and can help determine likely areas of misclassification. General agreement in the resulting classes and cloud fraction was found between the FCM algorithm, a manual classification, and an unsupervised maximum likelihood classifier.
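The membership assignment described above can be sketched as follows. This shows only the standard FCM membership step for a single observation given fixed cluster centers, not the full iterative algorithm or the AVHRR processing; the function name, the fuzzifier default `m=2.0`, and the sample coordinates are assumptions.

```python
import math

def fcm_memberships(point, centers, m=2.0):
    """Fuzzy c-means membership of one point in every cluster.

    Memberships sum to 1 and decrease with distance to the cluster center.
    m > 1 is the fuzzifier; larger values give softer assignments.
    """
    if m <= 1.0:
        raise ValueError("fuzzifier m must be greater than 1")
    dists = [math.dist(point, c) for c in centers]
    # A point lying exactly on a center belongs fully to it (shared if tied).
    zeros = [i for i, d in enumerate(dists) if d == 0.0]
    if zeros:
        return [1.0 / len(zeros) if i in zeros else 0.0
                for i in range(len(centers))]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_j) ** power for d_j in dists) for d_i in dists]

# Hypothetical two-channel observation and two cluster centers.
u = fcm_memberships((1.0, 0.0), [(0.0, 0.0), (3.0, 0.0)])
print(u)
```

Because every observation receives a graded membership in each class, observations with no dominant membership can be flagged as likely areas of misclassification.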
Cloud classification from satellite data using a fuzzy sets algorithm - A polar example
NASA Technical Reports Server (NTRS)
Key, J. R.; Maslanik, J. A.; Barry, R. G.
1989-01-01
Where spatial boundaries between phenomena are diffuse, classification methods which construct mutually exclusive clusters seem inappropriate. The Fuzzy c-means (FCM) algorithm assigns each observation to all clusters, with membership values as a function of distance to the cluster center. The FCM algorithm is applied to AVHRR data for the purpose of classifying polar clouds and surfaces. Careful analysis of the fuzzy sets can provide information on which spectral channels are best suited to the classification of particular features, and can help determine likely areas of misclassification. General agreement in the resulting classes and cloud fraction was found between the FCM algorithm, a manual classification, and an unsupervised maximum likelihood classifier.
Yang, Liu; Lu, Yinzhi; Zhong, Yuanchang; Wu, Xuegang; Yang, Simon X.
2015-01-01
Energy resource limitation is a severe problem in traditional wireless sensor networks (WSNs) because it restricts the lifetime of the network. Recently, the emergence of energy harvesting techniques has brought with it the expectation of overcoming this problem. In particular, it is possible for a sensor node with energy harvesting abilities to work perpetually in an Energy Neutral state. In this paper, a Multi-hop Energy Neutral Clustering (MENC) algorithm is proposed to construct the optimal multi-hop clustering architecture in energy harvesting WSNs, with the goal of achieving perpetual network operation. All cluster heads (CHs) in the network act as routers to transmit data to the base station (BS) cooperatively by a multi-hop communication method. In addition, by analyzing the energy consumption of intra- and inter-cluster data transmission, we give the energy neutrality constraints. Under these constraints, every sensor node can work in an energy neutral state, which in turn provides perpetual network operation. Furthermore, the minimum network data transmission cycle is mathematically derived using convex optimization techniques while the network information gathering is maximal. Simulation results show that our protocol can achieve perpetual network operation, so that consistent data delivery is guaranteed. In addition, substantial improvements in network throughput are also achieved as compared to the famous traditional clustering protocol LEACH and recent energy harvesting aware clustering protocols. PMID:26712764
Yang, Liu; Lu, Yinzhi; Zhong, Yuanchang; Wu, Xuegang; Yang, Simon X
2015-12-26
Energy resource limitation is a severe problem in traditional wireless sensor networks (WSNs) because it restricts the lifetime of the network. Recently, the emergence of energy harvesting techniques has brought with it the expectation of overcoming this problem. In particular, it is possible for a sensor node with energy harvesting abilities to work perpetually in an Energy Neutral state. In this paper, a Multi-hop Energy Neutral Clustering (MENC) algorithm is proposed to construct the optimal multi-hop clustering architecture in energy harvesting WSNs, with the goal of achieving perpetual network operation. All cluster heads (CHs) in the network act as routers to transmit data to the base station (BS) cooperatively by a multi-hop communication method. In addition, by analyzing the energy consumption of intra- and inter-cluster data transmission, we give the energy neutrality constraints. Under these constraints, every sensor node can work in an energy neutral state, which in turn provides perpetual network operation. Furthermore, the minimum network data transmission cycle is mathematically derived using convex optimization techniques while the network information gathering is maximal. Simulation results show that our protocol can achieve perpetual network operation, so that consistent data delivery is guaranteed. In addition, substantial improvements in network throughput are also achieved as compared to the famous traditional clustering protocol LEACH and recent energy harvesting aware clustering protocols.
NASA Astrophysics Data System (ADS)
Valaparla, Sunil K.; Peng, Qi; Gao, Feng; Clarke, Geoffrey D.
2014-03-01
Accurate measurements of human body fat distribution are desirable because excessive body fat is associated with impaired insulin sensitivity, type 2 diabetes mellitus (T2DM) and cardiovascular disease. In this study, we hypothesized that the performance of water suppressed (WS) MRI is superior to non-water suppressed (NWS) MRI for volumetric assessment of abdominal subcutaneous (SAT), intramuscular (IMAT), visceral (VAT), and total (TAT) adipose tissues. We acquired T1-weighted images on a 3T MRI system (TIM Trio, Siemens), which were analyzed using semi-automated segmentation software that employs a fuzzy c-means (FCM) clustering algorithm. Sixteen contiguous axial slices, centered at the L4-L5 level of the abdomen, were acquired in eight T2DM subjects with water suppression (WS) and without (NWS). Histograms from WS images show improved separation of non-fatty tissue pixels from fatty tissue pixels, compared to NWS images. Paired t-tests of WS versus NWS showed a statistically significant lower volume of lipid in the WS images for VAT (145.3 cc less, p=0.006) and IMAT (305 cc less, p<0.001), but not SAT (14.1 cc more, NS). WS measurements of TAT also resulted in lower fat volumes (436.1 cc less, p=0.002). There is strong correlation between WS and NWS quantification methods for SAT measurements (r=0.999), but poorer correlation for VAT studies (r=0.845). These results suggest that NWS pulse sequences may overestimate adipose tissue volumes and that WS pulse sequences are more desirable due to the higher contrast generated between fatty and non-fatty tissues.
Sai, Linwei; Zhao, Jijun; Huang, Xiaoming; Wang, Jun
2012-01-01
Using a genetic algorithm incorporated with density functional theory, we have explored the size evolution of structural and electronic properties of neutral gallium clusters of 20-40 atoms in terms of their ground state structures, binding energies, second differences of energy, HOMO-LUMO gaps, distributions of bond length and bond angle, and electron density of states. In the size range studied, the Ga(n) clusters exhibit several growth patterns, and the core-shell structures become dominant from Ga31. With high point group symmetries, Ga23 and Ga36 show particularly high stability and Ga36 has a large HOMO-LUMO gap. The atomic structures and electronic states of Ga(n) clusters differ significantly from those of the α solid but resemble those of the β solid and the liquid to a certain extent.
Sumithra, Subramaniam; Victoire, T Aruldoss Albert
2015-01-01
Due to the large dimension of clusters and the increasing size of sensor nodes, finding the optimal route and cluster for large wireless sensor networks (WSN) is highly complex and cumbersome. This paper proposes a new method to determine a reasonably good solution to the clustering and routing problem, with the primary concern of efficient energy consumption by the sensor nodes in order to extend network lifetime. The proposed method is based on the Differential Evolution (DE) algorithm with an improvised search operator called Diversified Vicinity Procedure (DVP), which models a trade-off between energy consumption of the cluster heads and delay in forwarding the data packets. The route obtained using the proposed method from all the gateways to the base station is comparatively shorter in overall distance, with fewer data forwards. Extensive numerical experiments demonstrate the superiority of the proposed method in managing the energy consumption of the WSN, and the results are compared with those of other algorithms reported in the literature. PMID:26516635
Li, Weizhong [San Diego Supercomputer Center]
2016-07-12
San Diego Supercomputer Center's Weizhong Li on "Effective Analysis of NGS Metagenomic Data with Ultra-fast Clustering Algorithms" at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.
Li, Weizhong
2011-10-12
San Diego Supercomputer Center's Weizhong Li on "Effective Analysis of NGS Metagenomic Data with Ultra-fast Clustering Algorithms" at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.
Chen, Wei-Chen; Maitra, Ranjan
2011-01-01
We propose a model-based approach for clustering time series regression data in an unsupervised machine learning framework to identify groups under the assumption that each mixture component follows a Gaussian autoregressive regression model of order p. Given the number of groups, the traditional maximum likelihood approach of estimating the parameters using the expectation-maximization (EM) algorithm can be employed, although it is computationally demanding. The somewhat fast tune to the EM folk song provided by the Alternating Expectation Conditional Maximization (AECM) algorithm can alleviate the problem to some extent. In this article, we develop an alternative partial expectation conditional maximization algorithm (APECM) that uses an additional data augmentation storage step to efficiently implement AECM for finite mixture models. Results on our simulation experiments show improved performance in both fewer numbers of iterations and computation time. The methodology is applied to the problem of clustering mutual funds data on the basis of their average annual per cent returns and in the presence of economic indicators.
Tsai, Ming-Hui; Huang, Yueh-Min
2014-01-01
Wireless sensor networks (WSNs) have emerged as a promising solution for various applications due to their low cost and easy deployment. Because sensor nodes are typically battery powered, their limited power capacity makes extending network lifetime a challenge for WSNs. Many hierarchical protocols in the literature show better energy efficiency. In addition, data reduction based on the correlation of sensed readings can efficiently reduce the number of required transmissions. Therefore, we use a sub-clustering procedure based on spatial data correlation to further divide the hierarchical (clustered) architecture of a WSN. The proposed algorithm (2TC-cor) is composed of two procedures: the prediction model construction procedure and the sub-clustering procedure. Energy is conserved through the reduced transmissions, which depend on the prediction model. Energy can be further conserved through the representative mechanism of sub-clustering. Simulation results show that 2TC-cor can effectively conserve energy and monitor the environment accurately within an acceptable level. PMID:25412220
NASA Astrophysics Data System (ADS)
Thanos, Konstantinos-Georgios; Thomopoulos, Stelios C. A.
2014-06-01
The study in this paper is part of a broader research effort to discover facial sub-clusters in face databases of different ethnicities. These new sub-clusters, along with other metadata (such as race, sex, etc.), lead to a vector for each face in the database, where each vector component represents the likelihood that a given face belongs to each cluster. This vector is then used as a feature vector in a human identification and tracking system based on face and other biometrics. The first stage in this system involves a clustering method which evaluates and compares the clustering results of five different clustering algorithms (average, complete, single hierarchical algorithm, k-means and DIGNET), and selects the best strategy for each data collection. In this paper we present the comparative performance of clustering results of DIGNET and four clustering algorithms (average, complete, single hierarchical and k-means) on fabricated 2D and 3D samples, and on actual face images from various databases, using four different standard metrics. These metrics are the silhouette figure, the mean silhouette coefficient, the Hubert test Γ coefficient, and the classification accuracy for each clustering result. The results showed that, in general, DIGNET gives more trustworthy results than the other algorithms when the metric values are above a specific acceptance threshold. However, when the evaluation metrics have values lower than the acceptance threshold but not too low (too low corresponds to ambiguous or false results), the clustering results need to be verified by the other algorithms.
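Of the metrics listed, the mean silhouette coefficient is straightforward to sketch. The code below is a plain standard-library illustration on toy 2D points, not the evaluation pipeline used in the study; the function name, the convention of scoring singleton clusters as 0, and the data are assumptions.

```python
import math

def mean_silhouette(points, labels):
    """Mean silhouette coefficient (range -1..1, higher is better)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)   # assumed convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for other_lab, other in clusters.items() if other_lab != lab)
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / len(scores)

points = [(0, 0), (0, 1), (5, 5), (5, 6)]
print(mean_silhouette(points, [0, 0, 1, 1]))   # compact, well separated grouping
print(mean_silhouette(points, [0, 1, 0, 1]))   # poor grouping, negative score
```

An acceptance threshold of the kind mentioned in the abstract would then be applied to this value to decide whether a clustering result can be trusted on its own.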
Abedini, Mohammad; Moradi, Mohammad H; Hosseinian, S M
2016-03-01
This paper proposes a novel method to address reliability and technical problems of microgrids (MGs) by designing a number of self-adequate autonomous sub-MGs using an MG clustering approach. To this end, a multi-objective optimization problem is formulated in which power loss reduction, voltage profile improvement and reliability enhancement are considered as the objective functions. To solve the optimization problem, a hybrid algorithm named HS-GA, based on genetic and harmony search algorithms, is provided, and a load flow method is given to model different types of DGs as droop controllers. The performance of the proposed method is evaluated in two case studies. The results provide support for the performance of the proposed method. PMID:26767800
Reinke, R.E.
1991-01-01
Clustering is the problem of finding a good organization for data. Because there are many kinds of clustering problems, and because there are many possible clusterings for any data set, clustering programs use knowledge and assumptions about individual problems to make clustering tractable. Cluster-analysis techniques allow knowledge to be expressed in the choice of a pairwise distance measure and in the choice of clustering algorithm. Conceptual clustering adds knowledge and preferences about cluster descriptions. In this study the author describes symbolic clustering, which adds representation choice to the set of ways a data analyst can use problem-specific knowledge. He develops an informal model for symbolic clustering, and uses it to suggest where and how knowledge can be expressed in clustering. A language for creating symbolic clusters, based on the model, was developed and tested on three real clustering problems. The study concludes with a discussion of the implications of the model and the results for clustering in general.
Solution of facility location problem in Turkey by using fuzzy C-means method
NASA Astrophysics Data System (ADS)
Kocakaya, Mustafa Nabi; Türkakın, Osman Hürol
2013-10-01
The facility location problem is one of the most frequently encountered problems when deciding where to place facilities such as factories and warehouses. Various techniques have been developed to solve facility location problems, and the fuzzy c-means method is one of the most practical among them. In this study, the optimum warehouse location for natural stone mines is found by using the fuzzy c-means method.
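To make the technique concrete, here is a minimal sketch of the standard fuzzy c-means updates in plain Python, with the cluster centers read as candidate warehouse sites. The coordinates, the fuzzifier m = 2, and the caller-supplied initial centers are assumptions for illustration only; the study's actual data and settings are not given in the abstract.

```python
import math

def fuzzy_c_means(points, centers, m=2.0, iters=100, tol=1e-9):
    """Standard FCM iteration starting from caller-supplied initial centers.

    Returns (centers, memberships); memberships[i][k] is the degree to which
    point i belongs to cluster k, evaluated against the centers in use at the
    start of the last iteration.
    """
    centers = [tuple(c) for c in centers]
    dim = len(points[0])
    power = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(iters):
        # Membership update: u_ik = 1 / sum_j (d_ik / d_ij)^(2/(m-1)).
        memberships = []
        for p in points:
            dists = [math.dist(p, c) for c in centers]
            if 0.0 in dists:
                # Point coincides with one or more centers: share membership among them.
                zeros = dists.count(0.0)
                row = [1.0 / zeros if d == 0.0 else 0.0 for d in dists]
            else:
                row = [1.0 / sum((dk / dj) ** power for dj in dists) for dk in dists]
            memberships.append(row)

        # Center update: mean of the points weighted by u_ik^m.
        new_centers = []
        for k, old in enumerate(centers):
            weights = [row[k] ** m for row in memberships]
            total = sum(weights)
            if total == 0.0:
                new_centers.append(old)  # no support for this cluster; keep the old center
                continue
            new_centers.append(tuple(
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dim)
            ))

        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships

# Hypothetical mine coordinates forming two groups; each center is a candidate warehouse site.
mines = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
sites, degrees = fuzzy_c_means(mines, centers=[(0, 0), (10, 10)])
```

Unlike k-means, each mine keeps a graded membership in every cluster, so a mine lying between two sites is not forced into a single assignment.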
NASA Astrophysics Data System (ADS)
Cazade, Pierre-André; Zheng, Wenwei; Prada-Gracia, Diego; Berezovska, Ganna; Rao, Francesco; Clementi, Cecilia; Meuwly, Markus
2015-01-01
The ligand migration network for O2-diffusion in truncated Hemoglobin N is analyzed based on three different clustering schemes. For coordinate-based clustering, the conventional k-means and the kinetics-based Markov Clustering (MCL) methods are employed, whereas the locally scaled diffusion map (LSDMap) method is a collective-variable-based approach. It is found that all three methods agree well in their geometrical definition of the most important docking site, and all experimentally known docking sites are recovered by all three methods. Also, for most of the states, their population coincides quite favourably, whereas the kinetics of and between the states differs. One of the major differences between k-means and MCL clustering on the one hand and LSDMap on the other is that the latter finds one large primary cluster containing the Xe1a, IS1, and ENT states. This is related to the fact that the motion within the state occurs on similar time scales, whereas structurally the state is found to be quite diverse. In agreement with previous explicit atomistic simulations, the Xe3 pocket is found to be a highly dynamical site which points to its potential role as a hub in the network. This is also highlighted in the fact that LSDMap cannot identify this state. First passage time distributions from MCL clusterings using a one- (ligand-position) and two-dimensional (ligand-position and protein-structure) descriptor suggest that ligand- and protein-motions are coupled. The benefits and drawbacks of the three methods are discussed in a comparative fashion and highlight that depending on the questions at hand the best-performing method for a particular data set may differ.
NASA Astrophysics Data System (ADS)
Best, Andrew; Kapalo, Katelynn A.; Warta, Samantha F.; Fiore, Stephen M.
2016-05-01
Human-robot teaming largely relies on the ability of machines to respond and relate to human social signals. Prior work in Social Signal Processing has drawn a distinction between social cues (discrete, observable features) and social signals (underlying meaning). For machines to attribute meaning to behavior, they must first understand some probabilistic relationship between the cues presented and the signal conveyed. Using data derived from a study in which participants identified a set of salient social signals in a simulated scenario and indicated the cues related to the perceived signals, we detail a learning algorithm, which clusters social cue observations and defines an "N-Most Likely States" set for each cluster. Since multiple signals may be co-present in a given simulation and a set of social cues often maps to multiple social signals, the "N-Most Likely States" approach provides a dramatic improvement over typical linear classifiers. We find that the target social signal appears in a "3 most-likely signals" set with up to 85% probability. This results in increased speed and accuracy on large amounts of data, which is critical for modeling social cognition mechanisms in robots to facilitate more natural human-robot interaction. These results also demonstrate the utility of such an approach in deployed scenarios where robots need to communicate with human teammates quickly and efficiently. In this paper, we detail our algorithm, comparative results, and offer potential applications for robot social signal detection and machine-aided human social signal detection.
Kandalla, Krishna; Subramoni, Hari; Vishnu, Abhinav; Panda, Dhabaleswar K.
2010-04-01
Modern high performance computing systems are being increasingly deployed in a hierarchical fashion with multi-core computing platforms forming the base of the hierarchy. These systems are usually comprised of multiple racks, with each rack consisting of a finite number of chassis, with each chassis having multiple compute nodes or blades, based on multi-core architectures. The networks are also hierarchical with multiple levels of switches. Message exchange operations between processes that belong to different racks involve multiple hops across different switches and this directly affects the performance of collective operations. In this paper, we take on the challenges involved in detecting the topology of large scale InfiniBand clusters and leveraging this knowledge to design efficient topology-aware algorithms for collective operations. We also propose a communication model to analyze the communication costs involved in collective operations on large scale supercomputing systems. We have analyzed the performance characteristics of two collectives, MPI_Gather and MPI_Scatter on such systems and we have proposed topology-aware algorithms for these operations. Our experimental results have shown that the proposed algorithms can improve the performance of these collective operations by almost 54% at the micro-benchmark level.
NASA Astrophysics Data System (ADS)
Khehra, Baljit Singh; Pharwaha, Amar Partap Singh
2016-06-01
Ductal carcinoma in situ (DCIS) is one type of breast cancer. Clusters of microcalcifications (MCCs) are symptoms of DCIS that are recognized by mammography. Feature selection is the process of selecting an optimal subset of features from a large number of available features in a given problem domain, after feature extraction and before any classification scheme. Feature selection reduces the feature space, which improves the performance of the classifier and decreases the computational burden imposed on it by using many features. Selecting an optimal subset from a large number of available features is a difficult search problem. For n features, the total number of possible subsets is 2^n. Thus, the problem of selecting an optimal subset of features belongs to the category of NP-hard problems. In this paper, an attempt is made to find the optimal subset of MCC features from all possible subsets using genetic algorithm (GA), particle swarm optimization (PSO) and biogeography-based optimization (BBO). For simulation, a total of 380 benign and malignant MCC samples have been selected from mammogram images of the DDSM database. A total of 50 features extracted from these samples are used in this study. In these algorithms, the fitness function is the correct classification rate of the classifier. A support vector machine is used as the classifier. The experimental results show that the PSO-based and BBO-based algorithms perform better than the GA-based algorithm at selecting an optimal subset of features for classifying MCCs as benign or malignant.
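The size of the search space explains why metaheuristics are used here. The sketch below, in plain Python, counts the subsets for 50 features and runs an exhaustive wrapper search on a deliberately tiny problem. The function `toy_fitness` is a hypothetical stand-in for the classifier's correct classification rate, and the exhaustive loop stands in for the GA, PSO or BBO search, none of which is reproduced here.

```python
from itertools import combinations

def count_subsets(n_features):
    """Number of possible feature subsets (including the empty set) for n features."""
    return 2 ** n_features

def best_subset(n_features, fitness):
    """Exhaustive wrapper search over all non-empty subsets.

    Feasible only for small n_features; a metaheuristic would replace this
    enumeration when the number of features is large.
    """
    best, best_score = (), float("-inf")
    for size in range(1, n_features + 1):
        for subset in combinations(range(n_features), size):
            score = fitness(subset)
            if score > best_score:
                best, best_score = subset, score
    return best, best_score

def toy_fitness(subset):
    """Hypothetical fitness: rewards features 0 and 3, penalises subset size.

    In the setting described above this would instead be the classification
    rate of a classifier trained on the selected features.
    """
    useful = {0, 3}
    return len(useful & set(subset)) - 0.1 * len(subset)

search_space_50 = count_subsets(50)      # far too many subsets to enumerate
chosen, score = best_subset(5, toy_fitness)
```

With five features the loop evaluates only 31 subsets, whereas 50 features would require more than 10^15 evaluations, each involving a classifier run.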
CNN universal machine as classification platform: an ART-like clustering algorithm.
Bálya, David
2003-12-01
Fast and robust classification of feature vectors is a crucial task in a number of real-time systems. A cellular neural/nonlinear network universal machine (CNN-UM) can be very efficient as a feature detector. The next step is to post-process the results for object recognition. This paper shows how a robust classification scheme based on adaptive resonance theory (ART) can be mapped to the CNN-UM. Moreover, this mapping is general enough to include different types of feed-forward neural networks. The designed analogic CNN algorithm is capable of classifying the extracted feature vectors keeping the advantages of the ART networks, such as robust, plastic and fault-tolerant behaviors. An analogic algorithm is presented for unsupervised classification with tunable sensitivity and automatic new class creation. The algorithm is extended for supervised classification. The presented binary feature vector classification is implemented on the existing standard CNN-UM chips for fast classification. The experimental evaluation shows promising performance after 100% accuracy on the training set.
Nagwani, Naresh Kumar; Deo, Shirish V.
2014-01-01
Understanding the compressive strength of concrete is important for activities like construction arrangement, prestressing operations, proportioning new mixtures, and quality assurance. Regression techniques are the most widely used for prediction tasks in which the relationship between the independent variables and the dependent (prediction) variable is identified. The accuracy of regression techniques for prediction can be improved if clustering is used along with regression, because clustering allows a more accurate curve fit between the dependent and independent variables. In this work, a cluster regression technique is applied to estimate the compressive strength of concrete, and a novel approach is proposed for predicting it. The objective of this work is to demonstrate that clustering along with regression yields smaller prediction errors when estimating concrete compressive strength. The proposed technique consists of two major stages: in the first stage, clustering is used to group concrete data with similar characteristics, and in the second stage, regression techniques are applied over these clusters (groups) to predict the compressive strength from individual clusters. Experiments show that clustering along with regression techniques gives the minimum errors for predicting the compressive strength of concrete; also, the fuzzy C-means clustering algorithm performs better than the K-means algorithm. PMID:25374939
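The two-stage idea can be sketched in plain Python as follows. This is an illustration under simplifying assumptions, not the authors' procedure: it uses a single hypothetical predictor, hard k-means in one dimension rather than fuzzy C-means, caller-supplied initial centers, and an ordinary least-squares line per cluster.

```python
def nearest(x, centers):
    """Index of the center closest to x."""
    return min(range(len(centers)), key=lambda k: abs(x - centers[k]))

def kmeans_1d(xs, centers, iters=50):
    """Plain k-means on scalar data, starting from the given centers."""
    centers = list(centers)
    for _ in range(iters):
        groups = [[] for _ in centers]
        for x in xs:
            groups[nearest(x, centers)].append(x)
        updated = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
        if updated == centers:
            break
        centers = updated
    return centers

def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return slope, my - slope * mx

def fit_cluster_regression(xs, ys, init_centers):
    """Stage 1: cluster the predictor. Stage 2: fit one line per cluster."""
    centers = kmeans_1d(xs, init_centers)
    fallback = fit_line(xs, ys)  # used only if a cluster ends up empty
    models = []
    for k in range(len(centers)):
        members = [(x, y) for x, y in zip(xs, ys) if nearest(x, centers) == k]
        if members:
            models.append(fit_line([x for x, _ in members], [y for _, y in members]))
        else:
            models.append(fallback)
    return centers, models

def predict(x, centers, models):
    """Route x to its nearest cluster and apply that cluster's line."""
    slope, intercept = models[nearest(x, centers)]
    return slope * x + intercept

# Hypothetical data with two regimes that a single global line would fit poorly.
xs = [1, 2, 3, 11, 12, 13]
ys = [2, 4, 6, 89, 88, 87]
centers, models = fit_cluster_regression(xs, ys, init_centers=[0, 10])
estimate = predict(2.5, centers, models)
```

Fitting one model per cluster lets each regime keep its own slope, which is the source of the error reduction described above.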
NASA Astrophysics Data System (ADS)
Komura, Yukihiro; Okabe, Yutaka
2016-03-01
We present new versions of sample CUDA programs for the GPU computing of the Swendsen-Wang multi-cluster spin flip algorithm. In this update, we add the method of GPU-based cluster-labeling algorithm without the use of conventional iteration (Komura, 2015) to those programs. For high-precision calculations, we also add a random-number generator in the cuRAND library. Moreover, we fix several bugs and remove the extra usage of shared memory in the kernel functions.
Classification of excessive domestic water consumption using Fuzzy Clustering Method
NASA Astrophysics Data System (ADS)
Zairi Zaidi, A.; Rasmani, Khairul A.
2016-08-01
Demand for clean and treated water is increasing all over the world. Therefore it is crucial to conserve water for better use and to avoid unnecessary, excessive consumption or wastage of this natural resource. Classification of excessive domestic water consumption is a difficult task due to the complexity in determining the amount of water usage per activity, especially as the data is known to vary between individuals. In this study, classification of excessive domestic water consumption is carried out using a well-known Fuzzy C-Means (FCM) clustering algorithm. Consumer data containing information on daily, weekly and monthly domestic water usage was employed for the purpose of classification. Using the same dataset, the result produced by the FCM clustering algorithm is compared with the result obtained from a statistical control chart. The finding of this study demonstrates the potential use of the FCM clustering algorithm for the classification of domestic consumer water consumption data.
Cickovski, Trevor; Flor, Tiffany; Irving-Sachs, Galen; Novikov, Philip; Parda, James; Narasimhan, Giri
2015-01-01
In order to make multiple copies of a target sequence in the laboratory, the technique of Polymerase Chain Reaction (PCR) requires the design of "primers", which are short fragments of nucleotides complementary to the flanking regions of the target sequence. If the same primer is to amplify multiple closely related target sequences, then it is necessary to make the primers "degenerate", which would allow it to hybridize to target sequences with a limited amount of variability that may have been caused by mutations. However, the PCR technique can only allow a limited amount of degeneracy, and therefore the design of degenerate primers requires the identification of reasonably well-conserved regions in the input sequences. We take an existing algorithm for designing degenerate primers that is based on clustering and parallelize it in a web-accessible software package GPUDePiCt, using a shared memory model and the computing power of Graphics Processing Units (GPUs). We test our implementation on large sets of aligned sequences from the human genome and show a multi-fold speedup for clustering using our hybrid GPU/CPU implementation over a pure CPU approach for these sequences, which consist of more than 7,500 nucleotides. We also demonstrate that this speedup is consistent over larger numbers and longer lengths of aligned sequences. PMID:26357230
A new time dependent density functional algorithm for large systems and plasmons in metal clusters
Baseggio, Oscar; Fronzoni, Giovanna; Stener, Mauro
2015-07-14
A new algorithm to solve the Time Dependent Density Functional Theory (TDDFT) equations in the space of the density fitting auxiliary basis set has been developed and implemented. The method extracts the spectrum from the imaginary part of the polarizability at any given photon energy, avoiding the bottleneck of Davidson diagonalization. The original idea which made the present scheme very efficient consists in the simplification of the double sum over occupied-virtual pairs in the definition of the dielectric susceptibility, allowing an easy calculation of such matrix as a linear combination of constant matrices with photon energy dependent coefficients. The method has been applied to very different systems in nature and size (from H₂ to [Au₁₄₇]⁻). In all cases, the maximum deviations found for the excitation energies with respect to the Amsterdam density functional code are below 0.2 eV. The new algorithm has the merit not only to calculate the spectrum at whichever photon energy but also to allow a deep analysis of the results, in terms of transition contribution maps, Jacob plasmon scaling factor, and induced density analysis, which have been all implemented.
Muhammad, Durreshahwar; Foret, Jessica; Brady, Siobhan M.; Ducoste, Joel J.; Tuck, James; Long, Terri A.; Williams, Cranos
2015-01-01
Time course transcriptome datasets are commonly used to predict key gene regulators associated with stress responses and to explore gene functionality. Techniques developed to extract causal relationships between genes from high throughput time course expression data are limited by low signal levels coupled with noise and sparseness in time points. We deal with these limitations by proposing the Cluster and Differential Alignment Algorithm (CDAA). This algorithm was designed to process transcriptome data by first grouping genes based on stages of activity and then using similarities in gene expression to predict influential connections between individual genes. Regulatory relationships are assigned based on pairwise alignment scores generated using the expression patterns of two genes and some inferred delay between the regulator and the observed activity of the target. We applied the CDAA to an iron deficiency time course microarray dataset to identify regulators that influence 7 target transcription factors known to participate in the Arabidopsis thaliana iron deficiency response. The algorithm predicted that 7 regulators previously unlinked to iron homeostasis influence the expression of these known transcription factors. We validated over half of predicted influential relationships using qRT-PCR expression analysis in mutant backgrounds. One predicted regulator-target relationship was shown to be a direct binding interaction according to yeast one-hybrid (Y1H) analysis. These results serve as a proof of concept emphasizing the utility of the CDAA for identifying unknown or missing nodes in regulatory cascades, providing the fundamental knowledge needed for constructing predictive gene regulatory networks. We propose that this tool can be used successfully for similar time course datasets to extract additional information and infer reliable regulatory connections for individual genes. PMID:26317202
A new method based on Dempster-Shafer theory and fuzzy c-means for brain MRI segmentation
NASA Astrophysics Data System (ADS)
Liu, Jie; Lu, Xi; Li, Yunpeng; Chen, Xiaowu; Deng, Yong
2015-10-01
In this paper, a new method is proposed to decrease sensitivity to motion noise and uncertainty in magnetic resonance imaging (MRI) segmentation, especially when only one brain image is available. The method takes spatial neighborhood information into account by fusing the information of each pixel with that of its neighbors using Dempster-Shafer (DS) theory. The basic probability assignment (BPA) of each single hypothesis is obtained from the membership function produced by applying fuzzy c-means (FCM) clustering to the gray levels of the MRI. Multiple hypotheses are then generated from the single hypotheses. Next, the BPA of the objective pixel is updated by fusing it with the BPAs of its neighbors to get the final result. Some examples of MRI segmentation are demonstrated at the end of the paper, in which our method is compared with some previous methods. The results show that the proposed method is more effective than other methods in motion-blurred MRI segmentation.
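The fusion step rests on Dempster's rule of combination, which can be sketched in plain Python as below. The tissue labels and mass values are hypothetical, and the way the paper derives BPAs from FCM memberships or builds its multiple hypotheses is not reproduced; only the generic combination rule is shown.

```python
def combine(m1, m2):
    """Dempster's rule of combination for two basic probability assignments.

    Each BPA maps a frozenset of hypotheses to its mass. Products of masses
    whose focal sets do not intersect are counted as conflict, and the
    remaining mass is renormalised by 1 - conflict.
    """
    combined = {}
    conflict = 0.0
    for a, mass_a in m1.items():
        for b, mass_b in m2.items():
            common = a & b
            if common:
                combined[common] = combined.get(common, 0.0) + mass_a * mass_b
            else:
                conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; the rule is undefined")
    return {h: mass / (1.0 - conflict) for h, mass in combined.items()}

# Hypothetical BPAs for a pixel and one neighbour over two tissue classes.
GM, WM = frozenset({"GM"}), frozenset({"WM"})
EITHER = GM | WM  # compound hypothesis: gray or white matter
pixel = {GM: 0.6, WM: 0.3, EITHER: 0.1}
neighbour = {GM: 0.7, WM: 0.2, EITHER: 0.1}
fused = combine(pixel, neighbour)
label = max((GM, WM), key=lambda h: fused.get(h, 0.0))
```

When both sources lean toward the same class, the fused mass for that class exceeds either input, which is how agreement among neighbors can suppress isolated noisy pixels.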
Risk Mapping of Cutaneous Leishmaniasis via a Fuzzy C Means-based Neuro-Fuzzy Inference System
NASA Astrophysics Data System (ADS)
Akhavan, P.; Karimi, M.; Pahlavani, P.
2014-10-01
Finding pathogenic factors and how they spread in the environment has recently become a global demand. Cutaneous Leishmaniasis (CL), caused by Leishmania, is a parasitic disease that can be passed on to humans through the vector Phlebotomus. Studies show that economic situation, cultural issues, as well as environmental and ecological conditions can affect the prevalence of this disease. In this study, data mining is used to predict the CL prevalence rate and obtain a risk map, based on the environmental parameters that affect CL, using a Neuro-Fuzzy system. The learning capacity of neural networks on the one hand and the reasoning power of fuzzy systems on the other make Neuro-Fuzzy systems very efficient to use. In this research, to predict the CL prevalence rate, an adaptive Neuro-Fuzzy inference system was applied, with fuzzy C-means clustering used to determine the initial membership functions. Owing to the high incidence of CL in Ilam province, the counties of Ilam, Mehran, and Dehloran were examined and evaluated. The CL prevalence rate in 2012 was predicted from maps of the effective environmental parameters and topographic properties, including temperature, moisture, annual rainfall, vegetation and elevation. Results indicate that the model with the fuzzy C-means clustering structure yields acceptable RMSE values for both training and checking data, which supports our analyses. Using the proposed data mining technique, the spatial distribution pattern of the disease and the vulnerable areas become identifiable, and the map can be used by public health experts and decision makers as a useful tool in management and optimal decision-making.
Some new indexes of cluster validity.
Bezdek, J C; Pal, N R
1998-01-01
We review two clustering algorithms (hard c-means and single linkage) and three indexes of crisp cluster validity (Hubert's statistics, the Davies-Bouldin index, and Dunn's index). We illustrate two deficiencies of Dunn's index which make it overly sensitive to noisy clusters and propose several generalizations of it that are not as brittle to outliers in the clusters. Our numerical examples show that the standard measure of interset distance (the minimum distance between points in a pair of sets) is the worst (least reliable) measure upon which to base cluster validation indexes when the clusters are expected to form volumetric clouds. Experimental results also suggest that intercluster separation plays a more important role in cluster validation than cluster diameter. Our simulations show that while Dunn's original index has operational flaws, the concept it embodies provides a rich paradigm for validation of partitions that have cloud-like clusters. Five of our generalized Dunn's indexes provide the best validation results for the simulations presented.
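The sensitivity discussed above can be reproduced with a short plain-Python sketch of Dunn's original index: the smallest distance between points of different clusters divided by the largest cluster diameter. The toy clusters and the added stray point are assumptions for illustration, and the generalized indexes proposed in the paper are not implemented here.

```python
import math
from itertools import combinations

def dunn_index(clusters):
    """Dunn's original index for a crisp partition given as a list of point lists.

    Numerator: minimum distance between any two points in different clusters.
    Denominator: maximum distance between any two points in the same cluster.
    """
    min_separation = min(
        math.dist(p, q)
        for a, b in combinations(clusters, 2)
        for p in a
        for q in b
    )
    max_diameter = max(
        (math.dist(p, q) for c in clusters for p, q in combinations(c, 2)),
        default=0.0,
    )
    if max_diameter == 0.0:
        raise ValueError("all clusters are single points; the index is undefined")
    return min_separation / max_diameter

# Two compact, well-separated clusters (hypothetical data).
clean = [[(0, 0), (0, 1)], [(10, 0), (10, 1)]]
# The same partition with one stray point assigned to the first cluster.
noisy = [[(0, 0), (0, 1), (5, 0)], [(10, 0), (10, 1)]]

clean_score = dunn_index(clean)
noisy_score = dunn_index(noisy)
```

A single stray point both shrinks the minimum separation and inflates the maximum diameter, so the index collapses even though the bulk of each cluster is unchanged.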
A Neural-Network Clustering-Based Algorithm for Privacy Preserving Data Mining
NASA Astrophysics Data System (ADS)
Tsiafoulis, S.; Zorkadis, V. C.; Karras, D. A.
The increasing use of fast and efficient data mining algorithms on huge collections of personal data, facilitated by the exponential growth of technology, in particular in electronic data storage media and processing power, has raised serious ethical, philosophical and legal issues related to privacy protection. To cope with these concerns, several privacy preserving methodologies have been proposed. They fall into two categories: methodologies that aim at protecting the sensitive data and those that aim at protecting the mining results. In our work, we focus on sensitive data protection and compare existing techniques according to the anonymity degree they achieve, the information loss suffered and their performance characteristics. The ℓ-diversity principle is combined with k-anonymity concepts, so that background information cannot be exploited to successfully attack the privacy of the data subjects to whom the data refer. Based on Kohonen Self Organizing Feature Maps (SOMs), we first organize data sets in subspaces according to their information theoretical distance to each other, then create the most relevant classes, paying special attention to rare sensitive attribute values, and finally generalize attribute values to the minimum extent required, so that both the data disclosure probability and the information loss are kept negligible where possible. Furthermore, we propose information theoretical measures for assessing the anonymity degree achieved and empirical tests to demonstrate it.
A novel algorithm for detecting multiple covariance and clustering of biological sequences
Shen, Wei; Li, Yan
2016-01-01
Single genetic mutations are always followed by a set of compensatory mutations. Thus, multiple changes commonly occur in biological sequences and play crucial roles in maintaining conformational and functional stability. Although many methods are available to detect single mutations or covariant pairs, detecting non-synchronous multiple changes at different sites in sequences remains challenging. Here, we develop a novel algorithm, named Fastcov, to identify multiple correlated changes in biological sequences using an independent pair model followed by a tandem model of site-residue elements based on inter-restriction thinking. Fastcov performed exceptionally well at harvesting co-pairs and detecting multiple covariant patterns. By 10-fold cross-validation using datasets of different scales, the characteristic patterns successfully classified the sequences into target groups with an accuracy of greater than 98%. Moreover, we demonstrated that the multiple covariant patterns represent co-evolutionary modes corresponding to the phylogenetic tree, and provide a new understanding of protein structural stability. In contrast to other methods, Fastcov provides not only a reliable and effective approach to identify covariant pairs but also more powerful functions, including multiple covariance detection and sequence classification, that are most useful for studying the point and compensatory mutations caused by natural selection, drug induction, environmental pressure, etc. PMID:27451921
Thimmaiah, Tim; Voje, William E; Carothers, James M
2015-01-01
With progress toward inexpensive, large-scale DNA assembly, the demand for simulation tools that allow the rapid construction of synthetic biological devices with predictable behaviors continues to increase. By combining engineered transcript components, such as ribosome binding sites, transcriptional terminators, ligand-binding aptamers, catalytic ribozymes, and aptamer-controlled ribozymes (aptazymes), gene expression in bacteria can be fine-tuned, with many corollaries and applications in yeast and mammalian cells. The successful design of genetic constructs that implement these kinds of RNA-based control mechanisms requires modeling and analyzing kinetically determined co-transcriptional folding pathways. Transcript design methods using stochastic kinetic folding simulations to search spacer sequence libraries for motifs enabling the assembly of RNA component parts into static ribozyme- and dynamic aptazyme-regulated expression devices with quantitatively predictable functions (rREDs and aREDs, respectively) have been described (Carothers et al., Science 334:1716-1719, 2011). Here, we provide a detailed practical procedure for computational transcript design by illustrating a high throughput, multiprocessor approach for evaluating spacer sequences and generating functional rREDs. This chapter is written as a tutorial, complete with pseudo-code and step-by-step instructions for setting up a computational cluster with an Amazon, Inc. web server and performing the large numbers of kinefold-based stochastic kinetic co-transcriptional folding simulations needed to design functional rREDs and aREDs. The method described here should be broadly applicable for designing and analyzing a variety of synthetic RNA parts, devices and transcripts.
Polat, Kemal
2012-08-01
In this paper, an attribute weighting method based on cluster centers is proposed with the aim of increasing the discrimination between classes, and is applied to nonlinearly separable datasets including two medical datasets (the mammographic mass dataset and the BUPA liver disorders dataset) and a 2-D spiral dataset. The goal of this method is to gather the data points close to their cluster centers so that nonlinearly separable datasets are transformed into linearly separable ones. k-means clustering, fuzzy c-means clustering, and subtractive clustering have been used as the clustering algorithms. The resulting attribute weighting methods are k-means clustering based attribute weighting (KMCBAW), fuzzy c-means clustering based attribute weighting (FCMCBAW), and subtractive clustering based attribute weighting (SCBAW); they are applied prior to the classifier algorithms, which include the C4.5 decision tree and the adaptive neuro-fuzzy inference system (ANFIS). To evaluate the proposed method, recall, precision, true negative rate (TNR), G-mean1, G-mean2, f-measure, and classification accuracy have been used. The results show that subtractive clustering based attribute weighting was the best attribute weighting method with respect to classification performance on the three datasets used. PMID:21611787
Jiang, Joe-Air; Chen, Chia-Pang; Chuang, Cheng-Long; Lin, Tzu-Shiang; Tseng, Chwan-Lu; Yang, En-Cheng; Wang, Yung-Chung
2009-01-01
Deployment of wireless sensor networks (WSNs) has drawn much attention in recent years. Given the limited energy for sensor nodes, it is critical to implement WSNs with energy efficiency designs. Sensing coverage in networks, on the other hand, may degrade gradually over time after WSNs are activated. For mission-critical applications, therefore, energy-efficient coverage control should be taken into consideration to support the quality of service (QoS) of WSNs. Usually, coverage-controlling strategies present some challenging problems: (1) resolving the conflicts while determining which nodes should be turned off to conserve energy; (2) designing an optimal wake-up scheme that avoids awakening more nodes than necessary. In this paper, we implement an energy-efficient coverage control in cluster-based WSNs using a Memetic Algorithm (MA)-based approach, entitled CoCMA, to resolve the challenging problems. The CoCMA contains two optimization strategies: a MA-based schedule for sensor nodes and a wake-up scheme, which are responsible to prolong the network lifetime while maintaining coverage preservation. The MA-based schedule is applied to a given WSN to avoid unnecessary energy consumption caused by the redundant nodes. During the network operation, the wake-up scheme awakens sleeping sensor nodes to recover coverage hole caused by dead nodes. The performance evaluation of the proposed CoCMA was conducted on a cluster-based WSN (CWSN) under either a random or a uniform deployment of sensor nodes. Simulation results show that the performance yielded by the combination of MA and wake-up scheme is better than that in some existing approaches. Furthermore, CoCMA is able to activate fewer sensor nodes to monitor the required sensing area. PMID:22408561
A graph-based watershed merging using fuzzy C-means and simulated annealing for image segmentation
NASA Astrophysics Data System (ADS)
Vadiveloo, Mogana; Abdullah, Rosni; Rajeswari, Mandava
2015-12-01
In this paper, we address the over-segmented regions produced by the watershed transform by merging the regions using a global feature. The global feature information is obtained by clustering the image in its feature space using Fuzzy C-Means (FCM) clustering. The over-segmented regions produced by performing watershed on the gradient of the image are then mapped to this global information in the feature space. In addition, the global feature information is optimized using Simulated Annealing (SA). The optimal global feature information is used to derive the similarity criterion for merging the over-segmented watershed regions, which are represented by the region adjacency graph (RAG). The proposed method has been tested on a simulated digital brain phantom dataset to segment the white matter (WM), gray matter (GM) and cerebrospinal fluid (CSF) soft tissue regions. The experiments showed that the proposed method performs statistically better than the immersion watershed, with an average of 95.242% of regions merged, and achieves an average accuracy improvement of 8.850% in comparison with RAG-based immersion watershed merging using global and local features.
NASA Technical Reports Server (NTRS)
Werth, L. F. (Principal Investigator)
1981-01-01
Both the iterative self-organizing clustering system (ISOCLS) and the CLASSY algorithms were applied to forest and nonforest classes for one 1:24,000 quadrangle map of northern Idaho and the classification and mapping accuracies were evaluated with 1:30,000 color infrared aerial photography. Confusion matrices for the two clustering algorithms were generated and studied to determine which is most applicable to forest and rangeland inventories in future projects. In an unsupervised mode, ISOCLS requires many trial-and-error runs to find the proper parameters to separate desired information classes. CLASSY tells more in a single run concerning the classes that can be separated, shows more promise for forest stratification than ISOCLS, and shows more promise for consistency. One major drawback to CLASSY is that important forest and range classes that are smaller than a minimum cluster size will be combined with other classes. The algorithm requires so much computer storage that only data sets as small as a quadrangle can be used at one time.
NASA Astrophysics Data System (ADS)
Wagstaff, Kiri L.
2012-03-01
On obtaining a new data set, the researcher is immediately faced with the challenge of obtaining a high-level understanding from the observations. What does a typical item look like? What are the dominant trends? How many distinct groups are included in the data set, and how is each one characterized? Which observable values are common, and which rarely occur? Which items stand out as anomalies or outliers from the rest of the data? This challenge is exacerbated by the steady growth in data set size [11] as new instruments push into new frontiers of parameter space, via improvements in temporal, spatial, and spectral resolution, or by the desire to "fuse" observations from different modalities and instruments into a larger-picture understanding of the same underlying phenomenon. Data clustering algorithms provide a variety of solutions for this task. They can generate summaries, locate outliers, compress data, identify dense or sparse regions of feature space, and build data models. It is useful to note up front that "clusters" in this context refer to groups of items within some descriptive feature space, not (necessarily) to "galaxy clusters" which are dense regions in physical space. The goal of this chapter is to survey a variety of data clustering methods, with an eye toward their applicability to astronomical data analysis. In addition to improving the individual researcher’s understanding of a given data set, clustering has led directly to scientific advances, such as the discovery of new subclasses of stars [14] and gamma-ray bursts (GRBs) [38]. All clustering algorithms seek to identify groups within a data set that reflect some observed, quantifiable structure. Clustering is traditionally an unsupervised approach to data analysis, in the sense that it operates without any direct guidance about which items should be assigned to which clusters. There has been a recent trend in the clustering literature toward supporting semisupervised or constrained
Nandy, Subhajit; Chaudhury, Pinaki; Bhattacharyya, S P
2010-06-21
We present a genetic algorithm based investigation of structural fragmentation in dicationic noble gas clusters, Ar_n^{2+}, Kr_n^{2+}, and Xe_n^{2+}, where n denotes the size of the cluster. Dications are predicted to be stable above a threshold size of the cluster when positive charges are assumed to remain localized on two noble gas atoms and the Lennard-Jones potential along with bare Coulomb and ion-induced dipole interactions are taken into account for describing the potential energy surface. Our cutoff values are close to those obtained experimentally [P. Scheier and T. D. Mark, J. Chem. Phys. 11, 3056 (1987)] and theoretically [J. G. Gay and B. J. Berne, Phys. Rev. Lett. 49, 194 (1982)]. When the charges are allowed to be equally distributed over four noble gas atoms in the cluster and the nonpolarization interaction terms are allowed to remain unchanged, our method successfully identifies the size threshold for stability as well as the nature of the channels of dissociation as a function of cluster size. In Ar_n^{2+}, for example, fissionlike fragmentation is predicted for n = 55 while for n = 43, the predicted outcome is nonfission fragmentation, in complete agreement with earlier work [Golberg et al., J. Chem. Phys. 100, 8277 (1994)]. PMID:20572686
NASA Astrophysics Data System (ADS)
Nandy, Subhajit; Chaudhury, Pinaki; Bhattacharyya, S. P.
2010-06-01
We present a genetic algorithm based investigation of structural fragmentation in dicationic noble gas clusters, Ar_n^{2+}, Kr_n^{2+}, and Xe_n^{2+}, where n denotes the size of the cluster. Dications are predicted to be stable above a threshold size of the cluster when positive charges are assumed to remain localized on two noble gas atoms and the Lennard-Jones potential along with bare Coulomb and ion-induced dipole interactions are taken into account for describing the potential energy surface. Our cutoff values are close to those obtained experimentally [P. Scheier and T. D. Mark, J. Chem. Phys. 11, 3056 (1987)] and theoretically [J. G. Gay and B. J. Berne, Phys. Rev. Lett. 49, 194 (1982)]. When the charges are allowed to be equally distributed over four noble gas atoms in the cluster and the nonpolarization interaction terms are allowed to remain unchanged, our method successfully identifies the size threshold for stability as well as the nature of the channels of dissociation as a function of cluster size. In Ar_n^{2+}, for example, fissionlike fragmentation is predicted for n = 55 while for n = 43, the predicted outcome is nonfission fragmentation, in complete agreement with earlier work [Golberg et al., J. Chem. Phys. 100, 8277 (1994)].
Wu, Shandong; Weinstein, Susan P.; Conant, Emily F.; Kontos, Despina
2013-12-15
Purpose: Breast magnetic resonance imaging (MRI) plays an important role in the clinical management of breast cancer. Studies suggest that the relative amount of fibroglandular (i.e., dense) tissue in the breast as quantified in MR images can be predictive of the risk for developing breast cancer, especially for high-risk women. Automated segmentation of the fibroglandular tissue and volumetric density estimation in breast MRI could therefore be useful for breast cancer risk assessment. Methods: In this work the authors develop and validate a fully automated segmentation algorithm, namely, an atlas-aided fuzzy C-means (FCM-Atlas) method, to estimate the volumetric amount of fibroglandular tissue in breast MRI. The FCM-Atlas is a 2D segmentation method working on a slice-by-slice basis. FCM clustering is first applied to the intensity space of each 2D MR slice to produce an initial voxelwise likelihood map of fibroglandular tissue. Then a prior learned fibroglandular tissue likelihood atlas is incorporated to refine the initial FCM likelihood map to achieve enhanced segmentation, from which the absolute volume of the fibroglandular tissue (|FGT|) and the relative amount (i.e., percentage) of the |FGT| relative to the whole breast volume (FGT%) are computed. The authors' method is evaluated by a representative dataset of 60 3D bilateral breast MRI scans (120 breasts) that span the full breast density range of the American College of Radiology Breast Imaging Reporting and Data System. The automated segmentation is compared to manual segmentation obtained by two experienced breast imaging radiologists. Segmentation performance is assessed by linear regression, Pearson's correlation coefficients, Student's paired t-test, and Dice's similarity coefficients (DSC). Results: The inter-reader correlation is 0.97 for FGT% and 0.95 for |FGT|. When compared to the average of the two readers’ manual segmentation, the proposed FCM-Atlas method achieves a correlation of r = 0
Not Available
1994-02-02
This report consists of three separate but related reports. They are (1) Human Resource Development, (2) Carbon-based Structural Materials Research Cluster, and (3) Data Parallel Algorithms for Scientific Computing. To meet its objectives, the Human Resource Development plan includes K-12 enrichment activities, undergraduate research opportunities for students at the state's two Historically Black Colleges and Universities, graduate research through cluster assistantships and through a traineeship program targeted specifically to minorities, women and the disabled, and faculty development through participation in research clusters. One research cluster is the chemistry and physics of carbon-based materials. The objective of this cluster is to develop a self-sustaining group of researchers in carbon-based materials research within the institutions of higher education in the state of West Virginia. The projects will involve analysis of cokes, graphites and other carbons in order to understand the properties that provide desirable structural characteristics including resistance to oxidation, levels of anisotropy and structural characteristics of the carbons themselves. In the proposed cluster on parallel algorithms, research by four WVU faculty and three state liberal arts college faculty covers: (1) modeling of self-organized critical systems by cellular automata; (2) multiprefix algorithms and fat-free embeddings; (3) offline and online partitioning of data computation; and (4) manipulating and rendering three dimensional objects. This cluster furthers the state Experimental Program to Stimulate Competitive Research plan by building on existing strengths at WVU in parallel algorithms.
Shenvi, Neil; van Aggelen, Helen; Yang, Yang; Yang, Weitao; Schwerdtfeger, Christine; Mazziotti, David
2013-08-01
Tensor hypercontraction is a method that allows the representation of a high-rank tensor as a product of lower-rank tensors. In this paper, we show how tensor hypercontraction can be applied to both the electron repulsion integral tensor and the two-particle excitation amplitudes used in the parametric 2-electron reduced density matrix (p2RDM) algorithm. Because only O(r) auxiliary functions are needed in both of these approximations, our overall algorithm can be shown to scale as O(r^4), where r is the number of single-particle basis functions. We apply our algorithm to several small molecules, hydrogen chains, and alkanes to demonstrate its low formal scaling and practical utility. Provided we use enough auxiliary functions, we obtain accuracy similar to that of the standard p2RDM algorithm, somewhere between that of CCSD and CCSD(T). PMID:23927246
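To make the "product of lower-rank tensors" concrete, the sketch below rebuilds a four-index tensor from the factorised form commonly used for tensor hypercontraction, V[a][b][c][d] = sum over P, Q of X[a][P] X[b][P] Z[P][Q] X[c][Q] X[d][Q]. The assumption that this is the form used in the paper, and the names `thc_reconstruct`, `storage`, `X` and `Z`, are illustrative only. The loop shows the representation and its storage saving; it does not reproduce the O(r^4) p2RDM algorithm itself.

```python
def thc_reconstruct(X, Z):
    """Rebuild V[a][b][c][d] from factors X (r x n_aux) and Z (n_aux x n_aux)."""
    r, n_aux = len(X), len(Z)
    V = [[[[0.0] * r for _ in range(r)] for _ in range(r)] for _ in range(r)]
    for a in range(r):
        for b in range(r):
            # Weight of each auxiliary function P for the index pair (a, b).
            left = [X[a][P] * X[b][P] for P in range(n_aux)]
            # Contract with Z once and reuse the result for every (c, d).
            half = [sum(left[P] * Z[P][Q] for P in range(n_aux))
                    for Q in range(n_aux)]
            for c in range(r):
                for d in range(r):
                    V[a][b][c][d] = sum(
                        half[Q] * X[c][Q] * X[d][Q] for Q in range(n_aux))
    return V

def storage(r, n_aux):
    """Number of stored values: factorised form versus the dense tensor."""
    return {"factored": r * n_aux + n_aux * n_aux, "dense": r ** 4}
```

With n_aux proportional to r, the factorised storage grows quadratically in r, whereas the dense tensor grows as r^4.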
Srivastava, Subodh; Sharma, Neeraj; Singh, S K; Srivastava, R
2014-07-01
In this paper, a combined approach for enhancement and segmentation of mammograms is proposed. In the preprocessing stage, a contrast limited adaptive histogram equalization (CLAHE) method is applied to obtain better-contrast mammograms. After this, the proposed combined methods are applied. In the first step of the proposed approach, a two dimensional (2D) discrete wavelet transform (DWT) is applied to all the input images. In the second step, a proposed nonlinear complex diffusion based unsharp masking and crispening method is applied to the approximation coefficients of the wavelet transformed images to further highlight abnormalities such as micro-calcifications, tumours, etc., and to reduce the false positives (FPs). Thirdly, a modified fuzzy c-means (FCM) segmentation method is applied to the output of the second step. In the modified FCM method, mutual information is proposed as a similarity measure in place of the conventional Euclidean distance based dissimilarity measure for FCM segmentation. Finally, the inverse 2D-DWT is applied. The efficacy of the proposed unsharp masking and crispening method for image enhancement is evaluated in terms of signal-to-noise ratio (SNR) and that of the proposed segmentation method is evaluated in terms of Rand index (RI), global consistency error (GCE), and variation of information (VoI). The performance of the proposed segmentation approach is compared with other commonly used segmentation approaches such as Otsu's thresholding, texture based, k-means, and FCM clustering as well as thresholding. From the obtained results, it is observed that the proposed segmentation approach performs better and takes less processing time than the standard FCM and the other segmentation methods considered. PMID:25190996
Muster: Massively Scalable Clustering
2010-05-20
Muster is a framework for scalable cluster analysis. It includes implementations of classic K-Medoids partitioning algorithms, as well as infrastructure for making these algorithms run scalably on very large systems. In particular, Muster contains algorithms such as CAPEK (described in reference 1) that are capable of clustering highly distributed data sets in-place on a hundred thousand or more processes.
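For readers unfamiliar with K-Medoids partitioning, the following is a minimal single-process sketch of the alternating assign-and-update variant. It is not Muster's API and not CAPEK: the function names, the naive "first k points" initialisation, and the choice of this variant rather than PAM are all assumptions made for illustration.

```python
def assign(points, medoids, dist):
    """Label every point with the index of its nearest medoid."""
    return [min(medoids, key=lambda m: dist(p, points[m])) for p in points]

def k_medoids(points, k, dist, max_iter=100):
    """Alternating k-medoids; medoids are indices into `points`."""
    medoids = list(range(k))  # naive initialisation (assumption): first k points
    for _ in range(max_iter):
        labels = assign(points, medoids, dist)
        new_medoids = []
        for m in medoids:
            members = [i for i, lab in enumerate(labels) if lab == m]
            if not members:
                new_medoids.append(m)  # keep a medoid whose cluster is empty
                continue
            # New medoid: the member with the smallest total distance to the rest.
            new_medoids.append(min(
                members,
                key=lambda cand: sum(dist(points[cand], points[j]) for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(points, medoids, dist)
```

Because medoids are always actual data items, only a pairwise distance function is needed, which is why the method also applies to data without a meaningful mean.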
Cazade, Pierre-André; Berezovska, Ganna; Meuwly, Markus; Zheng, Wenwei; Clementi, Cecilia; Prada-Gracia, Diego; Rao, Francesco
2015-01-14
The ligand migration network for O2 diffusion in truncated Hemoglobin N is analyzed based on three different clustering schemes. For coordinate-based clustering, the conventional k-means and the kinetics-based Markov Clustering (MCL) methods are employed, whereas the locally scaled diffusion map (LSDMap) method is a collective-variable-based approach. It is found that all three methods agree well in their geometrical definition of the most important docking site, and all experimentally known docking sites are recovered by all three methods. Also, for most of the states, their population coincides quite favourably, whereas the kinetics of and between the states differs. One of the major differences between k-means and MCL clustering on the one hand and LSDMap on the other is that the latter finds one large primary cluster containing the Xe1a, IS1, and ENT states. This is related to the fact that the motion within the state occurs on similar time scales, whereas structurally the state is found to be quite diverse. In agreement with previous explicit atomistic simulations, the Xe3 pocket is found to be a highly dynamical site which points to its potential role as a hub in the network. This is also highlighted in the fact that LSDMap cannot identify this state. First passage time distributions from MCL clusterings using a one- (ligand-position) and two-dimensional (ligand-position and protein-structure) descriptor suggest that ligand- and protein-motions are coupled. The benefits and drawbacks of the three methods are discussed in a comparative fashion and highlight that depending on the questions at hand the best-performing method for a particular data set may differ.
Information Clustering Based on Fuzzy Multisets.
ERIC Educational Resources Information Center
Miyamoto, Sadaaki
2003-01-01
Proposes a fuzzy multiset model for information clustering with application to information retrieval on the World Wide Web. Highlights include search engines; term clustering; document clustering; algorithms for calculating cluster centers; theoretical properties concerning clustering algorithms; and examples to show how the algorithms work.…
Naim, Iftekhar; Datta, Suprakash; Rebhahn, Jonathan; Cavenaugh, James S; Mosmann, Tim R; Sharma, Gaurav
2014-05-01
We present a model-based clustering method, SWIFT (Scalable Weighted Iterative Flow-clustering Technique), for digesting high-dimensional large-sized datasets obtained via modern flow cytometry into more compact representations that are well-suited for further automated or manual analysis. Key attributes of the method include the following: (a) the analysis is conducted in the multidimensional space retaining the semantics of the data, (b) an iterative weighted sampling procedure is utilized to maintain modest computational complexity and to retain discrimination of extremely small subpopulations (hundreds of cells from datasets containing tens of millions), and (c) a splitting and merging procedure is incorporated in the algorithm to preserve distinguishability between biologically distinct populations, while still providing a significant compaction relative to the original data. This article presents a detailed algorithmic description of SWIFT, outlining the application-driven motivations for the different design choices, a discussion of computational complexity of the different steps, and results obtained with SWIFT for synthetic data and relatively simple experimental data that allow validation of the desirable attributes. A companion paper (Part 2) highlights the use of SWIFT, in combination with additional computational tools, for more challenging biological problems.
Zare Hosseini, Zeinab; Mohammadzadeh, Mahdi
2016-01-01
The rapid growth of information technology (IT) creates competitive advantages in the health care industry. Nowadays, many hospitals try to build successful customer relationship management (CRM) to recognize target and potential patients, increase patient loyalty and satisfaction, and finally maximize their profitability. Many hospitals have large data warehouses containing customer demographic and transaction information. Data mining techniques can be used to analyze these data and discover hidden knowledge about customers. This research develops an extended RFM model, namely RFML (added parameter: Length), based on health care services for a public sector hospital in Iran, with the idea that there is a contrast between patient and customer loyalty, to estimate customer lifetime value (CLV) for each patient. We used Two-step and K-means algorithms as clustering methods and a decision tree (CHAID) as the classification technique to segment the patients and find target, potential and loyal customers in order to strengthen CRM. Two approaches are used for classification: first, the result of clustering is considered as the decision attribute in the classification process, and second, the result of segmentation based on the CLV of patients (estimated by RFML) is considered as the decision attribute. Finally, the results of the CHAID algorithm show significant hidden rules and identify existing patterns of hospital consumers. PMID:27610177
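A small sketch of how the four RFML attributes might be derived from a patient's visit history before clustering. The abstract does not define the attributes precisely, so the definitions here are assumptions: Recency as days since the last visit, Frequency as the visit count, Monetary as total spend, and Length as days between the first and last visit. The function name `rfml` and the input layout are hypothetical, and no CLV weighting is shown because the abstract does not give one.

```python
from datetime import date

def rfml(visits, today):
    """Compute RFML attributes for one patient.

    visits: non-empty list of (visit_date, amount) tuples (hypothetical layout).
    today:  reference date used to measure recency.
    """
    dates = [d for d, _ in visits]
    return {
        "recency": (today - max(dates)).days,      # days since the last visit
        "frequency": len(visits),                  # number of visits
        "monetary": sum(a for _, a in visits),     # total amount spent
        "length": (max(dates) - min(dates)).days,  # span of the relationship
    }
```

The resulting dictionaries, one per patient, could then be scaled and passed to a clustering method such as K-means.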
Adaptive fuzzy leader clustering of complex data sets in pattern recognition
NASA Technical Reports Server (NTRS)
Newton, Scott C.; Pemmaraju, Surya; Mitra, Sunanda
1992-01-01
A modular, unsupervised neural network architecture for clustering and classification of complex data sets is presented. The adaptive fuzzy leader clustering (AFLC) architecture is a hybrid neural-fuzzy system that learns on-line in a stable and efficient manner. The initial classification is performed in two stages: a simple competitive stage and a distance metric comparison stage. The cluster prototypes are then incrementally updated by relocating the centroid positions from fuzzy C-means system equations for the centroids and the membership values. The AFLC algorithm is applied to the Anderson Iris data and laser-luminescent fingerprint image data. It is concluded that the AFLC algorithm successfully classifies features extracted from real data, discrete or continuous.
Baldauf, Tobias; Smith, Robert E.; Seljak, Uros; Mandelbaum, Rachel
2010-03-15
The clustering of matter on cosmological scales is an essential probe for studying the physical origin and composition of our Universe. To date, most of the direct studies have focused on shear-shear weak lensing correlations, but it is also possible to extract the dark matter clustering by combining galaxy-clustering and galaxy-galaxy-lensing measurements. In order to extract the required information, one must relate the observable galaxy distribution to the underlying dark matter distribution. In this study we develop in detail a method that can constrain the dark matter correlation function from galaxy clustering and galaxy-galaxy-lensing measurements, by focusing on the correlation coefficient between the galaxy and matter overdensity fields. Our goal is to develop an estimator that maximally correlates the two. To generate a mock galaxy catalogue for testing purposes, we use the halo occupation distribution approach applied to a large ensemble of N-body simulations to model preexisting SDSS luminous red galaxy sample observations. Using this mock catalogue, we show that a direct comparison between the excess surface mass density measured by lensing and its corresponding galaxy clustering quantity is not optimal. We develop a new statistic that suppresses the small-scale contributions to these observations and show that this new statistic leads to a cross-correlation coefficient that is within a few percent of unity down to 5 h^-1 Mpc. Furthermore, the residual incoherence between the galaxy and matter fields can be explained using a theoretical model for scale-dependent galaxy bias, giving us a final estimator that is unbiased to within 1%, so that we can reconstruct the dark matter clustering power spectrum at this accuracy up to k ≈ 1 h Mpc^-1. We also perform a comprehensive study of other physical effects that can affect the analysis, such as redshift space distortions and differences in radial windows between galaxy clustering and weak
Krejci, Adam; Hupp, Ted R.; Lexa, Matej; Vojtesek, Borivoj; Muller, Petr
2016-01-01
Motivation: Proteins often recognize their interaction partners on the basis of short linear motifs located in disordered regions on proteins’ surface. Experimental techniques that study such motifs use short peptides to mimic the structural properties of interacting proteins. Continued development of these methods allows for large-scale screening, resulting in vast amounts of peptide sequences, potentially containing information on multiple protein-protein interactions. Processing of such datasets is a complex but essential task for large-scale studies investigating protein-protein interactions. Results: The software tool presented in this article is able to rapidly identify multiple clusters of sequences carrying shared specificity motifs in massive datasets from various sources and generate multiple sequence alignments of identified clusters. The method was applied on a previously published smaller dataset containing distinct classes of ligands for SH3 domains, as well as on a new, an order of magnitude larger dataset containing epitopes for several monoclonal antibodies. The software successfully identified clusters of sequences mimicking epitopes of antibody targets, as well as secondary clusters revealing that the antibodies accept some deviations from original epitope sequences. Another test indicates that processing of even much larger datasets is computationally feasible. Availability and implementation: Hammock is published under GNU GPL v. 3 license and is freely available as a standalone program (from http://www.recamo.cz/en/software/hammock-cluster-peptides/) or as a tool for the Galaxy toolbox (from https://toolshed.g2.bx.psu.edu/view/hammock/hammock). The source code can be downloaded from https://github.com/hammock-dev/hammock/releases. Contact: muller@mou.cz Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26342231
[Cluster analysis in biomedical research].
Akopov, A S; Moskovtsev, A A; Dolenko, S A; Savina, G D
2013-01-01
Cluster analysis is one of the most popular methods for the analysis of multi-parameter data. Cluster analysis reveals the internal structure of the data and groups separate observations by their degree of similarity. The review defines the basic concepts of cluster analysis and discusses the most popular clustering algorithms: k-means, hierarchical algorithms, and Kohonen network algorithms. Examples of the use of these algorithms in biomedical research are given. PMID:24640781
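Of the algorithms the review names, k-means is the simplest to state. The sketch below is the standard Lloyd iteration with a deterministic "first k points" initialisation chosen only to keep the example reproducible. The function name `k_means` and that initialisation are assumptions, and the review's own examples are not reproduced here.

```python
def k_means(points, k, iters=100):
    """Lloyd's k-means on a list of equal-length tuples (illustrative only)."""
    centers = [tuple(p) for p in points[:k]]  # deterministic start (assumption)
    labels = []
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[j])  # leave an empty cluster in place
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels
```

The result depends on the starting centers, so in practice the procedure is usually repeated from several initialisations and the best partition kept.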
NASA Astrophysics Data System (ADS)
Feng, Jian-xin; Tang, Jia-fu; Wang, Guang-xing
2007-04-01
Based on an analysis of the clustering algorithms that have been proposed for MANETs, a novel clustering strategy is proposed in this paper. With trust defined by a statistical hypothesis in probability theory, and with the cluster head selected by node trust and node mobility, this strategy can detect malicious nodes, a function neglected by other clustering algorithms. It also overcomes a deficiency of the MOBIC algorithm, which cannot compute the relative mobility metric of the corresponding nodes when the received power of two consecutive HELLO packets cannot be measured. It is an effective solution for clustering MANETs securely.
NASA Astrophysics Data System (ADS)
Pei, Tao; Zhou, Cheng-Hu; Yang, Ming; Luo, Jian-Cheng; Li, Quan-Lin
2004-01-01
Given the complexity of the seismic gestation mechanism and of the spatial distribution of earthquakes, we hypothesize that the seismic data within a certain temporal-spatial scope are composed of background earthquakes and anomaly earthquakes, and that each of the two groups follows a 2-D Poisson process with its own parameters. In the paper, the concept of N-th order distance is introduced in order to transform the 2-D superimposed Poisson process into a 1-D mixture density function. Once the distance has been chosen, the mixture density function is decomposed by a genetic algorithm to recognize the anomaly earthquakes. Combined with temporal scanning of the C value, the algorithm is applied to recognizing the spatial pattern of foreshock anomalies, using the Songpan and Longling sequences in southwest China as examples.
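As a hedged illustration of the transformation the abstract describes, the block below writes out the standard density of the distance to the k-th nearest neighbour in a homogeneous 2-D Poisson process, and the two-component mixture that results when two such processes are superimposed. The symbols \lambda_1, \lambda_2 (intensities of the anomaly and background processes) and p (mixing proportion) are notation assumed here; the abstract does not state the exact form the authors use.

```latex
% Distance D_k to the k-th nearest neighbour, homogeneous 2-D Poisson process of intensity \lambda:
f_k(x;\lambda) \;=\; \frac{2\,(\lambda\pi)^{k}\,x^{2k-1}\,e^{-\lambda\pi x^{2}}}{(k-1)!}, \qquad x > 0 .

% Assumed 1-D mixture for superimposed anomaly (\lambda_1) and background (\lambda_2) events:
f(x) \;=\; p\,f_k(x;\lambda_1) \;+\; (1-p)\,f_k(x;\lambda_2), \qquad 0 \le p \le 1 .
```

Under these assumptions, decomposing the mixture amounts to estimating (p, \lambda_1, \lambda_2), which is the kind of search a genetic algorithm can carry out.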
Ghorbanzadeh, Leila; Torshabi, Ahmad Esmaili; Nabipour, Jamshid Soltani; Arbatan, Moslem Ahmadi
2016-04-01
In image-guided radiotherapy, tumor motion must be tracked in real time in order to deliver a prescribed uniform dose to dynamic tumors in the thorax region while minimizing the additional dose received by the surrounding healthy tissues. Several correlation models have been proposed in recent years to provide tumor position information as a function of time in radiotherapy with external surrogates. However, developing an accurate correlation model is still a challenge. In this study, we proposed an adaptive neuro-fuzzy based correlation model that employs several data clustering algorithms for antecedent parameter construction to avoid over-fitting and to achieve appropriate performance in tumor motion tracking compared with the conventional models. To begin, a comparative assessment is done between seven neuro-fuzzy correlation models, each constructed using a unique data clustering algorithm. Then, the constructed models are combined within an adaptive sevenfold synthetic model, since our tumor motion database has high degrees of variability and each model has its own intrinsic properties in motion tracking. In the proposed sevenfold synthetic model, the best model is selected adaptively at pre-treatment. The model also updates the steps for each patient using an automatic model selectivity subroutine. We tested the efficacy of the proposed synthetic model on twenty patients (divided equally into a control group and a worst group) treated with the CyberKnife Synchrony system. Compared to the CyberKnife model, the proposed synthetic model resulted in 61.2% and 49.3% reductions in tumor tracking error in the worst and control groups, respectively. These results suggest that the proposed model selection program in our synthetic neuro-fuzzy model can significantly reduce tumor tracking errors. Numerical assessments confirmed that the proposed synthetic model is able to track tumor motion in real time with high accuracy during treatment. PMID:25765021
NASA Astrophysics Data System (ADS)
Turan, Muhammed K.; Sehirli, Eftal; Elen, Abdullah; Karas, Ismail R.
2015-07-01
Gel electrophoresis (GE) is one of the most widely used methods for separating DNA, RNA, and protein molecules according to size, weight, and quantity in fields such as genetics, molecular biology, biochemistry, and microbiology. The main way to separate the molecules is to find the borders of each molecule fragment. This paper presents a software application that shows the column edges of DNA fragments in three steps. In the first step, the application obtains lane histograms of agarose gel electrophoresis images by projection onto the x-axis. In the second step, it uses the k-means clustering algorithm to classify the point values of the lane histogram into left-side values, right-side values, and undesired values. In the third step, the column edges of the DNA fragments are shown by using a mean algorithm and mathematical processes to separate the DNA fragments from the background in a fully automated way. In addition, the application reports the locations of the DNA fragments and how many DNA fragments exist in images captured by a scientific camera.
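The first two steps described above can be sketched roughly as follows. This is a simplified illustration, not the authors' application: the toy image, the function names `column_projection` and `kmeans_1d`, and the use of two clusters (bright lane versus background) instead of the paper's left-side, right-side, and undesired classes are all assumptions made for brevity.

```python
def column_projection(image):
    """Sum pixel intensities down each column (projection onto the x-axis)."""
    return [sum(column) for column in zip(*image)]


def kmeans_1d(values, k, iterations=50):
    """Plain Lloyd's k-means on scalars; centers start evenly spread over the value range."""
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value goes to its nearest center.
        labels = [min(range(k), key=lambda j: abs(v - centers[j])) for v in values]
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            if members:
                centers[j] = sum(members) / len(members)
    return labels, centers


# Hypothetical 4x8 grayscale image with two bright lanes (columns 1-2 and 5-6).
image = [
    [0, 9, 9, 0, 0, 8, 8, 0],
    [0, 9, 8, 0, 1, 9, 8, 0],
    [1, 8, 9, 0, 0, 8, 9, 0],
    [0, 9, 9, 1, 0, 9, 8, 0],
]
profile = column_projection(image)
labels, centers = kmeans_1d(profile, k=2)
bright = max(range(2), key=lambda j: centers[j])
lane_columns = [x for x, lab in enumerate(labels) if lab == bright]
print(lane_columns)
```

In this toy case the left and right edges of each lane are simply the first and last column of each run in `lane_columns`.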
NASA Astrophysics Data System (ADS)
Huang, Xiaoming; Sai, Linwei; Jiang, Xue; Zhao, Jijun
2013-02-01
Employing a genetic algorithm combined with density functional theory calculations, we determined the lowest-energy structures of cationic Na_n^+ clusters (n = 9, 15, 21, 26, 31, 36, 41, 50 and 59). We revealed a transition of growth pattern from the "polyicosahedral" sequence to the Mackay icosahedral motif at around n = 40. Based on the ground-state structures, the size-dependent electronic properties of Na_n^+ clusters, including the binding energies, HOMO-LUMO gaps, electron density of states and photoabsorption spectra, were discussed. As cluster size increases, the HOMO-LUMO gap of the Na_n^+ cluster gradually reduces and rapidly converges to the metallic behavior of the bulk crystal. The photoabsorption spectra of Na_n^+ clusters from our calculations agree with experimental data rather well, confirming the reliability of our theoretical approaches.
Chen, Zhijia; Zhu, Yuanchang; Di, Yanqiang; Feng, Shaochong
2015-01-01
In an IaaS (infrastructure as a service) cloud environment, users are provisioned with virtual machines (VMs). To allocate resources to users dynamically and effectively, accurate prediction of resource demands is essential. For this purpose, this paper proposes a self-adaptive prediction method using an ensemble model and a subtractive-fuzzy clustering based fuzzy neural network (ESFCFNN). We analyze the characteristics of user preferences and demands. Then the architecture of the prediction model is constructed. We adopt some base predictors to compose the ensemble model. Then the structure and learning algorithm of the fuzzy neural network are studied. To obtain the number of fuzzy rules and the initial values of the premise and consequent parameters, this paper proposes combining fuzzy c-means with the subtractive clustering algorithm, that is, subtractive-fuzzy clustering. Finally, we adopt different criteria to evaluate the proposed method. The experimental results show that the method is accurate and effective in predicting resource demands. PMID:25691896
Dawson, Kevin; Rodriguez, Raymond L; Malyj, Wasyl
2005-01-01
Background Life processes are determined by the organism's genetic profile and multiple environmental variables. However the interaction between these factors is inherently non-linear [1]. Microarray data is one representation of the nonlinear interactions among genes and genes and environmental factors. Still most microarray studies use linear methods for the interpretation of nonlinear data. In this study, we apply Isomap, a nonlinear method of dimensionality reduction, to analyze three independent large Affymetrix high-density oligonucleotide microarray data sets. Results Isomap discovered low-dimensional structures embedded in the Affymetrix microarray data sets. These structures correspond to and help to interpret biological phenomena present in the data. This analysis provides examples of temporal, spatial, and functional processes revealed by the Isomap algorithm. In a spinal cord injury data set, Isomap discovers the three main modalities of the experiment – location and severity of the injury and the time elapsed after the injury. In a multiple tissue data set, Isomap discovers a low-dimensional structure that corresponds to anatomical locations of the source tissues. This model is capable of describing low- and high-resolution differences in the same model, such as kidney-vs.-brain and differences between the nuclei of the amygdala, respectively. In a high-throughput drug screening data set, Isomap discovers the monocytic and granulocytic differentiation of myeloid cells and maps several chemical compounds on the two-dimensional model. Conclusion Visualization of Isomap models provides useful tools for exploratory analysis of microarray data sets. In most instances, Isomap models explain more of the variance present in the microarray data than PCA or MDS. Finally, Isomap is a promising new algorithm for class discovery and class prediction in high-density oligonucleotide data sets. PMID:16076401
Kidney segmentation in CT sequences using SKFCM and improved GrowCut algorithm
2015-01-01
Background Organ segmentation is an important step in computer-aided diagnosis and pathology detection. Accurate kidney segmentation in abdominal computed tomography (CT) sequences is an essential task for surgical planning and navigation in kidney tumor ablation. However, kidney segmentation in CT is substantially challenging because the intensity values of kidney parenchyma are similar to those of adjacent structures. Results In this paper, a coarse-to-fine method was applied to segment the kidney from CT images. It consists of two stages: rough segmentation and refined segmentation. The rough segmentation is based on a kernel fuzzy C-means algorithm with spatial information (SKFCM), and the refined segmentation is implemented with an improved GrowCut (IGC) algorithm. The SKFCM algorithm introduces a kernel function and a spatial constraint into the fuzzy c-means clustering (FCM) algorithm. The IGC algorithm makes good use of the spatial continuity of CT sequences, which allows it to generate the seed labels automatically and improves the efficiency of segmentation. The experimental results obtained on the whole dataset of abdominal CT images show that the proposed method is accurate and efficient. The method provides a sensitivity of 95.46% with a specificity of 99.82% and performs better than other related methods. Conclusions Our method achieves high accuracy in kidney segmentation and considerably reduces the time and labor required for contour delineation. In addition, the method can be expanded to 3D segmentation directly without modification. PMID:26356850
Convex Discriminative Multitask Clustering.
Zhang, Xiao-Lei
2015-01-01
Multitask clustering tries to improve the clustering performance of multiple tasks simultaneously by taking their relationship into account. Most existing multitask clustering algorithms fall into the type of generative clustering, and none are formulated as convex optimization problems. In this paper, we propose two convex Discriminative Multitask Clustering (DMTC) objectives to address the problems. The first one aims to learn a shared feature representation, which can be seen as a technical combination of the convex multitask feature learning and the convex Multiclass Maximum Margin Clustering (M3C). The second one aims to learn the task relationship, which can be seen as a combination of the convex multitask relationship learning and M3C. The objectives of the two algorithms are solved in a uniform procedure by the efficient cutting-plane algorithm and further unified in the Bayesian framework. Experimental results on a toy problem and two benchmark data sets demonstrate the effectiveness of the proposed algorithms. PMID:26353206
NASA Astrophysics Data System (ADS)
Ward, W. O. C.; Wilkinson, P. B.; Chambers, J. E.; Oxby, L. S.; Bai, L.
2014-04-01
A novel method for the effective identification of bedrock subsurface elevation from electrical resistivity tomography images is described. Identifying subsurface boundaries in the topographic data can be difficult due to smoothness constraints used in inversion, so a statistical population-based approach is used that extends previous work in calculating isoresistivity surfaces. The analysis framework involves a procedure for guiding a clustering approach based on the fuzzy c-means algorithm. An approximation of resistivity distributions, found using kernel density estimation, was utilized as a means of guiding the cluster centroids used to classify data. A fuzzy method was chosen over hard clustering due to uncertainty in hard edges in the topography data, and a measure of clustering uncertainty was identified based on the reciprocal of cluster membership. The algorithm was validated using a direct comparison of known observed bedrock depths at two 3-D survey sites, using real-time GPS information of exposed bedrock by quarrying on one site, and borehole logs at the other. Results show similarly accurate detection as a leading isosurface estimation method, and the proposed algorithm requires significantly less user input and prior site knowledge. Furthermore, the method is effectively dimension-independent and will scale to data of increased spatial dimensions without a significant effect on the runtime. A discussion on the results by automated versus supervised analysis is also presented.
NASA Astrophysics Data System (ADS)
Douglass, Michael; Bezak, Eva; Penfold, Scott
2015-04-01
The preliminary framework of a combined radiobiological model is developed and calibrated in the current work. The model simulates the production of individual cells forming a tumour, the spatial distribution of individual ionization events (using Geant4-DNA) and the stochastic biochemical repair of DNA double strand breaks (DSBs) leading to the prediction of survival or death of individual cells. In the current work, we expand upon a previously developed tumour generation and irradiation model to include a stochastic ionization damage clustering and DNA lesion repair model. The Geant4 code enabled the positions of each ionization event in the cells to be simulated and recorded for analysis. An algorithm was developed to cluster the ionization events in each cell into simple and complex double strand breaks. The two lesion kinetic (TLK) model was then adapted to predict DSB repair kinetics and the resultant cell survival curve. The parameters in the cell survival model were then calibrated using experimental cell survival data of V79 cells after low energy proton irradiation. A monolayer of V79 cells was simulated using the tumour generation code developed previously. The cells were then irradiated by protons with mean energies of 0.76 MeV and 1.9 MeV using a customized version of Geant4. By replicating the experimental parameters of a low energy proton irradiation experiment and calibrating the model with two sets of data, the model is now capable of predicting V79 cell survival after low energy (<2 MeV) proton irradiation for a custom set of input parameters. The novelty of this model is the realistic cellular geometry which can be irradiated using Geant4-DNA and the method in which the double strand breaks are predicted from clustering the spatial distribution of ionisation events. Unlike the original TLK model which calculates a tumour average cell survival probability, the cell survival probability is calculated for each cell in the geometric tumour model
NASA Astrophysics Data System (ADS)
Komura, Yukihiro; Okabe, Yutaka
2014-03-01
We present sample CUDA programs for the GPU computing of the Swendsen-Wang multi-cluster spin flip algorithm. We deal with the classical spin models; the Ising model, the q-state Potts model, and the classical XY model. As for the lattice, both the 2D (square) lattice and the 3D (simple cubic) lattice are treated. We already reported the idea of the GPU implementation for 2D models (Komura and Okabe, 2012). We here explain the details of sample programs, and discuss the performance of the present GPU implementation for the 3D Ising and XY models. We also show the calculated results of the moment ratio for these models, and discuss phase transitions. Catalogue identifier: AERM_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AERM_v1_0.html Program obtainable from: CPC Program Library, Queen’s University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 5632 No. of bytes in distributed program, including test data, etc.: 14688 Distribution format: tar.gz Programming language: C, CUDA. Computer: System with an NVIDIA CUDA enabled GPU. Operating system: System with an NVIDIA CUDA enabled GPU. Classification: 23. External routines: NVIDIA CUDA Toolkit 3.0 or newer Nature of problem: Monte Carlo simulation of classical spin systems. Ising, q-state Potts model, and the classical XY model are treated for both two-dimensional and three-dimensional lattices. Solution method: GPU-based Swendsen-Wang multi-cluster spin flip Monte Carlo method. The CUDA implementation for the cluster-labeling is based on the work by Hawick et al. [1] and that by Kalentev et al. [2]. Restrictions: The system size is limited depending on the memory of a GPU. Running time: For the parameters used in the sample programs, it takes about a minute for each program. Of course, it depends on the system size, the number of Monte Carlo steps, etc. References: [1] K
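For orientation, one Swendsen-Wang update for the 2-D Ising model can be sketched serially as below. This is a CPU illustration in Python, not the authors' CUDA code: the function names, the flat-list lattice layout, the union-find cluster labeling, and the units (coupling J = 1, k_B = 1, so the bond probability is 1 - exp(-2*beta)) are assumptions chosen for the sketch.

```python
import math
import random


def find(parent, i):
    """Union-find root lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def swendsen_wang_step(spins, size, beta, rng):
    """One update of a size x size periodic Ising lattice stored as a flat list of +1/-1."""
    p_bond = 1.0 - math.exp(-2.0 * beta)
    parent = list(range(size * size))
    # Bond step: connect equal neighbouring spins with probability p_bond.
    for y in range(size):
        for x in range(size):
            site = y * size + x
            right = y * size + (x + 1) % size
            down = ((y + 1) % size) * size + x
            for neighbor in (right, down):
                if spins[site] == spins[neighbor] and rng.random() < p_bond:
                    parent[find(parent, site)] = find(parent, neighbor)
    # Flip step: every cluster independently receives a new random spin.
    new_spin = {}
    for site in range(size * size):
        root = find(parent, site)
        if root not in new_spin:
            new_spin[root] = rng.choice((-1, 1))
        spins[site] = new_spin[root]
    return spins


rng = random.Random(0)
size = 8
start = [1] * (size * size)
# At very low temperature every bond is activated, so the lattice moves as one cluster.
cold = swendsen_wang_step(list(start), size, beta=50.0, rng=rng)
hot = swendsen_wang_step(list(start), size, beta=0.1, rng=rng)
```

The cluster-labeling loop is the part that a GPU implementation has to parallelize; the serial union-find here only stands in for it.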
Adaptive Clustering of Hypermedia Documents.
ERIC Educational Resources Information Center
Johnson, Andrew; Fotouhi, Farshad
1996-01-01
Discussion of hypermedia systems focuses on a comparison of two types of adaptive algorithm (genetic algorithm and neural network) in clustering hypermedia documents. These clusters allow the user to index into the nodes to find needed information more quickly, since clustering is "personalized" based on the user's paths rather than representing…
Fuzzy technique for microcalcifications clustering in digital mammograms
2014-01-01
Background Mammography has established itself as the most efficient technique for the identification of pathological breast lesions. Among the various types of lesions, microcalcifications are the most difficult to identify since they are quite small (0.1-1.0 mm) and often poorly contrasted against the image background. Within this context, Computer Aided Detection (CAD) systems could turn out to be very useful in breast cancer control. Methods In this paper we present a potentially powerful microcalcification cluster enhancement method applicable to digital mammograms. The segmentation phase employs a form filter, obtained from a LoG filter, to overcome the dependence on target dimensions and to optimize the recognition efficiency. A clustering method based on Fuzzy C-means (FCM) has been developed. The described method, Fuzzy C-means with Features (FCM-WF), was tested on simulated clusters of microcalcifications, meaning that the location of the cluster within the breast and the exact number of microcalcifications are known. The proposed method has also been tested on a set of images from the publicly available mini-Mammographic database provided by the Mammographic Image Analysis Society (MIAS). Results The comparison between the FCM-WF and standard FCM algorithms, applied to both databases, shows that the former produces better microcalcification associations for clustering than the latter: for the private and the public database, respectively, we obtained a performance improvement of 10% and 5% in the Merit Figure and a reduction of 22% and 10% in the false positives potentially identified in the images, both to the benefit of FCM-WF. The method was also evaluated in terms of Sensitivity (93% and 82%), Accuracy (95% and 94%), FP/image (4% for both databases) and Precision (62% and 65%). Conclusions Thanks to the private database and to the information contained in it regarding every single microcalcification, we tested the developed clustering
Matlab Cluster Ensemble Toolbox
Sapio, Vincent De; Kegelmeyer, Philip
2009-04-27
This is a Matlab toolbox for investigating the application of cluster ensembles to data classification, with the objective of improving the accuracy and/or speed of clustering. The toolbox divides the cluster ensemble problem into four areas, providing functionality for each: (1) synthetic data generation, (2) clustering to generate individual data partitions and similarity matrices, (3) consensus function generation and final clustering to generate the ensemble data partitioning, and (4) implementation of accuracy metrics. With regard to data generation, Gaussian data of arbitrary dimension can be generated. The kcenters algorithm can then be used to generate individual data partitions either by (a) subsampling the data and clustering each subsample, or by (b) randomly initializing the algorithm and generating a clustering for each initialization. In either case an overall similarity matrix can be computed using a consensus function operating on the individual similarity matrices. A final clustering can be performed, and performance metrics are provided for evaluation purposes.
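To make the consensus step concrete, the sketch below averages the per-partition similarity (co-association) matrices and then links points that agree in more than half of the partitions. It is written in Python rather than Matlab and is only one common choice of consensus function; the toolbox's actual functions are not described in the abstract, so the names `coassociation` and `consensus_clusters` and the 0.5 threshold are assumptions.

```python
def coassociation(partitions):
    """Fraction of partitions in which each pair of points shares a cluster."""
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(partitions) for c in row] for row in counts]


def consensus_clusters(matrix, threshold=0.5):
    """Link points whose co-association exceeds the threshold; label connected components."""
    n = len(matrix)
    label = [-1] * n
    current = 0
    for start in range(n):
        if label[start] != -1:
            continue
        label[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if label[j] == -1 and matrix[i][j] > threshold:
                    label[j] = current
                    stack.append(j)
        current += 1
    return label


# Three hypothetical partitions of five points; cluster ids need not match across runs.
partitions = [
    [0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1],
]
matrix = coassociation(partitions)
consensus = consensus_clusters(matrix)
print(consensus)
```

Because the matrix records only whether two points share a cluster, arbitrary relabeling between runs (as in the second partition) does not affect the result.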
Lu, Jing; Chen, Lei; Yin, Jun; Huang, Tao; Bi, Yi; Kong, Xiangyin; Zheng, Mingyue; Cai, Yu-Dong
2016-01-01
Lung cancer, characterized by uncontrolled cell growth in the lung tissue, is the leading cause of global cancer deaths. Effective treatment of this disease remains limited. Many synthetic compounds have emerged with the advancement of combinatorial chemistry, and identifying effective lung cancer candidate drug compounds among them is a great challenge. Thus, it is necessary to build effective computational methods that can assist in selecting potential lung cancer drug compounds. In this study, a computational method was proposed to tackle this problem. Chemical-chemical interactions and chemical-protein interactions were used to select candidate drug compounds that have close associations with approved lung cancer drugs and lung cancer-related genes. A permutation test and the K-means clustering algorithm were employed to exclude candidate drugs with a low probability of treating lung cancer. The final analysis suggests that the remaining drug compounds have potential anti-lung cancer activities and that most of them are structurally dissimilar to approved drugs for lung cancer.
Delineation of river bed-surface patches by clustering high-resolution spatial grain size data
NASA Astrophysics Data System (ADS)
Nelson, Peter A.; Bellugi, Dino; Dietrich, William E.
2014-01-01
The beds of gravel-bed rivers commonly display distinct sorting patterns, which at length scales of ~0.1-1 channel widths appear to form an organization of patches or facies. This paper explores alternatives to traditional visual facies mapping by investigating methods of patch delineation in which clustering analysis is applied to a high-resolution grid of spatial grain-size distributions (GSDs) collected during a flume experiment. Specifically, we examine four clustering techniques: 1) partitional clustering of grain-size distributions with the k-means algorithm (assigning each GSD to a type of patch based solely on its distribution characteristics), 2) spatially-constrained agglomerative clustering ("growing" patches by merging adjacent GSDs, thus generating a hierarchical structure of patchiness), 3) spectral clustering using Normalized Cuts (using the spatial distance between GSDs and the distribution characteristics to generate a matrix describing the similarity between all GSDs, and using the eigenvalues of this matrix to divide the bed into patches), and 4) fuzzy clustering with the fuzzy c-means algorithm (assigning each GSD a membership probability to every patch type). For each clustering method, we calculate metrics describing how well-separated cluster-average GSDs are and how patches are arranged in space. We use these metrics to compute optimal clustering parameters, to compare the clustering methods against each other, and to compare clustering results with patches mapped visually during the flume experiment. All clustering methods produced better-separated patch GSDs than the visually-delineated patches. Although they do not produce crisp cluster assignment, fuzzy algorithms provide useful information that can characterize the uncertainty of a location on the bed belonging to any particular type of patch, and they can be used to characterize zones of transition from one patch to another. The extent to which spatial information influences
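The fuzzy option (technique 4) can be illustrated with the textbook fuzzy c-means updates, reduced here to a single scalar feature per grid cell. This is a sketch under stated assumptions, not the authors' procedure: the median grain sizes, the starting centers, the fuzzifier m = 2, and the function names are all hypothetical, whereas the study clusters full grain-size distributions.

```python
def fcm_memberships(points, centers, m=2.0):
    """Membership of each scalar point in each cluster; every row sums to 1."""
    exponent = 2.0 / (m - 1.0)
    result = []
    for x in points:
        dists = [abs(x - c) for c in centers]
        if 0.0 in dists:
            # A point sitting exactly on a center belongs fully to it.
            hit = dists.index(0.0)
            row = [1.0 if k == hit else 0.0 for k in range(len(centers))]
        else:
            row = [1.0 / sum((d_k / d_j) ** exponent for d_j in dists) for d_k in dists]
        result.append(row)
    return result


def fcm_centers(points, memberships, m=2.0):
    """Each center is the mean of the points weighted by membership**m."""
    centers = []
    for k in range(len(memberships[0])):
        weights = [row[k] ** m for row in memberships]
        centers.append(sum(w * x for w, x in zip(weights, points)) / sum(weights))
    return centers


def fuzzy_c_means(points, centers, m=2.0, iterations=100):
    """Alternate the two updates for a fixed number of iterations."""
    for _ in range(iterations):
        u = fcm_memberships(points, centers, m)
        centers = fcm_centers(points, u, m)
    return centers, fcm_memberships(points, centers, m)


# Hypothetical median grain sizes (mm): a fine patch, a coarse patch, one transitional cell.
sizes = [2.0, 2.2, 1.9, 8.0, 8.3, 7.9, 5.0]
centers, u = fuzzy_c_means(sizes, centers=[1.0, 10.0])
for size, row in zip(sizes, u):
    print(size, [round(value, 2) for value in row])
```

Cells well inside a patch get a membership close to 1 for one cluster, while the transitional cell is split between the two, which is the kind of uncertainty information the abstract attributes to fuzzy methods.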
NASA Astrophysics Data System (ADS)
Hsu, Kuo-Hsien
2012-11-01
Formosat-2 imagery is high-spatial-resolution (2 m GSD) remote sensing satellite data that includes one panchromatic band and four multispectral bands (blue, green, red, near-infrared). An essential step in the daily processing of received Formosat-2 images is to estimate the cloud statistic of each image using an Automatic Cloud Coverage Assessment (ACCA) algorithm. The cloud statistic is subsequently recorded as important metadata for the image product catalog. In this paper, we propose an ACCA method with two consecutive stages: pre-processing and post-processing analysis. For pre-processing analysis, unsupervised K-means classification, Sobel's method, a thresholding method, non-cloudy pixel reexamination, and a cross-band filter method are implemented in sequence to determine the cloud statistic. For post-processing analysis, the box-counting fractal method is implemented. In other words, the cloud statistic is first determined via pre-processing analysis, and the correctness of the cloud statistic for the different spectral bands is then cross-examined qualitatively and quantitatively via post-processing analysis. The selection of an appropriate thresholding method is critical to the result of the ACCA method. Therefore, in this work, we first conduct a series of experiments on clustering-based and spatial thresholding methods, including Otsu's, Local Entropy (LE), Joint Entropy (JE), Global Entropy (GE), and Global Relative Entropy (GRE) methods, for performance comparison. The results show that Otsu's and GE methods both perform better than the others for Formosat-2 images. Additionally, our proposed ACCA method, with Otsu's method selected as the thresholding method, successfully extracted the cloudy pixels of Formosat-2 images for accurate cloud statistic estimation.
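The thresholding step can be sketched with Otsu's method, which picks the gray level that maximizes the between-class variance of the histogram. Only that one step is shown; the pixel values and the name `otsu_threshold` are hypothetical, and the K-means, Sobel, reexamination, cross-band, and box-counting stages of the proposed ACCA pipeline are not reproduced.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t maximizing between-class variance; pixels > t form the bright class."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_score = 0, -1.0
    w0 = 0      # number of pixels at or below t
    sum0 = 0    # intensity sum of those pixels
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        # Between-class variance up to a constant factor of 1 / total**2.
        score = w0 * w1 * (mean0 - mean1) ** 2
        if score > best_score:
            best_t, best_score = t, score
    return best_t


# Hypothetical 8-bit pixel values: dark ground pixels and bright cloud pixels.
pixels = [20, 22, 25, 30, 28, 200, 210, 220, 205, 215]
t = otsu_threshold(pixels)
cloud_fraction = sum(1 for p in pixels if p > t) / len(pixels)
print(t, cloud_fraction)
```

When several thresholds give the same score, as happens across the empty gap in this toy histogram, the sketch keeps the lowest one.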
NASA Astrophysics Data System (ADS)
Khateri, Parisa; Rad, Hamidreza Saligheh; Jafari, Amir Homayoun; Ay, Mohammad Reza
2014-01-01
Quantitative PET image reconstruction requires an accurate map of the attenuation coefficients of the tissue under investigation at 511 keV (μ-map) in order to correct the emission data for attenuation. The use of MRI-based attenuation correction (MRAC) has recently received much attention in the scientific literature. One of the major difficulties facing MRAC has been observed in areas where bone and air meet, e.g. the ethmoidal sinuses in the head area. Bone is intrinsically not detectable by conventional MRI, making it difficult to distinguish air from bone. Therefore, the development of more versatile MR sequences to label the bone structure, e.g. ultra-short echo-time (UTE) sequences, plays a significant role in novel methodological developments. However, the long acquisition time and complexity of UTE sequences limit their clinical applications. To overcome this problem, we developed a novel combination of a Short-TE (ShTE) pulse sequence to detect bone signal with a 2-point Dixon technique for water-fat discrimination, along with a robust image segmentation method based on fuzzy C-means (FCM) clustering to segment the head area into four classes: air, bone, soft tissue and adipose tissue. The imaging protocol was set up on a clinical 3 T Tim Trio and also a 1.5 T Avanto (Siemens Medical Solution, Erlangen, Germany), employing a triple echo time pulse sequence in the head area. The acquisition parameters were as follows: TE1/TE2/TE3=0.98/4.925/6.155 ms, TR=8 ms, FA=25 on the 3 T system, and TE1/TE2/TE3=1.1/2.38/4.76 ms, TR=16 ms, FA=18 for the 1.5 T system. The second and third echo times belonged to the Dixon decomposition to distinguish soft and adipose tissues. To quantify the accuracy, sensitivity and specificity of the bone segmentation algorithm, the resulting classes of MR-based segmented bone were compared with the bone manually segmented by our expert neuro-radiologist. Results for both 3 T and 1.5 T systems show that bone segmentation applied in several
NASA Technical Reports Server (NTRS)
Hall, Lawrence O.; Bensaid, Amine M.; Clarke, Laurence P.; Velthuizen, Robert P.; Silbiger, Martin S.; Bezdek, James C.
1992-01-01
Magnetic resonance (MR) brain section images are segmented and then synthetically colored to give visual representations of the original data with three approaches: the literal and approximate fuzzy c-means unsupervised clustering algorithms and a supervised computational neural network, a dynamic multilayered perceptron trained with the cascade correlation learning algorithm. Initial clinical results are presented on both normal volunteers and selected patients with brain tumors surrounded by edema. Supervised and unsupervised segmentation techniques provide broadly similar results. Unsupervised fuzzy algorithms were visually observed to show better segmentation when compared with raw image data for volunteer studies. However, for a more complex segmentation problem with tumor/edema or cerebrospinal fluid boundary, where the tissues have similar MR relaxation behavior, inconsistency in rating among experts was observed.
A Linear Algebra Measure of Cluster Quality.
ERIC Educational Resources Information Center
Mather, Laura A.
2000-01-01
Discussion of models for information retrieval focuses on an application of linear algebra to text clustering, namely, a metric for measuring cluster quality based on the theory that cluster quality is proportional to the number of terms that are disjoint across the clusters. Explains term-document matrices and clustering algorithms. (Author/LRW)
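The abstract does not give the linear algebra form of the metric, so the sketch below only illustrates the underlying idea literally: it counts the terms that occur in exactly one cluster. The function name and the representation of documents as sets of terms are assumptions, not the author's formulation.

```python
def disjoint_term_count(clusters):
    """Count terms that appear in exactly one cluster.

    clusters: list of clusters, each a list of documents,
    where a document is a set of terms (hypothetical representation).
    """
    vocab_per_cluster = [set().union(*docs) for docs in clusters]
    seen_in = {}
    for vocab in vocab_per_cluster:
        for term in vocab:
            seen_in[term] = seen_in.get(term, 0) + 1
    return sum(1 for n in seen_in.values() if n == 1)


# Toy example: "cluster" is shared by both clusters, the other terms are not
toy = [
    [{"fuzzy", "cluster"}, {"cluster", "means"}],
    [{"galaxy", "cluster"}, {"galaxy", "xray"}],
]
score = disjoint_term_count(toy)
```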
Algorithms and Algorithmic Languages.
ERIC Educational Resources Information Center
Veselov, V. M.; Koprov, V. M.
This paper is intended as an introduction to a number of problems connected with the description of algorithms and algorithmic languages, particularly the syntaxes and semantics of algorithmic languages. The terms "letter, word, alphabet" are defined and described. The concept of the algorithm is defined and the relation between the algorithm and…
Weigend, Florian
2014-10-01
Energy surfaces of metal clusters usually show a large variety of local minima. For homo-metallic species the energetically lowest can be found reliably with genetic algorithms, in combination with density functional theory without system-specific parameters. For mixed-metallic clusters this is much more difficult, as for a given arrangement of nuclei one has to find additionally the best of many possibilities of assigning different metal types to the individual positions. In the framework of electronic structure methods this second issue is treatable at comparably low cost at least for elements with similar atomic number by means of first-order perturbation theory, as shown previously [F. Weigend, C. Schrodt, and R. Ahlrichs, J. Chem. Phys. 121, 10380 (2004)]. In the present contribution the extension of a genetic algorithm with the re-assignment of atom types to atom sites is proposed and tested for the search of the global minima of PtHf₁₂ and [LaPb₇Bi₇]⁴⁻. For both cases the (putative) global minimum is reliably found with the extended technique, which is not the case for the "pure" genetic algorithm.
Change detection in synthetic aperture radar images based on image fusion and fuzzy clustering.
Gong, Maoguo; Zhou, Zhiqiang; Ma, Jingjing
2012-04-01
This paper presents an unsupervised distribution-free change detection approach for synthetic aperture radar (SAR) images based on an image fusion strategy and a novel fuzzy clustering algorithm. The image fusion technique is introduced to generate a difference image by using complementary information from a mean-ratio image and a log-ratio image. In order to restrain the background information and enhance the information of changed regions in the fused difference image, wavelet fusion rules based on an average operator and minimum local area energy are chosen to fuse the wavelet coefficients for a low-frequency band and a high-frequency band, respectively. A reformulated fuzzy local-information C-means clustering algorithm is proposed for classifying changed and unchanged regions in the fused difference image. It incorporates the information about spatial context in a novel fuzzy way for the purpose of enhancing the changed information and of reducing the effect of speckle noise. Experiments on real SAR images show that the image fusion strategy integrates the advantages of the log-ratio operator and the mean-ratio operator and achieves better performance. The change detection results obtained by the improved fuzzy clustering algorithm exhibited lower error than those of its predecessors.
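The two difference images mentioned in this abstract can be sketched as follows. The abstract does not state the formulas, so the code assumes the commonly used definitions: the log-ratio as the absolute log of the pixel ratio, and the mean-ratio as 1 - min(μ1/μ2, μ2/μ1) with μ a local mean. The 3x3 window, the `eps` guard, and all function names are illustrative assumptions; the wavelet fusion step is not shown.

```python
import math


def local_mean(img, r, c):
    """Mean over the 3x3 neighborhood of (r, c), clipped at the borders."""
    rows, cols = len(img), len(img[0])
    vals = [img[i][j]
            for i in range(max(0, r - 1), min(rows, r + 2))
            for j in range(max(0, c - 1), min(cols, c + 2))]
    return sum(vals) / len(vals)


def log_ratio(img1, img2, eps=1e-6):
    """Per-pixel |log(img2 / img1)|; eps guards against zero intensities."""
    return [[abs(math.log((b + eps) / (a + eps))) for a, b in zip(r1, r2)]
            for r1, r2 in zip(img1, img2)]


def mean_ratio(img1, img2, eps=1e-6):
    """Per-pixel 1 - min(mu1/mu2, mu2/mu1) using local means mu1, mu2."""
    out = []
    for r in range(len(img1)):
        row = []
        for c in range(len(img1[0])):
            m1 = local_mean(img1, r, c) + eps
            m2 = local_mean(img2, r, c) + eps
            row.append(1.0 - min(m1 / m2, m2 / m1))
        out.append(row)
    return out


# Hypothetical 2x2 intensity images; one pixel brightens between acquisitions
before = [[10.0, 10.0], [10.0, 10.0]]
after = [[10.0, 10.0], [10.0, 40.0]]
d_log = log_ratio(before, after)
d_mean = mean_ratio(before, after)
```

Both operators return zero where nothing changed and grow with the relative intensity change, which is why they are natural inputs to a two-class (changed/unchanged) clustering step.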
Unconventional methods for clustering
NASA Astrophysics Data System (ADS)
Kotyrba, Martin
2016-06-01
Cluster analysis or clustering is a task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters). It is the main task of exploratory data mining and a common technique for statistical data analysis used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, and bioinformatics. The topic of this paper is one of the modern methods of clustering namely SOM (Self Organising Map). The paper describes the theory needed to understand the principle of clustering and descriptions of algorithm used with clustering in our experiments.
Brightest Cluster Galaxy Identification
NASA Astrophysics Data System (ADS)
Leisman, Luke; Haarsma, D. B.; Sebald, D. A.; ACCEPT Team
2011-01-01
Brightest cluster galaxies (BCGs) play an important role in several fields of astronomical research. The literature includes many different methods and criteria for identifying the BCG in the cluster, such as choosing the brightest galaxy, the galaxy nearest the X-ray peak, or the galaxy with the most extended profile. Here we examine a sample of 75 clusters from the Archive of Chandra Cluster Entropy Profile Tables (ACCEPT) and the Sloan Digital Sky Survey (SDSS), measuring masked magnitudes and profiles for BCG candidates in each cluster. We first identified galaxies by hand; in 15% of clusters at least one team member selected a different galaxy than the others. We also applied 6 other identification methods to the ACCEPT sample; in 30% of clusters at least one of these methods selected a different galaxy than the other methods. We then developed an algorithm that weighs brightness, profile, and proximity to the X-ray peak and centroid. This algorithm incorporates the advantages of by-hand identification (weighing multiple properties) and automated selection (repeatable and consistent). The BCG population chosen by the algorithm is more uniform in its properties than populations selected by other methods, particularly in the relation between absolute magnitude (a proxy for galaxy mass) and average gas temperature (a proxy for cluster mass). This work was supported by a Barry M. Goldwater Scholarship and a Sid Jansma Summer Research Fellowship.
A diabetic retinopathy detection method using an improved pillar K-means algorithm.
Gogula, Susmitha Valli; Divakar, Ch; Satyanarayana, Ch; Rao, Allam Appa
2014-01-01
The paper presents a new approach for medical image segmentation. Exudates are a visible sign of diabetic retinopathy, which is the major cause of vision loss in patients with diabetes. If the exudates extend into the macular area, blindness may occur. Automated detection of exudates will assist ophthalmologists in early diagnosis. This segmentation process includes a new mechanism for clustering the elements of high-resolution images in order to improve precision and reduce computation time. The system applies K-means clustering to image segmentation after optimization by the Pillar algorithm; pillars are constructed in such a way that they can withstand the pressure. The improved Pillar algorithm can optimize K-means clustering for image segmentation in terms of precision and computation time. The proposed approach is evaluated by comparing it with K-means and fuzzy C-means on a medical image. Using this method, identification of dark spots in the retina becomes easier, and the proposed algorithm is applied to diabetic retinal images of all stages to identify hard and soft exudates, whereas the existing pillar K-means is more appropriate for brain MRI images. The proposed system helps doctors identify the problem at an early stage and can suggest a better drug for preventing further retinal damage.
Ugulu, Ilker; Aydin, Halil
2016-01-01
We propose an approach to clustering and visualization of students' cognitive structural models. We use the self-organizing map (SOM) combined with Ward's clustering to conduct cluster analysis. In the study carried out on 100 subjects, a conceptual understanding test consisting of open-ended questions was used as a data collection tool. The results of analyses indicated that students constructed the aliveness concept by associating it predominantly with human. Motion appeared as the most frequently associated term with the aliveness concept. The results suggest that the aliveness concept has been constructed using anthropocentric and animistic cognitive structures. In the next step, we used the data obtained from the conceptual understanding test for training the SOM. Consequently, we propose a visualization method about cognitive structure of the aliveness concept. PMID:26819579
A robust elicitation algorithm for discovering DNA motifs using fuzzy self-organizing maps.
Wang, Dianhui; Tapan, Sarwar
2013-10-01
It is important to identify DNA motifs in promoter regions to understand the mechanism of gene regulation. Computational approaches for finding DNA motifs are well recognized as useful tools to biologists, which greatly help in saving experimental time and cost in wet laboratories. Self-organizing maps (SOMs), as a powerful clustering tool, have demonstrated good potential for problem solving. However, the current SOM-based motif discovery algorithms unfairly treat data samples lying around the cluster boundaries by assigning them to one of the nodes, which may result in unreliable system performance. This paper aims to develop a robust framework for discovering DNA motifs, where fuzzy SOMs, with an integration of fuzzy c-means membership functions and a standard batch-learning scheme, are employed to extract putative motifs with varying length in a recursive manner. Experimental results on eight real datasets show that our proposed algorithm outperforms the other searching tools such as SOMBRERO, SOMEA, MEME, AlignACE, and WEEDER in terms of the F-measure and algorithm reliability. It is observed that a remarkable 24.6% improvement can be achieved compared to the state-of-the-art SOMBRERO. Furthermore, our algorithm can produce a 20% and 6.6% improvement over SOMBRERO and SOMEA, respectively, in finding multiple motifs on five artificial datasets. PMID:24808603
Symmetry Based Automatic Evolution of Clusters: A New Approach to Data Clustering
Vijendra, Singh; Laxman, Sahoo
2015-01-01
We present a multiobjective genetic clustering approach, in which data points are assigned to clusters based on new line symmetry distance. The proposed algorithm is called multiobjective line symmetry based genetic clustering (MOLGC). Two objective functions, first the Davies-Bouldin (DB) index and second the line symmetry distance based objective functions, are used. The proposed algorithm evolves near-optimal clustering solutions using multiple clustering criteria, without a priori knowledge of the actual number of clusters. The multiple randomized K dimensional (Kd) trees based nearest neighbor search is used to reduce the complexity of finding the closest symmetric points. Experimental results based on several artificial and real data sets show that proposed clustering algorithm can obtain optimal clustering solutions in terms of different cluster quality measures in comparison to existing SBKM and MOCK clustering algorithms. PMID:26339233
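For readers unfamiliar with the Davies-Bouldin (DB) index used as the first objective, a minimal sketch of its standard definition follows (lower is better). It uses Euclidean distances and is not the MOLGC implementation; the line symmetry distance is not shown, and the function names and toy points are assumptions.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def davies_bouldin(clusters):
    """Average over clusters of the worst (S_i + S_j) / M_ij ratio.

    S_i is the mean distance of cluster i's points to its centroid and
    M_ij is the distance between the centroids of clusters i and j.
    """
    cents = [centroid(c) for c in clusters]
    scatter = [sum(math.dist(p, ctr) for p in c) / len(c)
               for c, ctr in zip(clusters, cents)]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        total += max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
                     for j in range(k) if j != i)
    return total / k


# Two hypothetical partitions: well separated versus nearly touching
separated = [[(0, 0), (0, 1)], [(10, 0), (10, 1)]]
touching = [[(0, 0), (0, 1)], [(1, 0), (1, 1)]]
db_separated = davies_bouldin(separated)
db_touching = davies_bouldin(touching)
```

The well separated partition scores lower, which is the behavior a genetic search minimizing this objective would exploit.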
Elaff, Ihab
2016-01-01
Background: Brain segmentation from diffusion tensor imaging (DTI) into white matter (WM), gray matter (GM), and cerebrospinal fluid (CSF) with acceptable results is subject to many factors. Objectives: The most important issue in brain segmentation from DTI images is the selection of suitable scalar indices that best describe the required tissue in the images. Specifying a suitable clustering method and a suitable number of clusters for the selected method are other factors that affect the segmentation process significantly. Materials and Methods: The segmentation process is evaluated using four different clustering methods with different numbers of clusters, where some DTI scalar indices for 10 human brains are processed. Results: The aim was to produce results with less segmentation error and a lower computational cost while attempting to minimize boundary overlapping and the effect of artifacts due to macroscale scanning. Conclusion: The volume ratios of the best produced outputs with respect to the total brain size are 16.7% ± 3.53% for CSF, 35.05% ± 1.13% for WM, and 48.2% ± 2.88% for GM. PMID:27703655
Slonim, Noam; Atwal, Gurinder Singh; Tkačik, Gašper; Bialek, William
2005-01-01
In an age of increasingly large data sets, investigators in many different disciplines have turned to clustering as a tool for data analysis and exploration. Existing clustering methods, however, typically depend on several nontrivial assumptions about the structure of data. Here, we reformulate the clustering problem from an information theoretic perspective that avoids many of these assumptions. In particular, our formulation obviates the need for defining a cluster “prototype,” does not require an a priori similarity metric, is invariant to changes in the representation of the data, and naturally captures nonlinear relations. We apply this approach to different domains and find that it consistently produces clusters that are more coherent than those extracted by existing algorithms. Finally, our approach provides a way of clustering based on collective notions of similarity rather than the traditional pairwise measures. PMID:16352721
Metamodel-based global optimization using fuzzy clustering for design space reduction
NASA Astrophysics Data System (ADS)
Li, Yulin; Liu, Li; Long, Teng; Dong, Weili
2013-09-01
High-fidelity analyses are utilized in modern engineering design optimization problems which involve expensive black-box models. For computation-intensive engineering design problems, efficient global optimization methods must be developed to relieve the computational burden. A new metamodel-based global optimization method using fuzzy clustering for design space reduction (MGO-FCR) is presented. The uniformly distributed initial sample points are generated by Latin hypercube design to construct the radial basis function metamodel, whose accuracy is improved gradually as the number of sample points increases. The fuzzy c-means method and the Gath-Geva clustering method are applied to divide the design space into several small interesting cluster spaces for low and high dimensional problems, respectively. Modeling efficiency and accuracy are directly related to the design space, so unconcerned spaces are eliminated by the proposed reduction principle and two pseudo reduction algorithms. The reduction principle is developed to determine whether the current design space should be reduced and which space is eliminated. The first pseudo reduction algorithm improves the speed of clustering, while the second pseudo reduction algorithm ensures that the design space is reduced. Through several numerical benchmark functions, comparative studies with the adaptive response surface method, the approximated unimodal region elimination method and mode-pursuing sampling are carried out. The optimization results reveal that this method captures the real global optimum for all the numerical benchmark functions. The number of function evaluations shows that the efficiency of this method is favorable, especially for high dimensional problems. Based on this global design optimization method, a design optimization of a lifting surface in high speed flow is carried out and this method saves about 10 h compared with genetic algorithms. This method possesses favorable performance on efficiency, robustness
Bayesian Decision Theoretical Framework for Clustering
ERIC Educational Resources Information Center
Chen, Mo
2011-01-01
In this thesis, we establish a novel probabilistic framework for the data clustering problem from the perspective of Bayesian decision theory. The Bayesian decision theory view justifies the important questions: what is a cluster and what a clustering algorithm should optimize. We prove that the spectral clustering (to be specific, the…
A multiple-feature and multiple-kernel scene segmentation algorithm for humanoid robot.
Liu, Zhi; Xu, Shuqiong; Zhang, Yun; Chen, Chun Lung Philip
2014-11-01
This technical correspondence presents a multiple-feature and multiple-kernel support vector machine (MFMK-SVM) methodology to achieve a more reliable and robust segmentation performance for humanoid robot. The pixel wise intensity, gradient, and C1 SMF features are extracted via the local homogeneity model and Gabor filter, which would be used as inputs of MFMK-SVM model. It may provide multiple features of the samples for easier implementation and efficient computation of MFMK-SVM model. A new clustering method, which is called feature validity-interval type-2 fuzzy C-means (FV-IT2FCM) clustering algorithm, is proposed by integrating a type-2 fuzzy criterion in the clustering optimization process to improve the robustness and reliability of clustering results by the iterative optimization. Furthermore, the clustering validity is employed to select the training samples for the learning of the MFMK-SVM model. The MFMK-SVM scene segmentation method is able to fully take advantage of the multiple features of scene image and the ability of multiple kernels. Experiments on the BSDS dataset and real natural scene images demonstrate the superior performance of our proposed method.
Matlab Cluster Ensemble Toolbox
2009-04-27
This is a Matlab toolbox for investigating the application of cluster ensembles to data classification, with the objective of improving the accuracy and/or speed of clustering. The toolbox divides the cluster ensemble problem into four areas, providing functionality for each. These include (1) synthetic data generation, (2) clustering to generate individual data partitions and similarity matrices, (3) consensus function generation and final clustering to generate ensemble data partitioning, and (4) implementation of accuracy metrics. With regard to data generation, Gaussian data of arbitrary dimension can be generated. The kcenters algorithm can then be used to generate individual data partitions by either (a) subsampling the data and clustering each subsample, or (b) randomly initializing the algorithm and generating a clustering for each initialization. In either case an overall similarity matrix can be computed using a consensus function operating on the individual similarity matrices. A final clustering can be performed and performance metrics are provided for evaluation purposes.
Khormali, Aminollah; Addeh, Jalil
2016-07-01
Unnatural patterns in control charts can be associated with a specific set of assignable causes for process variation. Hence, pattern recognition is very useful in identifying process problems. In this study, a multiclass support vector machine (SVM) based classifier is proposed because of the promising generalization capability of SVMs. In the proposed method, the type-2 fuzzy c-means (T2FCM) clustering algorithm is used to make the SVM system more effective. The fuzzy support vector machine classifier suggested in this paper is composed of three main sub-networks: a fuzzy classifier sub-network, an SVM sub-network and an optimization sub-network. In SVM training, the hyper-parameters play a very important role in recognition accuracy. Therefore, the cuckoo optimization algorithm (COA) is proposed for selecting appropriate parameters of the classifier. Simulation results showed that the proposed system has very high recognition accuracy. PMID:27101724
Color segmentation using MDL clustering
NASA Astrophysics Data System (ADS)
Wallace, Richard S.; Suenaga, Yasuhito
1991-02-01
This paper describes a procedure for segmentation of color face images. A cluster analysis algorithm uses a subsample of the input image color pixels to detect clusters in color space. The clustering program consists of two parts. The first part searches for a hierarchical clustering using the NIHC algorithm. The second part searches the resultant cluster tree for a level clustering having minimum description length (MDL). One of the primary advantages of the MDL paradigm is that it enables writing robust vision algorithms that do not depend on user-specified threshold parameters or other "magic numbers." This technical note describes an application of minimal length encoding in the analysis of digitized human face images at the NTT Human Interface Laboratories. We use MDL clustering to segment color images of human faces. For color segmentation we search for clusters in color space. Using only a subsample of points from the original face image, our clustering program detects color clusters corresponding to the hair, skin, and background regions in the image. Then a maximum likelihood classifier assigns the remaining pixels to each class. The clustering program tends to group small facial features such as the nostrils, mouth, and eyes together, but they can be separated from the larger classes through connected components analysis.
NASA Astrophysics Data System (ADS)
Liu, Hanli; Pei, Tao; Zhou, Chenghu; Zhu, A.-Xing
2008-12-01
To enhance the spectral characteristics of features for clustering in a wetland extraction experiment in the Sanjiang Plain, we apply a series of preprocessing steps to MODIS remote sensing data that eliminate interference caused by other features. First, based on an analysis of the spectral characteristics of the data, we choose a set of multi-temporal and multi-spectral MODIS data of the Sanjiang Plain for clustering. By building and applying a mask, the water areas and woodland vegetation are eliminated from the image data. Second, with Enhanced Lee filtering and the Minimum Noise Fraction (MNF) transformation, the data are denoised and the characteristics of wetland are clearly enhanced. After preprocessing, the fuzzy c-means clustering algorithm optimized by the particle swarm algorithm (PSO-FCM) is applied to the image data for wetland extraction. The experimental results show that the accuracy of wetland extraction with the PSO-FCM algorithm is reasonable and effective.
Clustering of Multi-Temporal Fully Polarimetric L-Band SAR Data for Agricultural Land Cover Mapping
NASA Astrophysics Data System (ADS)
Tamiminia, H.; Homayouni, S.; Safari, A.
2015-12-01
Recently, the unique capabilities of Polarimetric Synthetic Aperture Radar (PolSAR) sensors have made them an important and efficient tool for natural resources and environmental applications, such as land cover and crop classification. The aim of this paper is to classify multi-temporal fully polarimetric SAR data over an agricultural region using a kernel-based fuzzy C-means clustering method. This method starts by transforming the input data into a higher dimensional space using kernel functions and then clustering them in the feature space. The feature space, due to its inherent properties, is able to take into account the nonlinear and complex nature of polarimetric data. Several SAR polarimetric features were extracted using target decomposition algorithms. Features from the Cloude-Pottier, Freeman-Durden and Yamaguchi algorithms were used as inputs for the clustering. This method was applied to multi-temporal UAVSAR L-band images acquired over an agricultural area near Winnipeg, Canada, during June and July in 2012. The results demonstrate the efficiency of this approach with respect to the classical methods. In addition, using multi-temporal data in the clustering process helped to investigate the phenological cycle of plants and significantly improved the performance of agricultural land cover mapping.
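The kernel idea described here can be illustrated with a small sketch. It assumes a Gaussian kernel and the common kernel fuzzy C-means variant that keeps cluster prototypes in the input space; the abstract does not say which kernel or variant the authors used, so the function names and parameters below are hypothetical.

```python
import math


def gaussian_kernel(x, v, sigma=1.0):
    """K(x, v) = exp(-||x - v||^2 / (2 sigma^2))."""
    return math.exp(-math.dist(x, v) ** 2 / (2.0 * sigma ** 2))


def kernel_distance_sq(x, v, sigma=1.0):
    """Squared feature-space distance ||phi(x) - phi(v)||^2.

    In general this is K(x,x) - 2K(x,v) + K(v,v); for a Gaussian
    kernel K(x,x) = K(v,v) = 1, so it reduces to 2 (1 - K(x,v)).
    """
    return 2.0 * (1.0 - gaussian_kernel(x, v, sigma))


def kfcm_memberships(x, centers, m=2.0, sigma=1.0):
    """Fuzzy memberships of x computed from kernel-induced distances."""
    d2 = [kernel_distance_sq(x, c, sigma) for c in centers]
    if 0.0 in d2:
        hits = [1.0 if d == 0.0 else 0.0 for d in d2]
        return [h / sum(hits) for h in hits]
    p = 1.0 / (m - 1.0)  # exponent applies to squared distances
    return [1.0 / sum((di / dj) ** p for dj in d2) for di in d2]


# Hypothetical 2-D feature vector and two prototypes
u = kfcm_memberships((0.1, 0.0), [(0.0, 0.0), (3.0, 0.0)])
```

Because the Gaussian kernel distance saturates at 2, distant outliers influence the memberships less than they would under the plain Euclidean distance.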
ClusterViz: A Cytoscape APP for Cluster Analysis of Biological Network.
Wang, Jianxin; Zhong, Jiancheng; Chen, Gang; Li, Min; Wu, Fang-xiang; Pan, Yi
2015-01-01
Cluster analysis of biological networks is one of the most important approaches for identifying functional modules and predicting protein functions. Furthermore, visualization of clustering results is crucial to uncover the structure of biological networks. In this paper, ClusterViz, an APP of Cytoscape 3 for cluster analysis and visualization, has been developed. In order to reduce complexity and enable extendibility for ClusterViz, we designed the architecture of ClusterViz based on the framework of Open Services Gateway Initiative. According to the architecture, the implementation of ClusterViz is partitioned into three modules: the interface of ClusterViz, clustering algorithms, and visualization and export. ClusterViz facilitates the comparison of the results of different algorithms for further related analysis. Three commonly used clustering algorithms, FAG-EC, EAGLE and MCODE, are included in the current version. Because the clustering algorithms module adopts an abstract algorithm interface, more clustering algorithms can be included in the future. To illustrate the usability of ClusterViz, we provide three examples with detailed steps from important scientific articles, which show that our tool has helped several research teams in their work on the mechanisms of biological networks. PMID:26357321
Convex Clustering: An Attractive Alternative to Hierarchical Clustering
Chen, Gary K.; Chi, Eric C.; Ranola, John Michael O.; Lange, Kenneth
2015-01-01
The primary goal in cluster analysis is to discover natural groupings of objects. The field of cluster analysis is crowded with diverse methods that make special assumptions about data and address different scientific aims. Despite its shortcomings in accuracy, hierarchical clustering is the dominant clustering method in bioinformatics. Biologists find the trees constructed by hierarchical clustering visually appealing and in tune with their evolutionary perspective. Hierarchical clustering operates on multiple scales simultaneously. This is essential, for instance, in transcriptome data, where one may be interested in making qualitative inferences about how lower-order relationships like gene modules lead to higher-order relationships like pathways or biological processes. The recently developed method of convex clustering preserves the visual appeal of hierarchical clustering while ameliorating its propensity to make false inferences in the presence of outliers and noise. The solution paths generated by convex clustering reveal relationships between clusters that are hidden by static methods such as k-means clustering. The current paper derives and tests a novel proximal distance algorithm for minimizing the objective function of convex clustering. The algorithm separates parameters, accommodates missing data, and supports prior information on relationships. Our program CONVEXCLUSTER incorporating the algorithm is implemented on ATI and nVidia graphics processing units (GPUs) for maximal speed. Several biological examples illustrate the strengths of convex clustering and the ability of the proximal distance algorithm to handle high-dimensional problems. CONVEXCLUSTER can be freely downloaded from the UCLA Human Genetics web site at http://www.genetics.ucla.edu/software/ PMID:25965340
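To make the objective function concrete, the sketch below evaluates the commonly cited convex clustering objective, 0.5 Σ ||x_i - u_i||² + γ Σ w_ij ||u_i - u_j||, for given cluster centers u. This form is an assumption, since the abstract does not write the objective out, and the sketch neither handles missing data nor implements the proximal distance algorithm of CONVEXCLUSTER; the function and parameter names are hypothetical.

```python
import math


def convex_clustering_objective(x, u, weights, gamma):
    """Fidelity term plus gamma times the weighted fusion penalty.

    x, u: lists of points (tuples), u[i] being the center assigned to x[i].
    weights: dict mapping index pairs (i, j) to nonnegative weights w_ij.
    """
    fidelity = 0.5 * sum(math.dist(xi, ui) ** 2 for xi, ui in zip(x, u))
    penalty = sum(w * math.dist(u[i], u[j]) for (i, j), w in weights.items())
    return fidelity + gamma * penalty


# Two hypothetical points: keep them apart (u = x) or fuse them at the midpoint
x = [(0.0, 0.0), (3.0, 4.0)]
w = {(0, 1): 1.0}
apart = convex_clustering_objective(x, x, w, gamma=1.0)
fused = convex_clustering_objective(x, [(1.5, 2.0), (1.5, 2.0)], w, gamma=1.0)
```

Increasing `gamma` makes fused centers cheaper relative to separate ones, and tracing the minimizer as `gamma` grows produces the solution path the abstract refers to.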
Swarm Intelligence in Text Document Clustering
Cui, Xiaohui; Potok, Thomas E
2008-01-01
Social animals or insects in nature often exhibit a form of emergent collective behavior. The research field that attempts to design algorithms or distributed problem-solving devices inspired by the collective behavior of social insect colonies is called Swarm Intelligence. Compared to traditional algorithms, swarm algorithms are usually flexible, robust, decentralized and self-organized. These characteristics make swarm algorithms suitable for solving complex problems, such as document collection clustering. A major challenge for users in today's information society is being overwhelmed with information on any topic they search for. Fast and high-quality document clustering algorithms play an important role in helping users to effectively navigate, summarize, and organize this overwhelming information. In this chapter, we introduce three nature-inspired swarm intelligence clustering approaches for document clustering analysis. These clustering algorithms use stochastic and heuristic principles discovered from observing bird flocks, fish schools and ant food foraging.
Image clustering using fuzzy graph theory
NASA Astrophysics Data System (ADS)
Jafarkhani, Hamid; Tarokh, Vahid
1999-12-01
We propose an image clustering algorithm which uses fuzzy graph theory. First, we define a fuzzy graph and the concept of connectivity for a fuzzy graph. Then, based on our definition of connectivity, we propose an algorithm which finds connected subgraphs of the original fuzzy graph. Each connected subgraph can be considered as a cluster. As an application of our algorithm, we consider a database of images. We calculate a similarity measure between any pair of images in the database and generate the corresponding fuzzy graph. Then, we find the subgraphs of the resulting fuzzy graph using our algorithm. Each subgraph corresponds to a cluster. We apply our image clustering algorithm to the key frames of news programs to find the anchorperson clusters. Simulation results show that our algorithm succeeds in finding most of the anchorperson frames in the database.
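The abstract does not spell out the authors' definition of connectivity, so the sketch below substitutes a simple stand-in: two images are linked when their similarity reaches a threshold `alpha`, and clusters are the connected components of the resulting graph. The function name, the threshold rule, and the toy similarity matrix are assumptions.

```python
def fuzzy_graph_clusters(similarity, alpha):
    """Connected components of the graph whose edges have weight >= alpha.

    similarity: symmetric matrix (list of lists) of edge memberships in [0, 1].
    Returns a sorted list of clusters, each a sorted list of node indices.
    """
    n = len(similarity)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if similarity[i][j] >= alpha:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical similarities between four key frames
sim = [
    [1.0, 0.9, 0.1, 0.0],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.8],
    [0.0, 0.1, 0.8, 1.0],
]
clusters = fuzzy_graph_clusters(sim, alpha=0.5)
```

Lowering `alpha` merges clusters and raising it splits them, so the threshold plays the role that a cluster count plays in partitional methods.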
Efficient clustering aggregation based on data fragments.
Wu, Ou; Hu, Weiming; Maybank, Stephen J; Zhu, Mingliang; Li, Bing
2012-06-01
Clustering aggregation, known as clustering ensembles, has emerged as a powerful technique for combining different clustering results to obtain a single better clustering. Existing clustering aggregation algorithms are applied directly to data points, in what is referred to as the point-based approach. The algorithms are inefficient if the number of data points is large. We define an efficient approach for clustering aggregation based on data fragments. In this fragment-based approach, a data fragment is any subset of the data that is not split by any of the clustering results. To establish the theoretical bases of the proposed approach, we prove that clustering aggregation can be performed directly on data fragments under two widely used goodness measures for clustering aggregation taken from the literature. Three new clustering aggregation algorithms are described. The experimental results obtained using several public data sets show that the new algorithms have lower computational complexity than three well-known existing point-based clustering aggregation algorithms (Agglomerative, Furthest, and LocalSearch); nevertheless, the new algorithms do not sacrifice the accuracy. PMID:22334025
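The notion of a data fragment can be shown in a few lines. This sketch computes the largest fragments, meaning the groups of points that receive identical labels in every input clustering; it is an illustration of the definition only, not one of the three aggregation algorithms from the paper, and the function name is hypothetical.

```python
def data_fragments(labelings):
    """Group point indices that no input clustering separates.

    labelings: list of clusterings, each a list with one label per point.
    Returns the fragments as lists of indices, in order of first appearance.
    """
    fragments = {}
    for idx, signature in enumerate(zip(*labelings)):
        # signature = the tuple of labels this point gets across clusterings
        fragments.setdefault(signature, []).append(idx)
    return list(fragments.values())


# Two hypothetical clusterings of five points
frags = data_fragments([
    [0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1],
])
```

Aggregation can then operate on three fragments instead of five points, which is where the efficiency gain described in the abstract comes from when the number of points is large.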
NASA Astrophysics Data System (ADS)
Liu, Xiaoming; Mei, Ming; Liu, Jun; Hu, Wei
2015-12-01
Clustered microcalcifications (MCs) in mammograms are an important early sign of breast cancer in women. Their accurate detection is important in computer-aided detection (CADe). In this paper, we integrated the possibilistic fuzzy c-means (PFCM) clustering algorithm and weighted support vector machine (WSVM) for the detection of MC clusters in full-field digital mammograms (FFDM). For each image, suspicious MC regions are extracted with region growing and active contour segmentation. Then geometry and texture features are extracted for each suspicious MC, a mutual information-based supervised criterion is used to select important features, and PFCM is applied to cluster the samples into two clusters. Weights of the samples are calculated based on possibilities and typicality values from the PFCM, and the ground truth labels. A weighted nonlinear SVM is trained. During the test process, when an unknown image is presented, suspicious regions are located with the segmentation step, selected features are extracted, and the suspicious MC regions are classified as containing MC or not by the trained weighted nonlinear SVM. Finally, the MC regions are analyzed with spatial information to locate MC clusters. The proposed method is evaluated using a database of 410 clinical mammograms and compared with a standard unweighted support vector machine (SVM) classifier. The detection performance is evaluated using receiver operating characteristic (ROC) curves and free-response receiver operating characteristic (FROC) curves. The proposed method obtained an area under the ROC curve of 0.8676, while the standard SVM obtained an area of 0.8268 for MC detection. For MC cluster detection, the proposed method obtained a high sensitivity of 92% with a false-positive rate of 2.3 clusters/image, which is better than the standard SVM with 4.7 false-positive clusters/image at the same sensitivity.
Grande, J A; Andújar, J M; Aroba, J; de la Torre, M L; Beltrán, R
2005-04-01
In the present work, Acid Mine Drainage (AMD) processes in the Chorrito Stream, which flows into the Cobica River (Iberian Pyrite Belt, Southwest Spain), are characterized by means of clustering techniques based on fuzzy logic. Also, pH behavior in contrast to precipitation is clearly explained, proving that the influence of rainfall inputs on the acidity and, as a result, on the metal load of a riverbed undergoing AMD processes depends strongly on the moment when the rainfall occurs. In general, the dynamic behavior of the riverbed is the response to the sum of the instant stimuli produced by isolated rainfall, the seasonal memory, which depends on the moment of the target hydrological year, and, finally, the inherent inertia of the river basin, which results from an accumulation process caused by age-long mining activity. PMID:15798799
Gong, Hui; Chen, Shangbin; Zhang, Bin; Ding, Wenxiang; Luo, Qingming; Li, Anan
2014-01-01
Characterizing cytoarchitecture is crucial for understanding brain functions and neural diseases. In neuroanatomy, it is an important task to accurately extract cell populations' centroids and contours. Recent advances have permitted imaging at single cell resolution for an entire mouse brain using the Nissl staining method. However, it is difficult to precisely segment numerous cells, especially those cells touching each other. As presented herein, we have developed an automated three-dimensional detection and segmentation method applied to the Nissl staining data, with the following two key steps: 1) concave points clustering to determine the seed points of touching cells; and 2) random walker segmentation to obtain cell contours. Also, we have evaluated the performance of our proposed method with several mouse brain datasets, which were captured with the micro-optical sectioning tomography imaging system and which include closely touching cells. Compared with traditional detection and segmentation methods, our approach shows promising detection accuracy and high robustness. PMID:25111442
Time series clustering analysis of health-promoting behavior
NASA Astrophysics Data System (ADS)
Yang, Chi-Ta; Hung, Yu-Shiang; Deng, Guang-Feng
2013-10-01
Health promotion must be emphasized to achieve the World Health Organization goal of health for all. Since the global population is aging rapidly, the ComCare elder health-promoting service was developed by the Taiwan Institute for Information Industry in 2011. Based on the Pender health promotion model, the ComCare service offers five categories of health-promoting functions to address the everyday needs of seniors: nutrition management, social support, exercise management, health responsibility, and stress management. To assess the overall ComCare service and to improve understanding of the health-promoting behavior of elders, this study analyzed health-promoting behavioral data automatically collected by the ComCare monitoring system. In the 30638 session records collected for 249 elders from January 2012 to March 2013, behavior patterns were identified by a fuzzy c-means time series clustering algorithm combined with autocorrelation-based representation schemes. The analysis showed that time series data for elder health-promoting behavior can be classified into four different clusters. Each type reveals different health-promoting needs, frequencies, function numbers and behaviors. The data analysis result can assist policymakers, health-care providers, and experts in medicine, public health, nursing and psychology, and has been provided to the Taiwan National Health Insurance Administration to assess elder health-promoting behavior.
Supervised clustering of genes
Dettling, Marcel; Bühlmann, Peter
2002-01-01
Background We focus on microarray data where experiments monitor gene expression in different tissues and where each experiment is equipped with an additional response variable such as a cancer type. Although the number of measured genes is in the thousands, it is assumed that only a few marker components of gene subsets determine the type of a tissue. Here we present a new method for finding such groups of genes by directly incorporating the response variables into the grouping process, yielding a supervised clustering algorithm for genes. Results An empirical study on eight publicly available microarray datasets shows that our algorithm identifies gene clusters with excellent predictive potential, often superior to classification with state-of-the-art methods based on single genes. Permutation tests and bootstrapping provide evidence that the output is reasonably stable and more than a noise artifact. Conclusions In contrast to other methods such as hierarchical clustering, our algorithm identifies several gene clusters whose expression levels clearly distinguish the different tissue types. The identification of such gene clusters is potentially useful for medical diagnostics and may at the same time reveal insights into functional genomics. PMID:12537558
Muetterties, Earl L.
1980-05-01
Metal cluster chemistry is one of the most rapidly developing areas of inorganic and organometallic chemistry. Prior to 1960 only a few metal clusters were well characterized. However, shortly after the early development of boron cluster chemistry, the field of metal cluster chemistry began to grow at a very rapid rate and a structural and a qualitative theoretical understanding of clusters came quickly. Analyzed here is the chemistry and the general significance of clusters with particular emphasis on the cluster research within my group. The importance of coordinately unsaturated, very reactive metal clusters is the major subject of discussion.
Ghaheri, Salehe; Masoum, Saeed; Gholami, Ali
2016-01-15
Analysis of fragrance composition is very important for both fragrance producers and consumers. Unraveling a fragrance formulation is necessary for quality control, competitor analysis and trace analysis. Gas chromatography-mass spectrometry (GC-MS) has been introduced as the most appropriate analytical technique for this type of analysis, which is based on the Kovats index and MS databases. The most straightforward method to analyze a GC-MS dataset is to integrate those peaks that can be recognized by their mass profiles. However, because of common problems in chromatographic data, such as spectral background, baseline offset and especially overlapped peaks, accurate quantitative and qualitative analysis can fail. Some chemometric modeling techniques, such as bilinear multivariate curve resolution (MCR) methods, have been introduced to overcome these problems and obtain well-resolved chromatographic profiles. The main drawback of these methods is rotational ambiguity, or non-unique solutions, represented as the area of feasible solutions (AFS). The polygonal inflation algorithm (PIA) is an automatic and simple-to-use algorithm for numerical computation of the AFS. In this study, the extent of rotational ambiguity in curve resolution methods is calculated by the MCR-BAND toolbox and the PIA. The ability of the PIA to resolve GC-MS data sets is evaluated on simulated GC-MS data in comparison with other popular curve resolution methods, such as multivariate curve resolution alternating least squares (MCR-ALS), multivariate curve resolution objective function minimization (MCR-FMIN) with different initial estimation methods, and independent component analysis (ICA). In addition, two typical challenging areas of the total ion chromatogram (TIC) of commercial fragrances with overlapped peaks were analyzed by the PIA to investigate the possibility of peak deconvolution analysis. PMID:26711156
Toward Parallel Document Clustering
Mogill, Jace A.; Haglin, David J.
2011-09-01
A key challenge to automated clustering of documents in large text corpora is the high cost of comparing documents in a multimillion dimensional document space. The Anchors Hierarchy is a fast data structure and algorithm for localizing data based on a distance metric that obeys the triangle inequality; the algorithm strives to minimize the number of distance calculations needed to cluster the documents into “anchors” around reference documents called “pivots”. We extend the original algorithm to increase the amount of available parallelism and consider two implementations: a complex data structure which affords efficient searching, and a simple data structure which requires repeated sorting. The sorting implementation is integrated with a text corpora “Bag of Words” program and initial performance results of an end-to-end document processing workflow are reported.
NASA Technical Reports Server (NTRS)
Gregory, Kyle J.; Hill, Joanne E. (Editor); Black, J. Kevin; Baumgartner, Wayne H.; Jahoda, Keith
2016-01-01
A fundamental challenge in a spaceborne application of a gas-based Time Projection Chamber (TPC) for observation of X-ray polarization is handling the large amount of data collected. The TPC polarimeter described uses the APV-25 Application Specific Integrated Circuit (ASIC) to read out a strip detector. Two-dimensional photoelectron track images are created with a time projection technique and used to determine the polarization of the incident X-rays. The detector produces a 128x30 pixel image per photon interaction with each pixel registering 12 bits of collected charge. This creates challenging requirements for data storage and downlink bandwidth with only a modest incidence of photons and can have a significant impact on the overall mission cost. An approach is described for locating and isolating the photoelectron track within the detector image, yielding a much smaller data product, typically between 8x8 pixels and 20x20 pixels. This approach is implemented using a Microsemi RT-ProASIC3-3000 Field-Programmable Gate Array (FPGA), clocked at 20 MHz and utilizing 10.7k logic gates (14% of FPGA), 20 Block RAMs (17% of FPGA), and no external RAM. Results will be presented, demonstrating successful photoelectron track cluster detection with minimal impact to detector dead-time.
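The data reduction quoted in the abstract above can be checked with a short back-of-the-envelope calculation. The sketch assumes that cropped windows keep the same 12 bits per pixel and ignores any header or position metadata, neither of which the abstract specifies; the helper name `image_bits` is hypothetical.

```python
BITS_PER_PIXEL = 12  # stated in the abstract


def image_bits(width, height, bits_per_pixel=BITS_PER_PIXEL):
    """Raw size in bits of a width x height image, with no overhead."""
    return width * height * bits_per_pixel


full = image_bits(128, 30)   # full detector image per photon
small = image_bits(8, 8)     # smallest typical cropped track window
large = image_bits(20, 20)   # largest typical cropped track window

print(full, full // 8)       # 46080 bits, 5760 bytes
print(full / small)          # 60.0
print(full / large)          # 9.6
```

Under these assumptions, isolating the track shrinks each event by a factor of roughly 10 to 60.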
NASA Astrophysics Data System (ADS)
Gregory, Kyle J.; Hill, Joanne E.; Black, J. Kevin; Baumgartner, Wayne H.; Jahoda, Keith
2016-05-01
A fundamental challenge in a spaceborne application of a gas-based Time Projection Chamber (TPC) for observation of X-ray polarization is handling the large amount of data collected. The TPC polarimeter described uses the APV-25 Application Specific Integrated Circuit (ASIC) to read out a strip detector. Two-dimensional photoelectron track images are created with a time projection technique and used to determine the polarization of the incident X-rays. The detector produces a 128x30 pixel image per photon interaction with each pixel registering 12 bits of collected charge. This creates challenging requirements for data storage and downlink bandwidth with only a modest incidence of photons and can have a significant impact on the overall mission cost. An approach is described for locating and isolating the photoelectron track within the detector image, yielding a much smaller data product, typically between 8x8 pixels and 20x20 pixels. This approach is implemented using a Microsemi RT-ProASIC3-3000 Field-Programmable Gate Array (FPGA), clocked at 20 MHz and utilizing 10.7k logic gates (14% of FPGA), 20 Block RAMs (17% of FPGA), and no external RAM. Results will be presented, demonstrating successful photoelectron track cluster detection with minimal impact to detector dead-time.
A hybrid skull-stripping algorithm based on adaptive balloon snake models
NASA Astrophysics Data System (ADS)
Liu, Hung-Ting; Sheu, Tony W. H.; Chang, Herng-Hua
2013-02-01
Skull-stripping is one of the most important preprocessing steps in neuroimage analysis. We propose a hybrid algorithm based on an adaptive balloon snake model to handle this challenging task. The proposed framework consists of two stages. In the first stage, fuzzy possibilistic c-means (FPCM) is used for voxel clustering, which provides a labeled image for the snake contour initialization. In the second stage, the contour is initialized outside the brain surface based on the FPCM result and evolves under the guidance of the balloon snake model, which drives the contour with an adaptive inward normal force to capture the boundary of the brain. The similarity indices indicate that our method outperformed the BSE and BET methods in skull-stripping the MR image volumes in the IBSR data set. Experimental results show the effectiveness of this new scheme and its potential for a wide variety of skull-stripping applications.
The fuzzy C spherical shells algorithm - A new approach
NASA Technical Reports Server (NTRS)
Krishnapuram, Raghu; Nasraoui, Olfa; Frigui, Hichem
1992-01-01
The fuzzy c spherical shells (FCSS) algorithm is specially designed to search for clusters that can be described by circular arcs or, more generally, by shells of hyperspheres. In this paper, a new approach to the FCSS algorithm is presented. This algorithm is computationally and implementationally simpler than other clustering algorithms that have been suggested for this purpose. An unsupervised algorithm which automatically finds the optimum number of clusters is also proposed. This algorithm can be used when the number of clusters is not known. It uses a cluster validity measure to identify good clusters, merges all compatible clusters, and eliminates spurious clusters to achieve the final result. Experimental results on several data sets are presented.
Li, Ke; Liu, Yi; Wang, Quanxin; Wu, Yalei; Song, Shimin; Sun, Yi; Liu, Tengchong; Wang, Jun; Li, Yang; Du, Shaoyi
2015-01-01
This paper proposes a novel multi-label classification method for spacecraft electrical characteristics problems, which involve processing large amounts of unlabeled test data, high-dimensional features, long computing times and a slow identification rate. Firstly, both the fuzzy c-means (FCM) offline clustering and the principal component feature extraction algorithms are applied for the feature selection process. Secondly, the approximate weighted proximal support vector machine (WPSVM) online classification algorithm is used to reduce the feature dimension and further improve the recognition rate for spacecraft electrical characteristics. Finally, a threshold-based data capture contribution method is proposed to guarantee the validity and consistency of the data selection. The experimental results indicate that the proposed method can obtain better data features of the spacecraft electrical characteristics, improve the accuracy of identification and shorten the computing time effectively. PMID:26544549
Fully Automated Complementary DNA Microarray Segmentation using a Novel Fuzzy-based Algorithm.
Saberkari, Hamidreza; Bahrami, Sheyda; Shamsi, Mousa; Amoshahy, Mohammad Javad; Ghavifekr, Habib Badri; Sedaaghi, Mohammad Hossein
2015-01-01
DNA microarray is a powerful approach for simultaneously studying the expression of thousands of genes in a single experiment. The average value of the fluorescent intensity can be calculated in a microarray experiment, and the calculated intensity values closely track the expression levels of a particular gene. However, determining the appropriate position of every spot in microarray images is a major challenge, and solving it enables accurate classification of normal and abnormal (cancer) cells. In this paper, a preprocessing step is first performed to eliminate the noise and artifacts present in microarray cells using the nonlinear anisotropic diffusion filtering method. Then, the center coordinates of each spot are located using mathematical morphology operations. Finally, the position of each spot is determined exactly by applying a novel hybrid model based on principal component analysis and the spatial fuzzy c-means clustering (SFCM) algorithm. Using a Gaussian kernel in the SFCM algorithm improves the quality of complementary DNA microarray segmentation. The performance of the proposed algorithm has been evaluated on real microarray images, which are available in the Stanford Microarray Database. Results illustrate that the accuracy of microarray cell segmentation with the proposed algorithm reaches 100% and 98% for noiseless and noisy cells, respectively.
Structural Trends of Small Silicon Clusters
NASA Astrophysics Data System (ADS)
Ho, K. M.; Pan, B. C.; Wacker, J. G.; Wang, C. Z.; Turner, D. E.; Deaven, D.
1997-03-01
We have performed a systematic search for the low energy structures of silicon clusters in the range from Si_10 to Si_20 using a recently developed genetic algorithm. Our results revealed the structural motif for the elongated clusters observed in mobility experiments. We also observe the beginning of another competing family for clusters larger than Si_17.
A GMBCG Galaxy Cluster Catalog of 55,424 Rich Clusters from SDSS DR7
Hao, Jiangang; McKay, Timothy A.; Koester, Benjamin P.; Rykoff, Eli S.; Rozo, Eduardo; Annis, James; Wechsler, Risa H.; Evrard, August; Siegel, Seth R.; Becker, Matthew; Busha, Michael; Gerdes, David; Johnston, David E.; Sheldon, Erin; /Brookhaven
2011-08-22
We present a large catalog of optically selected galaxy clusters from the application of a new Gaussian Mixture Brightest Cluster Galaxy (GMBCG) algorithm to SDSS Data Release 7 data. The algorithm detects clusters by identifying the red sequence plus Brightest Cluster Galaxy (BCG) feature, which is unique for galaxy clusters and does not exist among field galaxies. Red sequence clustering in color space is detected using an Error Corrected Gaussian Mixture Model. We run GMBCG on 8240 square degrees of photometric data from SDSS DR7 to assemble the largest ever optical galaxy cluster catalog, consisting of over 55,000 rich clusters across the redshift range from 0.1 < z < 0.55. We present Monte Carlo tests of completeness and purity and perform cross-matching with X-ray clusters and with the maxBCG sample at low redshift. These tests indicate high completeness and purity across the full redshift range for clusters with 15 or more members.
A GMBCG galaxy cluster catalog of 55,880 rich clusters from SDSS DR7
Hao, Jiangang; McKay, Timothy A.; Koester, Benjamin P.; Rykoff, Eli S.; Rozo, Eduardo; Annis, James; Wechsler, Risa H.; Evrard, August; Siegel, Seth R.; Becker, Matthew; Busha, Michael; /Fermilab /Michigan U. /Chicago U., Astron. Astrophys. Ctr. /UC, Santa Barbara /KICP, Chicago /KIPAC, Menlo Park /SLAC /Caltech /Brookhaven
2010-08-01
We present a large catalog of optically selected galaxy clusters from the application of a new Gaussian Mixture Brightest Cluster Galaxy (GMBCG) algorithm to SDSS Data Release 7 data. The algorithm detects clusters by identifying the red sequence plus Brightest Cluster Galaxy (BCG) feature, which is unique for galaxy clusters and does not exist among field galaxies. Red sequence clustering in color space is detected using an Error Corrected Gaussian Mixture Model. We run GMBCG on 8240 square degrees of photometric data from SDSS DR7 to assemble the largest ever optical galaxy cluster catalog, consisting of over 55,000 rich clusters across the redshift range from 0.1 < z < 0.55. We present Monte Carlo tests of completeness and purity and perform cross-matching with X-ray clusters and with the maxBCG sample at low redshift. These tests indicate high completeness and purity across the full redshift range for clusters with 15 or more members.
NASA Astrophysics Data System (ADS)
Xu, Changhang; Xie, Jing; Chen, Guoming; Huang, Weiping
2014-11-01
Infrared thermography has been used increasingly as an effective non-destructive technique to detect cracks on metal surfaces. Due to many factors, infrared thermal images have lower definition than visible images. The contrast between cracks and sound areas varies greatly across the thermal image frames of a specimen, depending on the recording time. An accurate detection can only be obtained by reviewing the whole thermal video, which is laborious. Moreover, the experience of the operator has a great influence on the accuracy of the detection result. In this paper, an infrared thermal image processing framework based on a superpixel algorithm is proposed to accomplish crack detection automatically. Two popular superpixel algorithms are compared and one of them is selected to generate superpixels in this application. Combined features of superpixels are selected from both the raw gray level image and the high-pass filtered image. Fuzzy c-means clustering is used to cluster superpixels in order to segment the infrared thermal image. Experimental results show that the proposed framework can automatically recognize cracks on metal surfaces from infrared thermal images.
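As a rough illustration of the fuzzy c-means step mentioned in the abstract above, the following sketch runs plain FCM on toy one-dimensional feature vectors. The abstract does not give the superpixel features, the fuzzifier, or the initialization, so the values here (fuzzifier m = 2, hand-picked initial centers, six toy points) are assumptions for illustration only, and the function `fcm` is hypothetical rather than the authors' code.

```python
import math


def fcm(points, centers, m=2.0, iterations=50):
    """Plain fuzzy c-means with fixed initial centers.

    Returns the final centers and the membership rows computed in the
    last iteration (one row per point, one entry per cluster).
    """
    centers = [list(c) for c in centers]
    dims = len(points[0])
    memberships = []
    for _ in range(iterations):
        # Membership step: u_ik is proportional to d_ik^(-2/(m-1)).
        memberships = []
        for x in points:
            dists = [math.dist(x, c) for c in centers]
            if 0.0 in dists:
                # A point sitting exactly on a center belongs fully to it.
                row = [1.0 if d == 0.0 else 0.0 for d in dists]
            else:
                row = [d ** (-2.0 / (m - 1.0)) for d in dists]
            total = sum(row)
            memberships.append([r / total for r in row])
        # Center step: mean of the points weighted by u_ik^m.
        for i in range(len(centers)):
            weights = [row[i] ** m for row in memberships]
            total = sum(weights)
            centers[i] = [
                sum(w * x[j] for w, x in zip(weights, points)) / total
                for j in range(dims)
            ]
    return centers, memberships


# Toy "superpixel features": two well-separated groups (made-up values).
points = [(0.0,), (0.1,), (0.2,), (0.9,), (1.0,), (1.1,)]
centers, memberships = fcm(points, centers=[(0.0,), (1.0,)])
labels = [row.index(max(row)) for row in memberships]
print(labels)
```

Taking the arg-max of each membership row gives a hard segmentation, while the membership values themselves indicate how ambiguous each assignment is.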
Histamine headache; Headache - histamine; Migrainous neuralgia; Headache - cluster; Horton's headache; Vascular headache - cluster ... be related to the body's sudden release of histamine (chemical in the body released during an allergic ...
Sanfilippo, Antonio P.; Calapristi, Augustin J.; Crow, Vernon L.; Hetzler, Elizabeth G.; Turner, Alan E.
2004-05-26
We present an approach to the disambiguation of cluster labels that capitalizes on the notion of semantic similarity to assign WordNet senses to cluster labels. The approach provides interesting insights on how document clustering can provide the basis for developing a novel approach to word sense disambiguation.
NASA Astrophysics Data System (ADS)
Katgert, P.; Murdin, P.
2000-11-01
Abell clusters are the most conspicuous groupings of galaxies identified by George Abell on the plates of the first photographic survey made with the SCHMIDT TELESCOPE at Mount Palomar in the 1950s. Sometimes, the term Abell clusters is used as a synonym of nearby, optically selected galaxy clusters....
Knowledge based cluster ensemble for cancer discovery from biomolecular data.
Yu, Zhiwen; Wong, Hau-San; You, Jane; Yang, Qinmin; Liao, Hongying
2011-06-01
The adoption of microarray techniques in biological and medical research provides a new way for cancer diagnosis and treatment. In order to perform successful diagnosis and treatment of cancer, discovering and classifying cancer types correctly is essential. Class discovery is one of the most important tasks in cancer classification using biomolecular data. Most of the existing works adopt single clustering algorithms to perform class discovery from biomolecular data. However, single clustering algorithms have limitations, which include a lack of robustness, stability, and accuracy. In this paper, we propose a new cluster ensemble approach called knowledge based cluster ensemble (KCE) which incorporates the prior knowledge of the data sets into the cluster ensemble framework. Specifically, KCE represents the prior knowledge of a data set in the form of pairwise constraints. Then, the spectral clustering algorithm (SC) is adopted to generate a set of clustering solutions. Next, KCE transforms pairwise constraints into confidence factors for these clustering solutions. After that, a consensus matrix is constructed by considering all the clustering solutions and their corresponding confidence factors. The final clustering result is obtained by partitioning the consensus matrix. Compared with single clustering algorithms and conventional cluster ensemble approaches, knowledge based cluster ensemble approaches are more robust, stable and accurate. The experiments on cancer data sets show that: 1) KCE works well on these data sets; 2) KCE not only outperforms most of the state-of-the-art single clustering algorithms, but also outperforms most of the state-of-the-art cluster ensemble approaches.
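The consensus-matrix step described above can be sketched as follows. The abstract does not say how pairwise constraints are turned into confidence factors, so this sketch assumes one simple possibility: the fraction of must-link and cannot-link constraints that a clustering solution satisfies. Both `confidence` and `weighted_consensus` are hypothetical helpers and the toy solutions are made up.

```python
def confidence(labels, must_link, cannot_link):
    """Assumed confidence factor: fraction of constraints satisfied."""
    satisfied = sum(labels[i] == labels[j] for i, j in must_link)
    satisfied += sum(labels[i] != labels[j] for i, j in cannot_link)
    total = len(must_link) + len(cannot_link)
    return satisfied / total if total else 1.0


def weighted_consensus(solutions, weights):
    """Entry (i, j) is the weighted share of solutions that co-cluster i and j."""
    n = len(solutions[0])
    total = sum(weights)
    if total == 0:
        raise ValueError("all confidence factors are zero")
    return [
        [
            sum(w for lab, w in zip(solutions, weights) if lab[i] == lab[j]) / total
            for j in range(n)
        ]
        for i in range(n)
    ]


# Three toy clustering solutions of four samples.
solutions = [
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 0, 1],
]
must_link = [(0, 1)]     # samples 0 and 1 are known to share a class
cannot_link = [(0, 3)]   # samples 0 and 3 are known to differ

weights = [confidence(s, must_link, cannot_link) for s in solutions]
consensus = weighted_consensus(solutions, weights)
print(weights)           # [1.0, 0.5, 1.0]
print(consensus[0][1])   # 0.8
```

The final partition would then come from clustering this matrix, for example by treating it as a similarity matrix; that step is omitted here.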
A compilation of jet finding algorithms
Flaugher, B.; Meier, K.
1992-12-31
Technical descriptions of jet finding algorithms currently in use in pp̄ collider experiments (CDF, UA1, UA2), e⁺e⁻ experiments and Monte-Carlo event generators (LUND programs, ISAJET) have been collected. For the hadron collider experiments, the clustering methods fall into two categories: cone algorithms and nearest-neighbor algorithms. In addition, UA2 has employed a combination of both methods for some analysis. While there are clearly differences between the cone and nearest-neighbor algorithms, the authors have found that there are also differences among the cone algorithms in the details of how the centroid of a cone cluster is located and how the E_T and P_T of the jet are defined. The most commonly used jet algorithm in electron-positron experiments is the JADE-type cluster algorithm. Five various incarnations of this approach have been described.
Discriminative clustering via extreme learning machine.
Huang, Gao; Liu, Tianchi; Yang, Yan; Lin, Zhiping; Song, Shiji; Wu, Cheng
2015-10-01
Discriminative clustering is an unsupervised learning framework which introduces the discriminative learning rule of supervised classification into clustering. The underlying assumption is that a good partition (clustering) of the data should yield high discrimination, namely, the partitioned data can be easily classified by some classification algorithms. In this paper, we propose three discriminative clustering approaches based on Extreme Learning Machine (ELM). The first algorithm iteratively trains weighted ELM (W-ELM) classifier to gradually maximize the data discrimination. The second and third methods are both built on Fisher's Linear Discriminant Analysis (LDA); but one approach adopts alternative optimization, while the other leverages kernel k-means. We show that the proposed algorithms can be easily implemented, and yield competitive clustering accuracy on real world data sets compared to state-of-the-art clustering methods. PMID:26143036
Image segmentation using fuzzy LVQ clustering networks
NASA Technical Reports Server (NTRS)
Tsao, Eric Chen-Kuo; Bezdek, James C.; Pal, Nikhil R.
1992-01-01
In this note we formulate image segmentation as a clustering problem. Feature vectors extracted from a raw image are clustered into subregions, thereby segmenting the image. A fuzzy generalization of a Kohonen learning vector quantization (LVQ) which integrates the Fuzzy c-Means (FCM) model with the learning rate and updating strategies of the LVQ is used for this task. This network, which segments images in an unsupervised manner, is thus related to the FCM optimization problem. Numerical examples on photographic and magnetic resonance images are given to illustrate this approach to image segmentation.
Detecting alternative graph clusterings.
Mandala, Supreet; Kumara, Soundar; Yao, Tao
2012-07-01
The problem of graph clustering or community detection has enjoyed a lot of attention in complex networks literature. A quality function, modularity, quantifies the strength of clustering and on maximization yields sensible partitions. However, in most real world networks, there are an exponentially large number of near-optimal partitions with some being very different from each other. Therefore, picking an optimal clustering among the alternatives does not provide complete information about network topology. To tackle this problem, we propose a graph perturbation scheme which can be used to identify an ensemble of near-optimal and diverse clusterings. We establish analytical properties of modularity function under the perturbation which ensures diversity. Our approach is algorithm independent and therefore can leverage any of the existing modularity maximizing algorithms. We numerically show that our methodology can systematically identify very different partitions on several existing data sets. The knowledge of diverse partitions sheds more light into the topological organization and helps gain a more complete understanding of the underlying complex network.
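For readers unfamiliar with the modularity quality function mentioned above, here is a minimal sketch of the standard Newman-Girvan form, Q = sum over communities of (L_c / m) - (d_c / 2m)^2, where m is the number of edges, L_c the number of edges inside community c, and d_c the total degree of its nodes. It assumes an unweighted, undirected graph without self-loops; the graph and partitions are toy examples, and the perturbation scheme of the paper is not reproduced.

```python
def modularity(edges, community):
    """Newman-Girvan modularity of a partition.

    `edges` is a list of (u, v) pairs; `community` maps node -> community id.
    """
    m = len(edges)
    internal = {}   # edges with both endpoints in the community
    degree = {}     # summed node degree per community
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(
        internal.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree.items()
    )


# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {node: "all" for node in range(6)}

q_split = modularity(edges, two_groups)
q_whole = modularity(edges, one_group)
print(round(q_split, 4), q_whole)  # 0.3571 0.0
```

Splitting at the bridge scores 5/14, while lumping everything together scores 0, which matches the intuition that modularity rewards dense groups with few connections between them.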
Partially supervised speaker clustering.
Tang, Hao; Chu, Stephen Mingyu; Hasegawa-Johnson, Mark; Huang, Thomas S
2012-05-01
model-based distance metrics, 2) our advocated use of the cosine distance metric yields consistent increases in the speaker clustering performance as compared to the commonly used Euclidean distance metric, 3) our partially supervised speaker clustering concept and strategies significantly improve the speaker clustering performance over the baselines, and 4) our proposed LSDA algorithm further leads to state-of-the-art speaker clustering performance. PMID:21844626
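The difference between the two distance metrics compared above can be seen on plain vectors. This is a generic sketch, not the speaker-model representation of the paper; the function names are illustrative.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance; sensitive to vector magnitude."""
    return math.dist(a, b)

def cosine_distance(a, b):
    """One minus the cosine of the angle between a and b; ignores magnitude.

    Assumes neither vector is all zeros.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)
```

Two vectors that point in the same direction but differ in length are far apart under the Euclidean metric and identical under the cosine metric, which is one plausible reason the choice of metric can change clustering results.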
NASA Technical Reports Server (NTRS)
Wang, Lui; Bayer, Steven E.
1991-01-01
Genetic algorithms are mathematical, highly parallel, adaptive search procedures (i.e., problem solving methods) based loosely on the processes of natural genetics and Darwinian survival of the fittest. Basic genetic algorithms concepts are introduced, genetic algorithm applications are introduced, and results are presented from a project to develop a software tool that will enable the widespread use of genetic algorithm technology.
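The basic genetic algorithm concepts referred to above (selection, crossover, mutation) can be sketched on a toy problem. The example below maximizes the number of 1 bits in a string ("OneMax"); the problem, the function name, and every parameter value are assumptions chosen for illustration and are unrelated to the software tool described in the record.

```python
import random

def one_max_ga(n_bits=20, pop_size=30, generations=40, p_mut=0.02, seed=0):
    """Toy genetic algorithm that maximizes the number of 1 bits in a bit string."""
    rng = random.Random(seed)
    fitness = sum  # fitness of a bit list is its number of ones
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        next_pop = [pop[0][:]]  # elitism: the best individual always survives
        while len(next_pop) < pop_size:
            # Tournament selection: the fitter of three random individuals becomes a parent.
            p1 = max(rng.sample(pop, 3), key=fitness)
            p2 = max(rng.sample(pop, 3), key=fitness)
            cut = rng.randrange(1, n_bits)  # single-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation with probability p_mut per bit.
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            next_pop.append(child)
        pop = next_pop
    best = max(pop, key=fitness)
    history.append(fitness(best))
    return best, history
```

Because the best individual is copied unchanged into each new generation, the recorded best fitness can never decrease from one generation to the next.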
Clustering of financial time series
NASA Astrophysics Data System (ADS)
D'Urso, Pierpaolo; Cappelli, Carmela; Di Lallo, Dario; Massari, Riccardo
2013-05-01
This paper addresses the topic of classifying financial time series in a fuzzy framework, proposing two fuzzy clustering models, both based on GARCH models. In general, clustering of financial time series, due to their peculiar features, needs the definition of suitable distance measures. To this end, the first fuzzy clustering model exploits the autoregressive representation of GARCH models and employs, in the framework of a partitioning around medoids algorithm, the classical autoregressive metric. The second fuzzy clustering model, also based on the partitioning around medoids algorithm, uses the Caiado distance, a Mahalanobis-like distance based on estimated GARCH parameters and covariances that takes into account the information about the volatility structure of time series. In order to illustrate the merits of the proposed fuzzy approaches, an application to the problem of classifying 29 time series of Euro exchange rates against international currencies is presented and discussed, also comparing the fuzzy models with their crisp versions.
Analyzing geographic clustered response
Merrill, D.W.; Selvin, S.; Mohr, M.S.
1991-08-01
In the study of geographic disease clusters, an alternative to traditional methods based on rates is to analyze case locations on a transformed map in which population density is everywhere equal. Although the analyst's task is thereby simplified, the specification of the density equalizing map projection (DEMP) itself is not simple and continues to be the subject of considerable research. Here a new DEMP algorithm is described, which avoids some of the difficulties of earlier approaches. The new algorithm (a) avoids illegal overlapping of transformed polygons; (b) finds the unique solution that minimizes map distortion; (c) provides constant magnification over each map polygon; (d) defines a continuous transformation over the entire map domain; (e) defines an inverse transformation; (f) can accept optional constraints such as fixed boundaries; and (g) can use commercially supported minimization software. Work is continuing to improve computing efficiency and improve the algorithm. 21 refs., 15 figs., 2 tabs.
Space Structure and Clustering of Categorical Data.
Qian, Yuhua; Li, Feijiang; Liang, Jiye; Liu, Bing; Dang, Chuangyin
2016-10-01
Learning from categorical data plays a fundamental role in such areas as pattern recognition, machine learning, data mining, and knowledge discovery. To effectively discover the group structure inherent in a set of categorical objects, many categorical clustering algorithms have been developed in the literature, among which k -modes-type algorithms are very representative because of their good performance. Nevertheless, there is still much room for improving their clustering performance in comparison with the clustering algorithms for the numeric data. This may arise from the fact that the categorical data lack a clear space structure as that of the numeric data. To address this issue, we propose, in this paper, a novel data-representation scheme for the categorical data, which maps a set of categorical objects into a Euclidean space. Based on the data-representation scheme, a general framework for space structure based categorical clustering algorithms (SBC) is designed. This framework together with the applications of two kinds of dissimilarities leads two versions of the SBC-type algorithms. To verify the performance of the SBC-type algorithms, we employ as references four representative algorithms of the k -modes-type algorithms. Experiments show that the proposed SBC-type algorithms significantly outperform the k -modes-type algorithms. PMID:26441455
AMIC@: All MIcroarray Clusterings @ once.
Geraci, Filippo; Pellegrini, Marco; Renda, M Elena
2008-07-01
The AMIC@ Web Server offers a lightweight multi-method clustering engine for microarray gene-expression data. AMIC@ is a highly interactive tool that stresses user-friendliness and robustness by adopting AJAX technology, thus allowing an effective interleaved execution of different clustering algorithms and inspection of results. Among the salient features AMIC@ offers are: (i) automatic file format detection, (ii) suggestions on the number of clusters using a variant of the stability-based method of Tibshirani et al., (iii) intuitive visual inspection of the data via heatmaps, and (iv) measurements of the clustering quality using cluster homogeneity. Large data sets can be processed efficiently by selecting algorithms (such as FPF-SB and k-Boost) specifically designed for this purpose. In the case of very large data sets, the user can opt for a batch-mode use of the system by means of the Clustering wizard, which runs all algorithms at once and delivers the results via email. AMIC@ is freely available and open to all users with no login requirement at the following URL: http://bioalgo.iit.cnr.it/amica.
Systolic architecture for hierarchical clustering
Ku, L.C.
1984-01-01
Several hierarchical clustering methods (including single-linkage, complete-linkage, centroid, and absolute overlap methods) are reviewed. The absolute overlap clustering method is selected for the design of a systolic architecture, mainly due to its simplicity. Two versions of systolic architectures for the absolute overlap hierarchical clustering algorithm are proposed: a one-dimensional version, which leads to the development of a two-dimensional version that takes full advantage of the underlying data structure of the problem. The two-dimensional systolic architecture can achieve a time complexity of O(m + n), in comparison with a time complexity of O(m^2 n) for the conventional computer implementation.
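Of the hierarchical methods reviewed above, single-linkage is the simplest to state in code. The sketch below is a naive sequential version for illustration only; it is not the absolute overlap method selected in the record and has nothing to do with the systolic design. The function name and the stopping rule (a target number of clusters) are assumptions.

```python
import math

def single_linkage(points, n_clusters):
    """Naive agglomerative clustering with single-linkage (closest-pair) merging.

    Returns a list of clusters, each a sorted list of point indices.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: cluster distance is the smallest pairwise distance.
                d = min(
                    math.dist(points[i], points[j])
                    for i in clusters[a]
                    for j in clusters[b]
                )
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]
```

The repeated all-pairs scan is what makes conventional implementations expensive, which is the cost a parallel architecture aims to reduce.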
Dhane, Dhiraj Manohar; Krishna, Vishal; Achar, Arun; Bar, Chittaranjan; Sanyal, Kunal; Chakraborty, Chandan
2016-09-01
Chronic lower extremity wound is a complicated disease condition of localized injury to skin and its tissues which has plagued many elders worldwide. Ulcer assessment and management is expensive and is a burden on health establishments. Currently, accurate wound evaluation remains a tedious task, as it relies on visual inspection. This paper proposes a new method for wound-area detection using images digitally captured by a hand-held optical camera. The proposed strategy involves a spectral approach to clustering, based on the affinity matrix. The spectral clustering (SC) involves construction of the similarity matrix of the Laplacian based on the Ng-Jordan-Weiss algorithm. Starting with a quadratic method, wound photographs were pre-processed for color homogenization. The first-order statistics filter was then applied to extract spurious regions. The filter was selected based on its performance, evaluated on four quality metrics. Then, the spectral method was used on the filtered images for effective segmentation. The segmented regions were post-processed using morphological operators. The performance of spectral segmentation was confirmed by ground-truth pictures labeled by dermatologists. The SC results were additionally compared with the results of k-means and Fuzzy C-Means (FCM) clustering algorithms. On a set of 105 images, the SC approach effectively delineated targeted wound beds, yielding a segmentation accuracy of 86.73%, a positive predictive value of 91.80%, and a sensitivity of 89.54%. This approach shows the robustness of the tool for ulcer perimeter measurement and healing progression. The article elucidates its potential to be incorporated in patient-facing medical systems targeting rapid clinical assistance. PMID:27520612
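The three figures of merit quoted above (accuracy, positive predictive value, sensitivity) follow from pixel-level confusion counts. The sketch below assumes binary masks flattened to sequences of 0/1 labels; the function name and input format are illustrative, and the record does not state how its values were aggregated over images.

```python
def segmentation_metrics(predicted, truth):
    """Accuracy, positive predictive value, and sensitivity for binary pixel labels."""
    tp = fp = tn = fn = 0
    for p, t in zip(predicted, truth):
        if p and t:
            tp += 1
        elif p and not t:
            fp += 1
        elif not p and t:
            fn += 1
        else:
            tn += 1
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    ppv = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, ppv, sensitivity
```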
Feature Clustering for Accelerating Parallel Coordinate Descent
Scherrer, Chad; Tewari, Ambuj; Halappanavar, Mahantesh; Haglin, David J.
2012-12-06
We demonstrate an approach for accelerating calculation of the regularization path for L1 sparse logistic regression problems. We show the benefit of feature clustering as a preconditioning step for parallel block-greedy coordinate descent algorithms.
The SMART CLUSTER METHOD - adaptive earthquake cluster analysis and declustering
NASA Astrophysics Data System (ADS)
Schaefer, Andreas; Daniell, James; Wenzel, Friedemann
2016-04-01
Earthquake declustering is an essential part of almost any statistical analysis of spatial and temporal properties of seismic activity, with typical applications including probabilistic seismic hazard assessments (PSHAs) and earthquake prediction methods. The nature of earthquake clusters and subsequent declustering of earthquake catalogues plays a crucial role in determining the magnitude-dependent earthquake return period and its respective spatial variation. Various methods have been developed by other researchers to address this issue. These range in complexity from rather simple statistical window methods to complex epidemic models. This study introduces the smart cluster method (SCM), a new methodology to identify earthquake clusters, which uses an adaptive point process for spatio-temporal identification. Here, an adaptive search algorithm for data point clusters is adopted. It uses the earthquake density in the spatio-temporal neighbourhood of each event to adjust the search properties. The identified clusters are subsequently analysed to determine directional anisotropy, focussing on a strong correlation along the rupture plane, and the search space is adjusted with respect to directional properties. In the case of rapid subsequent ruptures like the 1992 Landers sequence or the 2010/2011 Darfield-Christchurch events, an adaptive classification procedure is applied to disassemble subsequent ruptures which may have been grouped into an individual cluster, using near-field searches, support vector machines and temporal splitting. The steering parameters of the search behaviour are linked to local earthquake properties like magnitude of completeness, earthquake density and Gutenberg-Richter parameters. The method is capable of identifying and classifying earthquake clusters in space and time. It is tested and validated using earthquake data from California and New Zealand. As a result of the cluster identification process, each event in
A new Growing Neural Gas for clustering data streams.
Ghesmoune, Mohammed; Lebbah, Mustapha; Azzag, Hanene
2016-06-01
Clustering data streams is becoming the most efficient way to cluster a massive dataset. This task requires a process capable of partitioning observations continuously with restrictions of memory and time. In this paper we present a new algorithm, called G-Stream, for clustering data streams by making one pass over the data. G-Stream is based on growing neural gas, that allows us to discover clusters of arbitrary shapes without any assumptions on the number of clusters. By using a reservoir, and applying a fading function, the quality of clustering is improved. The performance of the proposed algorithm is evaluated on public datasets. PMID:26997530
Coral: an integrated suite of visualizations for comparing clusterings
2012-01-01
Background: Clustering has become a standard analysis for many types of biological data (e.g., interaction networks, gene expression, metagenomic abundance). In practice, it is possible to obtain a large number of contradictory clusterings by varying which clustering algorithm is used, which data attributes are considered, how algorithmic parameters are set, and which near-optimal clusterings are chosen. It is a difficult task to sift through such a large collection of varied clusterings to determine which clustering features are affected by parameter settings or are artifacts of particular algorithms and which represent meaningful patterns. Knowing which items are often clustered together helps to improve our understanding of the underlying data and to increase our confidence about generated modules. Results: We present Coral, an application for interactive exploration of large ensembles of clusterings. Coral makes all-to-all clustering comparison easy, supports exploration of individual clusterings, allows tracking modules across clusterings, and supports identification of core and peripheral items in modules. We discuss how each visual component in Coral tackles a specific question related to clustering comparison and provide examples of their use. We also show how Coral could be used to visually and quantitatively compare clusterings with a ground truth clustering. Conclusion: As a case study, we compare clusterings of a recently published protein interaction network of Arabidopsis thaliana. We use several popular algorithms to generate the network’s clusterings. We find that the clusterings vary significantly and that few proteins are consistently co-clustered in all clusterings. This is evidence that several clusterings should typically be considered when evaluating modules of genes, proteins, or sequences, and Coral can be used to perform a comprehensive analysis of these clustering ensembles. PMID:23102108
Gene expression data clustering using a multiobjective symmetry based clustering technique.
Saha, Sriparna; Ekbal, Asif; Gupta, Kshitija; Bandyopadhyay, Sanghamitra
2013-11-01
The invention of microarrays has rapidly changed the state of biological and biomedical research. Clustering algorithms play an important role in clustering microarray data sets, where identifying groups of co-expressed genes is a very difficult task. Here we have posed the problem of clustering the microarray data as a multiobjective clustering problem. A new symmetry based fuzzy clustering technique is developed to solve this problem. The effectiveness of the proposed technique is demonstrated on five publicly available benchmark data sets. Results are compared with some widely used microarray clustering techniques. Statistical and biological significance tests have also been carried out. PMID:24209942
Automatic Clustering Using Multi-objective Particle Swarm and Simulated Annealing
Abubaker, Ahmad; Baharum, Adam; Alrefaei, Mahmoud
2015-01-01
This paper puts forward a new automatic clustering algorithm based on Multi-Objective Particle Swarm Optimization and Simulated Annealing, “MOPSOSA”. The proposed algorithm is capable of automatic clustering which is appropriate for partitioning datasets to a suitable number of clusters. MOPSOSA combines the features of the multi-objective based particle swarm optimization (PSO) and the Multi-Objective Simulated Annealing (MOSA). Three cluster validity indices were optimized simultaneously to establish the suitable number of clusters and the appropriate clustering for a dataset. The first cluster validity index is centred on Euclidean distance, the second on the point symmetry distance, and the last cluster validity index is based on short distance. A number of algorithms have been compared with the MOPSOSA algorithm in resolving clustering problems by determining the actual number of clusters and optimal clustering. Computational experiments were carried out to study fourteen artificial and five real life datasets. PMID:26132309
SMART: Unique Splitting-While-Merging Framework for Gene Clustering
Fa, Rui; Roberts, David J.; Nandi, Asoke K.
2014-01-01
Successful clustering algorithms are highly dependent on parameter settings. The clustering performance degrades significantly unless parameters are properly set, and yet, it is difficult to set these parameters a priori. To address this issue, in this paper, we propose a unique splitting-while-merging clustering framework, named “splitting merging awareness tactics” (SMART), which does not require any a priori knowledge of either the number of clusters or even the possible range of this number. Unlike existing self-splitting algorithms, which over-cluster the dataset to a large number of clusters and then merge some similar clusters, our framework has the ability to split and merge clusters automatically during the process and produces the most reliable clustering results, by intrinsically integrating many clustering techniques and tasks. The SMART framework is implemented with two distinct clustering paradigms in two algorithms: competitive learning and finite mixture model. Nevertheless, within the proposed SMART framework, many other algorithms can be derived for different clustering paradigms. The minimum message length algorithm is integrated into the framework as the clustering selection criterion. The usefulness of the SMART framework and its algorithms is tested in demonstration datasets and simulated gene expression datasets. Moreover, two real microarray gene expression datasets are studied using this approach. Based on the performance of many metrics, all numerical results show that SMART is superior to the compared existing self-splitting algorithms and traditional algorithms. Three main properties of the proposed SMART framework are summarized as: (1) needing no parameters dependent on the respective dataset or a priori knowledge about the datasets, (2) extendible to many different applications, (3) offering superior performance compared with counterpart algorithms. PMID:24714159
Hu, Xiaohua; Park, E K; Zhang, Xiaodan
2009-09-01
Generating high-quality gene clusters and identifying the underlying biological mechanism of the gene clusters are the important goals of clustering gene expression analysis. To get high-quality cluster results, most of the current approaches rely on choosing the best cluster algorithm, in which the design biases and assumptions meet the underlying distribution of the dataset. There are two issues for this approach: 1) usually, the underlying data distribution of the gene expression datasets is unknown and 2) there are so many clustering algorithms available and it is very challenging to choose the proper one. To provide a textual summary of the gene clusters, the most explored approach is the extractive approach that essentially builds upon techniques borrowed from the information retrieval, in which the objective is to provide terms to be used for query expansion, and not to act as a stand-alone summary for the entire document sets. Another drawback is that the clustering quality and cluster interpretation are treated as two isolated research problems and are studied separately. In this paper, we design and develop a unified system Gene Expression Miner to address these challenging issues in a principled and general manner by integrating cluster ensemble, text clustering, and multidocument summarization and provide an environment for comprehensive gene expression data analysis. We present a novel cluster ensemble approach to generate high-quality gene cluster. In our text summarization module, given a gene cluster, our expectation-maximization based algorithm can automatically identify subtopics and extract most probable terms for each topic. Then, the extracted top k topical terms from each subtopic are combined to form the biological explanation of each gene cluster. Experimental results demonstrate that our system can obtain high-quality clusters and provide informative key terms for the gene clusters.
NASA Technical Reports Server (NTRS)
1999-01-01
Penetrating 25,000 light-years of obscuring dust and myriad stars, NASA's Hubble Space Telescope has provided the clearest view yet of one of the largest young clusters of stars inside our Milky Way galaxy, located less than 100 light-years from the very center of the Galaxy. Having the equivalent mass greater than 10,000 stars like our sun, the monster cluster is ten times larger than typical young star clusters scattered throughout our Milky Way. It is destined to be ripped apart in just a few million years by gravitational tidal forces in the galaxy's core. But in its brief lifetime it shines more brightly than any other star cluster in the Galaxy. Quintuplet Cluster is 4 million years old. It has stars on the verge of blowing up as supernovae. It is the home of the brightest star seen in the galaxy, called the Pistol star. This image was taken in infrared light by Hubble's NICMOS camera in September 1997. The false colors correspond to infrared wavelengths. The galactic center stars are white, the red stars are enshrouded in dust or behind dust, and the blue stars are foreground stars between us and the Milky Way's center. The cluster is hidden from direct view behind black dust clouds in the constellation Sagittarius. If the cluster could be seen from earth it would appear to the naked eye as a 3rd magnitude star, 1/6th of a full moon's diameter apart.
Convalescing Cluster Configuration Using a Superlative Framework
Sabitha, R.; Karthik, S.
2015-01-01
Competent data mining methods are vital to discover knowledge from databases which are built as a result of enormous growth of data. Various techniques of data mining are applied to obtain knowledge from these databases. Data clustering is one such descriptive data mining technique which guides in partitioning data objects into disjoint segments. K-means algorithm is a versatile algorithm among the various approaches used in data clustering. The algorithm and its diverse adaptation methods suffer certain problems in their performance. To overcome these issues a superlative algorithm has been proposed in this paper to perform data clustering. The specific feature of the proposed algorithm is discretizing the dataset, thereby improving the accuracy of clustering, and also adopting the binary search initialization method to generate cluster centroids. The generated centroids are fed as input to K-means approach which iteratively segments the data objects into respective clusters. The clustered results are measured for accuracy and validity. Experiments conducted by testing the approach on datasets from the UC Irvine Machine Learning Repository evidently show that the accuracy and validity measure is higher than the other two approaches, namely, simple K-means and Binary Search method. Thus, the proposed approach proves that discretization process will improve the efficacy of descriptive data mining tasks. PMID:26543895
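The final stage described above, in which precomputed centroids are fed to K-means, can be sketched as a plain Lloyd iteration that accepts its starting centroids as input. The discretization and binary search initialization of the paper are not reproduced here; the function name and the `max_iter` parameter are assumptions.

```python
import math

def k_means(points, centroids, max_iter=100):
    """Lloyd's K-means starting from caller-supplied centroids.

    Returns (labels, centroids); an empty cluster keeps its previous centroid.
    """
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[k])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids
```

Separating initialization from the iteration in this way makes it straightforward to compare different seeding strategies on the same data.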
Clustering for unsupervised fault diagnosis in nuclear turbine shut-down transients
NASA Astrophysics Data System (ADS)
Baraldi, Piero; Di Maio, Francesco; Rigamonti, Marco; Zio, Enrico; Seraoui, Redouane
2015-06-01
Empirical methods for fault diagnosis usually entail a process of supervised training based on a set of examples of signal evolutions "labeled" with the corresponding, known classes of fault. However, in practice, the signals collected during plant operation may be, very often, "unlabeled", i.e., the information on the corresponding type of occurred fault is not available. To cope with this practical situation, in this paper we develop a methodology for the identification of transient signals showing similar characteristics, under the conjecture that operational/faulty transient conditions of the same type lead to similar behavior in the measured signals evolution. The methodology is founded on a feature extraction procedure, which feeds a spectral clustering technique, embedding the unsupervised fuzzy C-means (FCM) algorithm, which evaluates the functional similarity among the different operational/faulty transients. A procedure for validating the plausibility of the obtained clusters is also propounded based on physical considerations. The methodology is applied to a real industrial case, on the basis of 148 shut-down transients of a Nuclear Power Plant (NPP) steam turbine.
Tsai, Jang-Zern; Chen, Yu-Wei; Wang, Kuo-Wei; Wu, Hsiao-Kuang; Lin, Yun-Yu; Lee, Ying-Ying; Chen, Chi-Jen; Lin, Huey-Juan; Smith, Eric Edward; Hsin, Yue-Loong
2014-01-01
Determination of the volume of acute cerebral infarct in magnetic resonance imaging has prognostic value. However, semiautomatic segmentation is time-consuming and has high interrater variability. Using diffusion weighted imaging and apparent diffusion coefficient maps from patients with acute infarction within 10 days, we aimed to develop a fully automatic algorithm to measure infarct volume. It includes an unsupervised classification with fuzzy C-means clustering determination of the histographic distribution, defining self-adjusted intensity thresholds. The proposed method attained high agreement with the semiautomatic method, with a similarity index of 89.9 ± 6.5%, in detecting cerebral infarct lesions from 22 acute stroke patients. We demonstrated the accuracy of the proposed computer-assisted prompt segmentation method, which appeared promising to replace the laborious, time-consuming, and operator-dependent semiautomatic segmentation. PMID:24738080
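Agreement between two segmentations, as reported above, is often measured with the Dice coefficient. The record does not define its similarity index, so treating it as Dice is an assumption; the sketch below also assumes each mask is given as a collection of voxel identifiers, and the function name is illustrative.

```python
def dice_similarity(mask_a, mask_b):
    """Dice coefficient between two masks given as collections of voxel identifiers."""
    a, b = set(mask_a), set(mask_b)
    if not a and not b:
        return 1.0  # two empty masks are treated as identical
    return 2 * len(a & b) / (len(a) + len(b))
```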
NASA Astrophysics Data System (ADS)
Miller, Christopher J.
2012-03-01
There are many examples of clustering in astronomy. Stars in our own galaxy are often seen as being gravitationally bound into tight globular or open clusters. The Solar System's Trojan asteroids cluster at the gravitational Lagrangian in front of Jupiter’s orbit. On the largest of scales, we find gravitationally bound clusters of galaxies, the Virgo cluster (in the constellation of Virgo at a distance of ~50 million light years) being a prime nearby example. The Virgo cluster subtends an angle of nearly 8° on the sky and is known to contain over a thousand member galaxies. Galaxy clusters play an important role in our understanding of the Universe. Clusters exist at peaks in the three-dimensional large-scale matter density field. Their sky (2D) locations are easy to detect in astronomical imaging data and their mean galaxy redshifts (redshift is related to the third spatial dimension: distance) are often better (spectroscopically) and cheaper (photometrically) when compared with the entire galaxy population in large sky surveys. Photometric redshift (z) [Photometric techniques use the broad band filter magnitudes of a galaxy to estimate the redshift. Spectroscopic techniques use the galaxy spectra and emission/absorption line features to measure the redshift] determinations of galaxies within clusters are accurate to better than delta_z = 0.05 [7] and when studied as a cluster population, the central galaxies form a line in color-magnitude space (called the E/S0 ridgeline and visible in Figure 16.3) that contains galaxies with similar stellar populations [15]. The shape of this E/S0 ridgeline enables astronomers to measure the cluster redshift to within delta_z = 0.01 [23]. The most accurate cluster redshift determinations come from spectroscopy of the member galaxies, where only a fraction of the members need to be spectroscopically observed [25,42] to get an accurate redshift to the whole system. If light traces mass in the Universe, then the locations
ERIC Educational Resources Information Center
Pottawattamie County School System, Council Bluffs, IA.
The 15 occupational clusters (transportation, fine arts and humanities, communications and media, personal service occupations, construction, hospitality and recreation, health occupations, marine science occupations, consumer and homemaking-related occupations, agribusiness and natural resources, environment, public service, business and office…
Donchev, Todor I.; Petrov, Ivan G.
2011-05-31
Described herein is an apparatus and a method for producing atom clusters based on a gas discharge within a hollow cathode. The hollow cathode includes one or more walls. The one or more walls define a sputtering chamber within the hollow cathode and include a material to be sputtered. A hollow anode is positioned at an end of the sputtering chamber, and atom clusters are formed when a gas discharge is generated between the hollow anode and the hollow cathode.
Clustering Binary Data in the Presence of Masking Variables
ERIC Educational Resources Information Center
Brusco, Michael J.
2004-01-01
A number of important applications require the clustering of binary data sets. Traditional nonhierarchical cluster analysis techniques, such as the popular K-means algorithm, can often be successfully applied to these data sets. However, the presence of masking variables in a data set can impede the ability of the K-means algorithm to recover the…
Clustering PPI data by combining FA and SHC method
2015-01-01
Clustering is one of the main methods to identify functional modules from protein-protein interaction (PPI) data. Nevertheless, traditional clustering methods may not be effective for clustering PPI data. In this paper, we propose a novel method for clustering PPI data by combining the firefly algorithm (FA) and the synchronization-based hierarchical clustering (SHC) algorithm. First, the PPI data are preprocessed via spectral clustering (SC), which transforms the high-dimensional similarity matrix into a low-dimensional matrix. Then the SHC algorithm is used to perform clustering. In the SHC algorithm, hierarchical clustering is achieved by continuously enlarging the neighborhood radius of synchronized objects; however, the hierarchical search has difficulty finding the optimal neighborhood radius of synchronization, and its efficiency is low. We therefore adopt the firefly algorithm to determine the optimal threshold of the neighborhood radius of synchronization automatically. The proposed algorithm is tested on the MIPS PPI dataset. The results show that our proposed algorithm is better than the traditional algorithms in precision, recall and f-measure value. PMID:25707632
Processing of rock core microtomography images: Using seven different machine learning algorithms
NASA Astrophysics Data System (ADS)
Chauhan, Swarup; Rühaak, Wolfram; Khan, Faisal; Enzmann, Frieder; Mielke, Philipp; Kersten, Michael; Sass, Ingo
2016-01-01
The abilities of machine learning algorithms to process X-ray microtomographic rock images were determined. The study focused on the use of unsupervised, supervised, and ensemble clustering techniques to segment X-ray computer microtomography rock images and to estimate the pore spaces and pore size diameters in the rocks. The unsupervised k-means technique gave the fastest processing time and the supervised least squares support vector machine technique gave the slowest processing time. Multiphase assemblages of solid phases (minerals and finely grained minerals) and the pore phase were found on visual inspection of the images. In general, the accuracy in terms of porosity values and pore size distribution was found to be strongly affected by the feature vectors selected. The relative porosity average value of 15.92±1.77% retrieved from all seven machine learning algorithms is in very good agreement with the experimental results of 17±2%, obtained using a gas pycnometer. Of the supervised techniques, the least squares support vector machine technique is superior to the feed-forward artificial neural network because of its ability to identify a generalized pattern. Among the ensemble classification techniques, the boosting technique converged faster than the bagging technique. The k-means technique outperformed the fuzzy c-means and self-organized maps techniques in terms of accuracy and speed.
Misty Mountain clustering: application to fast unsupervised flow cytometry gating
2010-01-01
Background: There are many important clustering questions in computational biology for which no satisfactory method exists. Automated clustering algorithms, when applied to large, multidimensional datasets, such as flow cytometry data, prove unsatisfactory in terms of speed, problems with local minima or cluster shape bias. Model-based approaches are restricted by the assumptions of the fitting functions. Furthermore, model-based clustering requires serial clustering for all cluster numbers within a user-defined interval. The final cluster number is then selected by various criteria. These supervised serial clustering methods are time consuming, and frequently different criteria result in different optimal cluster numbers. Various unsupervised heuristic approaches that have been developed, such as affinity propagation, are too expensive to be applied to datasets on the order of 10^6 points that are often generated by high throughput experiments. Results: To circumvent these limitations, we developed a new, unsupervised density contour clustering algorithm, called Misty Mountain, that is based on percolation theory and that efficiently analyzes large data sets. The approach can be envisioned as a progressive top-down removal of clouds covering a data histogram relief map to identify clusters by the appearance of statistically distinct peaks and ridges. This is a parallel clustering method that finds every cluster after analyzing only once the cross sections of the histogram. The overall run time for the composite steps of the algorithm increases linearly with the number of data points. The clustering of 10^6 data points in 2D data space takes place within about 15 seconds on a standard laptop PC. Comparison of the performance of this algorithm with other state of the art automated flow cytometry gating methods indicates that Misty Mountain provides substantial improvements in both run time and in the accuracy of cluster assignment. Conclusions: Misty Mountain is fast, unbiased
The hierarchical algorithms--theory and applications
NASA Astrophysics Data System (ADS)
Su, Zheng-Yao
Monte Carlo simulations are one of the most important numerical techniques for investigating statistical physical systems. Among these systems, spin models are a typical example which also play an essential role in constructing the abstract mechanism for various complex systems. Unfortunately, traditional Monte Carlo algorithms are afflicted with "critical slowing down" near continuous phase transitions and the efficiency of the Monte Carlo simulation goes to zero as the size of the lattice is increased. To combat critical slowing down, a very different type of collective-mode algorithm, in contrast to the traditional single-spin-flip mode, was proposed by Swendsen and Wang in 1987 for Potts spin models. Since then, there has been an explosion of work attempting to understand, improve, or generalize it. In these so-called "cluster" algorithms, clusters of spin are regarded as one template and are updated at each step of the Monte Carlo procedure. In implementing these algorithms the cluster labeling is a major time-consuming bottleneck and is also isomorphic to the problem of computing connected components of an undirected graph seen in other application areas, such as pattern recognition. A number of cluster labeling algorithms for sequential computers have long existed. However, the dynamic irregular nature of clusters complicates the task of finding good parallel algorithms and this is particularly true on SIMD (single-instruction-multiple-data) machines. Our design of the Hierarchical Cluster Labeling Algorithm aims at alleviating this problem by building a hierarchical structure on the problem domain and by incorporating local and nonlocal communication schemes. We present an estimate for the computational complexity of cluster labeling and prove the key features of this algorithm (such as lower computational complexity, data locality, and easy implementation) compared with the methods formerly known. In particular, this algorithm can be viewed as a generalized
DEDICATED FILTER FOR DEFECTS CLUSTERING IN RADIOGRAPHIC IMAGE
Sikora, R.; Swiadek, K.; Chady, T.
2009-03-03
Defect clusters such as linear or clustered porosity are in some cases even more important than single flaws. This paper presents two methods of defect clustering and an algorithm for calculating distances between flaws in a digital radiographic image. A dedicated lookup-table-based filter is used to calculate distances between objects in the specified range. For defect clustering, two functions were developed. The first is based on the MMD (Minimum Mean Distance) algorithm. The second uses hierarchical procedures for clustering defects of various types, shapes and sizes.
Bipartite graph partitioning and data clustering
Zha, Hongyuan; He, Xiaofeng; Ding, Chris; Gu, Ming; Simon, Horst D.
2001-05-07
Many data types arising from data mining applications can be modeled as bipartite graphs; examples include terms and documents in a text corpus, customers and purchased items in market basket analysis, and reviewers and movies in a movie recommender system. In this paper, the authors propose a new data clustering method based on partitioning the underlying bipartite graph. The partition is constructed by minimizing a normalized sum of edge weights between unmatched pairs of vertices of the bipartite graph. They show that an approximate solution to the minimization problem can be obtained by computing a partial singular value decomposition (SVD) of the associated edge weight matrix of the bipartite graph. They point out the connection of their clustering algorithm to correspondence analysis used in multivariate analysis. They also briefly discuss the issue of assigning data objects to multiple clusters. In the experimental results, they apply their clustering algorithm to the problem of document clustering to illustrate its effectiveness and efficiency.
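The SVD-based partition described in this abstract can be illustrated with a small sketch. The following is not the authors' implementation: it assumes a dense nonnegative weight matrix with no empty rows or columns, replaces a library SVD with plain power iteration, and splits by the sign of the second singular vectors, which is a simplification chosen here for illustration. The function name `bipartite_bipartition` and the toy matrix are hypothetical.

```python
import math

def bipartite_bipartition(W, iters=200):
    """Split the rows and columns of a nonnegative weight matrix into two groups.

    Assumption: every row and column has positive total weight.
    """
    n_rows, n_cols = len(W), len(W[0])
    row_deg = [sum(row) for row in W]
    col_deg = [sum(W[i][j] for i in range(n_rows)) for j in range(n_cols)]
    total = sum(row_deg)

    # Scaled matrix D1^{-1/2} W D2^{-1/2}; its largest singular value is 1.
    S = [[W[i][j] / math.sqrt(row_deg[i] * col_deg[j]) for j in range(n_cols)]
         for i in range(n_rows)]

    # Remove the trivial leading singular pair so that power iteration
    # converges to the second pair instead.
    u1 = [math.sqrt(d / total) for d in row_deg]
    v1 = [math.sqrt(d / total) for d in col_deg]
    B = [[S[i][j] - u1[i] * v1[j] for j in range(n_cols)] for i in range(n_rows)]

    # Power iteration on B^T B, starting from an arbitrary non-constant vector.
    v = [float(j + 1) for j in range(n_cols)]
    for _ in range(iters):
        u = [sum(B[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
        v = [sum(B[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    u = [sum(B[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]

    # Undo the degree scaling and split by sign (illustrative cut rule).
    row_labels = [int(u[i] / math.sqrt(row_deg[i]) > 0) for i in range(n_rows)]
    col_labels = [int(v[j] / math.sqrt(col_deg[j]) > 0) for j in range(n_cols)]
    return row_labels, col_labels

# Toy document-by-term matrix: two topic blocks joined by one weak link.
W = [[2, 1, 0, 0],
     [1, 2, 0.1, 0],
     [0, 0, 1, 2],
     [0, 0, 2, 1]]
docs, terms = bipartite_bipartition(W)
```

Which group receives label 0 or 1 is arbitrary; only the grouping itself is meaningful. Documents and terms are clustered at the same time, which is the point of working on the bipartite graph rather than on documents alone.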
Estimating the number of clusters via system evolution for cluster analysis of gene expression data.
Wang, Kaijun; Zheng, Jie; Zhang, Junying; Dong, Jiyang
2009-09-01
The estimation of the number of clusters (NC) is one of the crucial problems in the cluster analysis of gene expression data. Most available approaches give their answers without intuitive information about the degree of separation between clusters. However, this information is useful for understanding cluster structures. To provide this information, we propose a system evolution (SE) method to estimate NC based on the partitioning around medoids (PAM) clustering algorithm. SE analyzes cluster structures of a dataset from the viewpoint of a pseudothermodynamics system. The system will go to its stable equilibrium state, at which the optimal NC is found, via its partitioning process and merging process. The experimental results on simulated and real gene expression data demonstrate that SE works well on data with well-separated clusters and on data with slightly overlapping clusters. PMID:19527960
A New Elliptical Grid Clustering Method
NASA Astrophysics Data System (ADS)
Guansheng, Zheng
A new grid-based clustering method is presented in this paper. The method first performs unsupervised learning on high-dimensional data. It maps the data onto a multi-dimensional space, applies a linear transformation to the feature space instead of to the objects themselves, and then applies a grid-clustering method. Unlike conventional methods, it uses multidimensional hyper-ellipse grid cells. Some case studies and ideas on how to use the algorithm are described. The experimental results show that EGC can discover clusters of irregular shape.
Clustering of High Throughput Gene Expression Data
Pirim, Harun; Ekşioğlu, Burak; Perkins, Andy; Yüceer, Çetin
2012-01-01
High throughput biological data need to be processed, analyzed, and interpreted to address problems in life sciences. Bioinformatics, computational biology, and systems biology deal with biological problems using computational methods. Clustering is one of the methods used to gain insight into biological processes, particularly at the genomics level. Clearly, clustering can be used in many areas of biological data analysis. However, this paper presents a review of the current clustering algorithms designed especially for analyzing gene expression data. It is also intended to introduce one of the main problems in bioinformatics - clustering gene expression data - to the operations research community. PMID:23144527
Learner Typologies Development Using OIndex and Data Mining Based Clustering Techniques
ERIC Educational Resources Information Center
Luan, Jing
2004-01-01
This explorative data mining project used a distance-based clustering algorithm to study 3 indicators, called OIndex, of student behavioral data and stabilized at a 6-cluster scenario following an exhaustive explorative study of 4-, 5-, and 6-cluster scenarios produced by K-Means and TwoStep algorithms. Using principles in data mining, the study…
Wu, Jianfa; Peng, Dahao; Li, Zhuping; Zhao, Li; Ling, Huanzhang
2015-01-01
To effectively and accurately detect and classify network intrusion data, this paper introduces a general regression neural network (GRNN) based on the artificial immune algorithm with elitist strategies (AIAE). The elitist archive and elitist crossover were combined with the artificial immune algorithm (AIA) to produce the AIAE-GRNN algorithm, with the aim of improving its adaptivity and accuracy. In this paper, the mean square errors (MSEs) were considered the affinity function. The AIAE was used to optimize the smooth factors of the GRNN; then, the optimal smooth factor was solved and substituted into the trained GRNN. Thus, the intrusive data were classified. The paper selected a GRNN that was separately optimized using a genetic algorithm (GA), particle swarm optimization (PSO), and fuzzy C-means clustering (FCM) to enable a comparison of these approaches. As shown in the results, the AIAE-GRNN achieves a higher classification accuracy than PSO-GRNN, but its running time is long; this was established first. FCM and GA-GRNN were eliminated because of their deficiencies in terms of accuracy and convergence. To improve the running speed, the paper adopted principal component analysis (PCA) to reduce the dimensions of the intrusive data. With the reduction in dimensionality, the PCA-AIAE-GRNN loses less accuracy and converges better than the PCA-PSO-GRNN, and its running speed was relatively improved. The experimental results show that the AIAE-GRNN has a higher robustness and accuracy than the other algorithms considered and can thus be used to classify the intrusive data. PMID:25807466
MIP Reconstruction Techniques and Minimum Spanning Tree Clustering
Mader, Wolfgang F.; /Iowa U.
2005-09-12
The development of a tracking algorithm for minimum ionizing particles in the calorimeter and of a clustering algorithm based on the Minimum Spanning Tree approach are described. They do not depend on information from the central tracking system. Both are important components of a particle flow algorithm currently under development.
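A minimal sketch of clustering with a minimum spanning tree follows. The abstract does not say how the tree is cut into clusters, so the rule used here (drop every tree edge longer than a fixed `max_edge`) is an assumption, as are the function name `mst_clusters` and the treatment of calorimeter hits as plain coordinate tuples.

```python
import math

def mst_clusters(points, max_edge):
    """Label points by building a Euclidean MST and cutting edges longer than max_edge."""
    n = len(points)
    # All pairwise edges, shortest first (fine for small inputs; O(n^2) memory).
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i in range(n) for j in range(i + 1, n)
    )
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Kruskal's algorithm: keep an edge only if it joins two separate components.
    mst = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            mst.append((d, i, j))

    # Rebuild components using only the short tree edges.
    parent = list(range(n))
    for d, i, j in mst:
        if d <= max_edge:
            parent[find(i)] = find(j)

    # Relabel components as 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]

hits = [(0, 0), (0, 1), (1, 1), (10, 10), (10, 11)]
labels = mst_clusters(hits, max_edge=2.0)  # [0, 0, 0, 1, 1]
```

Because only distances between hits are used, a sketch like this needs no input from a tracking system, which matches the independence the abstract describes.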
Matlab Cluster Ensemble Toolbox v. 1.0
2009-04-27
This is a Matlab toolbox for investigating the application of cluster ensembles to data classification, with the objective of improving the accuracy and/or speed of clustering. The toolbox divides the cluster ensemble problem into four areas, providing functionality for each. These include, (1) synthetic data generation, (2) clustering to generate individual data partitions and similarity matrices, (3) consensus function generation and final clustering to generate ensemble data partitioning, and (4) implementation of accuracy metrics. With regard to data generation, Gaussian data of arbitrary dimension can be generated. The kcenters algorithm can then be used to generate individual data partitions by either, (a) subsampling the data and clustering each subsample, or by (b) randomly initializing the algorithm and generating a clustering for each initialization. In either case an overall similarity matrix can be computed using a consensus function operating on the individual similarity matrices. A final clustering can be performed and performance metrics are provided for evaluation purposes.
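The consensus step can be sketched in a few lines. This is written in Python rather than Matlab and does not reproduce the toolbox's functions; the consensus function used here (averaging co-membership over partitions, then linking items that co-occur in more than half of them) is one common choice and is an assumption, as are the names `coassociation` and `consensus_labels`.

```python
def coassociation(partitions):
    """Fraction of partitions in which items i and j share a cluster."""
    n = len(partitions[0])
    m = len(partitions)
    return [[sum(p[i] == p[j] for p in partitions) / m for j in range(n)]
            for i in range(n)]

def consensus_labels(co, threshold=0.5):
    """Final clustering: connected components of the graph co[i][j] > threshold."""
    n = len(co)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and co[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

# Three individual partitions of five items; the label names differ between
# runs, which is why agreement is measured pairwise rather than label by label.
partitions = [
    [0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1],
]
co = coassociation(partitions)
final = consensus_labels(co)  # [0, 0, 1, 1, 1]
```

The individual partitions could come from either of the two strategies named in the abstract (subsampling or random initialization); the consensus step does not depend on how they were produced.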
Multi-view spectral clustering and its chemical application.
Adefioye, Adeshola A; Liu, Xinhai; De Moor, Bart
2013-01-01
Clustering is an unsupervised method that allows researchers to group objects and gather information about their relationships. In chemoinformatics, clustering enables hypotheses to be drawn about a compound's biological, chemical and physical property in comparison to another. We introduce a novel improved spectral clustering algorithm, proposed for chemical compound clustering, using multiple data sources. Tensor-based spectral methods, used in this paper, provide chemically appropriate and statistically significant results when attempting to cluster compounds from both the GSK-Chembl Malaria data set and the Zinc database. Spectral clustering algorithms based on the tensor method give robust results on the mid-size compound sets used here. The goal of this paper is to present the clustering of chemical compounds, using a tensor-based multi-view method which proves of value to the medicinal chemistry community. Our findings show compounds of extremely different chemotypes clustering together, this is a hint to the chemogenomics nature of our method.
Large-Scale Multi-Dimensional Document Clustering on GPU Clusters
Cui, Xiaohui; Mueller, Frank; Zhang, Yongpeng; Potok, Thomas E
2010-01-01
Document clustering plays an important role in data mining systems. Recently, a flocking-based document clustering algorithm has been proposed to solve the problem through simulation resembling the flocking behavior of birds in nature. This method is superior to other clustering algorithms, including k-means, in the sense that the outcome is not sensitive to the initial state. One limitation of this approach is that the algorithmic complexity is inherently quadratic in the number of documents. As a result, execution time becomes a bottleneck with a large number of documents. In this paper, we assess the benefits of exploiting the computational power of Beowulf-like clusters equipped with contemporary Graphics Processing Units (GPUs) as a means to significantly reduce the runtime of flocking-based document clustering. Our framework scales up to over one million documents processed simultaneously in a sixteen-node GPU cluster. Results are also compared to a four-node cluster with higher-end GPUs. On these clusters, we observe 30X-50X speedups, which demonstrates the potential of GPU clusters to efficiently solve massive data mining problems. Such speedups combined with the scalability potential and accelerator-based parallelization are unique in the domain of document-based data mining, to the best of our knowledge.
Complementary ensemble clustering of biomedical data.
Fodeh, Samah Jamal; Brandt, Cynthia; Luong, Thai Binh; Haddad, Ali; Schultz, Martin; Murphy, Terrence; Krauthammer, Michael
2013-06-01
The rapidly growing availability of electronic biomedical data has increased the need for innovative data mining methods. Clustering in particular has been an active area of research in many different application areas, with existing clustering algorithms mostly focusing on one modality or representation of the data. Complementary ensemble clustering (CEC) is a recently introduced framework in which Kmeans is applied to a weighted, linear combination of the coassociation matrices obtained from separate ensemble clustering of different data modalities. The strength of CEC is its extraction of information from multiple aspects of the data when forming the final clusters. This study assesses the utility of CEC in biomedical data, which often have multiple data modalities, e.g., text and images, by applying CEC to two distinct biomedical datasets (PubMed images and radiology reports) that each have two modalities. Compared with five different clustering approaches based on the Kmeans algorithm, CEC exhibited equal or better performance in the metrics of micro-averaged precision and Normalized Mutual Information across both datasets. The reference methods included clustering of single modalities as well as ensemble clustering of separate and merged data modalities. Our experimental results suggest that CEC is equivalent to or more efficient than comparable Kmeans-based clustering methods using either single or merged data modalities.
Multiple Manifold Clustering Using Curvature Constrained Path
Babaeian, Amir; Bayestehtashk, Alireza; Bandarabadi, Mojtaba
2015-01-01
The problem of multiple surface clustering is a challenging task, particularly when the surfaces intersect. Available methods such as Isomap fail to capture the true shape of the surface near the intersection and result in incorrect clustering. The Isomap algorithm uses the shortest path between points. The main drawback of the shortest path algorithm is that it lacks a curvature constraint, which allows a path between points on different surfaces. In this paper we tackle this problem by imposing a curvature constraint on the shortest path algorithm used in Isomap. The algorithm chooses several landmark nodes at random and then checks whether there is a curvature-constrained path between each landmark node and every other node in the neighborhood graph. We build a binary feature vector for each point, where each entry represents the connectivity of that point to a particular landmark. The binary feature vectors can then be used as input to a conventional clustering algorithm such as hierarchical clustering. We apply our method to simulated and some real datasets and show that it performs comparably to the best methods such as K-manifold and spectral multi-manifold clustering. PMID:26375819
Robust Face Clustering Via Tensor Decomposition.
Cao, Xiaochun; Wei, Xingxing; Han, Yahong; Lin, Dongdai
2015-11-01
Face clustering is a key component in either image management or video analysis. Wild human faces vary with pose, expression, and illumination changes. All kinds of noise, such as block occlusions, random pixel corruptions, and various disguises, may also destroy the consistency of faces referring to the same person. This motivates us to develop a robust face clustering algorithm that is less sensitive to such noise. To retain the underlying structured information within facial images, we use tensors to represent faces, and then accomplish the clustering task based on the tensor data. The proposed algorithm is called robust tensor clustering (RTC), which first finds a lower-rank approximation of the original tensor data using an L1 norm optimization function. Because the L1 norm does not exaggerate the effect of noise compared with the L2 norm, the minimization of the L1 norm approximation function makes RTC robust. Then, we compute a high-order singular value decomposition of this approximate tensor to obtain the final clustering results. Different from traditional algorithms that solve the approximation function with a greedy strategy, we utilize a nongreedy strategy to obtain a better solution. Experiments conducted on the benchmark facial datasets and gait sequences demonstrate that RTC has better performance than the state-of-the-art clustering algorithms and is more robust to noise. PMID:25546869
Retro: concept-based clustering of biomedical topical sets
Yeganova, Lana; Kim, Won; Kim, Sun; Wilbur, W. John
2014-01-01
Motivation: Clustering methods can be useful for automatically grouping documents into meaningful clusters, improving human comprehension of a document collection. Although there are clustering algorithms that can achieve the goal for relatively large document collections, they do not always work well for small and homogenous datasets. Methods: In this article, we present Retro—a novel clustering algorithm that extracts meaningful clusters along with concise and descriptive titles from small and homogenous document collections. Unlike common clustering approaches, our algorithm predicts cluster titles before clustering. It relies on the hypergeometric distribution model to discover key phrases, and generates candidate clusters by assigning documents to these phrases. Further, the statistical significance of candidate clusters is tested using supervised learning methods, and a multiple testing correction technique is used to control the overall quality of clustering. Results: We test our system on five disease datasets from OMIM® and evaluate the results based on MeSH® term assignments. We further compare our method with several baseline and state-of-the-art methods, including K-means, expectation maximization, latent Dirichlet allocation-based clustering, Lingo, OPTIMSRC and adapted GK-means. The experimental results on the 20-Newsgroup and ODP-239 collections demonstrate that our method is successful at extracting significant clusters and is superior to existing methods in terms of quality of clusters. Finally, we apply our system to a collection of 6248 topical sets from the HomoloGene® database, a resource in PubMed®. Empirical evaluation confirms the method is useful for small homogenous datasets in producing meaningful clusters with descriptive titles. 
Availability and implementation: A web-based demonstration of the algorithm applied to a collection of sets from the HomoloGene database is available at http://www.ncbi.nlm.nih.gov/CBBresearch/Wilbur/IRET/CLUSTERING
Reactive Collision Avoidance Algorithm
NASA Technical Reports Server (NTRS)
Scharf, Daniel; Acikmese, Behcet; Ploen, Scott; Hadaegh, Fred
2010-01-01
The reactive collision avoidance (RCA) algorithm allows a spacecraft to find a fuel-optimal trajectory for avoiding an arbitrary number of colliding spacecraft in real time while accounting for acceleration limits. In addition to spacecraft, the technology can be used for vehicles that can accelerate in any direction, such as helicopters and submersibles. In contrast to existing, passive algorithms that simultaneously design trajectories for a cluster of vehicles working to achieve a common goal, RCA is implemented onboard spacecraft only when an imminent collision is detected, and then plans a collision avoidance maneuver for only that host vehicle, thus preventing a collision in an off-nominal situation for which passive algorithms cannot. An example scenario for such a situation might be when a spacecraft in the cluster is approaching another one, but enters safe mode and begins to drift. Functionally, the RCA detects colliding spacecraft, plans an evasion trajectory by solving the Evasion Trajectory Problem (ETP), and then recovers after the collision is avoided. A direct optimization approach was used to develop the algorithm so it can run in real time. In this innovation, a parameterized class of avoidance trajectories is specified, and then the optimal trajectory is found by searching over the parameters. The class of trajectories is selected as bang-off-bang as motivated by optimal control theory. That is, an avoiding spacecraft first applies full acceleration in a constant direction, then coasts, and finally applies full acceleration to stop. The parameter optimization problem can be solved offline and stored as a look-up table of values. Using a look-up table allows the algorithm to run in real time. Given a colliding spacecraft, the properties of the collision geometry serve as indices of the look-up table that gives the optimal trajectory. For multiple colliding spacecraft, the set of trajectories that avoid all spacecraft is rapidly searched on
Automated variable weighting in k-means type clustering.
Huang, Joshua Zhexue; Ng, Michael K; Rong, Hongqiang; Li, Zichen
2005-05-01
This paper proposes a k-means type clustering algorithm that can automatically calculate variable weights. A new step is introduced to the k-means clustering process to iteratively update variable weights based on the current partition of data, and a formula for weight calculation is proposed. The convergence theorem of the new clustering process is given. The variable weights produced by the algorithm measure the importance of variables in clustering and can be used in variable selection in data mining applications where large and complex real data are often involved. Experimental results on both synthetic and real data have shown that the new algorithm outperformed the standard k-means type algorithms in recovering clusters in data.
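The weight-update step can be sketched as follows. The abstract does not state the formula, so the rule below is an assumption: each variable's weight is made inversely related to its total within-cluster dispersion D_j, through w_j = 1 / sum_t (D_j / D_t)^(1/(beta-1)) with beta > 1. The function name `update_variable_weights`, the parameter `beta`, and the toy data are hypothetical, and the sketch assumes every D_j is strictly positive.

```python
def update_variable_weights(data, labels, centers, beta=2.0):
    """Return one weight per variable from the current partition.

    Assumptions: beta > 1 and every variable has nonzero within-cluster
    dispersion (a zero dispersion would need special handling).
    """
    n_vars = len(data[0])
    # D[j]: squared deviation from the assigned center, summed over all points.
    D = [0.0] * n_vars
    for x, label in zip(data, labels):
        for j in range(n_vars):
            D[j] += (x[j] - centers[label][j]) ** 2

    # w_j = 1 / sum_t (D_j / D_t)^(1/(beta-1)), rewritten as a normalization.
    p = 1.0 / (beta - 1.0)
    inverse = [d ** -p for d in D]
    total = sum(inverse)
    return [v / total for v in inverse]

# Variable 0 separates the two clusters cleanly; variable 1 is mostly noise.
data = [(0.0, 5.0), (0.2, 1.0), (4.0, 4.0), (4.2, 0.0)]
labels = [0, 0, 1, 1]
centers = [(0.1, 3.0), (4.1, 2.0)]
weights = update_variable_weights(data, labels, centers)
```

In a full loop this step would alternate with the usual assignment and center updates, with distances computed using the current weights; the weights sum to one, and the noisy variable ends up with the smaller weight.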
Clustering of Variables for Mixed Data
NASA Astrophysics Data System (ADS)
Saracco, J.; Chavent, M.
2016-05-01
This chapter presents clustering of variables, whose aim is to lump together strongly related variables. The proposed approach works on a mixed data set, i.e. on a data set which contains numerical variables and categorical variables. Two algorithms for clustering of variables are described: a hierarchical clustering and a k-means type clustering. A brief description of the PCAmix method (that is, a principal component analysis for mixed data) is provided, since the calculation of the synthetic variables summarizing the obtained clusters of variables is based on this multivariate method. Finally, the R packages ClustOfVar and PCAmixdata are illustrated on real mixed data. The PCAmix and ClustOfVar approaches are first used for dimension reduction (step 1) before applying in step 2 a standard clustering method to obtain groups of individuals.
NASA Astrophysics Data System (ADS)
Friedenberg, David
2010-10-01
the rate of falsely detected active regions. Additionally we examine the more general field of clustering and develop a framework for clustering algorithms based around diffusion maps. Diffusion maps can be used to project high-dimensional data into a lower dimensional space while preserving much of the structure in the data. We demonstrate how diffusion maps can be used to solve clustering problems and examine the influence of tuning parameters on the results. We introduce two novel methods, the self-tuning diffusion map which replaces the global scaling parameter in the typical diffusion map framework with a local scaling parameter and an algorithm for automatically selecting tuning parameters based on a cross-validation style score called prediction strength. The methods are tested on several example datasets.
A GMBCG GALAXY CLUSTER CATALOG OF 55,424 RICH CLUSTERS FROM SDSS DR7
Hao Jiangang; Annis, James; Johnston, David E.; McKay, Timothy A.; Evrard, August; Siegel, Seth R.; Gerdes, David; Koester, Benjamin P.; Rykoff, Eli S.; Rozo, Eduardo; Wechsler, Risa H.; Busha, Michael; Becker, Matthew; Sheldon, Erin
2010-12-15
We present a large catalog of optically selected galaxy clusters from the application of a new Gaussian Mixture Brightest Cluster Galaxy (GMBCG) algorithm to SDSS Data Release 7 data. The algorithm detects clusters by identifying the red-sequence plus brightest cluster galaxy (BCG) feature, which is unique for galaxy clusters and does not exist among field galaxies. Red-sequence clustering in color space is detected using an Error Corrected Gaussian Mixture Model. We run GMBCG on 8240 deg^2 of photometric data from SDSS DR7 to assemble the largest ever optical galaxy cluster catalog, consisting of over 55,000 rich clusters across the redshift range 0.1 < z < 0.55. We present Monte Carlo tests of completeness and purity and perform cross-matching with X-ray clusters and with the maxBCG sample at low redshift. These tests indicate high completeness and purity across the full redshift range for clusters with 15 or more members.
A Short Survey of Document Structure Similarity Algorithms
Buttler, D
2004-02-27
This paper provides a brief survey of document structural similarity algorithms, including the optimal Tree Edit Distance algorithm and various approximation algorithms. The approximation algorithms include the simple weighted tag similarity algorithm, Fourier transforms of the structure, and a new application of the shingle technique to structural similarity. We show three surprising results. First, the Fourier transform technique proves to be the least accurate of the approximation algorithms, while also being the slowest. Second, optimal Tree Edit Distance algorithms may not be the best technique for clustering pages from different sites. Third, the simplest approximation to structure may be the most effective and efficient mechanism for many applications.
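To make the shingle idea concrete, here is a minimal sketch that compares two pages by their tag structure only. It is not the survey's algorithm: the choice of shingles over the flat sequence of opening and closing tags, the shingle length `k=3`, and the Jaccard overlap as the similarity score are all assumptions for illustration, as are the names `tag_shingles` and `structural_similarity`.

```python
from html.parser import HTMLParser

class TagCollector(HTMLParser):
    """Record the sequence of opening and closing tags, ignoring text."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_endtag(self, tag):
        self.tags.append("/" + tag)

def tag_shingles(html, k=3):
    """Set of all runs of k consecutive tags in the document."""
    collector = TagCollector()
    collector.feed(html)
    collector.close()
    tags = collector.tags
    return {tuple(tags[i:i + k]) for i in range(len(tags) - k + 1)}

def structural_similarity(html_a, html_b, k=3):
    """Jaccard overlap of the two documents' tag shingle sets."""
    a, b = tag_shingles(html_a, k), tag_shingles(html_b, k)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

page_a = "<html><body><h1>News</h1><p>First story</p></body></html>"
page_b = "<html><body><h1>Sport</h1><p>Other text</p></body></html>"
page_c = "<html><body><ul><li>1</li><li>2</li></ul></body></html>"

same_layout = structural_similarity(page_a, page_b)       # 1.0: identical tag sequence
different_layout = structural_similarity(page_a, page_c)  # 0.0: no shared shingles
```

Pages generated from the same template score high regardless of their text, which is the property that makes a structural measure usable for clustering pages by layout.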
Collaborative Clustering for Sensor Networks
NASA Technical Reports Server (NTRS)
Wagstaff, Kiri L.; Green, Jillian; Lane, Terran
2011-01-01
Traditionally, nodes in a sensor network simply collect data and then pass it on to a centralized node that archives, distributes, and possibly analyzes the data. However, analysis at the individual nodes could enable faster detection of anomalies or other interesting events, as well as faster responses such as sending out alerts or increasing the data collection rate. There is an additional opportunity for increased performance if individual nodes can communicate directly with their neighbors. Previously, a method was developed by which machine learning classification algorithms could collaborate to achieve high performance autonomously (without requiring human intervention). This method worked for supervised learning algorithms, in which labeled data is used to train models. The learners collaborated by exchanging labels describing the data. The new advance enables clustering algorithms, which do not use labeled data, to also collaborate. This is achieved by defining a new language for collaboration that uses pair-wise constraints to encode useful information for other learners. These constraints specify that two items must, or cannot, be placed into the same cluster. Previous work has shown that clustering with these constraints (in isolation) already improves performance. In the problem formulation, each learner resides at a different node in the sensor network and makes observations (collects data) independently of the other learners. Each learner clusters its data and then selects a pair of items about which it is uncertain and uses them to query its neighbors. The resulting feedback (a must and cannot constraint from each neighbor) is combined by the learner into a consensus constraint, and it then reclusters its data while incorporating the new constraint. A strategy was also proposed for cleaning the resulting constraint sets, which may contain conflicting constraints; this improves performance significantly. This approach has been applied to collaborative
Improving clustering with metabolic pathway data
2014-01-01
Background It is a common practice in bioinformatics to validate each group returned by a clustering algorithm through manual analysis, according to a-priori biological knowledge. This procedure helps in finding functionally related patterns to propose hypotheses for their behavior and the biological processes involved. Therefore, this knowledge is used only as a second step, after the data have been clustered solely according to their expression patterns. Thus, it could be very useful to be able to improve the clustering of biological data by incorporating prior knowledge into the cluster formation itself, in order to enhance the biological value of the clusters. Results A novel training algorithm for clustering is presented, which evaluates the biological internal connections of the data points while the clusters are being formed. Within this training algorithm, the calculation of distances among data points and neuron centroids includes a new term based on information from well-known metabolic pathways. Standard self-organizing map (SOM) training and biologically-inspired SOM (bSOM) training were tested with two real data sets of transcripts and metabolites from the Solanum lycopersicum and Arabidopsis thaliana species. Classical data mining validation measures were used to evaluate the clustering solutions obtained by both algorithms. Moreover, a new measure that takes into account the biological connectivity of the clusters was applied. The results of bSOM show important improvements in convergence and performance for the proposed clustering method in comparison to standard SOM training, in particular from the application point of view. Conclusions Analyses of the clusters obtained with bSOM indicate that including biological information during training can certainly increase the biological value of the clusters found with the proposed method. It is worth highlighting that this fact has effectively improved the results, which can simplify their further analysis
2010-01-01
Introduction The revised International Headache Society (IHS) criteria for cluster headache are: attacks of severe or very severe, strictly unilateral pain, which is orbital, supraorbital, or temporal pain, lasting 15 to 180 minutes and occurring from once every other day to eight times daily. Methods and outcomes We conducted a systematic review and aimed to answer the following clinical questions: What are the effects of interventions to abort cluster headache? What are the effects of interventions to prevent cluster headache? We searched: Medline, Embase, The Cochrane Library, and other important databases up to June 2009 (Clinical Evidence reviews are updated periodically; please check our website for the most up-to-date version of this review). We included harms alerts from relevant organisations, such as the US Food and Drug Administration (FDA) and the UK Medicines and Healthcare products Regulatory Agency (MHRA). Results We found 23 systematic reviews, RCTs, or observational studies that met our inclusion criteria. We performed a GRADE evaluation of the quality of evidence for interventions. Conclusions In this systematic review, we present information relating to the effectiveness and safety of the following interventions: baclofen (oral); botulinum toxin (intramuscular); capsaicin (intranasal); chlorpromazine; civamide (intranasal); clonidine (transdermal); corticosteroids; ergotamine and dihydroergotamine (oral or intranasal); gabapentin (oral); greater occipital nerve injections (betamethasone plus xylocaine); high-dose and high-flow-rate oxygen; hyperbaric oxygen; leuprolide; lidocaine (intranasal); lithium (oral); melatonin; methysergide (oral); octreotide (subcutaneous); pizotifen (oral); sodium valproate (oral); sumatriptan (oral, subcutaneous, and intranasal); topiramate (oral); tricyclic antidepressants (TCAs); verapamil; and zolmitriptan (oral and intranasal). PMID:21718584
Java implementation of Class Association Rule algorithms
Tamura, Makio
2007-08-30
Java implementation of three Class Association Rule mining algorithms: NETCAR, CARapriori, and clustering-based rule mining. The NETCAR algorithm is a novel algorithm developed by Makio Tamura. The algorithm is discussed in a paper, UCRL-JRNL-232466-DRAFT, to be published in a peer-reviewed scientific journal. The software is used to extract combinations of genes relevant to a phenotype from a phylogenetic profile and a phenotype profile. The phylogenetic profile is represented by a binary matrix and the phenotype profile is represented by a binary vector. The present application of this software is in genome analysis; however, it could be applied more generally.
Biological cluster evaluation for gene function prediction.
Klie, Sebastian; Nikoloski, Zoran; Selbig, Joachim
2014-06-01
Recent advances in high-throughput omics techniques render it possible to decode the function of genes by using the "guilt-by-association" principle on biologically meaningful clusters of gene expression data. However, the existing frameworks for biological evaluation of gene clusters are hindered by two bottleneck issues: (1) the choice for the number of clusters, and (2) the external measures which do not take in consideration the structure of the analyzed data and the ontology of the existing biological knowledge. Here, we address the identified bottlenecks by developing a novel framework that allows not only for biological evaluation of gene expression clusters based on existing structured knowledge, but also for prediction of putative gene functions. The proposed framework facilitates propagation of statistical significance at each of the following steps: (1) estimating the number of clusters, (2) evaluating the clusters in terms of novel external structural measures, (3) selecting an optimal clustering algorithm, and (4) predicting gene functions. The framework also includes a method for evaluation of gene clusters based on the structure of the employed ontology. Moreover, our method for obtaining a probabilistic range for the number of clusters is demonstrated valid on synthetic data and available gene expression profiles from Saccharomyces cerevisiae. Finally, we propose a network-based approach for gene function prediction which relies on the clustering of optimal score and the employed ontology. Our approach effectively predicts gene function on the Saccharomyces cerevisiae data set and is also employed to obtain putative gene functions for an Arabidopsis thaliana data set.
A fast meteor detection algorithm
NASA Astrophysics Data System (ADS)
Gural, P.
2016-01-01
A low-latency meteor detection algorithm for use with fast steering mirrors was previously developed to track and telescopically follow meteors in real time (Gural, 2007). It has been rewritten as a generic clustering and tracking software module for meteor detection that meets the demanding throughput requirements of a Raspberry Pi while maintaining a high probability of detection. The software interface is generalized to work with various forms of front-end video pre-processing approaches and provides a rich product set of parameterized line detection metrics. Discussion will include the Maximum Temporal Pixel (MTP) compression technique as a fast thresholding option for feeding the detection module, the detection algorithm trade for maximum processing throughput, details on the clustering and tracking methodology, processing products, performance metrics, and a general interface description.
NASA Technical Reports Server (NTRS)
Barth, Timothy J.; Lomax, Harvard
1987-01-01
The past decade has seen considerable activity in algorithm development for the Navier-Stokes equations. This has resulted in a wide variety of useful new techniques. Some examples for the numerical solution of the Navier-Stokes equations are presented, divided into two parts. One is devoted to the incompressible Navier-Stokes equations, and the other to the compressible form.
NASA Astrophysics Data System (ADS)
Kim, H.; Ho, C.; Kim, J.
2008-12-01
This study presents the pattern classification of tropical cyclone (TC) tracks over the western North Pacific (WNP) basin during the typhoon season (June through October) for 1965-2006 (42 years in total) using a fuzzy clustering method. After applying the fuzzy c-means clustering algorithm to TC trajectories interpolated into 20 segments of equal length, we divided the tracks into 7 patterns. The optimal number of fuzzy clusters is determined by several validity measures. The classified TC track patterns show quite different features in their recurving latitudes, genesis locations, and geographical pathways: TCs mainly forming in the northeastern part of the WNP and striking Korea and Japan (C1); mainly forming in the southwestern part of the WNP, traveling a long pathway, and partly striking Japan (C2); mainly striking Taiwan and East China (C3); traveling near the east coast of Japan (C4); traveling over the distant ocean east of Japan (C5); moving straight toward South China and Vietnam (C6); and forming in the South China Sea (C7). The atmospheric environments related to each cluster are physically consistent with each TC track pattern. The straight track pattern is closely linked to a developed anticyclonic circulation to the north of the TC. This implies that the ridge acts as a steering flow forcing TCs to move to the northwest with a more west-oriented track. By contrast, recurving patterns commonly occur under the influence of strong anomalous westerlies over the TC pathway, but each pattern has its own characteristic anomalous circulation over the mid-latitudes. Some clusters are closely related to well-known large-scale phenomena. C1 and C2 are highly related to the ENSO phase: the TCs in C1 (C2) are more active during La Niña (El Niño). The TC activity in C3 is associated with the WNP summer monsoon. The TCs in C4 are more (less) vigorous during the easterly (westerly) phase of the stratospheric quasi-biennial oscillation
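For readers unfamiliar with fuzzy c-means, the two alternating update steps of the standard algorithm can be sketched as follows. This is a textbook formulation, not the authors' code; the fuzzifier `m = 2.0` and the function names are assumptions, and each track is assumed to have been flattened into a fixed-length coordinate tuple beforehand.

```python
import math


def fcm_memberships(points, centers, m=2.0):
    """Membership update: u_ik = 1 / sum_j (d_ik / d_jk) ** (2 / (m - 1))."""
    power = 2.0 / (m - 1.0)
    memberships = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        if any(d == 0.0 for d in dists):
            # The point coincides with at least one center: split the
            # membership evenly among those centers.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            memberships.append([h / sum(hits) for h in hits])
            continue
        memberships.append(
            [1.0 / sum((d_i / d_j) ** power for d_j in dists) for d_i in dists]
        )
    return memberships


def fcm_centers(points, memberships, m=2.0):
    """Center update: mean of the points weighted by membership ** m."""
    dim = len(points[0])
    centers = []
    for i in range(len(memberships[0])):
        weights = [u[i] ** m for u in memberships]
        total = sum(weights)  # assumed nonzero for every cluster
        centers.append(tuple(
            sum(w * p[k] for w, p in zip(weights, points)) / total
            for k in range(dim)
        ))
    return centers


# Toy data (made up): three 2-D points, two initial centers.
pts = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]
u = fcm_memberships(pts, [(0.0, 0.0), (10.0, 0.0)])
print(u[1], fcm_centers(pts, u))
```

Iterating the two functions until the centers stop moving gives the usual algorithm. Each row of memberships sums to one, which is what allows a track to belong partly to several patterns.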
Fuzzy and hard clustering analysis for thyroid disease.
Azar, Ahmad Taher; El-Said, Shaimaa Ahmed; Hassanien, Aboul Ella
2013-07-01
Thyroid hormones produced by the thyroid gland help regulate the body's metabolism. A variety of methods have been proposed in the literature for thyroid disease classification. As far as we know, clustering techniques have not been applied to thyroid disease data sets so far. This paper proposes a comparison between hard and fuzzy clustering algorithms for a thyroid disease data set in order to find the optimal number of clusters. Different scalar validity measures are used in comparing the performances of the proposed clustering systems. To demonstrate the performance of each algorithm, the feature values that represent thyroid disease are used as input for the system. Several runs are carried out and recorded with a different number of clusters being specified for each run (between 2 and 11), so as to establish the optimum number of clusters. To find the optimal number of clusters, the so-called elbow criterion is applied. The experimental results revealed that for all algorithms, the elbow was located at c=3. The clustering results for all algorithms are then visualized by the Sammon mapping method to find a low-dimensional (normally 2D or 3D) representation of a set of points distributed in a high-dimensional pattern space. At the end of this study, some recommendations are formulated to improve the determination of the actual number of clusters present in the data set. PMID:23357404
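The elbow criterion mentioned above is usually read off a plot, but it can be approximated numerically. The sketch below takes the elbow to be the cluster count with the largest second difference of the cost curve; this particular rule, the function name `elbow_point`, and the cost values are assumptions for illustration and are not taken from the paper.

```python
def elbow_point(costs):
    """Return the cluster count at the sharpest bend of a cost curve.

    `costs` maps a number of clusters c to a clustering cost such as the
    within-cluster sum of squares. The bend at c is measured as
    (cost[c-1] - cost[c]) - (cost[c] - cost[c+1]).
    """
    cs = sorted(costs)
    best_c, best_bend = None, float("-inf")
    for prev_c, c, next_c in zip(cs, cs[1:], cs[2:]):
        bend = (costs[prev_c] - costs[c]) - (costs[c] - costs[next_c])
        if bend > best_bend:
            best_c, best_bend = c, bend
    return best_c


# Made-up costs for runs with c = 2..6 clusters.
example_costs = {2: 100.0, 3: 40.0, 4: 35.0, 5: 32.0, 6: 30.0}
print(elbow_point(example_costs))
```

The first and last cluster counts can never be selected by this rule because they lack a neighbor on one side, so the tested range should extend beyond the expected answer.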
NASA Astrophysics Data System (ADS)
Evertz, Hans Gerd
1998-03-01
Exciting new investigations have recently become possible for strongly correlated systems of spins, bosons, and fermions, through Quantum Monte Carlo simulations with the Loop Algorithm (H.G. Evertz, G. Lana, and M. Marcu, Phys. Rev. Lett. 70, 875 (1993); for a recent review see H.G. Evertz, cond-mat/9707221) and its generalizations. A review of this new method, its generalizations and its applications is given, including some new results. The Loop Algorithm is based on a formulation of physical models in an extended ensemble of worldlines and graphs, and is related to Swendsen-Wang cluster algorithms. It performs nonlocal changes of worldline configurations, determined by local stochastic decisions. It overcomes many of the difficulties of traditional worldline simulations. Computer time requirements are reduced by orders of magnitude, through a corresponding reduction in autocorrelations. The grand-canonical ensemble (e.g. varying winding numbers) is naturally simulated. The continuous time limit can be taken directly. Improved Estimators exist which further reduce the errors of measured quantities. The algorithm applies unchanged in any dimension and for varying bond-strengths. It becomes less efficient in the presence of strong site disorder or strong magnetic fields. It applies directly to locally XYZ-like spin, fermion, and hard-core boson models. It has been extended to the Hubbard and the tJ model and generalized to higher spin representations. There have already been several large scale applications, especially for Heisenberg-like models, including a high statistics continuous time calculation of quantum critical exponents on a regularly depleted two-dimensional lattice of up to 20000 spatial sites at temperatures down to T=0.01 J.
Weighted voting-based consensus clustering for chemical structure databases.
Saeed, Faisal; Ahmed, Ali; Shamsir, Mohd Shahir; Salim, Naomie
2014-06-01
The cluster-based compound selection is used in the lead identification process of drug discovery and design. Many clustering methods have been used for chemical databases, but there is no clustering method that can obtain the best results under all circumstances. However, little attention has been focused on the use of combination methods for chemical structure clustering, which is known as consensus clustering. Recently, consensus clustering has been used in many areas including bioinformatics, machine learning and information theory. This process can improve the robustness, stability, consistency and novelty of clustering. For chemical databases, different consensus clustering methods have been used including the co-association matrix-based, graph-based, hypergraph-based and voting-based methods. In this paper, a weighted cumulative voting-based aggregation algorithm (W-CVAA) was developed. The MDL Drug Data Report (MDDR) benchmark chemical dataset was used in the experiments and represented by the AlogP and ECPF_4 descriptors. The results from the clustering methods were evaluated by the ability of the clustering to separate biologically active molecules in each cluster from inactive ones using different criteria, and the effectiveness of the consensus clustering was compared to that of Ward's method, which is the current standard clustering method in chemoinformatics. This study indicated that weighted voting-based consensus clustering can overcome the limitations of the existing voting-based methods and improve the effectiveness of combining multiple clusterings of chemical structures. PMID:24830925
Winlaw, Manda; De Sterck, Hans; Sanders, Geoffrey
2015-10-26
In very simple terms, a network can be defined as a collection of points joined together by lines. Thus, networks can be used to represent connections between entities in a wide variety of fields including engineering, science, medicine, and sociology. Many large real-world networks share a surprising number of properties, leading to a strong interest in model development research, and techniques for building synthetic networks have been developed that capture these similarities and replicate real-world graphs. Modeling these real-world networks serves two purposes. First, building models that mimic the patterns and properties of real networks helps to understand the implications of these patterns and helps determine which patterns are important. If we develop a generative process to synthesize real networks we can also examine which growth processes are plausible and which are not. Second, high-quality, large-scale network data is often not available, because of economic, legal, technological, or other obstacles [7]. Thus, there are many instances where the systems of interest cannot be represented by a single exemplar network. As one example, consider the field of cybersecurity, where systems require testing across diverse threat scenarios and validation across diverse network structures. In these cases, where there is no single exemplar network, the systems must instead be modeled as a collection of networks in which the variation among them may be just as important as their common features. By developing processes to build synthetic models, so-called graph generators, we can build synthetic networks that capture both the essential features of a system and realistic variability. Then we can use such synthetic graphs to perform tasks such as simulations, analysis, and decision making. We can also use synthetic graphs to test the performance of graph analysis algorithms, including clustering algorithms and anomaly detection algorithms.
Segmentation of MRI Brain Images with an Improved Harmony Searching Algorithm.
Yang, Zhang; Shufan, Ye; Li, Guo; Weifeng, Ding
2016-01-01
The harmony searching (HS) algorithm is a kind of optimization search algorithm currently applied in many practical problems. The HS algorithm constantly revises variables in the harmony database and the probability of different values that can be used to complete iteration convergence to achieve the optimal effect. Accordingly, this study proposed a modified algorithm to improve the efficiency of the algorithm. First, a rough set algorithm was employed to improve the convergence and accuracy of the HS algorithm. Then, the optimal value was obtained using the improved HS algorithm. The optimal value of convergence was employed as the initial value of the fuzzy clustering algorithm for segmenting magnetic resonance imaging (MRI) brain images. Experimental results showed that the improved HS algorithm attained better convergence and more accurate results than those of the original HS algorithm. In our study, the MRI image segmentation effect of the improved algorithm was superior to that of the original fuzzy clustering method. PMID:27403428
Some Basic Elements in Clustering and Classification
NASA Astrophysics Data System (ADS)
Grégoire, G.
2016-05-01
This chapter deals with basic tools useful in clustering and classification and presents some commonly used approaches for these two problems. Since several chapters in these proceedings are devoted to approaches to classification, we give more attention in this chapter to clustering issues. We are first concerned with notions of distances or dissimilarities between the objects we are to group into clusters. Then, based on these inter-object distances, we define distances between sets of objects, such as single linkage, complete linkage or Ward distance. Three clustering algorithms are presented in some detail and compared: the K-means, Ascendant Hierarchical and DBSCAN algorithms. The comparison between partitions and the issue of choosing the correct number of clusters are investigated, and the proposed procedures are tested on two data sets. We emphasize the fact that the results provided by the numerous indices available in the literature for selecting the number of clusters depend largely upon the shape and the dispersion we are assuming for these clusters. Finally, the last section is devoted to classification. Some basic notions such as training sets, test sets and cross-validation are discussed. Two particular approaches are detailed, the K-nearest neighbors method and logistic regression, and comparisons with LDA (Linear Discriminant Analysis) and QDA (Quadratic Discriminant Analysis) are analyzed.
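The three set-to-set distances named in the abstract have standard textbook definitions, sketched below for points given as coordinate tuples with Euclidean distance between objects. The function names are illustrative and the chapter's own notation may differ.

```python
import math
from itertools import product


def single_linkage(a, b):
    """Smallest distance between any point of a and any point of b."""
    return min(math.dist(p, q) for p, q in product(a, b))


def complete_linkage(a, b):
    """Largest distance between any point of a and any point of b."""
    return max(math.dist(p, q) for p, q in product(a, b))


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(len(points[0])))


def ward_distance(a, b):
    """Increase in within-cluster sum of squares caused by merging a and b:
    |a| * |b| / (|a| + |b|) * ||centroid(a) - centroid(b)|| ** 2.
    """
    gap = math.dist(centroid(a), centroid(b))
    return len(a) * len(b) / (len(a) + len(b)) * gap ** 2


# Two small made-up clusters in the plane.
left = [(0.0, 0.0), (0.0, 2.0)]
right = [(3.0, 0.0), (3.0, 2.0)]
print(single_linkage(left, right), complete_linkage(left, right), ward_distance(left, right))
```

Agglomerative hierarchical clustering repeatedly merges the pair of clusters with the smallest such distance, so the choice among the three determines the shape of the clusters that emerge.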
Classical and quantum physics of hydrogen clusters.
Mezzacapo, Fabio; Boninsegni, Massimo
2009-04-22
We present results of a comprehensive theoretical investigation of the low temperature (T) properties of clusters of para-hydrogen (p-H(2)), both pristine as well as doped with isotopic impurities (i.e., ortho-deuterium, o-D(2)). We study clusters comprising up to N = 40 molecules, by means of quantum simulations based on the continuous-space Worm algorithm. Pristine p-H(2) clusters are liquid-like and superfluid in the [Formula: see text] limit. The superfluid signal is uniform throughout these clusters; it is underlain by long cycles of permutation of molecules. Clusters with more than 22 molecules display solid-like, essentially classical behavior at temperatures down to T∼1 K; some of them are seen to turn liquid-like at sufficiently low T (quantum melting).
Impact of heuristics in clustering large biological networks.
Shafin, Md Kishwar; Kabir, Kazi Lutful; Ridwan, Iffatur; Anannya, Tasmiah Tamzid; Karim, Rashid Saadman; Hoque, Mohammad Mozammel; Rahman, M Sohel
2015-12-01
Traditional clustering algorithms often exhibit poor performance for large networks. On the contrary, greedy algorithms are found to be relatively efficient while uncovering functional modules from large biological networks. The quality of the clusters produced by these greedy techniques largely depends on the underlying heuristics employed. Different heuristics based on different attributes and properties perform differently in terms of the quality of the clusters produced. This motivates us to design new heuristics for clustering large networks. In this paper, we have proposed two new heuristics and analyzed the performance thereof after incorporating those with three different combinations in a recently celebrated greedy clustering algorithm named SPICi. We have extensively analyzed the effectiveness of these new variants. The results are found to be promising. PMID:26386663
Dynamical Mass Measurements of Contaminated Galaxy Clusters Using Machine Learning
NASA Astrophysics Data System (ADS)
Ntampaka, Michelle; Trac, Hy; Sutherland, Dougal; Fromenteau, Sebastien; Poczos, Barnabas; Schneider, Jeff
2016-01-01
Galaxy clusters are a rich source of information for examining fundamental astrophysical processes and cosmological parameters; however, employing clusters as cosmological probes requires accurate mass measurements derived from cluster observables. We study dynamical mass measurements of galaxy clusters contaminated by interlopers, and show that a modern machine learning (ML) algorithm can improve mass predictions by more than a factor of two compared to a standard scaling relation approach. We create a mock catalog from Multidark's publicly available N-body MDPL1 simulation where a simple cylindrical cut around the cluster center allows interlopers to contaminate the clusters. In the standard approach, we use a power law scaling relation to infer cluster mass from galaxy line of sight (LOS) velocity dispersion. The presence of interlopers in the catalog produces a wide, flat fractional mass error distribution, with width = 2.13. We employ the Support Distribution Machine (SDM) class of algorithms to learn from distributions of data to predict single values. Applied to distributions of galaxy observables such as LOS velocity and projected distance from the cluster center, SDM yields better than a factor-of-two improvement (width = 0.67). Remarkably, SDM applied to contaminated clusters is better able to recover masses than even a scaling relation approach applied to uncontaminated clusters. We show that the SDM method more accurately reproduces the cluster mass function, making it a valuable tool for employing cluster observations to evaluate cosmological models.
Firefly Algorithm for Structural Search.
Avendaño-Franco, Guillermo; Romero, Aldo H
2016-07-12
The problem of computational structure prediction of materials is approached using the firefly (FF) algorithm. Starting from the chemical composition and optionally using prior knowledge of similar structures, the FF method is able to predict not only known stable structures but also a variety of novel competitive metastable structures. This article focuses on the strengths and limitations of the algorithm as a multimodal global searcher. The algorithm has been implemented in software package PyChemia ( https://github.com/MaterialsDiscovery/PyChemia ), an open source python library for materials analysis. We present applications of the method to van der Waals clusters and crystal structures. The FF method is shown to be competitive when compared to other population-based global searchers. PMID:27232694
NASA Astrophysics Data System (ADS)
Elbakary, M. I.; Alam, M. S.; Aslan, M. S.
2007-09-01
Recently, spectral information has been introduced into face recognition applications to improve the detection performance under different conditions. Besides the changes in scale, orientation, and rotation of facial images, expression, occlusion and lighting conditions change the overall appearance of faces and the recognition results. To eliminate these difficulties, we introduce a new face recognition technique that uses the spectral signature of facial tissues. Unlike alternative algorithms, the proposed algorithm classifies the hyperspectral imagery corresponding to each face into clusters to automatically recognize the desired face and to eliminate user intervention in the data set. The K-means clustering algorithm is employed to accomplish the clustering, and then the Mahalanobis distance is computed between the clusters to identify the closest cluster in the data with respect to the reference cluster. By identifying a cluster in the data, the face that contains that cluster is identified by the proposed algorithm. Test results using real-life hyperspectral imagery show the effectiveness of the proposed algorithm.
ASteCA: Automated Stellar Cluster Analysis
NASA Astrophysics Data System (ADS)
Perren, G. I.; Vázquez, R. A.; Piatti, A. E.
2015-04-01
We present the Automated Stellar Cluster Analysis package (ASteCA), a suite of tools designed to fully automate the standard tests applied to stellar clusters to determine their basic parameters. The set of functions included in the code makes use of positional and photometric data to obtain precise and objective values for a given cluster's center coordinates, radius, luminosity function and integrated color magnitude, as well as characterizing through a statistical estimator its probability of being a true physical cluster rather than a random overdensity of field stars. ASteCA incorporates a Bayesian field star decontamination algorithm capable of assigning membership probabilities using photometric data alone. An isochrone fitting process based on the generation of synthetic clusters from theoretical isochrones and selection of the best fit through a genetic algorithm is also present, which allows ASteCA to provide accurate estimates for a cluster's metallicity, age, extinction and distance values along with their uncertainties. To validate the code we applied it to a large set of over 400 synthetic MASSCLEAN clusters with varying degrees of field star contamination as well as a smaller set of 20 observed Milky Way open clusters (Berkeley 7, Bochum 11, Czernik 26, Czernik 30, Haffner 11, Haffner 19, NGC 133, NGC 2236, NGC 2264, NGC 2324, NGC 2421, NGC 2627, NGC 6231, NGC 6383, NGC 6705, Ruprecht 1, Tombaugh 1, Trumpler 1, Trumpler 5 and Trumpler 14) studied in the literature. The results show that ASteCA is able to recover cluster parameters with an acceptable precision even for those clusters affected by substantial field star contamination. ASteCA is written in Python and is made available as an open source code which can be downloaded ready to be used from its official site.
Nadasdy, Zoltan; Varsanyi, Peter; Zaborszky, Laszlo
2010-01-01
Functionally related groups of neurons spatially cluster together in the brain. To detect groups of functionally related neurons from 3D histological data, we developed an objective clustering method that provides a description of detected cell clusters that is quantitative and amenable to visual exploration. This method is based on bubble clustering (Gupta and Gosh, 2008). Our implementation consists of three steps: (i) an initial data exploration for scanning the clustering parameter space; (ii) determination of the optimal clustering parameters; (iii) final clustering. We designed this algorithm to flexibly detect clusters without assumptions about the underlying cell distribution within a cluster or the number and sizes of clusters. We implemented the clustering function as an integral part of the neuroanatomical data visualization software Virtual RatBrain (http://www.virtualratbrain.org). We applied this algorithm to the basal forebrain cholinergic system, which consists of a diffuse but inhomogeneous population of neurons (Zaborszky, 1992). With this clustering method, we confirmed the inhomogeneity in this system, defined cell clusters, quantified and localized them, and determined the cell density within clusters. Furthermore, by applying the clustering method to multiple specimens from both rat and monkey, we found that cholinergic clusters display remarkable cross-species preservation of cell density within clusters. This method is efficient not only for clustering cell body distributions but may also be used to study other distributed neuronal structural elements, including synapses, receptors, dendritic spines and molecular markers. PMID:20398701
Huang, Wei; Oh, Sung-Kwun; Pedrycz, Witold
2014-12-01
In this study, we propose Hybrid Radial Basis Function Neural Networks (HRBFNNs) realized with the aid of fuzzy clustering method (Fuzzy C-Means, FCM) and polynomial neural networks. Fuzzy clustering used to form information granulation is employed to overcome a possible curse of dimensionality, while the polynomial neural network is utilized to build local models. Furthermore, genetic algorithm (GA) is exploited here to optimize the essential design parameters of the model (including fuzzification coefficient, the number of input polynomial fuzzy neurons (PFNs), and a collection of the specific subset of input PFNs) of the network. To reduce dimensionality of the input space, principal component analysis (PCA) is considered as a sound preprocessing vehicle. The performance of the HRBFNNs is quantified through a series of experiments, in which we use several modeling benchmarks of different levels of complexity (different number of input variables and the number of available data). A comparative analysis reveals that the proposed HRBFNNs exhibit higher accuracy in comparison to the accuracy produced by some models reported previously in the literature.
Overlapping clusters for distributed computation.
Mirrokni, Vahab; Andersen, Reid; Gleich, David F.
2010-11-01
Scalable, distributed algorithms must address communication problems. We investigate overlapping clusters, or vertex partitions that intersect, for graph computations. This setup stores more of the graph than required but then affords the ease of implementation of vertex partitioned algorithms. Our hope is that this technique allows us to reduce communication in a computation on a distributed graph. The motivation above draws on recent work in communication avoiding algorithms. Mohiyuddin et al. (SC09) design a matrix-powers kernel that gives rise to an overlapping partition. Fritzsche et al. (CSC2009) develop an overlapping clustering for a Schwarz method. Both techniques extend an initial partitioning with overlap. Our procedure generates overlap directly. Indeed, Schwarz methods are commonly used to capitalize on overlap. Elsewhere, overlapping communities (Ahn et al., Nature 2009; Mishra et al., WAW2007) are now a popular model of structure in social networks. These have long been studied in statistics (Cole and Wishart, CompJ 1970). We present two types of results: (i) an estimated swapping probability ρ_∞; and (ii) the communication volume of a parallel PageRank solution (link-following α = 0.85) using an additive Schwarz method. The volume ratio is the amount of extra storage for the overlap (2 means we store the graph twice). Below, as the ratio increases, the swapping probability and PageRank communication volume decrease.
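To make the role of the link-following parameter α = 0.85 concrete, here is a minimal single-machine PageRank sketch using plain power iteration. It is not the additive Schwarz solver or the overlapping partitioning described in the abstract. The function name `pagerank`, the adjacency-dict input format, and the toy graph are illustrative assumptions.

```python
def pagerank(out_links, alpha=0.85, tol=1e-12, max_iter=1000):
    """Power iteration for PageRank on a dict {node: [successor nodes]}.

    Assumes every successor also appears as a key of out_links.
    """
    nodes = sorted(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by dangling nodes (no out-links) is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new = {v: (1.0 - alpha) / n + alpha * dangling / n for v in nodes}
        # With probability alpha the surfer follows a uniformly chosen out-link.
        for v in nodes:
            for w in out_links[v]:
                new[w] += alpha * rank[v] / len(out_links[v])
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


# Hypothetical three-node graph, for illustration only.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(graph)
```

In a distributed setting each rank update along an edge that crosses a partition boundary costs communication, which is the volume the overlapping clusters aim to reduce.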
CLUM: a cluster program for analyzing microarray data.
Irigoien, I; Fernandez, E; Vives, S; Arenas, C
2008-08-01
Microarray technology is increasingly being applied in biological and medical research to address a wide range of problems. Cluster analysis has proven to be a very useful tool for investigating the structure of microarray data. This paper presents a program for clustering microarray data, which is based on the so-called path distance. At each step the algorithm gives a partition into two clusters, and no prior assumptions on the structure of clusters are required. It assigns each object (gene or sample) to only one cluster and gives the global optimum for the function that quantifies the adequacy of a given partition of the sample into k clusters. The program was tested on experimental data sets, showing the robustness of the algorithm. PMID:18825964
BioCluster: tool for identification and clustering of Enterobacteriaceae based on biochemical data.
Abdullah, Ahmed; Sabbir Alam, S M; Sultana, Munawar; Hossain, M Anwar
2015-06-01
Presumptive identification of different Enterobacteriaceae species is routinely achieved based on biochemical properties. Traditional practice includes manual comparison of each biochemical property of the unknown sample with known reference samples and inference of its identity based on the maximum similarity pattern with the known samples. This process is labor-intensive, time-consuming, error-prone, and subjective. Therefore, automation of sorting and similarity in calculation would be advantageous. Here we present a MATLAB-based graphical user interface (GUI) tool named BioCluster. This tool was designed for automated clustering and identification of Enterobacteriaceae based on biochemical test results. In this tool, we used two types of algorithms, i.e., traditional hierarchical clustering (HC) and the Improved Hierarchical Clustering (IHC), a modified algorithm that was developed specifically for the clustering and identification of Enterobacteriaceae species. IHC takes into account the variability in result of 1-47 biochemical tests within this Enterobacteriaceae family. This tool also provides different options to optimize the clustering in a user-friendly way. Using computer-generated synthetic data and some real data, we have demonstrated that BioCluster has high accuracy in clustering and identifying enterobacterial species based on biochemical test data. This tool can be freely downloaded at http://microbialgen.du.ac.bd/biocluster/.
Cui, Xiaohui; Potok, Thomas E
2006-01-01
The Flocking model, first proposed by Craig Reynolds, is one of the first bio-inspired computational collective behavior models and has many popular applications, such as animation. Our early research has resulted in a flock clustering algorithm that can achieve better performance than the K-means or the Ant clustering algorithms for data clustering. This algorithm generates a clustering of a given set of data by embedding the high-dimensional data items on a two-dimensional grid for efficient clustering result retrieval and visualization. In this paper, we propose a bio-inspired clustering model, the Multiple Species Flocking clustering model (MSF), and present a distributed multi-agent MSF approach for document clustering.
Analysis of Massive Emigration from Poland: The Model-Based Clustering Approach
NASA Astrophysics Data System (ADS)
Witek, Ewa
The model-based approach assumes that data is generated by a finite mixture of probability distributions such as multivariate normal distributions. In finite mixture models, each component of probability distribution corresponds to a cluster. The problem of determining the number of clusters and choosing an appropriate clustering method becomes the problem of statistical model choice. Hence, the model-based approach provides a key advantage over heuristic clustering algorithms, because it selects both the correct model and the number of clusters.
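The finite-mixture view described above can be written out as follows. The notation is an assumption introduced here: $K$ components with mixing weights $\pi_k$, multivariate normal densities $\phi$ with means $\mu_k$ and covariances $\Sigma_k$, $n$ observations, and $p$ free parameters. The Bayesian information criterion shown is one common way to carry out the statistical model choice; the abstract does not say which criterion was used.

```latex
% Finite Gaussian mixture density: each component k corresponds to a cluster
f(x) = \sum_{k=1}^{K} \pi_k \, \phi(x \mid \mu_k, \Sigma_k),
\qquad \pi_k \ge 0, \quad \sum_{k=1}^{K} \pi_k = 1

% Posterior probability that observation x_i belongs to cluster k
\tau_{ik} = \frac{\pi_k \, \phi(x_i \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \, \phi(x_i \mid \mu_j, \Sigma_j)}

% One common model-selection criterion (smaller is better in this form)
\mathrm{BIC} = -2 \log \hat{L} + p \log n
```

Comparing such a criterion across candidate values of $K$ and covariance structures turns the choice of the number of clusters into a model comparison.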
Global Considerations in Hierarchical Clustering Reveal Meaningful Patterns in Data
Varshavsky, Roy; Horn, David; Linial, Michal
2008-01-01
Background: A hierarchy, characterized by tree-like relationships, is a natural method of organizing data in various domains. When considering an unsupervised machine learning routine, such as clustering, a bottom-up hierarchical (BU, agglomerative) algorithm is used as a default and is often the only method applied. Methodology/Principal Findings: We show that hierarchical clustering that involves global considerations, such as top-down (TD, divisive) or glocal (global-local) algorithms, is better suited to reveal meaningful patterns in the data. This is demonstrated by testing the correspondence between the results of several algorithms (TD, glocal and BU) and the correct annotations provided by experts. The correspondence was tested in multiple domains including gene expression experiments, stock trade records and functional protein families. The performance of each of the algorithms is evaluated by statistical criteria that are assigned to clusters (nodes of the hierarchy tree) based on expert-labeled data. Whereas TD algorithms perform better on global patterns, BU algorithms perform well and are advantageous when finer granularity of the data is sought. In addition, a novel TD algorithm that is based on genuine density of the data points is presented and is shown to outperform other divisive and agglomerative methods. Application of the algorithm to more than 500 protein sequences belonging to ion-channels illustrates the potential of the method for inferring overlooked functional annotations. ClustTree, a graphical Matlab toolbox for applying various hierarchical clustering algorithms and testing their quality, is made available. Conclusions: Although currently rarely used, global approaches, in particular TD or glocal algorithms, should be considered in the exploratory process of clustering. In general, applying unsupervised clustering methods can leverage the quality of manually-created mapping of protein families. As demonstrated, it can also provide
Intrusion signature creation via clustering anomalies
NASA Astrophysics Data System (ADS)
Hendry, Gilbert R.; Yang, Shanchieh J.
2008-03-01
Current practices for combating cyber attacks typically use Intrusion Detection Systems (IDSs) to detect and block multistage attacks. Because of the speed and impacts of new types of cyber attacks, current IDSs are limited in providing accurate detection while reliably adapting to new attacks. In signature-based IDS systems, this limitation is made apparent by the latency from day zero of an attack to the creation of an appropriate signature. This work hypothesizes that this latency can be shortened by creating signatures via anomaly-based algorithms. A hybrid supervised and unsupervised clustering algorithm is proposed for new signature creation. These new signatures created in real-time would take effect immediately, ideally detecting new attacks. This work first investigates a modified density-based clustering algorithm as an IDS, with its strengths and weaknesses identified. A signature creation algorithm leveraging the summarizing abilities of clustering is investigated. Lessons learned from the supervised signature creation are then leveraged for the development of unsupervised real-time signature classification. Automating signature creation and classification via clustering is demonstrated as satisfactory but with limitations.
Deterministic algorithm with agglomerative heuristic for location problems
NASA Astrophysics Data System (ADS)
Kazakovtsev, L.; Stupina, A.
2015-10-01
The authors consider the clustering problem solved with the k-means method and the p-median problem with various distance metrics. The p-median problem, and the k-means problem as its special case, are among the most popular models of location theory. They are used for clustering and for many practically important logistics problems, such as the optimal location of factories or warehouses, oil or gas wells, offshore oil drilling, and steam generators in heavy oil fields. The authors propose a new deterministic heuristic algorithm based on ideas from Information Bottleneck Clustering and from genetic algorithms with a greedy heuristic. Results of running the new algorithm on various data sets are compared with those of known deterministic and stochastic methods. The new algorithm is shown to be significantly faster than the Information Bottleneck Clustering method while achieving comparable precision.
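As a point of reference for the k-means problem mentioned above, the following is a minimal sketch of the classical Lloyd iteration started from fixed initial centers, which makes it deterministic. It is the baseline local search, not the agglomerative heuristic the authors propose. The function name `kmeans` and the toy data are illustrative assumptions.

```python
def kmeans(points, centers, max_iter=100):
    """Lloyd's algorithm on tuples of numbers, started from the given centers."""
    centers = [tuple(c) for c in centers]
    groups = [[] for _ in centers]
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center
        # (squared Euclidean distance, ties go to the lower index).
        groups = [[] for _ in centers]
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            groups[dists.index(min(dists))].append(p)
        # Update step: move each center to the mean of its group;
        # an empty group keeps its previous center.
        new_centers = [
            tuple(sum(xs) / len(g) for xs in zip(*g)) if g else c
            for g, c in zip(groups, centers)
        ]
        if new_centers == centers:
            break  # fixed point reached: assignments can no longer change
        centers = new_centers
    return centers, groups


# Hypothetical 2-D data with two well-separated groups.
points = [(0, 0), (0, 2), (10, 0), (10, 2)]
centers, groups = kmeans(points, [(0, 0), (10, 0)])
```

Replacing the mean in the update step with the best member of each group, under a chosen distance metric, gives the analogous local search for the discrete p-median problem.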
SPECTRAL IMAGING OF GALAXY CLUSTERS WITH PLANCK
Bourdin, H.; Mazzotta, P.; Rasia, E.
2015-12-20
The Sunyaev–Zeldovich (SZ) effect is a promising tool for detecting the presence of hot gas out to the galaxy cluster peripheries. We developed a spectral imaging algorithm dedicated to the SZ observations of nearby galaxy clusters with Planck, with the aim of revealing gas density anisotropies related to the filamentary accretion of materials, or pressure discontinuities induced by the propagation of shock fronts. To optimize an unavoidable trade-off between angular resolution and precision of the SZ flux measurements, the algorithm performs a multi-scale analysis of the SZ maps as well as of other extended components, such as the cosmic microwave background (CMB) anisotropies and the Galactic thermal dust. The demixing of the SZ signal is tackled through kernel-weighted likelihood maximizations. The CMB anisotropies are further analyzed through a wavelet analysis, while the Galactic foregrounds and SZ maps are analyzed via a curvelet analysis that best preserves their anisotropic details. The algorithm performance has been tested against mock observations of galaxy clusters obtained by simulating the Planck High Frequency Instrument and by pointing at a few characteristic positions in the sky. These tests suggest that Planck should easily allow us to detect filaments in the cluster peripheries and detect large-scale shocks in colliding galaxy clusters that feature favorable geometry.
Application of Simulated Annealing to Clustering Tuples in Databases.
ERIC Educational Resources Information Center
Bell, D. A.; And Others
1990-01-01
Investigates the value of applying principles derived from simulated annealing to clustering tuples in database design, and compares this technique with a graph-collapsing clustering method. It is concluded that, while the new method does give superior results, the expense involved in algorithm run time is prohibitive. (24 references) (CLB)
A Distributed Flocking Approach for Information Stream Clustering Analysis
Cui, Xiaohui; Potok, Thomas E
2006-01-01
Intelligence analysts are currently overwhelmed by the volume of information streams generated every day. There is a lack of comprehensive tools that can analyze these streams in real time. Document clustering analysis plays an important role in improving the accuracy of information retrieval. However, most clustering technologies can only be applied to static document collections because they normally require large amounts of computational resources and a long time to produce accurate results. It is very difficult to cluster a dynamically changing text information stream on an individual computer. Our early research resulted in a dynamic reactive flock clustering algorithm that can continually refine the clustering result and quickly react to changes in document contents. This characteristic makes the algorithm suitable for cluster analysis of dynamically changing document information, such as text information streams. Because of the decentralized character of this algorithm, a distributed approach is a very natural way to increase its clustering speed. In this paper, we present a distributed multi-agent flocking approach for text information stream clustering and discuss the decentralized architectures and communication schemes for load balancing and status information synchronization in this approach.
Recent Trends in Hierarchic Document Clustering: A Critical Review.
ERIC Educational Resources Information Center
Willett, Peter
1988-01-01
Reviews recent research into the use of hierarchic agglomerative clustering methods for document retrieval. The topics discussed include the calculation of interdocument similarities, algorithms used to implement clustering methods on large databases, validity testing of document hierarchies, appropriate search strategies, and other applications…
Identification of chronic rhinosinusitis phenotypes using cluster analysis
Soler, Zachary M.; Hyer, J. Madison; Ramakrishnan, Viswanathan; Smith, Timothy L.; Mace, Jess; Rudmik, Luke; Schlosser, Rodney J.
2015-01-01
Introduction Current clinical classifications of chronic rhinosinusitis (CRS) have been largely defined based upon preconceived notions of factors thought to be important, such as polyp or eosinophil status. Unfortunately, these classification systems have little correlation with symptom severity or treatment outcomes. Unsupervised clustering can be used to identify phenotypic subgroups of CRS patients, describe clinical differences in these clusters and define simple algorithms for classification. Methods A multi-institutional, prospective study of 382 patients with CRS who had failed initial medical therapy completed the SinoNasal Outcome Test (SNOT-22), Rhinosinusitis Disability Index (RSDI), Short Form-12 (SF-12), Pittsburgh Sleep Quality Index (PSQI), and Patient Health Questionnaire (PHQ-2). Objective measures of CRS severity included Brief Smell Identification Test (B-SIT), CT and endoscopy scoring. All variables were reduced and unsupervised hierarchical clustering was performed. After clusters were defined, variations in medication usage were analyzed. Discriminant analysis was performed to develop a simplified, clinically useful algorithm for clustering. Results Clustering was largely determined by age, severity of patient reported outcome measures, depression and fibromyalgia. CT and endoscopy varied somewhat among clusters. Traditional clinical measures including polyp/atopic status, prior surgery, B-SIT and asthma did not vary among clusters. A simplified algorithm based upon productivity loss, SNOT-22 score and age predicted clustering with 89% accuracy. Medication usage among clusters did vary significantly. Discussion A simplified algorithm based upon hierarchical clustering is able to classify CRS patients and predict medication usage. Further studies are warranted to determine if such clustering predicts treatment outcomes. PMID:25694390
The Swift AGN and Cluster Survey. II. Cluster Confirmation with SDSS Data
NASA Astrophysics Data System (ADS)
Griffin, Rhiannon D.; Dai, Xinyu; Kochanek, Christopher S.; Bregman, Joel N.
2016-01-01
We study 203 (of 442) Swift AGN and Cluster Survey extended X-ray sources located in the SDSS DR8 footprint to search for galaxy over-densities in three-dimensional space using SDSS galaxy photometric redshifts and positions near the Swift cluster candidates. We find 104 Swift clusters with a >3σ galaxy over-density. The remaining targets are potentially located at higher redshifts and require deeper optical follow-up observations for confirmation as galaxy clusters. We present a series of cluster properties including the redshift, brightest cluster galaxy (BCG) magnitude, BCG-to-X-ray center offset, optical richness, and X-ray luminosity. We also detect red sequences in ~85% of the 104 confirmed clusters. The X-ray luminosity and optical richness for the SDSS confirmed Swift clusters are correlated and follow previously established relations. The distribution of the separations between the X-ray centroids and the most likely BCG is also consistent with expectation. We compare the observed redshift distribution of the sample with a theoretical model, and find that our sample is complete for z ≲ 0.3 and is still 80% complete up to z ≃ 0.4, consistent with the SDSS survey depth. These analysis results suggest that our Swift cluster selection algorithm has yielded a statistically well-defined cluster sample for further study of cluster evolution and cosmology. We also match our SDSS confirmed Swift clusters to existing cluster catalogs, and find 42, 23, and 1 matches in optical, X-ray, and Sunyaev-Zel’dovich catalogs, respectively, and so the majority of these clusters are new detections.
NASA Astrophysics Data System (ADS)
Lange, Oliver; Meyer-Baese, Anke; Wismuller, Axel; Hurdal, Monica
2005-03-01
We employ unsupervised clustering techniques for the analysis of dynamic contrast-enhanced perfusion MRI time-series in patients with and without stroke. "Neural gas" network, fuzzy clustering based on deterministic annealing, self-organizing maps, and fuzzy c-means clustering enable self-organized data-driven segmentation with respect to fine-grained differences of signal amplitude and dynamics, thus identifying asymmetries and local abnormalities of brain perfusion. We conclude that clustering is a useful extension to conventional perfusion parameter maps.
The Voronoi Tessellation Cluster Finder in 2+1 Dimensions
Soares-Santos, Marcelle; de Carvalho, Reinaldo R.; Annis, James; Gal, Roy R.; La Barbera, Francesco; Lopes, Paulo A.A.; Wechsler, Risa H.; Busha, Michael T.; Gerke, Brian F.; /SLAC /KIPAC, Menlo Park
2011-06-23
We present a detailed description of the Voronoi Tessellation (VT) cluster finder algorithm in 2+1 dimensions, which improves on past implementations of this technique. The need for cluster finder algorithms able to produce reliable cluster catalogs up to redshift 1 or beyond and down to 10^13.5 solar masses is paramount especially in light of upcoming surveys aiming at cosmological constraints from galaxy cluster number counts. We build the VT in photometric redshift shells and use the two-point correlation function of the galaxies in the field to both determine the density threshold for detection of cluster candidates and to establish their significance. This allows us to detect clusters in a self-consistent way without any assumptions about their astrophysical properties. We apply the VT to mock catalogs which extend to redshift 1.4 reproducing the ΛCDM cosmology and the clustering properties observed in the Sloan Digital Sky Survey data. An objective estimate of the cluster selection function in terms of the completeness and purity as a function of mass and redshift is as important as having a reliable cluster finder. We measure these quantities by matching the VT cluster catalog with the mock truth table. We show that the VT can produce a cluster catalog with completeness and purity >80% for the redshift range up to ~1 and mass range down to ~10^13.5 solar masses.
Semi-Supervised Kernel Mean Shift Clustering.
Anand, Saket; Mittal, Sushil; Tuzel, Oncel; Meer, Peter
2014-06-01
Mean shift clustering is a powerful nonparametric technique that does not require prior knowledge of the number of clusters and does not constrain the shape of the clusters. However, being completely unsupervised, its performance suffers when the original distance metric fails to capture the underlying cluster structure. Despite recent advances in semi-supervised clustering methods, there has been little effort towards incorporating supervision into mean shift. We propose a semi-supervised framework for kernel mean shift clustering (SKMS) that uses only pairwise constraints to guide the clustering procedure. The points are first mapped to a high-dimensional kernel space where the constraints are imposed by a linear transformation of the mapped points. This is achieved by modifying the initial kernel matrix by minimizing a log det divergence-based objective function. We show the advantages of SKMS by evaluating its performance on various synthetic and real datasets while comparing with state-of-the-art semi-supervised clustering algorithms. PMID:26353281
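The basic (unsupervised) mean shift update that this work builds on can be sketched in a few lines. The example below is plain one-dimensional mean shift with a Gaussian kernel; it does not include the kernel-space mapping or the pairwise constraints of SKMS. The function name `mean_shift_mode`, the bandwidth value, and the toy data are illustrative assumptions.

```python
import math


def mean_shift_mode(x, data, bandwidth, tol=1e-9, max_iter=500):
    """Follow the mean shift iteration from x to a mode of the kernel density."""
    for _ in range(max_iter):
        # Gaussian kernel weight of every data point relative to the current x.
        weights = [math.exp(-0.5 * ((x - d) / bandwidth) ** 2) for d in data]
        # Shift x to the kernel-weighted mean of the data.
        new_x = sum(w * d for w, d in zip(weights, data)) / sum(weights)
        if abs(new_x - x) < tol:
            return new_x
        x = new_x
    return x


# Hypothetical data with two groups; no number of clusters is specified.
data = [0.8, 1.0, 1.2, 4.8, 5.0, 5.2]
modes = [mean_shift_mode(d, data, bandwidth=0.5) for d in data]
# Points whose iterations end at (numerically) the same mode share a cluster.
labels = [round(m) for m in modes]
```

The number of clusters emerges from how many distinct modes the iterations reach, which is why only the bandwidth, and not a cluster count, has to be chosen.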
Visual verification and analysis of cluster detection for molecular dynamics.
Grottel, Sebastian; Reina, Guido; Vrabec, Jadran; Ertl, Thomas
2007-01-01
A current research topic in molecular thermodynamics is the condensation of vapor to liquid and the investigation of this process at the molecular level. Condensation is found in many physical phenomena, e.g. the formation of atmospheric clouds or the processes inside steam turbines, where a detailed knowledge of the dynamics of condensation processes will help to optimize energy efficiency and avoid problems with droplets of macroscopic size. The key properties of these processes are the nucleation rate and the critical cluster size. For the calculation of these properties it is essential to make use of a meaningful definition of molecular clusters, which currently is a not completely resolved issue. In this paper a framework capable of interactively visualizing molecular datasets of such nucleation simulations is presented, with an emphasis on the detected molecular clusters. To check the quality of the results of the cluster detection, our framework introduces the concept of flow groups to highlight potential cluster evolution over time which is not detected by the employed algorithm. To confirm the findings of the visual analysis, we coupled the rendering view with a schematic view of the clusters' evolution. This allows to rapidly assess the quality of the molecular cluster detection algorithm and to identify locations in the simulation data in space as well as in time where the cluster detection fails. Thus, thermodynamics researchers can eliminate weaknesses in their cluster detection algorithms. Several examples for the effective and efficient usage of our tool are presented. PMID:17968118
Localized Ambient Solidity Separation Algorithm Based Computer User Segmentation
Sun, Xiao; Zhang, Tongda; Chai, Yueting; Liu, Yi
2015-01-01
Most popular clustering methods make strong assumptions about the dataset. For example, k-means implicitly assumes that all clusters come from spherical Gaussian distributions which have different means but the same covariance. However, when dealing with datasets that have diverse distribution shapes or high dimensionality, these assumptions might not be valid anymore. To overcome this weakness, we propose a new clustering algorithm named the localized ambient solidity separation (LASS) algorithm, which uses a new isolation criterion called centroid distance. Compared with other density-based isolation criteria, our proposed centroid distance isolation criterion addresses the problem caused by high dimensionality and varying density. An experiment on a designed two-dimensional benchmark dataset shows that our proposed LASS algorithm not only inherits the advantage of the original dissimilarity increments clustering method in separating naturally isolated clusters but can also identify clusters that are adjacent, overlapping, or under background noise. Finally, we compared our LASS algorithm with the dissimilarity increments clustering method on a massive computer user dataset with over two million records that contains demographic and behavioral information. The results show that the LASS algorithm works extremely well on this computer user dataset and can gain more knowledge from it. PMID:26221133
Wu, Chuanli; Gao, Yuexia; Hua, Tianqi; Xu, Chenwu
2016-01-01
Background It is challenging to deal with mixture models when missing values occur in clustering datasets. Methods and Results We propose a dynamic clustering algorithm based on a multivariate Gaussian mixture model that efficiently imputes missing values to generate a “pseudo-complete” dataset. Parameters from different clusters and missing values are estimated according to the maximum likelihood implemented with an expectation-maximization algorithm, and multivariate individuals are clustered with Bayesian posterior probability. A simulation showed that our proposed method has a fast convergence speed and it accurately estimates missing values. Our proposed algorithm was further validated with Fisher’s Iris dataset, the Yeast Cell-cycle Gene-expression dataset, and the CIFAR-10 images dataset. The results indicate that our algorithm offers highly accurate clustering, comparable to that using a complete dataset without missing values. Furthermore, our algorithm resulted in a lower misjudgment rate than both clustering algorithms with missing data deleted and with missing-value imputation by mean replacement. Conclusion We demonstrate that our missing-value imputation clustering algorithm is feasible and superior to both of these other clustering algorithms in certain situations. PMID:27552203
Segmentation of clustered nuclei based on concave curve expansion.
Zhang, C; Sun, C; Pham, T D
2013-07-01
Segmentation of nuclei from images of tissue sections is important for many biological and biomedical studies. Many existing image segmentation algorithms may lead to oversegmentation or undersegmentation for clustered nuclei images. In this paper, we proposed a new image segmentation algorithm based on concave curve expansion to correctly and accurately extract markers from the original images. Marker-controlled watershed is then used to segment the clustered nuclei. The algorithm was tested on both synthetic and real images and better results are achieved compared with some other state-of-the-art methods.
Cloud Computing Application for Hotspot Clustering Using Recursive Density Based Clustering (RDBC)
NASA Astrophysics Data System (ADS)
Santoso, Aries; Khiyarin Nisa, Karlina
2016-01-01
Indonesia has vast areas of tropical forest, but these are often burned, causing extensive damage to property and human life. Hotspot monitoring can be one component of forest fire management. Each hotspot is recorded in a dataset so that it can be processed and analyzed. This research aims to build a cloud computing application that visualizes hotspot clustering. The application uses the R programming language with the Shiny web framework and implements the Recursive Density Based Clustering (RDBC) algorithm. Clustering is performed on hotspot datasets of Kalimantan Island and South Sumatra Province to find the spread pattern of hotspots. The clustering results are evaluated using the silhouette coefficient (SC), which yields a best value of 0.3220798 for the Kalimantan dataset. Clustering patterns are displayed as web pages so that they can be widely accessed and serve as a reference for fire occurrence prediction.
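To show how a silhouette coefficient such as the one reported above is computed, here is a minimal sketch for one-dimensional points. The paper's application is written in R; Python is used here only for illustration, and the function name `silhouette`, the absolute-difference distance, and the toy data are assumptions.

```python
def silhouette(points, labels):
    """Mean silhouette coefficient for 1-D points; needs at least two clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(abs(p - q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(abs(p - q) for q in other) / len(other)
            for key, other in clusters.items()
            if key != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical well-separated clusters, so the score should be close to 1.
score = silhouette([0.0, 1.0, 10.0, 11.0], [0, 0, 1, 1])
```

Values near 1 indicate compact, well-separated clusters, values near 0 indicate overlapping clusters, and negative values suggest misassigned points.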
Model-based clustered-dot screening
NASA Astrophysics Data System (ADS)
Kim, Sang Ho
2006-01-01
I propose a halftone screen design method based on a human visual system model and the characteristics of the electro-photographic (EP) printer engine. Generally, screen design methods based on human visual models produce dispersed-dot type screens while design methods considering EP printer characteristics generate clustered-dot type screens. In this paper, I propose a cost function balancing the conflicting characteristics of the human visual system and the printer. By minimizing the obtained cost function, I design a model-based clustered-dot screen using a modified direct binary search algorithm. Experimental results demonstrate the superior quality of the model-based clustered-dot screen compared to a conventional clustered-dot screen.
Semiparametric binary model for clustered survival data
NASA Astrophysics Data System (ADS)
Arlin, Rifina; Ibrahim, Noor Akma; Arasan, Jayanthi; Bakar, Rizam Abu
2015-10-01
This paper considers a method to analyze semiparametric binary models for clustered survival data when the responses are correlated. We extend the parametric generalized estimating equation (GEE) to a semiparametric GEE by introducing a smoothing spline into the model. A backfitting algorithm is used in the derivation of the estimating equation for the parametric and nonparametric components of a semiparametric binary covariate model. The properties of the estimates for both are evaluated using simulation studies. We investigated the effects of the strength of cluster correlation and censoring rates on the properties of the parameter estimates. The effects of the number of clusters and cluster size are also discussed. Results show that the GEE-SS estimates are consistent and efficient for both the parametric and nonparametric components of semiparametric binary covariates.