distributed clustering algorithm: Topics by Science.gov

Sample records for distributed clustering algorithm

Fuzzy-Logic Based Distributed Energy-Efficient Clustering Algorithm for Wireless Sensor Networks.

PubMed

Zhang, Ying; Wang, Jun; Han, Dezhi; Wu, Huafeng; Zhou, Rundong

2017-07-03

Due to the high-energy efficiency and scalability, the clustering routing algorithm has been widely used in wireless sensor networks (WSNs). In order to gather information more efficiently, each sensor node transmits data to its Cluster Head (CH) to which it belongs, by multi-hop communication. However, the multi-hop communication in the cluster brings the problem of excessive energy consumption of the relay nodes which are closer to the CH. These nodes' energy will be consumed more quickly than the farther nodes, which brings the negative influence on load balance for the whole networks. Therefore, we propose an energy-efficient distributed clustering algorithm based on fuzzy approach with non-uniform distribution (EEDCF). During CHs' election, we take nodes' energies, nodes' degree and neighbor nodes' residual energies into consideration as the input parameters. In addition, we take advantage of Takagi, Sugeno and Kang (TSK) fuzzy model instead of traditional method as our inference system to guarantee the quantitative analysis more reasonable. In our scheme, each sensor node calculates the probability of being as CH with the help of fuzzy inference system in a distributed way. The experimental results indicate EEDCF algorithm is better than some current representative methods in aspects of data transmission, energy consumption and lifetime of networks.
Multimodal Estimation of Distribution Algorithms.

PubMed

Yang, Qiang; Chen, Wei-Neng; Li, Yun; Chen, C L Philip; Xu, Xiang-Min; Zhang, Jun

2016-02-15

Taking the advantage of estimation of distribution algorithms (EDAs) in preserving high diversity, this paper proposes a multimodal EDA. Integrated with clustering strategies for crowding and speciation, two versions of this algorithm are developed, which operate at the niche level. Then these two algorithms are equipped with three distinctive techniques: 1) a dynamic cluster sizing strategy; 2) an alternative utilization of Gaussian and Cauchy distributions to generate offspring; and 3) an adaptive local search. The dynamic cluster sizing affords a potential balance between exploration and exploitation and reduces the sensitivity to the cluster size in the niching methods. Taking advantages of Gaussian and Cauchy distributions, we generate the offspring at the niche level through alternatively using these two distributions. Such utilization can also potentially offer a balance between exploration and exploitation. Further, solution accuracy is enhanced through a new local search scheme probabilistically conducted around seeds of niches with probabilities determined self-adaptively according to fitness values of these seeds. Extensive experiments conducted on 20 benchmark multimodal problems confirm that both algorithms can achieve competitive performance compared with several state-of-the-art multimodal algorithms, which is supported by nonparametric tests. Especially, the proposed algorithms are very promising for complex problems with many local optima.
Research on retailer data clustering algorithm based on Spark

NASA Astrophysics Data System (ADS)

Huang, Qiuman; Zhou, Feng

2017-03-01

Big data analysis is a hot topic in the IT field now. Spark is a high-reliability and high-performance distributed parallel computing framework for big data sets. K-means algorithm is one of the classical partition methods in clustering algorithm. In this paper, we study the k-means clustering algorithm on Spark. Firstly, the principle of the algorithm is analyzed, and then the clustering analysis is carried out on the supermarket customers through the experiment to find out the different shopping patterns. At the same time, this paper proposes the parallelization of k-means algorithm and the distributed computing framework of Spark, and gives the concrete design scheme and implementation scheme. This paper uses the two-year sales data of a supermarket to validate the proposed clustering algorithm and achieve the goal of subdividing customers, and then analyze the clustering results to help enterprises to take different marketing strategies for different customer groups to improve sales performance.
Energy Aware Clustering Algorithms for Wireless Sensor Networks

NASA Astrophysics Data System (ADS)

Rakhshan, Noushin; Rafsanjani, Marjan Kuchaki; Liu, Chenglian

2011-09-01

The sensor nodes deployed in wireless sensor networks (WSNs) are extremely power constrained, so maximizing the lifetime of the entire networks is mainly considered in the design. In wireless sensor networks, hierarchical network structures have the advantage of providing scalable and energy efficient solutions. In this paper, we investigate different clustering algorithms for WSNs and also compare these clustering algorithms based on metrics such as clustering distribution, cluster's load balancing, Cluster Head's (CH) selection strategy, CH's role rotation, node mobility, clusters overlapping, intra-cluster communications, reliability, security and location awareness.
Cluster compression algorithm: A joint clustering/data compression concept

NASA Technical Reports Server (NTRS)

Hilbert, E. E.

1977-01-01

The Cluster Compression Algorithm (CCA), which was developed to reduce costs associated with transmitting, storing, distributing, and interpreting LANDSAT multispectral image data is described. The CCA is a preprocessing algorithm that uses feature extraction and data compression to more efficiently represent the information in the image data. The format of the preprocessed data enables simply a look-up table decoding and direct use of the extracted features to reduce user computation for either image reconstruction, or computer interpretation of the image data. Basically, the CCA uses spatially local clustering to extract features from the image data to describe spectral characteristics of the data set. In addition, the features may be used to form a sequence of scalar numbers that define each picture element in terms of the cluster features. This sequence, called the feature map, is then efficiently represented by using source encoding concepts. Various forms of the CCA are defined and experimental results are presented to show trade-offs and characteristics of the various implementations. Examples are provided that demonstrate the application of the cluster compression concept to multi-spectral images from LANDSAT and other sources.
Parallel Clustering Algorithm for Large-Scale Biological Data Sets

PubMed Central

Wang, Minchao; Zhang, Wu; Ding, Wang; Dai, Dongbo; Zhang, Huiran; Xie, Hao; Chen, Luonan; Guo, Yike; Xie, Jiang

2014-01-01

Backgrounds Recent explosion of biological data brings a great challenge for the traditional clustering algorithms. With increasing scale of data sets, much larger memory and longer runtime are required for the cluster identification problems. The affinity propagation algorithm outperforms many other classical clustering algorithms and is widely applied into the biological researches. However, the time and space complexity become a great bottleneck when handling the large-scale data sets. Moreover, the similarity matrix, whose constructing procedure takes long runtime, is required before running the affinity propagation algorithm, since the algorithm clusters data sets based on the similarities between data pairs. Methods Two types of parallel architectures are proposed in this paper to accelerate the similarity matrix constructing procedure and the affinity propagation algorithm. The memory-shared architecture is used to construct the similarity matrix, and the distributed system is taken for the affinity propagation algorithm, because of its large memory size and great computing capacity. An appropriate way of data partition and reduction is designed in our method, in order to minimize the global communication cost among processes. Result A speedup of 100 is gained with 128 cores. The runtime is reduced from serval hours to a few seconds, which indicates that parallel algorithm is capable of handling large-scale data sets effectively. The parallel affinity propagation also achieves a good performance when clustering large-scale gene data (microarray) and detecting families in large protein superfamilies. PMID:24705246
Load Balancing in Distributed Web Caching: A Novel Clustering Approach

NASA Astrophysics Data System (ADS)

Tiwari, R.; Kumar, K.; Khan, G.

2010-11-01

The World Wide Web suffers from scaling and reliability problems due to overloaded and congested proxy servers. Caching at local proxy servers helps, but cannot satisfy more than a third to half of requests; more requests are still sent to original remote origin servers. In this paper we have developed an algorithm for Distributed Web Cache, which incorporates cooperation among proxy servers of one cluster. This algorithm uses Distributed Web Cache concepts along with static hierarchies with geographical based clusters of level one proxy server with dynamic mechanism of proxy server during the congestion of one cluster. Congestion and scalability problems are being dealt by clustering concept used in our approach. This results in higher hit ratio of caches, with lesser latency delay for requested pages. This algorithm also guarantees data consistency between the original server objects and the proxy cache objects.
Noise-enhanced clustering and competitive learning algorithms.

PubMed

Osoba, Osonde; Kosko, Bart

2013-01-01

Noise can provably speed up convergence in many centroid-based clustering algorithms. This includes the popular k-means clustering algorithm. The clustering noise benefit follows from the general noise benefit for the expectation-maximization algorithm because many clustering algorithms are special cases of the expectation-maximization algorithm. Simulations show that noise also speeds up convergence in stochastic unsupervised competitive learning, supervised competitive learning, and differential competitive learning. Copyright © 2012 Elsevier Ltd. All rights reserved.
Multiscale mutation clustering algorithm identifies pan-cancer mutational clusters associated with pathway-level changes in gene expression

PubMed Central

Poole, William; Leinonen, Kalle; Shmulevich, Ilya

2017-01-01

Cancer researchers have long recognized that somatic mutations are not uniformly distributed within genes. However, most approaches for identifying cancer mutations focus on either the entire-gene or single amino-acid level. We have bridged these two methodologies with a multiscale mutation clustering algorithm that identifies variable length mutation clusters in cancer genes. We ran our algorithm on 539 genes using the combined mutation data in 23 cancer types from The Cancer Genome Atlas (TCGA) and identified 1295 mutation clusters. The resulting mutation clusters cover a wide range of scales and often overlap with many kinds of protein features including structured domains, phosphorylation sites, and known single nucleotide variants. We statistically associated these multiscale clusters with gene expression and drug response data to illuminate the functional and clinical consequences of mutations in our clusters. Interestingly, we find multiple clusters within individual genes that have differential functional associations: these include PTEN, FUBP1, and CDH1. This methodology has potential implications in identifying protein regions for drug targets, understanding the biological underpinnings of cancer, and personalizing cancer treatments. Toward this end, we have made the mutation clusters and the clustering algorithm available to the public. Clusters and pathway associations can be interactively browsed at m2c.systemsbiology.net. The multiscale mutation clustering algorithm is available at https://github.com/IlyaLab/M2C. PMID:28170390
Multiscale mutation clustering algorithm identifies pan-cancer mutational clusters associated with pathway-level changes in gene expression.

PubMed

Poole, William; Leinonen, Kalle; Shmulevich, Ilya; Knijnenburg, Theo A; Bernard, Brady

2017-02-01

Cancer researchers have long recognized that somatic mutations are not uniformly distributed within genes. However, most approaches for identifying cancer mutations focus on either the entire-gene or single amino-acid level. We have bridged these two methodologies with a multiscale mutation clustering algorithm that identifies variable length mutation clusters in cancer genes. We ran our algorithm on 539 genes using the combined mutation data in 23 cancer types from The Cancer Genome Atlas (TCGA) and identified 1295 mutation clusters. The resulting mutation clusters cover a wide range of scales and often overlap with many kinds of protein features including structured domains, phosphorylation sites, and known single nucleotide variants. We statistically associated these multiscale clusters with gene expression and drug response data to illuminate the functional and clinical consequences of mutations in our clusters. Interestingly, we find multiple clusters within individual genes that have differential functional associations: these include PTEN, FUBP1, and CDH1. This methodology has potential implications in identifying protein regions for drug targets, understanding the biological underpinnings of cancer, and personalizing cancer treatments. Toward this end, we have made the mutation clusters and the clustering algorithm available to the public. Clusters and pathway associations can be interactively browsed at m2c.systemsbiology.net. The multiscale mutation clustering algorithm is available at https://github.com/IlyaLab/M2C.
Robust MST-Based Clustering Algorithm.

PubMed

Liu, Qidong; Zhang, Ruisheng; Zhao, Zhili; Wang, Zhenghai; Jiao, Mengyao; Wang, Guangjing

2018-06-01

Minimax similarity stresses the connectedness of points via mediating elements rather than favoring high mutual similarity. The grouping principle yields superior clustering results when mining arbitrarily-shaped clusters in data. However, it is not robust against noises and outliers in the data. There are two main problems with the grouping principle: first, a single object that is far away from all other objects defines a separate cluster, and second, two connected clusters would be regarded as two parts of one cluster. In order to solve such problems, we propose robust minimum spanning tree (MST)-based clustering algorithm in this letter. First, we separate the connected objects by applying a density-based coarsening phase, resulting in a low-rank matrix in which the element denotes the supernode by combining a set of nodes. Then a greedy method is presented to partition those supernodes through working on the low-rank matrix. Instead of removing the longest edges from MST, our algorithm groups the data set based on the minimax similarity. Finally, the assignment of all data points can be achieved through their corresponding supernodes. Experimental results on many synthetic and real-world data sets show that our algorithm consistently outperforms compared clustering algorithms.
A highly efficient multi-core algorithm for clustering extremely large datasets

PubMed Central

2010-01-01

Background In recent years, the demand for computational power in computational biology has increased due to rapidly growing data sets from microarray and other high-throughput technologies. This demand is likely to increase. Standard algorithms for analyzing data, such as cluster algorithms, need to be parallelized for fast processing. Unfortunately, most approaches for parallelizing algorithms largely rely on network communication protocols connecting and requiring multiple computers. One answer to this problem is to utilize the intrinsic capabilities in current multi-core hardware to distribute the tasks among the different cores of one computer. Results We introduce a multi-core parallelization of the k-means and k-modes cluster algorithms based on the design principles of transactional memory for clustering gene expression microarray type data and categorial SNP data. Our new shared memory parallel algorithms show to be highly efficient. We demonstrate their computational power and show their utility in cluster stability and sensitivity analysis employing repeated runs with slightly changed parameters. Computation speed of our Java based algorithm was increased by a factor of 10 for large data sets while preserving computational accuracy compared to single-core implementations and a recently published network based parallelization. Conclusions Most desktop computers and even notebooks provide at least dual-core processors. Our multi-core algorithms show that using modern algorithmic concepts, parallelization makes it possible to perform even such laborious tasks as cluster sensitivity and cluster number estimation on the laboratory computer. PMID:20370922
Basic firefly algorithm for document clustering

NASA Astrophysics Data System (ADS)

Mohammed, Athraa Jasim; Yusof, Yuhanis; Husni, Husniza

2015-12-01

The Document clustering plays significant role in Information Retrieval (IR) where it organizes documents prior to the retrieval process. To date, various clustering algorithms have been proposed and this includes the K-means and Particle Swarm Optimization. Even though these algorithms have been widely applied in many disciplines due to its simplicity, such an approach tends to be trapped in a local minimum during its search for an optimal solution. To address the shortcoming, this paper proposes a Basic Firefly (Basic FA) algorithm to cluster text documents. The algorithm employs the Average Distance to Document Centroid (ADDC) as the objective function of the search. Experiments utilizing the proposed algorithm were conducted on the 20Newsgroups benchmark dataset. Results demonstrate that the Basic FA generates a more robust and compact clusters than the ones produced by K-means and Particle Swarm Optimization (PSO).
A Novel Energy-Aware Distributed Clustering Algorithm for Heterogeneous Wireless Sensor Networks in the Mobile Environment

PubMed Central

Gao, Ying; Wkram, Chris Hadri; Duan, Jiajie; Chou, Jarong

2015-01-01

In order to prolong the network lifetime, energy-efficient protocols adapted to the features of wireless sensor networks should be used. This paper explores in depth the nature of heterogeneous wireless sensor networks, and finally proposes an algorithm to address the problem of finding an effective pathway for heterogeneous clustering energy. The proposed algorithm implements cluster head selection according to the degree of energy attenuation during the network’s running and the degree of candidate nodes’ effective coverage on the whole network, so as to obtain an even energy consumption over the whole network for the situation with high degree of coverage. Simulation results show that the proposed clustering protocol has better adaptability to heterogeneous environments than existing clustering algorithms in prolonging the network lifetime. PMID:26690440
Improved Ant Colony Clustering Algorithm and Its Performance Study

PubMed Central

Gao, Wei

2016-01-01

Clustering analysis is used in many disciplines and applications; it is an important tool that descriptively identifies homogeneous groups of objects based on attribute values. The ant colony clustering algorithm is a swarm-intelligent method used for clustering problems that is inspired by the behavior of ant colonies that cluster their corpses and sort their larvae. A new abstraction ant colony clustering algorithm using a data combination mechanism is proposed to improve the computational efficiency and accuracy of the ant colony clustering algorithm. The abstraction ant colony clustering algorithm is used to cluster benchmark problems, and its performance is compared with the ant colony clustering algorithm and other methods used in existing literature. Based on similar computational difficulties and complexities, the results show that the abstraction ant colony clustering algorithm produces results that are not only more accurate but also more efficiently determined than the ant colony clustering algorithm and the other methods. Thus, the abstraction ant colony clustering algorithm can be used for efficient multivariate data clustering. PMID:26839533
Dynamical Mass Measurements of Contaminated Galaxy Clusters Using Support Distribution Machines

NASA Astrophysics Data System (ADS)

Ntampaka, Michelle; Trac, Hy; Sutherland, Dougal; Fromenteau, Sebastien; Poczos, Barnabas; Schneider, Jeff

2018-01-01

We study dynamical mass measurements of galaxy clusters contaminated by interlopers and show that a modern machine learning (ML) algorithm can predict masses by better than a factor of two compared to a standard scaling relation approach. We create two mock catalogs from Multidark’s publicly available N-body MDPL1 simulation, one with perfect galaxy cluster membership infor- mation and the other where a simple cylindrical cut around the cluster center allows interlopers to contaminate the clusters. In the standard approach, we use a power-law scaling relation to infer cluster mass from galaxy line-of-sight (LOS) velocity dispersion. Assuming perfect membership knowledge, this unrealistic case produces a wide fractional mass error distribution, with a width E=0.87. Interlopers introduce additional scatter, significantly widening the error distribution further (E=2.13). We employ the support distribution machine (SDM) class of algorithms to learn from distributions of data to predict single values. Applied to distributions of galaxy observables such as LOS velocity and projected distance from the cluster center, SDM yields better than a factor-of-two improvement (E=0.67) for the contaminated case. Remarkably, SDM applied to contaminated clusters is better able to recover masses than even the scaling relation approach applied to uncon- taminated clusters. We show that the SDM method more accurately reproduces the cluster mass function, making it a valuable tool for employing cluster observations to evaluate cosmological models.
Overlapping clusters for distributed computation.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Mirrokni, Vahab; Andersen, Reid; Gleich, David F.

2010-11-01

Scalable, distributed algorithms must address communication problems. We investigate overlapping clusters, or vertex partitions that intersect, for graph computations. This setup stores more of the graph than required but then affords the ease of implementation of vertex partitioned algorithms. Our hope is that this technique allows us to reduce communication in a computation on a distributed graph. The motivation above draws on recent work in communication avoiding algorithms. Mohiyuddin et al. (SC09) design a matrix-powers kernel that gives rise to an overlapping partition. Fritzsche et al. (CSC2009) develop an overlapping clustering for a Schwarz method. Both techniques extend an initialmore » partitioning with overlap. Our procedure generates overlap directly. Indeed, Schwarz methods are commonly used to capitalize on overlap. Elsewhere, overlapping communities (Ahn et al, Nature 2009; Mishra et al. WAW2007) are now a popular model of structure in social networks. These have long been studied in statistics (Cole and Wishart, CompJ 1970). We present two types of results: (i) an estimated swapping probability {rho}{infinity}; and (ii) the communication volume of a parallel PageRank solution (link-following {alpha} = 0.85) using an additive Schwarz method. The volume ratio is the amount of extra storage for the overlap (2 means we store the graph twice). Below, as the ratio increases, the swapping probability and PageRank communication volume decreases.« less
GDPC: Gravitation-based Density Peaks Clustering algorithm

NASA Astrophysics Data System (ADS)

Jiang, Jianhua; Hao, Dehao; Chen, Yujun; Parmar, Milan; Li, Keqin

2018-07-01

The Density Peaks Clustering algorithm, which we refer to as DPC, is a novel and efficient density-based clustering approach, and it is published in Science in 2014. The DPC has advantages of discovering clusters with varying sizes and varying densities, but has some limitations of detecting the number of clusters and identifying anomalies. We develop an enhanced algorithm with an alternative decision graph based on gravitation theory and nearby distance to identify centroids and anomalies accurately. We apply our method to some UCI and synthetic data sets. We report comparative clustering performances using F-Measure and 2-dimensional vision. We also compare our method to other clustering algorithms, such as K-Means, Affinity Propagation (AP) and DPC. We present F-Measure scores and clustering accuracies of our GDPC algorithm compared to K-Means, AP and DPC on different data sets. We show that the GDPC has the superior performance in its capability of: (1) detecting the number of clusters obviously; (2) aggregating clusters with varying sizes, varying densities efficiently; (3) identifying anomalies accurately.
Reconstruction of a digital core containing clay minerals based on a clustering algorithm.

PubMed

He, Yanlong; Pu, Chunsheng; Jing, Cheng; Gu, Xiaoyu; Chen, Qingdong; Liu, Hongzhi; Khan, Nasir; Dong, Qiaoling

2017-10-01

It is difficult to obtain a core sample and information for digital core reconstruction of mature sandstone reservoirs around the world, especially for an unconsolidated sandstone reservoir. Meanwhile, reconstruction and division of clay minerals play a vital role in the reconstruction of the digital cores, although the two-dimensional data-based reconstruction methods are specifically applicable as the microstructure reservoir simulation methods for the sandstone reservoir. However, reconstruction of clay minerals is still challenging from a research viewpoint for the better reconstruction of various clay minerals in the digital cores. In the present work, the content of clay minerals was considered on the basis of two-dimensional information about the reservoir. After application of the hybrid method, and compared with the model reconstructed by the process-based method, the digital core containing clay clusters without the labels of the clusters' number, size, and texture were the output. The statistics and geometry of the reconstruction model were similar to the reference model. In addition, the Hoshen-Kopelman algorithm was used to label various connected unclassified clay clusters in the initial model and then the number and size of clay clusters were recorded. At the same time, the K-means clustering algorithm was applied to divide the labeled, large connecting clusters into smaller clusters on the basis of difference in the clusters' characteristics. According to the clay minerals' characteristics, such as types, textures, and distributions, the digital core containing clay minerals was reconstructed by means of the clustering algorithm and the clay clusters' structure judgment. The distributions and textures of the clay minerals of the digital core were reasonable. The clustering algorithm improved the digital core reconstruction and provided an alternative method for the simulation of different clay minerals in the digital cores.
Distributed k-Means Algorithm and Fuzzy c-Means Algorithm for Sensor Networks Based on Multiagent Consensus Theory.

PubMed

Qin, Jiahu; Fu, Weiming; Gao, Huijun; Zheng, Wei Xing

2016-03-03

This paper is concerned with developing a distributed k-means algorithm and a distributed fuzzy c-means algorithm for wireless sensor networks (WSNs) where each node is equipped with sensors. The underlying topology of the WSN is supposed to be strongly connected. The consensus algorithm in multiagent consensus theory is utilized to exchange the measurement information of the sensors in WSN. To obtain a faster convergence speed as well as a higher possibility of having the global optimum, a distributed k-means++ algorithm is first proposed to find the initial centroids before executing the distributed k-means algorithm and the distributed fuzzy c-means algorithm. The proposed distributed k-means algorithm is capable of partitioning the data observed by the nodes into measure-dependent groups which have small in-group and large out-group distances, while the proposed distributed fuzzy c-means algorithm is capable of partitioning the data observed by the nodes into different measure-dependent groups with degrees of membership values ranging from 0 to 1. Simulation results show that the proposed distributed algorithms can achieve almost the same results as that given by the centralized clustering algorithms.

A hybrid monkey search algorithm for clustering analysis.

PubMed

Chen, Xin; Zhou, Yongquan; Luo, Qifang

2014-01-01

Clustering is a popular data analysis and data mining technique. The k-means clustering algorithm is one of the most commonly used methods. However, it highly depends on the initial solution and is easy to fall into local optimum solution. In view of the disadvantages of the k-means method, this paper proposed a hybrid monkey algorithm based on search operator of artificial bee colony algorithm for clustering analysis and experiment on synthetic and real life datasets to show that the algorithm has a good performance than that of the basic monkey algorithm for clustering analysis.
Application of hybrid clustering using parallel k-means algorithm and DIANA algorithm

NASA Astrophysics Data System (ADS)

Umam, Khoirul; Bustamam, Alhadi; Lestari, Dian

2017-03-01

DNA is one of the carrier of genetic information of living organisms. Encoding, sequencing, and clustering DNA sequences has become the key jobs and routine in the world of molecular biology, in particular on bioinformatics application. There are two type of clustering, hierarchical clustering and partitioning clustering. In this paper, we combined two type clustering i.e. K-Means (partitioning clustering) and DIANA (hierarchical clustering), therefore it called Hybrid clustering. Application of hybrid clustering using Parallel K-Means algorithm and DIANA algorithm used to clustering DNA sequences of Human Papillomavirus (HPV). The clustering process is started with Collecting DNA sequences of HPV are obtained from NCBI (National Centre for Biotechnology Information), then performing characteristics extraction of DNA sequences. The characteristics extraction result is store in a matrix form, then normalize this matrix using Min-Max normalization and calculate genetic distance using Euclidian Distance. Furthermore, the hybrid clustering is applied by using implementation of Parallel K-Means algorithm and DIANA algorithm. The aim of using Hybrid Clustering is to obtain better clusters result. For validating the resulted clusters, to get optimum number of clusters, we use Davies-Bouldin Index (DBI). In this study, the result of implementation of Parallel K-Means clustering is data clustered become 5 clusters with minimal IDB value is 0.8741, and Hybrid Clustering clustered data become 13 sub-clusters with minimal IDB values = 0.8216, 0.6845, 0.3331, 0.1994 and 0.3952. The IDB value of hybrid clustering less than IBD value of Parallel K-Means clustering only that perform at 1ts stage. Its means clustering using Hybrid Clustering have the better result to clustered DNA sequence of HPV than perform parallel K-Means Clustering only.
Self-organization and clustering algorithms

NASA Technical Reports Server (NTRS)

Bezdek, James C.

1991-01-01

Kohonen's feature maps approach to clustering is often likened to the k or c-means clustering algorithms. Here, the author identifies some similarities and differences between the hard and fuzzy c-Means (HCM/FCM) or ISODATA algorithms and Kohonen's self-organizing approach. The author concludes that some differences are significant, but at the same time there may be some important unknown relationships between the two methodologies. Several avenues of research are proposed.
Algorithms of maximum likelihood data clustering with applications

NASA Astrophysics Data System (ADS)

Giada, Lorenzo; Marsili, Matteo

2002-12-01

We address the problem of data clustering by introducing an unsupervised, parameter-free approach based on maximum likelihood principle. Starting from the observation that data sets belonging to the same cluster share a common information, we construct an expression for the likelihood of any possible cluster structure. The likelihood in turn depends only on the Pearson's coefficient of the data. We discuss clustering algorithms that provide a fast and reliable approximation to maximum likelihood configurations. Compared to standard clustering methods, our approach has the advantages that (i) it is parameter free, (ii) the number of clusters need not be fixed in advance and (iii) the interpretation of the results is transparent. In order to test our approach and compare it with standard clustering algorithms, we analyze two very different data sets: time series of financial market returns and gene expression data. We find that different maximization algorithms produce similar cluster structures whereas the outcome of standard algorithms has a much wider variability.
Android Malware Classification Using K-Means Clustering Algorithm

NASA Astrophysics Data System (ADS)

Hamid, Isredza Rahmi A.; Syafiqah Khalid, Nur; Azma Abdullah, Nurul; Rahman, Nurul Hidayah Ab; Chai Wen, Chuah

2017-08-01

Malware was designed to gain access or damage a computer system without user notice. Besides, attacker exploits malware to commit crime or fraud. This paper proposed Android malware classification approach based on K-Means clustering algorithm. We evaluate the proposed model in terms of accuracy using machine learning algorithms. Two datasets were selected to demonstrate the practicing of K-Means clustering algorithms that are Virus Total and Malgenome dataset. We classify the Android malware into three clusters which are ransomware, scareware and goodware. Nine features were considered for each types of dataset such as Lock Detected, Text Detected, Text Score, Encryption Detected, Threat, Porn, Law, Copyright and Moneypak. We used IBM SPSS Statistic software for data classification and WEKA tools to evaluate the built cluster. The proposed K-Means clustering algorithm shows promising result with high accuracy when tested using Random Forest algorithm.
Analysis of Network Clustering Algorithms and Cluster Quality Metrics at Scale

PubMed Central

Kobourov, Stephen; Gallant, Mike; Börner, Katy

2016-01-01

Overview Notions of community quality underlie the clustering of networks. While studies surrounding network clustering are increasingly common, a precise understanding of the realtionship between different cluster quality metrics is unknown. In this paper, we examine the relationship between stand-alone cluster quality metrics and information recovery metrics through a rigorous analysis of four widely-used network clustering algorithms—Louvain, Infomap, label propagation, and smart local moving. We consider the stand-alone quality metrics of modularity, conductance, and coverage, and we consider the information recovery metrics of adjusted Rand score, normalized mutual information, and a variant of normalized mutual information used in previous work. Our study includes both synthetic graphs and empirical data sets of sizes varying from 1,000 to 1,000,000 nodes. Cluster Quality Metrics We find significant differences among the results of the different cluster quality metrics. For example, clustering algorithms can return a value of 0.4 out of 1 on modularity but score 0 out of 1 on information recovery. We find conductance, though imperfect, to be the stand-alone quality metric that best indicates performance on the information recovery metrics. Additionally, our study shows that the variant of normalized mutual information used in previous work cannot be assumed to differ only slightly from traditional normalized mutual information. Network Clustering Algorithms Smart local moving is the overall best performing algorithm in our study, but discrepancies between cluster evaluation metrics prevent us from declaring it an absolutely superior algorithm. Interestingly, Louvain performed better than Infomap in nearly all the tests in our study, contradicting the results of previous work in which Infomap was superior to Louvain. We find that although label propagation performs poorly when clusters are less clearly defined, it scales efficiently and accurately to large
Analysis of Network Clustering Algorithms and Cluster Quality Metrics at Scale.

PubMed

Emmons, Scott; Kobourov, Stephen; Gallant, Mike; Börner, Katy

2016-01-01

Notions of community quality underlie the clustering of networks. While studies surrounding network clustering are increasingly common, a precise understanding of the realtionship between different cluster quality metrics is unknown. In this paper, we examine the relationship between stand-alone cluster quality metrics and information recovery metrics through a rigorous analysis of four widely-used network clustering algorithms-Louvain, Infomap, label propagation, and smart local moving. We consider the stand-alone quality metrics of modularity, conductance, and coverage, and we consider the information recovery metrics of adjusted Rand score, normalized mutual information, and a variant of normalized mutual information used in previous work. Our study includes both synthetic graphs and empirical data sets of sizes varying from 1,000 to 1,000,000 nodes. We find significant differences among the results of the different cluster quality metrics. For example, clustering algorithms can return a value of 0.4 out of 1 on modularity but score 0 out of 1 on information recovery. We find conductance, though imperfect, to be the stand-alone quality metric that best indicates performance on the information recovery metrics. Additionally, our study shows that the variant of normalized mutual information used in previous work cannot be assumed to differ only slightly from traditional normalized mutual information. Smart local moving is the overall best performing algorithm in our study, but discrepancies between cluster evaluation metrics prevent us from declaring it an absolutely superior algorithm. Interestingly, Louvain performed better than Infomap in nearly all the tests in our study, contradicting the results of previous work in which Infomap was superior to Louvain. We find that although label propagation performs poorly when clusters are less clearly defined, it scales efficiently and accurately to large graphs with well-defined clusters.
Node Self-Deployment Algorithm Based on an Uneven Cluster with Radius Adjusting for Underwater Sensor Networks

PubMed Central

Jiang, Peng; Xu, Yiming; Wu, Feng

2016-01-01

Existing move-restricted node self-deployment algorithms are based on a fixed node communication radius, evaluate the performance based on network coverage or the connectivity rate and do not consider the number of nodes near the sink node and the energy consumption distribution of the network topology, thereby degrading network reliability and the energy consumption balance. Therefore, we propose a distributed underwater node self-deployment algorithm. First, each node begins the uneven clustering based on the distance on the water surface. Each cluster head node selects its next-hop node to synchronously construct a connected path to the sink node. Second, the cluster head node adjusts its depth while maintaining the layout formed by the uneven clustering and then adjusts the positions of in-cluster nodes. The algorithm originally considers the network reliability and energy consumption balance during node deployment and considers the coverage redundancy rate of all positions that a node may reach during the node position adjustment. Simulation results show, compared to the connected dominating set (CDS) based depth computation algorithm, that the proposed algorithm can increase the number of the nodes near the sink node and improve network reliability while guaranteeing the network connectivity rate. Moreover, it can balance energy consumption during network operation, further improve network coverage rate and reduce energy consumption. PMID:26784193
Online clustering algorithms for radar emitter classification.

PubMed

Liu, Jun; Lee, Jim P Y; Senior; Li, Lingjie; Luo, Zhi-Quan; Wong, K Max

2005-08-01

Radar emitter classification is a special application of data clustering for classifying unknown radar emitters from received radar pulse samples. The main challenges of this task are the high dimensionality of radar pulse samples, small sample group size, and closely located radar pulse clusters. In this paper, two new online clustering algorithms are developed for radar emitter classification: One is model-based using the Minimum Description Length (MDL) criterion and the other is based on competitive learning. Computational complexity is analyzed for each algorithm and then compared. Simulation results show the superior performance of the model-based algorithm over competitive learning in terms of better classification accuracy, flexibility, and stability.
Multi-Parent Clustering Algorithms from Stochastic Grammar Data Models

NASA Technical Reports Server (NTRS)

Mjoisness, Eric; Castano, Rebecca; Gray, Alexander

1999-01-01

We introduce a statistical data model and an associated optimization-based clustering algorithm which allows data vectors to belong to zero, one or several "parent" clusters. For each data vector the algorithm makes a discrete decision among these alternatives. Thus, a recursive version of this algorithm would place data clusters in a Directed Acyclic Graph rather than a tree. We test the algorithm with synthetic data generated according to the statistical data model. We also illustrate the algorithm using real data from large-scale gene expression assays.
Clustering algorithm for determining community structure in large networks

NASA Astrophysics Data System (ADS)

Pujol, Josep M.; Béjar, Javier; Delgado, Jordi

2006-07-01

We propose an algorithm to find the community structure in complex networks based on the combination of spectral analysis and modularity optimization. The clustering produced by our algorithm is as accurate as the best algorithms on the literature of modularity optimization; however, the main asset of the algorithm is its efficiency. The best match for our algorithm is Newman’s fast algorithm, which is the reference algorithm for clustering in large networks due to its efficiency. When both algorithms are compared, our algorithm outperforms the fast algorithm both in efficiency and accuracy of the clustering, in terms of modularity. Thus, the results suggest that the proposed algorithm is a good choice to analyze the community structure of medium and large networks in the range of tens and hundreds of thousand vertices.
An AK-LDMeans algorithm based on image clustering

NASA Astrophysics Data System (ADS)

Chen, Huimin; Li, Xingwei; Zhang, Yongbin; Chen, Nan

2018-03-01

Clustering is an effective analytical technique for handling unmarked data for value mining. Its ultimate goal is to mark unclassified data quickly and correctly. We use the roadmap for the current image processing as the experimental background. In this paper, we propose an AK-LDMeans algorithm to automatically lock the K value by designing the Kcost fold line, and then use the long-distance high-density method to select the clustering centers to further replace the traditional initial clustering center selection method, which further improves the efficiency and accuracy of the traditional K-Means Algorithm. And the experimental results are compared with the current clustering algorithm and the results are obtained. The algorithm can provide effective reference value in the fields of image processing, machine vision and data mining.
Random Walk Quantum Clustering Algorithm Based on Space

NASA Astrophysics Data System (ADS)

Xiao, Shufen; Dong, Yumin; Ma, Hongyang

2018-01-01

In the random quantum walk, which is a quantum simulation of the classical walk, data points interacted when selecting the appropriate walk strategy by taking advantage of quantum-entanglement features; thus, the results obtained when the quantum walk is used are different from those when the classical walk is adopted. A new quantum walk clustering algorithm based on space is proposed by applying the quantum walk to clustering analysis. In this algorithm, data points are viewed as walking participants, and similar data points are clustered using the walk function in the pay-off matrix according to a certain rule. The walk process is simplified by implementing a space-combining rule. The proposed algorithm is validated by a simulation test and is proved superior to existing clustering algorithms, namely, Kmeans, PCA + Kmeans, and LDA-Km. The effects of some of the parameters in the proposed algorithm on its performance are also analyzed and discussed. Specific suggestions are provided.
Measuring Constraint-Set Utility for Partitional Clustering Algorithms

NASA Technical Reports Server (NTRS)

Davidson, Ian; Wagstaff, Kiri L.; Basu, Sugato

2006-01-01

Clustering with constraints is an active area of machine learning and data mining research. Previous empirical work has convincingly shown that adding constraints to clustering improves the performance of a variety of algorithms. However, in most of these experiments, results are averaged over different randomly chosen constraint sets from a given set of labels, thereby masking interesting properties of individual sets. We demonstrate that constraint sets vary significantly in how useful they are for constrained clustering; some constraint sets can actually decrease algorithm performance. We create two quantitative measures, informativeness and coherence, that can be used to identify useful constraint sets. We show that these measures can also help explain differences in performance for four particular constrained clustering algorithms.
Efficient Record Linkage Algorithms Using Complete Linkage Clustering

PubMed Central

Mamun, Abdullah-Al; Aseltine, Robert; Rajasekaran, Sanguthevar

2016-01-01

Data from different agencies share data of the same individuals. Linking these datasets to identify all the records belonging to the same individuals is a crucial and challenging problem, especially given the large volumes of data. A large number of available algorithms for record linkage are prone to either time inefficiency or low-accuracy in finding matches and non-matches among the records. In this paper we propose efficient as well as reliable sequential and parallel algorithms for the record linkage problem employing hierarchical clustering methods. We employ complete linkage hierarchical clustering algorithms to address this problem. In addition to hierarchical clustering, we also use two other techniques: elimination of duplicate records and blocking. Our algorithms use sorting as a sub-routine to identify identical copies of records. We have tested our algorithms on datasets with millions of synthetic records. Experimental results show that our algorithms achieve nearly 100% accuracy. Parallel implementations achieve almost linear speedups. Time complexities of these algorithms do not exceed those of previous best-known algorithms. Our proposed algorithms outperform previous best-known algorithms in terms of accuracy consuming reasonable run times. PMID:27124604
Efficient Record Linkage Algorithms Using Complete Linkage Clustering.

PubMed

Mamun, Abdullah-Al; Aseltine, Robert; Rajasekaran, Sanguthevar

2016-01-01

Data from different agencies share data of the same individuals. Linking these datasets to identify all the records belonging to the same individuals is a crucial and challenging problem, especially given the large volumes of data. A large number of available algorithms for record linkage are prone to either time inefficiency or low-accuracy in finding matches and non-matches among the records. In this paper we propose efficient as well as reliable sequential and parallel algorithms for the record linkage problem employing hierarchical clustering methods. We employ complete linkage hierarchical clustering algorithms to address this problem. In addition to hierarchical clustering, we also use two other techniques: elimination of duplicate records and blocking. Our algorithms use sorting as a sub-routine to identify identical copies of records. We have tested our algorithms on datasets with millions of synthetic records. Experimental results show that our algorithms achieve nearly 100% accuracy. Parallel implementations achieve almost linear speedups. Time complexities of these algorithms do not exceed those of previous best-known algorithms. Our proposed algorithms outperform previous best-known algorithms in terms of accuracy consuming reasonable run times.
Mining the National Career Assessment Examination Result Using Clustering Algorithm

NASA Astrophysics Data System (ADS)

Pagudpud, M. V.; Palaoag, T. T.; Padirayon, L. M.

2018-03-01

Education is an essential process today which elicits authorities to discover and establish innovative strategies for educational improvement. This study applied data mining using clustering technique for knowledge extraction from the National Career Assessment Examination (NCAE) result in the Division of Quirino. The NCAE is an examination given to all grade 9 students in the Philippines to assess their aptitudes in the different domains. Clustering the students is helpful in identifying students’ learning considerations. With the use of the RapidMiner tool, clustering algorithms such as Density-Based Spatial Clustering of Applications with Noise (DBSCAN), k-means, k-medoid, expectation maximization clustering, and support vector clustering algorithms were analyzed. The silhouette indexes of the said clustering algorithms were compared, and the result showed that the k-means algorithm with k = 3 and silhouette index equal to 0.196 is the most appropriate clustering algorithm to group the students. Three groups were formed having 477 students in the determined group (cluster 0), 310 proficient students (cluster 1) and 396 developing students (cluster 2). The data mining technique used in this study is essential in extracting useful information from the NCAE result to better understand the abilities of students which in turn is a good basis for adopting teaching strategies.
Textural defect detect using a revised ant colony clustering algorithm

NASA Astrophysics Data System (ADS)

Zou, Chao; Xiao, Li; Wang, Bingwen

2007-11-01

We propose a totally novel method based on a revised ant colony clustering algorithm (ACCA) to explore the topic of textural defect detection. In this algorithm, our efforts are mainly made on the definition of local irregularity measurement and the implementation of the revised ACCA. The local irregular measurement defined evaluates the local textural inconsistency of each pixel against their mini-environment. In our revised ACCA, the behaviors of each ant are divided into two steps: release pheromone and act. The quantity of pheromone released is proportional to the irregularity measurement; the actions of the ants to act next are chosen independently of each other in a stochastic way according to some evaluated heuristic knowledge. The independency of ants implies the inherent parallel computation architecture of this algorithm. We apply the proposed method in some typical textural images with defects. From the series of pheromone distribution map (PDM), it can be clearly seen that the pheromone distribution approaches the textual defects gradually. By some post-processing, the final distribution of pheromone can demonstrate the shape and area of the defects well.
An improved clustering algorithm based on reverse learning in intelligent transportation

NASA Astrophysics Data System (ADS)

Qiu, Guoqing; Kou, Qianqian; Niu, Ting

2017-05-01

With the development of artificial intelligence and data mining technology, big data has gradually entered people's field of vision. In the process of dealing with large data, clustering is an important processing method. By introducing the reverse learning method in the clustering process of PAM clustering algorithm, to further improve the limitations of one-time clustering in unsupervised clustering learning, and increase the diversity of clustering clusters, so as to improve the quality of clustering. The algorithm analysis and experimental results show that the algorithm is feasible.
Collaborative filtering recommendation model based on fuzzy clustering algorithm

NASA Astrophysics Data System (ADS)

Yang, Ye; Zhang, Yunhua

2018-05-01

As one of the most widely used algorithms in recommender systems, collaborative filtering algorithm faces two serious problems, which are the sparsity of data and poor recommendation effect in big data environment. In traditional clustering analysis, the object is strictly divided into several classes and the boundary of this division is very clear. However, for most objects in real life, there is no strict definition of their forms and attributes of their class. Concerning the problems above, this paper proposes to improve the traditional collaborative filtering model through the hybrid optimization of implicit semantic algorithm and fuzzy clustering algorithm, meanwhile, cooperating with collaborative filtering algorithm. In this paper, the fuzzy clustering algorithm is introduced to fuzzy clustering the information of project attribute, which makes the project belong to different project categories with different membership degrees, and increases the density of data, effectively reduces the sparsity of data, and solves the problem of low accuracy which is resulted from the inaccuracy of similarity calculation. Finally, this paper carries out empirical analysis on the MovieLens dataset, and compares it with the traditional user-based collaborative filtering algorithm. The proposed algorithm has greatly improved the recommendation accuracy.

A Genetic Algorithm That Exchanges Neighboring Centers for Fuzzy c-Means Clustering

ERIC Educational Resources Information Center

Chahine, Firas Safwan

2012-01-01

Clustering algorithms are widely used in pattern recognition and data mining applications. Due to their computational efficiency, partitional clustering algorithms are better suited for applications with large datasets than hierarchical clustering algorithms. K-means is among the most popular partitional clustering algorithm, but has a major…
Generalized fuzzy C-means clustering algorithm with improved fuzzy partitions.

PubMed

Zhu, Lin; Chung, Fu-Lai; Wang, Shitong

2009-06-01

The fuzziness index m has important influence on the clustering result of fuzzy clustering algorithms, and it should not be forced to fix at the usual value m = 2. In view of its distinctive features in applications and its limitation in having m = 2 only, a recent advance of fuzzy clustering called fuzzy c-means clustering with improved fuzzy partitions (IFP-FCM) is extended in this paper, and a generalized algorithm called GIFP-FCM for more effective clustering is proposed. By introducing a novel membership constraint function, a new objective function is constructed, and furthermore, GIFP-FCM clustering is derived. Meanwhile, from the viewpoints of L(p) norm distance measure and competitive learning, the robustness and convergence of the proposed algorithm are analyzed. Furthermore, the classical fuzzy c-means algorithm (FCM) and IFP-FCM can be taken as two special cases of the proposed algorithm. Several experimental results including its application to noisy image texture segmentation are presented to demonstrate its average advantage over FCM and IFP-FCM in both clustering and robustness capabilities.
A fast parallel clustering algorithm for molecular simulation trajectories.

PubMed

Zhao, Yutong; Sheong, Fu Kit; Sun, Jian; Sander, Pedro; Huang, Xuhui

2013-01-15

We implemented a GPU-powered parallel k-centers algorithm to perform clustering on the conformations of molecular dynamics (MD) simulations. The algorithm is up to two orders of magnitude faster than the CPU implementation. We tested our algorithm on four protein MD simulation datasets ranging from the small Alanine Dipeptide to a 370-residue Maltose Binding Protein (MBP). It is capable of grouping 250,000 conformations of the MBP into 4000 clusters within 40 seconds. To achieve this, we effectively parallelized the code on the GPU and utilize the triangle inequality of metric spaces. Furthermore, the algorithm's running time is linear with respect to the number of cluster centers. In addition, we found the triangle inequality to be less effective in higher dimensions and provide a mathematical rationale. Finally, using Alanine Dipeptide as an example, we show a strong correlation between cluster populations resulting from the k-centers algorithm and the underlying density. © 2012 Wiley Periodicals, Inc. Copyright © 2012 Wiley Periodicals, Inc.
A novel complex networks clustering algorithm based on the core influence of nodes.

PubMed

Tong, Chao; Niu, Jianwei; Dai, Bin; Xie, Zhongyu

2014-01-01

In complex networks, cluster structure, identified by the heterogeneity of nodes, has become a common and important topological property. Network clustering methods are thus significant for the study of complex networks. Currently, many typical clustering algorithms have some weakness like inaccuracy and slow convergence. In this paper, we propose a clustering algorithm by calculating the core influence of nodes. The clustering process is a simulation of the process of cluster formation in sociology. The algorithm detects the nodes with core influence through their betweenness centrality, and builds the cluster's core structure by discriminant functions. Next, the algorithm gets the final cluster structure after clustering the rest of the nodes in the network by optimizing method. Experiments on different datasets show that the clustering accuracy of this algorithm is superior to the classical clustering algorithm (Fast-Newman algorithm). It clusters faster and plays a positive role in revealing the real cluster structure of complex networks precisely.
Comparison and Evaluation of Clustering Algorithms for Tandem Mass Spectra.

PubMed

Rieder, Vera; Schork, Karin U; Kerschke, Laura; Blank-Landeshammer, Bernhard; Sickmann, Albert; Rahnenführer, Jörg

2017-11-03

In proteomics, liquid chromatography-tandem mass spectrometry (LC-MS/MS) is established for identifying peptides and proteins. Duplicated spectra, that is, multiple spectra of the same peptide, occur both in single MS/MS runs and in large spectral libraries. Clustering tandem mass spectra is used to find consensus spectra, with manifold applications. First, it speeds up database searches, as performed for instance by Mascot. Second, it helps to identify novel peptides across species. Third, it is used for quality control to detect wrongly annotated spectra. We compare different clustering algorithms based on the cosine distance between spectra. CAST, MS-Cluster, and PRIDE Cluster are popular algorithms to cluster tandem mass spectra. We add well-known algorithms for large data sets, hierarchical clustering, DBSCAN, and connected components of a graph, as well as the new method N-Cluster. All algorithms are evaluated on real data with varied parameter settings. Cluster results are compared with each other and with peptide annotations based on validation measures such as purity. Quality control, regarding the detection of wrongly (un)annotated spectra, is discussed for exemplary resulting clusters. N-Cluster proves to be highly competitive. All clustering results benefit from the so-called DISMS2 filter that integrates additional information, for example, on precursor mass.
A roadmap of clustering algorithms: finding a match for a biomedical application.

PubMed

Andreopoulos, Bill; An, Aijun; Wang, Xiaogang; Schroeder, Michael

2009-05-01

Clustering is ubiquitously applied in bioinformatics with hierarchical clustering and k-means partitioning being the most popular methods. Numerous improvements of these two clustering methods have been introduced, as well as completely different approaches such as grid-based, density-based and model-based clustering. For improved bioinformatics analysis of data, it is important to match clusterings to the requirements of a biomedical application. In this article, we present a set of desirable clustering features that are used as evaluation criteria for clustering algorithms. We review 40 different clustering algorithms of all approaches and datatypes. We compare algorithms on the basis of desirable clustering features, and outline algorithms' benefits and drawbacks as a basis for matching them to biomedical applications.
Unsupervised Cryo-EM Data Clustering through Adaptively Constrained K-Means Algorithm.

PubMed

Xu, Yaofang; Wu, Jiayi; Yin, Chang-Cheng; Mao, Youdong

2016-01-01

In single-particle cryo-electron microscopy (cryo-EM), K-means clustering algorithm is widely used in unsupervised 2D classification of projection images of biological macromolecules. 3D ab initio reconstruction requires accurate unsupervised classification in order to separate molecular projections of distinct orientations. Due to background noise in single-particle images and uncertainty of molecular orientations, traditional K-means clustering algorithm may classify images into wrong classes and produce classes with a large variation in membership. Overcoming these limitations requires further development on clustering algorithms for cryo-EM data analysis. We propose a novel unsupervised data clustering method building upon the traditional K-means algorithm. By introducing an adaptive constraint term in the objective function, our algorithm not only avoids a large variation in class sizes but also produces more accurate data clustering. Applications of this approach to both simulated and experimental cryo-EM data demonstrate that our algorithm is a significantly improved alterative to the traditional K-means algorithm in single-particle cryo-EM analysis.
Clustering performance comparison using K-means and expectation maximization algorithms.

PubMed

Jung, Yong Gyu; Kang, Min Soo; Heo, Jun

2014-11-14

Clustering is an important means of data mining based on separating data categories by similar features. Unlike the classification algorithm, clustering belongs to the unsupervised type of algorithms. Two representatives of the clustering algorithms are the K -means and the expectation maximization (EM) algorithm. Linear regression analysis was extended to the category-type dependent variable, while logistic regression was achieved using a linear combination of independent variables. To predict the possibility of occurrence of an event, a statistical approach is used. However, the classification of all data by means of logistic regression analysis cannot guarantee the accuracy of the results. In this paper, the logistic regression analysis is applied to EM clusters and the K -means clustering method for quality assessment of red wine, and a method is proposed for ensuring the accuracy of the classification results.
HipMCL: a high-performance parallel implementation of the Markov clustering algorithm for large-scale networks

PubMed Central

Azad, Ariful; Ouzounis, Christos A; Kyrpides, Nikos C; Buluç, Aydin

2018-01-01

Abstract Biological networks capture structural or functional properties of relevant entities such as molecules, proteins or genes. Characteristic examples are gene expression networks or protein–protein interaction networks, which hold information about functional affinities or structural similarities. Such networks have been expanding in size due to increasing scale and abundance of biological data. While various clustering algorithms have been proposed to find highly connected regions, Markov Clustering (MCL) has been one of the most successful approaches to cluster sequence similarity or expression networks. Despite its popularity, MCL’s scalability to cluster large datasets still remains a bottleneck due to high running times and memory demands. Here, we present High-performance MCL (HipMCL), a parallel implementation of the original MCL algorithm that can run on distributed-memory computers. We show that HipMCL can efficiently utilize 2000 compute nodes and cluster a network of ∼70 million nodes with ∼68 billion edges in ∼2.4 h. By exploiting distributed-memory environments, HipMCL clusters large-scale networks several orders of magnitude faster than MCL and enables clustering of even bigger networks. HipMCL is based on MPI and OpenMP and is freely available under a modified BSD license. PMID:29315405
HipMCL: a high-performance parallel implementation of the Markov clustering algorithm for large-scale networks

DOE PAGES

Azad, Ariful; Pavlopoulos, Georgios A.; Ouzounis, Christos A.; ...

2018-01-05

Biological networks capture structural or functional properties of relevant entities such as molecules, proteins or genes. Characteristic examples are gene expression networks or protein–protein interaction networks, which hold information about functional affinities or structural similarities. Such networks have been expanding in size due to increasing scale and abundance of biological data. While various clustering algorithms have been proposed to find highly connected regions, Markov Clustering (MCL) has been one of the most successful approaches to cluster sequence similarity or expression networks. Despite its popularity, MCL’s scalability to cluster large datasets still remains a bottleneck due to high running times andmore » memory demands. In this paper, we present High-performance MCL (HipMCL), a parallel implementation of the original MCL algorithm that can run on distributed-memory computers. We show that HipMCL can efficiently utilize 2000 compute nodes and cluster a network of ~70 million nodes with ~68 billion edges in ~2.4 h. By exploiting distributed-memory environments, HipMCL clusters large-scale networks several orders of magnitude faster than MCL and enables clustering of even bigger networks. Finally, HipMCL is based on MPI and OpenMP and is freely available under a modified BSD license.« less
HipMCL: a high-performance parallel implementation of the Markov clustering algorithm for large-scale networks

DOE Office of Scientific and Technical Information (OSTI.GOV)

Azad, Ariful; Pavlopoulos, Georgios A.; Ouzounis, Christos A.

Biological networks capture structural or functional properties of relevant entities such as molecules, proteins or genes. Characteristic examples are gene expression networks or protein–protein interaction networks, which hold information about functional affinities or structural similarities. Such networks have been expanding in size due to increasing scale and abundance of biological data. While various clustering algorithms have been proposed to find highly connected regions, Markov Clustering (MCL) has been one of the most successful approaches to cluster sequence similarity or expression networks. Despite its popularity, MCL’s scalability to cluster large datasets still remains a bottleneck due to high running times andmore » memory demands. In this paper, we present High-performance MCL (HipMCL), a parallel implementation of the original MCL algorithm that can run on distributed-memory computers. We show that HipMCL can efficiently utilize 2000 compute nodes and cluster a network of ~70 million nodes with ~68 billion edges in ~2.4 h. By exploiting distributed-memory environments, HipMCL clusters large-scale networks several orders of magnitude faster than MCL and enables clustering of even bigger networks. Finally, HipMCL is based on MPI and OpenMP and is freely available under a modified BSD license.« less
A clustering method of Chinese medicine prescriptions based on modified firefly algorithm.

PubMed

Yuan, Feng; Liu, Hong; Chen, Shou-Qiang; Xu, Liang

2016-12-01

This paper is aimed to study the clustering method for Chinese medicine (CM) medical cases. The traditional K-means clustering algorithm had shortcomings such as dependence of results on the selection of initial value, trapping in local optimum when processing prescriptions form CM medical cases. Therefore, a new clustering method based on the collaboration of firefly algorithm and simulated annealing algorithm was proposed. This algorithm dynamically determined the iteration of firefly algorithm and simulates sampling of annealing algorithm by fitness changes, and increased the diversity of swarm through expansion of the scope of the sudden jump, thereby effectively avoiding premature problem. The results from confirmatory experiments for CM medical cases suggested that, comparing with traditional K-means clustering algorithms, this method was greatly improved in the individual diversity and the obtained clustering results, the computing results from this method had a certain reference value for cluster analysis on CM prescriptions.
A clustering algorithm for determining community structure in complex networks

NASA Astrophysics Data System (ADS)

Jin, Hong; Yu, Wei; Li, ShiJun

2018-02-01

Clustering algorithms are attractive for the task of community detection in complex networks. DENCLUE is a representative density based clustering algorithm which has a firm mathematical basis and good clustering properties allowing for arbitrarily shaped clusters in high dimensional datasets. However, this method cannot be directly applied to community discovering due to its inability to deal with network data. Moreover, it requires a careful selection of the density parameter and the noise threshold. To solve these issues, a new community detection method is proposed in this paper. First, we use a spectral analysis technique to map the network data into a low dimensional Euclidean Space which can preserve node structural characteristics. Then, DENCLUE is applied to detect the communities in the network. A mathematical method named Sheather-Jones plug-in is chosen to select the density parameter which can describe the intrinsic clustering structure accurately. Moreover, every node on the network is meaningful so there were no noise nodes as a result the noise threshold can be ignored. We test our algorithm on both benchmark and real-life networks, and the results demonstrate the effectiveness of our algorithm over other popularity density based clustering algorithms adopted to community detection.
A similarity based agglomerative clustering algorithm in networks

NASA Astrophysics Data System (ADS)

Liu, Zhiyuan; Wang, Xiujuan; Ma, Yinghong

2018-04-01

The detection of clusters is benefit for understanding the organizations and functions of networks. Clusters, or communities, are usually groups of nodes densely interconnected but sparsely linked with any other clusters. To identify communities, an efficient and effective community agglomerative algorithm based on node similarity is proposed. The proposed method initially calculates similarities between each pair of nodes, and form pre-partitions according to the principle that each node is in the same community as its most similar neighbor. After that, check each partition whether it satisfies community criterion. For the pre-partitions who do not satisfy, incorporate them with others that having the biggest attraction until there are no changes. To measure the attraction ability of a partition, we propose an attraction index that based on the linked node's importance in networks. Therefore, our proposed method can better exploit the nodes' properties and network's structure. To test the performance of our algorithm, both synthetic and empirical networks ranging in different scales are tested. Simulation results show that the proposed algorithm can obtain superior clustering results compared with six other widely used community detection algorithms.
A new clustering algorithm applicable to multispectral and polarimetric SAR images

NASA Technical Reports Server (NTRS)

Wong, Yiu-Fai; Posner, Edward C.

1993-01-01

We describe an application of a scale-space clustering algorithm to the classification of a multispectral and polarimetric SAR image of an agricultural site. After the initial polarimetric and radiometric calibration and noise cancellation, we extracted a 12-dimensional feature vector for each pixel from the scattering matrix. The clustering algorithm was able to partition a set of unlabeled feature vectors from 13 selected sites, each site corresponding to a distinct crop, into 13 clusters without any supervision. The cluster parameters were then used to classify the whole image. The classification map is much less noisy and more accurate than those obtained by hierarchical rules. Starting with every point as a cluster, the algorithm works by melting the system to produce a tree of clusters in the scale space. It can cluster data in any multidimensional space and is insensitive to variability in cluster densities, sizes and ellipsoidal shapes. This algorithm, more powerful than existing ones, may be useful for remote sensing for land use.
Unsupervised Cryo-EM Data Clustering through Adaptively Constrained K-Means Algorithm

PubMed Central

Xu, Yaofang; Wu, Jiayi; Yin, Chang-Cheng; Mao, Youdong

2016-01-01

In single-particle cryo-electron microscopy (cryo-EM), K-means clustering algorithm is widely used in unsupervised 2D classification of projection images of biological macromolecules. 3D ab initio reconstruction requires accurate unsupervised classification in order to separate molecular projections of distinct orientations. Due to background noise in single-particle images and uncertainty of molecular orientations, traditional K-means clustering algorithm may classify images into wrong classes and produce classes with a large variation in membership. Overcoming these limitations requires further development on clustering algorithms for cryo-EM data analysis. We propose a novel unsupervised data clustering method building upon the traditional K-means algorithm. By introducing an adaptive constraint term in the objective function, our algorithm not only avoids a large variation in class sizes but also produces more accurate data clustering. Applications of this approach to both simulated and experimental cryo-EM data demonstrate that our algorithm is a significantly improved alterative to the traditional K-means algorithm in single-particle cryo-EM analysis. PMID:27959895
The distribution of early- and late-type galaxies in the Coma cluster

NASA Technical Reports Server (NTRS)

Doi, M.; Fukugita, M.; Okamura, S.; Turner, E. L.

1995-01-01

The spatial distribution and the morohology-density relation of Coma cluster galaxies are studied using a new homogeneous photmetric sample of 450 galaxies down to B = 16.0 mag with quantitative morphology classification. The sample covers a wide area (10 deg X 10 deg), extending well beyond the Coma cluster. Morphological classifications into early- (E+SO) and late-(S) type galaxies are made by an automated algorithm using simple photometric parameters, with which the misclassification rate is expected to be approximately 10% with respect to early and late types given in the Third Reference Catalogue of Bright Galaxies. The flattened distribution of Coma cluster galaxies, as noted in previous studies, is most conspicuously seen if the early-type galaxies are selected. Early-type galaxies are distributed in a thick filament extended from the NE to the WSW direction that delineates a part of large-scale structure. Spiral galaxies show a distribution with a modest density gradient toward the cluster center; at least bright spiral galaxies are present close to the center of the Coma cluster. We also examine the morphology-density relation for the Coma cluster including its surrounding regions.
Automatic Regionalization Algorithm for Distributed State Estimation in Power Systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wang, Dexin; Yang, Liuqing; Florita, Anthony

The deregulation of the power system and the incorporation of generation from renewable energy sources recessitates faster state estimation in the smart grid. Distributed state estimation (DSE) has become a promising and scalable solution to this urgent demand. In this paper, we investigate the regionalization algorithms for the power system, a necessary step before distributed state estimation can be performed. To the best of the authors' knowledge, this is the first investigation on automatic regionalization (AR). We propose three spectral clustering based AR algorithms. Simulations show that our proposed algorithms outperform the two investigated manual regionalization cases. With the helpmore » of AR algorithms, we also show how the number of regions impacts the accuracy and convergence speed of the DSE and conclude that the number of regions needs to be chosen carefully to improve the convergence speed of DSEs.« less
Hierarchical trie packet classification algorithm based on expectation-maximization clustering

PubMed Central

Bi, Xia-an; Zhao, Junxia

2017-01-01

With the development of computer network bandwidth, packet classification algorithms which are able to deal with large-scale rule sets are in urgent need. Among the existing algorithms, researches on packet classification algorithms based on hierarchical trie have become an important packet classification research branch because of their widely practical use. Although hierarchical trie is beneficial to save large storage space, it has several shortcomings such as the existence of backtracking and empty nodes. This paper proposes a new packet classification algorithm, Hierarchical Trie Algorithm Based on Expectation-Maximization Clustering (HTEMC). Firstly, this paper uses the formalization method to deal with the packet classification problem by means of mapping the rules and data packets into a two-dimensional space. Secondly, this paper uses expectation-maximization algorithm to cluster the rules based on their aggregate characteristics, and thereby diversified clusters are formed. Thirdly, this paper proposes a hierarchical trie based on the results of expectation-maximization clustering. Finally, this paper respectively conducts simulation experiments and real-environment experiments to compare the performances of our algorithm with other typical algorithms, and analyzes the results of the experiments. The hierarchical trie structure in our algorithm not only adopts trie path compression to eliminate backtracking, but also solves the problem of low efficiency of trie updates, which greatly improves the performance of the algorithm. PMID:28704476
Hierarchical trie packet classification algorithm based on expectation-maximization clustering.

PubMed

Bi, Xia-An; Zhao, Junxia

2017-01-01

With the development of computer network bandwidth, packet classification algorithms which are able to deal with large-scale rule sets are in urgent need. Among the existing algorithms, researches on packet classification algorithms based on hierarchical trie have become an important packet classification research branch because of their widely practical use. Although hierarchical trie is beneficial to save large storage space, it has several shortcomings such as the existence of backtracking and empty nodes. This paper proposes a new packet classification algorithm, Hierarchical Trie Algorithm Based on Expectation-Maximization Clustering (HTEMC). Firstly, this paper uses the formalization method to deal with the packet classification problem by means of mapping the rules and data packets into a two-dimensional space. Secondly, this paper uses expectation-maximization algorithm to cluster the rules based on their aggregate characteristics, and thereby diversified clusters are formed. Thirdly, this paper proposes a hierarchical trie based on the results of expectation-maximization clustering. Finally, this paper respectively conducts simulation experiments and real-environment experiments to compare the performances of our algorithm with other typical algorithms, and analyzes the results of the experiments. The hierarchical trie structure in our algorithm not only adopts trie path compression to eliminate backtracking, but also solves the problem of low efficiency of trie updates, which greatly improves the performance of the algorithm.

A Fast Implementation of the ISODATA Clustering Algorithm

NASA Technical Reports Server (NTRS)

Memarsadeghi, Nargess; Mount, David M.; Netanyahu, Nathan S.; LeMoigne, Jacqueline

2005-01-01

Clustering is central to many image processing and remote sensing applications. ISODATA is one of the most popular and widely used clustering methods in geoscience applications, but it can run slowly, particularly with large data sets. We present a more efficient approach to ISODATA clustering, which achieves better running times by storing the points in a kd-tree and through a modification of the way in which the algorithm estimates the dispersion of each cluster. We also present an approximate version of the algorithm which allows the user to further improve the running time, at the expense of lower fidelity in computing the nearest cluster center to each point. We provide both theoretical and empirical justification that our modified approach produces clusterings that are very similar to those produced by the standard ISODATA approach. We also provide empirical studies on both synthetic data and remotely sensed Landsat and MODIS images that show that our approach has significantly lower running times.
A Fast Implementation of the Isodata Clustering Algorithm

NASA Technical Reports Server (NTRS)

Memarsadeghi, Nargess; Le Moigne, Jacqueline; Mount, David M.; Netanyahu, Nathan S.

2007-01-01

Clustering is central to many image processing and remote sensing applications. ISODATA is one of the most popular and widely used clustering methods in geoscience applications, but it can run slowly, particularly with large data sets. We present a more efficient approach to IsoDATA clustering, which achieves better running times by storing the points in a kd-tree and through a modification of the way in which the algorithm estimates the dispersion of each cluster. We also present an approximate version of the algorithm which allows the user to further improve the running time, at the expense of lower fidelity in computing the nearest cluster center to each point. We provide both theoretical and empirical justification that our modified approach produces clusterings that are very similar to those produced by the standard ISODATA approach. We also provide empirical studies on both synthetic data and remotely sensed Landsat and MODIS images that show that our approach has significantly lower running times.
Clustering for Binary Data Sets by Using Genetic Algorithm-Incremental K-means

NASA Astrophysics Data System (ADS)

Saharan, S.; Baragona, R.; Nor, M. E.; Salleh, R. M.; Asrah, N. M.

2018-04-01

This research was initially driven by the lack of clustering algorithms that specifically focus in binary data. To overcome this gap in knowledge, a promising technique for analysing this type of data became the main subject in this research, namely Genetic Algorithms (GA). For the purpose of this research, GA was combined with the Incremental K-means (IKM) algorithm to cluster the binary data streams. In GAIKM, the objective function was based on a few sufficient statistics that may be easily and quickly calculated on binary numbers. The implementation of IKM will give an advantage in terms of fast convergence. The results show that GAIKM is an efficient and effective new clustering algorithm compared to the clustering algorithms and to the IKM itself. In conclusion, the GAIKM outperformed other clustering algorithms such as GCUK, IKM, Scalable K-means (SKM) and K-means clustering and paves the way for future research involving missing data and outliers.
An algorithm for spatial heirarchy clustering

NASA Technical Reports Server (NTRS)

Dejesusparada, N. (Principal Investigator); Velasco, F. R. D.

1981-01-01

A method for utilizing both spectral and spatial redundancy in compacting and preclassifying images is presented. In multispectral satellite images, a high correlation exists between neighboring image points which tend to occupy dense and restricted regions of the feature space. The image is divided into windows of the same size where the clustering is made. The classes obtained in several neighboring windows are clustered, and then again successively clustered until only one region corresponding to the whole image is obtained. By employing this algorithm only a few points are considered in each clustering, thus reducing computational effort. The method is illustrated as applied to LANDSAT images.
Study of parameters of the nearest neighbour shared algorithm on clustering documents

NASA Astrophysics Data System (ADS)

Mustika Rukmi, Alvida; Budi Utomo, Daryono; Imro’atus Sholikhah, Neni

2018-03-01

Document clustering is one way of automatically managing documents, extracting of document topics and fastly filtering information. Preprocess of clustering documents processed by textmining consists of: keyword extraction using Rapid Automatic Keyphrase Extraction (RAKE) and making the document as concept vector using Latent Semantic Analysis (LSA). Furthermore, the clustering process is done so that the documents with the similarity of the topic are in the same cluster, based on the preprocesing by textmining performed. Shared Nearest Neighbour (SNN) algorithm is a clustering method based on the number of "nearest neighbors" shared. The parameters in the SNN Algorithm consist of: k nearest neighbor documents, ɛ shared nearest neighbor documents and MinT minimum number of similar documents, which can form a cluster. Characteristics The SNN algorithm is based on shared ‘neighbor’ properties. Each cluster is formed by keywords that are shared by the documents. SNN algorithm allows a cluster can be built more than one keyword, if the value of the frequency of appearing keywords in document is also high. Determination of parameter values on SNN algorithm affects document clustering results. The higher parameter value k, will increase the number of neighbor documents from each document, cause similarity of neighboring documents are lower. The accuracy of each cluster is also low. The higher parameter value ε, caused each document catch only neighbor documents that have a high similarity to build a cluster. It also causes more unclassified documents (noise). The higher the MinT parameter value cause the number of clusters will decrease, since the number of similar documents can not form clusters if less than MinT. Parameter in the SNN Algorithm determine performance of clustering result and the amount of noise (unclustered documents ). The Silhouette coeffisient shows almost the same result in many experiments, above 0.9, which means that SNN algorithm works well
Depth data research of GIS based on clustering analysis algorithm

NASA Astrophysics Data System (ADS)

Xiong, Yan; Xu, Wenli

2018-03-01

The data of GIS have spatial distribution. Geographic data has both spatial characteristics and attribute characteristics, and also changes with time. Therefore, the amount of data is very large. Nowadays, many industries and departments in the society are using GIS. However, without proper data analysis and mining scheme, GIS will not exert its maximum effectiveness and will waste a lot of data. In this paper, we use the geographic information demand of a national security department as the experimental object, combining the characteristics of GIS data, taking into account the characteristics of time, space, attributes and so on, and using cluster analysis algorithm. We further study the mining scheme for depth data, and get the algorithm model. This algorithm can automatically classify sample data, and then carry out exploratory analysis. The research shows that the algorithm model and the information mining scheme can quickly find hidden depth information from the surface data of GIS, thus improving the efficiency of the security department. This algorithm can also be extended to other fields.
Soft learning vector quantization and clustering algorithms based on ordered weighted aggregation operators.

PubMed

Karayiannis, N B

2000-01-01

This paper presents the development and investigates the properties of ordered weighted learning vector quantization (LVQ) and clustering algorithms. These algorithms are developed by using gradient descent to minimize reformulation functions based on aggregation operators. An axiomatic approach provides conditions for selecting aggregation operators that lead to admissible reformulation functions. Minimization of admissible reformulation functions based on ordered weighted aggregation operators produces a family of soft LVQ and clustering algorithms, which includes fuzzy LVQ and clustering algorithms as special cases. The proposed LVQ and clustering algorithms are used to perform segmentation of magnetic resonance (MR) images of the brain. The diagnostic value of the segmented MR images provides the basis for evaluating a variety of ordered weighted LVQ and clustering algorithms.
Symmetric nonnegative matrix factorization: algorithms and applications to probabilistic clustering.

PubMed

He, Zhaoshui; Xie, Shengli; Zdunek, Rafal; Zhou, Guoxu; Cichocki, Andrzej

2011-12-01

Nonnegative matrix factorization (NMF) is an unsupervised learning method useful in various applications including image processing and semantic analysis of documents. This paper focuses on symmetric NMF (SNMF), which is a special case of NMF decomposition. Three parallel multiplicative update algorithms using level 3 basic linear algebra subprograms directly are developed for this problem. First, by minimizing the Euclidean distance, a multiplicative update algorithm is proposed, and its convergence under mild conditions is proved. Based on it, we further propose another two fast parallel methods: α-SNMF and β -SNMF algorithms. All of them are easy to implement. These algorithms are applied to probabilistic clustering. We demonstrate their effectiveness for facial image clustering, document categorization, and pattern clustering in gene expression.
Contributions to "k"-Means Clustering and Regression via Classification Algorithms

ERIC Educational Resources Information Center

Salman, Raied

2012-01-01

The dissertation deals with clustering algorithms and transforming regression problems into classification problems. The main contributions of the dissertation are twofold; first, to improve (speed up) the clustering algorithms and second, to develop a strict learning environment for solving regression problems as classification tasks by using…
The Ordered Clustered Travelling Salesman Problem: A Hybrid Genetic Algorithm

PubMed Central

Ahmed, Zakir Hussain

2014-01-01

The ordered clustered travelling salesman problem is a variation of the usual travelling salesman problem in which a set of vertices (except the starting vertex) of the network is divided into some prespecified clusters. The objective is to find the least cost Hamiltonian tour in which vertices of any cluster are visited contiguously and the clusters are visited in the prespecified order. The problem is NP-hard, and it arises in practical transportation and sequencing problems. This paper develops a hybrid genetic algorithm using sequential constructive crossover, 2-opt search, and a local search for obtaining heuristic solution to the problem. The efficiency of the algorithm has been examined against two existing algorithms for some asymmetric and symmetric TSPLIB instances of various sizes. The computational results show that the proposed algorithm is very effective in terms of solution quality and computational time. Finally, we present solution to some more symmetric TSPLIB instances. PMID:24701148
Automatic Regionalization Algorithm for Distributed State Estimation in Power Systems: Preprint

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wang, Dexin; Yang, Liuqing; Florita, Anthony

The deregulation of the power system and the incorporation of generation from renewable energy sources recessitates faster state estimation in the smart grid. Distributed state estimation (DSE) has become a promising and scalable solution to this urgent demand. In this paper, we investigate the regionalization algorithms for the power system, a necessary step before distributed state estimation can be performed. To the best of the authors' knowledge, this is the first investigation on automatic regionalization (AR). We propose three spectral clustering based AR algorithms. Simulations show that our proposed algorithms outperform the two investigated manual regionalization cases. With the helpmore » of AR algorithms, we also show how the number of regions impacts the accuracy and convergence speed of the DSE and conclude that the number of regions needs to be chosen carefully to improve the convergence speed of DSEs.« less
A fuzzy clustering algorithm to detect planar and quadric shapes

NASA Technical Reports Server (NTRS)

Krishnapuram, Raghu; Frigui, Hichem; Nasraoui, Olfa

1992-01-01

In this paper, we introduce a new fuzzy clustering algorithm to detect an unknown number of planar and quadric shapes in noisy data. The proposed algorithm is computationally and implementationally simple, and it overcomes many of the drawbacks of the existing algorithms that have been proposed for similar tasks. Since the clustering is performed in the original image space, and since no features need to be computed, this approach is particularly suited for sparse data. The algorithm may also be used in pattern recognition applications.
Novel density-based and hierarchical density-based clustering algorithms for uncertain data.

PubMed

Zhang, Xianchao; Liu, Han; Zhang, Xiaotong

2017-09-01

Uncertain data has posed a great challenge to traditional clustering algorithms. Recently, several algorithms have been proposed for clustering uncertain data, and among them density-based techniques seem promising for handling data uncertainty. However, some issues like losing uncertain information, high time complexity and nonadaptive threshold have not been addressed well in the previous density-based algorithm FDBSCAN and hierarchical density-based algorithm FOPTICS. In this paper, we firstly propose a novel density-based algorithm PDBSCAN, which improves the previous FDBSCAN from the following aspects: (1) it employs a more accurate method to compute the probability that the distance between two uncertain objects is less than or equal to a boundary value, instead of the sampling-based method in FDBSCAN; (2) it introduces new definitions of probability neighborhood, support degree, core object probability, direct reachability probability, thus reducing the complexity and solving the issue of nonadaptive threshold (for core object judgement) in FDBSCAN. Then, we modify the algorithm PDBSCAN to an improved version (PDBSCANi), by using a better cluster assignment strategy to ensure that every object will be assigned to the most appropriate cluster, thus solving the issue of nonadaptive threshold (for direct density reachability judgement) in FDBSCAN. Furthermore, as PDBSCAN and PDBSCANi have difficulties for clustering uncertain data with non-uniform cluster density, we propose a novel hierarchical density-based algorithm POPTICS by extending the definitions of PDBSCAN, adding new definitions of fuzzy core distance and fuzzy reachability distance, and employing a new clustering framework. POPTICS can reveal the cluster structures of the datasets with different local densities in different regions better than PDBSCAN and PDBSCANi, and it addresses the issues in FOPTICS. Experimental results demonstrate the superiority of our proposed algorithms over the existing
A Class of Manifold Regularized Multiplicative Update Algorithms for Image Clustering.

PubMed

Yang, Shangming; Yi, Zhang; He, Xiaofei; Li, Xuelong

2015-12-01

Multiplicative update algorithms are important tools for information retrieval, image processing, and pattern recognition. However, when the graph regularization is added to the cost function, different classes of sample data may be mapped to the same subspace, which leads to the increase of data clustering error rate. In this paper, an improved nonnegative matrix factorization (NMF) cost function is introduced. Based on the cost function, a class of novel graph regularized NMF algorithms is developed, which results in a class of extended multiplicative update algorithms with manifold structure regularization. Analysis shows that in the learning, the proposed algorithms can efficiently minimize the rank of the data representation matrix. Theoretical results presented in this paper are confirmed by simulations. For different initializations and data sets, variation curves of cost functions and decomposition data are presented to show the convergence features of the proposed update rules. Basis images, reconstructed images, and clustering results are utilized to present the efficiency of the new algorithms. Last, the clustering accuracies of different algorithms are also investigated, which shows that the proposed algorithms can achieve state-of-the-art performance in applications of image clustering.
Using experimental data to test an n -body dynamical model coupled with an energy-based clusterization algorithm at low incident energies

NASA Astrophysics Data System (ADS)

Kumar, Rohit; Puri, Rajeev K.

2018-03-01

Employing the quantum molecular dynamics (QMD) approach for nucleus-nucleus collisions, we test the predictive power of the energy-based clusterization algorithm, i.e., the simulating annealing clusterization algorithm (SACA), to describe the experimental data of charge distribution and various event-by-event correlations among fragments. The calculations are constrained into the Fermi-energy domain and/or mildly excited nuclear matter. Our detailed study spans over different system masses, and system-mass asymmetries of colliding partners show the importance of the energy-based clusterization algorithm for understanding multifragmentation. The present calculations are also compared with the other available calculations, which use one-body models, statistical models, and/or hybrid models.
A novel artificial immune algorithm for spatial clustering with obstacle constraint and its applications.

PubMed

Sun, Liping; Luo, Yonglong; Ding, Xintao; Zhang, Ji

2014-01-01

An important component of a spatial clustering algorithm is the distance measure between sample points in object space. In this paper, the traditional Euclidean distance measure is replaced with innovative obstacle distance measure for spatial clustering under obstacle constraints. Firstly, we present a path searching algorithm to approximate the obstacle distance between two points for dealing with obstacles and facilitators. Taking obstacle distance as similarity metric, we subsequently propose the artificial immune clustering with obstacle entity (AICOE) algorithm for clustering spatial point data in the presence of obstacles and facilitators. Finally, the paper presents a comparative analysis of AICOE algorithm and the classical clustering algorithms. Our clustering model based on artificial immune system is also applied to the case of public facility location problem in order to establish the practical applicability of our approach. By using the clone selection principle and updating the cluster centers based on the elite antibodies, the AICOE algorithm is able to achieve the global optimum and better clustering effect.
CAMPAIGN: an open-source library of GPU-accelerated data clustering algorithms.

PubMed

Kohlhoff, Kai J; Sosnick, Marc H; Hsu, William T; Pande, Vijay S; Altman, Russ B

2011-08-15

Data clustering techniques are an essential component of a good data analysis toolbox. Many current bioinformatics applications are inherently compute-intense and work with very large datasets. Sequential algorithms are inadequate for providing the necessary performance. For this reason, we have created Clustering Algorithms for Massively Parallel Architectures, Including GPU Nodes (CAMPAIGN), a central resource for data clustering algorithms and tools that are implemented specifically for execution on massively parallel processing architectures. CAMPAIGN is a library of data clustering algorithms and tools, written in 'C for CUDA' for Nvidia GPUs. The library provides up to two orders of magnitude speed-up over respective CPU-based clustering algorithms and is intended as an open-source resource. New modules from the community will be accepted into the library and the layout of it is such that it can easily be extended to promising future platforms such as OpenCL. Releases of the CAMPAIGN library are freely available for download under the LGPL from https://simtk.org/home/campaign. Source code can also be obtained through anonymous subversion access as described on https://simtk.org/scm/?group_id=453. kjk33@cantab.net.
Distribution-based fuzzy clustering of electrical resistivity tomography images for interface detection

NASA Astrophysics Data System (ADS)

Ward, W. O. C.; Wilkinson, P. B.; Chambers, J. E.; Oxby, L. S.; Bai, L.

2014-04-01

A novel method for the effective identification of bedrock subsurface elevation from electrical resistivity tomography images is described. Identifying subsurface boundaries in the topographic data can be difficult due to smoothness constraints used in inversion, so a statistical population-based approach is used that extends previous work in calculating isoresistivity surfaces. The analysis framework involves a procedure for guiding a clustering approach based on the fuzzy c-means algorithm. An approximation of resistivity distributions, found using kernel density estimation, was utilized as a means of guiding the cluster centroids used to classify data. A fuzzy method was chosen over hard clustering due to uncertainty in hard edges in the topography data, and a measure of clustering uncertainty was identified based on the reciprocal of cluster membership. The algorithm was validated using a direct comparison of known observed bedrock depths at two 3-D survey sites, using real-time GPS information of exposed bedrock by quarrying on one site, and borehole logs at the other. Results show similarly accurate detection as a leading isosurface estimation method, and the proposed algorithm requires significantly less user input and prior site knowledge. Furthermore, the method is effectively dimension-independent and will scale to data of increased spatial dimensions without a significant effect on the runtime. A discussion on the results by automated versus supervised analysis is also presented.
An Improved Clustering Algorithm of Tunnel Monitoring Data for Cloud Computing

PubMed Central

Zhong, Luo; Tang, KunHao; Li, Lin; Yang, Guang; Ye, JingJing

2014-01-01

With the rapid development of urban construction, the number of urban tunnels is increasing and the data they produce become more and more complex. It results in the fact that the traditional clustering algorithm cannot handle the mass data of the tunnel. To solve this problem, an improved parallel clustering algorithm based on k-means has been proposed. It is a clustering algorithm using the MapReduce within cloud computing that deals with data. It not only has the advantage of being used to deal with mass data but also is more efficient. Moreover, it is able to compute the average dissimilarity degree of each cluster in order to clean the abnormal data. PMID:24982971
On the Accuracy and Parallelism of GPGPU-Powered Incremental Clustering Algorithms

PubMed Central

He, Li; Zheng, Hao; Wang, Lei

2017-01-01

Incremental clustering algorithms play a vital role in various applications such as massive data analysis and real-time data processing. Typical application scenarios of incremental clustering raise high demand on computing power of the hardware platform. Parallel computing is a common solution to meet this demand. Moreover, General Purpose Graphic Processing Unit (GPGPU) is a promising parallel computing device. Nevertheless, the incremental clustering algorithm is facing a dilemma between clustering accuracy and parallelism when they are powered by GPGPU. We formally analyzed the cause of this dilemma. First, we formalized concepts relevant to incremental clustering like evolving granularity. Second, we formally proved two theorems. The first theorem proves the relation between clustering accuracy and evolving granularity. Additionally, this theorem analyzes the upper and lower bounds of different-to-same mis-affiliation. Fewer occurrences of such mis-affiliation mean higher accuracy. The second theorem reveals the relation between parallelism and evolving granularity. Smaller work-depth means superior parallelism. Through the proofs, we conclude that accuracy of an incremental clustering algorithm is negatively related to evolving granularity while parallelism is positively related to the granularity. Thus the contradictory relations cause the dilemma. Finally, we validated the relations through a demo algorithm. Experiment results verified theoretical conclusions. PMID:29123546

On the Accuracy and Parallelism of GPGPU-Powered Incremental Clustering Algorithms.

PubMed

Chen, Chunlei; He, Li; Zhang, Huixiang; Zheng, Hao; Wang, Lei

2017-01-01

Incremental clustering algorithms play a vital role in various applications such as massive data analysis and real-time data processing. Typical application scenarios of incremental clustering raise high demand on computing power of the hardware platform. Parallel computing is a common solution to meet this demand. Moreover, General Purpose Graphic Processing Unit (GPGPU) is a promising parallel computing device. Nevertheless, the incremental clustering algorithm is facing a dilemma between clustering accuracy and parallelism when they are powered by GPGPU. We formally analyzed the cause of this dilemma. First, we formalized concepts relevant to incremental clustering like evolving granularity. Second, we formally proved two theorems. The first theorem proves the relation between clustering accuracy and evolving granularity. Additionally, this theorem analyzes the upper and lower bounds of different-to-same mis-affiliation. Fewer occurrences of such mis-affiliation mean higher accuracy. The second theorem reveals the relation between parallelism and evolving granularity. Smaller work-depth means superior parallelism. Through the proofs, we conclude that accuracy of an incremental clustering algorithm is negatively related to evolving granularity while parallelism is positively related to the granularity. Thus the contradictory relations cause the dilemma. Finally, we validated the relations through a demo algorithm. Experiment results verified theoretical conclusions.
Accurate Grid-based Clustering Algorithm with Diagonal Grid Searching and Merging

NASA Astrophysics Data System (ADS)

Liu, Feng; Ye, Chengcheng; Zhu, Erzhou

2017-09-01

Due to the advent of big data, data mining technology has attracted more and more attentions. As an important data analysis method, grid clustering algorithm is fast but with relatively lower accuracy. This paper presents an improved clustering algorithm combined with grid and density parameters. The algorithm first divides the data space into the valid meshes and invalid meshes through grid parameters. Secondly, from the starting point located at the first point of the diagonal of the grids, the algorithm takes the direction of “horizontal right, vertical down” to merge the valid meshes. Furthermore, by the boundary grid processing, the invalid grids are searched and merged when the adjacent left, above, and diagonal-direction grids are all the valid ones. By doing this, the accuracy of clustering is improved. The experimental results have shown that the proposed algorithm is accuracy and relatively faster when compared with some popularly used algorithms.
A Novel Artificial Bee Colony Based Clustering Algorithm for Categorical Data

PubMed Central

2015-01-01

Data with categorical attributes are ubiquitous in the real world. However, existing partitional clustering algorithms for categorical data are prone to fall into local optima. To address this issue, in this paper we propose a novel clustering algorithm, ABC-K-Modes (Artificial Bee Colony clustering based on K-Modes), based on the traditional k-modes clustering algorithm and the artificial bee colony approach. In our approach, we first introduce a one-step k-modes procedure, and then integrate this procedure with the artificial bee colony approach to deal with categorical data. In the search process performed by scout bees, we adopt the multi-source search inspired by the idea of batch processing to accelerate the convergence of ABC-K-Modes. The performance of ABC-K-Modes is evaluated by a series of experiments in comparison with that of the other popular algorithms for categorical data. PMID:25993469
A novel artificial bee colony based clustering algorithm for categorical data.

PubMed

Ji, Jinchao; Pang, Wei; Zheng, Yanlin; Wang, Zhe; Ma, Zhiqiang

2015-01-01

Data with categorical attributes are ubiquitous in the real world. However, existing partitional clustering algorithms for categorical data are prone to fall into local optima. To address this issue, in this paper we propose a novel clustering algorithm, ABC-K-Modes (Artificial Bee Colony clustering based on K-Modes), based on the traditional k-modes clustering algorithm and the artificial bee colony approach. In our approach, we first introduce a one-step k-modes procedure, and then integrate this procedure with the artificial bee colony approach to deal with categorical data. In the search process performed by scout bees, we adopt the multi-source search inspired by the idea of batch processing to accelerate the convergence of ABC-K-Modes. The performance of ABC-K-Modes is evaluated by a series of experiments in comparison with that of the other popular algorithms for categorical data.
An adaptive clustering algorithm for image matching based on corner feature

NASA Astrophysics Data System (ADS)

Wang, Zhe; Dong, Min; Mu, Xiaomin; Wang, Song

2018-04-01

The traditional image matching algorithm always can not balance the real-time and accuracy better, to solve the problem, an adaptive clustering algorithm for image matching based on corner feature is proposed in this paper. The method is based on the similarity of the matching pairs of vector pairs, and the adaptive clustering is performed on the matching point pairs. Harris corner detection is carried out first, the feature points of the reference image and the perceived image are extracted, and the feature points of the two images are first matched by Normalized Cross Correlation (NCC) function. Then, using the improved algorithm proposed in this paper, the matching results are clustered to reduce the ineffective operation and improve the matching speed and robustness. Finally, the Random Sample Consensus (RANSAC) algorithm is used to match the matching points after clustering. The experimental results show that the proposed algorithm can effectively eliminate the most wrong matching points while the correct matching points are retained, and improve the accuracy of RANSAC matching, reduce the computation load of whole matching process at the same time.
Analysis of basic clustering algorithms for numerical estimation of statistical averages in biomolecules.

PubMed

Anandakrishnan, Ramu; Onufriev, Alexey

2008-03-01

In statistical mechanics, the equilibrium properties of a physical system of particles can be calculated as the statistical average over accessible microstates of the system. In general, these calculations are computationally intractable since they involve summations over an exponentially large number of microstates. Clustering algorithms are one of the methods used to numerically approximate these sums. The most basic clustering algorithms first sub-divide the system into a set of smaller subsets (clusters). Then, interactions between particles within each cluster are treated exactly, while all interactions between different clusters are ignored. These smaller clusters have far fewer microstates, making the summation over these microstates, tractable. These algorithms have been previously used for biomolecular computations, but remain relatively unexplored in this context. Presented here, is a theoretical analysis of the error and computational complexity for the two most basic clustering algorithms that were previously applied in the context of biomolecular electrostatics. We derive a tight, computationally inexpensive, error bound for the equilibrium state of a particle computed via these clustering algorithms. For some practical applications, it is the root mean square error, which can be significantly lower than the error bound, that may be more important. We how that there is a strong empirical relationship between error bound and root mean square error, suggesting that the error bound could be used as a computationally inexpensive metric for predicting the accuracy of clustering algorithms for practical applications. An example of error analysis for such an application-computation of average charge of ionizable amino-acids in proteins-is given, demonstrating that the clustering algorithm can be accurate enough for practical purposes.
Chaotic map clustering algorithm for EEG analysis

NASA Astrophysics Data System (ADS)

Bellotti, R.; De Carlo, F.; Stramaglia, S.

2004-03-01

The non-parametric chaotic map clustering algorithm has been applied to the analysis of electroencephalographic signals, in order to recognize the Huntington's disease, one of the most dangerous pathologies of the central nervous system. The performance of the method has been compared with those obtained through parametric algorithms, as K-means and deterministic annealing, and supervised multi-layer perceptron. While supervised neural networks need a training phase, performed by means of data tagged by the genetic test, and the parametric methods require a prior choice of the number of classes to find, the chaotic map clustering gives a natural evidence of the pathological class, without any training or supervision, thus providing a new efficient methodology for the recognition of patterns affected by the Huntington's disease.
Location and Size Planning of Distributed Photovoltaic Generation in Distribution network System Based on K-means Clustering Analysis

NASA Astrophysics Data System (ADS)

Lu, Siqi; Wang, Xiaorong; Wu, Junyong

2018-01-01

The paper presents a method to generate the planning scenarios, which is based on K-means clustering analysis algorithm driven by data, for the location and size planning of distributed photovoltaic (PV) units in the network. Taken the power losses of the network, the installation and maintenance costs of distributed PV, the profit of distributed PV and the voltage offset as objectives and the locations and sizes of distributed PV as decision variables, Pareto optimal front is obtained through the self-adaptive genetic algorithm (GA) and solutions are ranked by a method called technique for order preference by similarity to an ideal solution (TOPSIS). Finally, select the planning schemes at the top of the ranking list based on different planning emphasis after the analysis in detail. The proposed method is applied to a 10-kV distribution network in Gansu Province, China and the results are discussed.
Towards enhancement of performance of K-means clustering using nature-inspired optimization algorithms.

PubMed

Fong, Simon; Deb, Suash; Yang, Xin-She; Zhuang, Yan

2014-01-01

Traditional K-means clustering algorithms have the drawback of getting stuck at local optima that depend on the random values of initial centroids. Optimization algorithms have their advantages in guiding iterative computation to search for global optima while avoiding local optima. The algorithms help speed up the clustering process by converging into a global optimum early with multiple search agents in action. Inspired by nature, some contemporary optimization algorithms which include Ant, Bat, Cuckoo, Firefly, and Wolf search algorithms mimic the swarming behavior allowing them to cooperatively steer towards an optimal objective within a reasonable time. It is known that these so-called nature-inspired optimization algorithms have their own characteristics as well as pros and cons in different applications. When these algorithms are combined with K-means clustering mechanism for the sake of enhancing its clustering quality by avoiding local optima and finding global optima, the new hybrids are anticipated to produce unprecedented performance. In this paper, we report the results of our evaluation experiments on the integration of nature-inspired optimization methods into K-means algorithms. In addition to the standard evaluation metrics in evaluating clustering quality, the extended K-means algorithms that are empowered by nature-inspired optimization methods are applied on image segmentation as a case study of application scenario.
An agglomerative hierarchical clustering approach to visualisation in Bayesian clustering problems

PubMed Central

Dawson, Kevin J.; Belkhir, Khalid

2009-01-01

Clustering problems (including the clustering of individuals into outcrossing populations, hybrid generations, full-sib families and selfing lines) have recently received much attention in population genetics. In these clustering problems, the parameter of interest is a partition of the set of sampled individuals, - the sample partition. In a fully Bayesian approach to clustering problems of this type, our knowledge about the sample partition is represented by a probability distribution on the space of possible sample partitions. Since the number of possible partitions grows very rapidly with the sample size, we can not visualise this probability distribution in its entirety, unless the sample is very small. As a solution to this visualisation problem, we recommend using an agglomerative hierarchical clustering algorithm, which we call the exact linkage algorithm. This algorithm is a special case of the maximin clustering algorithm that we introduced previously. The exact linkage algorithm is now implemented in our software package Partition View. The exact linkage algorithm takes the posterior co-assignment probabilities as input, and yields as output a rooted binary tree, - or more generally, a forest of such trees. Each node of this forest defines a set of individuals, and the node height is the posterior co-assignment probability of this set. This provides a useful visual representation of the uncertainty associated with the assignment of individuals to categories. It is also a useful starting point for a more detailed exploration of the posterior distribution in terms of the co-assignment probabilities. PMID:19337306
Automated segmentation of white matter fiber bundles using diffusion tensor imaging data and a new density based clustering algorithm.

PubMed

Kamali, Tahereh; Stashuk, Daniel

2016-10-01

Robust and accurate segmentation of brain white matter (WM) fiber bundles assists in diagnosing and assessing progression or remission of neuropsychiatric diseases such as schizophrenia, autism and depression. Supervised segmentation methods are infeasible in most applications since generating gold standards is too costly. Hence, there is a growing interest in designing unsupervised methods. However, most conventional unsupervised methods require the number of clusters be known in advance which is not possible in most applications. The purpose of this study is to design an unsupervised segmentation algorithm for brain white matter fiber bundles which can automatically segment fiber bundles using intrinsic diffusion tensor imaging data information without considering any prior information or assumption about data distributions. Here, a new density based clustering algorithm called neighborhood distance entropy consistency (NDEC), is proposed which discovers natural clusters within data by simultaneously utilizing both local and global density information. The performance of NDEC is compared with other state of the art clustering algorithms including chameleon, spectral clustering, DBSCAN and k-means using Johns Hopkins University publicly available diffusion tensor imaging data. The performance of NDEC and other employed clustering algorithms were evaluated using dice ratio as an external evaluation criteria and density based clustering validation (DBCV) index as an internal evaluation metric. Across all employed clustering algorithms, NDEC obtained the highest average dice ratio (0.94) and DBCV value (0.71). NDEC can find clusters with arbitrary shapes and densities and consequently can be used for WM fiber bundle segmentation where there is no distinct boundary between various bundles. NDEC may also be used as an effective tool in other pattern recognition and medical diagnostic systems in which discovering natural clusters within data is a necessity. Copyright �
Security clustering algorithm based on reputation in hierarchical peer-to-peer network

NASA Astrophysics Data System (ADS)

Chen, Mei; Luo, Xin; Wu, Guowen; Tan, Yang; Kita, Kenji

2013-03-01

For the security problems of the hierarchical P2P network (HPN), the paper presents a security clustering algorithm based on reputation (CABR). In the algorithm, we take the reputation mechanism for ensuring the security of transaction and use cluster for managing the reputation mechanism. In order to improve security, reduce cost of network brought by management of reputation and enhance stability of cluster, we select reputation, the historical average online time, and the network bandwidth as the basic factors of the comprehensive performance of node. Simulation results showed that the proposed algorithm improved the security, reduced the network overhead, and enhanced stability of cluster.
A Local Scalable Distributed Expectation Maximization Algorithm for Large Peer-to-Peer Networks

NASA Technical Reports Server (NTRS)

Bhaduri, Kanishka; Srivastava, Ashok N.

2009-01-01

This paper offers a local distributed algorithm for expectation maximization in large peer-to-peer environments. The algorithm can be used for a variety of well-known data mining tasks in a distributed environment such as clustering, anomaly detection, target tracking to name a few. This technology is crucial for many emerging peer-to-peer applications for bioinformatics, astronomy, social networking, sensor networks and web mining. Centralizing all or some of the data for building global models is impractical in such peer-to-peer environments because of the large number of data sources, the asynchronous nature of the peer-to-peer networks, and dynamic nature of the data/network. The distributed algorithm we have developed in this paper is provably-correct i.e. it converges to the same result compared to a similar centralized algorithm and can automatically adapt to changes to the data and the network. We show that the communication overhead of the algorithm is very low due to its local nature. This monitoring algorithm is then used as a feedback loop to sample data from the network and rebuild the model when it is outdated. We present thorough experimental results to verify our theoretical claims.
A scalable and practical one-pass clustering algorithm for recommender system

NASA Astrophysics Data System (ADS)

Khalid, Asra; Ghazanfar, Mustansar Ali; Azam, Awais; Alahmari, Saad Ali

2015-12-01

KMeans clustering-based recommendation algorithms have been proposed claiming to increase the scalability of recommender systems. One potential drawback of these algorithms is that they perform training offline and hence cannot accommodate the incremental updates with the arrival of new data, making them unsuitable for the dynamic environments. From this line of research, a new clustering algorithm called One-Pass is proposed, which is a simple, fast, and accurate. We show empirically that the proposed algorithm outperforms K-Means in terms of recommendation and training time while maintaining a good level of accuracy.
Response to "Comparison and Evaluation of Clustering Algorithms for Tandem Mass Spectra".

PubMed

Griss, Johannes; Perez-Riverol, Yasset; The, Matthew; Käll, Lukas; Vizcaíno, Juan Antonio

2018-05-04

In the recent benchmarking article entitled "Comparison and Evaluation of Clustering Algorithms for Tandem Mass Spectra", Rieder et al. compared several different approaches to cluster MS/MS spectra. While we certainly recognize the value of the manuscript, here, we report some shortcomings detected in the original analyses. For most analyses, the authors clustered only single MS/MS runs. In one of the reported analyses, three MS/MS runs were processed together, which already led to computational performance issues in many of the tested approaches. This fact highlights the difficulties of using many of the tested algorithms on the nowadays produced average proteomics data sets. Second, the authors only processed identified spectra when merging MS runs. Thereby, all unidentified spectra that are of lower quality were already removed from the data set and could not influence the clustering results. Next, we found that the authors did not analyze the effect of chimeric spectra on the clustering results. In our analysis, we found that 3% of the spectra in the used data sets were chimeric, and this had marked effects on the behavior of the different clustering algorithms tested. Finally, the authors' choice to evaluate the MS-Cluster and spectra-cluster algorithms using a precursor tolerance of 5 Da for high-resolution Orbitrap data only was, in our opinion, not adequate to assess the performance of MS/MS clustering approaches.
Parallel implementation of D-Phylo algorithm for maximum likelihood clusters.

PubMed

Malik, Shamita; Sharma, Dolly; Khatri, Sunil Kumar

2017-03-01

This study explains a newly developed parallel algorithm for phylogenetic analysis of DNA sequences. The newly designed D-Phylo is a more advanced algorithm for phylogenetic analysis using maximum likelihood approach. The D-Phylo while misusing the seeking capacity of k -means keeps away from its real constraint of getting stuck at privately conserved motifs. The authors have tested the behaviour of D-Phylo on Amazon Linux Amazon Machine Image(Hardware Virtual Machine)i2.4xlarge, six central processing unit, 122 GiB memory, 8 × 800 Solid-state drive Elastic Block Store volume, high network performance up to 15 processors for several real-life datasets. Distributing the clusters evenly on all the processors provides us the capacity to accomplish a near direct speed if there should arise an occurrence of huge number of processors.
Analysis of temporal gene expression profiles: clustering by simulated annealing and determining the optimal number of clusters.

PubMed

Lukashin, A V; Fuchs, R

2001-05-01

Cluster analysis of genome-wide expression data from DNA microarray hybridization studies has proved to be a useful tool for identifying biologically relevant groupings of genes and samples. In the present paper, we focus on several important issues related to clustering algorithms that have not yet been fully studied. We describe a simple and robust algorithm for the clustering of temporal gene expression profiles that is based on the simulated annealing procedure. In general, this algorithm guarantees to eventually find the globally optimal distribution of genes over clusters. We introduce an iterative scheme that serves to evaluate quantitatively the optimal number of clusters for each specific data set. The scheme is based on standard approaches used in regular statistical tests. The basic idea is to organize the search of the optimal number of clusters simultaneously with the optimization of the distribution of genes over clusters. The efficiency of the proposed algorithm has been evaluated by means of a reverse engineering experiment, that is, a situation in which the correct distribution of genes over clusters is known a priori. The employment of this statistically rigorous test has shown that our algorithm places greater than 90% genes into correct clusters. Finally, the algorithm has been tested on real gene expression data (expression changes during yeast cell cycle) for which the fundamental patterns of gene expression and the assignment of genes to clusters are well understood from numerous previous studies.
A novel harmony search-K means hybrid algorithm for clustering gene expression data

PubMed Central

Nazeer, KA Abdul; Sebastian, MP; Kumar, SD Madhu

2013-01-01

Recent progress in bioinformatics research has led to the accumulation of huge quantities of biological data at various data sources. The DNA microarray technology makes it possible to simultaneously analyze large number of genes across different samples. Clustering of microarray data can reveal the hidden gene expression patterns from large quantities of expression data that in turn offers tremendous possibilities in functional genomics, comparative genomics, disease diagnosis and drug development. The k- ¬means clustering algorithm is widely used for many practical applications. But the original k-¬means algorithm has several drawbacks. It is computationally expensive and generates locally optimal solutions based on the random choice of the initial centroids. Several methods have been proposed in the literature for improving the performance of the k-¬means algorithm. A meta-heuristic optimization algorithm named harmony search helps find out near-global optimal solutions by searching the entire solution space. Low clustering accuracy of the existing algorithms limits their use in many crucial applications of life sciences. In this paper we propose a novel Harmony Search-K means Hybrid (HSKH) algorithm for clustering the gene expression data. Experimental results show that the proposed algorithm produces clusters with better accuracy in comparison with the existing algorithms. PMID:23390351
A novel harmony search-K means hybrid algorithm for clustering gene expression data.

PubMed

Nazeer, Ka Abdul; Sebastian, Mp; Kumar, Sd Madhu

2013-01-01

Recent progress in bioinformatics research has led to the accumulation of huge quantities of biological data at various data sources. The DNA microarray technology makes it possible to simultaneously analyze large number of genes across different samples. Clustering of microarray data can reveal the hidden gene expression patterns from large quantities of expression data that in turn offers tremendous possibilities in functional genomics, comparative genomics, disease diagnosis and drug development. The k- ¬means clustering algorithm is widely used for many practical applications. But the original k-¬means algorithm has several drawbacks. It is computationally expensive and generates locally optimal solutions based on the random choice of the initial centroids. Several methods have been proposed in the literature for improving the performance of the k-¬means algorithm. A meta-heuristic optimization algorithm named harmony search helps find out near-global optimal solutions by searching the entire solution space. Low clustering accuracy of the existing algorithms limits their use in many crucial applications of life sciences. In this paper we propose a novel Harmony Search-K means Hybrid (HSKH) algorithm for clustering the gene expression data. Experimental results show that the proposed algorithm produces clusters with better accuracy in comparison with the existing algorithms.
Implementation of spectral clustering on microarray data of carcinoma using k-means algorithm

NASA Astrophysics Data System (ADS)

Frisca, Bustamam, Alhadi; Siswantining, Titin

2017-03-01

Clustering is one of data analysis methods that aims to classify data which have similar characteristics in the same group. Spectral clustering is one of the most popular modern clustering algorithms. As an effective clustering technique, spectral clustering method emerged from the concepts of spectral graph theory. Spectral clustering method needs partitioning algorithm. There are some partitioning methods including PAM, SOM, Fuzzy c-means, and k-means. Based on the research that has been done by Capital and Choudhury in 2013, when using Euclidian distance k-means algorithm provide better accuracy than PAM algorithm. So in this paper we use k-means as our partition algorithm. The major advantage of spectral clustering is in reducing data dimension, especially in this case to reduce the dimension of large microarray dataset. Microarray data is a small-sized chip made of a glass plate containing thousands and even tens of thousands kinds of genes in the DNA fragments derived from doubling cDNA. Application of microarray data is widely used to detect cancer, for the example is carcinoma, in which cancer cells express the abnormalities in his genes. The purpose of this research is to classify the data that have high similarity in the same group and the data that have low similarity in the others. In this research, Carcinoma microarray data using 7457 genes. The result of partitioning using k-means algorithm is two clusters.

Towards Enhancement of Performance of K-Means Clustering Using Nature-Inspired Optimization Algorithms

PubMed Central

Deb, Suash; Yang, Xin-She

2014-01-01

Traditional K-means clustering algorithms have the drawback of getting stuck at local optima that depend on the random values of initial centroids. Optimization algorithms have their advantages in guiding iterative computation to search for global optima while avoiding local optima. The algorithms help speed up the clustering process by converging into a global optimum early with multiple search agents in action. Inspired by nature, some contemporary optimization algorithms which include Ant, Bat, Cuckoo, Firefly, and Wolf search algorithms mimic the swarming behavior allowing them to cooperatively steer towards an optimal objective within a reasonable time. It is known that these so-called nature-inspired optimization algorithms have their own characteristics as well as pros and cons in different applications. When these algorithms are combined with K-means clustering mechanism for the sake of enhancing its clustering quality by avoiding local optima and finding global optima, the new hybrids are anticipated to produce unprecedented performance. In this paper, we report the results of our evaluation experiments on the integration of nature-inspired optimization methods into K-means algorithms. In addition to the standard evaluation metrics in evaluating clustering quality, the extended K-means algorithms that are empowered by nature-inspired optimization methods are applied on image segmentation as a case study of application scenario. PMID:25202730
Improved fuzzy clustering algorithms in segmentation of DC-enhanced breast MRI.

PubMed

Kannan, S R; Ramathilagam, S; Devi, Pandiyarajan; Sathya, A

2012-02-01

Segmentation of medical images is a difficult and challenging problem due to poor image contrast and artifacts that result in missing or diffuse organ/tissue boundaries. Many researchers have applied various techniques however fuzzy c-means (FCM) based algorithms is more effective compared to other methods. The objective of this work is to develop some robust fuzzy clustering segmentation systems for effective segmentation of DCE - breast MRI. This paper obtains the robust fuzzy clustering algorithms by incorporating kernel methods, penalty terms, tolerance of the neighborhood attraction, additional entropy term and fuzzy parameters. The initial centers are obtained using initialization algorithm to reduce the computation complexity and running time of proposed algorithms. Experimental works on breast images show that the proposed algorithms are effective to improve the similarity measurement, to handle large amount of noise, to have better results in dealing the data corrupted by noise, and other artifacts. The clustering results of proposed methods are validated using Silhouette Method.
An improved initialization center k-means clustering algorithm based on distance and density

NASA Astrophysics Data System (ADS)

Duan, Yanling; Liu, Qun; Xia, Shuyin

2018-04-01

Aiming at the problem of the random initial clustering center of k means algorithm that the clustering results are influenced by outlier data sample and are unstable in multiple clustering, a method of central point initialization method based on larger distance and higher density is proposed. The reciprocal of the weighted average of distance is used to represent the sample density, and the data sample with the larger distance and the higher density are selected as the initial clustering centers to optimize the clustering results. Then, a clustering evaluation method based on distance and density is designed to verify the feasibility of the algorithm and the practicality, the experimental results on UCI data sets show that the algorithm has a certain stability and practicality.
Distributed Multihop Clustering Approach for Wireless Sensor Networks

NASA Astrophysics Data System (ADS)

Israr, Nauman; Awan, Irfan

Prolonging the life time of Wireless Sensor Networks (WSNs) has been the focus of current research. One of the issues that needs to be addressed along with prolonging the network life time is to ensure uniform energy consumption across the network in WSNs especially in case of random network deployment. Cluster based routing algorithms are believed to be the best choice for WSNs because they work on the principle of divide and conquer and also improve the network life time considerably compared to flat based routing schemes. In this paper we propose a new routing strategy based on two layers clustering which exploits the redundancy property of the network in order to minimise duplicate data transmission and also make the intercluster and intracluster communication multihop. The proposed algorithm makes use of the nodes in a network whose area coverage is covered by the neighbouring nodes. These nodes are marked as temporary cluster heads and later use these temporary cluster heads randomly for multihop intercluster communication. Performance studies indicate that the proposed algorithm solves effectively the problem of load balancing across the network and is more energy efficient compared to the enhanced version of widely used Leach algorithm.
The application of mixed recommendation algorithm with user clustering in the microblog advertisements promotion

NASA Astrophysics Data System (ADS)

Gong, Lina; Xu, Tao; Zhang, Wei; Li, Xuhong; Wang, Xia; Pan, Wenwen

2017-03-01

The traditional microblog recommendation algorithm has the problems of low efficiency and modest effect in the era of big data. In the aim of solving these issues, this paper proposed a mixed recommendation algorithm with user clustering. This paper first introduced the situation of microblog marketing industry. Then, this paper elaborates the user interest modeling process and detailed advertisement recommendation methods. Finally, this paper compared the mixed recommendation algorithm with the traditional classification algorithm and mixed recommendation algorithm without user clustering. The results show that the mixed recommendation algorithm with user clustering has good accuracy and recall rate in the microblog advertisements promotion.
The Mucciardi-Gose Clustering Algorithm and Its Applications in Automatic Pattern Recognition.

DTIC Science & Technology

A procedure known as the Mucciardi- Gose clustering algorithm, CLUSTR, for determining the geometrical or statistical relationships among groups of N...discussion of clustering algorithms is given; the particular advantages of the Mucciardi- Gose procedure are described. The mathematical basis for, and the
The ROSAT Brightest Cluster Sample - I. The compilation of the sample and the cluster log N-log S distribution

NASA Astrophysics Data System (ADS)

Ebeling, H.; Edge, A. C.; Bohringer, H.; Allen, S. W.; Crawford, C. S.; Fabian, A. C.; Voges, W.; Huchra, J. P.

1998-12-01

We present a 90 per cent flux-complete sample of the 201 X-ray-brightest clusters of galaxies in the northern hemisphere (delta>=0 deg), at high Galactic latitudes (|b|>=20 deg), with measured redshifts z<=0.3 and fluxes higher than 4.4x10^-12 erg cm^-2 s^-1 in the 0.1-2.4 keV band. The sample, called the ROSAT Brightest Cluster Sample (BCS), is selected from ROSAT All-Sky Survey data and is the largest X-ray-selected cluster sample compiled to date. In addition to Abell clusters, which form the bulk of the sample, the BCS also contains the X-ray-brightest Zwicky clusters and other clusters selected from their X-ray properties alone. Effort has been made to ensure the highest possible completeness of the sample and the smallest possible contamination by non-cluster X-ray sources. X-ray fluxes are computed using an algorithm tailored for the detection and characterization of X-ray emission from galaxy clusters. These fluxes are accurate to better than 15 per cent (mean 1sigma error). We find the cumulative logN-logS distribution of clusters to follow a power law kappa S^alpha with alpha=1.31^+0.06_-0.03 (errors are the 10th and 90th percentiles) down to fluxes of 2x10^-12 erg cm^-2 s^-1, i.e. considerably below the BCS flux limit. Although our best-fitting slope disagrees formally with the canonical value of -1.5 for a Euclidean distribution, the BCS logN-logS distribution is consistent with a non-evolving cluster population if cosmological effects are taken into account. Our sample will allow us to examine large-scale structure in the northern hemisphere, determine the spatial cluster-cluster correlation function, investigate correlations between the X-ray and optical properties of the clusters, establish the X-ray luminosity function for galaxy clusters, and discuss the implications of the results for cluster evolution.
ABCluster: the artificial bee colony algorithm for cluster global optimization.

PubMed

Zhang, Jun; Dolg, Michael

2015-10-07

Global optimization of cluster geometries is of fundamental importance in chemistry and an interesting problem in applied mathematics. In this work, we introduce a relatively new swarm intelligence algorithm, i.e. the artificial bee colony (ABC) algorithm proposed in 2005, to this field. It is inspired by the foraging behavior of a bee colony, and only three parameters are needed to control it. We applied it to several potential functions of quite different nature, i.e., the Coulomb-Born-Mayer, Lennard-Jones, Morse, Z and Gupta potentials. The benchmarks reveal that for long-ranged potentials the ABC algorithm is very efficient in locating the global minimum, while for short-ranged ones it is sometimes trapped into a local minimum funnel on a potential energy surface of large clusters. We have released an efficient, user-friendly, and free program "ABCluster" to realize the ABC algorithm. It is a black-box program for non-experts as well as experts and might become a useful tool for chemists to study clusters.
Image Registration Algorithm Based on Parallax Constraint and Clustering Analysis

NASA Astrophysics Data System (ADS)

Wang, Zhe; Dong, Min; Mu, Xiaomin; Wang, Song

2018-01-01

To resolve the problem of slow computation speed and low matching accuracy in image registration, a new image registration algorithm based on parallax constraint and clustering analysis is proposed. Firstly, Harris corner detection algorithm is used to extract the feature points of two images. Secondly, use Normalized Cross Correlation (NCC) function to perform the approximate matching of feature points, and the initial feature pair is obtained. Then, according to the parallax constraint condition, the initial feature pair is preprocessed by K-means clustering algorithm, which is used to remove the feature point pairs with obvious errors in the approximate matching process. Finally, adopt Random Sample Consensus (RANSAC) algorithm to optimize the feature points to obtain the final feature point matching result, and the fast and accurate image registration is realized. The experimental results show that the image registration algorithm proposed in this paper can improve the accuracy of the image matching while ensuring the real-time performance of the algorithm.
A fast density-based clustering algorithm for real-time Internet of Things stream.

PubMed

Amini, Amineh; Saboohi, Hadi; Wah, Teh Ying; Herawan, Tutut

2014-01-01

Data streams are continuously generated over time from Internet of Things (IoT) devices. The faster all of this data is analyzed, its hidden trends and patterns discovered, and new strategies created, the faster action can be taken, creating greater value for organizations. Density-based method is a prominent class in clustering data streams. It has the ability to detect arbitrary shape clusters, to handle outlier, and it does not need the number of clusters in advance. Therefore, density-based clustering algorithm is a proper choice for clustering IoT streams. Recently, several density-based algorithms have been proposed for clustering data streams. However, density-based clustering in limited time is still a challenging issue. In this paper, we propose a density-based clustering algorithm for IoT streams. The method has fast processing time to be applicable in real-time application of IoT devices. Experimental results show that the proposed approach obtains high quality results with low computation time on real and synthetic datasets.
Are judgments a form of data clustering? Reexamining contrast effects with the k-means algorithm.

PubMed

Boillaud, Eric; Molina, Guylaine

2015-04-01

A number of theories have been proposed to explain in precise mathematical terms how statistical parameters and sequential properties of stimulus distributions affect category ratings. Various contextual factors such as the mean, the midrange, and the median of the stimuli; the stimulus range; the percentile rank of each stimulus; and the order of appearance have been assumed to influence judgmental contrast. A data clustering reinterpretation of judgmental relativity is offered wherein the influence of the initial choice of centroids on judgmental contrast involves 2 combined frequency and consistency tendencies. Accounts of the k-means algorithm are provided, showing good agreement with effects observed on multiple distribution shapes and with a variety of interaction effects relating to the number of stimuli, the number of response categories, and the method of skewing. Experiment 1 demonstrates that centroid initialization accounts for contrast effects obtained with stretched distributions. Experiment 2 demonstrates that the iterative convergence inherent to the k-means algorithm accounts for the contrast reduction observed across repeated blocks of trials. The concept of within-cluster variance minimization is discussed, as is the applicability of a backward k-means calculation method for inferring, from empirical data, the values of the centroids that would serve as a representation of the judgmental context. (c) 2015 APA, all rights reserved.
Cluster galaxy population evolution from the Subaru Hyper Suprime-Cam survey: brightest cluster galaxies, stellar mass distribution, and active galaxies

NASA Astrophysics Data System (ADS)

Lin, Yen-Ting; Hsieh, Bau-Ching; Lin, Sheng-Chieh; Oguri, Masamune; Chen, Kai-Feng; Tanaka, Masayuki; Chiu, I.-non; Huang, Song; Kodama, Tadayuki; Leauthaud, Alexie; More, Surhud; Nishizawa, Atsushi J.; Bundy, Kevin; Lin, Lihwai; Miyazaki, Satoshi; HSC Collaboration

2018-01-01

The unprecedented depth and area surveyed by the Subaru Strategic Program with the Hyper Suprime-Cam (HSC-SSP) have enabled us to construct and publish the largest distant cluster sample out to z~1 to date. In this exploratory study of cluster galaxy evolution from z=1 to z=0.3, we investigate the stellar mass assembly history of brightest cluster galaxies (BCGs), and evolution of stellar mass and luminosity distributions, stellar mass surface density profile, as well as the population of radio galaxies. Our analysis is the first high redshift application of the top N richest cluster selection, which is shown to allow us to trace the cluster galaxy evolution faithfully. Our stellar mass is derived from a machine-learning algorithm, which we show to be unbiased and accurate with respect to the COSMOS data. We find very mild stellar mass growth in BCGs, and no evidence for evolution in both the total stellar mass-cluster mass correlation and the shape of the stellar mass surface density profile. The clusters are found to contain more red galaxies compared to the expectations from the field, even after the differences in density between the two environments have been taken into account. We also present the first measurement of the radio luminosity distribution in clusters out to z~1.
A Clustering Algorithm for Ecological Stream Segment Identification from Spatially Extensive Digital Databases

NASA Astrophysics Data System (ADS)

Brenden, T. O.; Clark, R. D.; Wiley, M. J.; Seelbach, P. W.; Wang, L.

2005-05-01

Remote sensing and geographic information systems have made it possible to attribute variables for streams at increasingly detailed resolutions (e.g., individual river reaches). Nevertheless, management decisions still must be made at large scales because land and stream managers typically lack sufficient resources to manage on an individual reach basis. Managers thus require a method for identifying stream management units that are ecologically similar and that can be expected to respond similarly to management decisions. We have developed a spatially-constrained clustering algorithm that can merge neighboring river reaches with similar ecological characteristics into larger management units. The clustering algorithm is based on the Cluster Affinity Search Technique (CAST), which was developed for clustering gene expression data. Inputs to the clustering algorithm are the neighbor relationships of the reaches that comprise the digital river network, the ecological attributes of the reaches, and an affinity value, which identifies the minimum similarity for merging river reaches. In this presentation, we describe the clustering algorithm in greater detail and contrast its use with other methods (expert opinion, classification approach, regular clustering) for identifying management units using several Michigan watersheds as a backdrop.
Performance analysis of unsupervised optimal fuzzy clustering algorithm for MRI brain tumor segmentation.

PubMed

Blessy, S A Praylin Selva; Sulochana, C Helen

2015-01-01

Segmentation of brain tumor from Magnetic Resonance Imaging (MRI) becomes very complicated due to the structural complexities of human brain and the presence of intensity inhomogeneities. To propose a method that effectively segments brain tumor from MR images and to evaluate the performance of unsupervised optimal fuzzy clustering (UOFC) algorithm for segmentation of brain tumor from MR images. Segmentation is done by preprocessing the MR image to standardize intensity inhomogeneities followed by feature extraction, feature fusion and clustering. Different validation measures are used to evaluate the performance of the proposed method using different clustering algorithms. The proposed method using UOFC algorithm produces high sensitivity (96%) and low specificity (4%) compared to other clustering methods. Validation results clearly show that the proposed method with UOFC algorithm effectively segments brain tumor from MR images.
A Multilevel Gamma-Clustering Layout Algorithm for Visualization of Biological Networks

PubMed Central

Hruz, Tomas; Lucas, Christoph; Laule, Oliver; Zimmermann, Philip

2013-01-01

Visualization of large complex networks has become an indispensable part of systems biology, where organisms need to be considered as one complex system. The visualization of the corresponding network is challenging due to the size and density of edges. In many cases, the use of standard visualization algorithms can lead to high running times and poorly readable visualizations due to many edge crossings. We suggest an approach that analyzes the structure of the graph first and then generates a new graph which contains specific semantic symbols for regular substructures like dense clusters. We propose a multilevel gamma-clustering layout visualization algorithm (MLGA) which proceeds in three subsequent steps: (i) a multilevel γ-clustering is used to identify the structure of the underlying network, (ii) the network is transformed to a tree, and (iii) finally, the resulting tree which shows the network structure is drawn using a variation of a force-directed algorithm. The algorithm has a potential to visualize very large networks because it uses modern clustering heuristics which are optimized for large graphs. Moreover, most of the edges are removed from the visual representation which allows keeping the overview over complex graphs with dense subgraphs. PMID:23864855
Distributed cluster management techniques for unattended ground sensor networks

NASA Astrophysics Data System (ADS)

Essawy, Magdi A.; Stelzig, Chad A.; Bevington, James E.; Minor, Sharon

2005-05-01

Smart Sensor Networks are becoming important target detection and tracking tools. The challenging problems in such networks include the sensor fusion, data management and communication schemes. This work discusses techniques used to distribute sensor management and multi-target tracking responsibilities across an ad hoc, self-healing cluster of sensor nodes. Although miniaturized computing resources possess the ability to host complex tracking and data fusion algorithms, there still exist inherent bandwidth constraints on the RF channel. Therefore, special attention is placed on the reduction of node-to-node communications within the cluster by minimizing unsolicited messaging, and distributing the sensor fusion and tracking tasks onto local portions of the network. Several challenging problems are addressed in this work including track initialization and conflict resolution, track ownership handling, and communication control optimization. Emphasis is also placed on increasing the overall robustness of the sensor cluster through independent decision capabilities on all sensor nodes. Track initiation is performed using collaborative sensing within a neighborhood of sensor nodes, allowing each node to independently determine if initial track ownership should be assumed. This autonomous track initiation prevents the formation of duplicate tracks while eliminating the need for a central "management" node to assign tracking responsibilities. Track update is performed as an ownership node requests sensor reports from neighboring nodes based on track error covariance and the neighboring nodes geo-positional location. Track ownership is periodically recomputed using propagated track states to determine which sensing node provides the desired coverage characteristics. High fidelity multi-target simulation results are presented, indicating the distribution of sensor management and tracking capabilities to not only reduce communication bandwidth consumption, but to also
Density-based cluster algorithms for the identification of core sets

NASA Astrophysics Data System (ADS)

Lemke, Oliver; Keller, Bettina G.

2016-10-01

The core-set approach is a discretization method for Markov state models of complex molecular dynamics. Core sets are disjoint metastable regions in the conformational space, which need to be known prior to the construction of the core-set model. We propose to use density-based cluster algorithms to identify the cores. We compare three different density-based cluster algorithms: the CNN, the DBSCAN, and the Jarvis-Patrick algorithm. While the core-set models based on the CNN and DBSCAN clustering are well-converged, constructing core-set models based on the Jarvis-Patrick clustering cannot be recommended. In a well-converged core-set model, the number of core sets is up to an order of magnitude smaller than the number of states in a conventional Markov state model with comparable approximation error. Moreover, using the density-based clustering one can extend the core-set method to systems which are not strongly metastable. This is important for the practical application of the core-set method because most biologically interesting systems are only marginally metastable. The key point is to perform a hierarchical density-based clustering while monitoring the structure of the metric matrix which appears in the core-set method. We test this approach on a molecular-dynamics simulation of a highly flexible 14-residue peptide. The resulting core-set models have a high spatial resolution and can distinguish between conformationally similar yet chemically different structures, such as register-shifted hairpin structures.
A Self-Adaptive Fuzzy c-Means Algorithm for Determining the Optimal Number of Clusters

PubMed Central

Wang, Zhihao; Yi, Jing

2016-01-01

For the shortcoming of fuzzy c-means algorithm (FCM) needing to know the number of clusters in advance, this paper proposed a new self-adaptive method to determine the optimal number of clusters. Firstly, a density-based algorithm was put forward. The algorithm, according to the characteristics of the dataset, automatically determined the possible maximum number of clusters instead of using the empirical rule n and obtained the optimal initial cluster centroids, improving the limitation of FCM that randomly selected cluster centroids lead the convergence result to the local minimum. Secondly, this paper, by introducing a penalty function, proposed a new fuzzy clustering validity index based on fuzzy compactness and separation, which ensured that when the number of clusters verged on that of objects in the dataset, the value of clustering validity index did not monotonically decrease and was close to zero, so that the optimal number of clusters lost robustness and decision function. Then, based on these studies, a self-adaptive FCM algorithm was put forward to estimate the optimal number of clusters by the iterative trial-and-error process. At last, experiments were done on the UCI, KDD Cup 1999, and synthetic datasets, which showed that the method not only effectively determined the optimal number of clusters, but also reduced the iteration of FCM with the stable clustering result. PMID:28042291
A Fast Density-Based Clustering Algorithm for Real-Time Internet of Things Stream

PubMed Central

Ying Wah, Teh

2014-01-01

Data streams are continuously generated over time from Internet of Things (IoT) devices. The faster all of this data is analyzed, its hidden trends and patterns discovered, and new strategies created, the faster action can be taken, creating greater value for organizations. Density-based method is a prominent class in clustering data streams. It has the ability to detect arbitrary shape clusters, to handle outlier, and it does not need the number of clusters in advance. Therefore, density-based clustering algorithm is a proper choice for clustering IoT streams. Recently, several density-based algorithms have been proposed for clustering data streams. However, density-based clustering in limited time is still a challenging issue. In this paper, we propose a density-based clustering algorithm for IoT streams. The method has fast processing time to be applicable in real-time application of IoT devices. Experimental results show that the proposed approach obtains high quality results with low computation time on real and synthetic datasets. PMID:25110753
Clustering algorithm evaluation and the development of a replacement for procedure 1. [for crop inventories

NASA Technical Reports Server (NTRS)

Lennington, R. K.; Johnson, J. K.

1979-01-01

An efficient procedure which clusters data using a completely unsupervised clustering algorithm and then uses labeled pixels to label the resulting clusters or perform a stratified estimate using the clusters as strata is developed. Three clustering algorithms, CLASSY, AMOEBA, and ISOCLS, are compared for efficiency. Three stratified estimation schemes and three labeling schemes are also considered and compared.

A cluster pattern algorithm for the analysis of multiparametric cell assays.

PubMed

Kaufman, Menachem; Bloch, David; Zurgil, Naomi; Shafran, Yana; Deutsch, Mordechai

2005-09-01

The issue of multiparametric analysis of complex single cell assays of both static and flow cytometry (SC and FC, respectively) has become common in recent years. In such assays, the analysis of changes, applying common statistical parameters and tests, often fails to detect significant differences between the investigated samples. The cluster pattern similarity (CPS) measure between two sets of gated clusters is based on computing the difference between their density distribution functions' set points. The CPS was applied for the discrimination between two observations in a four-dimensional parameter space. The similarity coefficient (r) ranges between 0 (perfect similarity) to 1 (dissimilar). Three CPS validation tests were carried out: on the same stock samples of fluorescent beads, yielding very low r's (0, 0.066); and on two cell models: mitogenic stimulation of peripheral blood mononuclear cells (PBMC), and apoptosis induction in Jurkat T cell line by H2O2. In both latter cases, r indicated similarity (r < 0.23) within the same group, and dissimilarity (r > 0.48) otherwise. This classification and algorithm approach offers a measure of similarity between samples. It relies on the multidimensional pattern of the sample parameters. The algorithm compensates for environmental drifts in this apparatus and assay; it also may be applied to more than four dimensions.
High-dimensional cluster analysis with the Masked EM Algorithm

PubMed Central

Kadir, Shabnam N.; Goodman, Dan F. M.; Harris, Kenneth D.

2014-01-01

Cluster analysis faces two problems in high dimensions: first, the “curse of dimensionality” that can lead to overfitting and poor generalization performance; and second, the sheer time taken for conventional algorithms to process large amounts of high-dimensional data. We describe a solution to these problems, designed for the application of “spike sorting” for next-generation high channel-count neural probes. In this problem, only a small subset of features provide information about the cluster member-ship of any one data vector, but this informative feature subset is not the same for all data points, rendering classical feature selection ineffective. We introduce a “Masked EM” algorithm that allows accurate and time-efficient clustering of up to millions of points in thousands of dimensions. We demonstrate its applicability to synthetic data, and to real-world high-channel-count spike sorting data. PMID:25149694
Algorithm Calculates Cumulative Poisson Distribution

NASA Technical Reports Server (NTRS)

Bowerman, Paul N.; Nolty, Robert C.; Scheuer, Ernest M.

1992-01-01

Algorithm calculates accurate values of cumulative Poisson distribution under conditions where other algorithms fail because numbers are so small (underflow) or so large (overflow) that computer cannot process them. Factors inserted temporarily to prevent underflow and overflow. Implemented in CUMPOIS computer program described in "Cumulative Poisson Distribution Program" (NPO-17714).
Block clustering based on difference of convex functions (DC) programming and DC algorithms.

PubMed

Le, Hoai Minh; Le Thi, Hoai An; Dinh, Tao Pham; Huynh, Van Ngai

2013-10-01

We investigate difference of convex functions (DC) programming and the DC algorithm (DCA) to solve the block clustering problem in the continuous framework, which traditionally requires solving a hard combinatorial optimization problem. DC reformulation techniques and exact penalty in DC programming are developed to build an appropriate equivalent DC program of the block clustering problem. They lead to an elegant and explicit DCA scheme for the resulting DC program. Computational experiments show the robustness and efficiency of the proposed algorithm and its superiority over standard algorithms such as two-mode K-means, two-mode fuzzy clustering, and block classification EM.
An Enhanced PSO-Based Clustering Energy Optimization Algorithm for Wireless Sensor Network.

PubMed

Vimalarani, C; Subramanian, R; Sivanandam, S N

2016-01-01

Wireless Sensor Network (WSN) is a network which formed with a maximum number of sensor nodes which are positioned in an application environment to monitor the physical entities in a target area, for example, temperature monitoring environment, water level, monitoring pressure, and health care, and various military applications. Mostly sensor nodes are equipped with self-supported battery power through which they can perform adequate operations and communication among neighboring nodes. Maximizing the lifetime of the Wireless Sensor networks, energy conservation measures are essential for improving the performance of WSNs. This paper proposes an Enhanced PSO-Based Clustering Energy Optimization (EPSO-CEO) algorithm for Wireless Sensor Network in which clustering and clustering head selection are done by using Particle Swarm Optimization (PSO) algorithm with respect to minimizing the power consumption in WSN. The performance metrics are evaluated and results are compared with competitive clustering algorithm to validate the reduction in energy consumption.
The comparison of automated clustering algorithms for resampling representative conformer ensembles with RMSD matrix.

PubMed

Kim, Hyoungrae; Jang, Cheongyun; Yadav, Dharmendra K; Kim, Mi-Hyun

2017-03-23

The accuracy of any 3D-QSAR, Pharmacophore and 3D-similarity based chemometric target fishing models are highly dependent on a reasonable sample of active conformations. Since a number of diverse conformational sampling algorithm exist, which exhaustively generate enough conformers, however model building methods relies on explicit number of common conformers. In this work, we have attempted to make clustering algorithms, which could find reasonable number of representative conformer ensembles automatically with asymmetric dissimilarity matrix generated from openeye tool kit. RMSD was the important descriptor (variable) of each column of the N × N matrix considered as N variables describing the relationship (network) between the conformer (in a row) and the other N conformers. This approach used to evaluate the performance of the well-known clustering algorithms by comparison in terms of generating representative conformer ensembles and test them over different matrix transformation functions considering the stability. In the network, the representative conformer group could be resampled for four kinds of algorithms with implicit parameters. The directed dissimilarity matrix becomes the only input to the clustering algorithms. Dunn index, Davies-Bouldin index, Eta-squared values and omega-squared values were used to evaluate the clustering algorithms with respect to the compactness and the explanatory power. The evaluation includes the reduction (abstraction) rate of the data, correlation between the sizes of the population and the samples, the computational complexity and the memory usage as well. Every algorithm could find representative conformers automatically without any user intervention, and they reduced the data to 14-19% of the original values within 1.13 s per sample at the most. The clustering methods are simple and practical as they are fast and do not ask for any explicit parameters. RCDTC presented the maximum Dunn and omega-squared values of the
Orientation domains: A mobile grid clustering algorithm with spherical corrections

NASA Astrophysics Data System (ADS)

Mencos, Joana; Gratacós, Oscar; Farré, Mercè; Escalante, Joan; Arbués, Pau; Muñoz, Josep Anton

2012-12-01

An algorithm has been designed and tested which was devised as a tool assisting the analysis of geological structures solely from orientation data. More specifically, the algorithm was intended for the analysis of geological structures that can be approached as planar and piecewise features, like many folded strata. Input orientation data is expressed as pairs of angles (azimuth and dip). The algorithm starts by considering the data in Cartesian coordinates. This is followed by a search for an initial clustering solution, which is achieved by comparing the results output from the systematic shift of a regular rigid grid over the data. This initial solution is optimal (achieves minimum square error) once the grid size and the shift increment are fixed. Finally, the algorithm corrects for the variable spread that is generally expected from the data type using a reshaped non-rigid grid. The algorithm is size-oriented, which implies the application of conditions over cluster size through all the process in contrast to density-oriented algorithms, also widely used when dealing with spatial data. Results are derived in few seconds and, when tested over synthetic examples, they were found to be consistent and reliable. This makes the algorithm a valuable alternative to the time-consuming traditional approaches available to geologists.
Tsallis p⊥ distribution from statistical clusters

NASA Astrophysics Data System (ADS)

Bialas, A.

2015-07-01

It is shown that the transverse momentum distributions of particles emerging from the decay of statistical clusters, distributed according to a power law in their transverse energy, closely resemble those following from the Tsallis non-extensive statistical model. The experimental data are well reproduced with the cluster temperature T ≈ 160 MeV.
Privacy Preservation in Distributed Subgradient Optimization Algorithms.

PubMed

Lou, Youcheng; Yu, Lean; Wang, Shouyang; Yi, Peng

2017-07-31

In this paper, some privacy-preserving features for distributed subgradient optimization algorithms are considered. Most of the existing distributed algorithms focus mainly on the algorithm design and convergence analysis, but not the protection of agents' privacy. Privacy is becoming an increasingly important issue in applications involving sensitive information. In this paper, we first show that the distributed subgradient synchronous homogeneous-stepsize algorithm is not privacy preserving in the sense that the malicious agent can asymptotically discover other agents' subgradients by transmitting untrue estimates to its neighbors. Then a distributed subgradient asynchronous heterogeneous-stepsize projection algorithm is proposed and accordingly its convergence and optimality is established. In contrast to the synchronous homogeneous-stepsize algorithm, in the new algorithm agents make their optimization updates asynchronously with heterogeneous stepsizes. The introduced two mechanisms of projection operation and asynchronous heterogeneous-stepsize optimization can guarantee that agents' privacy can be effectively protected.
The CLASSY clustering algorithm: Description, evaluation, and comparison with the iterative self-organizing clustering system (ISOCLS). [used for LACIE data

NASA Technical Reports Server (NTRS)

Lennington, R. K.; Malek, H.

1978-01-01

A clustering method, CLASSY, was developed, which alternates maximum likelihood iteration with a procedure for splitting, combining, and eliminating the resulting statistics. The method maximizes the fit of a mixture of normal distributions to the observed first through fourth central moments of the data and produces an estimate of the proportions, means, and covariances in this mixture. The mathematical model which is the basic for CLASSY and the actual operation of the algorithm is described. Data comparing the performances of CLASSY and ISOCLS on simulated and actual LACIE data are presented.
A clustering algorithm for sample data based on environmental pollution characteristics

NASA Astrophysics Data System (ADS)

Chen, Mei; Wang, Pengfei; Chen, Qiang; Wu, Jiadong; Chen, Xiaoyun

2015-04-01

Environmental pollution has become an issue of serious international concern in recent years. Among the receptor-oriented pollution models, CMB, PMF, UNMIX, and PCA are widely used as source apportionment models. To improve the accuracy of source apportionment and classify the sample data for these models, this study proposes an easy-to-use, high-dimensional EPC algorithm that not only organizes all of the sample data into different groups according to the similarities in pollution characteristics such as pollution sources and concentrations but also simultaneously detects outliers. The main clustering process consists of selecting the first unlabelled point as the cluster centre, then assigning each data point in the sample dataset to its most similar cluster centre according to both the user-defined threshold and the value of similarity function in each iteration, and finally modifying the clusters using a method similar to k-Means. The validity and accuracy of the algorithm are tested using both real and synthetic datasets, which makes the EPC algorithm practical and effective for appropriately classifying sample data for source apportionment models and helpful for better understanding and interpreting the sources of pollution.
Improved Gravitation Field Algorithm and Its Application in Hierarchical Clustering

PubMed Central

Zheng, Ming; Sun, Ying; Liu, Gui-xia; Zhou, You; Zhou, Chun-guang

2012-01-01

Background Gravitation field algorithm (GFA) is a new optimization algorithm which is based on an imitation of natural phenomena. GFA can do well both for searching global minimum and multi-minima in computational biology. But GFA needs to be improved for increasing efficiency, and modified for applying to some discrete data problems in system biology. Method An improved GFA called IGFA was proposed in this paper. Two parts were improved in IGFA. The first one is the rule of random division, which is a reasonable strategy and makes running time shorter. The other one is rotation factor, which can improve the accuracy of IGFA. And to apply IGFA to the hierarchical clustering, the initial part and the movement operator were modified. Results Two kinds of experiments were used to test IGFA. And IGFA was applied to hierarchical clustering. The global minimum experiment was used with IGFA, GFA, GA (genetic algorithm) and SA (simulated annealing). Multi-minima experiment was used with IGFA and GFA. The two experiments results were compared with each other and proved the efficiency of IGFA. IGFA is better than GFA both in accuracy and running time. For the hierarchical clustering, IGFA is used to optimize the smallest distance of genes pairs, and the results were compared with GA and SA, singular-linkage clustering, UPGMA. The efficiency of IGFA is proved. PMID:23173043
A dynamic scheduling algorithm for singe-arm two-cluster tools with flexible processing times

NASA Astrophysics Data System (ADS)

Li, Xin; Fung, Richard Y. K.

2018-02-01

This article presents a dynamic algorithm for job scheduling in two-cluster tools producing multi-type wafers with flexible processing times. Flexible processing times mean that the actual times for processing wafers should be within given time intervals. The objective of the work is to minimize the completion time of the newly inserted wafer. To deal with this issue, a two-cluster tool is decomposed into three reduced single-cluster tools (RCTs) in a series based on a decomposition approach proposed in this article. For each single-cluster tool, a dynamic scheduling algorithm based on temporal constraints is developed to schedule the newly inserted wafer. Three experiments have been carried out to test the dynamic scheduling algorithm proposed, comparing with the results the 'earliest starting time' heuristic (EST) adopted in previous literature. The results show that the dynamic algorithm proposed in this article is effective and practical.
Spatial cluster detection using dynamic programming.

PubMed

Sverchkov, Yuriy; Jiang, Xia; Cooper, Gregory F

2012-03-25

The task of spatial cluster detection involves finding spatial regions where some property deviates from the norm or the expected value. In a probabilistic setting this task can be expressed as finding a region where some event is significantly more likely than usual. Spatial cluster detection is of interest in fields such as biosurveillance, mining of astronomical data, military surveillance, and analysis of fMRI images. In almost all such applications we are interested both in the question of whether a cluster exists in the data, and if it exists, we are interested in finding the most accurate characterization of the cluster. We present a general dynamic programming algorithm for grid-based spatial cluster detection. The algorithm can be used for both Bayesian maximum a-posteriori (MAP) estimation of the most likely spatial distribution of clusters and Bayesian model averaging over a large space of spatial cluster distributions to compute the posterior probability of an unusual spatial clustering. The algorithm is explained and evaluated in the context of a biosurveillance application, specifically the detection and identification of Influenza outbreaks based on emergency department visits. A relatively simple underlying model is constructed for the purpose of evaluating the algorithm, and the algorithm is evaluated using the model and semi-synthetic test data. When compared to baseline methods, tests indicate that the new algorithm can improve MAP estimates under certain conditions: the greedy algorithm we compared our method to was found to be more sensitive to smaller outbreaks, while as the size of the outbreaks increases, in terms of area affected and proportion of individuals affected, our method overtakes the greedy algorithm in spatial precision and recall. The new algorithm performs on-par with baseline methods in the task of Bayesian model averaging. We conclude that the dynamic programming algorithm performs on-par with other available methods for
Spatial cluster detection using dynamic programming

PubMed Central

2012-01-01

Background The task of spatial cluster detection involves finding spatial regions where some property deviates from the norm or the expected value. In a probabilistic setting this task can be expressed as finding a region where some event is significantly more likely than usual. Spatial cluster detection is of interest in fields such as biosurveillance, mining of astronomical data, military surveillance, and analysis of fMRI images. In almost all such applications we are interested both in the question of whether a cluster exists in the data, and if it exists, we are interested in finding the most accurate characterization of the cluster. Methods We present a general dynamic programming algorithm for grid-based spatial cluster detection. The algorithm can be used for both Bayesian maximum a-posteriori (MAP) estimation of the most likely spatial distribution of clusters and Bayesian model averaging over a large space of spatial cluster distributions to compute the posterior probability of an unusual spatial clustering. The algorithm is explained and evaluated in the context of a biosurveillance application, specifically the detection and identification of Influenza outbreaks based on emergency department visits. A relatively simple underlying model is constructed for the purpose of evaluating the algorithm, and the algorithm is evaluated using the model and semi-synthetic test data. Results When compared to baseline methods, tests indicate that the new algorithm can improve MAP estimates under certain conditions: the greedy algorithm we compared our method to was found to be more sensitive to smaller outbreaks, while as the size of the outbreaks increases, in terms of area affected and proportion of individuals affected, our method overtakes the greedy algorithm in spatial precision and recall. The new algorithm performs on-par with baseline methods in the task of Bayesian model averaging. Conclusions We conclude that the dynamic programming algorithm performs on
A new collaborative recommendation approach based on users clustering using artificial bee colony algorithm.

PubMed

Ju, Chunhua; Xu, Chonghuan

2013-01-01

Although there are many good collaborative recommendation methods, it is still a challenge to increase the accuracy and diversity of these methods to fulfill users' preferences. In this paper, we propose a novel collaborative filtering recommendation approach based on K-means clustering algorithm. In the process of clustering, we use artificial bee colony (ABC) algorithm to overcome the local optimal problem caused by K-means. After that we adopt the modified cosine similarity to compute the similarity between users in the same clusters. Finally, we generate recommendation results for the corresponding target users. Detailed numerical analysis on a benchmark dataset MovieLens and a real-world dataset indicates that our new collaborative filtering approach based on users clustering algorithm outperforms many other recommendation methods.
A New Collaborative Recommendation Approach Based on Users Clustering Using Artificial Bee Colony Algorithm

PubMed Central

Ju, Chunhua

2013-01-01

Although there are many good collaborative recommendation methods, it is still a challenge to increase the accuracy and diversity of these methods to fulfill users' preferences. In this paper, we propose a novel collaborative filtering recommendation approach based on K-means clustering algorithm. In the process of clustering, we use artificial bee colony (ABC) algorithm to overcome the local optimal problem caused by K-means. After that we adopt the modified cosine similarity to compute the similarity between users in the same clusters. Finally, we generate recommendation results for the corresponding target users. Detailed numerical analysis on a benchmark dataset MovieLens and a real-world dataset indicates that our new collaborative filtering approach based on users clustering algorithm outperforms many other recommendation methods. PMID:24381525
Study on Data Clustering and Intelligent Decision Algorithm of Indoor Localization

NASA Astrophysics Data System (ADS)

Liu, Zexi

2018-01-01

Indoor positioning technology enables the human beings to have the ability of positional perception in architectural space, and there is a shortage of single network coverage and the problem of location data redundancy. So this article puts forward the indoor positioning data clustering algorithm and intelligent decision-making research, design the basic ideas of multi-source indoor positioning technology, analyzes the fingerprint localization algorithm based on distance measurement, position and orientation of inertial device integration. By optimizing the clustering processing of massive indoor location data, the data normalization pretreatment, multi-dimensional controllable clustering center and multi-factor clustering are realized, and the redundancy of locating data is reduced. In addition, the path is proposed based on neural network inference and decision, design the sparse data input layer, the dynamic feedback hidden layer and output layer, low dimensional results improve the intelligent navigation path planning.
Functional Principal Component Analysis and Randomized Sparse Clustering Algorithm for Medical Image Analysis

PubMed Central

Lin, Nan; Jiang, Junhai; Guo, Shicheng; Xiong, Momiao

2015-01-01

Due to the advancement in sensor technology, the growing large medical image data have the ability to visualize the anatomical changes in biological tissues. As a consequence, the medical images have the potential to enhance the diagnosis of disease, the prediction of clinical outcomes and the characterization of disease progression. But in the meantime, the growing data dimensions pose great methodological and computational challenges for the representation and selection of features in image cluster analysis. To address these challenges, we first extend the functional principal component analysis (FPCA) from one dimension to two dimensions to fully capture the space variation of image the signals. The image signals contain a large number of redundant features which provide no additional information for clustering analysis. The widely used methods for removing the irrelevant features are sparse clustering algorithms using a lasso-type penalty to select the features. However, the accuracy of clustering using a lasso-type penalty depends on the selection of the penalty parameters and the threshold value. In practice, they are difficult to determine. Recently, randomized algorithms have received a great deal of attentions in big data analysis. This paper presents a randomized algorithm for accurate feature selection in image clustering analysis. The proposed method is applied to both the liver and kidney cancer histology image data from the TCGA database. The results demonstrate that the randomized feature selection method coupled with functional principal component analysis substantially outperforms the current sparse clustering algorithms in image cluster analysis. PMID:26196383
Cluster dynamics and cluster size distributions in systems of self-propelled particles

NASA Astrophysics Data System (ADS)

Peruani, F.; Schimansky-Geier, L.; Bär, M.

2010-12-01

Systems of self-propelled particles (SPP) interacting by a velocity alignment mechanism in the presence of noise exhibit rich clustering dynamics. Often, clusters are responsible for the distribution of (local) information in these systems. Here, we investigate the properties of individual clusters in SPP systems, in particular the asymmetric spreading behavior of clusters with respect to their direction of motion. In addition, we formulate a Smoluchowski-type kinetic model to describe the evolution of the cluster size distribution (CSD). This model predicts the emergence of steady-state CSDs in SPP systems. We test our theoretical predictions in simulations of SPP with nematic interactions and find that our simple kinetic model reproduces qualitatively the transition to aggregation observed in simulations.

An enhanced deterministic K-Means clustering algorithm for cancer subtype prediction from gene expression data.

PubMed

Nidheesh, N; Abdul Nazeer, K A; Ameer, P M

2017-12-01

Clustering algorithms with steps involving randomness usually give different results on different executions for the same dataset. This non-deterministic nature of algorithms such as the K-Means clustering algorithm limits their applicability in areas such as cancer subtype prediction using gene expression data. It is hard to sensibly compare the results of such algorithms with those of other algorithms. The non-deterministic nature of K-Means is due to its random selection of data points as initial centroids. We propose an improved, density based version of K-Means, which involves a novel and systematic method for selecting initial centroids. The key idea of the algorithm is to select data points which belong to dense regions and which are adequately separated in feature space as the initial centroids. We compared the proposed algorithm to a set of eleven widely used single clustering algorithms and a prominent ensemble clustering algorithm which is being used for cancer data classification, based on the performances on a set of datasets comprising ten cancer gene expression datasets. The proposed algorithm has shown better overall performance than the others. There is a pressing need in the Biomedical domain for simple, easy-to-use and more accurate Machine Learning tools for cancer subtype prediction. The proposed algorithm is simple, easy-to-use and gives stable results. Moreover, it provides comparatively better predictions of cancer subtypes from gene expression data. Copyright © 2017 Elsevier Ltd. All rights reserved.
An effective trust-based recommendation method using a novel graph clustering algorithm

NASA Astrophysics Data System (ADS)

Moradi, Parham; Ahmadian, Sajad; Akhlaghian, Fardin

2015-10-01

Recommender systems are programs that aim to provide personalized recommendations to users for specific items (e.g. music, books) in online sharing communities or on e-commerce sites. Collaborative filtering methods are important and widely accepted types of recommender systems that generate recommendations based on the ratings of like-minded users. On the other hand, these systems confront several inherent issues such as data sparsity and cold start problems, caused by fewer ratings against the unknowns that need to be predicted. Incorporating trust information into the collaborative filtering systems is an attractive approach to resolve these problems. In this paper, we present a model-based collaborative filtering method by applying a novel graph clustering algorithm and also considering trust statements. In the proposed method first of all, the problem space is represented as a graph and then a sparsest subgraph finding algorithm is applied on the graph to find the initial cluster centers. Then, the proposed graph clustering algorithm is performed to obtain the appropriate users/items clusters. Finally, the identified clusters are used as a set of neighbors to recommend unseen items to the current active user. Experimental results based on three real-world datasets demonstrate that the proposed method outperforms several state-of-the-art recommender system methods.
An improved K-means clustering algorithm in agricultural image segmentation

NASA Astrophysics Data System (ADS)

Cheng, Huifeng; Peng, Hui; Liu, Shanmei

Image segmentation is the first important step to image analysis and image processing. In this paper, according to color crops image characteristics, we firstly transform the color space of image from RGB to HIS, and then select proper initial clustering center and cluster number in application of mean-variance approach and rough set theory followed by clustering calculation in such a way as to automatically segment color component rapidly and extract target objects from background accurately, which provides a reliable basis for identification, analysis, follow-up calculation and process of crops images. Experimental results demonstrate that improved k-means clustering algorithm is able to reduce the computation amounts and enhance precision and accuracy of clustering.
Global survey of star clusters in the Milky Way. VI. Age distribution and cluster formation history

NASA Astrophysics Data System (ADS)

Piskunov, A. E.; Just, A.; Kharchenko, N. V.; Berczik, P.; Scholz, R.-D.; Reffert, S.; Yen, S. X.

2018-06-01

Context. The all-sky Milky Way Star Clusters (MWSC) survey provides uniform and precise ages, along with other relevant parameters, for a wide variety of clusters in the extended solar neighbourhood. Aims: In this study we aim to construct the cluster age distribution, investigate its spatial variations, and discuss constraints on cluster formation scenarios of the Galactic disk during the last 5 Gyrs. Methods: Due to the spatial extent of the MWSC, we have considered spatial variations of the age distribution along galactocentric radius RG, and along Z-axis. For the analysis of the age distribution we used 2242 clusters, which all lie within roughly 2.5 kpc of the Sun. To connect the observed age distribution to the cluster formation history we built an analytical model based on simple assumptions on the cluster initial mass function and on the cluster mass-lifetime relation, fit it to the observations, and determined the parameters of the cluster formation law. Results: Comparison with the literature shows that earlier results strongly underestimated the number of evolved clusters with ages t ≳ 100 Myr. Recent studies based on all-sky catalogues agree better with our data, but still lack the oldest clusters with ages t ≳ 1 Gyr. We do not observe a strong variation in the age distribution along RG, though we find an enhanced fraction of older clusters (t > 1 Gyr) in the inner disk. In contrast, the distribution strongly varies along Z. The high altitude distribution practically does not contain clusters with t < 1 Gyr. With simple assumptions on the cluster formation history, the cluster initial mass function and the cluster lifetime we can reproduce the observations. The cluster formation rate and the cluster lifetime are strongly degenerate, which does not allow us to disentangle different formation scenarios. In all cases the cluster formation rate is strongly declining with time, and the cluster initial mass function is very shallow at the high mass end.
The cascaded moving k-means and fuzzy c-means clustering algorithms for unsupervised segmentation of malaria images

NASA Astrophysics Data System (ADS)

Abdul-Nasir, Aimi Salihah; Mashor, Mohd Yusoff; Halim, Nurul Hazwani Abd; Mohamed, Zeehaida

2015-05-01

Malaria is a life-threatening parasitic infectious disease that corresponds for nearly one million deaths each year. Due to the requirement of prompt and accurate diagnosis of malaria, the current study has proposed an unsupervised pixel segmentation based on clustering algorithm in order to obtain the fully segmented red blood cells (RBCs) infected with malaria parasites based on the thin blood smear images of P. vivax species. In order to obtain the segmented infected cell, the malaria images are first enhanced by using modified global contrast stretching technique. Then, an unsupervised segmentation technique based on clustering algorithm has been applied on the intensity component of malaria image in order to segment the infected cell from its blood cells background. In this study, cascaded moving k-means (MKM) and fuzzy c-means (FCM) clustering algorithms has been proposed for malaria slide image segmentation. After that, median filter algorithm has been applied to smooth the image as well as to remove any unwanted regions such as small background pixels from the image. Finally, seeded region growing area extraction algorithm has been applied in order to remove large unwanted regions that are still appeared on the image due to their size in which cannot be cleaned by using median filter. The effectiveness of the proposed cascaded MKM and FCM clustering algorithms has been analyzed qualitatively and quantitatively by comparing the proposed cascaded clustering algorithm with MKM and FCM clustering algorithms. Overall, the results indicate that segmentation using the proposed cascaded clustering algorithm has produced the best segmentation performances by achieving acceptable sensitivity as well as high specificity and accuracy values compared to the segmentation results provided by MKM and FCM algorithms.
Improved Quantum Artificial Fish Algorithm Application to Distributed Network Considering Distributed Generation.

PubMed

Du, Tingsong; Hu, Yang; Ke, Xianting

2015-01-01

An improved quantum artificial fish swarm algorithm (IQAFSA) for solving distributed network programming considering distributed generation is proposed in this work. The IQAFSA based on quantum computing which has exponential acceleration for heuristic algorithm uses quantum bits to code artificial fish and quantum revolving gate, preying behavior, and following behavior and variation of quantum artificial fish to update the artificial fish for searching for optimal value. Then, we apply the proposed new algorithm, the quantum artificial fish swarm algorithm (QAFSA), the basic artificial fish swarm algorithm (BAFSA), and the global edition artificial fish swarm algorithm (GAFSA) to the simulation experiments for some typical test functions, respectively. The simulation results demonstrate that the proposed algorithm can escape from the local extremum effectively and has higher convergence speed and better accuracy. Finally, applying IQAFSA to distributed network problems and the simulation results for 33-bus radial distribution network system show that IQAFSA can get the minimum power loss after comparing with BAFSA, GAFSA, and QAFSA.
Improved Quantum Artificial Fish Algorithm Application to Distributed Network Considering Distributed Generation

PubMed Central

Hu, Yang; Ke, Xianting

2015-01-01

An improved quantum artificial fish swarm algorithm (IQAFSA) for solving distributed network programming considering distributed generation is proposed in this work. The IQAFSA based on quantum computing which has exponential acceleration for heuristic algorithm uses quantum bits to code artificial fish and quantum revolving gate, preying behavior, and following behavior and variation of quantum artificial fish to update the artificial fish for searching for optimal value. Then, we apply the proposed new algorithm, the quantum artificial fish swarm algorithm (QAFSA), the basic artificial fish swarm algorithm (BAFSA), and the global edition artificial fish swarm algorithm (GAFSA) to the simulation experiments for some typical test functions, respectively. The simulation results demonstrate that the proposed algorithm can escape from the local extremum effectively and has higher convergence speed and better accuracy. Finally, applying IQAFSA to distributed network problems and the simulation results for 33-bus radial distribution network system show that IQAFSA can get the minimum power loss after comparing with BAFSA, GAFSA, and QAFSA. PMID:26447713
An Energy-Efficient Spectrum-Aware Reinforcement Learning-Based Clustering Algorithm for Cognitive Radio Sensor Networks

PubMed Central

Mustapha, Ibrahim; Ali, Borhanuddin Mohd; Rasid, Mohd Fadlee A.; Sali, Aduwati; Mohamad, Hafizal

2015-01-01

It is well-known that clustering partitions network into logical groups of nodes in order to achieve energy efficiency and to enhance dynamic channel access in cognitive radio through cooperative sensing. While the topic of energy efficiency has been well investigated in conventional wireless sensor networks, the latter has not been extensively explored. In this paper, we propose a reinforcement learning-based spectrum-aware clustering algorithm that allows a member node to learn the energy and cooperative sensing costs for neighboring clusters to achieve an optimal solution. Each member node selects an optimal cluster that satisfies pairwise constraints, minimizes network energy consumption and enhances channel sensing performance through an exploration technique. We first model the network energy consumption and then determine the optimal number of clusters for the network. The problem of selecting an optimal cluster is formulated as a Markov Decision Process (MDP) in the algorithm and the obtained simulation results show convergence, learning and adaptability of the algorithm to dynamic environment towards achieving an optimal solution. Performance comparisons of our algorithm with the Groupwise Spectrum Aware (GWSA)-based algorithm in terms of Sum of Square Error (SSE), complexity, network energy consumption and probability of detection indicate improved performance from the proposed approach. The results further reveal that an energy savings of 9% and a significant Primary User (PU) detection improvement can be achieved with the proposed approach. PMID:26287191
An Energy-Efficient Spectrum-Aware Reinforcement Learning-Based Clustering Algorithm for Cognitive Radio Sensor Networks.

PubMed

Mustapha, Ibrahim; Mohd Ali, Borhanuddin; Rasid, Mohd Fadlee A; Sali, Aduwati; Mohamad, Hafizal

2015-08-13

It is well-known that clustering partitions network into logical groups of nodes in order to achieve energy efficiency and to enhance dynamic channel access in cognitive radio through cooperative sensing. While the topic of energy efficiency has been well investigated in conventional wireless sensor networks, the latter has not been extensively explored. In this paper, we propose a reinforcement learning-based spectrum-aware clustering algorithm that allows a member node to learn the energy and cooperative sensing costs for neighboring clusters to achieve an optimal solution. Each member node selects an optimal cluster that satisfies pairwise constraints, minimizes network energy consumption and enhances channel sensing performance through an exploration technique. We first model the network energy consumption and then determine the optimal number of clusters for the network. The problem of selecting an optimal cluster is formulated as a Markov Decision Process (MDP) in the algorithm and the obtained simulation results show convergence, learning and adaptability of the algorithm to dynamic environment towards achieving an optimal solution. Performance comparisons of our algorithm with the Groupwise Spectrum Aware (GWSA)-based algorithm in terms of Sum of Square Error (SSE), complexity, network energy consumption and probability of detection indicate improved performance from the proposed approach. The results further reveal that an energy savings of 9% and a significant Primary User (PU) detection improvement can be achieved with the proposed approach.
Efficient algorithms for accurate hierarchical clustering of huge datasets: tackling the entire protein space.

PubMed

Loewenstein, Yaniv; Portugaly, Elon; Fromer, Menachem; Linial, Michal

2008-07-01

UPGMA (average linking) is probably the most popular algorithm for hierarchical data clustering, especially in computational biology. However, UPGMA requires the entire dissimilarity matrix in memory. Due to this prohibitive requirement, UPGMA is not scalable to very large datasets. We present a novel class of memory-constrained UPGMA (MC-UPGMA) algorithms. Given any practical memory size constraint, this framework guarantees the correct clustering solution without explicitly requiring all dissimilarities in memory. The algorithms are general and are applicable to any dataset. We present a data-dependent characterization of hardness and clustering efficiency. The presented concepts are applicable to any agglomerative clustering formulation. We apply our algorithm to the entire collection of protein sequences, to automatically build a comprehensive evolutionary-driven hierarchy of proteins from sequence alone. The newly created tree captures protein families better than state-of-the-art large-scale methods such as CluSTr, ProtoNet4 or single-linkage clustering. We demonstrate that leveraging the entire mass embodied in all sequence similarities allows to significantly improve on current protein family clusterings which are unable to directly tackle the sheer mass of this data. Furthermore, we argue that non-metric constraints are an inherent complexity of the sequence space and should not be overlooked. The robustness of UPGMA allows significant improvement, especially for multidomain proteins, and for large or divergent families. A comprehensive tree built from all UniProt sequence similarities, together with navigation and classification tools will be made available as part of the ProtoNet service. A C++ implementation of the algorithm is available on request.
Swarm Intelligence in Text Document Clustering

DOE Office of Scientific and Technical Information (OSTI.GOV)

Cui, Xiaohui; Potok, Thomas E

2008-01-01

Social animals or insects in nature often exhibit a form of emergent collective behavior. The research field that attempts to design algorithms or distributed problem-solving devices inspired by the collective behavior of social insect colonies is called Swarm Intelligence. Compared to the traditional algorithms, the swarm algorithms are usually flexible, robust, decentralized and self-organized. These characters make the swarm algorithms suitable for solving complex problems, such as document collection clustering. The major challenge of today's information society is being overwhelmed with information on any topic they are searching for. Fast and high-quality document clustering algorithms play an important role inmore » helping users to effectively navigate, summarize, and organize the overwhelmed information. In this chapter, we introduce three nature inspired swarm intelligence clustering approaches for document clustering analysis. These clustering algorithms use stochastic and heuristic principles discovered from observing bird flocks, fish schools and ant food forage.« less
Performance Assessment of the Optical Transient Detector and Lightning Imaging Sensor. Part 2; Clustering Algorithm

NASA Technical Reports Server (NTRS)

Mach, Douglas M.; Christian, Hugh J.; Blakeslee, Richard; Boccippio, Dennis J.; Goodman, Steve J.; Boeck, William

2006-01-01

We describe the clustering algorithm used by the Lightning Imaging Sensor (LIS) and the Optical Transient Detector (OTD) for combining the lightning pulse data into events, groups, flashes, and areas. Events are single pixels that exceed the LIS/OTD background level during a single frame (2 ms). Groups are clusters of events that occur within the same frame and in adjacent pixels. Flashes are clusters of groups that occur within 330 ms and either 5.5 km (for LIS) or 16.5 km (for OTD) of each other. Areas are clusters of flashes that occur within 16.5 km of each other. Many investigators are utilizing the LIS/OTD flash data; therefore, we test how variations in the algorithms for the event group and group-flash clustering affect the flash count for a subset of the LIS data. We divided the subset into areas with low (1-3), medium (4-15), high (16-63), and very high (64+) flashes to see how changes in the clustering parameters affect the flash rates in these different sizes of areas. We found that as long as the cluster parameters are within about a factor of two of the current values, the flash counts do not change by more than about 20%. Therefore, the flash clustering algorithm used by the LIS and OTD sensors create flash rates that are relatively insensitive to reasonable variations in the clustering algorithms.
Estimating global distribution of boreal, temperate, and tropical tree plant functional types using clustering techniques

NASA Astrophysics Data System (ADS)

Wang, Audrey; Price, David T.

2007-03-01

A simple integrated algorithm was developed to relate global climatology to distributions of tree plant functional types (PFT). Multivariate cluster analysis was performed to analyze the statistical homogeneity of the climate space occupied by individual tree PFTs. Forested regions identified from the satellite-based GLC2000 classification were separated into tropical, temperate, and boreal sub-PFTs for use in the Canadian Terrestrial Ecosystem Model (CTEM). Global data sets of monthly minimum temperature, growing degree days, an index of climatic moisture, and estimated PFT cover fractions were then used as variables in the cluster analysis. The statistical results for individual PFT clusters were found consistent with other global-scale classifications of dominant vegetation. As an improvement of the quantification of the climatic limitations on PFT distributions, the results also demonstrated overlapping of PFT cluster boundaries that reflected vegetation transitions, for example, between tropical and temperate biomes. The resulting global database should provide a better basis for simulating the interaction of climate change and terrestrial ecosystem dynamics using global vegetation models.
`Inter-Arrival Time' Inspired Algorithm and its Application in Clustering and Molecular Phylogeny

NASA Astrophysics Data System (ADS)

Kolekar, Pandurang S.; Kale, Mohan M.; Kulkarni-Kale, Urmila

2010-10-01

Bioinformatics, being multidisciplinary field, involves applications of various methods from allied areas of Science for data mining using computational approaches. Clustering and molecular phylogeny is one of the key areas in Bioinformatics, which help in study of classification and evolution of organisms. Molecular phylogeny algorithms can be divided into distance based and character based methods. But most of these methods are dependent on pre-alignment of sequences and become computationally intensive with increase in size of data and hence demand alternative efficient approaches. `Inter arrival time distribution' (IATD) is a popular concept in the theory of stochastic system modeling but its potential in molecular data analysis has not been fully explored. The present study reports application of IATD in Bioinformatics for clustering and molecular phylogeny. The proposed method provides IATDs of nucleotides in genomic sequences. The distance function based on statistical parameters of IATDs is proposed and distance matrix thus obtained is used for the purpose of clustering and molecular phylogeny. The method is applied on a dataset of 3' non-coding region sequences (NCR) of Dengue virus type 3 (DENV-3), subtype III, reported in 2008. The phylogram thus obtained revealed the geographical distribution of DENV-3 isolates. Sri Lankan DENV-3 isolates were further observed to be clustered in two sub-clades corresponding to pre and post Dengue hemorrhagic fever emergence groups. These results are consistent with those reported earlier, which are obtained using pre-aligned sequence data as an input. These findings encourage applications of the IATD based method in molecular phylogenetic analysis in particular and data mining in general.
A comparison of queueing, cluster and distributed computing systems

NASA Technical Reports Server (NTRS)

Kaplan, Joseph A.; Nelson, Michael L.

1993-01-01

Using workstation clusters for distributed computing has become popular with the proliferation of inexpensive, powerful workstations. Workstation clusters offer both a cost effective alternative to batch processing and an easy entry into parallel computing. However, a number of workstations on a network does not constitute a cluster. Cluster management software is necessary to harness the collective computing power. A variety of cluster management and queuing systems are compared: Distributed Queueing Systems (DQS), Condor, Load Leveler, Load Balancer, Load Sharing Facility (LSF - formerly Utopia), Distributed Job Manager (DJM), Computing in Distributed Networked Environments (CODINE), and NQS/Exec. The systems differ in their design philosophy and implementation. Based on published reports on the different systems and conversations with the system's developers and vendors, a comparison of the systems are made on the integral issues of clustered computing.
LYDIAN: An Extensible Educational Animation Environment for Distributed Algorithms

ERIC Educational Resources Information Center

Koldehofe, Boris; Papatriantafilou, Marina; Tsigas, Philippas

2006-01-01

LYDIAN is an environment to support the teaching and learning of distributed algorithms. It provides a collection of distributed algorithms as well as continuous animations. Users can combine algorithms and animations with arbitrary network structures defining the interconnection and behavior of the distributed algorithm. Further, it facilitates…
Scalable clustering algorithms for continuous environmental flow cytometry.

PubMed

Hyrkas, Jeremy; Clayton, Sophie; Ribalet, Francois; Halperin, Daniel; Armbrust, E Virginia; Howe, Bill

2016-02-01

Recent technological innovations in flow cytometry now allow oceanographers to collect high-frequency flow cytometry data from particles in aquatic environments on a scale far surpassing conventional flow cytometers. The SeaFlow cytometer continuously profiles microbial phytoplankton populations across thousands of kilometers of the surface ocean. The data streams produced by instruments such as SeaFlow challenge the traditional sample-by-sample approach in cytometric analysis and highlight the need for scalable clustering algorithms to extract population information from these large-scale, high-frequency flow cytometers. We explore how available algorithms commonly used for medical applications perform at classification of such a large-scale, environmental flow cytometry data. We apply large-scale Gaussian mixture models to massive datasets using Hadoop. This approach outperforms current state-of-the-art cytometry classification algorithms in accuracy and can be coupled with manual or automatic partitioning of data into homogeneous sections for further classification gains. We propose the Gaussian mixture model with partitioning approach for classification of large-scale, high-frequency flow cytometry data. Source code available for download at https://github.com/jhyrkas/seaflow_cluster, implemented in Java for use with Hadoop. hyrkas@cs.washington.edu Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
A hybrid algorithm for clustering of time series data based on affinity search technique.

PubMed

Aghabozorgi, Saeed; Ying Wah, Teh; Herawan, Tutut; Jalab, Hamid A; Shaygan, Mohammad Amin; Jalali, Alireza

2014-01-01

Time series clustering is an important solution to various problems in numerous fields of research, including business, medical science, and finance. However, conventional clustering algorithms are not practical for time series data because they are essentially designed for static data. This impracticality results in poor clustering accuracy in several systems. In this paper, a new hybrid clustering algorithm is proposed based on the similarity in shape of time series data. Time series data are first grouped as subclusters based on similarity in time. The subclusters are then merged using the k-Medoids algorithm based on similarity in shape. This model has two contributions: (1) it is more accurate than other conventional and hybrid approaches and (2) it determines the similarity in shape among time series data with a low complexity. To evaluate the accuracy of the proposed model, the model is tested extensively using syntactic and real-world time series datasets.
Efficient algorithms for accurate hierarchical clustering of huge datasets: tackling the entire protein space

PubMed Central

Loewenstein, Yaniv; Portugaly, Elon; Fromer, Menachem; Linial, Michal

2008-01-01

Motivation: UPGMA (average linking) is probably the most popular algorithm for hierarchical data clustering, especially in computational biology. However, UPGMA requires the entire dissimilarity matrix in memory. Due to this prohibitive requirement, UPGMA is not scalable to very large datasets. Application: We present a novel class of memory-constrained UPGMA (MC-UPGMA) algorithms. Given any practical memory size constraint, this framework guarantees the correct clustering solution without explicitly requiring all dissimilarities in memory. The algorithms are general and are applicable to any dataset. We present a data-dependent characterization of hardness and clustering efficiency. The presented concepts are applicable to any agglomerative clustering formulation. Results: We apply our algorithm to the entire collection of protein sequences, to automatically build a comprehensive evolutionary-driven hierarchy of proteins from sequence alone. The newly created tree captures protein families better than state-of-the-art large-scale methods such as CluSTr, ProtoNet4 or single-linkage clustering. We demonstrate that leveraging the entire mass embodied in all sequence similarities allows to significantly improve on current protein family clusterings which are unable to directly tackle the sheer mass of this data. Furthermore, we argue that non-metric constraints are an inherent complexity of the sequence space and should not be overlooked. The robustness of UPGMA allows significant improvement, especially for multidomain proteins, and for large or divergent families. Availability: A comprehensive tree built from all UniProt sequence similarities, together with navigation and classification tools will be made available as part of the ProtoNet service. A C++ implementation of the algorithm is available on request. Contact: lonshy@cs.huji.ac.il PMID:18586742
Two generalizations of Kohonen clustering

NASA Technical Reports Server (NTRS)

Bezdek, James C.; Pal, Nikhil R.; Tsao, Eric C. K.

1993-01-01

The relationship between the sequential hard c-means (SHCM), learning vector quantization (LVQ), and fuzzy c-means (FCM) clustering algorithms is discussed. LVQ and SHCM suffer from several major problems. For example, they depend heavily on initialization. If the initial values of the cluster centers are outside the convex hull of the input data, such algorithms, even if they terminate, may not produce meaningful results in terms of prototypes for cluster representation. This is due in part to the fact that they update only the winning prototype for every input vector. The impact and interaction of these two families with Kohonen's self-organizing feature mapping (SOFM), which is not a clustering method, but which often leads ideas to clustering algorithms is discussed. Then two generalizations of LVQ that are explicitly designed as clustering algorithms are presented; these algorithms are referred to as generalized LVQ = GLVQ; and fuzzy LVQ = FLVQ. Learning rules are derived to optimize an objective function whose goal is to produce 'good clusters'. GLVQ/FLVQ (may) update every node in the clustering net for each input vector. Neither GLVQ nor FLVQ depends upon a choice for the update neighborhood or learning rate distribution - these are taken care of automatically. Segmentation of a gray tone image is used as a typical application of these algorithms to illustrate the performance of GLVQ/FLVQ.

Design of multiple sequence alignment algorithms on parallel, distributed memory supercomputers.

PubMed

Church, Philip C; Goscinski, Andrzej; Holt, Kathryn; Inouye, Michael; Ghoting, Amol; Makarychev, Konstantin; Reumann, Matthias

2011-01-01

The challenge of comparing two or more genomes that have undergone recombination and substantial amounts of segmental loss and gain has recently been addressed for small numbers of genomes. However, datasets of hundreds of genomes are now common and their sizes will only increase in the future. Multiple sequence alignment of hundreds of genomes remains an intractable problem due to quadratic increases in compute time and memory footprint. To date, most alignment algorithms are designed for commodity clusters without parallelism. Hence, we propose the design of a multiple sequence alignment algorithm on massively parallel, distributed memory supercomputers to enable research into comparative genomics on large data sets. Following the methodology of the sequential progressiveMauve algorithm, we design data structures including sequences and sorted k-mer lists on the IBM Blue Gene/P supercomputer (BG/P). Preliminary results show that we can reduce the memory footprint so that we can potentially align over 250 bacterial genomes on a single BG/P compute node. We verify our results on a dataset of E.coli, Shigella and S.pneumoniae genomes. Our implementation returns results matching those of the original algorithm but in 1/2 the time and with 1/4 the memory footprint for scaffold building. In this study, we have laid the basis for multiple sequence alignment of large-scale datasets on a massively parallel, distributed memory supercomputer, thus enabling comparison of hundreds instead of a few genome sequences within reasonable time.
First Results on the Cluster Galaxy Population from the Subaru Hyper Suprime-Cam Survey. III. Brightest Cluster Galaxies, Stellar Mass Distribution, and Active Galaxies

NASA Astrophysics Data System (ADS)

Lin, Yen-Ting; Hsieh, Bau-Ching; Lin, Sheng-Chieh; Oguri, Masamune; Chen, Kai-Feng; Tanaka, Masayuki; Chiu, I.-Non; Huang, Song; Kodama, Tadayuki; Leauthaud, Alexie; More, Surhud; Nishizawa, Atsushi J.; Bundy, Kevin; Lin, Lihwai; Miyazaki, Satoshi

2017-12-01

The unprecedented depth and area surveyed by the Subaru Strategic Program with the Hyper Suprime-Cam (HSC-SSP) have enabled us to construct and publish the largest distant cluster sample out to z∼ 1 to date. In this exploratory study of cluster galaxy evolution from z = 1 to z = 0.3, we investigate the stellar mass assembly history of brightest cluster galaxies (BCGs), the evolution of stellar mass and luminosity distributions, the stellar mass surface density profile, as well as the population of radio galaxies. Our analysis is the first high-redshift application of the top N richest cluster selection, which is shown to allow us to trace the cluster galaxy evolution faithfully. Over the 230 deg2 area of the current HSC-SSP footprint, selecting the top 100 clusters in each of the four redshift bins allows us to observe the buildup of galaxy population in descendants of clusters whose z≈ 1 mass is about 2× {10}14 {M}ȯ . Our stellar mass is derived from a machine-learning algorithm, which is found to be unbiased and accurate with respect to the COSMOS data. We find very mild stellar mass growth in BCGs (about 35% between z = 1 and 0.3), and no evidence for evolution in both the total stellar mass–cluster mass correlation and the shape of the stellar mass surface density profile. We also present the first measurement of the radio luminosity distribution in clusters out to z∼ 1, and show hints of changes in the dominant accretion mode powering the cluster radio galaxies at z∼ 0.8.
A Hybrid Algorithm for Clustering of Time Series Data Based on Affinity Search Technique

PubMed Central

Aghabozorgi, Saeed; Ying Wah, Teh; Herawan, Tutut; Jalab, Hamid A.; Shaygan, Mohammad Amin; Jalali, Alireza

2014-01-01

Time series clustering is an important solution to various problems in numerous fields of research, including business, medical science, and finance. However, conventional clustering algorithms are not practical for time series data because they are essentially designed for static data. This impracticality results in poor clustering accuracy in several systems. In this paper, a new hybrid clustering algorithm is proposed based on the similarity in shape of time series data. Time series data are first grouped as subclusters based on similarity in time. The subclusters are then merged using the k-Medoids algorithm based on similarity in shape. This model has two contributions: (1) it is more accurate than other conventional and hybrid approaches and (2) it determines the similarity in shape among time series data with a low complexity. To evaluate the accuracy of the proposed model, the model is tested extensively using syntactic and real-world time series datasets. PMID:24982966
A cluster analysis on road traffic accidents using genetic algorithms

NASA Astrophysics Data System (ADS)

Saharan, Sabariah; Baragona, Roberto

2017-04-01

The analysis of traffic road accidents is increasingly important because of the accidents cost and public road safety. The availability or large data sets makes the study of factors that affect the frequency and severity accidents are viable. However, the data are often highly unbalanced and overlapped. We deal with the data set of the road traffic accidents recorded in Christchurch, New Zealand, from 2000-2009 with a total of 26440 accidents. The data is in a binary set and there are 50 factors road traffic accidents with four level of severity. We used genetic algorithm for the analysis because we are in the presence of a large unbalanced data set and standard clustering like k-means algorithm may not be suitable for the task. The genetic algorithm based on clustering for unknown K, (GCUK) has been used to identify the factors associated with accidents of different levels of severity. The results provided us with an interesting insight into the relationship between factors and accidents severity level and suggest that the two main factors that contributes to fatal accidents are "Speed greater than 60 km h" and "Did not see other people until it was too late". A comparison with the k-means algorithm and the independent component analysis is performed to validate the results.
Distributed controller clustering in software defined networks.

PubMed

Abdelaziz, Ahmed; Fong, Ang Tan; Gani, Abdullah; Garba, Usman; Khan, Suleman; Akhunzada, Adnan; Talebian, Hamid; Choo, Kim-Kwang Raymond

2017-01-01

Software Defined Networking (SDN) is an emerging promising paradigm for network management because of its centralized network intelligence. However, the centralized control architecture of the software-defined networks (SDNs) brings novel challenges of reliability, scalability, fault tolerance and interoperability. In this paper, we proposed a novel clustered distributed controller architecture in the real setting of SDNs. The distributed cluster implementation comprises of multiple popular SDN controllers. The proposed mechanism is evaluated using a real world network topology running on top of an emulated SDN environment. The result shows that the proposed distributed controller clustering mechanism is able to significantly reduce the average latency from 8.1% to 1.6%, the packet loss from 5.22% to 4.15%, compared to distributed controller without clustering running on HP Virtual Application Network (VAN) SDN and Open Network Operating System (ONOS) controllers respectively. Moreover, proposed method also shows reasonable CPU utilization results. Furthermore, the proposed mechanism makes possible to handle unexpected load fluctuations while maintaining a continuous network operation, even when there is a controller failure. The paper is a potential contribution stepping towards addressing the issues of reliability, scalability, fault tolerance, and inter-operability.
Distributed controller clustering in software defined networks

PubMed Central

Gani, Abdullah; Akhunzada, Adnan; Talebian, Hamid; Choo, Kim-Kwang Raymond

2017-01-01

Software Defined Networking (SDN) is an emerging promising paradigm for network management because of its centralized network intelligence. However, the centralized control architecture of the software-defined networks (SDNs) brings novel challenges of reliability, scalability, fault tolerance and interoperability. In this paper, we proposed a novel clustered distributed controller architecture in the real setting of SDNs. The distributed cluster implementation comprises of multiple popular SDN controllers. The proposed mechanism is evaluated using a real world network topology running on top of an emulated SDN environment. The result shows that the proposed distributed controller clustering mechanism is able to significantly reduce the average latency from 8.1% to 1.6%, the packet loss from 5.22% to 4.15%, compared to distributed controller without clustering running on HP Virtual Application Network (VAN) SDN and Open Network Operating System (ONOS) controllers respectively. Moreover, proposed method also shows reasonable CPU utilization results. Furthermore, the proposed mechanism makes possible to handle unexpected load fluctuations while maintaining a continuous network operation, even when there is a controller failure. The paper is a potential contribution stepping towards addressing the issues of reliability, scalability, fault tolerance, and inter-operability. PMID:28384312
Clustering and Candidate Motif Detection in Exosomal miRNAs by Application of Machine Learning Algorithms.

PubMed

Gaur, Pallavi; Chaturvedi, Anoop

2017-07-22

The clustering pattern and motifs give immense information about any biological data. An application of machine learning algorithms for clustering and candidate motif detection in miRNAs derived from exosomes is depicted in this paper. Recent progress in the field of exosome research and more particularly regarding exosomal miRNAs has led much bioinformatic-based research to come into existence. The information on clustering pattern and candidate motifs in miRNAs of exosomal origin would help in analyzing existing, as well as newly discovered miRNAs within exosomes. Along with obtaining clustering pattern and candidate motifs in exosomal miRNAs, this work also elaborates the usefulness of the machine learning algorithms that can be efficiently used and executed on various programming languages/platforms. Data were clustered and sequence candidate motifs were detected successfully. The results were compared and validated with some available web tools such as 'BLASTN' and 'MEME suite'. The machine learning algorithms for aforementioned objectives were applied successfully. This work elaborated utility of machine learning algorithms and language platforms to achieve the tasks of clustering and candidate motif detection in exosomal miRNAs. With the information on mentioned objectives, deeper insight would be gained for analyses of newly discovered miRNAs in exosomes which are considered to be circulating biomarkers. In addition, the execution of machine learning algorithms on various language platforms gives more flexibility to users to try multiple iterations according to their requirements. This approach can be applied to other biological data-mining tasks as well.
Scalable Parallel Density-based Clustering and Applications

NASA Astrophysics Data System (ADS)

Patwary, Mostofa Ali

2014-04-01

Recently, density-based clustering algorithms (DBSCAN and OPTICS) have gotten significant attention of the scientific community due to their unique capability of discovering arbitrary shaped clusters and eliminating noise data. These algorithms have several applications, which require high performance computing, including finding halos and subhalos (clusters) from massive cosmology data in astrophysics, analyzing satellite images, X-ray crystallography, and anomaly detection. However, parallelization of these algorithms are extremely challenging as they exhibit inherent sequential data access order, unbalanced workload resulting in low parallel efficiency. To break the data access sequentiality and to achieve high parallelism, we develop new parallel algorithms, both for DBSCAN and OPTICS, designed using graph algorithmic techniques. For example, our parallel DBSCAN algorithm exploits the similarities between DBSCAN and computing connected components. Using datasets containing up to a billion floating point numbers, we show that our parallel density-based clustering algorithms significantly outperform the existing algorithms, achieving speedups up to 27.5 on 40 cores on shared memory architecture and speedups up to 5,765 using 8,192 cores on distributed memory architecture. In our experiments, we found that while achieving the scalability, our algorithms produce clustering results with comparable quality to the classical algorithms.
Big Data GPU-Driven Parallel Processing Spatial and Spatio-Temporal Clustering Algorithms

NASA Astrophysics Data System (ADS)

Konstantaras, Antonios; Skounakis, Emmanouil; Kilty, James-Alexander; Frantzeskakis, Theofanis; Maravelakis, Emmanuel

2016-04-01

Advances in graphics processing units' technology towards encompassing parallel architectures [1], comprised of thousands of cores and multiples of parallel threads, provide the foundation in terms of hardware for the rapid processing of various parallel applications regarding seismic big data analysis. Seismic data are normally stored as collections of vectors in massive matrices, growing rapidly in size as wider areas are covered, denser recording networks are being established and decades of data are being compiled together [2]. Yet, many processes regarding seismic data analysis are performed on each seismic event independently or as distinct tiles [3] of specific grouped seismic events within a much larger data set. Such processes, independent of one another can be performed in parallel narrowing down processing times drastically [1,3]. This research work presents the development and implementation of three parallel processing algorithms using Cuda C [4] for the investigation of potentially distinct seismic regions [5,6] present in the vicinity of the southern Hellenic seismic arc. The algorithms, programmed and executed in parallel comparatively, are the: fuzzy k-means clustering with expert knowledge [7] in assigning overall clusters' number; density-based clustering [8]; and a selves-developed spatio-temporal clustering algorithm encompassing expert [9] and empirical knowledge [10] for the specific area under investigation. Indexing terms: GPU parallel programming, Cuda C, heterogeneous processing, distinct seismic regions, parallel clustering algorithms, spatio-temporal clustering References [1] Kirk, D. and Hwu, W.: 'Programming massively parallel processors - A hands-on approach', 2nd Edition, Morgan Kaufman Publisher, 2013 [2] Konstantaras, A., Valianatos, F., Varley, M.R. and Makris, J.P.: 'Soft-Computing Modelling of Seismicity in the Southern Hellenic Arc', Geoscience and Remote Sensing Letters, vol. 5 (3), pp. 323-327, 2008 [3] Papadakis, S. and
A Novel Automatic Detection System for ECG Arrhythmias Using Maximum Margin Clustering with Immune Evolutionary Algorithm

PubMed Central

Zhu, Bohui; Ding, Yongsheng; Hao, Kuangrong

2013-01-01

This paper presents a novel maximum margin clustering method with immune evolution (IEMMC) for automatic diagnosis of electrocardiogram (ECG) arrhythmias. This diagnostic system consists of signal processing, feature extraction, and the IEMMC algorithm for clustering of ECG arrhythmias. First, raw ECG signal is processed by an adaptive ECG filter based on wavelet transforms, and waveform of the ECG signal is detected; then, features are extracted from ECG signal to cluster different types of arrhythmias by the IEMMC algorithm. Three types of performance evaluation indicators are used to assess the effect of the IEMMC method for ECG arrhythmias, such as sensitivity, specificity, and accuracy. Compared with K-means and iterSVR algorithms, the IEMMC algorithm reflects better performance not only in clustering result but also in terms of global search ability and convergence ability, which proves its effectiveness for the detection of ECG arrhythmias. PMID:23690875
Localized Ambient Solidity Separation Algorithm Based Computer User Segmentation.

PubMed

Sun, Xiao; Zhang, Tongda; Chai, Yueting; Liu, Yi

2015-01-01

Most of popular clustering methods typically have some strong assumptions of the dataset. For example, the k-means implicitly assumes that all clusters come from spherical Gaussian distributions which have different means but the same covariance. However, when dealing with datasets that have diverse distribution shapes or high dimensionality, these assumptions might not be valid anymore. In order to overcome this weakness, we proposed a new clustering algorithm named localized ambient solidity separation (LASS) algorithm, using a new isolation criterion called centroid distance. Compared with other density based isolation criteria, our proposed centroid distance isolation criterion addresses the problem caused by high dimensionality and varying density. The experiment on a designed two-dimensional benchmark dataset shows that our proposed LASS algorithm not only inherits the advantage of the original dissimilarity increments clustering method to separate naturally isolated clusters but also can identify the clusters which are adjacent, overlapping, and under background noise. Finally, we compared our LASS algorithm with the dissimilarity increments clustering method on a massive computer user dataset with over two million records that contains demographic and behaviors information. The results show that LASS algorithm works extremely well on this computer user dataset and can gain more knowledge from it.
Localized Ambient Solidity Separation Algorithm Based Computer User Segmentation

PubMed Central

Sun, Xiao; Zhang, Tongda; Chai, Yueting; Liu, Yi

2015-01-01

Most of popular clustering methods typically have some strong assumptions of the dataset. For example, the k-means implicitly assumes that all clusters come from spherical Gaussian distributions which have different means but the same covariance. However, when dealing with datasets that have diverse distribution shapes or high dimensionality, these assumptions might not be valid anymore. In order to overcome this weakness, we proposed a new clustering algorithm named localized ambient solidity separation (LASS) algorithm, using a new isolation criterion called centroid distance. Compared with other density based isolation criteria, our proposed centroid distance isolation criterion addresses the problem caused by high dimensionality and varying density. The experiment on a designed two-dimensional benchmark dataset shows that our proposed LASS algorithm not only inherits the advantage of the original dissimilarity increments clustering method to separate naturally isolated clusters but also can identify the clusters which are adjacent, overlapping, and under background noise. Finally, we compared our LASS algorithm with the dissimilarity increments clustering method on a massive computer user dataset with over two million records that contains demographic and behaviors information. The results show that LASS algorithm works extremely well on this computer user dataset and can gain more knowledge from it. PMID:26221133
Assessment of economic status in trauma registries: A new algorithm for generating population-specific clustering-based models of economic status for time-constrained low-resource settings.

PubMed

Eyler, Lauren; Hubbard, Alan; Juillard, Catherine

2016-10-01

Low and middle-income countries (LMICs) and the world's poor bear a disproportionate share of the global burden of injury. Data regarding disparities in injury are vital to inform injury prevention and trauma systems strengthening interventions targeted towards vulnerable populations, but are limited in LMICs. We aim to facilitate injury disparities research by generating a standardized methodology for assessing economic status in resource-limited country trauma registries where complex metrics such as income, expenditures, and wealth index are infeasible to assess. To address this need, we developed a cluster analysis-based algorithm for generating simple population-specific metrics of economic status using nationally representative Demographic and Health Surveys (DHS) household assets data. For a limited number of variables, g, our algorithm performs weighted k-medoids clustering of the population using all combinations of g asset variables and selects the combination of variables and number of clusters that maximize average silhouette width (ASW). In simulated datasets containing both randomly distributed variables and "true" population clusters defined by correlated categorical variables, the algorithm selected the correct variable combination and appropriate cluster numbers unless variable correlation was very weak. When used with 2011 Cameroonian DHS data, our algorithm identified twenty economic clusters with ASW 0.80, indicating well-defined population clusters. This economic model for assessing health disparities will be used in the new Cameroonian six-hospital centralized trauma registry. By describing our standardized methodology and algorithm for generating economic clustering models, we aim to facilitate measurement of health disparities in other trauma registries in resource-limited countries. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Optimization of self-interstitial clusters in 3C-SiC with genetic algorithm

NASA Astrophysics Data System (ADS)

Ko, Hyunseok; Kaczmarowski, Amy; Szlufarska, Izabela; Morgan, Dane

2017-08-01

Under irradiation, SiC develops damage commonly referred to as black spot defects, which are speculated to be self-interstitial atom clusters. To understand the evolution of these defect clusters and their impacts (e.g., through radiation induced swelling) on the performance of SiC in nuclear applications, it is important to identify the cluster composition, structure, and shape. In this work the genetic algorithm code StructOpt was utilized to identify groundstate cluster structures in 3C-SiC. The genetic algorithm was used to explore clusters of up to ∼30 interstitials of C-only, Si-only, and Si-C mixtures embedded in the SiC lattice. We performed the structure search using Hamiltonians from both density functional theory and empirical potentials. The thermodynamic stability of clusters was investigated in terms of their composition (with a focus on Si-only, C-only, and stoichiometric) and shape (spherical vs. planar), as a function of the cluster size (n). Our results suggest that large Si-only clusters are likely unstable, and clusters are predominantly C-only for n ≤ 10 and stoichiometric for n > 10. The results imply that there is an evolution of the shape of the most stable clusters, where small clusters are stable in more spherical geometries while larger clusters are stable in more planar configurations. We also provide an estimated energy vs. size relationship, E(n), for use in future analysis.
Approximation algorithm for the problem of partitioning a sequence into clusters

NASA Astrophysics Data System (ADS)

Kel'manov, A. V.; Mikhailova, L. V.; Khamidullin, S. A.; Khandeev, V. I.

2017-08-01

We consider the problem of partitioning a finite sequence of Euclidean points into a given number of clusters (subsequences) using the criterion of the minimal sum (over all clusters) of intercluster sums of squared distances from the elements of the clusters to their centers. It is assumed that the center of one of the desired clusters is at the origin, while the center of each of the other clusters is unknown and determined as the mean value over all elements in this cluster. Additionally, the partition obeys two structural constraints on the indices of sequence elements contained in the clusters with unknown centers: (1) the concatenation of the indices of elements in these clusters is an increasing sequence, and (2) the difference between an index and the preceding one is bounded above and below by prescribed constants. It is shown that this problem is strongly NP-hard. A 2-approximation algorithm is constructed that is polynomial-time for a fixed number of clusters.
A Game Theory Algorithm for Intra-Cluster Data Aggregation in a Vehicular Ad Hoc Network

PubMed Central

Chen, Yuzhong; Weng, Shining; Guo, Wenzhong; Xiong, Naixue

2016-01-01

Vehicular ad hoc networks (VANETs) have an important role in urban management and planning. The effective integration of vehicle information in VANETs is critical to traffic analysis, large-scale vehicle route planning and intelligent transportation scheduling. However, given the limitations in the precision of the output information of a single sensor and the difficulty of information sharing among various sensors in a highly dynamic VANET, effectively performing data aggregation in VANETs remains a challenge. Moreover, current studies have mainly focused on data aggregation in large-scale environments but have rarely discussed the issue of intra-cluster data aggregation in VANETs. In this study, we propose a multi-player game theory algorithm for intra-cluster data aggregation in VANETs by analyzing the competitive and cooperative relationships among sensor nodes. Several sensor-centric metrics are proposed to measure the data redundancy and stability of a cluster. We then study the utility function to achieve efficient intra-cluster data aggregation by considering both data redundancy and cluster stability. In particular, we prove the existence of a unique Nash equilibrium in the game model, and conduct extensive experiments to validate the proposed algorithm. Results demonstrate that the proposed algorithm has advantages over typical data aggregation algorithms in both accuracy and efficiency. PMID:26907272
A Game Theory Algorithm for Intra-Cluster Data Aggregation in a Vehicular Ad Hoc Network.

PubMed

Chen, Yuzhong; Weng, Shining; Guo, Wenzhong; Xiong, Naixue

2016-02-19

Vehicular ad hoc networks (VANETs) have an important role in urban management and planning. The effective integration of vehicle information in VANETs is critical to traffic analysis, large-scale vehicle route planning and intelligent transportation scheduling. However, given the limitations in the precision of the output information of a single sensor and the difficulty of information sharing among various sensors in a highly dynamic VANET, effectively performing data aggregation in VANETs remains a challenge. Moreover, current studies have mainly focused on data aggregation in large-scale environments but have rarely discussed the issue of intra-cluster data aggregation in VANETs. In this study, we propose a multi-player game theory algorithm for intra-cluster data aggregation in VANETs by analyzing the competitive and cooperative relationships among sensor nodes. Several sensor-centric metrics are proposed to measure the data redundancy and stability of a cluster. We then study the utility function to achieve efficient intra-cluster data aggregation by considering both data redundancy and cluster stability. In particular, we prove the existence of a unique Nash equilibrium in the game model, and conduct extensive experiments to validate the proposed algorithm. Results demonstrate that the proposed algorithm has advantages over typical data aggregation algorithms in both accuracy and efficiency.
Synchronous Firefly Algorithm for Cluster Head Selection in WSN.

PubMed

Baskaran, Madhusudhanan; Sadagopan, Chitra

2015-01-01

Wireless Sensor Network (WSN) consists of small low-cost, low-power multifunctional nodes interconnected to efficiently aggregate and transmit data to sink. Cluster-based approaches use some nodes as Cluster Heads (CHs) and organize WSNs efficiently for aggregation of data and energy saving. A CH conveys information gathered by cluster nodes and aggregates/compresses data before transmitting it to a sink. However, this additional responsibility of the node results in a higher energy drain leading to uneven network degradation. Low Energy Adaptive Clustering Hierarchy (LEACH) offsets this by probabilistically rotating cluster heads role among nodes with energy above a set threshold. CH selection in WSN is NP-Hard as optimal data aggregation with efficient energy savings cannot be solved in polynomial time. In this work, a modified firefly heuristic, synchronous firefly algorithm, is proposed to improve the network performance. Extensive simulation shows the proposed technique to perform well compared to LEACH and energy-efficient hierarchical clustering. Simulations show the effectiveness of the proposed method in decreasing the packet loss ratio by an average of 9.63% and improving the energy efficiency of the network when compared to LEACH and EEHC.
Synchronous Firefly Algorithm for Cluster Head Selection in WSN

PubMed Central

Baskaran, Madhusudhanan; Sadagopan, Chitra

2015-01-01

Wireless Sensor Network (WSN) consists of small low-cost, low-power multifunctional nodes interconnected to efficiently aggregate and transmit data to sink. Cluster-based approaches use some nodes as Cluster Heads (CHs) and organize WSNs efficiently for aggregation of data and energy saving. A CH conveys information gathered by cluster nodes and aggregates/compresses data before transmitting it to a sink. However, this additional responsibility of the node results in a higher energy drain leading to uneven network degradation. Low Energy Adaptive Clustering Hierarchy (LEACH) offsets this by probabilistically rotating cluster heads role among nodes with energy above a set threshold. CH selection in WSN is NP-Hard as optimal data aggregation with efficient energy savings cannot be solved in polynomial time. In this work, a modified firefly heuristic, synchronous firefly algorithm, is proposed to improve the network performance. Extensive simulation shows the proposed technique to perform well compared to LEACH and energy-efficient hierarchical clustering. Simulations show the effectiveness of the proposed method in decreasing the packet loss ratio by an average of 9.63% and improving the energy efficiency of the network when compared to LEACH and EEHC. PMID:26495431
Stokes space modulation format classification based on non-iterative clustering algorithm for coherent optical receivers.

PubMed

Mai, Xiaofeng; Liu, Jie; Wu, Xiong; Zhang, Qun; Guo, Changjian; Yang, Yanfu; Li, Zhaohui

2017-02-06

A Stokes-space modulation format classification (MFC) technique is proposed for coherent optical receivers by using a non-iterative clustering algorithm. In the clustering algorithm, two simple parameters are calculated to help find the density peaks of the data points in Stokes space and no iteration is required. Correct MFC can be realized in numerical simulations among PM-QPSK, PM-8QAM, PM-16QAM, PM-32QAM and PM-64QAM signals within practical optical signal-to-noise ratio (OSNR) ranges. The performance of the proposed MFC algorithm is also compared with those of other schemes based on clustering algorithms. The simulation results show that good classification performance can be achieved using the proposed MFC scheme with moderate time complexity. Proof-of-concept experiments are finally implemented to demonstrate MFC among PM-QPSK/16QAM/64QAM signals, which confirm the feasibility of our proposed MFC scheme.

Abelian non-global logarithms from soft gluon clustering

NASA Astrophysics Data System (ADS)

Kelley, Randall; Walsh, Jonathan R.; Zuberi, Saba

2012-09-01

Most recombination-style jet algorithms cluster soft gluons in a complex way. This leads to previously identified correlations in the soft gluon phase space and introduces logarithmic corrections to jet cross sections, which are known as clustering logarithms. The leading Abelian clustering logarithms occur at least at next-to leading logarithm (NLL) in the exponent of the distribution. Using the framework of Soft Collinear Effective Theory (SCET), we show that new clustering effects contributing at NLL arise at each order. While numerical resummation of clustering logs is possible, it is unlikely that they can be analytically resummed to NLL. Clustering logarithms make the anti-kT algorithm theoretically preferred, for which they are power suppressed. They can arise in Abelian and non-Abelian terms, and we calculate the Abelian clustering logarithms at O ( {α_s^2} ) for the jet mass distribution using the Cambridge/Aachen and kT algorithms, including jet radius dependence, which extends previous results. We find that clustering logarithms can be naturally thought of as a class of non-global logarithms, which have traditionally been tied to non-Abelian correlations in soft gluon emission.
A Network-Based Algorithm for Clustering Multivariate Repeated Measures Data

NASA Technical Reports Server (NTRS)

Koslovsky, Matthew; Arellano, John; Schaefer, Caroline; Feiveson, Alan; Young, Millennia; Lee, Stuart

2017-01-01

The National Aeronautics and Space Administration (NASA) Astronaut Corps is a unique occupational cohort for which vast amounts of measures data have been collected repeatedly in research or operational studies pre-, in-, and post-flight, as well as during multiple clinical care visits. In exploratory analyses aimed at generating hypotheses regarding physiological changes associated with spaceflight exposure, such as impaired vision, it is of interest to identify anomalies and trends across these expansive datasets. Multivariate clustering algorithms for repeated measures data may help parse the data to identify homogeneous groups of astronauts that have higher risks for a particular physiological change. However, available clustering methods may not be able to accommodate the complex data structures found in NASA data, since the methods often rely on strict model assumptions, require equally-spaced and balanced assessment times, cannot accommodate missing data or differing time scales across variables, and cannot process continuous and discrete data simultaneously. To fill this gap, we propose a network-based, multivariate clustering algorithm for repeated measures data that can be tailored to fit various research settings. Using simulated data, we demonstrate how our method can be used to identify patterns in complex data structures found in practice.
ClusterControl: a web interface for distributing and monitoring bioinformatics applications on a Linux cluster.

PubMed

Stocker, Gernot; Rieder, Dietmar; Trajanoski, Zlatko

2004-03-22

ClusterControl is a web interface to simplify distributing and monitoring bioinformatics applications on Linux cluster systems. We have developed a modular concept that enables integration of command line oriented program into the application framework of ClusterControl. The systems facilitate integration of different applications accessed through one interface and executed on a distributed cluster system. The package is based on freely available technologies like Apache as web server, PHP as server-side scripting language and OpenPBS as queuing system and is available free of charge for academic and non-profit institutions. http://genome.tugraz.at/Software/ClusterControl
The effect of different distance measures in detecting outliers using clustering-based algorithm for circular regression model

NASA Astrophysics Data System (ADS)

Di, Nur Faraidah Muhammad; Satari, Siti Zanariah

2017-05-01

Outlier detection in linear data sets has been done vigorously but only a small amount of work has been done for outlier detection in circular data. In this study, we proposed multiple outliers detection in circular regression models based on the clustering algorithm. Clustering technique basically utilizes distance measure to define distance between various data points. Here, we introduce the similarity distance based on Euclidean distance for circular model and obtain a cluster tree using the single linkage clustering algorithm. Then, a stopping rule for the cluster tree based on the mean direction and circular standard deviation of the tree height is proposed. We classify the cluster group that exceeds the stopping rule as potential outlier. Our aim is to demonstrate the effectiveness of proposed algorithms with the similarity distances in detecting the outliers. It is found that the proposed methods are performed well and applicable for circular regression model.
Clustering box office movie with Partition Around Medoids (PAM) Algorithm based on Text Mining of Indonesian subtitle

NASA Astrophysics Data System (ADS)

Alfarizy, A. D.; Indahwati; Sartono, B.

2017-03-01

Indonesia is the largest Hollywood movie industry target market in Southeast Asia in 2015. Hollywood movies distributed in Indonesia targeted people in all range of ages including children. Low awareness of guiding children while watching movies make them could watch any rated films even the unsuitable ones for their ages. Even after being translated into Bahasa and passed the censorship phase, words that uncomfortable for children to watch still exist. The purpose of this research is to cluster box office Hollywood movies based on Indonesian subtitle, revenue, IMDb user rating and genres as one of the reference for adults to choose right movies for their children to watch. Text mining is used to extract words from the subtitles and count the frequency for three group of words (bad words, sexual words and terror words), while Partition Around Medoids (PAM) Algorithm with Gower similarity coefficient as proximity matrix is used as clustering method. We clustered 624 movies from 2006 until first half of 2016 from IMDb. Cluster with highest silhouette coefficient value (0.36) is the one with 5 clusters. Animation, Adventure and Comedy movies with high revenue like in cluster 5 is recommended for children to watch, while Comedy movies with high revenue like in cluster 4 should be avoided to watch.
Clustering Algorithms: Their Application to Gene Expression Data

PubMed Central

Oyelade, Jelili; Isewon, Itunuoluwa; Oladipupo, Funke; Aromolaran, Olufemi; Uwoghiren, Efosa; Ameh, Faridah; Achas, Moses; Adebiyi, Ezekiel

2016-01-01

Gene expression data hide vital information required to understand the biological process that takes place in a particular organism in relation to its environment. Deciphering the hidden patterns in gene expression data proffers a prodigious preference to strengthen the understanding of functional genomics. The complexity of biological networks and the volume of genes present increase the challenges of comprehending and interpretation of the resulting mass of data, which consists of millions of measurements; these data also inhibit vagueness, imprecision, and noise. Therefore, the use of clustering techniques is a first step toward addressing these challenges, which is essential in the data mining process to reveal natural structures and identify interesting patterns in the underlying data. The clustering of gene expression data has been proven to be useful in making known the natural structure inherent in gene expression data, understanding gene functions, cellular processes, and subtypes of cells, mining useful information from noisy data, and understanding gene regulation. The other benefit of clustering gene expression data is the identification of homology, which is very important in vaccine design. This review examines the various clustering algorithms applicable to the gene expression data in order to discover and provide useful knowledge of the appropriate clustering technique that will guarantee stability and high degree of accuracy in its analysis procedure. PMID:27932867
Text grouping in patent analysis using adaptive K-means clustering algorithm

NASA Astrophysics Data System (ADS)

Shanie, Tiara; Suprijadi, Jadi; Zulhanif

2017-03-01

Patents are one of the Intellectual Property. Analyzing patent is one requirement in knowing well the development of technology in each country and in the world now. This study uses the patent document coming from the Espacenet server about Green Tea. Patent documents related to the technology in the field of tea is still widespread, so it will be difficult for users to information retrieval (IR). Therefore, it is necessary efforts to categorize documents in a specific group of related terms contained therein. This study uses titles patent text data with the proposed Green Tea in Statistical Text Mining methods consists of two phases: data preparation and data analysis stage. The data preparation phase uses Text Mining methods and data analysis stage is done by statistics. Statistical analysis in this study using a cluster analysis algorithm, the Adaptive K-Means Clustering Algorithm. Results from this study showed that based on the maximum value Silhouette, generate 87 clusters associated fifteen terms therein that can be utilized in the process of information retrieval needs.
Using clustering and a modified classification algorithm for automatic text summarization

NASA Astrophysics Data System (ADS)

Aries, Abdelkrime; Oufaida, Houda; Nouali, Omar

2013-01-01

In this paper we describe a modified classification method destined for extractive summarization purpose. The classification in this method doesn't need a learning corpus; it uses the input text to do that. First, we cluster the document sentences to exploit the diversity of topics, then we use a learning algorithm (here we used Naive Bayes) on each cluster considering it as a class. After obtaining the classification model, we calculate the score of a sentence in each class, using a scoring model derived from classification algorithm. These scores are used, then, to reorder the sentences and extract the first ones as the output summary. We conducted some experiments using a corpus of scientific papers, and we have compared our results to another summarization system called UNIS.1 Also, we experiment the impact of clustering threshold tuning, on the resulted summary, as well as the impact of adding more features to the classifier. We found that this method is interesting, and gives good performance, and the addition of new features (which is simple using this method) can improve summary's accuracy.
Semi-supervised spectral algorithms for community detection in complex networks based on equivalence of clustering methods

NASA Astrophysics Data System (ADS)

Ma, Xiaoke; Wang, Bingbo; Yu, Liang

2018-01-01

Community detection is fundamental for revealing the structure-functionality relationship in complex networks, which involves two issues-the quantitative function for community as well as algorithms to discover communities. Despite significant research on either of them, few attempt has been made to establish the connection between the two issues. To attack this problem, a generalized quantification function is proposed for community in weighted networks, which provides a framework that unifies several well-known measures. Then, we prove that the trace optimization of the proposed measure is equivalent with the objective functions of algorithms such as nonnegative matrix factorization, kernel K-means as well as spectral clustering. It serves as the theoretical foundation for designing algorithms for community detection. On the second issue, a semi-supervised spectral clustering algorithm is developed by exploring the equivalence relation via combining the nonnegative matrix factorization and spectral clustering. Different from the traditional semi-supervised algorithms, the partial supervision is integrated into the objective of the spectral algorithm. Finally, through extensive experiments on both artificial and real world networks, we demonstrate that the proposed method improves the accuracy of the traditional spectral algorithms in community detection.
Reexamining cluster radioactivity in trans-lead nuclei with consideration of specific density distributions in daughter nuclei and clusters

NASA Astrophysics Data System (ADS)

Qian, Yibin; Ren, Zhongzhou; Ni, Dongdong

2016-08-01

We further investigate the cluster emission from heavy nuclei beyond the lead region in the framework of the preformed cluster model. The refined cluster-core potential is constructed by the double-folding integral of the density distributions of the daughter nucleus and the emitted cluster, where the radius or the diffuseness parameter in the Fermi density distribution formula is determined according to the available experimental data on the charge radii and the neutron skin thickness. The Schrödinger equation of the cluster-daughter relative motion is then solved within the outgoing Coulomb wave-function boundary conditions to obtain the decay width. It is found that the present decay width of cluster emitters is clearly enhanced as compared to that in the previous case, which involved the fixed parametrization for the density distributions of daughter nuclei and clusters. Among the whole procedure, the nuclear deformation of clusters is also introduced into the calculations, and the degree of its influence on the final decay half-life is checked to some extent. Moreover, the effect from the bubble density distribution of clusters on the final decay width is carefully discussed by using the central depressed distribution.
Open source clustering software.

PubMed

de Hoon, M J L; Imoto, S; Nolan, J; Miyano, S

2004-06-12

We have implemented k-means clustering, hierarchical clustering and self-organizing maps in a single multipurpose open-source library of C routines, callable from other C and C++ programs. Using this library, we have created an improved version of Michael Eisen's well-known Cluster program for Windows, Mac OS X and Linux/Unix. In addition, we generated a Python and a Perl interface to the C Clustering Library, thereby combining the flexibility of a scripting language with the speed of C. The C Clustering Library and the corresponding Python C extension module Pycluster were released under the Python License, while the Perl module Algorithm::Cluster was released under the Artistic License. The GUI code Cluster 3.0 for Windows, Macintosh and Linux/Unix, as well as the corresponding command-line program, were released under the same license as the original Cluster code. The complete source code is available at http://bonsai.ims.u-tokyo.ac.jp/mdehoon/software/cluster. Alternatively, Algorithm::Cluster can be downloaded from CPAN, while Pycluster is also available as part of the Biopython distribution.
The Size Distribution Of Cluster Galaxies

NASA Astrophysics Data System (ADS)

Kuchner, U.; Ziegler, B.; Bamford, S.; Verdugo, M.; Haeussler, B.

2017-06-01

We establish a sample of 560 spectroscopically confirmed cluster members of MACS J1206.2- 0847 at z = 0.45 and utilize multi-wavelength and multi-component Sersic profile fitting to provide luminosities and sizes for the key structural components bulge and disk. While the difference between field and cluster galaxy properties are mostly due to a preference for cluster members to be early-type (quiescent, bulge-dominated), we see evidence for an outer disk fading and a sharp rise in the number of red disks with smaller effective radii at the tidally active cluster region around R200. Even though red disks are already virialized according to their velocity distribution, they are clearly not part of the old population found in the innermost region; they represent an important population of transitional objects in clusters.
Cluster analysis for determining distribution center location

NASA Astrophysics Data System (ADS)

Lestari Widaningrum, Dyah; Andika, Aditya; Murphiyanto, Richard Dimas Julian

2017-12-01

Determination of distribution facilities is highly important to survive in the high level of competition in today’s business world. Companies can operate multiple distribution centers to mitigate supply chain risk. Thus, new problems arise, namely how many and where the facilities should be provided. This study examines a fast-food restaurant brand, which located in the Greater Jakarta. This brand is included in the category of top 5 fast food restaurant chain based on retail sales. There were three stages in this study, compiling spatial data, cluster analysis, and network analysis. Cluster analysis results are used to consider the location of the additional distribution center. Network analysis results show a more efficient process referring to a shorter distance to the distribution process.
A Parametric k-Means Algorithm

PubMed Central

Tarpey, Thaddeus

2007-01-01

Summary The k points that optimally represent a distribution (usually in terms of a squared error loss) are called the k principal points. This paper presents a computationally intensive method that automatically determines the principal points of a parametric distribution. Cluster means from the k-means algorithm are nonparametric estimators of principal points. A parametric k-means approach is introduced for estimating principal points by running the k-means algorithm on a very large simulated data set from a distribution whose parameters are estimated using maximum likelihood. Theoretical and simulation results are presented comparing the parametric k-means algorithm to the usual k-means algorithm and an example on determining sizes of gas masks is used to illustrate the parametric k-means algorithm. PMID:17917692
Multi-source feature extraction and target recognition in wireless sensor networks based on adaptive distributed wavelet compression algorithms

NASA Astrophysics Data System (ADS)

Hortos, William S.

2008-04-01

Proposed distributed wavelet-based algorithms are a means to compress sensor data received at the nodes forming a wireless sensor network (WSN) by exchanging information between neighboring sensor nodes. Local collaboration among nodes compacts the measurements, yielding a reduced fused set with equivalent information at far fewer nodes. Nodes may be equipped with multiple sensor types, each capable of sensing distinct phenomena: thermal, humidity, chemical, voltage, or image signals with low or no frequency content as well as audio, seismic or video signals within defined frequency ranges. Compression of the multi-source data through wavelet-based methods, distributed at active nodes, reduces downstream processing and storage requirements along the paths to sink nodes; it also enables noise suppression and more energy-efficient query routing within the WSN. Targets are first detected by the multiple sensors; then wavelet compression and data fusion are applied to the target returns, followed by feature extraction from the reduced data; feature data are input to target recognition/classification routines; targets are tracked during their sojourns through the area monitored by the WSN. Algorithms to perform these tasks are implemented in a distributed manner, based on a partition of the WSN into clusters of nodes. In this work, a scheme of collaborative processing is applied for hierarchical data aggregation and decorrelation, based on the sensor data itself and any redundant information, enabled by a distributed, in-cluster wavelet transform with lifting that allows multiple levels of resolution. The wavelet-based compression algorithm significantly decreases RF bandwidth and other resource use in target processing tasks. Following wavelet compression, features are extracted. The objective of feature extraction is to maximize the probabilities of correct target classification based on multi-source sensor measurements, while minimizing the resource expenditures at
Optimizing Energy Consumption in Vehicular Sensor Networks by Clustering Using Fuzzy C-Means and Fuzzy Subtractive Algorithms

NASA Astrophysics Data System (ADS)

Ebrahimi, A.; Pahlavani, P.; Masoumi, Z.

2017-09-01

Traffic monitoring and managing in urban intelligent transportation systems (ITS) can be carried out based on vehicular sensor networks. In a vehicular sensor network, vehicles equipped with sensors such as GPS, can act as mobile sensors for sensing the urban traffic and sending the reports to a traffic monitoring center (TMC) for traffic estimation. The energy consumption by the sensor nodes is a main problem in the wireless sensor networks (WSNs); moreover, it is the most important feature in designing these networks. Clustering the sensor nodes is considered as an effective solution to reduce the energy consumption of WSNs. Each cluster should have a Cluster Head (CH), and a number of nodes located within its supervision area. The cluster heads are responsible for gathering and aggregating the information of clusters. Then, it transmits the information to the data collection center. Hence, the use of clustering decreases the volume of transmitting information, and, consequently, reduces the energy consumption of network. In this paper, Fuzzy C-Means (FCM) and Fuzzy Subtractive algorithms are employed to cluster sensors and investigate their performance on the energy consumption of sensors. It can be seen that the FCM algorithm and Fuzzy Subtractive have been reduced energy consumption of vehicle sensors up to 90.68% and 92.18%, respectively. Comparing the performance of the algorithms implies the 1.5 percent improvement in Fuzzy Subtractive algorithm in comparison.
[Automatic Sleep Stage Classification Based on an Improved K-means Clustering Algorithm].

PubMed

Xiao, Shuyuan; Wang, Bei; Zhang, Jian; Zhang, Qunfeng; Zou, Junzhong

2016-10-01

Sleep stage scoring is a hotspot in the field of medicine and neuroscience.Visual inspection of sleep is laborious and the results may be subjective to different clinicians.Automatic sleep stage classification algorithm can be used to reduce the manual workload.However,there are still limitations when it encounters complicated and changeable clinical cases.The purpose of this paper is to develop an automatic sleep staging algorithm based on the characteristics of actual sleep data.In the proposed improved K-means clustering algorithm,points were selected as the initial centers by using a concept of density to avoid the randomness of the original K-means algorithm.Meanwhile,the cluster centers were updated according to the‘Three-Sigma Rule’during the iteration to abate the influence of the outliers.The proposed method was tested and analyzed on the overnight sleep data of the healthy persons and patients with sleep disorders after continuous positive airway pressure(CPAP)treatment.The automatic sleep stage classification results were compared with the visual inspection by qualified clinicians and the averaged accuracy reached 76%.With the analysis of morphological diversity of sleep data,it was proved that the proposed improved K-means algorithm was feasible and valid for clinical practice.
Mammographic images segmentation based on chaotic map clustering algorithm

PubMed Central

2014-01-01

Background This work investigates the applicability of a novel clustering approach to the segmentation of mammographic digital images. The chaotic map clustering algorithm is used to group together similar subsets of image pixels resulting in a medically meaningful partition of the mammography. Methods The image is divided into pixels subsets characterized by a set of conveniently chosen features and each of the corresponding points in the feature space is associated to a map. A mutual coupling strength between the maps depending on the associated distance between feature space points is subsequently introduced. On the system of maps, the simulated evolution through chaotic dynamics leads to its natural partitioning, which corresponds to a particular segmentation scheme of the initial mammographic image. Results The system provides a high recognition rate for small mass lesions (about 94% correctly segmented inside the breast) and the reproduction of the shape of regions with denser micro-calcifications in about 2/3 of the cases, while being less effective on identification of larger mass lesions. Conclusions We can summarize our analysis by asserting that due to the particularities of the mammographic images, the chaotic map clustering algorithm should not be used as the sole method of segmentation. It is rather the joint use of this method along with other segmentation techniques that could be successfully used for increasing the segmentation performance and for providing extra information for the subsequent analysis stages such as the classification of the segmented ROI. PMID:24666766
Data Clustering

NASA Astrophysics Data System (ADS)

Wagstaff, Kiri L.

2012-03-01

clustering, in which some partial information about item assignments or other components of the resulting output are already known and must be accommodated by the solution. Some algorithms seek a partition of the data set into distinct clusters, while others build a hierarchy of nested clusters that can capture taxonomic relationships. Some produce a single optimal solution, while others construct a probabilistic model of cluster membership. More formally, clustering algorithms operate on a data set X composed of items represented by one or more features (dimensions). These could include physical location, such as right ascension and declination, as well as other properties such as brightness, color, temporal change, size, texture, and so on. Let D be the number of dimensions used to represent each item, xi ∈ RD. The clustering goal is to produce an organization P of the items in X that optimizes an objective function f : P -> R, which quantifies the quality of solution P. Often f is defined so as to maximize similarity within a cluster and minimize similarity between clusters. To that end, many algorithms make use of a measure d : X x X -> R of the distance between two items. A partitioning algorithm produces a set of clusters P = {c1, . . . , ck} such that the clusters are nonoverlapping (c_i intersected with c_j = empty set, i != j) subsets of the data set (Union_i c_i=X). Hierarchical algorithms produce a series of partitions P = {p1, . . . , pn }. For a complete hierarchy, the number of partitions n’= n, the number of items in the data set; the top partition is a single cluster containing all items, and the bottom partition contains n clusters, each containing a single item. For model-based clustering, each cluster c_j is represented by a model m_j , such as the cluster center or a Gaussian distribution. The wide array of available clustering algorithms may seem bewildering, and covering all of them is beyond the scope of this chapter. Choosing among them for a
Implementation of hybrid clustering based on partitioning around medoids algorithm and divisive analysis on human Papillomavirus DNA

NASA Astrophysics Data System (ADS)

Arimbi, Mentari Dian; Bustamam, Alhadi; Lestari, Dian

2017-03-01

Data clustering can be executed through partition or hierarchical method for many types of data including DNA sequences. Both clustering methods can be combined by processing partition algorithm in the first level and hierarchical in the second level, called hybrid clustering. In the partition phase some popular methods such as PAM, K-means, or Fuzzy c-means methods could be applied. In this study we selected partitioning around medoids (PAM) in our partition stage. Furthermore, following the partition algorithm, in hierarchical stage we applied divisive analysis algorithm (DIANA) in order to have more specific clusters and sub clusters structures. The number of main clusters is determined using Davies Bouldin Index (DBI) value. We choose the optimal number of clusters if the results minimize the DBI value. In this work, we conduct the clustering on 1252 HPV DNA sequences data from GenBank. The characteristic extraction is initially performed, followed by normalizing and genetic distance calculation using Euclidean distance. In our implementation, we used the hybrid PAM and DIANA using the R open source programming tool. In our results, we obtained 3 main clusters with average DBI value is 0.979, using PAM in the first stage. After executing DIANA in the second stage, we obtained 4 sub clusters for Cluster-1, 9 sub clusters for Cluster-2 and 2 sub clusters in Cluster-3, with the BDI value 0.972, 0.771, and 0.768 for each main cluster respectively. Since the second stage produce lower DBI value compare to the DBI value in the first stage, we conclude that this hybrid approach can improve the accuracy of our clustering results.

The efficiency of average linkage hierarchical clustering algorithm associated multi-scale bootstrap resampling in identifying homogeneous precipitation catchments

NASA Astrophysics Data System (ADS)

Chuan, Zun Liang; Ismail, Noriszura; Shinyie, Wendy Ling; Lit Ken, Tan; Fam, Soo-Fen; Senawi, Azlyna; Yusoff, Wan Nur Syahidah Wan

2018-04-01

Due to the limited of historical precipitation records, agglomerative hierarchical clustering algorithms widely used to extrapolate information from gauged to ungauged precipitation catchments in yielding a more reliable projection of extreme hydro-meteorological events such as extreme precipitation events. However, identifying the optimum number of homogeneous precipitation catchments accurately based on the dendrogram resulted using agglomerative hierarchical algorithms are very subjective. The main objective of this study is to propose an efficient regionalized algorithm to identify the homogeneous precipitation catchments for non-stationary precipitation time series. The homogeneous precipitation catchments are identified using average linkage hierarchical clustering algorithm associated multi-scale bootstrap resampling, while uncentered correlation coefficient as the similarity measure. The regionalized homogeneous precipitation is consolidated using K-sample Anderson Darling non-parametric test. The analysis result shows the proposed regionalized algorithm performed more better compared to the proposed agglomerative hierarchical clustering algorithm in previous studies.
Finding reproducible cluster partitions for the k-means algorithm

PubMed Central

2013-01-01

K-means clustering is widely used for exploratory data analysis. While its dependence on initialisation is well-known, it is common practice to assume that the partition with lowest sum-of-squares (SSQ) total i.e. within cluster variance, is both reproducible under repeated initialisations and also the closest that k-means can provide to true structure, when applied to synthetic data. We show that this is generally the case for small numbers of clusters, but for values of k that are still of theoretical and practical interest, similar values of SSQ can correspond to markedly different cluster partitions. This paper extends stability measures previously presented in the context of finding optimal values of cluster number, into a component of a 2-d map of the local minima found by the k-means algorithm, from which not only can values of k be identified for further analysis but, more importantly, it is made clear whether the best SSQ is a suitable solution or whether obtaining a consistently good partition requires further application of the stability index. The proposed method is illustrated by application to five synthetic datasets replicating a real world breast cancer dataset with varying data density, and a large bioinformatics dataset. PMID:23369085
Finding reproducible cluster partitions for the k-means algorithm.

PubMed

Lisboa, Paulo J G; Etchells, Terence A; Jarman, Ian H; Chambers, Simon J

2013-01-01

K-means clustering is widely used for exploratory data analysis. While its dependence on initialisation is well-known, it is common practice to assume that the partition with lowest sum-of-squares (SSQ) total i.e. within cluster variance, is both reproducible under repeated initialisations and also the closest that k-means can provide to true structure, when applied to synthetic data. We show that this is generally the case for small numbers of clusters, but for values of k that are still of theoretical and practical interest, similar values of SSQ can correspond to markedly different cluster partitions. This paper extends stability measures previously presented in the context of finding optimal values of cluster number, into a component of a 2-d map of the local minima found by the k-means algorithm, from which not only can values of k be identified for further analysis but, more importantly, it is made clear whether the best SSQ is a suitable solution or whether obtaining a consistently good partition requires further application of the stability index. The proposed method is illustrated by application to five synthetic datasets replicating a real world breast cancer dataset with varying data density, and a large bioinformatics dataset.
K-mean clustering algorithm for processing signals from compound semiconductor detectors

NASA Astrophysics Data System (ADS)

Tada, Tsutomu; Hitomi, Keitaro; Wu, Yan; Kim, Seong-Yun; Yamazaki, Hiromichi; Ishii, Keizo

2011-12-01

The K-mean clustering algorithm was employed for processing signal waveforms from TlBr detectors. The signal waveforms were classified based on its shape reflecting the charge collection process in the detector. The classified signal waveforms were processed individually to suppress the pulse height variation of signals due to the charge collection loss. The obtained energy resolution of a 137Cs spectrum measured with a 0.5 mm thick TlBr detector was 1.3% FWHM by employing 500 clusters.
Adaptive multi-view clustering based on nonnegative matrix factorization and pairwise co-regularization

NASA Astrophysics Data System (ADS)

Zhang, Tianzhen; Wang, Xiumei; Gao, Xinbo

2018-04-01

Nowadays, several datasets are demonstrated by multi-view, which usually include shared and complementary information. Multi-view clustering methods integrate the information of multi-view to obtain better clustering results. Nonnegative matrix factorization has become an essential and popular tool in clustering methods because of its interpretation. However, existing nonnegative matrix factorization based multi-view clustering algorithms do not consider the disagreement between views and neglects the fact that different views will have different contributions to the data distribution. In this paper, we propose a new multi-view clustering method, named adaptive multi-view clustering based on nonnegative matrix factorization and pairwise co-regularization. The proposed algorithm can obtain the parts-based representation of multi-view data by nonnegative matrix factorization. Then, pairwise co-regularization is used to measure the disagreement between views. There is only one parameter to auto learning the weight values according to the contribution of each view to data distribution. Experimental results show that the proposed algorithm outperforms several state-of-the-arts algorithms for multi-view clustering.
Distributed query plan generation using multiobjective genetic algorithm.

PubMed

Panicker, Shina; Kumar, T V Vijay

2014-01-01

A distributed query processing strategy, which is a key performance determinant in accessing distributed databases, aims to minimize the total query processing cost. One way to achieve this is by generating efficient distributed query plans that involve fewer sites for processing a query. In the case of distributed relational databases, the number of possible query plans increases exponentially with respect to the number of relations accessed by the query and the number of sites where these relations reside. Consequently, computing optimal distributed query plans becomes a complex problem. This distributed query plan generation (DQPG) problem has already been addressed using single objective genetic algorithm, where the objective is to minimize the total query processing cost comprising the local processing cost (LPC) and the site-to-site communication cost (CC). In this paper, this DQPG problem is formulated and solved as a biobjective optimization problem with the two objectives being minimize total LPC and minimize total CC. These objectives are simultaneously optimized using a multiobjective genetic algorithm NSGA-II. Experimental comparison of the proposed NSGA-II based DQPG algorithm with the single objective genetic algorithm shows that the former performs comparatively better and converges quickly towards optimal solutions for an observed crossover and mutation probability.
Distributed Query Plan Generation Using Multiobjective Genetic Algorithm

PubMed Central

Panicker, Shina; Vijay Kumar, T. V.

2014-01-01

A distributed query processing strategy, which is a key performance determinant in accessing distributed databases, aims to minimize the total query processing cost. One way to achieve this is by generating efficient distributed query plans that involve fewer sites for processing a query. In the case of distributed relational databases, the number of possible query plans increases exponentially with respect to the number of relations accessed by the query and the number of sites where these relations reside. Consequently, computing optimal distributed query plans becomes a complex problem. This distributed query plan generation (DQPG) problem has already been addressed using single objective genetic algorithm, where the objective is to minimize the total query processing cost comprising the local processing cost (LPC) and the site-to-site communication cost (CC). In this paper, this DQPG problem is formulated and solved as a biobjective optimization problem with the two objectives being minimize total LPC and minimize total CC. These objectives are simultaneously optimized using a multiobjective genetic algorithm NSGA-II. Experimental comparison of the proposed NSGA-II based DQPG algorithm with the single objective genetic algorithm shows that the former performs comparatively better and converges quickly towards optimal solutions for an observed crossover and mutation probability. PMID:24963513
Study on text mining algorithm for ultrasound examination of chronic liver diseases based on spectral clustering

NASA Astrophysics Data System (ADS)

Chang, Bingguo; Chen, Xiaofei

2018-05-01

Ultrasonography is an important examination for the diagnosis of chronic liver disease. The doctor gives the liver indicators and suggests the patient's condition according to the description of ultrasound report. With the rapid increase in the amount of data of ultrasound report, the workload of professional physician to manually distinguish ultrasound results significantly increases. In this paper, we use the spectral clustering method to cluster analysis of the description of the ultrasound report, and automatically generate the ultrasonic diagnostic diagnosis by machine learning. 110 groups ultrasound examination report of chronic liver disease were selected as test samples in this experiment, and the results were validated by spectral clustering and compared with k-means clustering algorithm. The results show that the accuracy of spectral clustering is 92.73%, which is higher than that of k-means clustering algorithm, which provides a powerful ultrasound-assisted diagnosis for patients with chronic liver disease.
The distribution of saturated clusters in wetted granular materials

NASA Astrophysics Data System (ADS)

Li, Shuoqi; Hanaor, Dorian; Gan, Yixiang

2017-06-01

The hydro-mechanical behaviour of partially saturated granular materials is greatly influenced by the spatial and temporal distribution of liquid within the media. The aim of this paper is to characterise the distribution of saturated clusters in granular materials using an optical imaging method under different water drainage conditions. A saturated cluster is formed when a liquid phase fully occupies the pore space between solid grains in a localized region. The samples considered here were prepared by vibrating mono-sized glass beads to form closely packed assemblies in a rectangular container. A range of drainage conditions were applied to the specimen by tilting the container and employing different flow rates, and the liquid pressure was recorded at different positions in the experimental cell. The formation of saturated clusters during the liquid withdrawal processes is governed by three competing mechanisms arising from viscous, capillary, and gravitational forces. When the flow rate is sufficiently large and the gravity component is sufficiently small, the viscous force tends to destabilize the liquid front leading to the formation of narrow fingers of saturated material. As the water channels along these liquid fingers break, saturated clusters are formed inside the specimen. Subsequently, a spatial and temporal distribution of saturated clusters can be observed. We investigated the resulting saturated cluster distribution as a function of flow rate and gravity to achieve a fundamental understanding of the formation and evolution of such clusters in partially saturated granular materials. This study serves as a bridge between pore-scale behavior and the overall hydro-mechanical characteristics in partially saturated soils.
Ternary alloy material prediction using genetic algorithm and cluster expansion

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chen, Chong

2015-12-01

This thesis summarizes our study on the crystal structures prediction of Fe-V-Si system using genetic algorithm and cluster expansion. Our goal is to explore and look for new stable compounds. We started from the current ten known experimental phases, and calculated formation energies of those compounds using density functional theory (DFT) package, namely, VASP. The convex hull was generated based on the DFT calculations of the experimental known phases. Then we did random search on some metal rich (Fe and V) compositions and found that the lowest energy structures were body centered cube (bcc) underlying lattice, under which we didmore » our computational systematic searches using genetic algorithm and cluster expansion. Among hundreds of the searched compositions, thirteen were selected and DFT formation energies were obtained by VASP. The stability checking of those thirteen compounds was done in reference to the experimental convex hull. We found that the composition, 24-8-16, i.e., Fe 3VSi 2 is a new stable phase and it can be very inspiring to the future experiments.« less
Average correlation clustering algorithm (ACCA) for grouping of co-regulated genes with similar pattern of variation in their expression values.

PubMed

Bhattacharya, Anindya; De, Rajat K

2010-08-01

Distance based clustering algorithms can group genes that show similar expression values under multiple experimental conditions. They are unable to identify a group of genes that have similar pattern of variation in their expression values. Previously we developed an algorithm called divisive correlation clustering algorithm (DCCA) to tackle this situation, which is based on the concept of correlation clustering. But this algorithm may also fail for certain cases. In order to overcome these situations, we propose a new clustering algorithm, called average correlation clustering algorithm (ACCA), which is able to produce better clustering solution than that produced by some others. ACCA is able to find groups of genes having more common transcription factors and similar pattern of variation in their expression values. Moreover, ACCA is more efficient than DCCA with respect to the time of execution. Like DCCA, we use the concept of correlation clustering concept introduced by Bansal et al. ACCA uses the correlation matrix in such a way that all genes in a cluster have the highest average correlation values with the genes in that cluster. We have applied ACCA and some well-known conventional methods including DCCA to two artificial and nine gene expression datasets, and compared the performance of the algorithms. The clustering results of ACCA are found to be more significantly relevant to the biological annotations than those of the other methods. Analysis of the results show the superiority of ACCA over some others in determining a group of genes having more common transcription factors and with similar pattern of variation in their expression profiles. Availability of the software: The software has been developed using C and Visual Basic languages, and can be executed on the Microsoft Windows platforms. The software may be downloaded as a zip file from http://www.isical.ac.in/~rajat. Then it needs to be installed. Two word files (included in the zip file) need to
A comparative study of DIGNET, average, complete, single hierarchical and k-means clustering algorithms in 2D face image recognition

NASA Astrophysics Data System (ADS)

Thanos, Konstantinos-Georgios; Thomopoulos, Stelios C. A.

2014-06-01

The study in this paper belongs to a more general research of discovering facial sub-clusters in different ethnicity face databases. These new sub-clusters along with other metadata (such as race, sex, etc.) lead to a vector for each face in the database where each vector component represents the likelihood of participation of a given face to each cluster. This vector is then used as a feature vector in a human identification and tracking system based on face and other biometrics. The first stage in this system involves a clustering method which evaluates and compares the clustering results of five different clustering algorithms (average, complete, single hierarchical algorithm, k-means and DIGNET), and selects the best strategy for each data collection. In this paper we present the comparative performance of clustering results of DIGNET and four clustering algorithms (average, complete, single hierarchical and k-means) on fabricated 2D and 3D samples, and on actual face images from various databases, using four different standard metrics. These metrics are the silhouette figure, the mean silhouette coefficient, the Hubert test Γ coefficient, and the classification accuracy for each clustering result. The results showed that, in general, DIGNET gives more trustworthy results than the other algorithms when the metrics values are above a specific acceptance threshold. However when the evaluation results metrics have values lower than the acceptance threshold but not too low (too low corresponds to ambiguous results or false results), then it is necessary for the clustering results to be verified by the other algorithms.
Informational and linguistic analysis of large genomic sequence collections via efficient Hadoop cluster algorithms.

PubMed

Ferraro Petrillo, Umberto; Roscigno, Gianluca; Cattaneo, Giuseppe; Giancarlo, Raffaele

2018-06-01

Information theoretic and compositional/linguistic analysis of genomes have a central role in bioinformatics, even more so since the associated methodologies are becoming very valuable also for epigenomic and meta-genomic studies. The kernel of those methods is based on the collection of k-mer statistics, i.e. how many times each k-mer in {A,C,G,T}k occurs in a DNA sequence. Although this problem is computationally very simple and efficiently solvable on a conventional computer, the sheer amount of data available now in applications demands to resort to parallel and distributed computing. Indeed, those type of algorithms have been developed to collect k-mer statistics in the realm of genome assembly. However, they are so specialized to this domain that they do not extend easily to the computation of informational and linguistic indices, concurrently on sets of genomes. Following the well-established approach in many disciplines, and with a growing success also in bioinformatics, to resort to MapReduce and Hadoop to deal with 'Big Data' problems, we present KCH, the first set of MapReduce algorithms able to perform concurrently informational and linguistic analysis of large collections of genomic sequences on a Hadoop cluster. The benchmarking of KCH that we provide indicates that it is quite effective and versatile. It is also competitive with respect to the parallel and distributed algorithms highly specialized to k-mer statistics collection for genome assembly problems. In conclusion, KCH is a much needed addition to the growing number of algorithms and tools that use MapReduce for bioinformatics core applications. The software, including instructions for running it over Amazon AWS, as well as the datasets are available at http://www.di-srv.unisa.it/KCH. umberto.ferraro@uniroma1.it. Supplementary data are available at Bioinformatics online.
Dynamical Mass Measurements of Contaminated Galaxy Clusters Using Machine Learning

NASA Astrophysics Data System (ADS)

Ntampaka, M.; Trac, H.; Sutherland, D. J.; Fromenteau, S.; Póczos, B.; Schneider, J.

2016-11-01

We study dynamical mass measurements of galaxy clusters contaminated by interlopers and show that a modern machine learning algorithm can predict masses by better than a factor of two compared to a standard scaling relation approach. We create two mock catalogs from Multidark’s publicly available N-body MDPL1 simulation, one with perfect galaxy cluster membership information and the other where a simple cylindrical cut around the cluster center allows interlopers to contaminate the clusters. In the standard approach, we use a power-law scaling relation to infer cluster mass from galaxy line-of-sight (LOS) velocity dispersion. Assuming perfect membership knowledge, this unrealistic case produces a wide fractional mass error distribution, with a width of {{Δ }}ε ≈ 0.87. Interlopers introduce additional scatter, significantly widening the error distribution further ({{Δ }}ε ≈ 2.13). We employ the support distribution machine (SDM) class of algorithms to learn from distributions of data to predict single values. Applied to distributions of galaxy observables such as LOS velocity and projected distance from the cluster center, SDM yields better than a factor-of-two improvement ({{Δ }}ε ≈ 0.67) for the contaminated case. Remarkably, SDM applied to contaminated clusters is better able to recover masses than even the scaling relation approach applied to uncontaminated clusters. We show that the SDM method more accurately reproduces the cluster mass function, making it a valuable tool for employing cluster observations to evaluate cosmological models.
Clustering of tethered satellite system simulation data by an adaptive neuro-fuzzy algorithm

NASA Technical Reports Server (NTRS)

Mitra, Sunanda; Pemmaraju, Surya

1992-01-01

Recent developments in neuro-fuzzy systems indicate that the concepts of adaptive pattern recognition, when used to identify appropriate control actions corresponding to clusters of patterns representing system states in dynamic nonlinear control systems, may result in innovative designs. A modular, unsupervised neural network architecture, in which fuzzy learning rules have been embedded is used for on-line identification of similar states. The architecture and control rules involved in Adaptive Fuzzy Leader Clustering (AFLC) allow this system to be incorporated in control systems for identification of system states corresponding to specific control actions. We have used this algorithm to cluster the simulation data of Tethered Satellite System (TSS) to estimate the range of delta voltages necessary to maintain the desired length rate of the tether. The AFLC algorithm is capable of on-line estimation of the appropriate control voltages from the corresponding length error and length rate error without a priori knowledge of their membership functions and familarity with the behavior of the Tethered Satellite System.
Classification of posture maintenance data with fuzzy clustering algorithms

NASA Technical Reports Server (NTRS)

Bezdek, James C.

1992-01-01

Sensory inputs from the visual, vestibular, and proprioreceptive systems are integrated by the central nervous system to maintain postural equilibrium. Sustained exposure to microgravity causes neurosensory adaptation during spaceflight, which results in decreased postural stability until readaptation occurs upon return to the terrestrial environment. Data which simulate sensory inputs under various sensory organization test (SOT) conditions were collected in conjunction with Johnson Space Center postural control studies using a tilt-translation device (TTD). The University of West Florida applied the fuzzy c-meams (FCM) clustering algorithms to this data with a view towards identifying various states and stages of subjects experiencing such changes. Feature analysis, time step analysis, pooling data, response of the subjects, and the algorithms used are discussed.
Crowded Cluster Cores. Algorithms for Deblending in Dark Energy Survey Images

DOE PAGES

Zhang, Yuanyuan; McKay, Timothy A.; Bertin, Emmanuel; ...

2015-10-26

Deep optical images are often crowded with overlapping objects. We found that this is especially true in the cores of galaxy clusters, where images of dozens of galaxies may lie atop one another. Accurate measurements of cluster properties require deblending algorithms designed to automatically extract a list of individual objects and decide what fraction of the light in each pixel comes from each object. In this article, we introduce a new software tool called the Gradient And Interpolation based (GAIN) deblender. GAIN is used as a secondary deblender to improve the separation of overlapping objects in galaxy cluster cores inmore » Dark Energy Survey images. It uses image intensity gradients and an interpolation technique originally developed to correct flawed digital images. Our paper is dedicated to describing the algorithm of the GAIN deblender and its applications, but we additionally include modest tests of the software based on real Dark Energy Survey co-add images. GAIN helps to extract an unbiased photometry measurement for blended sources and improve detection completeness, while introducing few spurious detections. When applied to processed Dark Energy Survey data, GAIN serves as a useful quick fix when a high level of deblending is desired.« less
Performance Enhancement of Radial Distributed System with Distributed Generators by Reconfiguration Using Binary Firefly Algorithm

NASA Astrophysics Data System (ADS)

Rajalakshmi, N.; Padma Subramanian, D.; Thamizhavel, K.

2015-03-01

The extent of real power loss and voltage deviation associated with overloaded feeders in radial distribution system can be reduced by reconfiguration. Reconfiguration is normally achieved by changing the open/closed state of tie/sectionalizing switches. Finding optimal switch combination is a complicated problem as there are many switching combinations possible in a distribution system. Hence optimization techniques are finding greater importance in reducing the complexity of reconfiguration problem. This paper presents the application of firefly algorithm (FA) for optimal reconfiguration of radial distribution system with distributed generators (DG). The algorithm is tested on IEEE 33 bus system installed with DGs and the results are compared with binary genetic algorithm. It is found that binary FA is more effective than binary genetic algorithm in achieving real power loss reduction and improving voltage profile and hence enhancing the performance of radial distribution system. Results are found to be optimum when DGs are added to the test system, which proved the impact of DGs on distribution system.
SU-G-TeP3-14: Three-Dimensional Cluster Model in Inhomogeneous Dose Distribution

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wei, J; Penagaricano, J; Narayanasamy, G

2016-06-15

Purpose: We aim to investigate 3D cluster formation in inhomogeneous dose distribution to search for new models predicting radiation tissue damage and further leading to new optimization paradigm for radiotherapy planning. Methods: The aggregation of higher dose in the organ at risk (OAR) than a preset threshold was chosen as the cluster whose connectivity dictates the cluster structure. Upon the selection of the dose threshold, the fractional density defined as the fraction of voxels in the organ eligible to be part of the cluster was determined according to the dose volume histogram (DVH). A Monte Carlo method was implemented tomore » establish a case pertinent to the corresponding DVH. Ones and zeros were randomly assigned to each OAR voxel with the sampling probability equal to the fractional density. Ten thousand samples were randomly generated to ensure a sufficient number of cluster sets. A recursive cluster searching algorithm was developed to analyze the cluster with various connectivity choices like 1-, 2-, and 3-connectivity. The mean size of the largest cluster (MSLC) from the Monte Carlo samples was taken to be a function of the fractional density. Various OARs from clinical plans were included in the study. Results: Intensive Monte Carlo study demonstrates the inverse relationship between the MSLC and the cluster connectivity as anticipated and the cluster size does not change with fractional density linearly regardless of the connectivity types. An initially-slow-increase to exponential growth transition of the MSLC from low to high density was observed. The cluster sizes were found to vary within a large range and are relatively independent of the OARs. Conclusion: The Monte Carlo study revealed that the cluster size could serve as a suitable index of the tissue damage (percolation cluster) and the clinical outcome of the same DVH might be potentially different.« less
User Activity Recognition in Smart Homes Using Pattern Clustering Applied to Temporal ANN Algorithm.

PubMed

Bourobou, Serge Thomas Mickala; Yoo, Younghwan

2015-05-21

This paper discusses the possibility of recognizing and predicting user activities in the IoT (Internet of Things) based smart environment. The activity recognition is usually done through two steps: activity pattern clustering and activity type decision. Although many related works have been suggested, they had some limited performance because they focused only on one part between the two steps. This paper tries to find the best combination of a pattern clustering method and an activity decision algorithm among various existing works. For the first step, in order to classify so varied and complex user activities, we use a relevant and efficient unsupervised learning method called the K-pattern clustering algorithm. In the second step, the training of smart environment for recognizing and predicting user activities inside his/her personal space is done by utilizing the artificial neural network based on the Allen's temporal relations. The experimental results show that our combined method provides the higher recognition accuracy for various activities, as compared with other data mining classification algorithms. Furthermore, it is more appropriate for a dynamic environment like an IoT based smart home.

User Activity Recognition in Smart Homes Using Pattern Clustering Applied to Temporal ANN Algorithm

PubMed Central

Bourobou, Serge Thomas Mickala; Yoo, Younghwan

2015-01-01

This paper discusses the possibility of recognizing and predicting user activities in the IoT (Internet of Things) based smart environment. The activity recognition is usually done through two steps: activity pattern clustering and activity type decision. Although many related works have been suggested, they had some limited performance because they focused only on one part between the two steps. This paper tries to find the best combination of a pattern clustering method and an activity decision algorithm among various existing works. For the first step, in order to classify so varied and complex user activities, we use a relevant and efficient unsupervised learning method called the K-pattern clustering algorithm. In the second step, the training of smart environment for recognizing and predicting user activities inside his/her personal space is done by utilizing the artificial neural network based on the Allen’s temporal relations. The experimental results show that our combined method provides the higher recognition accuracy for various activities, as compared with other data mining classification algorithms. Furthermore, it is more appropriate for a dynamic environment like an IoT based smart home. PMID:26007738
A robust fuzzy local Information c-means clustering algorithm with noise detection

NASA Astrophysics Data System (ADS)

Shang, Jiayu; Li, Shiren; Huang, Junwei

2018-04-01

Fuzzy c-means clustering (FCM), especially with spatial constraints (FCM_S), is an effective algorithm suitable for image segmentation. Its reliability contributes not only to the presentation of fuzziness for belongingness of every pixel but also to exploitation of spatial contextual information. But these algorithms still remain some problems when processing the image with noise, they are sensitive to the parameters which have to be tuned according to prior knowledge of the noise. In this paper, we propose a new FCM algorithm, combining the gray constraints and spatial constraints, called spatial and gray-level denoised fuzzy c-means (SGDFCM) algorithm. This new algorithm conquers the parameter disadvantages mentioned above by considering the possibility of noise of each pixel, which aims to improve the robustness and obtain more detail information. Furthermore, the possibility of noise can be calculated in advance, which means the algorithm is effective and efficient.
A Degree Distribution Optimization Algorithm for Image Transmission

NASA Astrophysics Data System (ADS)

Jiang, Wei; Yang, Junjie

2016-09-01

Luby Transform (LT) code is the first practical implementation of digital fountain code. The coding behavior of LT code is mainly decided by the degree distribution which determines the relationship between source data and codewords. Two degree distributions are suggested by Luby. They work well in typical situations but not optimally in case of finite encoding symbols. In this work, the degree distribution optimization algorithm is proposed to explore the potential of LT code. Firstly selection scheme of sparse degrees for LT codes is introduced. Then probability distribution is optimized according to the selected degrees. In image transmission, bit stream is sensitive to the channel noise and even a single bit error may cause the loss of synchronization between the encoder and the decoder. Therefore the proposed algorithm is designed for image transmission situation. Moreover, optimal class partition is studied for image transmission with unequal error protection. The experimental results are quite promising. Compared with LT code with robust soliton distribution, the proposed algorithm improves the final quality of recovered images obviously with the same overhead.
An extended affinity propagation clustering method based on different data density types.

PubMed

Zhao, XiuLi; Xu, WeiXiang

2015-01-01

Affinity propagation (AP) algorithm, as a novel clustering method, does not require the users to specify the initial cluster centers in advance, which regards all data points as potential exemplars (cluster centers) equally and groups the clusters totally by the similar degree among the data points. But in many cases there exist some different intensive areas within the same data set, which means that the data set does not distribute homogeneously. In such situation the AP algorithm cannot group the data points into ideal clusters. In this paper, we proposed an extended AP clustering algorithm to deal with such a problem. There are two steps in our method: firstly the data set is partitioned into several data density types according to the nearest distances of each data point; and then the AP clustering method is, respectively, used to group the data points into clusters in each data density type. Two experiments are carried out to evaluate the performance of our algorithm: one utilizes an artificial data set and the other uses a real seismic data set. The experiment results show that groups are obtained more accurately by our algorithm than OPTICS and AP clustering algorithm itself.
A distributed algorithm for machine learning

NASA Astrophysics Data System (ADS)

Chen, Shihong

2018-04-01

This paper considers a distributed learning problem in which a group of machines in a connected network, each learning its own local dataset, aim to reach a consensus at an optimal model, by exchanging information only with their neighbors but without transmitting data. A distributed algorithm is proposed to solve this problem under appropriate assumptions.
Developing the fuzzy c-means clustering algorithm based on maximum entropy for multitarget tracking in a cluttered environment

NASA Astrophysics Data System (ADS)

Chen, Xiao; Li, Yaan; Yu, Jing; Li, Yuxing

2018-01-01

For fast and more effective implementation of tracking multiple targets in a cluttered environment, we propose a multiple targets tracking (MTT) algorithm called maximum entropy fuzzy c-means clustering joint probabilistic data association that combines fuzzy c-means clustering and the joint probabilistic data association (PDA) algorithm. The algorithm uses the membership value to express the probability of the target originating from measurement. The membership value is obtained through fuzzy c-means clustering objective function optimized by the maximum entropy principle. When considering the effect of the public measurement, we use a correction factor to adjust the association probability matrix to estimate the state of the target. As this algorithm avoids confirmation matrix splitting, it can solve the high computational load problem of the joint PDA algorithm. The results of simulations and analysis conducted for tracking neighbor parallel targets and cross targets in a different density cluttered environment show that the proposed algorithm can realize MTT quickly and efficiently in a cluttered environment. Further, the performance of the proposed algorithm remains constant with increasing process noise variance. The proposed algorithm has the advantages of efficiency and low computational load, which can ensure optimum performance when tracking multiple targets in a dense cluttered environment.
What to Do When K-Means Clustering Fails: A Simple yet Principled Alternative Algorithm.

PubMed

Raykov, Yordan P; Boukouvalas, Alexis; Baig, Fahd; Little, Max A

The K-means algorithm is one of the most popular clustering algorithms in current use as it is relatively fast yet simple to understand and deploy in practice. Nevertheless, its use entails certain restrictive assumptions about the data, the negative consequences of which are not always immediately apparent, as we demonstrate. While more flexible algorithms have been developed, their widespread use has been hindered by their computational and technical complexity. Motivated by these considerations, we present a flexible alternative to K-means that relaxes most of the assumptions, whilst remaining almost as fast and simple. This novel algorithm which we call MAP-DP (maximum a-posteriori Dirichlet process mixtures), is statistically rigorous as it is based on nonparametric Bayesian Dirichlet process mixture modeling. This approach allows us to overcome most of the limitations imposed by K-means. The number of clusters K is estimated from the data instead of being fixed a-priori as in K-means. In addition, while K-means is restricted to continuous data, the MAP-DP framework can be applied to many kinds of data, for example, binary, count or ordinal data. Also, it can efficiently separate outliers from the data. This additional flexibility does not incur a significant computational overhead compared to K-means with MAP-DP convergence typically achieved in the order of seconds for many practical problems. Finally, in contrast to K-means, since the algorithm is based on an underlying statistical model, the MAP-DP framework can deal with missing data and enables model testing such as cross validation in a principled way. We demonstrate the simplicity and effectiveness of this algorithm on the health informatics problem of clinical sub-typing in a cluster of diseases known as parkinsonism.
What to Do When K-Means Clustering Fails: A Simple yet Principled Alternative Algorithm

PubMed Central

Baig, Fahd; Little, Max A.

2016-01-01

The K-means algorithm is one of the most popular clustering algorithms in current use as it is relatively fast yet simple to understand and deploy in practice. Nevertheless, its use entails certain restrictive assumptions about the data, the negative consequences of which are not always immediately apparent, as we demonstrate. While more flexible algorithms have been developed, their widespread use has been hindered by their computational and technical complexity. Motivated by these considerations, we present a flexible alternative to K-means that relaxes most of the assumptions, whilst remaining almost as fast and simple. This novel algorithm which we call MAP-DP (maximum a-posteriori Dirichlet process mixtures), is statistically rigorous as it is based on nonparametric Bayesian Dirichlet process mixture modeling. This approach allows us to overcome most of the limitations imposed by K-means. The number of clusters K is estimated from the data instead of being fixed a-priori as in K-means. In addition, while K-means is restricted to continuous data, the MAP-DP framework can be applied to many kinds of data, for example, binary, count or ordinal data. Also, it can efficiently separate outliers from the data. This additional flexibility does not incur a significant computational overhead compared to K-means with MAP-DP convergence typically achieved in the order of seconds for many practical problems. Finally, in contrast to K-means, since the algorithm is based on an underlying statistical model, the MAP-DP framework can deal with missing data and enables model testing such as cross validation in a principled way. We demonstrate the simplicity and effectiveness of this algorithm on the health informatics problem of clinical sub-typing in a cluster of diseases known as parkinsonism. PMID:27669525
Mass Distribution in Galaxy Cluster Cores

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hogan, M. T.; McNamara, B. R.; Pulido, F.

Many processes within galaxy clusters, such as those believed to govern the onset of thermally unstable cooling and active galactic nucleus feedback, are dependent upon local dynamical timescales. However, accurate mapping of the mass distribution within individual clusters is challenging, particularly toward cluster centers where the total mass budget has substantial radially dependent contributions from the stellar ( M {sub *}), gas ( M {sub gas}), and dark matter ( M {sub DM}) components. In this paper we use a small sample of galaxy clusters with deep Chandra observations and good ancillary tracers of their gravitating mass at both largemore » and small radii to develop a method for determining mass profiles that span a wide radial range and extend down into the central galaxy. We also consider potential observational pitfalls in understanding cooling in hot cluster atmospheres, and find tentative evidence for a relationship between the radial extent of cooling X-ray gas and nebular H α emission in cool-core clusters. At large radii the entropy profiles of our clusters agree with the baseline power law of K ∝ r {sup 1.1} expected from gravity alone. At smaller radii our entropy profiles become shallower but continue with a power law of the form K ∝ r {sup 0.67} down to our resolution limit. Among this small sample of cool-core clusters we therefore find no support for the existence of a central flat “entropy floor.”.« less
Hybrid clustering based fuzzy structure for vibration control - Part 1: A novel algorithm for building neuro-fuzzy system

NASA Astrophysics Data System (ADS)

Nguyen, Sy Dzung; Nguyen, Quoc Hung; Choi, Seung-Bok

2015-01-01

This paper presents a new algorithm for building an adaptive neuro-fuzzy inference system (ANFIS) from a training data set called B-ANFIS. In order to increase accuracy of the model, the following issues are executed. Firstly, a data merging rule is proposed to build and perform a data-clustering strategy. Subsequently, a combination of clustering processes in the input data space and in the joint input-output data space is presented. Crucial reason of this task is to overcome problems related to initialization and contradictory fuzzy rules, which usually happen when building ANFIS. The clustering process in the input data space is accomplished based on a proposed merging-possibilistic clustering (MPC) algorithm. The effectiveness of this process is evaluated to resume a clustering process in the joint input-output data space. The optimal parameters obtained after completion of the clustering process are used to build ANFIS. Simulations based on a numerical data, 'Daily Data of Stock A', and measured data sets of a smart damper are performed to analyze and estimate accuracy. In addition, convergence and robustness of the proposed algorithm are investigated based on both theoretical and testing approaches.
Tissue Probability Map Constrained 4-D Clustering Algorithm for Increased Accuracy and Robustness in Serial MR Brain Image Segmentation

PubMed Central

Xue, Zhong; Shen, Dinggang; Li, Hai; Wong, Stephen

2010-01-01

The traditional fuzzy clustering algorithm and its extensions have been successfully applied in medical image segmentation. However, because of the variability of tissues and anatomical structures, the clustering results might be biased by the tissue population and intensity differences. For example, clustering-based algorithms tend to over-segment white matter tissues of MR brain images. To solve this problem, we introduce a tissue probability map constrained clustering algorithm and apply it to serial MR brain image segmentation, i.e., a series of 3-D MR brain images of the same subject at different time points. Using the new serial image segmentation algorithm in the framework of the CLASSIC framework, which iteratively segments the images and estimates the longitudinal deformations, we improved both accuracy and robustness for serial image computing, and at the mean time produced longitudinally consistent segmentation and stable measures. In the algorithm, the tissue probability maps consist of both the population-based and subject-specific segmentation priors. Experimental study using both simulated longitudinal MR brain data and the Alzheimer’s Disease Neuroimaging Initiative (ADNI) data confirmed that using both priors more accurate and robust segmentation results can be obtained. The proposed algorithm can be applied in longitudinal follow up studies of MR brain imaging with subtle morphological changes for neurological disorders. PMID:26566399
Adaptive Scaling of Cluster Boundaries for Large-Scale Social Media Data Clustering.

PubMed

Meng, Lei; Tan, Ah-Hwee; Wunsch, Donald C

2016-12-01

The large scale and complex nature of social media data raises the need to scale clustering techniques to big data and make them capable of automatically identifying data clusters with few empirical settings. In this paper, we present our investigation and three algorithms based on the fuzzy adaptive resonance theory (Fuzzy ART) that have linear computational complexity, use a single parameter, i.e., the vigilance parameter to identify data clusters, and are robust to modest parameter settings. The contribution of this paper lies in two aspects. First, we theoretically demonstrate how complement coding, commonly known as a normalization method, changes the clustering mechanism of Fuzzy ART, and discover the vigilance region (VR) that essentially determines how a cluster in the Fuzzy ART system recognizes similar patterns in the feature space. The VR gives an intrinsic interpretation of the clustering mechanism and limitations of Fuzzy ART. Second, we introduce the idea of allowing different clusters in the Fuzzy ART system to have different vigilance levels in order to meet the diverse nature of the pattern distribution of social media data. To this end, we propose three vigilance adaptation methods, namely, the activation maximization (AM) rule, the confliction minimization (CM) rule, and the hybrid integration (HI) rule. With an initial vigilance value, the resulting clustering algorithms, namely, the AM-ART, CM-ART, and HI-ART, can automatically adapt the vigilance values of all clusters during the learning epochs in order to produce better cluster boundaries. Experiments on four social media data sets show that AM-ART, CM-ART, and HI-ART are more robust than Fuzzy ART to the initial vigilance value, and they usually achieve better or comparable performance and much faster speed than the state-of-the-art clustering algorithms that also do not require a predefined number of clusters.
Performance Analysis of Combined Methods of Genetic Algorithm and K-Means Clustering in Determining the Value of Centroid

NASA Astrophysics Data System (ADS)

Adya Zizwan, Putra; Zarlis, Muhammad; Budhiarti Nababan, Erna

2017-12-01

The determination of Centroid on K-Means Algorithm directly affects the quality of the clustering results. Determination of centroid by using random numbers has many weaknesses. The GenClust algorithm that combines the use of Genetic Algorithms and K-Means uses a genetic algorithm to determine the centroid of each cluster. The use of the GenClust algorithm uses 50% chromosomes obtained through deterministic calculations and 50% is obtained from the generation of random numbers. This study will modify the use of the GenClust algorithm in which the chromosomes used are 100% obtained through deterministic calculations. The results of this study resulted in performance comparisons expressed in Mean Square Error influenced by centroid determination on K-Means method by using GenClust method, modified GenClust method and also classic K-Means.
Clustering-based energy-saving algorithm in ultra-dense network

NASA Astrophysics Data System (ADS)

Huang, Junwei; Zhou, Pengguang; Teng, Deyang; Zhang, Renchi; Xu, Hao

2017-06-01

In Ultra-dense Networks (UDN), dense deployment of low power small base stations will cause serious small cells interference and a large amount of energy consumption. The purpose of this paper is to explore the method of reducing small cells interference and energy saving system in UDN, and we innovatively propose a sleep-waking-active (SWA) scheme. The scheme decreases the user outage causing by failure to detect users’ service requests, shortens the opening time of active base stations directly switching to sleep mode; we further proposes a Vertex Surrounding Clustering(VSC) algorithm, which first colours the small cells with the most strongest interference and next extends to the adjacent small cells. VSC algorithm can use the least colour to stain the small cell, reduce the number of iterations and promote the efficiency of colouring. The simulation results show that SWA scheme can effectively improve the system Energy Efficiency (EE), the VSC algorithm can reduce the small cells interference and optimize the users’ Spectrum Efficiency (SE) and throughput.
ClusterViz: A Cytoscape APP for Cluster Analysis of Biological Network.

PubMed

Wang, Jianxin; Zhong, Jiancheng; Chen, Gang; Li, Min; Wu, Fang-xiang; Pan, Yi

2015-01-01

Cluster analysis of biological networks is one of the most important approaches for identifying functional modules and predicting protein functions. Furthermore, visualization of clustering results is crucial to uncover the structure of biological networks. In this paper, ClusterViz, an APP of Cytoscape 3 for cluster analysis and visualization, has been developed. In order to reduce complexity and enable extendibility for ClusterViz, we designed the architecture of ClusterViz based on the framework of Open Services Gateway Initiative. According to the architecture, the implementation of ClusterViz is partitioned into three modules including interface of ClusterViz, clustering algorithms and visualization and export. ClusterViz fascinates the comparison of the results of different algorithms to do further related analysis. Three commonly used clustering algorithms, FAG-EC, EAGLE and MCODE, are included in the current version. Due to adopting the abstract interface of algorithms in module of the clustering algorithms, more clustering algorithms can be included for the future use. To illustrate usability of ClusterViz, we provided three examples with detailed steps from the important scientific articles, which show that our tool has helped several research teams do their research work on the mechanism of the biological networks.
Classification of posture maintenance data with fuzzy clustering algorithms

NASA Technical Reports Server (NTRS)

Bezdek, James C.

1991-01-01

Sensory inputs from the visual, vestibular, and proprioreceptive systems are integrated by the central nervous system to maintain postural equilibrium. Sustained exposure to microgravity causes neurosensory adaptation during spaceflight, which results in decreased postural stability until readaptation occurs upon return to the terrestrial environment. Data which simulate sensory inputs under various conditions were collected in conjunction with JSC postural control studies using a Tilt-Translation Device (TTD). The University of West Florida proposed applying the Fuzzy C-Means Clustering (FCM) Algorithms to this data with a view towards identifying various states and stages. Data supplied by NASA/JSC were submitted to the FCM algorithms in an attempt to identify and characterize cluster substructure in a mixed ensemble of pre- and post-adaptational TTD data. Following several unsuccessful trials with FCM using a full 11 dimensional data set, a set of two channels (features) were found to enable FCM to separate pre- from post-adaptational TTD data. The main conclusions are that: (1) FCM seems able to separate pre- from post-TTD subject no. 2 on the one trial that was used, but only in certain subintervals of time; and (2) Channels 2 (right rear transducer force) and 8 (hip sway bar) contain better discrimination information than other supersets and combinations of the data that were tried so far.
Detection and clustering of features in aerial images by neuron network-based algorithm

NASA Astrophysics Data System (ADS)

Vozenilek, Vit

2015-12-01

The paper presents the algorithm for detection and clustering of feature in aerial photographs based on artificial neural networks. The presented approach is not focused on the detection of specific topographic features, but on the combination of general features analysis and their use for clustering and backward projection of clusters to aerial image. The basis of the algorithm is a calculation of the total error of the network and a change of weights of the network to minimize the error. A classic bipolar sigmoid was used for the activation function of the neurons and the basic method of backpropagation was used for learning. To verify that a set of features is able to represent the image content from the user's perspective, the web application was compiled (ASP.NET on the Microsoft .NET platform). The main achievements include the knowledge that man-made objects in aerial images can be successfully identified by detection of shapes and anomalies. It was also found that the appropriate combination of comprehensive features that describe the colors and selected shapes of individual areas can be useful for image analysis.
Fast clustering algorithm for large ECG data sets based on CS theory in combination with PCA and K-NN methods.

PubMed

Balouchestani, Mohammadreza; Krishnan, Sridhar

2014-01-01

Long-term recording of Electrocardiogram (ECG) signals plays an important role in health care systems for diagnostic and treatment purposes of heart diseases. Clustering and classification of collecting data are essential parts for detecting concealed information of P-QRS-T waves in the long-term ECG recording. Currently used algorithms do have their share of drawbacks: 1) clustering and classification cannot be done in real time; 2) they suffer from huge energy consumption and load of sampling. These drawbacks motivated us in developing novel optimized clustering algorithm which could easily scan large ECG datasets for establishing low power long-term ECG recording. In this paper, we present an advanced K-means clustering algorithm based on Compressed Sensing (CS) theory as a random sampling procedure. Then, two dimensionality reduction methods: Principal Component Analysis (PCA) and Linear Correlation Coefficient (LCC) followed by sorting the data using the K-Nearest Neighbours (K-NN) and Probabilistic Neural Network (PNN) classifiers are applied to the proposed algorithm. We show our algorithm based on PCA features in combination with K-NN classifier shows better performance than other methods. The proposed algorithm outperforms existing algorithms by increasing 11% classification accuracy. In addition, the proposed algorithm illustrates classification accuracy for K-NN and PNN classifiers, and a Receiver Operating Characteristics (ROC) area of 99.98%, 99.83%, and 99.75% respectively.
Adaptive density trajectory cluster based on time and space distance

NASA Astrophysics Data System (ADS)

Liu, Fagui; Zhang, Zhijie

2017-10-01

There are some hotspot problems remaining in trajectory cluster for discovering mobile behavior regularity, such as the computation of distance between sub trajectories, the setting of parameter values in cluster algorithm and the uncertainty/boundary problem of data set. As a result, based on the time and space, this paper tries to define the calculation method of distance between sub trajectories. The significance of distance calculation for sub trajectories is to clearly reveal the differences in moving trajectories and to promote the accuracy of cluster algorithm. Besides, a novel adaptive density trajectory cluster algorithm is proposed, in which cluster radius is computed through using the density of data distribution. In addition, cluster centers and number are selected by a certain strategy automatically, and uncertainty/boundary problem of data set is solved by designed weighted rough c-means. Experimental results demonstrate that the proposed algorithm can perform the fuzzy trajectory cluster effectively on the basis of the time and space distance, and obtain the optimal cluster centers and rich cluster results information adaptably for excavating the features of mobile behavior in mobile and sociology network.
Fast Constrained Spectral Clustering and Cluster Ensemble with Random Projection

PubMed Central

Liu, Wenfen

2017-01-01

Constrained spectral clustering (CSC) method can greatly improve the clustering accuracy with the incorporation of constraint information into spectral clustering and thus has been paid academic attention widely. In this paper, we propose a fast CSC algorithm via encoding landmark-based graph construction into a new CSC model and applying random sampling to decrease the data size after spectral embedding. Compared with the original model, the new algorithm has the similar results with the increase of its model size asymptotically; compared with the most efficient CSC algorithm known, the new algorithm runs faster and has a wider range of suitable data sets. Meanwhile, a scalable semisupervised cluster ensemble algorithm is also proposed via the combination of our fast CSC algorithm and dimensionality reduction with random projection in the process of spectral ensemble clustering. We demonstrate by presenting theoretical analysis and empirical results that the new cluster ensemble algorithm has advantages in terms of efficiency and effectiveness. Furthermore, the approximate preservation of random projection in clustering accuracy proved in the stage of consensus clustering is also suitable for the weighted k-means clustering and thus gives the theoretical guarantee to this special kind of k-means clustering where each point has its corresponding weight. PMID:29312447

Moving target tracking through distributed clustering in directional sensor networks.

PubMed

Enayet, Asma; Razzaque, Md Abdur; Hassan, Mohammad Mehedi; Almogren, Ahmad; Alamri, Atif

2014-12-18

The problem of moving target tracking in directional sensor networks (DSNs) introduces new research challenges, including optimal selection of sensing and communication sectors of the directional sensor nodes, determination of the precise location of the target and an energy-efficient data collection mechanism. Existing solutions allow individual sensor nodes to detect the target's location through collaboration among neighboring nodes, where most of the sensors are activated and communicate with the sink. Therefore, they incur much overhead, loss of energy and reduced target tracking accuracy. In this paper, we have proposed a clustering algorithm, where distributed cluster heads coordinate their member nodes in optimizing the active sensing and communication directions of the nodes, precisely determining the target location by aggregating reported sensing data from multiple nodes and transferring the resultant location information to the sink. Thus, the proposed target tracking mechanism minimizes the sensing redundancy and maximizes the number of sleeping nodes in the network. We have also investigated the dynamic approach of activating sleeping nodes on-demand so that the moving target tracking accuracy can be enhanced while maximizing the network lifetime. We have carried out our extensive simulations in ns-3, and the results show that the proposed mechanism achieves higher performance compared to the state-of-the-art works.
Moving Target Tracking through Distributed Clustering in Directional Sensor Networks

PubMed Central

Enayet, Asma; Razzaque, Md. Abdur; Hassan, Mohammad Mehedi; Almogren, Ahmad; Alamri, Atif

2014-01-01

The problem of moving target tracking in directional sensor networks (DSNs) introduces new research challenges, including optimal selection of sensing and communication sectors of the directional sensor nodes, determination of the precise location of the target and an energy-efficient data collection mechanism. Existing solutions allow individual sensor nodes to detect the target's location through collaboration among neighboring nodes, where most of the sensors are activated and communicate with the sink. Therefore, they incur much overhead, loss of energy and reduced target tracking accuracy. In this paper, we have proposed a clustering algorithm, where distributed cluster heads coordinate their member nodes in optimizing the active sensing and communication directions of the nodes, precisely determining the target location by aggregating reported sensing data from multiple nodes and transferring the resultant location information to the sink. Thus, the proposed target tracking mechanism minimizes the sensing redundancy and maximizes the number of sleeping nodes in the network. We have also investigated the dynamic approach of activating sleeping nodes on-demand so that the moving target tracking accuracy can be enhanced while maximizing the network lifetime. We have carried out our extensive simulations in ns-3, and the results show that the proposed mechanism achieves higher performance compared to the state-of-the-art works. PMID:25529205
KANTS: a stigmergic ant algorithm for cluster analysis and swarm art.

PubMed

Fernandes, Carlos M; Mora, Antonio M; Merelo, Juan J; Rosa, Agostinho C

2014-06-01

KANTS is a swarm intelligence clustering algorithm inspired by the behavior of social insects. It uses stigmergy as a strategy for clustering large datasets and, as a result, displays a typical behavior of complex systems: self-organization and global patterns emerging from the local interaction of simple units. This paper introduces a simplified version of KANTS and describes recent experiments with the algorithm in the context of a contemporary artistic and scientific trend called swarm art, a type of generative art in which swarm intelligence systems are used to create artwork or ornamental objects. KANTS is used here for generating color drawings from the input data that represent real-world phenomena, such as electroencephalogram sleep data. However, the main proposal of this paper is an art project based on well-known abstract paintings, from which the chromatic values are extracted and used as input. Colors and shapes are therefore reorganized by KANTS, which generates its own interpretation of the original artworks. The project won the 2012 Evolutionary Art, Design, and Creativity Competition.
Buried landmine detection using multivariate normal clustering

NASA Astrophysics Data System (ADS)

Duston, Brian M.

2001-10-01

A Bayesian classification algorithm is presented for discriminating buried land mines from buried and surface clutter in Ground Penetrating Radar (GPR) signals. This algorithm is based on multivariate normal (MVN) clustering, where feature vectors are used to identify populations (clusters) of mines and clutter objects. The features are extracted from two-dimensional images created from ground penetrating radar scans. MVN clustering is used to determine the number of clusters in the data and to create probability density models for target and clutter populations, producing the MVN clustering classifier (MVNCC). The Bayesian Information Criteria (BIC) is used to evaluate each model to determine the number of clusters in the data. An extension of the MVNCC allows the model to adapt to local clutter distributions by treating each of the MVN cluster components as a Poisson process and adaptively estimating the intensity parameters. The algorithm is developed using data collected by the Mine Hunter/Killer Close-In Detector (MH/K CID) at prepared mine lanes. The Mine Hunter/Killer is a prototype mine detecting and neutralizing vehicle developed for the U.S. Army to clear roads of anti-tank mines.
Planning of distributed generation in distribution network based on improved particle swarm optimization algorithm

NASA Astrophysics Data System (ADS)

Li, Jinze; Qu, Zhi; He, Xiaoyang; Jin, Xiaoming; Li, Tie; Wang, Mingkai; Han, Qiu; Gao, Ziji; Jiang, Feng

2018-02-01

Large-scale access of distributed power can improve the current environmental pressure, at the same time, increasing the complexity and uncertainty of overall distribution system. Rational planning of distributed power can effectively improve the system voltage level. To this point, the specific impact on distribution network power quality caused by the access of typical distributed power was analyzed and from the point of improving the learning factor and the inertia weight, an improved particle swarm optimization algorithm (IPSO) was proposed which could solve distributed generation planning for distribution network to improve the local and global search performance of the algorithm. Results show that the proposed method can well reduce the system network loss and improve the economic performance of system operation with distributed generation.
An adaptive enhancement algorithm for infrared video based on modified k-means clustering

NASA Astrophysics Data System (ADS)

Zhang, Linze; Wang, Jingqi; Wu, Wen

2016-09-01

In this paper, we have proposed a video enhancement algorithm to improve the output video of the infrared camera. Sometimes the video obtained by infrared camera is very dark since there is no clear target. In this case, infrared video should be divided into frame images by frame extraction, in order to carry out the image enhancement. For the first frame image, which can be divided into k sub images by using K-means clustering according to the gray interval it occupies before k sub images' histogram equalization according to the amount of information per sub image, we used a method to solve a problem that final cluster centers close to each other in some cases; and for the other frame images, their initial cluster centers can be determined by the final clustering centers of the previous ones, and the histogram equalization of each sub image will be carried out after image segmentation based on K-means clustering. The histogram equalization can make the gray value of the image to the whole gray level, and the gray level of each sub image is determined by the ratio of pixels to a frame image. Experimental results show that this algorithm can improve the contrast of infrared video where night target is not obvious which lead to a dim scene, and reduce the negative effect given by the overexposed pixels adaptively in a certain range.
A Survey of Distributed Optimization and Control Algorithms for Electric Power Systems

DOE PAGES

Molzahn, Daniel K.; Dorfler, Florian K.; Sandberg, Henrik; ...

2017-07-25

Historically, centrally computed algorithms have been the primary means of power system optimization and control. With increasing penetrations of distributed energy resources requiring optimization and control of power systems with many controllable devices, distributed algorithms have been the subject of significant research interest. Here, this paper surveys the literature of distributed algorithms with applications to optimization and control of power systems. In particular, this paper reviews distributed algorithms for offline solution of optimal power flow (OPF) problems as well as online algorithms for real-time solution of OPF, optimal frequency control, optimal voltage control, and optimal wide-area control problems.
A Survey of Distributed Optimization and Control Algorithms for Electric Power Systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Molzahn, Daniel K.; Dorfler, Florian K.; Sandberg, Henrik

Historically, centrally computed algorithms have been the primary means of power system optimization and control. With increasing penetrations of distributed energy resources requiring optimization and control of power systems with many controllable devices, distributed algorithms have been the subject of significant research interest. Here, this paper surveys the literature of distributed algorithms with applications to optimization and control of power systems. In particular, this paper reviews distributed algorithms for offline solution of optimal power flow (OPF) problems as well as online algorithms for real-time solution of OPF, optimal frequency control, optimal voltage control, and optimal wide-area control problems.
Distributed genetic algorithms for the floorplan design problem

NASA Technical Reports Server (NTRS)

Cohoon, James P.; Hegde, Shailesh U.; Martin, Worthy N.; Richards, Dana S.

1991-01-01

Designing a VLSI floorplan calls for arranging a given set of modules in the plane to minimize the weighted sum of area and wire-length measures. A method of solving the floorplan design problem using distributed genetic algorithms is presented. Distributed genetic algorithms, based on the paleontological theory of punctuated equilibria, offer a conceptual modification to the traditional genetic algorithms. Experimental results on several problem instances demonstrate the efficacy of this method and indicate the advantages of this method over other methods, such as simulated annealing. The method has performed better than the simulated annealing approach, both in terms of the average cost of the solutions found and the best-found solution, in almost all the problem instances tried.
An optimal autonomous microgrid cluster based on distributed generation droop parameter optimization and renewable energy sources using an improved grey wolf optimizer

NASA Astrophysics Data System (ADS)

Moazami Goodarzi, Hamed; Kazemi, Mohammad Hosein

2018-05-01

Microgrid (MG) clustering is regarded as an important driver in improving the robustness of MGs. However, little research has been conducted on providing appropriate MG clustering. This article addresses this shortfall. It proposes a novel multi-objective optimization approach for finding optimal clustering of autonomous MGs by focusing on variables such as distributed generation (DG) droop parameters, the location and capacity of DG units, renewable energy sources, capacitors and powerline transmission. Power losses are minimized and voltage stability is improved while virtual cut-set lines with minimum power transmission for clustering MGs are obtained. A novel chaotic grey wolf optimizer (CGWO) algorithm is applied to solve the proposed multi-objective problem. The performance of the approach is evaluated by utilizing a 69-bus MG in several scenarios.
Diametrical clustering for identifying anti-correlated gene clusters.

PubMed

Dhillon, Inderjit S; Marcotte, Edward M; Roshan, Usman

2003-09-01

Clustering genes based upon their expression patterns allows us to predict gene function. Most existing clustering algorithms cluster genes together when their expression patterns show high positive correlation. However, it has been observed that genes whose expression patterns are strongly anti-correlated can also be functionally similar. Biologically, this is not unintuitive-genes responding to the same stimuli, regardless of the nature of the response, are more likely to operate in the same pathways. We present a new diametrical clustering algorithm that explicitly identifies anti-correlated clusters of genes. Our algorithm proceeds by iteratively (i). re-partitioning the genes and (ii). computing the dominant singular vector of each gene cluster; each singular vector serving as the prototype of a 'diametric' cluster. We empirically show the effectiveness of the algorithm in identifying diametrical or anti-correlated clusters. Testing the algorithm on yeast cell cycle data, fibroblast gene expression data, and DNA microarray data from yeast mutants reveals that opposed cellular pathways can be discovered with this method. We present systems whose mRNA expression patterns, and likely their functions, oppose the yeast ribosome and proteosome, along with evidence for the inverse transcriptional regulation of a number of cellular systems.
Intelligent decision support algorithm for distribution system restoration.

PubMed

Singh, Reetu; Mehfuz, Shabana; Kumar, Parmod

2016-01-01

Distribution system is the means of revenue for electric utility. It needs to be restored at the earliest if any feeder or complete system is tripped out due to fault or any other cause. Further, uncertainty of the loads, result in variations in the distribution network's parameters. Thus, an intelligent algorithm incorporating hybrid fuzzy-grey relation, which can take into account the uncertainties and compare the sequences is discussed to analyse and restore the distribution system. The simulation studies are carried out to show the utility of the method by ranking the restoration plans for a typical distribution system. This algorithm also meets the smart grid requirements in terms of an automated restoration plan for the partial/full blackout of network.
Segmentation of dermatoscopic images by frequency domain filtering and k-means clustering algorithms.

PubMed

Rajab, Maher I

2011-11-01

Since the introduction of epiluminescence microscopy (ELM), image analysis tools have been extended to the field of dermatology, in an attempt to algorithmically reproduce clinical evaluation. Accurate image segmentation of skin lesions is one of the key steps for useful, early and non-invasive diagnosis of coetaneous melanomas. This paper proposes two image segmentation algorithms based on frequency domain processing and k-means clustering/fuzzy k-means clustering. The two methods are capable of segmenting and extracting the true border that reveals the global structure irregularity (indentations and protrusions), which may suggest excessive cell growth or regression of a melanoma. As a pre-processing step, Fourier low-pass filtering is applied to reduce the surrounding noise in a skin lesion image. A quantitative comparison of the techniques is enabled by the use of synthetic skin lesion images that model lesions covered with hair to which Gaussian noise is added. The proposed techniques are also compared with an established optimal-based thresholding skin-segmentation method. It is demonstrated that for lesions with a range of different border irregularity properties, the k-means clustering and fuzzy k-means clustering segmentation methods provide the best performance over a range of signal to noise ratios. The proposed segmentation techniques are also demonstrated to have similar performance when tested on real skin lesions representing high-resolution ELM images. This study suggests that the segmentation results obtained using a combination of low-pass frequency filtering and k-means or fuzzy k-means clustering are superior to the result that would be obtained by using k-means or fuzzy k-means clustering segmentation methods alone. © 2011 John Wiley & Sons A/S.
Iterative Track Fitting Using Cluster Classification in Multi Wire Proportional Chamber

NASA Astrophysics Data System (ADS)

Primor, David; Mikenberg, Giora; Etzion, Erez; Messer, Hagit

2007-10-01

This paper addresses the problem of track fitting of a charged particle in a multi wire proportional chamber (MWPC) using cathode readout strips. When a charged particle crosses a MWPC, a positive charge is induced on a cluster of adjacent strips. In the presence of high radiation background, the cluster charge measurements may be contaminated due to background particles, leading to less accurate hit position estimation. The least squares method for track fitting assumes the same position error distribution for all hits and thus loses its optimal properties on contaminated data. For this reason, a new robust algorithm is proposed. The algorithm first uses the known spatial charge distribution caused by a single charged particle over the strips, and classifies the clusters into ldquocleanrdquo and ldquodirtyrdquo clusters. Then, using the classification results, it performs an iterative weighted least squares fitting procedure, updating its optimal weights each iteration. The performance of the suggested algorithm is compared to other track fitting techniques using a simulation of tracks with radiation background. It is shown that the algorithm improves the track fitting performance significantly. A practical implementation of the algorithm is presented for muon track fitting in the cathode strip chamber (CSC) of the ATLAS experiment.
DCE: A Distributed Energy-Eﬃcient Clustering Protocol for Wireless Sensor Network Based on Double-Phase Cluster-Head Election.

PubMed

Han, Ruisong; Yang, Wei; Wang, Yipeng; You, Kaiming

2017-05-01

Clustering is an effective technique used to reduce energy consumption and extend the lifetime of wireless sensor network (WSN). The characteristic of energy heterogeneity of WSNs should be considered when designing clustering protocols. We propose and evaluate a novel distributed energy-eﬃcient clustering protocol called DCE for heterogeneous wireless sensor networks, based on a Double-phase Cluster-head Election scheme. In DCE, the procedure of cluster head election is divided into two phases. In the first phase, tentative cluster heads are elected with the probabilities which are decided by the relative levels of initial and residual energy. Then, in the second phase, the tentative cluster heads are replaced by their cluster members to form the final set of cluster heads if any member in their cluster has more residual energy. Employing two phases for cluster-head election ensures that the nodes with more energy have a higher chance to be cluster heads. Energy consumption is well-distributed in the proposed protocol, and the simulation results show that DCE achieves longer stability periods than other typical clustering protocols in heterogeneous scenarios.
QoE collaborative evaluation method based on fuzzy clustering heuristic algorithm.

PubMed

Bao, Ying; Lei, Weimin; Zhang, Wei; Zhan, Yuzhuo

2016-01-01

At present, to realize or improve the quality of experience (QoE) is a major goal for network media transmission service, and QoE evaluation is the basis for adjusting the transmission control mechanism. Therefore, a kind of QoE collaborative evaluation method based on fuzzy clustering heuristic algorithm is proposed in this paper, which is concentrated on service score calculation at the server side. The server side collects network transmission quality of service (QoS) parameter, node location data, and user expectation value from client feedback information. Then it manages the historical data in database through the "big data" process mode, and predicts user score according to heuristic rules. On this basis, it completes fuzzy clustering analysis, and generates service QoE score and management message, which will be finally fed back to clients. Besides, this paper mainly discussed service evaluation generative rules, heuristic evaluation rules and fuzzy clustering analysis methods, and presents service-based QoE evaluation processes. The simulation experiments have verified the effectiveness of QoE collaborative evaluation method based on fuzzy clustering heuristic rules.
Comparing Different Fault Identification Algorithms in Distributed Power System

NASA Astrophysics Data System (ADS)

Alkaabi, Salim

A power system is a huge complex system that delivers the electrical power from the generation units to the consumers. As the demand for electrical power increases, distributed power generation was introduced to the power system. Faults may occur in the power system at any time in different locations. These faults cause a huge damage to the system as they might lead to full failure of the power system. Using distributed generation in the power system made it even harder to identify the location of the faults in the system. The main objective of this work is to test the different fault location identification algorithms while tested on a power system with the different amount of power injected using distributed generators. As faults may lead the system to full failure, this is an important area for research. In this thesis different fault location identification algorithms have been tested and compared while the different amount of power is injected from distributed generators. The algorithms were tested on IEEE 34 node test feeder using MATLAB and the results were compared to find when these algorithms might fail and the reliability of these methods.
EDDA: An Efficient Distributed Data Replication Algorithm in VANETs.

PubMed

Zhu, Junyu; Huang, Chuanhe; Fan, Xiying; Guo, Sipei; Fu, Bin

2018-02-10

Efficient data dissemination in vehicular ad hoc networks (VANETs) is a challenging issue due to the dynamic nature of the network. To improve the performance of data dissemination, we study distributed data replication algorithms in VANETs for exchanging information and computing in an arbitrarily-connected network of vehicle nodes. To achieve low dissemination delay and improve the network performance, we control the number of message copies that can be disseminated in the network and then propose an efficient distributed data replication algorithm (EDDA). The key idea is to let the data carrier distribute the data dissemination tasks to multiple nodes to speed up the dissemination process. We calculate the number of communication stages for the network to enter into a balanced status and show that the proposed distributed algorithm can converge to a consensus in a small number of communication stages. Most of the theoretical results described in this paper are to study the complexity of network convergence. The lower bound and upper bound are also provided in the analysis of the algorithm. Simulation results show that the proposed EDDA can efficiently disseminate messages to vehicles in a specific area with low dissemination delay and system overhead.
EDDA: An Efficient Distributed Data Replication Algorithm in VANETs

PubMed Central

Zhu, Junyu; Huang, Chuanhe; Fan, Xiying; Guo, Sipei; Fu, Bin

2018-01-01

Efficient data dissemination in vehicular ad hoc networks (VANETs) is a challenging issue due to the dynamic nature of the network. To improve the performance of data dissemination, we study distributed data replication algorithms in VANETs for exchanging information and computing in an arbitrarily-connected network of vehicle nodes. To achieve low dissemination delay and improve the network performance, we control the number of message copies that can be disseminated in the network and then propose an efficient distributed data replication algorithm (EDDA). The key idea is to let the data carrier distribute the data dissemination tasks to multiple nodes to speed up the dissemination process. We calculate the number of communication stages for the network to enter into a balanced status and show that the proposed distributed algorithm can converge to a consensus in a small number of communication stages. Most of the theoretical results described in this paper are to study the complexity of network convergence. The lower bound and upper bound are also provided in the analysis of the algorithm. Simulation results show that the proposed EDDA can efficiently disseminate messages to vehicles in a specific area with low dissemination delay and system overhead. PMID:29439443
Machine-learned cluster identification in high-dimensional data.

PubMed

Ultsch, Alfred; Lötsch, Jörn

2017-02-01

High-dimensional biomedical data are frequently clustered to identify subgroup structures pointing at distinct disease subtypes. It is crucial that the used cluster algorithm works correctly. However, by imposing a predefined shape on the clusters, classical algorithms occasionally suggest a cluster structure in homogenously distributed data or assign data points to incorrect clusters. We analyzed whether this can be avoided by using emergent self-organizing feature maps (ESOM). Data sets with different degrees of complexity were submitted to ESOM analysis with large numbers of neurons, using an interactive R-based bioinformatics tool. On top of the trained ESOM the distance structure in the high dimensional feature space was visualized in the form of a so-called U-matrix. Clustering results were compared with those provided by classical common cluster algorithms including single linkage, Ward and k-means. Ward clustering imposed cluster structures on cluster-less "golf ball", "cuboid" and "S-shaped" data sets that contained no structure at all (random data). Ward clustering also imposed structures on permuted real world data sets. By contrast, the ESOM/U-matrix approach correctly found that these data contain no cluster structure. However, ESOM/U-matrix was correct in identifying clusters in biomedical data truly containing subgroups. It was always correct in cluster structure identification in further canonical artificial data. Using intentionally simple data sets, it is shown that popular clustering algorithms typically used for biomedical data sets may fail to cluster data correctly, suggesting that they are also likely to perform erroneously on high dimensional biomedical data. The present analyses emphasized that generally established classical hierarchical clustering algorithms carry a considerable tendency to produce erroneous results. By contrast, unsupervised machine-learned analysis of cluster structures, applied using the ESOM/U-matrix method, is a

Graphical Evaluation of Hierarchical Clustering Schemes. Technical Report No. 1.

ERIC Educational Resources Information Center

Halff, Henry M.

Graphical methods for evaluating the fit of Johnson's hierarchical clustering schemes are presented together with an example. These evaluation methods examine the extent to which the clustering algorithm can minimize the overlap of the distributions of intracluster and intercluster distances. (Author)
Distributed autonomous systems: resource management, planning, and control algorithms

NASA Astrophysics Data System (ADS)

Smith, James F., III; Nguyen, ThanhVu H.

2005-05-01

Distributed autonomous systems, i.e., systems that have separated distributed components, each of which, exhibit some degree of autonomy are increasingly providing solutions to naval and other DoD problems. Recently developed control, planning and resource allocation algorithms for two types of distributed autonomous systems will be discussed. The first distributed autonomous system (DAS) to be discussed consists of a collection of unmanned aerial vehicles (UAVs) that are under fuzzy logic control. The UAVs fly and conduct meteorological sampling in a coordinated fashion determined by their fuzzy logic controllers to determine the atmospheric index of refraction. Once in flight no human intervention is required. A fuzzy planning algorithm determines the optimal trajectory, sampling rate and pattern for the UAVs and an interferometer platform while taking into account risk, reliability, priority for sampling in certain regions, fuel limitations, mission cost, and related uncertainties. The real-time fuzzy control algorithm running on each UAV will give the UAV limited autonomy allowing it to change course immediately without consulting with any commander, request other UAVs to help it, alter its sampling pattern and rate when observing interesting phenomena, or to terminate the mission and return to base. The algorithms developed will be compared to a resource manager (RM) developed for another DAS problem related to electronic attack (EA). This RM is based on fuzzy logic and optimized by evolutionary algorithms. It allows a group of dissimilar platforms to use EA resources distributed throughout the group. For both DAS types significant theoretical and simulation results will be presented.
Galaxy Distribution in Clusters of Galaxies

NASA Astrophysics Data System (ADS)

Okamoto, T.; Yachi, S.; Habe, A.

beta-discrepancy have been pointed out from comparison of optical and X-ray observations of clusters of galaxies. To examine physical reason of beta-discrepancy, we use N-body simulation which contains two components, dark particles and galaxies which are identified by using adaptive-linking friend of friend technique at a certain red-shift. The gas component is not included here, since the gas distribution follows the dark matter distribution in dark halos (Jubio F. Navarro, Carlos S. Frenk and Simon D. M. White 1995). We find that the galaxy distribution follows the dark matter distribution, therefore beta-discrepancy does not exist, and this result is consistent with the interpretation of the beta-discrepancy by Bahcall and Lubin (1994), which was based on recent observation.
Cluster analysis based on dimensional information with applications to feature selection and classification

NASA Technical Reports Server (NTRS)

Eigen, D. J.; Fromm, F. R.; Northouse, R. A.

1974-01-01

A new clustering algorithm is presented that is based on dimensional information. The algorithm includes an inherent feature selection criterion, which is discussed. Further, a heuristic method for choosing the proper number of intervals for a frequency distribution histogram, a feature necessary for the algorithm, is presented. The algorithm, although usable as a stand-alone clustering technique, is then utilized as a global approximator. Local clustering techniques and configuration of a global-local scheme are discussed, and finally the complete global-local and feature selector configuration is shown in application to a real-time adaptive classification scheme for the analysis of remote sensed multispectral scanner data.
Utility of K-Means clustering algorithm in differentiating apparent diffusion coefficient values between benign and malignant neck pathologies

PubMed Central

Srinivasan, A.; Galbán, C.J.; Johnson, T.D.; Chenevert, T.L.; Ross, B.D.; Mukherji, S.K.

2014-01-01

Purpose The objective of our study was to analyze the differences between apparent diffusion coefficient (ADC) partitions (created using the K-Means algorithm) between benign and malignant neck lesions and evaluate its benefit in distinguishing these entities. Material and methods MRI studies of 10 benign and 10 malignant proven neck pathologies were post-processed on a PC using in-house software developed in MATLAB (The MathWorks, Inc., Natick, MA). Lesions were manually contoured by two neuroradiologists with the ADC values within each lesion clustered into two (low ADC-ADCL, high ADC-ADCH) and three partitions (ADCL, intermediate ADC-ADCI, ADCH) using the K-Means clustering algorithm. An unpaired two-tailed Student’s t-test was performed for all metrics to determine statistical differences in the means between the benign and malignant pathologies. Results Statistically significant difference between the mean ADCL clusters in benign and malignant pathologies was seen in the 3 cluster models of both readers (p=0.03, 0.022 respectively) and the 2 cluster model of reader 2 (p=0.04) with the other metrics (ADCH, ADCI, whole lesion mean ADC) not revealing any significant differences. Receiver operating characteristics curves demonstrated the quantitative difference in mean ADCH and ADCL in both the 2 and 3 cluster models to be predictive of malignancy (2 clusters: p=0.008, area under curve=0.850, 3 clusters: p=0.01, area under curve=0.825). Conclusion The K-Means clustering algorithm that generates partitions of large datasets may provide a better characterization of neck pathologies and may be of additional benefit in distinguishing benign and malignant neck pathologies compared to whole lesion mean ADC alone. PMID:20007723
On distribution reduction and algorithm implementation in inconsistent ordered information systems.

PubMed

Zhang, Yanqin

2014-01-01

As one part of our work in ordered information systems, distribution reduction is studied in inconsistent ordered information systems (OISs). Some important properties on distribution reduction are studied and discussed. The dominance matrix is restated for reduction acquisition in dominance relations based information systems. Matrix algorithm for distribution reduction acquisition is stepped. And program is implemented by the algorithm. The approach provides an effective tool for the theoretical research and the applications for ordered information systems in practices. For more detailed and valid illustrations, cases are employed to explain and verify the algorithm and the program which shows the effectiveness of the algorithm in complicated information systems.
Spatial and kinematic distributions of transition populations in intermediate redshift galaxy clusters

DOE Office of Scientific and Technical Information (OSTI.GOV)

Crawford, Steven M.; Wirth, Gregory D.; Bershady, Matthew A., E-mail: crawford@saao.ac.za, E-mail: wirth@keck.hawaii.edu, E-mail: mab@astro.wisc.edu

2014-05-01

We analyze the spatial and velocity distributions of confirmed members in five massive clusters of galaxies at intermediate redshift (0.5 < z < 0.9) to investigate the physical processes driving galaxy evolution. Based on spectral classifications derived from broad- and narrow-band photometry, we define four distinct galaxy populations representing different evolutionary stages: red sequence (RS) galaxies, blue cloud (BC) galaxies, green valley (GV) galaxies, and luminous compact blue galaxies (LCBGs). For each galaxy class, we derive the projected spatial and velocity distribution and characterize the degree of subclustering. We find that RS, BC, and GV galaxies in these clusters havemore » similar velocity distributions, but that BC and GV galaxies tend to avoid the core of the two z ≈ 0.55 clusters. GV galaxies exhibit subclustering properties similar to RS galaxies, but their radial velocity distribution is significantly platykurtic compared to the RS galaxies. The absence of GV galaxies in the cluster cores may explain their somewhat prolonged star-formation history. The LCBGs appear to have recently fallen into the cluster based on their larger velocity dispersion, absence from the cores of the clusters, and different radial velocity distribution than the RS galaxies. Both LCBG and BC galaxies show a high degree of subclustering on the smallest scales, leading us to conclude that star formation is likely triggered by galaxy-galaxy interactions during infall into the cluster.« less
Jungle Computing: Distributed Supercomputing Beyond Clusters, Grids, and Clouds

NASA Astrophysics Data System (ADS)

Seinstra, Frank J.; Maassen, Jason; van Nieuwpoort, Rob V.; Drost, Niels; van Kessel, Timo; van Werkhoven, Ben; Urbani, Jacopo; Jacobs, Ceriel; Kielmann, Thilo; Bal, Henri E.

In recent years, the application of high-performance and distributed computing in scientific practice has become increasingly wide spread. Among the most widely available platforms to scientists are clusters, grids, and cloud systems. Such infrastructures currently are undergoing revolutionary change due to the integration of many-core technologies, providing orders-of-magnitude speed improvements for selected compute kernels. With high-performance and distributed computing systems thus becoming more heterogeneous and hierarchical, programming complexity is vastly increased. Further complexities arise because urgent desire for scalability and issues including data distribution, software heterogeneity, and ad hoc hardware availability commonly force scientists into simultaneous use of multiple platforms (e.g., clusters, grids, and clouds used concurrently). A true computing jungle.
Algorithms and software used in selecting structure of machine-training cluster based on neurocomputers

NASA Astrophysics Data System (ADS)

Romanchuk, V. A.; Lukashenko, V. V.

2018-05-01

The technique of functioning of a control system by a computing cluster based on neurocomputers is proposed. Particular attention is paid to the method of choosing the structure of the computing cluster due to the fact that the existing methods are not effective because of a specialized hardware base - neurocomputers, which are highly parallel computer devices with an architecture different from the von Neumann architecture. A developed algorithm for choosing the computational structure of a cloud cluster is described, starting from the direction of data transfer in the flow control graph of the program and its adjacency matrix.
GEANT4 distributed computing for compact clusters

NASA Astrophysics Data System (ADS)

Harrawood, Brian P.; Agasthya, Greeshma A.; Lakshmanan, Manu N.; Raterman, Gretchen; Kapadia, Anuj J.

2014-11-01

A new technique for distribution of GEANT4 processes is introduced to simplify running a simulation in a parallel environment such as a tightly coupled computer cluster. Using a new C++ class derived from the GEANT4 toolkit, multiple runs forming a single simulation are managed across a local network of computers with a simple inter-node communication protocol. The class is integrated with the GEANT4 toolkit and is designed to scale from a single symmetric multiprocessing (SMP) machine to compact clusters ranging in size from tens to thousands of nodes. User designed 'work tickets' are distributed to clients using a client-server work flow model to specify the parameters for each individual run of the simulation. The new g4DistributedRunManager class was developed and well tested in the course of our Neutron Stimulated Emission Computed Tomography (NSECT) experiments. It will be useful for anyone running GEANT4 for large discrete data sets such as covering a range of angles in computed tomography, calculating dose delivery with multiple fractions or simply speeding the through-put of a single model.
The Gas Distribution in Galaxy Cluster Outer Regions

NASA Technical Reports Server (NTRS)

Eckert, D.; Vazza, F.; Ettori, S.; Molendi, S.; Nagai, D.; Laue, E. T.; Roncarelli, M.; Rossetti, M.; Snowden, S. L.; Gastaldello, F.

2012-01-01

Aims. We present the analysis of a local (z = 0.04 - 0.2) sample of 31 galaxy clusters with the aim of measuring the density of the X-ray emitting gas in cluster outskirts. We compare our results with numerical simulations to set constraints on the azimuthal symmetry and gas clumping in the outer regions of galaxy clusters. Methods. We exploit the large field-of-view and low instrumental background of ROSAT/PSPC to trace the density of the intracluster gas out to the virial radius. We perform a stacking of the density profiles to detect a signal beyond r200 and measure the typical density and scatter in cluster outskirts. We also compute the azimuthal scatter of the profiles with respect to the mean value to look for deviations from spherical symmetry. Finally, we compare our average density and scatter profiles with the results of numerical simulations. Results. As opposed to some recent Suzaku results, and confirming previous evidence from ROSAT and Chandra, we observe a steepening of the density profiles beyond approximately r(sub 500). Comparing our density profiles with simulations, we find that non-radiative runs predict too steep density profiles, whereas runs including additional physics and/or treating gas clumping are in better agreement with the observed gas distribution. We report for the first time the high-confidence detection of a systematic difference between cool-core and non-cool core clusters beyond 0.3r(sub 200), which we explain by a different distribution of the gas in the two classes. Beyond r(sub 500), galaxy clusters deviate significantly from spherical symmetry, with only little differences between relaxed and disturbed systems. We find good agreement between the observed and predicted scatter profiles, but only when the 1% densest clumps are filtered out in the simulations. Conclusions. Comparing our results with numerical simulations, we find that non-radiative simulations fail to reproduce the gas distribution, even well outside cluster
Mesh Algorithms for PDE with Sieve I: Mesh Distribution

DOE PAGES

Knepley, Matthew G.; Karpeev, Dmitry A.

2009-01-01

We have developed a new programming framework, called Sieve, to support parallel numerical partial differential equation(s) (PDE) algorithms operating over distributed meshes. We have also developed a reference implementation of Sieve in C++ as a library of generic algorithms operating on distributed containers conforming to the Sieve interface. Sieve makes instances of the incidence relation, or arrows, the conceptual first-class objects represented in the containers. Further, generic algorithms acting on this arrow container are systematically used to provide natural geometric operations on the topology and also, through duality, on the data. Finally, coverings and duality are used to encode notmore » only individual meshes, but all types of hierarchies underlying PDE data structures, including multigrid and mesh partitions. In order to demonstrate the usefulness of the framework, we show how the mesh partition data can be represented and manipulated using the same fundamental mechanisms used to represent meshes. We present the complete description of an algorithm to encode a mesh partition and then distribute a mesh, which is independent of the mesh dimension, element shape, or embedding. Moreover, data associated with the mesh can be similarly distributed with exactly the same algorithm. The use of a high level of abstraction within the Sieve leads to several benefits in terms of code reuse, simplicity, and extensibility. We discuss these benefits and compare our approach to other existing mesh libraries.« less
A Computational Algorithm for Functional Clustering of Proteome Dynamics During Development

PubMed Central

Wang, Yaqun; Wang, Ningtao; Hao, Han; Guo, Yunqian; Zhen, Yan; Shi, Jisen; Wu, Rongling

2014-01-01

Phenotypic traits, such as seed development, are a consequence of complex biochemical interactions among genes, proteins and metabolites, but the underlying mechanisms that operate in a coordinated and sequential manner remain elusive. Here, we address this issue by developing a computational algorithm to monitor proteome changes during the course of trait development. The algorithm is built within the mixture-model framework in which each mixture component is modeled by a specific group of proteins that display a similar temporal pattern of expression in trait development. A nonparametric approach based on Legendre orthogonal polynomials was used to fit dynamic changes of protein expression, increasing the power and flexibility of protein clustering. By analyzing a dataset of proteomic dynamics during early embryogenesis of the Chinese fir, the algorithm has successfully identified several distinct types of proteins that coordinate with each other to determine seed development in this forest tree commercially and environmentally important to China. The algorithm will find its immediate applications for the characterization of mechanistic underpinnings for any other biological processes in which protein abundance plays a key role. PMID:24955031
Clustering Educational Digital Library Usage Data: A Comparison of Latent Class Analysis and K-Means Algorithms

ERIC Educational Resources Information Center

Xu, Beijie; Recker, Mimi; Qi, Xiaojun; Flann, Nicholas; Ye, Lei

2013-01-01

This article examines clustering as an educational data mining method. In particular, two clustering algorithms, the widely used K-means and the model-based Latent Class Analysis, are compared, using usage data from an educational digital library service, the Instructional Architect (IA.usu.edu). Using a multi-faceted approach and multiple data…
A distributed scheduling algorithm for heterogeneous real-time systems

NASA Technical Reports Server (NTRS)

Zeineldine, Osman; El-Toweissy, Mohamed; Mukkamala, Ravi

1991-01-01

Much of the previous work on load balancing and scheduling in distributed environments was concerned with homogeneous systems and homogeneous loads. Several of the results indicated that random policies are as effective as other more complex load allocation policies. The effects of heterogeneity on scheduling algorithms for hard real time systems is examined. A distributed scheduler specifically to handle heterogeneities in both nodes and node traffic is proposed. The performance of the algorithm is measured in terms of the percentage of jobs discarded. While a random task allocation is very sensitive to heterogeneities, the algorithm is shown to be robust to such non-uniformities in system components and load.
Chemodynamical Clustering Applied to APOGEE Data: Rediscovering Globular Clusters

NASA Astrophysics Data System (ADS)

Chen, Boquan; D’Onghia, Elena; Pardy, Stephen A.; Pasquali, Anna; Bertelli Motta, Clio; Hanlon, Bret; Grebel, Eva K.

2018-06-01

We have developed a novel technique based on a clustering algorithm that searches for kinematically and chemically clustered stars in the APOGEE DR12 Cannon data. As compared to classical chemical tagging, the kinematic information included in our methodology allows us to identify stars that are members of known globular clusters with greater confidence. We apply our algorithm to the entire APOGEE catalog of 150,615 stars whose chemical abundances are derived by the Cannon. Our methodology found anticorrelations between the elements Al and Mg, Na and O, and C and N previously identified in the optical spectra in globular clusters, even though we omit these elements in our algorithm. Our algorithm identifies globular clusters without a priori knowledge of their locations in the sky. Thus, not only does this technique promise to discover new globular clusters, but it also allows us to identify candidate streams of kinematically and chemically clustered stars in the Milky Way.
Joint inversion of multiple geophysical and petrophysical data using generalized fuzzy clustering algorithms

NASA Astrophysics Data System (ADS)

Sun, Jiajia; Li, Yaoguo

2017-02-01

Joint inversion that simultaneously inverts multiple geophysical data sets to recover a common Earth model is increasingly being applied to exploration problems. Petrophysical data can serve as an effective constraint to link different physical property models in such inversions. There are two challenges, among others, associated with the petrophysical approach to joint inversion. One is related to the multimodality of petrophysical data because there often exist more than one relationship between different physical properties in a region of study. The other challenge arises from the fact that petrophysical relationships have different characteristics and can exhibit point, linear, quadratic, or exponential forms in a crossplot. The fuzzy c-means (FCM) clustering technique is effective in tackling the first challenge and has been applied successfully. We focus on the second challenge in this paper and develop a joint inversion method based on variations of the FCM clustering technique. To account for the specific shapes of petrophysical relationships, we introduce several different fuzzy clustering algorithms that are capable of handling different shapes of petrophysical relationships. We present two synthetic and one field data examples and demonstrate that, by choosing appropriate distance measures for the clustering component in the joint inversion algorithm, the proposed joint inversion method provides an effective means of handling common petrophysical situations we encounter in practice. The jointly inverted models have both enhanced structural similarity and increased petrophysical correlation, and better represent the subsurface in the spatial domain and the parameter domain of physical properties.
Robust Bayesian clustering.

PubMed

Archambeau, Cédric; Verleysen, Michel

2007-01-01

A new variational Bayesian learning algorithm for Student-t mixture models is introduced. This algorithm leads to (i) robust density estimation, (ii) robust clustering and (iii) robust automatic model selection. Gaussian mixture models are learning machines which are based on a divide-and-conquer approach. They are commonly used for density estimation and clustering tasks, but are sensitive to outliers. The Student-t distribution has heavier tails than the Gaussian distribution and is therefore less sensitive to any departure of the empirical distribution from Gaussianity. As a consequence, the Student-t distribution is suitable for constructing robust mixture models. In this work, we formalize the Bayesian Student-t mixture model as a latent variable model in a different way from Svensén and Bishop [Svensén, M., & Bishop, C. M. (2005). Robust Bayesian mixture modelling. Neurocomputing, 64, 235-252]. The main difference resides in the fact that it is not necessary to assume a factorized approximation of the posterior distribution on the latent indicator variables and the latent scale variables in order to obtain a tractable solution. Not neglecting the correlations between these unobserved random variables leads to a Bayesian model having an increased robustness. Furthermore, it is expected that the lower bound on the log-evidence is tighter. Based on this bound, the model complexity, i.e. the number of components in the mixture, can be inferred with a higher confidence.
Weighted community detection and data clustering using message passing

NASA Astrophysics Data System (ADS)

Shi, Cheng; Liu, Yanchen; Zhang, Pan

2018-03-01

Grouping objects into clusters based on the similarities or weights between them is one of the most important problems in science and engineering. In this work, by extending message-passing algorithms and spectral algorithms proposed for an unweighted community detection problem, we develop a non-parametric method based on statistical physics, by mapping the problem to the Potts model at the critical temperature of spin-glass transition and applying belief propagation to solve the marginals corresponding to the Boltzmann distribution. Our algorithm is robust to over-fitting and gives a principled way to determine whether there are significant clusters in the data and how many clusters there are. We apply our method to different clustering tasks. In the community detection problem in weighted and directed networks, we show that our algorithm significantly outperforms existing algorithms. In the clustering problem, where the data were generated by mixture models in the sparse regime, we show that our method works all the way down to the theoretical limit of detectability and gives accuracy very close to that of the optimal Bayesian inference. In the semi-supervised clustering problem, our method only needs several labels to work perfectly in classic datasets. Finally, we further develop Thouless-Anderson-Palmer equations which heavily reduce the computation complexity in dense networks but give almost the same performance as belief propagation.
Consumers' Kansei Needs Clustering Method for Product Emotional Design Based on Numerical Design Structure Matrix and Genetic Algorithms.

PubMed

Yang, Yan-Pu; Chen, Deng-Kai; Gu, Rong; Gu, Yu-Feng; Yu, Sui-Huai

2016-01-01

Consumers' Kansei needs reflect their perception about a product and always consist of a large number of adjectives. Reducing the dimension complexity of these needs to extract primary words not only enables the target product to be explicitly positioned, but also provides a convenient design basis for designers engaging in design work. Accordingly, this study employs a numerical design structure matrix (NDSM) by parameterizing a conventional DSM and integrating genetic algorithms to find optimum Kansei clusters. A four-point scale method is applied to assign link weights of every two Kansei adjectives as values of cells when constructing an NDSM. Genetic algorithms are used to cluster the Kansei NDSM and find optimum clusters. Furthermore, the process of the proposed method is presented. The details of the proposed approach are illustrated using an example of electronic scooter for Kansei needs clustering. The case study reveals that the proposed method is promising for clustering Kansei needs adjectives in product emotional design.

Consumers' Kansei Needs Clustering Method for Product Emotional Design Based on Numerical Design Structure Matrix and Genetic Algorithms

PubMed Central

Chen, Deng-kai; Gu, Rong; Gu, Yu-feng; Yu, Sui-huai

2016-01-01

Consumers' Kansei needs reflect their perception about a product and always consist of a large number of adjectives. Reducing the dimension complexity of these needs to extract primary words not only enables the target product to be explicitly positioned, but also provides a convenient design basis for designers engaging in design work. Accordingly, this study employs a numerical design structure matrix (NDSM) by parameterizing a conventional DSM and integrating genetic algorithms to find optimum Kansei clusters. A four-point scale method is applied to assign link weights of every two Kansei adjectives as values of cells when constructing an NDSM. Genetic algorithms are used to cluster the Kansei NDSM and find optimum clusters. Furthermore, the process of the proposed method is presented. The details of the proposed approach are illustrated using an example of electronic scooter for Kansei needs clustering. The case study reveals that the proposed method is promising for clustering Kansei needs adjectives in product emotional design. PMID:27630709
Cluster-Based Multipolling Sequencing Algorithm for Collecting RFID Data in Wireless LANs

NASA Astrophysics Data System (ADS)

Choi, Woo-Yong; Chatterjee, Mainak

2015-03-01

With the growing use of RFID (Radio Frequency Identification), it is becoming important to devise ways to read RFID tags in real time. Access points (APs) of IEEE 802.11-based wireless Local Area Networks (LANs) are being integrated with RFID networks that can efficiently collect real-time RFID data. Several schemes, such as multipolling methods based on the dynamic search algorithm and random sequencing, have been proposed. However, as the number of RFID readers associated with an AP increases, it becomes difficult for the dynamic search algorithm to derive the multipolling sequence in real time. Though multipolling methods can eliminate the polling overhead, we still need to enhance the performance of the multipolling methods based on random sequencing. To that extent, we propose a real-time cluster-based multipolling sequencing algorithm that drastically eliminates more than 90% of the polling overhead, particularly so when the dynamic search algorithm fails to derive the multipolling sequence in real time.
A new stochastic algorithm for inversion of dust aerosol size distribution

NASA Astrophysics Data System (ADS)

Wang, Li; Li, Feng; Yang, Ma-ying

2015-08-01

Dust aerosol size distribution is an important source of information about atmospheric aerosols, and it can be determined from multiwavelength extinction measurements. This paper describes a stochastic inverse technique based on artificial bee colony (ABC) algorithm to invert the dust aerosol size distribution by light extinction method. The direct problems for the size distribution of water drop and dust particle, which are the main elements of atmospheric aerosols, are solved by the Mie theory and the Lambert-Beer Law in multispectral region. And then, the parameters of three widely used functions, i.e. the log normal distribution (L-N), the Junge distribution (J-J), and the normal distribution (N-N), which can provide the most useful representation of aerosol size distributions, are inversed by the ABC algorithm in the dependent model. Numerical results show that the ABC algorithm can be successfully applied to recover the aerosol size distribution with high feasibility and reliability even in the presence of random noise.
Density-based clustering analyses to identify heterogeneous cellular sub-populations

NASA Astrophysics Data System (ADS)

Heaster, Tiffany M.; Walsh, Alex J.; Landman, Bennett A.; Skala, Melissa C.

2017-02-01

Autofluorescence microscopy of NAD(P)H and FAD provides functional metabolic measurements at the single-cell level. Here, density-based clustering algorithms were applied to metabolic autofluorescence measurements to identify cell-level heterogeneity in tumor cell cultures. The performance of the density-based clustering algorithm, DENCLUE, was tested in samples with known heterogeneity (co-cultures of breast carcinoma lines). DENCLUE was found to better represent the distribution of cell clusters compared to Gaussian mixture modeling. Overall, DENCLUE is a promising approach to quantify cell-level heterogeneity, and could be used to understand single cell population dynamics in cancer progression and treatment.
Imaging active faulting in a region of distributed deformation from the joint clustering of focal mechanisms and hypocentres: Application to the Azores-western Mediterranean region

NASA Astrophysics Data System (ADS)

Custódio, Susana; Lima, Vânia; Vales, Dina; Cesca, Simone; Carrilho, Fernando

2016-04-01

The matching between linear trends of hypocentres and fault planes indicated by focal mechanisms (FMs) is frequently used to infer the location and geometry of active faults. This practice works well in regions of fast lithospheric deformation, where earthquake patterns are clear and major structures accommodate the bulk of deformation, but typically fails in regions of slow and distributed deformation. We present a new joint FM and hypocentre cluster algorithm that is able to detect systematically the consistency between hypocentre lineations and FMs, even in regions of distributed deformation. We apply the method to the Azores-western Mediterranean region, with particular emphasis on western Iberia. The analysis relies on a compilation of hypocentres and FMs taken from regional and global earthquake catalogues, academic theses and technical reports, complemented by new FMs for western Iberia. The joint clustering algorithm images both well-known and new seismo-tectonic features. The Azores triple junction is characterised by FMs with vertical pressure (P) axes, in good agreement with the divergent setting, and the Iberian domain is characterised by NW-SE oriented P axes, indicating a response of the lithosphere to the ongoing oblique convergence between Nubia and Eurasia. Several earthquakes remain unclustered in the western Mediterranean domain, which may indicate a response to local stresses. The major regions of consistent faulting that we identify are the mid-Atlantic ridge, the Terceira rift, the Trans-Alboran shear zone and the north coast of Algeria. In addition, other smaller earthquake clusters present a good match between epicentre lineations and FM fault planes. These clusters may signal single active faults or wide zones of distributed but consistent faulting. Mainland Portugal is dominated by strike-slip earthquakes with fault planes coincident with the predominant NNE-SSW and WNW-ESE oriented earthquake lineations. Clusters offshore SW Iberia are
A voting-based star identification algorithm utilizing local and global distribution

NASA Astrophysics Data System (ADS)

Fan, Qiaoyun; Zhong, Xuyang; Sun, Junhua

2018-03-01

A novel star identification algorithm based on voting scheme is presented in this paper. In the proposed algorithm, the global distribution and local distribution of sensor stars are fully utilized, and the stratified voting scheme is adopted to obtain the candidates for sensor stars. The database optimization is employed to reduce its memory requirement and improve the robustness of the proposed algorithm. The simulation shows that the proposed algorithm exhibits 99.81% identification rate with 2-pixel standard deviations of positional noises and 0.322-Mv magnitude noises. Compared with two similar algorithms, the proposed algorithm is more robust towards noise, and the average identification time and required memory is less. Furthermore, the real sky test shows that the proposed algorithm performs well on the real star images.
A Fast Exact k-Nearest Neighbors Algorithm for High Dimensional Search Using k-Means Clustering and Triangle Inequality.

PubMed

Wang, Xueyi

2012-02-08

The k-nearest neighbors (k-NN) algorithm is a widely used machine learning method that finds nearest neighbors of a test object in a feature space. We present a new exact k-NN algorithm called kMkNN (k-Means for k-Nearest Neighbors) that uses the k-means clustering and the triangle inequality to accelerate the searching for nearest neighbors in a high dimensional space. The kMkNN algorithm has two stages. In the buildup stage, instead of using complex tree structures such as metric trees, kd-trees, or ball-tree, kMkNN uses a simple k-means clustering method to preprocess the training dataset. In the searching stage, given a query object, kMkNN finds nearest training objects starting from the nearest cluster to the query object and uses the triangle inequality to reduce the distance calculations. Experiments show that the performance of kMkNN is surprisingly good compared to the traditional k-NN algorithm and tree-based k-NN algorithms such as kd-trees and ball-trees. On a collection of 20 datasets with up to 10(6) records and 10(4) dimensions, kMkNN shows a 2-to 80-fold reduction of distance calculations and a 2- to 60-fold speedup over the traditional k-NN algorithm for 16 datasets. Furthermore, kMkNN performs significant better than a kd-tree based k-NN algorithm for all datasets and performs better than a ball-tree based k-NN algorithm for most datasets. The results show that kMkNN is effective for searching nearest neighbors in high dimensional spaces.
A Differential Evolution-Based Routing Algorithm for Environmental Monitoring Wireless Sensor Networks

PubMed Central

Li, Xiaofang; Xu, Lizhong; Wang, Huibin; Song, Jie; Yang, Simon X.

2010-01-01

The traditional Low Energy Adaptive Cluster Hierarchy (LEACH) routing protocol is a clustering-based protocol. The uneven selection of cluster heads results in premature death of cluster heads and premature blind nodes inside the clusters, thus reducing the overall lifetime of the network. With a full consideration of information on energy and distance distribution of neighboring nodes inside the clusters, this paper proposes a new routing algorithm based on differential evolution (DE) to improve the LEACH routing protocol. To meet the requirements of monitoring applications in outdoor environments such as the meteorological, hydrological and wetland ecological environments, the proposed algorithm uses the simple and fast search features of DE to optimize the multi-objective selection of cluster heads and prevent blind nodes for improved energy efficiency and system stability. Simulation results show that the proposed new LEACH routing algorithm has better performance, effectively extends the working lifetime of the system, and improves the quality of the wireless sensor networks. PMID:22219670
Spatial event cluster detection using an approximate normal distribution.

PubMed

Torabi, Mahmoud; Rosychuk, Rhonda J

2008-12-12

In geographic surveillance of disease, areas with large numbers of disease cases are to be identified so that investigations of the causes of high disease rates can be pursued. Areas with high rates are called disease clusters and statistical cluster detection tests are used to identify geographic areas with higher disease rates than expected by chance alone. Typically cluster detection tests are applied to incident or prevalent cases of disease, but surveillance of disease-related events, where an individual may have multiple events, may also be of interest. Previously, a compound Poisson approach that detects clusters of events by testing individual areas that may be combined with their neighbours has been proposed. However, the relevant probabilities from the compound Poisson distribution are obtained from a recursion relation that can be cumbersome if the number of events are large or analyses by strata are performed. We propose a simpler approach that uses an approximate normal distribution. This method is very easy to implement and is applicable to situations where the population sizes are large and the population distribution by important strata may differ by area. We demonstrate the approach on pediatric self-inflicted injury presentations to emergency departments and compare the results for probabilities based on the recursion and the normal approach. We also implement a Monte Carlo simulation to study the performance of the proposed approach. In a self-inflicted injury data example, the normal approach identifies twelve out of thirteen of the same clusters as the compound Poisson approach, noting that the compound Poisson method detects twelve significant clusters in total. Through simulation studies, the normal approach well approximates the compound Poisson approach for a variety of different population sizes and case and event thresholds. A drawback of the compound Poisson approach is that the relevant probabilities must be determined through a
An algorithm of discovering signatures from DNA databases on a computer cluster.

PubMed

Lee, Hsiao Ping; Sheu, Tzu-Fang

2014-10-05

Signatures are short sequences that are unique and not similar to any other sequence in a database that can be used as the basis to identify different species. Even though several signature discovery algorithms have been proposed in the past, these algorithms require the entirety of databases to be loaded in the memory, thus restricting the amount of data that they can process. It makes those algorithms unable to process databases with large amounts of data. Also, those algorithms use sequential models and have slower discovery speeds, meaning that the efficiency can be improved. In this research, we are debuting the utilization of a divide-and-conquer strategy in signature discovery and have proposed a parallel signature discovery algorithm on a computer cluster. The algorithm applies the divide-and-conquer strategy to solve the problem posed to the existing algorithms where they are unable to process large databases and uses a parallel computing mechanism to effectively improve the efficiency of signature discovery. Even when run with just the memory of regular personal computers, the algorithm can still process large databases such as the human whole-genome EST database which were previously unable to be processed by the existing algorithms. The algorithm proposed in this research is not limited by the amount of usable memory and can rapidly find signatures in large databases, making it useful in applications such as Next Generation Sequencing and other large database analysis and processing. The implementation of the proposed algorithm is available at http://www.cs.pu.edu.tw/~fang/DDCSDPrograms/DDCSD.htm.
Convex Clustering: An Attractive Alternative to Hierarchical Clustering

PubMed Central

Chen, Gary K.; Chi, Eric C.; Ranola, John Michael O.; Lange, Kenneth

2015-01-01

The primary goal in cluster analysis is to discover natural groupings of objects. The field of cluster analysis is crowded with diverse methods that make special assumptions about data and address different scientific aims. Despite its shortcomings in accuracy, hierarchical clustering is the dominant clustering method in bioinformatics. Biologists find the trees constructed by hierarchical clustering visually appealing and in tune with their evolutionary perspective. Hierarchical clustering operates on multiple scales simultaneously. This is essential, for instance, in transcriptome data, where one may be interested in making qualitative inferences about how lower-order relationships like gene modules lead to higher-order relationships like pathways or biological processes. The recently developed method of convex clustering preserves the visual appeal of hierarchical clustering while ameliorating its propensity to make false inferences in the presence of outliers and noise. The solution paths generated by convex clustering reveal relationships between clusters that are hidden by static methods such as k-means clustering. The current paper derives and tests a novel proximal distance algorithm for minimizing the objective function of convex clustering. The algorithm separates parameters, accommodates missing data, and supports prior information on relationships. Our program CONVEXCLUSTER incorporating the algorithm is implemented on ATI and nVidia graphics processing units (GPUs) for maximal speed. Several biological examples illustrate the strengths of convex clustering and the ability of the proximal distance algorithm to handle high-dimensional problems. CONVEXCLUSTER can be freely downloaded from the UCLA Human Genetics web site at http://www.genetics.ucla.edu/software/ PMID:25965340
Convex clustering: an attractive alternative to hierarchical clustering.

PubMed

Chen, Gary K; Chi, Eric C; Ranola, John Michael O; Lange, Kenneth

2015-05-01

The primary goal in cluster analysis is to discover natural groupings of objects. The field of cluster analysis is crowded with diverse methods that make special assumptions about data and address different scientific aims. Despite its shortcomings in accuracy, hierarchical clustering is the dominant clustering method in bioinformatics. Biologists find the trees constructed by hierarchical clustering visually appealing and in tune with their evolutionary perspective. Hierarchical clustering operates on multiple scales simultaneously. This is essential, for instance, in transcriptome data, where one may be interested in making qualitative inferences about how lower-order relationships like gene modules lead to higher-order relationships like pathways or biological processes. The recently developed method of convex clustering preserves the visual appeal of hierarchical clustering while ameliorating its propensity to make false inferences in the presence of outliers and noise. The solution paths generated by convex clustering reveal relationships between clusters that are hidden by static methods such as k-means clustering. The current paper derives and tests a novel proximal distance algorithm for minimizing the objective function of convex clustering. The algorithm separates parameters, accommodates missing data, and supports prior information on relationships. Our program CONVEXCLUSTER incorporating the algorithm is implemented on ATI and nVidia graphics processing units (GPUs) for maximal speed. Several biological examples illustrate the strengths of convex clustering and the ability of the proximal distance algorithm to handle high-dimensional problems. CONVEXCLUSTER can be freely downloaded from the UCLA Human Genetics web site at http://www.genetics.ucla.edu/software/.
Applications of colored petri net and genetic algorithms to cluster tool scheduling

NASA Astrophysics Data System (ADS)

Liu, Tung-Kuan; Kuo, Chih-Jen; Hsiao, Yung-Chin; Tsai, Jinn-Tsong; Chou, Jyh-Horng

2005-12-01

In this paper, we propose a method, which uses Coloured Petri Net (CPN) and genetic algorithm (GA) to obtain an optimal deadlock-free schedule and to solve re-entrant problem for the flexible process of the cluster tool. The process of the cluster tool for producing a wafer usually can be classified into three types: 1) sequential process, 2) parallel process, and 3) sequential parallel process. But these processes are not economical enough to produce a variety of wafers in small volume. Therefore, this paper will propose the flexible process where the operations of fabricating wafers are randomly arranged to achieve the best utilization of the cluster tool. However, the flexible process may have deadlock and re-entrant problems which can be detected by CPN. On the other hand, GAs have been applied to find the optimal schedule for many types of manufacturing processes. Therefore, we successfully integrate CPN and GAs to obtain an optimal schedule with the deadlock and re-entrant problems for the flexible process of the cluster tool.
A Modified Distributed Bees Algorithm for Multi-Sensor Task Allocation.

PubMed

Tkach, Itshak; Jevtić, Aleksandar; Nof, Shimon Y; Edan, Yael

2018-03-02

Multi-sensor systems can play an important role in monitoring tasks and detecting targets. However, real-time allocation of heterogeneous sensors to dynamic targets/tasks that are unknown a priori in their locations and priorities is a challenge. This paper presents a Modified Distributed Bees Algorithm (MDBA) that is developed to allocate stationary heterogeneous sensors to upcoming unknown tasks using a decentralized, swarm intelligence approach to minimize the task detection times. Sensors are allocated to tasks based on sensors' performance, tasks' priorities, and the distances of the sensors from the locations where the tasks are being executed. The algorithm was compared to a Distributed Bees Algorithm (DBA), a Bees System, and two common multi-sensor algorithms, market-based and greedy-based algorithms, which were fitted for the specific task. Simulation analyses revealed that MDBA achieved statistically significant improved performance by 7% with respect to DBA as the second-best algorithm, and by 19% with respect to Greedy algorithm, which was the worst, thus indicating its fitness to provide solutions for heterogeneous multi-sensor systems.
CytoCluster: A Cytoscape Plugin for Cluster Analysis and Visualization of Biological Networks.

PubMed

Li, Min; Li, Dongyan; Tang, Yu; Wu, Fangxiang; Wang, Jianxin

2017-08-31

Nowadays, cluster analysis of biological networks has become one of the most important approaches to identifying functional modules as well as predicting protein complexes and network biomarkers. Furthermore, the visualization of clustering results is crucial to display the structure of biological networks. Here we present CytoCluster, a cytoscape plugin integrating six clustering algorithms, HC-PIN (Hierarchical Clustering algorithm in Protein Interaction Networks), OH-PIN (identifying Overlapping and Hierarchical modules in Protein Interaction Networks), IPCA (Identifying Protein Complex Algorithm), ClusterONE (Clustering with Overlapping Neighborhood Expansion), DCU (Detecting Complexes based on Uncertain graph model), IPC-MCE (Identifying Protein Complexes based on Maximal Complex Extension), and BinGO (the Biological networks Gene Ontology) function. Users can select different clustering algorithms according to their requirements. The main function of these six clustering algorithms is to detect protein complexes or functional modules. In addition, BinGO is used to determine which Gene Ontology (GO) categories are statistically overrepresented in a set of genes or a subgraph of a biological network. CytoCluster can be easily expanded, so that more clustering algorithms and functions can be added to this plugin. Since it was created in July 2013, CytoCluster has been downloaded more than 9700 times in the Cytoscape App store and has already been applied to the analysis of different biological networks. CytoCluster is available from http://apps.cytoscape.org/apps/cytocluster.
Anomaly clustering in hyperspectral images

NASA Astrophysics Data System (ADS)

Doster, Timothy J.; Ross, David S.; Messinger, David W.; Basener, William F.

2009-05-01

The topological anomaly detection algorithm (TAD) differs from other anomaly detection algorithms in that it uses a topological/graph-theoretic model for the image background instead of modeling the image with a Gaussian normal distribution. In the construction of the model, TAD produces a hard threshold separating anomalous pixels from background in the image. We build on this feature of TAD by extending the algorithm so that it gives a measure of the number of anomalous objects, rather than the number of anomalous pixels, in a hyperspectral image. This is done by identifying, and integrating, clusters of anomalous pixels via a graph theoretical method combining spatial and spectral information. The method is applied to a cluttered HyMap image and combines small groups of pixels containing like materials, such as those corresponding to rooftops and cars, into individual clusters. This improves visualization and interpretation of objects.
Multivariate Spatial Condition Mapping Using Subtractive Fuzzy Cluster Means

PubMed Central

Sabit, Hakilo; Al-Anbuky, Adnan

2014-01-01

Wireless sensor networks are usually deployed for monitoring given physical phenomena taking place in a specific space and over a specific duration of time. The spatio-temporal distribution of these phenomena often correlates to certain physical events. To appropriately characterise these events-phenomena relationships over a given space for a given time frame, we require continuous monitoring of the conditions. WSNs are perfectly suited for these tasks, due to their inherent robustness. This paper presents a subtractive fuzzy cluster means algorithm and its application in data stream mining for wireless sensor systems over a cloud-computing-like architecture, which we call sensor cloud data stream mining. Benchmarking on standard mining algorithms, the k-means and the FCM algorithms, we have demonstrated that the subtractive fuzzy cluster means model can perform high quality distributed data stream mining tasks comparable to centralised data stream mining. PMID:25313495
Multiple co-clustering based on nonparametric mixture models with heterogeneous marginal distributions

PubMed Central

Yoshimoto, Junichiro; Shimizu, Yu; Okada, Go; Takamura, Masahiro; Okamoto, Yasumasa; Yamawaki, Shigeto; Doya, Kenji

2017-01-01

We propose a novel method for multiple clustering, which is useful for analysis of high-dimensional data containing heterogeneous types of features. Our method is based on nonparametric Bayesian mixture models in which features are automatically partitioned (into views) for each clustering solution. This feature partition works as feature selection for a particular clustering solution, which screens out irrelevant features. To make our method applicable to high-dimensional data, a co-clustering structure is newly introduced for each view. Further, the outstanding novelty of our method is that we simultaneously model different distribution families, such as Gaussian, Poisson, and multinomial distributions in each cluster block, which widens areas of application to real data. We apply the proposed method to synthetic and real data, and show that our method outperforms other multiple clustering methods both in recovering true cluster structures and in computation time. Finally, we apply our method to a depression dataset with no true cluster structure available, from which useful inferences are drawn about possible clustering structures of the data. PMID:29049392
BiCluE - Exact and heuristic algorithms for weighted bi-cluster editing of biomedical data

PubMed Central

2013-01-01

Background The explosion of biological data has dramatically reformed today's biology research. The biggest challenge to biologists and bioinformaticians is the integration and analysis of large quantity of data to provide meaningful insights. One major problem is the combined analysis of data from different types. Bi-cluster editing, as a special case of clustering, which partitions two different types of data simultaneously, might be used for several biomedical scenarios. However, the underlying algorithmic problem is NP-hard. Results Here we contribute with BiCluE, a software package designed to solve the weighted bi-cluster editing problem. It implements (1) an exact algorithm based on fixed-parameter tractability and (2) a polynomial-time greedy heuristics based on solving the hardest part, edge deletions, first. We evaluated its performance on artificial graphs. Afterwards we exemplarily applied our implementation on real world biomedical data, GWAS data in this case. BiCluE generally works on any kind of data types that can be modeled as (weighted or unweighted) bipartite graphs. Conclusions To our knowledge, this is the first software package solving the weighted bi-cluster editing problem. BiCluE as well as the supplementary results are available online at http://biclue.mpi-inf.mpg.de. PMID:24565035
The global Minmax k-means algorithm.

PubMed

Wang, Xiaoyan; Bai, Yanping

2016-01-01

The global k -means algorithm is an incremental approach to clustering that dynamically adds one cluster center at a time through a deterministic global search procedure from suitable initial positions, and employs k -means to minimize the sum of the intra-cluster variances. However the global k -means algorithm sometimes results singleton clusters and the initial positions sometimes are bad, after a bad initialization, poor local optimal can be easily obtained by k -means algorithm. In this paper, we modified the global k -means algorithm to eliminate the singleton clusters at first, and then we apply MinMax k -means clustering error method to global k -means algorithm to overcome the effect of bad initialization, proposed the global Minmax k -means algorithm. The proposed clustering method is tested on some popular data sets and compared to the k -means algorithm, the global k -means algorithm and the MinMax k -means algorithm. The experiment results show our proposed algorithm outperforms other algorithms mentioned in the paper.

Operating room scheduling using hybrid clustering priority rule and genetic algorithm

NASA Astrophysics Data System (ADS)

Santoso, Linda Wahyuni; Sinawan, Aisyah Ashrinawati; Wijaya, Andi Rahadiyan; Sudiarso, Andi; Masruroh, Nur Aini; Herliansyah, Muhammad Kusumawan

2017-11-01

Operating room is a bottleneck resource in most hospitals so that operating room scheduling system will influence the whole performance of the hospitals. This research develops a mathematical model of operating room scheduling for elective patients which considers patient priority with limit number of surgeons, operating rooms, and nurse team. Clustering analysis was conducted to the data of surgery durations using hierarchical and non-hierarchical methods. The priority rule of each resulting cluster was determined using Shortest Processing Time method. Genetic Algorithm was used to generate daily operating room schedule which resulted in the lowest values of patient waiting time and nurse overtime. The computational results show that this proposed model reduced patient waiting time by approximately 32.22% and nurse overtime by approximately 32.74% when compared to actual schedule.
Effect of Clustering Algorithm on Establishing Markov State Model for Molecular Dynamics Simulations.

PubMed

Li, Yan; Dong, Zigang

2016-06-27

Recently, the Markov state model has been applied for kinetic analysis of molecular dynamics simulations. However, discretization of the conformational space remains a primary challenge in model building, and it is not clear how the space decomposition by distinct clustering strategies exerts influence on the model output. In this work, different clustering algorithms are employed to partition the conformational space sampled in opening and closing of fatty acid binding protein 4 as well as inactivation and activation of the epidermal growth factor receptor. Various classifications are achieved, and Markov models are set up accordingly. On the basis of the models, the total net flux and transition rate are calculated between two distinct states. Our results indicate that geometric and kinetic clustering perform equally well. The construction and outcome of Markov models are heavily dependent on the data traits. Compared to other methods, a combination of Bayesian and hierarchical clustering is feasible in identification of metastable states.
Statistical sampling of the distribution of uranium deposits using geologic/geographic clusters

USGS Publications Warehouse

Finch, W.I.; Grundy, W.D.; Pierson, C.T.

1992-01-01

The concept of geologic/geographic clusters was developed particularly to study grade and tonnage models for sandstone-type uranium deposits. A cluster is a grouping of mined as well as unmined uranium occurrences within an arbitrary area about 8 km across. A cluster is a statistical sample that will reflect accurately the distribution of uranium in large regions relative to various geologic and geographic features. The example of the Colorado Plateau Uranium Province reveals that only 3 percent of the total number of clusters is in the largest tonnage-size category, greater than 10,000 short tons U3O8, and that 80 percent of the clusters are hosted by Triassic and Jurassic rocks. The distributions of grade and tonnage for clusters in the Powder River Basin show a wide variation; the grade distribution is highly variable, reflecting a difference between roll-front deposits and concretionary deposits, and the Basin contains about half the number in the greater-than-10,000 tonnage-size class as does the Colorado Plateau, even though it is much smaller. The grade and tonnage models should prove useful in finding the richest and largest uranium deposits. ?? 1992 Oxford University Press.
Thermal and log-normal distributions of plasma in laser driven Coulomb explosions of deuterium clusters

NASA Astrophysics Data System (ADS)

Barbarino, M.; Warrens, M.; Bonasera, A.; Lattuada, D.; Bang, W.; Quevedo, H. J.; Consoli, F.; de Angelis, R.; Andreoli, P.; Kimura, S.; Dyer, G.; Bernstein, A. C.; Hagel, K.; Barbui, M.; Schmidt, K.; Gaul, E.; Donovan, M. E.; Natowitz, J. B.; Ditmire, T.

2016-08-01

In this work, we explore the possibility that the motion of the deuterium ions emitted from Coulomb cluster explosions is highly disordered enough to resemble thermalization. We analyze the process of nuclear fusion reactions driven by laser-cluster interactions in experiments conducted at the Texas Petawatt laser facility using a mixture of D2+3He and CD4+3He cluster targets. When clusters explode by Coulomb repulsion, the emission of the energetic ions is “nearly” isotropic. In the framework of cluster Coulomb explosions, we analyze the energy distributions of the ions using a Maxwell-Boltzmann (MB) distribution, a shifted MB distribution (sMB), and the energy distribution derived from a log-normal (LN) size distribution of clusters. We show that the first two distributions reproduce well the experimentally measured ion energy distributions and the number of fusions from d-d and d-3He reactions. The LN distribution is a good representation of the ion kinetic energy distribution well up to high momenta where the noise becomes dominant, but overestimates both the neutron and the proton yields. If the parameters of the LN distributions are chosen to reproduce the fusion yields correctly, the experimentally measured high energy ion spectrum is not well represented. We conclude that the ion kinetic energy distribution is highly disordered and practically not distinguishable from a thermalized one.
OpenCluster: A Flexible Distributed Computing Framework for Astronomical Data Processing

NASA Astrophysics Data System (ADS)

Wei, Shoulin; Wang, Feng; Deng, Hui; Liu, Cuiyin; Dai, Wei; Liang, Bo; Mei, Ying; Shi, Congming; Liu, Yingbo; Wu, Jingping

2017-02-01

The volume of data generated by modern astronomical telescopes is extremely large and rapidly growing. However, current high-performance data processing architectures/frameworks are not well suited for astronomers because of their limitations and programming difficulties. In this paper, we therefore present OpenCluster, an open-source distributed computing framework to support rapidly developing high-performance processing pipelines of astronomical big data. We first detail the OpenCluster design principles and implementations and present the APIs facilitated by the framework. We then demonstrate a case in which OpenCluster is used to resolve complex data processing problems for developing a pipeline for the Mingantu Ultrawide Spectral Radioheliograph. Finally, we present our OpenCluster performance evaluation. Overall, OpenCluster provides not only high fault tolerance and simple programming interfaces, but also a flexible means of scaling up the number of interacting entities. OpenCluster thereby provides an easily integrated distributed computing framework for quickly developing a high-performance data processing system of astronomical telescopes and for significantly reducing software development expenses.
Utility of the k-means clustering algorithm in differentiating apparent diffusion coefficient values of benign and malignant neck pathologies.

PubMed

Srinivasan, A; Galbán, C J; Johnson, T D; Chenevert, T L; Ross, B D; Mukherji, S K

2010-04-01

Does the K-means algorithm do a better job of differentiating benign and malignant neck pathologies compared to only mean ADC? The objective of our study was to analyze the differences between ADC partitions to evaluate whether the K-means technique can be of additional benefit to whole-lesion mean ADC alone in distinguishing benign and malignant neck pathologies. MR imaging studies of 10 benign and 10 malignant proved neck pathologies were postprocessed on a PC by using in-house software developed in Matlab. Two neuroradiologists manually contoured the lesions, with the ADC values within each lesion clustered into 2 (low, ADC-ADC(L); high, ADC-ADC(H)) and 3 partitions (ADC(L); intermediate, ADC-ADC(I); ADC(H)) by using the K-means clustering algorithm. An unpaired 2-tailed Student t test was performed for all metrics to determine statistical differences in the means of the benign and malignant pathologies. A statistically significant difference between the mean ADC(L) clusters in benign and malignant pathologies was seen in the 3-cluster models of both readers (P = .03 and .022, respectively) and the 2-cluster model of reader 2 (P = .04), with the other metrics (ADC(H), ADC(I); whole-lesion mean ADC) not revealing any significant differences. ROC curves demonstrated the quantitative differences in mean ADC(H) and ADC(L) in both the 2- and 3-cluster models to be predictive of malignancy (2 clusters: P = .008, area under curve = 0.850; 3 clusters: P = .01, area under curve = 0.825). The K-means clustering algorithm that generates partitions of large datasets may provide a better characterization of neck pathologies and may be of additional benefit in distinguishing benign and malignant neck pathologies compared with whole-lesion mean ADC alone.
Evaluating clustering methods within the Artificial Ecosystem Algorithm and their application to bike redistribution in London.

PubMed

Adham, Manal T; Bentley, Peter J

2016-08-01

This paper proposes and evaluates a solution to the truck redistribution problem prominent in London's Santander Cycle scheme. Due to the complexity of this NP-hard combinatorial optimisation problem, no efficient optimisation techniques are known to solve the problem exactly. This motivates our use of the heuristic Artificial Ecosystem Algorithm (AEA) to find good solutions in a reasonable amount of time. The AEA is designed to take advantage of highly distributed computer architectures and adapt to changing problems. In the AEA a problem is first decomposed into its relative sub-components; they then evolve solution building blocks that fit together to form a single optimal solution. Three variants of the AEA centred on evaluating clustering methods are presented: the baseline AEA, the community-based AEA which groups stations according to journey flows, and the Adaptive AEA which actively modifies clusters to cater for changes in demand. We applied these AEA variants to the redistribution problem prominent in bike share schemes (BSS). The AEA variants are empirically evaluated using historical data from Santander Cycles to validate the proposed approach and prove its potential effectiveness. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
CytoCluster: A Cytoscape Plugin for Cluster Analysis and Visualization of Biological Networks

PubMed Central

Li, Min; Li, Dongyan; Tang, Yu; Wang, Jianxin

2017-01-01

Nowadays, cluster analysis of biological networks has become one of the most important approaches to identifying functional modules as well as predicting protein complexes and network biomarkers. Furthermore, the visualization of clustering results is crucial to display the structure of biological networks. Here we present CytoCluster, a cytoscape plugin integrating six clustering algorithms, HC-PIN (Hierarchical Clustering algorithm in Protein Interaction Networks), OH-PIN (identifying Overlapping and Hierarchical modules in Protein Interaction Networks), IPCA (Identifying Protein Complex Algorithm), ClusterONE (Clustering with Overlapping Neighborhood Expansion), DCU (Detecting Complexes based on Uncertain graph model), IPC-MCE (Identifying Protein Complexes based on Maximal Complex Extension), and BinGO (the Biological networks Gene Ontology) function. Users can select different clustering algorithms according to their requirements. The main function of these six clustering algorithms is to detect protein complexes or functional modules. In addition, BinGO is used to determine which Gene Ontology (GO) categories are statistically overrepresented in a set of genes or a subgraph of a biological network. CytoCluster can be easily expanded, so that more clustering algorithms and functions can be added to this plugin. Since it was created in July 2013, CytoCluster has been downloaded more than 9700 times in the Cytoscape App store and has already been applied to the analysis of different biological networks. CytoCluster is available from http://apps.cytoscape.org/apps/cytocluster. PMID:28858211
Automated spike sorting algorithm based on Laplacian eigenmaps and k-means clustering.

PubMed

Chah, E; Hok, V; Della-Chiesa, A; Miller, J J H; O'Mara, S M; Reilly, R B

2011-02-01

This study presents a new automatic spike sorting method based on feature extraction by Laplacian eigenmaps combined with k-means clustering. The performance of the proposed method was compared against previously reported algorithms such as principal component analysis (PCA) and amplitude-based feature extraction. Two types of classifier (namely k-means and classification expectation-maximization) were incorporated within the spike sorting algorithms, in order to find a suitable classifier for the feature sets. Simulated data sets and in-vivo tetrode multichannel recordings were employed to assess the performance of the spike sorting algorithms. The results show that the proposed algorithm yields significantly improved performance with mean sorting accuracy of 73% and sorting error of 10% compared to PCA which combined with k-means had a sorting accuracy of 58% and sorting error of 10%.A correction was made to this article on 22 February 2011. The spacing of the title was amended on the abstract page. No changes were made to the article PDF and the print version was unaffected.
Pruning Neural Networks with Distribution Estimation Algorithms

DOE Office of Scientific and Technical Information (OSTI.GOV)

Cantu-Paz, E

2003-01-15

This paper describes the application of four evolutionary algorithms to the pruning of neural networks used in classification problems. Besides of a simple genetic algorithm (GA), the paper considers three distribution estimation algorithms (DEAs): a compact GA, an extended compact GA, and the Bayesian Optimization Algorithm. The objective is to determine if the DEAs present advantages over the simple GA in terms of accuracy or speed in this problem. The experiments used a feed forward neural network trained with standard back propagation and public-domain and artificial data sets. The pruned networks seemed to have better or equal accuracy than themore » original fully-connected networks. Only in a few cases, pruning resulted in less accurate networks. We found few differences in the accuracy of the networks pruned by the four EAs, but found important differences in the execution time. The results suggest that a simple GA with a small population might be the best algorithm for pruning networks on the data sets we tested.« less
Density-Aware Clustering Based on Aggregated Heat Kernel and Its Transformation

DOE PAGES

Huang, Hao; Yoo, Shinjae; Yu, Dantong; ...

2015-06-01

Current spectral clustering algorithms suffer from the sensitivity to existing noise, and parameter scaling, and may not be aware of different density distributions across clusters. If these problems are left untreated, the consequent clustering results cannot accurately represent true data patterns, in particular, for complex real world datasets with heterogeneous densities. This paper aims to solve these problems by proposing a diffusion-based Aggregated Heat Kernel (AHK) to improve the clustering stability, and a Local Density Affinity Transformation (LDAT) to correct the bias originating from different cluster densities. AHK statistically\\ models the heat diffusion traces along the entire time scale, somore » it ensures robustness during clustering process, while LDAT probabilistically reveals local density of each instance and suppresses the local density bias in the affinity matrix. Our proposed framework integrates these two techniques systematically. As a result, not only does it provide an advanced noise-resisting and density-aware spectral mapping to the original dataset, but also demonstrates the stability during the processing of tuning the scaling parameter (which usually controls the range of neighborhood). Furthermore, our framework works well with the majority of similarity kernels, which ensures its applicability to many types of data and problem domains. The systematic experiments on different applications show that our proposed algorithms outperform state-of-the-art clustering algorithms for the data with heterogeneous density distributions, and achieve robust clustering performance with respect to tuning the scaling parameter and handling various levels and types of noise.« less
Multi-Optimisation Consensus Clustering

NASA Astrophysics Data System (ADS)

Li, Jian; Swift, Stephen; Liu, Xiaohui

Ensemble Clustering has been developed to provide an alternative way of obtaining more stable and accurate clustering results. It aims to avoid the biases of individual clustering algorithms. However, it is still a challenge to develop an efficient and robust method for Ensemble Clustering. Based on an existing ensemble clustering method, Consensus Clustering (CC), this paper introduces an advanced Consensus Clustering algorithm called Multi-Optimisation Consensus Clustering (MOCC), which utilises an optimised Agreement Separation criterion and a Multi-Optimisation framework to improve the performance of CC. Fifteen different data sets are used for evaluating the performance of MOCC. The results reveal that MOCC can generate more accurate clustering results than the original CC algorithm.
Robust continuous clustering

PubMed Central

Shah, Sohil Atul

2017-01-01

Clustering is a fundamental procedure in the analysis of scientific data. It is used ubiquitously across the sciences. Despite decades of research, existing clustering algorithms have limited effectiveness in high dimensions and often require tuning parameters for different domains and datasets. We present a clustering algorithm that achieves high accuracy across multiple domains and scales efficiently to high dimensions and large datasets. The presented algorithm optimizes a smooth continuous objective, which is based on robust statistics and allows heavily mixed clusters to be untangled. The continuous nature of the objective also allows clustering to be integrated as a module in end-to-end feature learning pipelines. We demonstrate this by extending the algorithm to perform joint clustering and dimensionality reduction by efficiently optimizing a continuous global objective. The presented approach is evaluated on large datasets of faces, hand-written digits, objects, newswire articles, sensor readings from the Space Shuttle, and protein expression levels. Our method achieves high accuracy across all datasets, outperforming the best prior algorithm by a factor of 3 in average rank. PMID:28851838
THE SWIFT AGN AND CLUSTER SURVEY. II. CLUSTER CONFIRMATION WITH SDSS DATA

DOE Office of Scientific and Technical Information (OSTI.GOV)

Griffin, Rhiannon D.; Dai, Xinyu; Kochanek, Christopher S.

2016-01-15

We study 203 (of 442) Swift AGN and Cluster Survey extended X-ray sources located in the SDSS DR8 footprint to search for galaxy over-densities in three-dimensional space using SDSS galaxy photometric redshifts and positions near the Swift cluster candidates. We find 104 Swift clusters with a >3σ galaxy over-density. The remaining targets are potentially located at higher redshifts and require deeper optical follow-up observations for confirmation as galaxy clusters. We present a series of cluster properties including the redshift, brightest cluster galaxy (BCG) magnitude, BCG-to-X-ray center offset, optical richness, and X-ray luminosity. We also detect red sequences in ∼85% ofmore » the 104 confirmed clusters. The X-ray luminosity and optical richness for the SDSS confirmed Swift clusters are correlated and follow previously established relations. The distribution of the separations between the X-ray centroids and the most likely BCG is also consistent with expectation. We compare the observed redshift distribution of the sample with a theoretical model, and find that our sample is complete for z ≲ 0.3 and is still 80% complete up to z ≃ 0.4, consistent with the SDSS survey depth. These analysis results suggest that our Swift cluster selection algorithm has yielded a statistically well-defined cluster sample for further study of cluster evolution and cosmology. We also match our SDSS confirmed Swift clusters to existing cluster catalogs, and find 42, 23, and 1 matches in optical, X-ray, and Sunyaev–Zel’dovich catalogs, respectively, and so the majority of these clusters are new detections.« less
3D Kirchhoff depth migration algorithm: A new scalable approach for parallelization on multicore CPU based cluster

NASA Astrophysics Data System (ADS)

Rastogi, Richa; Londhe, Ashutosh; Srivastava, Abhishek; Sirasala, Kirannmayi M.; Khonde, Kiran

2017-03-01

In this article, a new scalable 3D Kirchhoff depth migration algorithm is presented on state of the art multicore CPU based cluster. Parallelization of 3D Kirchhoff depth migration is challenging due to its high demand of compute time, memory, storage and I/O along with the need of their effective management. The most resource intensive modules of the algorithm are traveltime calculations and migration summation which exhibit an inherent trade off between compute time and other resources. The parallelization strategy of the algorithm largely depends on the storage of calculated traveltimes and its feeding mechanism to the migration process. The presented work is an extension of our previous work, wherein a 3D Kirchhoff depth migration application for multicore CPU based parallel system had been developed. Recently, we have worked on improving parallel performance of this application by re-designing the parallelization approach. The new algorithm is capable to efficiently migrate both prestack and poststack 3D data. It exhibits flexibility for migrating large number of traces within the available node memory and with minimal requirement of storage, I/O and inter-node communication. The resultant application is tested using 3D Overthrust data on PARAM Yuva II, which is a Xeon E5-2670 based multicore CPU cluster with 16 cores/node and 64 GB shared memory. Parallel performance of the algorithm is studied using different numerical experiments and the scalability results show striking improvement over its previous version. An impressive 49.05X speedup with 76.64% efficiency is achieved for 3D prestack data and 32.00X speedup with 50.00% efficiency for 3D poststack data, using 64 nodes. The results also demonstrate the effectiveness and robustness of the improved algorithm with high scalability and efficiency on a multicore CPU cluster.
Information Clustering Based on Fuzzy Multisets.

ERIC Educational Resources Information Center

Miyamoto, Sadaaki

2003-01-01

Proposes a fuzzy multiset model for information clustering with application to information retrieval on the World Wide Web. Highlights include search engines; term clustering; document clustering; algorithms for calculating cluster centers; theoretical properties concerning clustering algorithms; and examples to show how the algorithms work.…
Advances in Significance Testing for Cluster Detection

NASA Astrophysics Data System (ADS)

Coleman, Deidra Andrea

Over the past two decades, much attention has been given to data driven project goals such as the Human Genome Project and the development of syndromic surveillance systems. A major component of these types of projects is analyzing the abundance of data. Detecting clusters within the data can be beneficial as it can lead to the identification of specified sequences of DNA nucleotides that are related to important biological functions or the locations of epidemics such as disease outbreaks or bioterrorism attacks. Cluster detection techniques require efficient and accurate hypothesis testing procedures. In this dissertation, we improve upon the hypothesis testing procedures for cluster detection by enhancing distributional theory and providing an alternative method for spatial cluster detection using syndromic surveillance data. In Chapter 2, we provide an efficient method to compute the exact distribution of the number and coverage of h-clumps of a collection of words. This method involves defining a Markov chain using a minimal deterministic automaton to reduce the number of states needed for computation. We allow words of the collection to contain other words of the collection making the method more general. We use our method to compute the distributions of the number and coverage of h-clumps in the Chi motif of H. influenza.. In Chapter 3, we provide an efficient algorithm to compute the exact distribution of multiple window discrete scan statistics for higher-order, multi-state Markovian sequences. This algorithm involves defining a Markov chain to efficiently keep track of probabilities needed to compute p-values of the statistic. We use our algorithm to identify cases where the available approximation does not perform well. We also use our algorithm to detect unusual clusters of made free throw shots by National Basketball Association players during the 2009-2010 regular season. In Chapter 4, we give a procedure to detect outbreaks using syndromic
Packets Distributing Evolutionary Algorithm Based on PSO for Ad Hoc Network

NASA Astrophysics Data System (ADS)

Xu, Xiao-Feng

2018-03-01

Wireless communication network has such features as limited bandwidth, changeful channel and dynamic topology, etc. Ad hoc network has lots of difficulties in accessing control, bandwidth distribution, resource assign and congestion control. Therefore, a wireless packets distributing Evolutionary algorithm based on PSO (DPSO)for Ad Hoc Network is proposed. Firstly, parameters impact on performance of network are analyzed and researched to obtain network performance effective function. Secondly, the improved PSO Evolutionary Algorithm is used to solve the optimization problem from local to global in the process of network packets distributing. The simulation results show that the algorithm can ensure fairness and timeliness of network transmission, as well as improve ad hoc network resource integrated utilization efficiency.
Baryon Distribution in Galaxy Clusters as a Result of Sedimentation of Helium Nuclei.

PubMed

Qin; Wu

2000-01-20

Heavy particles in galaxy clusters tend to be more centrally concentrated than light ones according to the Boltzmann distribution. An estimate of the drift velocity suggests that it is possible that the helium nuclei may have entirely or partially sedimented into the cluster core within the Hubble time. We demonstrate this scenario using the Navarro-Frenk-White profile as the dark matter distribution of clusters and assuming that the intracluster gas is isothermal and in hydrostatic equilibrium. We find that a greater fraction of baryonic matter is distributed at small radii than at large radii, which challenges the prevailing claim that the baryon fraction increases monotonically with cluster radius. It shows that the conventional mass estimate using X-ray measurements of intracluster gas along with a constant mean molecular weight may have underestimated the total cluster mass by approximately 20%, which in turn leads to an overestimate of the total baryon fraction by the same percentage. Additionally, it is pointed out that the sedimentation of helium nuclei toward cluster cores may at least partially account for the sharp peaks in the central X-ray emissions observed in some clusters.
Graphical Representations and Cluster Algorithms for Ice Rule Vertex Models.

NASA Astrophysics Data System (ADS)

Shtengel, Kirill; Chayes, L.

2002-03-01

We introduce a new class of polymer models which is closely related to loop models, recently a topic of intensive studies. These particular models arise as graphical representations for ice-rule vertex models. The associated cluster algorithms provide a unification and generalisation of most of the existing algorithms. For many lattices, percolation in the polymer models evidently indicates first order phase transitions in the vertex models. Critical phases can be understood as being susceptible to colour symmetry breaking in the polymer models. The analysis includes, but is certainly not limited to the square lattice six-vertex model. In particular, analytic criteria can be found for low temperature phases in other even coordinated 2D lattices such as the triangular lattice, or higher dimensional lattices such as the hyper-cubic lattices of arbitrary dimensionality. Finally, our approach can be generalised to the vertex models that do not obey the ice rule, such as the eight-vertex model.

Commodity cluster and hardware-based massively parallel implementations of hyperspectral imaging algorithms

NASA Astrophysics Data System (ADS)

Plaza, Antonio; Chang, Chein-I.; Plaza, Javier; Valencia, David

2006-05-01

The incorporation of hyperspectral sensors aboard airborne/satellite platforms is currently producing a nearly continual stream of multidimensional image data, and this high data volume has soon introduced new processing challenges. The price paid for the wealth spatial and spectral information available from hyperspectral sensors is the enormous amounts of data that they generate. Several applications exist, however, where having the desired information calculated quickly enough for practical use is highly desirable. High computing performance of algorithm analysis is particularly important in homeland defense and security applications, in which swift decisions often involve detection of (sub-pixel) military targets (including hostile weaponry, camouflage, concealment, and decoys) or chemical/biological agents. In order to speed-up computational performance of hyperspectral imaging algorithms, this paper develops several fast parallel data processing techniques. Techniques include four classes of algorithms: (1) unsupervised classification, (2) spectral unmixing, and (3) automatic target recognition, and (4) onboard data compression. A massively parallel Beowulf cluster (Thunderhead) at NASA's Goddard Space Flight Center in Maryland is used to measure parallel performance of the proposed algorithms. In order to explore the viability of developing onboard, real-time hyperspectral data compression algorithms, a Xilinx Virtex-II field programmable gate array (FPGA) is also used in experiments. Our quantitative and comparative assessment of parallel techniques and strategies may help image analysts in selection of parallel hyperspectral algorithms for specific applications.
Data depth based clustering analysis

DOE PAGES

Jeong, Myeong -Hun; Cai, Yaping; Sullivan, Clair J.; ...

2016-01-01

Here, this paper proposes a new algorithm for identifying patterns within data, based on data depth. Such a clustering analysis has an enormous potential to discover previously unknown insights from existing data sets. Many clustering algorithms already exist for this purpose. However, most algorithms are not affine invariant. Therefore, they must operate with different parameters after the data sets are rotated, scaled, or translated. Further, most clustering algorithms, based on Euclidean distance, can be sensitive to noises because they have no global perspective. Parameter selection also significantly affects the clustering results of each algorithm. Unlike many existing clustering algorithms, themore » proposed algorithm, called data depth based clustering analysis (DBCA), is able to detect coherent clusters after the data sets are affine transformed without changing a parameter. It is also robust to noises because using data depth can measure centrality and outlyingness of the underlying data. Further, it can generate relatively stable clusters by varying the parameter. The experimental comparison with the leading state-of-the-art alternatives demonstrates that the proposed algorithm outperforms DBSCAN and HDBSCAN in terms of affine invariance, and exceeds or matches the ro-bustness to noises of DBSCAN or HDBSCAN. The robust-ness to parameter selection is also demonstrated through the case study of clustering twitter data.« less
Optimizing Cluster Heads for Energy Efficiency in Large-Scale Heterogeneous Wireless Sensor Networks

DOE PAGES

Gu, Yi; Wu, Qishi; Rao, Nageswara S. V.

2010-01-01

Many complex sensor network applications require deploying a large number of inexpensive and small sensors in a vast geographical region to achieve quality through quantity. Hierarchical clustering is generally considered as an efficient and scalable way to facilitate the management and operation of such large-scale networks and minimize the total energy consumption for prolonged lifetime. Judicious selection of cluster heads for data integration and communication is critical to the success of applications based on hierarchical sensor networks organized as layered clusters. We investigate the problem of selecting sensor nodes in a predeployed sensor network to be the cluster heads tomore » minimize the total energy needed for data gathering. We rigorously derive an analytical formula to optimize the number of cluster heads in sensor networks under uniform node distribution, and propose a Distance-based Crowdedness Clustering algorithm to determine the cluster heads in sensor networks under general node distribution. The results from an extensive set of experiments on a large number of simulated sensor networks illustrate the performance superiority of the proposed solution over the clustering schemes based on k -means algorithm.« less
The kinematics of dense clusters of galaxies. II - The distribution of velocity dispersions

NASA Technical Reports Server (NTRS)

Zabludoff, Ann I.; Geller, Margaret J.; Huchra, John P.; Ramella, Massimo

1993-01-01

From the survey of 31 Abell R above 1 cluster fields within z of 0.02-0.05, we extract 25 dense clusters with velocity dispersions omicron above 300 km/s and with number densities exceeding the mean for the Great Wall of galaxies by one deviation. From the CfA Redshift Survey (in preparation), we obtain an approximately volume-limited catalog of 31 groups with velocity dispersions above 100 km/s and with the same number density limit. We combine these well-defined samples to obtain the distribution of cluster velocity dispersions. The group sample enables us to correct for incompleteness in the Abell catalog at low velocity dispersions. The clusters from the Abell cluster fields populate the high dispersion tail. For systems with velocity dispersions above 700 km/s, approximately the median for R = 1 clusters, the group and cluster abundances are consistent. The combined distribution is consistent with cluster X-ray temperature functions.
[Cluster analysis in biomedical researches].

PubMed

Akopov, A S; Moskovtsev, A A; Dolenko, S A; Savina, G D

2013-01-01

Cluster analysis is one of the most popular methods for the analysis of multi-parameter data. The cluster analysis reveals the internal structure of the data, group the separate observations on the degree of their similarity. The review provides a definition of the basic concepts of cluster analysis, and discusses the most popular clustering algorithms: k-means, hierarchical algorithms, Kohonen networks algorithms. Examples are the use of these algorithms in biomedical research.
VizieR Online Data Catalog: Proper motions of PM2000 open clusters (Krone-Martins+, 2010)

NASA Astrophysics Data System (ADS)

Krone-Martins, A.; Soubiran, C.; Ducourant, C.; Teixeira, R.; Le Campion, J. F.

2010-04-01

We present lists of proper-motions and kinematic membership probabilities in the region of 49 open clusters or possible open clusters. The stellar proper motions were taken from the Bordeaux PM2000 catalogue. The segregation between cluster and field stars and the assignment of membership probabilities was accomplished by applying a fully automated method based on parametrisations for the probability distribution functions and genetic algorithm optimisation heuristics associated with a derivative-based hill climbing algorithm for the likelihood optimization. (3 data files).
Load flow and state estimation algorithms for three-phase unbalanced power distribution systems

NASA Astrophysics Data System (ADS)

Madvesh, Chiranjeevi

Distribution load flow and state estimation are two important functions in distribution energy management systems (DEMS) and advanced distribution automation (ADA) systems. Distribution load flow analysis is a tool which helps to analyze the status of a power distribution system under steady-state operating conditions. In this research, an effective and comprehensive load flow algorithm is developed to extensively incorporate the distribution system components. Distribution system state estimation is a mathematical procedure which aims to estimate the operating states of a power distribution system by utilizing the information collected from available measurement devices in real-time. An efficient and computationally effective state estimation algorithm adapting the weighted-least-squares (WLS) method has been developed in this research. Both the developed algorithms are tested on different IEEE test-feeders and the results obtained are justified.
Constraints on the richness-mass relation and the optical-SZE positional offset distribution for SZE-selected clusters

DOE PAGES

Saro, A.

2015-10-12

In this study, we cross-match galaxy cluster candidates selected via their Sunyaev–Zel'dovich effect (SZE) signatures in 129.1 deg 2 of the South Pole Telescope 2500d SPT-SZ survey with optically identified clusters selected from the Dark Energy Survey science verification data. We identify 25 clusters between 0.1 ≲ z ≲ 0.8 in the union of the SPT-SZ and redMaPPer (RM) samples. RM is an optical cluster finding algorithm that also returns a richness estimate for each cluster. We model the richness λ-mass relation with the following function 500> ∝ B λlnM 500 + C λlnE(z) and use SPT-SZ cluster masses andmore » RM richnesses λ to constrain the parameters. We find B λ = 1.14 +0.21 –0.18 and C λ = 0.73 +0.77 –0.75. The associated scatter in mass at fixed richness is σ lnM|λ = 0.18 +0.08 –0.05 at a characteristic richness λ = 70. We demonstrate that our model provides an adequate description of the matched sample, showing that the fraction of SPT-SZ-selected clusters with RM counterparts is consistent with expectations and that the fraction of RM-selected clusters with SPT-SZ counterparts is in mild tension with expectation. We model the optical-SZE cluster positional offset distribution with the sum of two Gaussians, showing that it is consistent with a dominant, centrally peaked population and a subdominant population characterized by larger offsets. We also cross-match the RM catalogue with SPT-SZ candidates below the official catalogue threshold significance ξ = 4.5, using the RM catalogue to provide optical confirmation and redshifts for 15 additional clusters with ξ ϵ [4, 4.5].« less
Distributed Storage Algorithm for Geospatial Image Data Based on Data Access Patterns.

PubMed

Pan, Shaoming; Li, Yongkai; Xu, Zhengquan; Chong, Yanwen

2015-01-01

Declustering techniques are widely used in distributed environments to reduce query response time through parallel I/O by splitting large files into several small blocks and then distributing those blocks among multiple storage nodes. Unfortunately, however, many small geospatial image data files cannot be further split for distributed storage. In this paper, we propose a complete theoretical system for the distributed storage of small geospatial image data files based on mining the access patterns of geospatial image data using their historical access log information. First, an algorithm is developed to construct an access correlation matrix based on the analysis of the log information, which reveals the patterns of access to the geospatial image data. Then, a practical heuristic algorithm is developed to determine a reasonable solution based on the access correlation matrix. Finally, a number of comparative experiments are presented, demonstrating that our algorithm displays a higher total parallel access probability than those of other algorithms by approximately 10-15% and that the performance can be further improved by more than 20% by simultaneously applying a copy storage strategy. These experiments show that the algorithm can be applied in distributed environments to help realize parallel I/O and thereby improve system performance.
Gravitational lensing by clusters of galaxies - Constraining the mass distribution

NASA Technical Reports Server (NTRS)

Miralda-Escude, Jordi

1991-01-01

The possibility of placing constraints on the mass distribution of a cluster of galaxies by analyzing the cluster's gravitational lensing effect on the images of more distant galaxies is investigated theoretically in the limit of weak distortion. The steps in the proposed analysis are examined in detail, and it is concluded that detectable distortion can be produced by clusters with line-of-sight velocity dispersions of over 500 km/sec. Hence it should be possible to determine (1) the cluster center position (with accuracy equal to the mean separation of the background galaxies), (2) the cluster-potential quadrupole moment (to within about 20 percent of the total potential if velocity dispersion is 1000 km/sec), and (3) the power law for the outer-cluster density profile (if enough background galaxies in the surrounding region are observed).
Improving permafrost distribution modelling using feature selection algorithms

NASA Astrophysics Data System (ADS)

Deluigi, Nicola; Lambiel, Christophe; Kanevski, Mikhail

2016-04-01

The availability of an increasing number of spatial data on the occurrence of mountain permafrost allows the employment of machine learning (ML) classification algorithms for modelling the distribution of the phenomenon. One of the major problems when dealing with high-dimensional dataset is the number of input features (variables) involved. Application of ML classification algorithms to this large number of variables leads to the risk of overfitting, with the consequence of a poor generalization/prediction. For this reason, applying feature selection (FS) techniques helps simplifying the amount of factors required and improves the knowledge on adopted features and their relation with the studied phenomenon. Moreover, taking away irrelevant or redundant variables from the dataset effectively improves the quality of the ML prediction. This research deals with a comparative analysis of permafrost distribution models supported by FS variable importance assessment. The input dataset (dimension = 20-25, 10 m spatial resolution) was constructed using landcover maps, climate data and DEM derived variables (altitude, aspect, slope, terrain curvature, solar radiation, etc.). It was completed with permafrost evidences (geophysical and thermal data and rock glacier inventories) that serve as training permafrost data. Used FS algorithms informed about variables that appeared less statistically important for permafrost presence/absence. Three different algorithms were compared: Information Gain (IG), Correlation-based Feature Selection (CFS) and Random Forest (RF). IG is a filter technique that evaluates the worth of a predictor by measuring the information gain with respect to the permafrost presence/absence. Conversely, CFS is a wrapper technique that evaluates the worth of a subset of predictors by considering the individual predictive ability of each variable along with the degree of redundancy between them. Finally, RF is a ML algorithm that performs FS as part of its
A Modified Distributed Bees Algorithm for Multi-Sensor Task Allocation †

PubMed Central

Nof, Shimon Y.; Edan, Yael

2018-01-01

Multi-sensor systems can play an important role in monitoring tasks and detecting targets. However, real-time allocation of heterogeneous sensors to dynamic targets/tasks that are unknown a priori in their locations and priorities is a challenge. This paper presents a Modified Distributed Bees Algorithm (MDBA) that is developed to allocate stationary heterogeneous sensors to upcoming unknown tasks using a decentralized, swarm intelligence approach to minimize the task detection times. Sensors are allocated to tasks based on sensors’ performance, tasks’ priorities, and the distances of the sensors from the locations where the tasks are being executed. The algorithm was compared to a Distributed Bees Algorithm (DBA), a Bees System, and two common multi-sensor algorithms, market-based and greedy-based algorithms, which were fitted for the specific task. Simulation analyses revealed that MDBA achieved statistically significant improved performance by 7% with respect to DBA as the second-best algorithm, and by 19% with respect to Greedy algorithm, which was the worst, thus indicating its fitness to provide solutions for heterogeneous multi-sensor systems. PMID:29498683
Polynomial-Time Approximation Algorithm for the Problem of Cardinality-Weighted Variance-Based 2-Clustering with a Given Center

NASA Astrophysics Data System (ADS)

Kel'manov, A. V.; Motkova, A. V.

2018-01-01

A strongly NP-hard problem of partitioning a finite set of points of Euclidean space into two clusters is considered. The solution criterion is the minimum of the sum (over both clusters) of weighted sums of squared distances from the elements of each cluster to its geometric center. The weights of the sums are equal to the cardinalities of the desired clusters. The center of one cluster is given as input, while the center of the other is unknown and is determined as the point of space equal to the mean of the cluster elements. A version of the problem is analyzed in which the cardinalities of the clusters are given as input. A polynomial-time 2-approximation algorithm for solving the problem is constructed.
Distribution of sea anemones (Cnidaria, Actiniaria) in Korea analyzed by environmental clustering

USGS Publications Warehouse

Cha, H.-R.; Buddemeier, R.W.; Fautin, D.G.; Sandhei, P.

2004-01-01

Using environmental data and the geospatial clustering tools LOICZView and DISCO, we empirically tested the postulated existence and boundaries of four biogeographic regions in the southern part of the Korean peninsula. Environmental variables used included wind speed, sea surface temperature (SST), salinity, tidal amplitude, and the chlorophyll spectral signal. Our analysis confirmed the existence of four biogeographic regions, but the details of the borders between them differ from those previously postulated. Specimen-level distribution records of intertidal sea anemones were mapped; their distribution relative to the environmental data supported the importance of the environmental parameters we selected in defining suitable habitats. From the geographic coincidence between anemone distribution and the clusters based on environmental variables, we infer that geospatial clustering has the power to delimit ranges for marine organisms within relatively small geographical areas.
Quantum cluster variational method and message passing algorithms revisited

NASA Astrophysics Data System (ADS)

Domínguez, E.; Mulet, Roberto

2018-02-01

We present a general framework to study quantum disordered systems in the context of the Kikuchi's cluster variational method (CVM). The method relies in the solution of message passing-like equations for single instances or in the iterative solution of complex population dynamic algorithms for an average case scenario. We first show how a standard application of the Kikuchi's CVM can be easily translated to message passing equations for specific instances of the disordered system. We then present an "ad hoc" extension of these equations to a population dynamic algorithm representing an average case scenario. At the Bethe level, these equations are equivalent to the dynamic population equations that can be derived from a proper cavity ansatz. However, at the plaquette approximation, the interpretation is more subtle and we discuss it taking also into account previous results in classical disordered models. Moreover, we develop a formalism to properly deal with the average case scenario using a replica-symmetric ansatz within this CVM for quantum disordered systems. Finally, we present and discuss numerical solutions of the different approximations for the quantum transverse Ising model and the quantum random field Ising model in two-dimensional lattices.
Clustering Module in OLAP for Horticultural Crops using SpagoBI

NASA Astrophysics Data System (ADS)

Putri, D.; Sitanggang, I. S.

2017-03-01

Horticultural crops data are organized by the Ministry of Agriculture, Republic of Indonesia. The data are presented annually in a tabular form and result a large data set. This situation makes users difficult to obtain summaries of horticultural crops data. This study aims to develop a clustering module in the SOLAP system for the distribution of horticultural crops in Indonesia and to visualize the results of clustering in a map using SpagoBI. The algorithm used for clustering is K-Means. Horticultural crops data include vegetables, ornamental plants, medicinal plants, and fruits from 2000 to 2013. The clustering module displays clustering results of horticultural crops in the form of text and table on SpagoBI. This module can also visualize the distribution of horticultural crops in the form of map on the HTML page. The application is expected to be useful for users in order to easily obtain summaries of the horticultural crops distribution data and its clusters. The summaries and clusters can be beneficial for the stakeholders to determine potential areas in Indonesia for horticultural crops.
Development of a Genetic Algorithm to Automate Clustering of a Dependency Structure Matrix

NASA Technical Reports Server (NTRS)

Rogers, James L.; Korte, John J.; Bilardo, Vincent J.

2006-01-01

Much technology assessment and organization design data exists in Microsoft Excel spreadsheets. Tools are needed to put this data into a form that can be used by design managers to make design decisions. One need is to cluster data that is highly coupled. Tools such as the Dependency Structure Matrix (DSM) and a Genetic Algorithm (GA) can be of great benefit. However, no tool currently combines the DSM and a GA to solve the clustering problem. This paper describes a new software tool that interfaces a GA written as an Excel macro with a DSM in spreadsheet format. The results of several test cases are included to demonstrate how well this new tool works.
Minimal spanning tree algorithm for γ-ray source detection in sparse photon images: cluster parameters and selection strategies

DOE PAGES

Campana, R.; Bernieri, E.; Massaro, E.; ...

2013-05-22

We present that the minimal spanning tree (MST) algorithm is a graph-theoretical cluster-finding method. We previously applied it to γ-ray bidimensional images, showing that it is quite sensitive in finding faint sources. Possible sources are associated with the regions where the photon arrival directions clusterize. MST selects clusters starting from a particular “tree” connecting all the point of the image and performing a cut based on the angular distance between photons, with a number of events higher than a given threshold. In this paper, we show how a further filtering, based on some parameters linked to the cluster properties, canmore » be applied to reduce spurious detections. We find that the most efficient parameter for this secondary selection is the magnitudeM of a cluster, defined as the product of its number of events by its clustering degree. We test the sensitivity of the method by means of simulated and real Fermi-Large Area Telescope (LAT) fields. Our results show that √M is strongly correlated with other statistical significance parameters, derived from a wavelet based algorithm and maximum likelihood (ML) analysis, and that it can be used as a good estimator of statistical significance of MST detections. Finally, we apply the method to a 2-year LAT image at energies higher than 3 GeV, and we show the presence of new clusters, likely associated with BL Lac objects.« less
Effects of Combined Stellar Feedback on Star Formation in Stellar Clusters

NASA Astrophysics Data System (ADS)

Wall, Joshua Edward; McMillan, Stephen; Pellegrino, Andrew; Mac Low, Mordecai; Klessen, Ralf; Portegies Zwart, Simon

2018-01-01

We present results of hybrid MHD+N-body simulations of star cluster formation and evolution including self consistent feedback from the stars in the form of radiation, winds, and supernovae from all stars more massive than 7 solar masses. The MHD is modeled with the adaptive mesh refinement code FLASH, while the N-body computations are done with a direct algorithm. Radiation is modeled using ray tracing along long characteristics in directions distributed using the HEALPIX algorithm, and causes ionization and momentum deposition, while winds and supernova conserve momentum and energy during injection. Stellar evolution is followed using power-law fits to evolution models in SeBa. We use a gravity bridge within the AMUSE framework to couple the N-body dynamics of the stars to the gas dynamics in FLASH. Feedback from the massive stars alters the structure of young clusters as gas ejection occurs. We diagnose this behavior by distinguishing between fractal distribution and central clustering using a Q parameter computed from the minimum spanning tree of each model cluster. Global effects of feedback in our simulations will also be discussed.
Cooperative network clustering and task allocation for heterogeneous small satellite network

NASA Astrophysics Data System (ADS)

Qin, Jing

The research of small satellite has emerged as a hot topic in recent years because of its economical prospects and convenience in launching and design. Due to the size and energy constraints of small satellites, forming a small satellite network(SSN) in which all the satellites cooperate with each other to finish tasks is an efficient and effective way to utilize them. In this dissertation, I designed and evaluated a weight based dominating set clustering algorithm, which efficiently organizes the satellites into stable clusters. The traditional clustering algorithms of large monolithic satellite networks, such as formation flying and satellite swarm, are often limited on automatic formation of clusters. Therefore, a novel Distributed Weight based Dominating Set(DWDS) clustering algorithm is designed to address the clustering problems in the stochastically deployed SSNs. Considering the unique features of small satellites, this algorithm is able to form the clusters efficiently and stably. In this algorithm, satellites are separated into different groups according to their spatial characteristics. A minimum dominating set is chosen as the candidate cluster head set based on their weights, which is a weighted combination of residual energy and connection degree. Then the cluster heads admit new neighbors that accept their invitations into the cluster, until the maximum cluster size is reached. Evaluated by the simulation results, in a SSN with 200 to 800 nodes, the algorithm is able to efficiently cluster more than 90% of nodes in 3 seconds. The Deadline Based Resource Balancing (DBRB) task allocation algorithm is designed for efficient task allocations in heterogeneous LEO small satellite networks. In the task allocation process, the dispatcher needs to consider the deadlines of the tasks as well as the residue energy of different resources for best energy utilization. We assume the tasks adopt a Map-Reduce framework, in which a task can consist of multiple

The Hierarchical Distribution of the Young Stellar Clusters in Six Local Star-forming Galaxies

NASA Astrophysics Data System (ADS)

Grasha, K.; Calzetti, D.; Adamo, A.; Kim, H.; Elmegreen, B. G.; Gouliermis, D. A.; Dale, D. A.; Fumagalli, M.; Grebel, E. K.; Johnson, K. E.; Kahre, L.; Kennicutt, R. C.; Messa, M.; Pellerin, A.; Ryon, J. E.; Smith, L. J.; Shabani, F.; Thilker, D.; Ubeda, L.

2017-05-01

We present a study of the hierarchical clustering of the young stellar clusters in six local (3-15 Mpc) star-forming galaxies using Hubble Space Telescope broadband WFC3/UVIS UV and optical images from the Treasury Program LEGUS (Legacy ExtraGalactic UV Survey). We identified 3685 likely clusters and associations, each visually classified by their morphology, and we use the angular two-point correlation function to study the clustering of these stellar systems. We find that the spatial distribution of the young clusters and associations are clustered with respect to each other, forming large, unbound hierarchical star-forming complexes that are in general very young. The strength of the clustering decreases with increasing age of the star clusters and stellar associations, becoming more homogeneously distributed after ˜40-60 Myr and on scales larger than a few hundred parsecs. In all galaxies, the associations exhibit a global behavior that is distinct and more strongly correlated from compact clusters. Thus, populations of clusters are more evolved than associations in terms of their spatial distribution, traveling significantly from their birth site within a few tens of Myr, whereas associations show evidence of disruption occurring very quickly after their formation. The clustering of the stellar systems resembles that of a turbulent interstellar medium that drives the star formation process, correlating the components in unbound star-forming complexes in a hierarchical manner, dispersing shortly after formation, suggestive of a single, continuous mode of star formation across all galaxies.
Forecasting Jakarta composite index (IHSG) based on chen fuzzy time series and firefly clustering algorithm

NASA Astrophysics Data System (ADS)

Ningrum, R. W.; Surarso, B.; Farikhin; Safarudin, Y. M.

2018-03-01

This paper proposes the combination of Firefly Algorithm (FA) and Chen Fuzzy Time Series Forecasting. Most of the existing fuzzy forecasting methods based on fuzzy time series use the static length of intervals. Therefore, we apply an artificial intelligence, i.e., Firefly Algorithm (FA) to set non-stationary length of intervals for each cluster on Chen Method. The method is evaluated by applying on the Jakarta Composite Index (IHSG) and compare with classical Chen Fuzzy Time Series Forecasting. Its performance verified through simulation using Matlab.
A gossip based information fusion protocol for distributed frequent itemset mining

NASA Astrophysics Data System (ADS)

Sohrabi, Mohammad Karim

2018-07-01

The computational complexity, huge memory space requirement, and time-consuming nature of frequent pattern mining process are the most important motivations for distribution and parallelization of this mining process. On the other hand, the emergence of distributed computational and operational environments, which causes the production and maintenance of data on different distributed data sources, makes the parallelization and distribution of the knowledge discovery process inevitable. In this paper, a gossip based distributed itemset mining (GDIM) algorithm is proposed to extract frequent itemsets, which are special types of frequent patterns, in a wireless sensor network environment. In this algorithm, local frequent itemsets of each sensor are extracted using a bit-wise horizontal approach (LHPM) from the nodes which are clustered using a leach-based protocol. Heads of clusters exploit a gossip based protocol in order to communicate each other to find the patterns which their global support is equal to or more than the specified support threshold. Experimental results show that the proposed algorithm outperforms the best existing gossip based algorithm in term of execution time.
Clustering XML Documents Using Frequent Subtrees

NASA Astrophysics Data System (ADS)

Kutty, Sangeetha; Tran, Tien; Nayak, Richi; Li, Yuefeng

This paper presents an experimental study conducted over the INEX 2008 Document Mining Challenge corpus using both the structure and the content of XML documents for clustering them. The concise common substructures known as the closed frequent subtrees are generated using the structural information of the XML documents. The closed frequent subtrees are then used to extract the constrained content from the documents. A matrix containing the term distribution of the documents in the dataset is developed using the extracted constrained content. The k-way clustering algorithm is applied to the matrix to obtain the required clusters. In spite of the large number of documents in the INEX 2008 Wikipedia dataset, the proposed frequent subtree-based clustering approach was successful in clustering the documents. This approach significantly reduces the dimensionality of the terms used for clustering without much loss in accuracy.
Internal velocity and mass distributions in simulated clusters of galaxies for a variety of cosmogonic models

NASA Technical Reports Server (NTRS)

Cen, Renyue

1994-01-01

The mass and velocity distributions in the outskirts (0.5-3.0/h Mpc) of simulated clusters of galaxies are examined for a suite of cosmogonic models (two Omega(sub 0) = 1 and two Omega(sub 0) = 0.2 models) utilizing large-scale particle-mesh (PM) simulations. Through a series of model computations, designed to isolate the different effects, we find that both Omega(sub 0) and P(sub k) (lambda less than or = 16/h Mpc) are important to the mass distributions in clusters of galaxies. There is a correlation between power, P(sub k), and density profiles of massive clusters; more power tends to point to the direction of a stronger correlation between alpha and M(r less than 1.5/h Mpc); i.e., massive clusters being relatively extended and small mass clusters being relatively concentrated. A lower Omega(sub 0) universe tends to produce relatively concentrated massive clusters and relatively extended small mass clusters compared to their counterparts in a higher Omega(sub 0) model with the same power. Models with little (initial) small-scale power, such as the hot dark matter (HDM) model, produce more extended mass distributions than the isothermal distribution for most of the mass clusters. But the cold dark matter (CDM) models show mass distributions of most of the clusters more concentrated than the isothermal distribution. X-ray and gravitational lensing observations are beginning providing useful information on the mass distribution in and around clusters; some interesting constraints on Omega(sub 0) and/or the (initial) power of the density fluctuations on scales lambda less than or = 16/h Mpc (where linear extrapolation is invalid) can be obtained when larger observational data sets, such as the Sloan Digital Sky Survey, become available.
Collaborative Clustering for Sensor Networks

NASA Technical Reports Server (NTRS)

Wagstaff. Loro :/; Green Jillian; Lane, Terran

2011-01-01

Traditionally, nodes in a sensor network simply collect data and then pass it on to a centralized node that archives, distributes, and possibly analyzes the data. However, analysis at the individual nodes could enable faster detection of anomalies or other interesting events, as well as faster responses such as sending out alerts or increasing the data collection rate. There is an additional opportunity for increased performance if individual nodes can communicate directly with their neighbors. Previously, a method was developed by which machine learning classification algorithms could collaborate to achieve high performance autonomously (without requiring human intervention). This method worked for supervised learning algorithms, in which labeled data is used to train models. The learners collaborated by exchanging labels describing the data. The new advance enables clustering algorithms, which do not use labeled data, to also collaborate. This is achieved by defining a new language for collaboration that uses pair-wise constraints to encode useful information for other learners. These constraints specify that two items must, or cannot, be placed into the same cluster. Previous work has shown that clustering with these constraints (in isolation) already improves performance. In the problem formulation, each learner resides at a different node in the sensor network and makes observations (collects data) independently of the other learners. Each learner clusters its data and then selects a pair of items about which it is uncertain and uses them to query its neighbors. The resulting feedback (a must and cannot constraint from each neighbor) is combined by the learner into a consensus constraint, and it then reclusters its data while incorporating the new constraint. A strategy was also proposed for cleaning the resulting constraint sets, which may contain conflicting constraints; this improves performance significantly. This approach has been applied to collaborative
Illinois Occupational Skill Standards: Finishing and Distribution Cluster.

ERIC Educational Resources Information Center

Illinois Occupational Skill Standards and Credentialing Council, Carbondale.

This document, which is intended as a guide for work force preparation program providers, details the Illinois occupational skill standards for programs preparing students for employment in occupations in the finishing and distribution cluster. The document begins with a brief overview of the Illinois perspective on occupational skill standards…
Distributions of Gas and Galaxies from Galaxy Clusters to Larger Scales

NASA Astrophysics Data System (ADS)

Patej, Anna

2017-01-01

We address the distributions of gas and galaxies on three scales: the outskirts of galaxy clusters, the clustering of galaxies on large scales, and the extremes of the galaxy distribution. In the outskirts of galaxy clusters, long-standing analytical models of structure formation and recent simulations predict the existence of density jumps in the gas and dark matter profiles. We use these features to derive models for the gas density profile, obtaining a simple fiducial model that is in agreement with both observations of cluster interiors and simulations of the outskirts. We next consider the galaxy density profiles of clusters; under the assumption that the galaxies in cluster outskirts follow similar collisionless dynamics as the dark matter, their distribution should show a steep jump as well. We examine the profiles of a low-redshift sample of clusters and groups, finding evidence for the jump in some of these clusters. Moving to larger scales where massive galaxies of different types are expected to trace the same large-scale structure, we present a test of this prediction by measuring the clustering of red and blue galaxies at z 0.6, finding low stochasticity between the two populations. These results address a key source of systematic uncertainty - understanding how target populations of galaxies trace large-scale structure - in galaxy redshift surveys. Such surveys use baryon acoustic oscillations (BAO) as a cosmological probe, but are limited by the expense of obtaining sufficiently dense spectroscopy. With the intention of leveraging upcoming deep imaging data, we develop a new method of detecting the BAO in sparse spectroscopic samples via cross-correlation with a dense photometric catalog. This method will permit the extension of BAO measurements to higher redshifts than possible with the existing spectroscopy alone. Lastly, we connect galaxies near and far: the Local Group dwarfs and the high redshift galaxies observed by Hubble and Spitzer. We
The Gas Distribution in the Outer Regions of Galaxy Clusters

NASA Technical Reports Server (NTRS)

Eckert, D.; Vazza, F.; Ettori, S.; Molendi, S.; Nagai, D.; Lau, E. T.; Roncarelli, M.; Rossetti, M.; Snowden, L.; Gastaldello, F.

2012-01-01

Aims. We present our analysis of a local (z = 0.04 - 0.2) sample of 31 galaxy clusters with the aim of measuring the density of the X-ray emitting gas in cluster outskirts. We compare our results with numerical simulations to set constraints on the azimuthal symmetry and gas clumping in the outer regions of galaxy clusters. Methods. We have exploited the large field-of-view and low instrumental background of ROSAT/PSPC to trace the density of the intracluster gas out to the virial radius, We stacked the density profiles to detect a signal beyond T200 and measured the typical density and scatter in cluster outskirts. We also computed the azimuthal scatter of the profiles with respect to the mean value to look for deviations from spherical symmetry. Finally, we compared our average density and scatter profiles with the results of numerical simulations. Results. As opposed to some recent Suzaku results, and confirming previous evidence from ROSAT and Chandra, we observe a steepening of the density profiles beyond approximately r(sub 500). Comparing our density profiles with simulations, we find that non-radiative runs predict density profiles that are too steep, whereas runs including additional physics and/ or treating gas clumping agree better with the observed gas distribution. We report high-confidence detection of a systematic difference between cool-core and non cool-core clusters beyond approximately 0.3r(sub 200), which we explain by a different distribution of the gas in the two classes. Beyond approximately r(sub 500), galaxy clusters deviate significantly from spherical symmetry, with only small differences between relaxed and disturbed systems. We find good agreement between the observed and predicted scatter profiles, but only when the 1% densest clumps are filtered out in the ENZO simulations. Conclusions. Comparing our results with numerical simulations, we find that non-radiative simulations fail to reproduce the gas distribution, even well outside
The Spatial Distribution of the Young Stellar Clusters in the Star-forming Galaxy NGC 628

NASA Astrophysics Data System (ADS)

Grasha, K.; Calzetti, D.; Adamo, A.; Kim, H.; Elmegreen, B. G.; Gouliermis, D. A.; Aloisi, A.; Bright, S. N.; Christian, C.; Cignoni, M.; Dale, D. A.; Dobbs, C.; Elmegreen, D. M.; Fumagalli, M.; Gallagher, J. S., III; Grebel, E. K.; Johnson, K. E.; Lee, J. C.; Messa, M.; Smith, L. J.; Ryon, J. E.; Thilker, D.; Ubeda, L.; Wofford, A.

2015-12-01

We present a study of the spatial distribution of the stellar cluster populations in the star-forming galaxy NGC 628. Using Hubble Space Telescope broadband WFC3/UVIS UV and optical images from the Treasury Program LEGUS (Legacy ExtraGalactic UV Survey), we have identified 1392 potential young (≲ 100 Myr) stellar clusters within the galaxy using a combination of visual inspection and automatic selection. We investigate the clustering of these young stellar clusters and quantify the strength and change of clustering strength with scale using the two-point correlation function. We also investigate how image boundary conditions and dust lanes affect the observed clustering. The distribution of the clusters is well fit by a broken power law with negative exponent α. We recover a weighted mean index of α ∼ -0.8 for all spatial scales below the break at 3.″3 (158 pc at a distance of 9.9 Mpc) and an index of α ∼ -0.18 above 158 pc for the accumulation of all cluster types. The strength of the clustering increases with decreasing age and clusters older than 40 Myr lose their clustered structure very rapidly and tend to be randomly distributed in this galaxy, whereas the mass of the star cluster has little effect on the clustering strength. This is consistent with results from other studies that the morphological hierarchy in stellar clustering resembles the same hierarchy as the turbulent interstellar medium.
Efficient clustering aggregation based on data fragments.

PubMed

Wu, Ou; Hu, Weiming; Maybank, Stephen J; Zhu, Mingliang; Li, Bing

2012-06-01

Clustering aggregation, known as clustering ensembles, has emerged as a powerful technique for combining different clustering results to obtain a single better clustering. Existing clustering aggregation algorithms are applied directly to data points, in what is referred to as the point-based approach. The algorithms are inefficient if the number of data points is large. We define an efficient approach for clustering aggregation based on data fragments. In this fragment-based approach, a data fragment is any subset of the data that is not split by any of the clustering results. To establish the theoretical bases of the proposed approach, we prove that clustering aggregation can be performed directly on data fragments under two widely used goodness measures for clustering aggregation taken from the literature. Three new clustering aggregation algorithms are described. The experimental results obtained using several public data sets show that the new algorithms have lower computational complexity than three well-known existing point-based clustering aggregation algorithms (Agglomerative, Furthest, and LocalSearch); nevertheless, the new algorithms do not sacrifice the accuracy.
Nuclear Potential Clustering As a New Tool to Detect Patterns in High Dimensional Datasets

NASA Astrophysics Data System (ADS)

Tonkova, V.; Paulus, D.; Neeb, H.

2013-02-01

We present a new approach for the clustering of high dimensional data without prior assumptions about the structure of the underlying distribution. The proposed algorithm is based on a concept adapted from nuclear physics. To partition the data, we model the dynamic behaviour of nucleons interacting in an N-dimensional space. An adaptive nuclear potential, comprised of a short-range attractive (strong interaction) and a long-range repulsive term (Coulomb force) is assigned to each data point. By modelling the dynamics, nucleons that are densely distributed in space fuse to build nuclei (clusters) whereas single point clusters repel each other. The formation of clusters is completed when the system reaches the state of minimal potential energy. The data are then grouped according to the particles' final effective potential energy level. The performance of the algorithm is tested with several synthetic datasets showing that the proposed method can robustly identify clusters even when complex configurations are present. Furthermore, quantitative MRI data from 43 multiple sclerosis patients were analyzed, showing a reasonable splitting into subgroups according to the individual patients' disease grade. The good performance of the algorithm on such highly correlated non-spherical datasets, which are typical for MRI derived image features, shows that Nuclear Potential Clustering is a valuable tool for automated data analysis, not only in the MRI domain.
The Detection and Statistics of Giant Arcs behind CLASH Clusters

NASA Astrophysics Data System (ADS)

Xu, Bingxiao; Postman, Marc; Meneghetti, Massimo; Seitz, Stella; Zitrin, Adi; Merten, Julian; Maoz, Dani; Frye, Brenda; Umetsu, Keiichi; Zheng, Wei; Bradley, Larry; Vega, Jesus; Koekemoer, Anton

2016-02-01

We developed an algorithm to find and characterize gravitationally lensed galaxies (arcs) to perform a comparison of the observed and simulated arc abundance. Observations are from the Cluster Lensing And Supernova survey with Hubble (CLASH). Simulated CLASH images are created using the MOKA package and also clusters selected from the high-resolution, hydrodynamical simulations, MUSIC, over the same mass and redshift range as the CLASH sample. The algorithm's arc elongation accuracy, completeness, and false positive rate are determined and used to compute an estimate of the true arc abundance. We derive a lensing efficiency of 4 ± 1 arcs (with length ≥6″ and length-to-width ratio ≥7) per cluster for the X-ray-selected CLASH sample, 4 ± 1 arcs per cluster for the MOKA-simulated sample, and 3 ± 1 arcs per cluster for the MUSIC-simulated sample. The observed and simulated arc statistics are in full agreement. We measure the photometric redshifts of all detected arcs and find a median redshift zs = 1.9 with 33% of the detected arcs having zs > 3. We find that the arc abundance does not depend strongly on the source redshift distribution but is sensitive to the mass distribution of the dark matter halos (e.g., the c-M relation). Our results show that consistency between the observed and simulated distributions of lensed arc sizes and axial ratios can be achieved by using cluster-lensing simulations that are carefully matched to the selection criteria used in the observations.
A new clustering strategy

NASA Astrophysics Data System (ADS)

Feng, Jian-xin; Tang, Jia-fu; Wang, Guang-xing

2007-04-01

On the basis of the analysis of clustering algorithm that had been proposed for MANET, a novel clustering strategy was proposed in this paper. With the trust defined by statistical hypothesis in probability theory and the cluster head selected by node trust and node mobility, this strategy can realize the function of the malicious nodes detection which was neglected by other clustering algorithms and overcome the deficiency of being incapable of implementing the relative mobility metric of corresponding nodes in the MOBIC algorithm caused by the fact that the receiving power of two consecutive HELLO packet cannot be measured. It's an effective solution to cluster MANET securely.
New algorithm and system for measuring size distribution of blood cells

NASA Astrophysics Data System (ADS)

Yao, Cuiping; Li, Zheng; Zhang, Zhenxi

2004-06-01

In optical scattering particle sizing, a numerical transform is sought so that a particle size distribution can be determined from angular measurements of near forward scattering, which has been adopted in the measurement of blood cells. In this paper a new method of counting and classification of blood cell, laser light scattering method from stationary suspensions, is presented. The genetic algorithm combined with nonnegative least squared algorithm is employed to inverse the size distribution of blood cells. Numerical tests show that these techniques can be successfully applied to measuring size distribution of blood cell with high stability.
The Hierarchical Distribution of the Young Stellar Clusters in Six Local Star-forming Galaxies

DOE Office of Scientific and Technical Information (OSTI.GOV)

Grasha, K.; Calzetti, D.; Adamo, A.

We present a study of the hierarchical clustering of the young stellar clusters in six local (3–15 Mpc) star-forming galaxies using Hubble Space Telescope broadband WFC3/UVIS UV and optical images from the Treasury Program LEGUS (Legacy ExtraGalactic UV Survey). We identified 3685 likely clusters and associations, each visually classified by their morphology, and we use the angular two-point correlation function to study the clustering of these stellar systems. We find that the spatial distribution of the young clusters and associations are clustered with respect to each other, forming large, unbound hierarchical star-forming complexes that are in general very young. Themore » strength of the clustering decreases with increasing age of the star clusters and stellar associations, becoming more homogeneously distributed after ∼40–60 Myr and on scales larger than a few hundred parsecs. In all galaxies, the associations exhibit a global behavior that is distinct and more strongly correlated from compact clusters. Thus, populations of clusters are more evolved than associations in terms of their spatial distribution, traveling significantly from their birth site within a few tens of Myr, whereas associations show evidence of disruption occurring very quickly after their formation. The clustering of the stellar systems resembles that of a turbulent interstellar medium that drives the star formation process, correlating the components in unbound star-forming complexes in a hierarchical manner, dispersing shortly after formation, suggestive of a single, continuous mode of star formation across all galaxies.« less
A review of estimation of distribution algorithms in bioinformatics

PubMed Central

Armañanzas, Rubén; Inza, Iñaki; Santana, Roberto; Saeys, Yvan; Flores, Jose Luis; Lozano, Jose Antonio; Peer, Yves Van de; Blanco, Rosa; Robles, Víctor; Bielza, Concha; Larrañaga, Pedro

2008-01-01

Evolutionary search algorithms have become an essential asset in the algorithmic toolbox for solving high-dimensional optimization problems in across a broad range of bioinformatics problems. Genetic algorithms, the most well-known and representative evolutionary search technique, have been the subject of the major part of such applications. Estimation of distribution algorithms (EDAs) offer a novel evolutionary paradigm that constitutes a natural and attractive alternative to genetic algorithms. They make use of a probabilistic model, learnt from the promising solutions, to guide the search process. In this paper, we set out a basic taxonomy of EDA techniques, underlining the nature and complexity of the probabilistic model of each EDA variant. We review a set of innovative works that make use of EDA techniques to solve challenging bioinformatics problems, emphasizing the EDA paradigm's potential for further research in this domain. PMID:18822112
Multi-heuristic dynamic task allocation using genetic algorithms in a heterogeneous distributed system

PubMed Central

Page, Andrew J.; Keane, Thomas M.; Naughton, Thomas J.

2010-01-01

We present a multi-heuristic evolutionary task allocation algorithm to dynamically map tasks to processors in a heterogeneous distributed system. It utilizes a genetic algorithm, combined with eight common heuristics, in an effort to minimize the total execution time. It operates on batches of unmapped tasks and can preemptively remap tasks to processors. The algorithm has been implemented on a Java distributed system and evaluated with a set of six problems from the areas of bioinformatics, biomedical engineering, computer science and cryptography. Experiments using up to 150 heterogeneous processors show that the algorithm achieves better efficiency than other state-of-the-art heuristic algorithms. PMID:20862190
Frequency-sensitive competitive learning for scalable balanced clustering on high-dimensional hyperspheres.

PubMed

Banerjee, Arindam; Ghosh, Joydeep

2004-05-01

Competitive learning mechanisms for clustering, in general, suffer from poor performance for very high-dimensional (>1000) data because of "curse of dimensionality" effects. In applications such as document clustering, it is customary to normalize the high-dimensional input vectors to unit length, and it is sometimes also desirable to obtain balanced clusters, i.e., clusters of comparable sizes. The spherical kmeans (spkmeans) algorithm, which normalizes the cluster centers as well as the inputs, has been successfully used to cluster normalized text documents in 2000+ dimensional space. Unfortunately, like regular kmeans and its soft expectation-maximization-based version, spkmeans tends to generate extremely imbalanced clusters in high-dimensional spaces when the desired number of clusters is large (tens or more). This paper first shows that the spkmeans algorithm can be derived from a certain maximum likelihood formulation using a mixture of von Mises-Fisher distributions as the generative model, and in fact, it can be considered as a batch-mode version of (normalized) competitive learning. The proposed generative model is then adapted in a principled way to yield three frequency-sensitive competitive learning variants that are applicable to static data and produced high-quality and well-balanced clusters for high-dimensional data. Like kmeans, each iteration is linear in the number of data points and in the number of clusters for all the three algorithms. A frequency-sensitive algorithm to cluster streaming data is also proposed. Experimental results on clustering of high-dimensional text data sets are provided to show the effectiveness and applicability of the proposed techniques. Index Terms-Balanced clustering, expectation maximization (EM), frequency-sensitive competitive learning (FSCL), high-dimensional clustering, kmeans, normalized data, scalable clustering, streaming data, text clustering.
A Smartphone Indoor Localization Algorithm Based on WLAN Location Fingerprinting with Feature Extraction and Clustering.

PubMed

Luo, Junhai; Fu, Liang

2017-06-09

With the development of communication technology, the demand for location-based services is growing rapidly. This paper presents an algorithm for indoor localization based on Received Signal Strength (RSS), which is collected from Access Points (APs). The proposed localization algorithm contains the offline information acquisition phase and online positioning phase. Firstly, the AP selection algorithm is reviewed and improved based on the stability of signals to remove useless AP; secondly, Kernel Principal Component Analysis (KPCA) is analyzed and used to remove the data redundancy and maintain useful characteristics for nonlinear feature extraction; thirdly, the Affinity Propagation Clustering (APC) algorithm utilizes RSS values to classify data samples and narrow the positioning range. In the online positioning phase, the classified data will be matched with the testing data to determine the position area, and the Maximum Likelihood (ML) estimate will be employed for precise positioning. Eventually, the proposed algorithm is implemented in a real-world environment for performance evaluation. Experimental results demonstrate that the proposed algorithm improves the accuracy and computational complexity.

A new distributed systems scheduling algorithm: a swarm intelligence approach

NASA Astrophysics Data System (ADS)

Haghi Kashani, Mostafa; Sarvizadeh, Raheleh; Jameii, Mahdi

2011-12-01

The scheduling problem in distributed systems is known as an NP-complete problem, and methods based on heuristic or metaheuristic search have been proposed to obtain optimal and suboptimal solutions. The task scheduling is a key factor for distributed systems to gain better performance. In this paper, an efficient method based on memetic algorithm is developed to solve the problem of distributed systems scheduling. With regard to load balancing efficiently, Artificial Bee Colony (ABC) has been applied as local search in the proposed memetic algorithm. The proposed method has been compared to existing memetic-Based approach in which Learning Automata method has been used as local search. The results demonstrated that the proposed method outperform the above mentioned method in terms of communication cost.
Clustering analysis of moving target signatures

NASA Astrophysics Data System (ADS)

Martone, Anthony; Ranney, Kenneth; Innocenti, Roberto

2010-04-01

Previously, we developed a moving target indication (MTI) processing approach to detect and track slow-moving targets inside buildings, which successfully detected moving targets (MTs) from data collected by a low-frequency, ultra-wideband radar. Our MTI algorithms include change detection, automatic target detection (ATD), clustering, and tracking. The MTI algorithms can be implemented in a real-time or near-real-time system; however, a person-in-the-loop is needed to select input parameters for the clustering algorithm. Specifically, the number of clusters to input into the cluster algorithm is unknown and requires manual selection. A critical need exists to automate all aspects of the MTI processing formulation. In this paper, we investigate two techniques that automatically determine the number of clusters: the adaptive knee-point (KP) algorithm and the recursive pixel finding (RPF) algorithm. The KP algorithm is based on a well-known heuristic approach for determining the number of clusters. The RPF algorithm is analogous to the image processing, pixel labeling procedure. Both algorithms are used to analyze the false alarm and detection rates of three operational scenarios of personnel walking inside wood and cinderblock buildings.
Kinetic energy distribution of multiply charged ions in Coulomb explosion of Xe clusters.

PubMed

Heidenreich, Andreas; Jortner, Joshua

2011-02-21

We report on the calculations of kinetic energy distribution (KED) functions of multiply charged, high-energy ions in Coulomb explosion (CE) of an assembly of elemental Xe(n) clusters (average size (n) = 200-2171) driven by ultra-intense, near-infrared, Gaussian laser fields (peak intensities 10(15) - 4 × 10(16) W cm(-2), pulse lengths 65-230 fs). In this cluster size and pulse parameter domain, outer ionization is incomplete∕vertical, incomplete∕nonvertical, or complete∕nonvertical, with CE occurring in the presence of nanoplasma electrons. The KEDs were obtained from double averaging of single-trajectory molecular dynamics simulation ion kinetic energies. The KEDs were doubly averaged over a log-normal cluster size distribution and over the laser intensity distribution of a spatial Gaussian beam, which constitutes either a two-dimensional (2D) or a three-dimensional (3D) profile, with the 3D profile (when the cluster beam radius is larger than the Rayleigh length) usually being experimentally realized. The general features of the doubly averaged KEDs manifest the smearing out of the structure corresponding to the distribution of ion charges, a marked increase of the KEDs at very low energies due to the contribution from the persistent nanoplasma, a distortion of the KEDs and of the average energies toward lower energy values, and the appearance of long low-intensity high-energy tails caused by the admixture of contributions from large clusters by size averaging. The doubly averaged simulation results account reasonably well (within 30%) for the experimental data for the cluster-size dependence of the CE energetics and for its dependence on the laser pulse parameters, as well as for the anisotropy in the angular distribution of the energies of the Xe(q+) ions. Possible applications of this computational study include a control of the ion kinetic energies by the choice of the laser intensity profile (2D∕3D) in the laser-cluster interaction volume.
A fast elitism Gaussian estimation of distribution algorithm and application for PID optimization.

PubMed

Xu, Qingyang; Zhang, Chengjin; Zhang, Li

2014-01-01

Estimation of distribution algorithm (EDA) is an intelligent optimization algorithm based on the probability statistics theory. A fast elitism Gaussian estimation of distribution algorithm (FEGEDA) is proposed in this paper. The Gaussian probability model is used to model the solution distribution. The parameters of Gaussian come from the statistical information of the best individuals by fast learning rule. A fast learning rule is used to enhance the efficiency of the algorithm, and an elitism strategy is used to maintain the convergent performance. The performances of the algorithm are examined based upon several benchmarks. In the simulations, a one-dimensional benchmark is used to visualize the optimization process and probability model learning process during the evolution, and several two-dimensional and higher dimensional benchmarks are used to testify the performance of FEGEDA. The experimental results indicate the capability of FEGEDA, especially in the higher dimensional problems, and the FEGEDA exhibits a better performance than some other algorithms and EDAs. Finally, FEGEDA is used in PID controller optimization of PMSM and compared with the classical-PID and GA.
A Fast Elitism Gaussian Estimation of Distribution Algorithm and Application for PID Optimization

PubMed Central

Xu, Qingyang; Zhang, Chengjin; Zhang, Li

2014-01-01

Estimation of distribution algorithm (EDA) is an intelligent optimization algorithm based on the probability statistics theory. A fast elitism Gaussian estimation of distribution algorithm (FEGEDA) is proposed in this paper. The Gaussian probability model is used to model the solution distribution. The parameters of Gaussian come from the statistical information of the best individuals by fast learning rule. A fast learning rule is used to enhance the efficiency of the algorithm, and an elitism strategy is used to maintain the convergent performance. The performances of the algorithm are examined based upon several benchmarks. In the simulations, a one-dimensional benchmark is used to visualize the optimization process and probability model learning process during the evolution, and several two-dimensional and higher dimensional benchmarks are used to testify the performance of FEGEDA. The experimental results indicate the capability of FEGEDA, especially in the higher dimensional problems, and the FEGEDA exhibits a better performance than some other algorithms and EDAs. Finally, FEGEDA is used in PID controller optimization of PMSM and compared with the classical-PID and GA. PMID:24892059
A data distributed parallel algorithm for ray-traced volume rendering

NASA Technical Reports Server (NTRS)

Ma, Kwan-Liu; Painter, James S.; Hansen, Charles D.; Krogh, Michael F.

1993-01-01

This paper presents a divide-and-conquer ray-traced volume rendering algorithm and a parallel image compositing method, along with their implementation and performance on the Connection Machine CM-5, and networked workstations. This algorithm distributes both the data and the computations to individual processing units to achieve fast, high-quality rendering of high-resolution data. The volume data, once distributed, is left intact. The processing nodes perform local ray tracing of their subvolume concurrently. No communication between processing units is needed during this locally ray-tracing process. A subimage is generated by each processing unit and the final image is obtained by compositing subimages in the proper order, which can be determined a priori. Test results on both the CM-5 and a group of networked workstations demonstrate the practicality of our rendering algorithm and compositing method.
Pore-scale micro-computed-tomography imaging: Nonwetting-phase cluster-size distribution during drainage and imbibition

NASA Astrophysics Data System (ADS)

Georgiadis, A.; Berg, S.; Makurat, A.; Maitland, G.; Ott, H.

2013-09-01

We investigated the cluster-size distribution of the residual nonwetting phase in a sintered glass-bead porous medium at two-phase flow conditions, by means of micro-computed-tomography (μCT) imaging with pore-scale resolution. Cluster-size distribution functions and cluster volumes were obtained by image analysis for a range of injected pore volumes under both imbibition and drainage conditions; the field of view was larger than the porosity-based representative elementary volume (REV). We did not attempt to make a definition for a two-phase REV but used the nonwetting-phase cluster-size distribution as an indicator. Most of the nonwetting-phase total volume was found to be contained in clusters that were one to two orders of magnitude larger than the porosity-based REV. The largest observed clusters in fact ranged in volume from 65% to 99% of the entire nonwetting phase in the field of view. As a consequence, the largest clusters observed were statistically not represented and were found to be smaller than the estimated maximum cluster length. The results indicate that the two-phase REV is larger than the field of view attainable by μCT scanning, at a resolution which allows for the accurate determination of cluster connectivity.
Molecular Isotopic Distribution Analysis (MIDAs) with Adjustable Mass Accuracy

NASA Astrophysics Data System (ADS)

Alves, Gelio; Ogurtsov, Aleksey Y.; Yu, Yi-Kuo

2014-01-01

In this paper, we present Molecular Isotopic Distribution Analysis (MIDAs), a new software tool designed to compute molecular isotopic distributions with adjustable accuracies. MIDAs offers two algorithms, one polynomial-based and one Fourier-transform-based, both of which compute molecular isotopic distributions accurately and efficiently. The polynomial-based algorithm contains few novel aspects, whereas the Fourier-transform-based algorithm consists mainly of improvements to other existing Fourier-transform-based algorithms. We have benchmarked the performance of the two algorithms implemented in MIDAs with that of eight software packages (BRAIN, Emass, Mercury, Mercury5, NeutronCluster, Qmass, JFC, IC) using a consensus set of benchmark molecules. Under the proposed evaluation criteria, MIDAs's algorithms, JFC, and Emass compute with comparable accuracy the coarse-grained (low-resolution) isotopic distributions and are more accurate than the other software packages. For fine-grained isotopic distributions, we compared IC, MIDAs's polynomial algorithm, and MIDAs's Fourier transform algorithm. Among the three, IC and MIDAs's polynomial algorithm compute isotopic distributions that better resemble their corresponding exact fine-grained (high-resolution) isotopic distributions. MIDAs can be accessed freely through a user-friendly web-interface at http://www.ncbi.nlm.nih.gov/CBBresearch/Yu/midas/index.html.
Molecular Isotopic Distribution Analysis (MIDAs) with adjustable mass accuracy.

PubMed

Alves, Gelio; Ogurtsov, Aleksey Y; Yu, Yi-Kuo

2014-01-01

In this paper, we present Molecular Isotopic Distribution Analysis (MIDAs), a new software tool designed to compute molecular isotopic distributions with adjustable accuracies. MIDAs offers two algorithms, one polynomial-based and one Fourier-transform-based, both of which compute molecular isotopic distributions accurately and efficiently. The polynomial-based algorithm contains few novel aspects, whereas the Fourier-transform-based algorithm consists mainly of improvements to other existing Fourier-transform-based algorithms. We have benchmarked the performance of the two algorithms implemented in MIDAs with that of eight software packages (BRAIN, Emass, Mercury, Mercury5, NeutronCluster, Qmass, JFC, IC) using a consensus set of benchmark molecules. Under the proposed evaluation criteria, MIDAs's algorithms, JFC, and Emass compute with comparable accuracy the coarse-grained (low-resolution) isotopic distributions and are more accurate than the other software packages. For fine-grained isotopic distributions, we compared IC, MIDAs's polynomial algorithm, and MIDAs's Fourier transform algorithm. Among the three, IC and MIDAs's polynomial algorithm compute isotopic distributions that better resemble their corresponding exact fine-grained (high-resolution) isotopic distributions. MIDAs can be accessed freely through a user-friendly web-interface at http://www.ncbi.nlm.nih.gov/CBBresearch/Yu/midas/index.html.
Geometry-driven distributed compression of the plenoptic function: performance bounds and constructive algorithms.

PubMed

Gehrig, Nicolas; Dragotti, Pier Luigi

2009-03-01

In this paper, we study the sampling and the distributed compression of the data acquired by a camera sensor network. The effective design of these sampling and compression schemes requires, however, the understanding of the structure of the acquired data. To this end, we show that the a priori knowledge of the configuration of the camera sensor network can lead to an effective estimation of such structure and to the design of effective distributed compression algorithms. For idealized scenarios, we derive the fundamental performance bounds of a camera sensor network and clarify the connection between sampling and distributed compression. We then present a distributed compression algorithm that takes advantage of the structure of the data and that outperforms independent compression algorithms on real multiview images.
Kanerva's sparse distributed memory: An associative memory algorithm well-suited to the Connection Machine

NASA Technical Reports Server (NTRS)

Rogers, David

1988-01-01

The advent of the Connection Machine profoundly changes the world of supercomputers. The highly nontraditional architecture makes possible the exploration of algorithms that were impractical for standard Von Neumann architectures. Sparse distributed memory (SDM) is an example of such an algorithm. Sparse distributed memory is a particularly simple and elegant formulation for an associative memory. The foundations for sparse distributed memory are described, and some simple examples of using the memory are presented. The relationship of sparse distributed memory to three important computational systems is shown: random-access memory, neural networks, and the cerebellum of the brain. Finally, the implementation of the algorithm for sparse distributed memory on the Connection Machine is discussed.
Clustering by reordering of similarity and Laplacian matrices: Application to galaxy clusters

NASA Astrophysics Data System (ADS)

Mahmoud, E.; Shoukry, A.; Takey, A.

2018-04-01

Similarity metrics, kernels and similarity-based algorithms have gained much attention due to their increasing applications in information retrieval, data mining, pattern recognition and machine learning. Similarity Graphs are often adopted as the underlying representation of similarity matrices and are at the origin of known clustering algorithms such as spectral clustering. Similarity matrices offer the advantage of working in object-object (two-dimensional) space where visualization of clusters similarities is available instead of object-features (multi-dimensional) space. In this paper, sparse ɛ-similarity graphs are constructed and decomposed into strong components using appropriate methods such as Dulmage-Mendelsohn permutation (DMperm) and/or Reverse Cuthill-McKee (RCM) algorithms. The obtained strong components correspond to groups (clusters) in the input (feature) space. Parameter ɛi is estimated locally, at each data point i from a corresponding narrow range of the number of nearest neighbors. Although more advanced clustering techniques are available, our method has the advantages of simplicity, better complexity and direct visualization of the clusters similarities in a two-dimensional space. Also, no prior information about the number of clusters is needed. We conducted our experiments on two and three dimensional, low and high-sized synthetic datasets as well as on an astronomical real-dataset. The results are verified graphically and analyzed using gap statistics over a range of neighbors to verify the robustness of the algorithm and the stability of the results. Combining the proposed algorithm with gap statistics provides a promising tool for solving clustering problems. An astronomical application is conducted for confirming the existence of 45 galaxy clusters around the X-ray positions of galaxy clusters in the redshift range [0.1..0.8]. We re-estimate the photometric redshifts of the identified galaxy clusters and obtain acceptable values
Conformational Clusters of Phosphorylated Tyrosine.

PubMed

Abdelrasoul, Maha; Ponniah, Komala; Mao, Alice; Warden, Meghan S; Elhefnawy, Wessam; Li, Yaohang; Pascal, Steven M

2017-12-06

Tyrosine phosphorylation plays an important role in many cellular and intercellular processes including signal transduction, subcellular localization, and regulation of enzymatic activity. In 1999, Blom et al., using the limited number of protein data bank (PDB) structures available at that time, reported that the side chain structures of phosphorylated tyrosine (pY) are partitioned into two conserved conformational clusters ( Blom, N.; Gammeltoft, S.; Brunak, S. J. Mol. Biol. 1999 , 294 , 1351 - 1362 ). We have used the spectral clustering algorithm to cluster the increasingly growing number of protein structures with pY sites, and have found that the pY residues cluster into three distinct side chain conformations. Two of these pY conformational clusters associate strongly with a narrow range of tyrosine backbone conformation. The novel cluster also highly correlates with the identity of the n + 1 residue, and is strongly associated with a sequential pYpY conformation which places two adjacent pY side chains in a specific relative orientation. Further analysis shows that the three pY clusters are associated with distinct distributions of cognate protein kinases.
Distributive Education Competency-Based Curriculum Models by Occupational Clusters. Final Report.

ERIC Educational Resources Information Center

Davis, Rodney E.; Husted, Stewart W.

To meet the needs of distributive education teachers and students, a project was initiated to develop competency-based curriculum models for marketing and distributive education clusters. The models which were developed incorporate competencies, materials and resources, teaching methodologies/learning activities, and evaluative criteria for the…
Scaling Deep Learning on GPU and Knights Landing clusters

DOE PAGES

You, Yang; Buluc, Aydin; Demmel, James

2017-09-26

The speed of deep neural networks training has become a big bottleneck of deep learning research and development. For example, training GoogleNet by ImageNet dataset on one Nvidia K20 GPU needs 21 days. To speed up the training process, the current deep learning systems heavily rely on the hardware accelerators. However, these accelerators have limited on-chip memory compared with CPUs. To handle large datasets, they need to fetch data from either CPU memory or remote processors. We use both self-hosted Intel Knights Landing (KNL) clusters and multi-GPU clusters as our target platforms. From an algorithm aspect, current distributed machine learningmore » systems are mainly designed for cloud systems. These methods are asynchronous because of the slow network and high fault-tolerance requirement on cloud systems. We focus on Elastic Averaging SGD (EASGD) to design algorithms for HPC clusters. Original EASGD used round-robin method for communication and updating. The communication is ordered by the machine rank ID, which is inefficient on HPC clusters. First, we redesign four efficient algorithms for HPC systems to improve EASGD's poor scaling on clusters. Async EASGD, Async MEASGD, and Hogwild EASGD are faster \\textcolor{black}{than} their existing counterparts (Async SGD, Async MSGD, and Hogwild SGD, resp.) in all the comparisons. Finally, we design Sync EASGD, which ties for the best performance among all the methods while being deterministic. In addition to the algorithmic improvements, we use some system-algorithm codesign techniques to scale up the algorithms. By reducing the percentage of communication from 87% to 14%, our Sync EASGD achieves 5.3x speedup over original EASGD on the same platform. We get 91.5% weak scaling efficiency on 4253 KNL cores, which is higher than the state-of-the-art implementation.« less
Scaling Deep Learning on GPU and Knights Landing clusters

DOE Office of Scientific and Technical Information (OSTI.GOV)

You, Yang; Buluc, Aydin; Demmel, James

The speed of deep neural networks training has become a big bottleneck of deep learning research and development. For example, training GoogleNet by ImageNet dataset on one Nvidia K20 GPU needs 21 days. To speed up the training process, the current deep learning systems heavily rely on the hardware accelerators. However, these accelerators have limited on-chip memory compared with CPUs. To handle large datasets, they need to fetch data from either CPU memory or remote processors. We use both self-hosted Intel Knights Landing (KNL) clusters and multi-GPU clusters as our target platforms. From an algorithm aspect, current distributed machine learningmore » systems are mainly designed for cloud systems. These methods are asynchronous because of the slow network and high fault-tolerance requirement on cloud systems. We focus on Elastic Averaging SGD (EASGD) to design algorithms for HPC clusters. Original EASGD used round-robin method for communication and updating. The communication is ordered by the machine rank ID, which is inefficient on HPC clusters. First, we redesign four efficient algorithms for HPC systems to improve EASGD's poor scaling on clusters. Async EASGD, Async MEASGD, and Hogwild EASGD are faster \\textcolor{black}{than} their existing counterparts (Async SGD, Async MSGD, and Hogwild SGD, resp.) in all the comparisons. Finally, we design Sync EASGD, which ties for the best performance among all the methods while being deterministic. In addition to the algorithmic improvements, we use some system-algorithm codesign techniques to scale up the algorithms. By reducing the percentage of communication from 87% to 14%, our Sync EASGD achieves 5.3x speedup over original EASGD on the same platform. We get 91.5% weak scaling efficiency on 4253 KNL cores, which is higher than the state-of-the-art implementation.« less
Distributed Algorithm for Voronoi Partition of Wireless Sensor Networks with a Limited Sensing Range.

PubMed

He, Chenlong; Feng, Zuren; Ren, Zhigang

2018-02-03

For Wireless Sensor Networks (WSNs), the Voronoi partition of a region is a challenging problem owing to the limited sensing ability of each sensor and the distributed organization of the network. In this paper, an algorithm is proposed for each sensor having a limited sensing range to compute its limited Voronoi cell autonomously, so that the limited Voronoi partition of the entire WSN is generated in a distributed manner. Inspired by Graham's Scan (GS) algorithm used to compute the convex hull of a point set, the limited Voronoi cell of each sensor is obtained by sequentially scanning two consecutive bisectors between the sensor and its neighbors. The proposed algorithm called the Boundary Scan (BS) algorithm has a lower computational complexity than the existing Range-Constrained Voronoi Cell (RCVC) algorithm and reaches the lower bound of the computational complexity of the algorithms used to solve the problem of this kind. Moreover, it also improves the time efficiency of a key step in the Adjust-Sensing-Radius (ASR) algorithm used to compute the exact Voronoi cell. Extensive numerical simulations are performed to demonstrate the correctness and effectiveness of the BS algorithm. The distributed realization of the BS combined with a localization algorithm in WSNs is used to justify the WSN nature of the proposed algorithm.
The GSAM software: A global search algorithm of minima exploration for the investigation of low lying isomers of clusters

DOE Office of Scientific and Technical Information (OSTI.GOV)

Marchal, Rémi; Carbonnière, Philippe; Pouchan, Claude

2015-01-22

The study of atomic clusters has become an increasingly active area of research in the recent years because of the fundamental interest in studying a completely new area that can bridge the gap between atomic and solid state physics. Due to their specific properties, such compounds are of great interest in the field of nanotechnology [1,2]. Here, we would present our GSAM algorithm based on a DFT exploration of the PES to find the low lying isomers of such compounds. This algorithm includes the generation of an intial set of structure from which the most relevant are selected. Moreover, anmore » optimization process, called raking optimization, able to discard step by step all the non physically reasonnable configurations have been implemented to reduce the computational cost of this algorithm. Structural properties of Ga{sub n}Asm clusters will be presented as an illustration of the method.« less
Clustering analysis of water distribution systems: identifying critical components and community impacts.

PubMed

Diao, K; Farmani, R; Fu, G; Astaraie-Imani, M; Ward, S; Butler, D

2014-01-01

Large water distribution systems (WDSs) are networks with both topological and behavioural complexity. Thereby, it is usually difficult to identify the key features of the properties of the system, and subsequently all the critical components within the system for a given purpose of design or control. One way is, however, to more explicitly visualize the network structure and interactions between components by dividing a WDS into a number of clusters (subsystems). Accordingly, this paper introduces a clustering strategy that decomposes WDSs into clusters with stronger internal connections than external connections. The detected cluster layout is very similar to the community structure of the served urban area. As WDSs may expand along with urban development in a community-by-community manner, the correspondingly formed distribution clusters may reveal some crucial configurations of WDSs. For verification, the method is applied to identify all the critical links during firefighting for the vulnerability analysis of a real-world WDS. Moreover, both the most critical pipes and clusters are addressed, given the consequences of pipe failure. Compared with the enumeration method, the method used in this study identifies the same group of the most critical components, and provides similar criticality prioritizations of them in a more computationally efficient time.
A Distributed Transmission Rate Adjustment Algorithm in Heterogeneous CSMA/CA Networks

PubMed Central

Xie, Shuanglong; Low, Kay Soon; Gunawan, Erry

2015-01-01

Distributed transmission rate tuning is important for a wide variety of IEEE 802.15.4 network applications such as industrial network control systems. Such systems often require each node to sustain certain throughput demand in order to guarantee the system performance. It is thus essential to determine a proper transmission rate that can meet the application requirement and compensate for network imperfections (e.g., packet loss). Such a tuning in a heterogeneous network is difficult due to the lack of modeling techniques that can deal with the heterogeneity of the network as well as the network traffic changes. In this paper, a distributed transmission rate tuning algorithm in a heterogeneous IEEE 802.15.4 CSMA/CA network is proposed. Each node uses the results of clear channel assessment (CCA) to estimate the busy channel probability. Then a mathematical framework is developed to estimate the on-going heterogeneous traffics using the busy channel probability at runtime. Finally a distributed algorithm is derived to tune the transmission rate of each node to accurately meet the throughput requirement. The algorithm does not require modifications on IEEE 802.15.4 MAC layer and it has been experimentally implemented and extensively tested using TelosB nodes with the TinyOS protocol stack. The results reveal that the algorithm is accurate and can satisfy the throughput demand. Compared with existing techniques, the algorithm is fully distributed and thus does not require any central coordination. With this property, it is able to adapt to traffic changes and re-adjust the transmission rate to the desired level, which cannot be achieved using the traditional modeling techniques. PMID:25822140

Energy Efficient and Stable Weight Based Clustering for Mobile Ad Hoc Networks

NASA Astrophysics Data System (ADS)

Bouk, Safdar H.; Sasase, Iwao

Recently several weighted clustering algorithms have been proposed, however, to the best of our knowledge; there is none that propagates weights to other nodes without weight message for leader election, normalizes node parameters and considers neighboring node parameters to calculate node weights. In this paper, we propose an Energy Efficient and Stable Weight Based Clustering (EE-SWBC) algorithm that elects cluster heads without sending any additional weight message. It propagates node parameters to its neighbors through neighbor discovery message (HELLO Message) and stores these parameters in neighborhood list. Each node normalizes parameters and efficiently calculates its own weight and the weights of neighboring nodes from that neighborhood table using Grey Decision Method (GDM). GDM finds the ideal solution (best node parameters in neighborhood list) and calculates node weights in comparison to the ideal solution. The node(s) with maximum weight (parameters closer to the ideal solution) are elected as cluster heads. In result, EE-SWBC fairly selects potential nodes with parameters closer to ideal solution with less overhead. Different performance metrics of EE-SWBC and Distributed Weighted Clustering Algorithm (DWCA) are compared through simulations. The simulation results show that EE-SWBC maintains fewer average numbers of stable clusters with minimum overhead, less energy consumption and fewer changes in cluster structure within network compared to DWCA.
SOTXTSTREAM: Density-based self-organizing clustering of text streams.

PubMed

Bryant, Avory C; Cios, Krzysztof J

2017-01-01

A streaming data clustering algorithm is presented building upon the density-based self-organizing stream clustering algorithm SOSTREAM. Many density-based clustering algorithms are limited by their inability to identify clusters with heterogeneous density. SOSTREAM addresses this limitation through the use of local (nearest neighbor-based) density determinations. Additionally, many stream clustering algorithms use a two-phase clustering approach. In the first phase, a micro-clustering solution is maintained online, while in the second phase, the micro-clustering solution is clustered offline to produce a macro solution. By performing self-organization techniques on micro-clusters in the online phase, SOSTREAM is able to maintain a macro clustering solution in a single phase. Leveraging concepts from SOSTREAM, a new density-based self-organizing text stream clustering algorithm, SOTXTSTREAM, is presented that addresses several shortcomings of SOSTREAM. Gains in clustering performance of this new algorithm are demonstrated on several real-world text stream datasets.
Deconvoluting simulated metagenomes: the performance of hard- and soft- clustering algorithms applied to metagenomic chromosome conformation capture (3C)

PubMed Central

DeMaere, Matthew Z.

2016-01-01

Background Chromosome conformation capture, coupled with high throughput DNA sequencing in protocols like Hi-C and 3C-seq, has been proposed as a viable means of generating data to resolve the genomes of microorganisms living in naturally occuring environments. Metagenomic Hi-C and 3C-seq datasets have begun to emerge, but the feasibility of resolving genomes when closely related organisms (strain-level diversity) are present in the sample has not yet been systematically characterised. Methods We developed a computational simulation pipeline for metagenomic 3C and Hi-C sequencing to evaluate the accuracy of genomic reconstructions at, above, and below an operationally defined species boundary. We simulated datasets and measured accuracy over a wide range of parameters. Five clustering algorithms were evaluated (2 hard, 3 soft) using an adaptation of the extended B-cubed validation measure. Results When all genomes in a sample are below 95% sequence identity, all of the tested clustering algorithms performed well. When sequence data contains genomes above 95% identity (our operational definition of strain-level diversity), a naive soft-clustering extension of the Louvain method achieves the highest performance. Discussion Previously, only hard-clustering algorithms have been applied to metagenomic 3C and Hi-C data, yet none of these perform well when strain-level diversity exists in a metagenomic sample. Our simple extension of the Louvain method performed the best in these scenarios, however, accuracy remained well below the levels observed for samples without strain-level diversity. Strain resolution is also highly dependent on the amount of available 3C sequence data, suggesting that depth of sequencing must be carefully considered during experimental design. Finally, there appears to be great scope to improve the accuracy of strain resolution through further algorithm development. PMID:27843713
Mitigate the impact of transmitter finite extinction ratio using K-means clustering algorithm for 16QAM signal

NASA Astrophysics Data System (ADS)

Yu, Miao; Li, Yan; Shu, Tong; Zhang, Yifan; Hong, Xiaobin; Qiu, Jifang; Zuo, Yong; Guo, Hongxiang; Li, Wei; Wu, Jian

2018-02-01

A method of recognizing 16QAM signal based on k-means clustering algorithm is proposed to mitigate the impact of transmitter finite extinction ratio. There are pilot symbols with 0.39% overhead assigned to be regarded as initial centroids of k-means clustering algorithm. Simulation result in 10 GBaud 16QAM system shows that the proposed method obtains higher precision of identification compared with traditional decision method for finite ER and IQ mismatch. Specially, the proposed method improves the required OSNR by 5.5 dB, 4.5 dB, 4 dB and 3 dB at FEC limit with ER= 12 dB, 16 dB, 20 dB and 24 dB, respectively, and the acceptable bias error and IQ mismatch range is widened by 767% and 360% with ER =16 dB, respectively.
Inverse estimation of the spheroidal particle size distribution using Ant Colony Optimization algorithms in multispectral extinction technique

NASA Astrophysics Data System (ADS)

He, Zhenzong; Qi, Hong; Wang, Yuqing; Ruan, Liming

2014-10-01

Four improved Ant Colony Optimization (ACO) algorithms, i.e. the probability density function based ACO (PDF-ACO) algorithm, the Region ACO (RACO) algorithm, Stochastic ACO (SACO) algorithm and Homogeneous ACO (HACO) algorithm, are employed to estimate the particle size distribution (PSD) of the spheroidal particles. The direct problems are solved by the extended Anomalous Diffraction Approximation (ADA) and the Lambert-Beer law. Three commonly used monomodal distribution functions i.e. the Rosin-Rammer (R-R) distribution function, the normal (N-N) distribution function, and the logarithmic normal (L-N) distribution function are estimated under dependent model. The influence of random measurement errors on the inverse results is also investigated. All the results reveal that the PDF-ACO algorithm is more accurate than the other three ACO algorithms and can be used as an effective technique to investigate the PSD of the spheroidal particles. Furthermore, the Johnson's SB (J-SB) function and the modified beta (M-β) function are employed as the general distribution functions to retrieve the PSD of spheroidal particles using PDF-ACO algorithm. The investigation shows a reasonable agreement between the original distribution function and the general distribution function when only considering the variety of the length of the rotational semi-axis.
Cluster-cluster clustering

NASA Technical Reports Server (NTRS)

Barnes, J.; Dekel, A.; Efstathiou, G.; Frenk, C. S.

1985-01-01

The cluster correlation function xi sub c(r) is compared with the particle correlation function, xi(r) in cosmological N-body simulations with a wide range of initial conditions. The experiments include scale-free initial conditions, pancake models with a coherence length in the initial density field, and hybrid models. Three N-body techniques and two cluster-finding algorithms are used. In scale-free models with white noise initial conditions, xi sub c and xi are essentially identical. In scale-free models with more power on large scales, it is found that the amplitude of xi sub c increases with cluster richness; in this case the clusters give a biased estimate of the particle correlations. In the pancake and hybrid models (with n = 0 or 1), xi sub c is steeper than xi, but the cluster correlation length exceeds that of the points by less than a factor of 2, independent of cluster richness. Thus the high amplitude of xi sub c found in studies of rich clusters of galaxies is inconsistent with white noise and pancake models and may indicate a primordial fluctuation spectrum with substantial power on large scales.
Data-driven process decomposition and robust online distributed modelling for large-scale processes

NASA Astrophysics Data System (ADS)

Shu, Zhang; Lijuan, Li; Lijuan, Yao; Shipin, Yang; Tao, Zou

2018-02-01

With the increasing attention of networked control, system decomposition and distributed models show significant importance in the implementation of model-based control strategy. In this paper, a data-driven system decomposition and online distributed subsystem modelling algorithm was proposed for large-scale chemical processes. The key controlled variables are first partitioned by affinity propagation clustering algorithm into several clusters. Each cluster can be regarded as a subsystem. Then the inputs of each subsystem are selected by offline canonical correlation analysis between all process variables and its controlled variables. Process decomposition is then realised after the screening of input and output variables. When the system decomposition is finished, the online subsystem modelling can be carried out by recursively block-wise renewing the samples. The proposed algorithm was applied in the Tennessee Eastman process and the validity was verified.
THE DETECTION AND STATISTICS OF GIANT ARCS BEHIND CLASH CLUSTERS

DOE Office of Scientific and Technical Information (OSTI.GOV)

Xu, Bingxiao; Zheng, Wei; Postman, Marc

We developed an algorithm to find and characterize gravitationally lensed galaxies (arcs) to perform a comparison of the observed and simulated arc abundance. Observations are from the Cluster Lensing And Supernova survey with Hubble (CLASH). Simulated CLASH images are created using the MOKA package and also clusters selected from the high-resolution, hydrodynamical simulations, MUSIC, over the same mass and redshift range as the CLASH sample. The algorithm's arc elongation accuracy, completeness, and false positive rate are determined and used to compute an estimate of the true arc abundance. We derive a lensing efficiency of 4 ± 1 arcs (with length ≥6″ andmore » length-to-width ratio ≥7) per cluster for the X-ray-selected CLASH sample, 4 ± 1 arcs per cluster for the MOKA-simulated sample, and 3 ± 1 arcs per cluster for the MUSIC-simulated sample. The observed and simulated arc statistics are in full agreement. We measure the photometric redshifts of all detected arcs and find a median redshift z{sub s} = 1.9 with 33% of the detected arcs having z{sub s} > 3. We find that the arc abundance does not depend strongly on the source redshift distribution but is sensitive to the mass distribution of the dark matter halos (e.g., the c–M relation). Our results show that consistency between the observed and simulated distributions of lensed arc sizes and axial ratios can be achieved by using cluster-lensing simulations that are carefully matched to the selection criteria used in the observations.« less
Improved Cost-Base Design of Water Distribution Networks using Genetic Algorithm

NASA Astrophysics Data System (ADS)

Moradzadeh Azar, Foad; Abghari, Hirad; Taghi Alami, Mohammad; Weijs, Steven

2010-05-01

Population growth and progressive extension of urbanization in different places of Iran cause an increasing demand for primary needs. The water, this vital liquid is the most important natural need for human life. Providing this natural need is requires the design and construction of water distribution networks, that incur enormous costs on the country's budget. Any reduction in these costs enable more people from society to access extreme profit least cost. Therefore, investment of Municipal councils need to maximize benefits or minimize expenditures. To achieve this purpose, the engineering design depends on the cost optimization techniques. This paper, presents optimization models based on genetic algorithm(GA) to find out the minimum design cost Mahabad City's (North West, Iran) water distribution network. By designing two models and comparing the resulting costs, the abilities of GA were determined. the GA based model could find optimum pipe diameters to reduce the design costs of network. Results show that the water distribution network design using Genetic Algorithm could lead to reduction of at least 7% in project costs in comparison to the classic model. Keywords: Genetic Algorithm, Optimum Design of Water Distribution Network, Mahabad City, Iran.
Visualization and unsupervised predictive clustering of high-dimensional multimodal neuroimaging data.

PubMed

Mwangi, Benson; Soares, Jair C; Hasan, Khader M

2014-10-30

Neuroimaging machine learning studies have largely utilized supervised algorithms - meaning they require both neuroimaging scan data and corresponding target variables (e.g. healthy vs. diseased) to be successfully 'trained' for a prediction task. Noticeably, this approach may not be optimal or possible when the global structure of the data is not well known and the researcher does not have an a priori model to fit the data. We set out to investigate the utility of an unsupervised machine learning technique; t-distributed stochastic neighbour embedding (t-SNE) in identifying 'unseen' sample population patterns that may exist in high-dimensional neuroimaging data. Multimodal neuroimaging scans from 92 healthy subjects were pre-processed using atlas-based methods, integrated and input into the t-SNE algorithm. Patterns and clusters discovered by the algorithm were visualized using a 2D scatter plot and further analyzed using the K-means clustering algorithm. t-SNE was evaluated against classical principal component analysis. Remarkably, based on unlabelled multimodal scan data, t-SNE separated study subjects into two very distinct clusters which corresponded to subjects' gender labels (cluster silhouette index value=0.79). The resulting clusters were used to develop an unsupervised minimum distance clustering model which identified 93.5% of subjects' gender. Notably, from a neuropsychiatric perspective this method may allow discovery of data-driven disease phenotypes or sub-types of treatment responders. Copyright © 2014 Elsevier B.V. All rights reserved.
An evaluation of ISOCLS and CLASSY clustering algorithms for forest classification in northern Idaho. [Elk River quadrange of the Clearwater National Forest

NASA Technical Reports Server (NTRS)

Werth, L. F. (Principal Investigator)

1981-01-01

Both the iterative self-organizing clustering system (ISOCLS) and the CLASSY algorithms were applied to forest and nonforest classes for one 1:24,000 quadrangle map of northern Idaho and the classification and mapping accuracies were evaluated with 1:30,000 color infrared aerial photography. Confusion matrices for the two clustering algorithms were generated and studied to determine which is most applicable to forest and rangeland inventories in future projects. In an unsupervised mode, ISOCLS requires many trial-and-error runs to find the proper parameters to separate desired information classes. CLASSY tells more in a single run concerning the classes that can be separated, shows more promise for forest stratification than ISOCLS, and shows more promise for consistency. One major drawback to CLASSY is that important forest and range classes that are smaller than a minimum cluster size will be combined with other classes. The algorithm requires so much computer storage that only data sets as small as a quadrangle can be used at one time.
The uniformity and time-invariance of the intra-cluster metal distribution in galaxy clusters from the IllustrisTNG simulations

NASA Astrophysics Data System (ADS)

Vogelsberger, Mark; Marinacci, Federico; Torrey, Paul; Genel, Shy; Springel, Volker; Weinberger, Rainer; Pakmor, Rüdiger; Hernquist, Lars; Naiman, Jill; Pillepich, Annalisa; Nelson, Dylan

2018-02-01

The distribution of metals in the intra-cluster medium (ICM) encodes important information about the enrichment history and formation of galaxy clusters. Here, we explore the metal content of clusters in IllustrisTNG - a new suite of galaxy formation simulations building on the Illustris project. Our cluster sample contains 20 objects in TNG100 - a ˜(100 Mpc)3 volume simulation with 2 × 18203 resolution elements, and 370 objects in TNG300 - a ˜(300 Mpc)3 volume simulation with 2 × 25003 resolution elements. The z = 0 metallicity profiles agree with observations, and the enrichment history is consistent with observational data going beyond z ˜ 1, showing nearly no metallicity evolution. The abundance profiles vary only minimally within the cluster samples, especially in the outskirts with a relative scatter of ˜ 15 per cent. The average metallicity profile flattens towards the centre, where we find a logarithmic slope of -0.1 compared to -0.5 in the outskirts. Cool core clusters have more centrally peaked metallicity profiles (˜0.8 solar) compared to non-cool core systems (˜0.5 solar), similar to observational trends. Si/Fe and O/Fe radial profiles follow positive gradients. The outer abundance profiles do not evolve below z ˜ 2, whereas the inner profiles flatten towards z = 0. More than ˜ 80 per cent of the metals in the ICM have been accreted from the proto-cluster environment, which has been enriched to ˜0.1 solar already at z ˜ 2. We conclude that the intra-cluster metal distribution is uniform among our cluster sample, nearly time-invariant in the outskirts for more than 10 Gyr, and forms through a universal enrichment history.
Disease clusters, exact distributions of maxima, and P-values.

PubMed

Grimson, R C

1993-10-01

This paper presents combinatorial (exact) methods that are useful in the analysis of disease cluster data obtained from small environments, such as buildings and neighbourhoods. Maxwell-Boltzmann and Fermi-Dirac occupancy models are compared in terms of appropriateness of representation of disease incidence patterns (space and/or time) in these environments. The methods are illustrated by a statistical analysis of the incidence pattern of bone fractures in a setting wherein fracture clustering was alleged to be occurring. One of the methodological results derived in this paper is the exact distribution of the maximum cell frequency in occupancy models.
Competitive repetition suppression (CoRe) clustering: a biologically inspired learning model with application to robust clustering.

PubMed

Bacciu, Davide; Starita, Antonina

2008-11-01

Determining a compact neural coding for a set of input stimuli is an issue that encompasses several biological memory mechanisms as well as various artificial neural network models. In particular, establishing the optimal network structure is still an open problem when dealing with unsupervised learning models. In this paper, we introduce a novel learning algorithm, named competitive repetition-suppression (CoRe) learning, inspired by a cortical memory mechanism called repetition suppression (RS). We show how such a mechanism is used, at various levels of the cerebral cortex, to generate compact neural representations of the visual stimuli. From the general CoRe learning model, we derive a clustering algorithm, named CoRe clustering, that can automatically estimate the unknown cluster number from the data without using a priori information concerning the input distribution. We illustrate how CoRe clustering, besides its biological plausibility, posses strong theoretical properties in terms of robustness to noise and outliers, and we provide an error function describing CoRe learning dynamics. Such a description is used to analyze CoRe relationships with the state-of-the art clustering models and to highlight CoRe similitude with rival penalized competitive learning (RPCL), showing how CoRe extends such a model by strengthening the rival penalization estimation by means of loss functions from robust statistics.
The implementation of hybrid clustering using fuzzy c-means and divisive algorithm for analyzing DNA human Papillomavirus cause of cervical cancer

NASA Astrophysics Data System (ADS)

Andryani, Diyah Septi; Bustamam, Alhadi; Lestari, Dian

2017-03-01

Clustering aims to classify the different patterns into groups called clusters. In this clustering method, we use n-mers frequency to calculate the distance matrix which is considered more accurate than using the DNA alignment. The clustering results could be used to discover biologically important sub-sections and groups of genes. Many clustering methods have been developed, while hard clustering methods considered less accurate than fuzzy clustering methods, especially if it is used for outliers data. Among fuzzy clustering methods, fuzzy c-means is one the best known for its accuracy and simplicity. Fuzzy c-means clustering uses membership function variable, which refers to how likely the data could be members into a cluster. Fuzzy c-means clustering works using the principle of minimizing the objective function. Parameters of membership function in fuzzy are used as a weighting factor which is also called the fuzzier. In this study we implement hybrid clustering using fuzzy c-means and divisive algorithm which could improve the accuracy of cluster membership compare to traditional partitional approach only. In this study fuzzy c-means is used in the first step to find partition results. Furthermore divisive algorithms will run on the second step to find sub-clusters and dendogram of phylogenetic tree. To find the best number of clusters is determined using the minimum value of Davies Bouldin Index (DBI) of the cluster results. In this research, the results show that the methods introduced in this paper is better than other partitioning methods. Finally, we found 3 clusters with DBI value of 1.126628 at first step of clustering. Moreover, DBI values after implementing the second step of clustering are always producing smaller IDB values compare to the results of using first step clustering only. This condition indicates that the hybrid approach in this study produce better performance of the cluster results, in term its DBI values.
A Comparison of Alternative Distributed Dynamic Cluster Formation Techniques for Industrial Wireless Sensor Networks.

PubMed

Gholami, Mohammad; Brennan, Robert W

2016-01-06

In this paper, we investigate alternative distributed clustering techniques for wireless sensor node tracking in an industrial environment. The research builds on extant work on wireless sensor node clustering by reporting on: (1) the development of a novel distributed management approach for tracking mobile nodes in an industrial wireless sensor network; and (2) an objective comparison of alternative cluster management approaches for wireless sensor networks. To perform this comparison, we focus on two main clustering approaches proposed in the literature: pre-defined clusters and ad hoc clusters. These approaches are compared in the context of their reconfigurability: more specifically, we investigate the trade-off between the cost and the effectiveness of competing strategies aimed at adapting to changes in the sensing environment. To support this work, we introduce three new metrics: a cost/efficiency measure, a performance measure, and a resource consumption measure. The results of our experiments show that ad hoc clusters adapt more readily to changes in the sensing environment, but this higher level of adaptability is at the cost of overall efficiency.
A Comparison of Alternative Distributed Dynamic Cluster Formation Techniques for Industrial Wireless Sensor Networks

PubMed Central

Gholami, Mohammad; Brennan, Robert W.

2016-01-01

In this paper, we investigate alternative distributed clustering techniques for wireless sensor node tracking in an industrial environment. The research builds on extant work on wireless sensor node clustering by reporting on: (1) the development of a novel distributed management approach for tracking mobile nodes in an industrial wireless sensor network; and (2) an objective comparison of alternative cluster management approaches for wireless sensor networks. To perform this comparison, we focus on two main clustering approaches proposed in the literature: pre-defined clusters and ad hoc clusters. These approaches are compared in the context of their reconfigurability: more specifically, we investigate the trade-off between the cost and the effectiveness of competing strategies aimed at adapting to changes in the sensing environment. To support this work, we introduce three new metrics: a cost/efficiency measure, a performance measure, and a resource consumption measure. The results of our experiments show that ad hoc clusters adapt more readily to changes in the sensing environment, but this higher level of adaptability is at the cost of overall efficiency. PMID:26751447
Distributed Algorithm for Voronoi Partition of Wireless Sensor Networks with a Limited Sensing Range

PubMed Central

Feng, Zuren; Ren, Zhigang

2018-01-01

For Wireless Sensor Networks (WSNs), the Voronoi partition of a region is a challenging problem owing to the limited sensing ability of each sensor and the distributed organization of the network. In this paper, an algorithm is proposed for each sensor having a limited sensing range to compute its limited Voronoi cell autonomously, so that the limited Voronoi partition of the entire WSN is generated in a distributed manner. Inspired by Graham’s Scan (GS) algorithm used to compute the convex hull of a point set, the limited Voronoi cell of each sensor is obtained by sequentially scanning two consecutive bisectors between the sensor and its neighbors. The proposed algorithm called the Boundary Scan (BS) algorithm has a lower computational complexity than the existing Range-Constrained Voronoi Cell (RCVC) algorithm and reaches the lower bound of the computational complexity of the algorithms used to solve the problem of this kind. Moreover, it also improves the time efficiency of a key step in the Adjust-Sensing-Radius (ASR) algorithm used to compute the exact Voronoi cell. Extensive numerical simulations are performed to demonstrate the correctness and effectiveness of the BS algorithm. The distributed realization of the BS combined with a localization algorithm in WSNs is used to justify the WSN nature of the proposed algorithm. PMID:29401649
Evaluation of a Delay-Doppler Imaging Algorithm Based on the Wigner-Ville Distribution

DTIC Science & Technology

1989-10-18

exchanging the frequency and time variables. 2.3 PROPERTIES OF THE WIGNER - VILLE DISTRIBUTION A partial list of the properties of the WVD is provided...ESD-TH-89-163 N Technical Report (N R55 00 Lfl Evaluation of a Delay-Doppler Imaging Algorithm Based on the Wigner - Ville Distribution K.I. Schultz 18...DOPPLER IMAGING ALGORITHM BASED ON THE WIGNER - VILLE DISTRIBUTION K.I. SCHULTZ Group 52 TECHNICAL REPORT 855 18 OCTOBER 1989 Approved for public release
Hierarchical Dirichlet process model for gene expression clustering

PubMed Central

2013-01-01

Clustering is an important data processing tool for interpreting microarray data and genomic network inference. In this article, we propose a clustering algorithm based on the hierarchical Dirichlet processes (HDP). The HDP clustering introduces a hierarchical structure in the statistical model which captures the hierarchical features prevalent in biological data such as the gene express data. We develop a Gibbs sampling algorithm based on the Chinese restaurant metaphor for the HDP clustering. We apply the proposed HDP algorithm to both regulatory network segmentation and gene expression clustering. The HDP algorithm is shown to outperform several popular clustering algorithms by revealing the underlying hierarchical structure of the data. For the yeast cell cycle data, we compare the HDP result to the standard result and show that the HDP algorithm provides more information and reduces the unnecessary clustering fragments. PMID:23587447

Distribution of histaminergic neuronal cluster in the rat and mouse hypothalamus.

PubMed

Moriwaki, Chinatsu; Chiba, Seiichi; Wei, Huixing; Aosa, Taishi; Kitamura, Hirokazu; Ina, Keisuke; Shibata, Hirotaka; Fujikura, Yoshihisa

2015-10-01

Histidine decarboxylase (HDC) catalyzes the biosynthesis of histamine from L-histidine and is expressed throughout the mammalian nervous system by histaminergic neurons. Histaminergic neurons arise in the posterior mesencephalon during the early embryonic period and gradually develop into two histaminergic substreams around the lateral area of the posterior hypothalamus and the more anterior peri-cerebral aqueduct area before finally forming an adult-like pattern comprising five neuronal clusters, E1, E2, E3, E4, and E5, at the postnatal stage. This distribution of histaminergic neuronal clusters in the rat hypothalamus appears to be a consequence of neuronal development and reflects the functional differentiation within each neuronal cluster. However, the close linkage between the locations of histaminergic neuronal clusters and their physiological functions has yet to be fully elucidated because of the sparse information regarding the location and orientation of each histaminergic neuronal clusters in the hypothalamus of rats and mice. To clarify the distribution of the five-histaminergic neuronal clusters more clearly, we performed an immunohistochemical study using the anti-HDC antibody on serial sections of the rat hypothalamus according to the brain maps of rat and mouse. Our results confirmed that the HDC-immunoreactive (HDCi) neuronal clusters in the hypothalamus of rats and mice are observed in the ventrolateral part of the most posterior hypothalamus (E1), ventrolateral part of the posterior hypothalamus (E2), ventromedial part from the medial to the posterior hypothalamus (E3), periventricular part from the anterior to the medial hypothalamus (E4), and diffusely extended part of the more dorsal and almost entire hypothalamus (E5). The stereological estimation of the total number of HDCi neurons of each clusters revealed the larger amount of the rat than the mouse. The characterization of histaminergic neuronal clusters in the hypothalamus of rats and
Hybrid fuzzy cluster ensemble framework for tumor clustering from biomolecular data.

PubMed

Yu, Zhiwen; Chen, Hantao; You, Jane; Han, Guoqiang; Li, Le

2013-01-01

Cancer class discovery using biomolecular data is one of the most important tasks for cancer diagnosis and treatment. Tumor clustering from gene expression data provides a new way to perform cancer class discovery. Most of the existing research works adopt single-clustering algorithms to perform tumor clustering is from biomolecular data that lack robustness, stability, and accuracy. To further improve the performance of tumor clustering from biomolecular data, we introduce the fuzzy theory into the cluster ensemble framework for tumor clustering from biomolecular data, and propose four kinds of hybrid fuzzy cluster ensemble frameworks (HFCEF), named as HFCEF-I, HFCEF-II, HFCEF-III, and HFCEF-IV, respectively, to identify samples that belong to different types of cancers. The difference between HFCEF-I and HFCEF-II is that they adopt different ensemble generator approaches to generate a set of fuzzy matrices in the ensemble. Specifically, HFCEF-I applies the affinity propagation algorithm (AP) to perform clustering on the sample dimension and generates a set of fuzzy matrices in the ensemble based on the fuzzy membership function and base samples selected by AP. HFCEF-II adopts AP to perform clustering on the attribute dimension, generates a set of subspaces, and obtains a set of fuzzy matrices in the ensemble by performing fuzzy c-means on subspaces. Compared with HFCEF-I and HFCEF-II, HFCEF-III and HFCEF-IV consider the characteristics of HFCEF-I and HFCEF-II. HFCEF-III combines HFCEF-I and HFCEF-II in a serial way, while HFCEF-IV integrates HFCEF-I and HFCEF-II in a concurrent way. HFCEFs adopt suitable consensus functions, such as the fuzzy c-means algorithm or the normalized cut algorithm (Ncut), to summarize generated fuzzy matrices, and obtain the final results. The experiments on real data sets from UCI machine learning repository and cancer gene expression profiles illustrate that 1) the proposed hybrid fuzzy cluster ensemble frameworks work well on real
Scaling and clustering effects of extreme precipitation distributions

NASA Astrophysics Data System (ADS)

Zhang, Qiang; Zhou, Yu; Singh, Vijay P.; Li, Jianfeng

2012-08-01

SummaryOne of the impacts of climate change and human activities on the hydrological cycle is the change in the precipitation structure. Closely related to the precipitation structure are two characteristics: the volume (m) of wet periods (WPs) and the time interval between WPs or waiting time (t). Using daily precipitation data for a period of 1960-2005 from 590 rain gauge stations in China, these two characteristics are analyzed, involving scaling and clustering of precipitation episodes. Our findings indicate that m and t follow similar probability distribution curves, implying that precipitation processes are controlled by similar underlying thermo-dynamics. Analysis of conditional probability distributions shows a significant dependence of m and t on their previous values of similar volumes, and the dependence tends to be stronger when m is larger or t is longer. It indicates that a higher probability can be expected when high-intensity precipitation is followed by precipitation episodes with similar precipitation intensity and longer waiting time between WPs is followed by the waiting time of similar duration. This result indicates the clustering of extreme precipitation episodes and severe droughts or floods are apt to occur in groups.
Autonomous distributed self-organization for mobile wireless sensor networks.

PubMed

Wen, Chih-Yu; Tang, Hung-Kai

2009-01-01

This paper presents an adaptive combined-metrics-based clustering scheme for mobile wireless sensor networks, which manages the mobile sensors by utilizing the hierarchical network structure and allocates network resources efficiently A local criteria is used to help mobile sensors form a new cluster or join a current cluster. The messages transmitted during hierarchical clustering are applied to choose distributed gateways such that communication for adjacent clusters and distributed topology control can be achieved. In order to balance the load among clusters and govern the topology change, a cluster reformation scheme using localized criterions is implemented. The proposed scheme is simulated and analyzed to abstract the network behaviors in a number of settings. The experimental results show that the proposed algorithm provides efficient network topology management and achieves high scalability in mobile sensor networks.
An image segmentation method based on fuzzy C-means clustering and Cuckoo search algorithm

NASA Astrophysics Data System (ADS)

Wang, Mingwei; Wan, Youchuan; Gao, Xianjun; Ye, Zhiwei; Chen, Maolin

2018-04-01

Image segmentation is a significant step in image analysis and machine vision. Many approaches have been presented in this topic; among them, fuzzy C-means (FCM) clustering is one of the most widely used methods for its high efficiency and ambiguity of images. However, the success of FCM could not be guaranteed because it easily traps into local optimal solution. Cuckoo search (CS) is a novel evolutionary algorithm, which has been tested on some optimization problems and proved to be high-efficiency. Therefore, a new segmentation technique using FCM and blending of CS algorithm is put forward in the paper. Further, the proposed method has been measured on several images and compared with other existing FCM techniques such as genetic algorithm (GA) based FCM and particle swarm optimization (PSO) based FCM in terms of fitness value. Experimental results indicate that the proposed method is robust, adaptive and exhibits the better performance than other methods involved in the paper.
Fast distributed large-pixel-count hologram computation using a GPU cluster.

PubMed

Pan, Yuechao; Xu, Xuewu; Liang, Xinan

2013-09-10

Large-pixel-count holograms are one essential part for big size holographic three-dimensional (3D) display, but the generation of such holograms is computationally demanding. In order to address this issue, we have built a graphics processing unit (GPU) cluster with 32.5 Tflop/s computing power and implemented distributed hologram computation on it with speed improvement techniques, such as shared memory on GPU, GPU level adaptive load balancing, and node level load distribution. Using these speed improvement techniques on the GPU cluster, we have achieved 71.4 times computation speed increase for 186M-pixel holograms. Furthermore, we have used the approaches of diffraction limits and subdivision of holograms to overcome the GPU memory limit in computing large-pixel-count holograms. 745M-pixel and 1.80G-pixel holograms were computed in 343 and 3326 s, respectively, for more than 2 million object points with RGB colors. Color 3D objects with 1.02M points were successfully reconstructed from 186M-pixel hologram computed in 8.82 s with all the above three speed improvement techniques. It is shown that distributed hologram computation using a GPU cluster is a promising approach to increase the computation speed of large-pixel-count holograms for large size holographic display.
Design and implementation of streaming media server cluster based on FFMpeg.

PubMed

Zhao, Hong; Zhou, Chun-long; Jin, Bao-zhao

2015-01-01

Poor performance and network congestion are commonly observed in the streaming media single server system. This paper proposes a scheme to construct a streaming media server cluster system based on FFMpeg. In this scheme, different users are distributed to different servers according to their locations and the balance among servers is maintained by the dynamic load-balancing algorithm based on active feedback. Furthermore, a service redirection algorithm is proposed to improve the transmission efficiency of streaming media data. The experiment results show that the server cluster system has significantly alleviated the network congestion and improved the performance in comparison with the single server system.
Design and Implementation of Streaming Media Server Cluster Based on FFMpeg

PubMed Central

Zhao, Hong; Zhou, Chun-long; Jin, Bao-zhao

2015-01-01

Poor performance and network congestion are commonly observed in the streaming media single server system. This paper proposes a scheme to construct a streaming media server cluster system based on FFMpeg. In this scheme, different users are distributed to different servers according to their locations and the balance among servers is maintained by the dynamic load-balancing algorithm based on active feedback. Furthermore, a service redirection algorithm is proposed to improve the transmission efficiency of streaming media data. The experiment results show that the server cluster system has significantly alleviated the network congestion and improved the performance in comparison with the single server system. PMID:25734187
The mass-ratio and eccentricity distributions of barium and S stars, and red giants in open clusters

NASA Astrophysics Data System (ADS)

Van der Swaelmen, M.; Boffin, H. M. J.; Jorissen, A.; Van Eck, S.

2017-01-01

Context. A complete set of orbital parameters for barium stars, including the longest orbits, has recently been obtained thanks to a radial-velocity monitoring with the HERMES spectrograph installed on the Flemish Mercator telescope. Barium stars are supposed to belong to post-mass-transfer systems. Aims: In order to identify diagnostics distinguishing between pre- and post-mass-transfer systems, the properties of barium stars (more precisely their mass-function distribution and their period-eccentricity (P-e) diagram) are compared to those of binary red giants in open clusters. As a side product, we aim to identify possible post-mass-transfer systems among the cluster giants from the presence of s-process overabundances. We investigate the relation between the s-process enrichment, the location in the (P-e) diagram, and the cluster metallicity and turn-off mass. Methods: To invert the mass-function distribution and derive the mass-ratio distribution, we used the method pioneered by Boffin et al. (1992) that relies on a Richardson-Lucy deconvolution algorithm. The derivation of s-process abundances in the open-cluster giants was performed through spectral synthesis with MARCS model atmospheres. Results: A fraction of 22% of post-mass-transfer systems is found among the cluster binary giants (with companion masses between 0.58 and 0.87 M⊙, typical for white dwarfs), and these systems occupy a wider area than barium stars in the (P-e) diagram. Barium stars have on average lower eccentricities at a given orbital period. When the sample of binary giant stars in clusters is restricted to the subsample of systems occupying the same locus as the barium stars in the (P-e) diagram, and with a mass function compatible with a WD companion, 33% (=4/12) show a chemical signature of mass transfer in the form of s-process overabundances (from rather moderate - about 0.3 dex - to more extreme - about 1 dex). The only strong barium star in our sample is found in the cluster with
Hausdorff clustering

NASA Astrophysics Data System (ADS)

Basalto, Nicolas; Bellotti, Roberto; de Carlo, Francesco; Facchi, Paolo; Pantaleo, Ester; Pascazio, Saverio

2008-10-01

A clustering algorithm based on the Hausdorff distance is analyzed and compared to the single, complete, and average linkage algorithms. The four clustering procedures are applied to a toy example and to the time series of financial data. The dendrograms are scrutinized and their features compared. The Hausdorff linkage relies on firm mathematical grounds and turns out to be very effective when one has to discriminate among complex structures.
Clustering PPI data by combining FA and SHC method.

PubMed

Lei, Xiujuan; Ying, Chao; Wu, Fang-Xiang; Xu, Jin

2015-01-01

Clustering is one of main methods to identify functional modules from protein-protein interaction (PPI) data. Nevertheless traditional clustering methods may not be effective for clustering PPI data. In this paper, we proposed a novel method for clustering PPI data by combining firefly algorithm (FA) and synchronization-based hierarchical clustering (SHC) algorithm. Firstly, the PPI data are preprocessed via spectral clustering (SC) which transforms the high-dimensional similarity matrix into a low dimension matrix. Then the SHC algorithm is used to perform clustering. In SHC algorithm, hierarchical clustering is achieved by enlarging the neighborhood radius of synchronized objects continuously, while the hierarchical search is very difficult to find the optimal neighborhood radius of synchronization and the efficiency is not high. So we adopt the firefly algorithm to determine the optimal threshold of the neighborhood radius of synchronization automatically. The proposed algorithm is tested on the MIPS PPI dataset. The results show that our proposed algorithm is better than the traditional algorithms in precision, recall and f-measure value.
Clustering PPI data by combining FA and SHC method

PubMed Central

2015-01-01

Clustering is one of main methods to identify functional modules from protein-protein interaction (PPI) data. Nevertheless traditional clustering methods may not be effective for clustering PPI data. In this paper, we proposed a novel method for clustering PPI data by combining firefly algorithm (FA) and synchronization-based hierarchical clustering (SHC) algorithm. Firstly, the PPI data are preprocessed via spectral clustering (SC) which transforms the high-dimensional similarity matrix into a low dimension matrix. Then the SHC algorithm is used to perform clustering. In SHC algorithm, hierarchical clustering is achieved by enlarging the neighborhood radius of synchronized objects continuously, while the hierarchical search is very difficult to find the optimal neighborhood radius of synchronization and the efficiency is not high. So we adopt the firefly algorithm to determine the optimal threshold of the neighborhood radius of synchronization automatically. The proposed algorithm is tested on the MIPS PPI dataset. The results show that our proposed algorithm is better than the traditional algorithms in precision, recall and f-measure value. PMID:25707632
Waiting-time distributions of magnetic discontinuities: clustering or Poisson process?

PubMed

Greco, A; Matthaeus, W H; Servidio, S; Dmitruk, P

2009-10-01

Using solar wind data from the Advanced Composition Explorer spacecraft, with the support of Hall magnetohydrodynamic simulations, the waiting-time distributions of magnetic discontinuities have been analyzed. A possible phenomenon of clusterization of these discontinuities is studied in detail. We perform a local Poisson's analysis in order to establish if these intermittent events are randomly distributed or not. Possible implications about the nature of solar wind discontinuities are discussed.
Waiting-time distributions of magnetic discontinuities: Clustering or Poisson process?

DOE Office of Scientific and Technical Information (OSTI.GOV)

Greco, A.; Matthaeus, W. H.; Servidio, S.

2009-10-15

Using solar wind data from the Advanced Composition Explorer spacecraft, with the support of Hall magnetohydrodynamic simulations, the waiting-time distributions of magnetic discontinuities have been analyzed. A possible phenomenon of clusterization of these discontinuities is studied in detail. We perform a local Poisson's analysis in order to establish if these intermittent events are randomly distributed or not. Possible implications about the nature of solar wind discontinuities are discussed.
Speedup of minimum discontinuity phase unwrapping algorithm with a reference phase distribution

NASA Astrophysics Data System (ADS)

Liu, Yihang; Han, Yu; Li, Fengjiao; Zhang, Qican

2018-06-01

In three-dimensional (3D) shape measurement based on phase analysis, the phase analysis process usually produces a wrapped phase map ranging from - π to π with some 2 π discontinuities, and thus a phase unwrapping algorithm is necessary to recover the continuous and nature phase map from which 3D height distribution can be restored. Usually, the minimum discontinuity phase unwrapping algorithm can be used to solve many different kinds of phase unwrapping problems, but its main drawback is that it requires a large amount of computations and has low efficiency in searching for the improving loop within the phase's discontinuity area. To overcome this drawback, an improvement to speedup of the minimum discontinuity phase unwrapping algorithm by using the phase distribution on reference plane is proposed. In this improved algorithm, before the minimum discontinuity phase unwrapping algorithm is carried out to unwrap phase, an integer number K was calculated from the ratio of the wrapped phase to the nature phase on a reference plane. And then the jump counts of the unwrapped phase can be reduced by adding 2K π, so the efficiency of the minimum discontinuity phase unwrapping algorithm is significantly improved. Both simulated and experimental data results verify the feasibility of the proposed improved algorithm, and both of them clearly show that the algorithm works very well and has high efficiency.
Clusternomics: Integrative context-dependent clustering for heterogeneous datasets

PubMed Central

Wernisch, Lorenz

2017-01-01

Integrative clustering is used to identify groups of samples by jointly analysing multiple datasets describing the same set of biological samples, such as gene expression, copy number, methylation etc. Most existing algorithms for integrative clustering assume that there is a shared consistent set of clusters across all datasets, and most of the data samples follow this structure. However in practice, the structure across heterogeneous datasets can be more varied, with clusters being joined in some datasets and separated in others. In this paper, we present a probabilistic clustering method to identify groups across datasets that do not share the same cluster structure. The proposed algorithm, Clusternomics, identifies groups of samples that share their global behaviour across heterogeneous datasets. The algorithm models clusters on the level of individual datasets, while also extracting global structure that arises from the local cluster assignments. Clusters on both the local and the global level are modelled using a hierarchical Dirichlet mixture model to identify structure on both levels. We evaluated the model both on simulated and on real-world datasets. The simulated data exemplifies datasets with varying degrees of common structure. In such a setting Clusternomics outperforms existing algorithms for integrative and consensus clustering. In a real-world application, we used the algorithm for cancer subtyping, identifying subtypes of cancer from heterogeneous datasets. We applied the algorithm to TCGA breast cancer dataset, integrating gene expression, miRNA expression, DNA methylation and proteomics. The algorithm extracted clinically meaningful clusters with significantly different survival probabilities. We also evaluated the algorithm on lung and kidney cancer TCGA datasets with high dimensionality, again showing clinically significant results and scalability of the algorithm. PMID:29036190
Clusternomics: Integrative context-dependent clustering for heterogeneous datasets.

PubMed

Gabasova, Evelina; Reid, John; Wernisch, Lorenz

2017-10-01

Integrative clustering is used to identify groups of samples by jointly analysing multiple datasets describing the same set of biological samples, such as gene expression, copy number, methylation etc. Most existing algorithms for integrative clustering assume that there is a shared consistent set of clusters across all datasets, and most of the data samples follow this structure. However in practice, the structure across heterogeneous datasets can be more varied, with clusters being joined in some datasets and separated in others. In this paper, we present a probabilistic clustering method to identify groups across datasets that do not share the same cluster structure. The proposed algorithm, Clusternomics, identifies groups of samples that share their global behaviour across heterogeneous datasets. The algorithm models clusters on the level of individual datasets, while also extracting global structure that arises from the local cluster assignments. Clusters on both the local and the global level are modelled using a hierarchical Dirichlet mixture model to identify structure on both levels. We evaluated the model both on simulated and on real-world datasets. The simulated data exemplifies datasets with varying degrees of common structure. In such a setting Clusternomics outperforms existing algorithms for integrative and consensus clustering. In a real-world application, we used the algorithm for cancer subtyping, identifying subtypes of cancer from heterogeneous datasets. We applied the algorithm to TCGA breast cancer dataset, integrating gene expression, miRNA expression, DNA methylation and proteomics. The algorithm extracted clinically meaningful clusters with significantly different survival probabilities. We also evaluated the algorithm on lung and kidney cancer TCGA datasets with high dimensionality, again showing clinically significant results and scalability of the algorithm.
Plasma flow around and charge distribution of a dust cluster in a rf discharge

NASA Astrophysics Data System (ADS)

Schleede, J.; Lewerentz, L.; Bronold, F. X.; Schneider, R.; Fehske, H.

2018-04-01

We employ a particle-in-cell Monte Carlo collision/particle-particle particle-mesh simulation to study the plasma flow around and the charge distribution of a three-dimensional dust cluster in the sheath of a low-pressure rf argon discharge. The geometry of the cluster and its position in the sheath are fixed to the experimental values, prohibiting a mechanical response of the cluster. Electrically, however, the cluster and the plasma environment, mimicking also the experimental situation, are coupled self-consistently. We find a broad distribution of the charges collected by the grains. The ion flux shows on the scale of the Debye length strong focusing and shadowing inside and outside the cluster due to the attraction of the ions to the negatively charged grains, whereas the electron flux is characterized on this scale only by a weak spatial modulation of its magnitude depending on the rf phase. On the scale of the individual dust potentials, however, the electron flux deviates in the vicinity of the cluster strongly from the laminar flow associated with the plasma sheath. It develops convection patterns to compensate for the depletion of electrons inside the dust cluster.
A matrix-algebraic formulation of distributed-memory maximal cardinality matching algorithms in bipartite graphs

DOE PAGES

Azad, Ariful; Buluç, Aydın

2016-05-16

We describe parallel algorithms for computing maximal cardinality matching in a bipartite graph on distributed-memory systems. Unlike traditional algorithms that match one vertex at a time, our algorithms process many unmatched vertices simultaneously using a matrix-algebraic formulation of maximal matching. This generic matrix-algebraic framework is used to develop three efficient maximal matching algorithms with minimal changes. The newly developed algorithms have two benefits over existing graph-based algorithms. First, unlike existing parallel algorithms, cardinality of matching obtained by the new algorithms stays constant with increasing processor counts, which is important for predictable and reproducible performance. Second, relying on bulk-synchronous matrix operations,more » these algorithms expose a higher degree of parallelism on distributed-memory platforms than existing graph-based algorithms. We report high-performance implementations of three maximal matching algorithms using hybrid OpenMP-MPI and evaluate the performance of these algorithm using more than 35 real and randomly generated graphs. On real instances, our algorithms achieve up to 200 × speedup on 2048 cores of a Cray XC30 supercomputer. Even higher speedups are obtained on larger synthetically generated graphs where our algorithms show good scaling on up to 16,384 cores.« less
EDMC: An enhanced distributed multi-channel anti-collision algorithm for RFID reader system

NASA Astrophysics Data System (ADS)

Zhang, YuJing; Cui, Yinghua

2017-05-01

In this paper, we proposes an enhanced distributed multi-channel reader anti-collision algorithm for RFID environments which is based on the distributed multi-channel reader anti-collision algorithm for RFID environments (called DiMCA). We proposes a monitor method to decide whether reader receive the latest control news after it selected the data channel. The simulation result shows that it improves interrogation delay.

Planck intermediate results. XLIII. Spectral energy distribution of dust in clusters of galaxies

NASA Astrophysics Data System (ADS)

Planck Collaboration; Adam, R.; Ade, P. A. R.; Aghanim, N.; Ashdown, M.; Aumont, J.; Baccigalupi, C.; Banday, A. J.; Barreiro, R. B.; Bartolo, N.; Battaner, E.; Benabed, K.; Benoit-Lévy, A.; Bersanelli, M.; Bielewicz, P.; Bikmaev, I.; Bonaldi, A.; Bond, J. R.; Borrill, J.; Bouchet, F. R.; Burenin, R.; Burigana, C.; Calabrese, E.; Cardoso, J.-F.; Catalano, A.; Chiang, H. C.; Christensen, P. R.; Churazov, E.; Colombo, L. P. L.; Combet, C.; Comis, B.; Couchot, F.; Crill, B. P.; Curto, A.; Cuttaia, F.; Danese, L.; Davis, R. J.; de Bernardis, P.; de Rosa, A.; de Zotti, G.; Delabrouille, J.; Désert, F.-X.; Diego, J. M.; Dole, H.; Doré, O.; Douspis, M.; Ducout, A.; Dupac, X.; Elsner, F.; Enßlin, T. A.; Finelli, F.; Forni, O.; Frailis, M.; Fraisse, A. A.; Franceschi, E.; Galeotta, S.; Ganga, K.; Génova-Santos, R. T.; Giard, M.; Giraud-Héraud, Y.; Gjerløw, E.; González-Nuevo, J.; Górski, K. M.; Gregorio, A.; Gruppuso, A.; Gudmundsson, J. E.; Hansen, F. K.; Harrison, D. L.; Hernández-Monteagudo, C.; Herranz, D.; Hildebrandt, S. R.; Hivon, E.; Hobson, M.; Hornstrup, A.; Hovest, W.; Hurier, G.; Jaffe, A. H.; Jaffe, T. R.; Jones, W. C.; Keihänen, E.; Keskitalo, R.; Khamitov, I.; Kisner, T. S.; Kneissl, R.; Knoche, J.; Kunz, M.; Kurki-Suonio, H.; Lagache, G.; Lähteenmäki, A.; Lamarre, J.-M.; Lasenby, A.; Lattanzi, M.; Lawrence, C. R.; Leonardi, R.; Levrier, F.; Liguori, M.; Lilje, P. B.; Linden-Vørnle, M.; López-Caniego, M.; Macías-Pérez, J. F.; Maffei, B.; Maggio, G.; Mandolesi, N.; Mangilli, A.; Maris, M.; Martin, P. G.; Martínez-González, E.; Masi, S.; Matarrese, S.; Melchiorri, A.; Mennella, A.; Migliaccio, M.; Miville-Deschênes, M.-A.; Moneti, A.; Montier, L.; Morgante, G.; Mortlock, D.; Munshi, D.; Murphy, J. A.; Naselsky, P.; Nati, F.; Natoli, P.; Nørgaard-Nielsen, H. U.; Novikov, D.; Novikov, I.; Oxborrow, C. A.; Pagano, L.; Pajot, F.; Paoletti, D.; Pasian, F.; Perdereau, O.; Perotto, L.; Pettorino, V.; Piacentini, F.; Piat, M.; Plaszczynski, S.; Pointecouteau, E.; Polenta, G.; Ponthieu, N.; Pratt, G. W.; Prunet, S.; Puget, J.-L.; Rachen, J. P.; Rebolo, R.; Reinecke, M.; Remazeilles, M.; Renault, C.; Renzi, A.; Ristorcelli, I.; Rocha, G.; Rosset, C.; Rossetti, M.; Roudier, G.; Rubiño-Martín, J. A.; Rusholme, B.; Santos, D.; Savelainen, M.; Savini, G.; Scott, D.; Stolyarov, V.; Stompor, R.; Sudiwala, R.; Sunyaev, R.; Sutton, D.; Suur-Uski, A.-S.; Sygnet, J.-F.; Tauber, J. A.; Terenzi, L.; Toffolatti, L.; Tomasi, M.; Tristram, M.; Tucci, M.; Valenziano, L.; Valiviita, J.; Van Tent, F.; Vielva, P.; Villa, F.; Wade, L. A.; Wehus, I. K.; Yvon, D.; Zacchei, A.; Zonca, A.

2016-12-01

Although infrared (IR) overall dust emission from clusters of galaxies has been statistically detected using data from the Infrared Astronomical Satellite (IRAS), it has not been possible to sample the spectral energy distribution (SED) of this emission over its peak, and thus to break the degeneracy between dust temperature and mass. By complementing the IRAS spectral coverage with Planck satellite data from 100 to 857 GHz, we provide new constraints on the IR spectrum of thermal dust emission in clusters of galaxies. We achieve this by using a stacking approach for a sample of several hundred objects from the Planck cluster sample. This procedure averages out fluctuations from the IR sky, allowing us to reach a significant detection of the faint cluster contribution. We also use the large frequency range probed by Planck, together with component-separation techniques, to remove the contamination from both cosmic microwave background anisotropies and the thermal Sunyaev-Zeldovich effect (tSZ) signal, which dominate at ν ≤ 353 GHz. By excluding dominant spurious signals or systematic effects, averaged detections are reported at frequencies 353 GHz ≤ ν ≤ 5000 GHz. We confirm the presence of dust in clusters of galaxies at low and intermediate redshifts, yielding an SED with a shape similar to that of the Milky Way. Planck's resolution does not allow us to investigate the detailed spatial distribution of this emission (e.g. whether it comes from intergalactic dust or simply the dust content of the cluster galaxies), but the radial distribution of the emission appears to follow that of the stacked SZ signal, and thus the extent of the clusters. The recovered SED allows us to constrain the dust mass responsible for the signal and its temperature.
Planck intermediate results: XLIII. Spectral energy distribution of dust in clusters of galaxies

DOE PAGES

Adam, R.; Ade, P. A. R.; Aghanim, N.; ...

2016-12-12

Although infrared (IR) overall dust emission from clusters of galaxies has been statistically detected using data from the Infrared Astronomical Satellite (IRAS), it has not been possible to sample the spectral energy distribution (SED) of this emission over its peak, and thus to break the degeneracy between dust temperature and mass. By complementing the IRAS spectral coverage with Planck satellite data from 100 to 857 GHz, we provide in this paper new constraints on the IR spectrum of thermal dust emission in clusters of galaxies. We achieve this by using a stacking approach for a sample of several hundred objectsmore » from the Planck cluster sample. This procedure averages out fluctuations from the IR sky, allowing us to reach a significant detection of the faint cluster contribution. We also use the large frequency range probed by Planck, together with component-separation techniques, to remove the contamination from both cosmic microwave background anisotropies and the thermal Sunyaev-Zeldovich effect (tSZ) signal, which dominate at ν ≤ 353 GHz. By excluding dominant spurious signals or systematic effects, averaged detections are reported at frequencies 353 GHz ≤ ν ≤ 5000 GHz. We confirm the presence of dust in clusters of galaxies at low and intermediate redshifts, yielding an SED with a shape similar to that of the Milky Way. Planck’s resolution does not allow us to investigate the detailed spatial distribution of this emission (e.g. whether it comes from intergalactic dust or simply the dust content of the cluster galaxies), but the radial distribution of the emission appears to follow that of the stacked SZ signal, and thus the extent of the clusters. Finally, the recovered SED allows us to constrain the dust mass responsible for the signal and its temperature.« less
Application of k-means clustering algorithm in grouping the DNA sequences of hepatitis B virus (HBV)

NASA Astrophysics Data System (ADS)

Bustamam, A.; Tasman, H.; Yuniarti, N.; Frisca, Mursidah, I.

2017-07-01

Based on WHO data, an estimated of 15 millions people worldwide who are infected with hepatitis B (HBsAg+), which is caused by HBV virus, are also infected by hepatitis D, which is caused by HDV virus. Hepatitis D infection can occur simultaneously with hepatitis B (co infection) or after a person is exposed to chronic hepatitis B (super infection). Since HDV cannot live without HBV, HDV infection is closely related to HBV infection, hence it is very realistic that every effort of prevention against hepatitis B can indirectly prevent hepatitis D. This paper presents clustering of HBV DNA sequences by using k-means clustering algorithm and R programming. Clustering processes are started with collecting HBV DNA sequences from GenBank, then performing extraction HBV DNA sequences using n-mers frequency and furthermore the extraction results are collected as a matrix and normalized using the min-max normalization with interval [0, 1] which will later be used as an input data. The number of clusters is two and the initial centroid selected of the cluster is chosen randomly. In each iteration, the distance of every object to each centroid are calculated using the Euclidean distance and the minimum distance is selected to determine the membership in a cluster until two convergent clusters are created. As the result, the HBV viruses in the first cluster is more virulent than the HBV viruses in the second cluster, so the HBV viruses in the first cluster can potentially evolve with HDV viruses that cause hepatitis D.
A Hybrid Spectral Clustering and Deep Neural Network Ensemble Algorithm for Intrusion Detection in Sensor Networks

PubMed Central

Ma, Tao; Wang, Fen; Cheng, Jianjun; Yu, Yang; Chen, Xiaoyun

2016-01-01

The development of intrusion detection systems (IDS) that are adapted to allow routers and network defence systems to detect malicious network traffic disguised as network protocols or normal access is a critical challenge. This paper proposes a novel approach called SCDNN, which combines spectral clustering (SC) and deep neural network (DNN) algorithms. First, the dataset is divided into k subsets based on sample similarity using cluster centres, as in SC. Next, the distance between data points in a testing set and the training set is measured based on similarity features and is fed into the deep neural network algorithm for intrusion detection. Six KDD-Cup99 and NSL-KDD datasets and a sensor network dataset were employed to test the performance of the model. These experimental results indicate that the SCDNN classifier not only performs better than backpropagation neural network (BPNN), support vector machine (SVM), random forest (RF) and Bayes tree models in detection accuracy and the types of abnormal attacks found. It also provides an effective tool of study and analysis of intrusion detection in large networks. PMID:27754380
A Hybrid Spectral Clustering and Deep Neural Network Ensemble Algorithm for Intrusion Detection in Sensor Networks.

PubMed

Ma, Tao; Wang, Fen; Cheng, Jianjun; Yu, Yang; Chen, Xiaoyun

2016-10-13

The development of intrusion detection systems (IDS) that are adapted to allow routers and network defence systems to detect malicious network traffic disguised as network protocols or normal access is a critical challenge. This paper proposes a novel approach called SCDNN, which combines spectral clustering (SC) and deep neural network (DNN) algorithms. First, the dataset is divided into k subsets based on sample similarity using cluster centres, as in SC. Next, the distance between data points in a testing set and the training set is measured based on similarity features and is fed into the deep neural network algorithm for intrusion detection. Six KDD-Cup99 and NSL-KDD datasets and a sensor network dataset were employed to test the performance of the model. These experimental results indicate that the SCDNN classifier not only performs better than backpropagation neural network (BPNN), support vector machine (SVM), random forest (RF) and Bayes tree models in detection accuracy and the types of abnormal attacks found. It also provides an effective tool of study and analysis of intrusion detection in large networks.
Parallel definition of tear film maps on distributed-memory clusters for the support of dry eye diagnosis.

PubMed

González-Domínguez, Jorge; Remeseiro, Beatriz; Martín, María J

2017-02-01

The analysis of the interference patterns on the tear film lipid layer is a useful clinical test to diagnose dry eye syndrome. This task can be automated with a high degree of accuracy by means of the use of tear film maps. However, the time required by the existing applications to generate them prevents a wider acceptance of this method by medical experts. Multithreading has been previously successfully employed by the authors to accelerate the tear film map definition on multicore single-node machines. In this work, we propose a hybrid message-passing and multithreading parallel approach that further accelerates the generation of tear film maps by exploiting the computational capabilities of distributed-memory systems such as multicore clusters and supercomputers. The algorithm for drawing tear film maps is parallelized using Message Passing Interface (MPI) for inter-node communications and the multithreading support available in the C++11 standard for intra-node parallelization. The original algorithm is modified to reduce the communications and increase the scalability. The hybrid method has been tested on 32 nodes of an Intel cluster (with two 12-core Haswell 2680v3 processors per node) using 50 representative images. Results show that maximum runtime is reduced from almost two minutes using the previous only-multithreaded approach to less than ten seconds using the hybrid method. The hybrid MPI/multithreaded implementation can be used by medical experts to obtain tear film maps in only a few seconds, which will significantly accelerate and facilitate the diagnosis of the dry eye syndrome. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Statistical Evaluation of Voltage Variation of Power Distribution System with Clustered Home-Cogeneration Systems

NASA Astrophysics Data System (ADS)

Kato, Takeyoshi; Minagata, Atsushi; Suzuoki, Yasuo

This paper discusses the influence of mass installation of a home co-generation system (H-CGS) using a polymer electrolyte fuel cell (PEFC) on the voltage profile of power distribution system in residential area. The influence of H-CGS is compared with that of photovoltaic power generation systems (PV systems). The operation pattern of H-CGS is assumed based on the electricity and hot-water demand observed in 10 households for a year. The main results are as follows. With the clustered H-CGS, the voltage of each bus is higher by about 1-3% compared with the conventional system without any distributed generators. Because H-CGS tends to increase the output during the early evening, H-CGS contributes to recover the voltage drop during the early evening, resulting in smaller voltage variation of distribution system throughout a day. Because of small rated power output about 1kW, the influence on voltage profile by the clustered H-CGS is smaller than that by the clustered PV systems. The highest voltage during the day time is not so high as compared with the distribution system with the clustered PV systems, even if the reverse power flow from H-CGS is allowed.
Reconstruction of the mass distribution of galaxy clusters from the inversion of the thermal Sunyaev-Zel'dovich effect

NASA Astrophysics Data System (ADS)

Majer, C. L.; Meyer, S.; Konrad, S.; Sarli, E.; Bartelmann, M.

2016-07-01

This paper continues a series in which we intend to show how all observables of galaxy clusters can be combined to recover the two-dimensional, projected gravitational potential of individual clusters. Our goal is to develop a non-parametric algorithm for joint cluster reconstruction taking all cluster observables into account. For this reason we focus on the line-of-sight projected gravitational potential, proportional to the lensing potential, in order to extend existing reconstruction algorithms. In this paper, we begin with the relation between the Compton-y parameter and the Newtonian gravitational potential, assuming hydrostatic equilibrium and a polytropic stratification of the intracluster gas. Extending our first publication we now consider a spheroidal rather than a spherical cluster symmetry. We show how a Richardson-Lucy deconvolution can be used to convert the intensity change of the CMB due to the thermal Sunyaev-Zel'dovich effect into an estimate for the two-dimensional gravitational potential. We apply our reconstruction method to a cluster based on an N-body/hydrodynamical simulation processed with the characteristics (resolution and noise) of the ALMA interferometer for which we achieve a relative error of ≲20 per cent for a large fraction of the virial radius. We further apply our method to an observation of the galaxy cluster RXJ1347 for which we can reconstruct the potential with a relative error of ≲20 per cent for the observable cluster range.
Precipitation Cluster Distributions: Current Climate Storm Statistics and Projected Changes Under Global Warming

NASA Astrophysics Data System (ADS)

Quinn, Kevin Martin

The total amount of precipitation integrated across a precipitation cluster (contiguous precipitating grid cells exceeding a minimum rain rate) is a useful measure of the aggregate size of the disturbance, expressed as the rate of water mass lost or latent heat released, i.e. the power of the disturbance. Probability distributions of cluster power are examined during boreal summer (May-September) and winter (January-March) using satellite-retrieved rain rates from the Tropical Rainfall Measuring Mission (TRMM) 3B42 and Special Sensor Microwave Imager and Sounder (SSM/I and SSMIS) programs, model output from the High Resolution Atmospheric Model (HIRAM, roughly 0.25-0.5 0 resolution), seven 1-2° resolution members of the Coupled Model Intercomparison Project Phase 5 (CMIP5) experiment, and National Center for Atmospheric Research Large Ensemble (NCAR LENS). Spatial distributions of precipitation-weighted centroids are also investigated in observations (TRMM-3B42) and climate models during winter as a metric for changes in mid-latitude storm tracks. Observed probability distributions for both seasons are scale-free from the smallest clusters up to a cutoff scale at high cluster power, after which the probability density drops rapidly. When low rain rates are excluded by choosing a minimum rain rate threshold in defining clusters, the models accurately reproduce observed cluster power statistics and winter storm tracks. Changes in behavior in the tail of the distribution, above the cutoff, are important for impacts since these quantify the frequency of the most powerful storms. End-of-century cluster power distributions and storm track locations are investigated in these models under a "business as usual" global warming scenario. The probability of high cluster power events increases by end-of-century across all models, by up to an order of magnitude for the highest-power events for which statistics can be computed. For the three models in the suite with continuous
An Efficient Distributed Compressed Sensing Algorithm for Decentralized Sensor Network.

PubMed

Liu, Jing; Huang, Kaiyu; Zhang, Guoxian

2017-04-20

We consider the joint sparsity Model 1 (JSM-1) in a decentralized scenario, where a number of sensors are connected through a network and there is no fusion center. A novel algorithm, named distributed compact sensing matrix pursuit (DCSMP), is proposed to exploit the computational and communication capabilities of the sensor nodes. In contrast to the conventional distributed compressed sensing algorithms adopting a random sensing matrix, the proposed algorithm focuses on the deterministic sensing matrices built directly on the real acquisition systems. The proposed DCSMP algorithm can be divided into two independent parts, the common and innovation support set estimation processes. The goal of the common support set estimation process is to obtain an estimated common support set by fusing the candidate support set information from an individual node and its neighboring nodes. In the following innovation support set estimation process, the measurement vector is projected into a subspace that is perpendicular to the subspace spanned by the columns indexed by the estimated common support set, to remove the impact of the estimated common support set. We can then search the innovation support set using an orthogonal matching pursuit (OMP) algorithm based on the projected measurement vector and projected sensing matrix. In the proposed DCSMP algorithm, the process of estimating the common component/support set is decoupled with that of estimating the innovation component/support set. Thus, the inaccurately estimated common support set will have no impact on estimating the innovation support set. It is proven that under the condition the estimated common support set contains the true common support set, the proposed algorithm can find the true innovation set correctly. Moreover, since the innovation support set estimation process is independent of the common support set estimation process, there is no requirement for the cardinality of both sets; thus, the proposed DCSMP
Three-dimensional cluster formation and structure in heterogeneous dose distribution of intensity modulated radiation therapy.

PubMed

Chao, Ming; Wei, Jie; Narayanasamy, Ganesh; Yuan, Yading; Lo, Yeh-Chi; Peñagarícano, José A

2018-05-01

To investigate three-dimensional cluster structure and its correlation to clinical endpoint in heterogeneous dose distributions from intensity modulated radiation therapy. Twenty-five clinical plans from twenty-one head and neck (HN) patients were used for a phenomenological study of the cluster structure formed from the dose distributions of organs at risks (OARs) close to the planning target volumes (PTVs). Initially, OAR clusters were searched to examine the pattern consistence among ten HN patients and five clinically similar plans from another HN patient. Second, clusters of the esophagus from another ten HN patients were scrutinized to correlate their sizes to radiobiological parameters. Finally, an extensive Monte Carlo (MC) procedure was implemented to gain deeper insights into the behavioral properties of the cluster formation. Clinical studies showed that OAR clusters had drastic differences despite similar PTV coverage among different patients, and the radiobiological parameters failed to positively correlate with the cluster sizes. MC study demonstrated the inverse relationship between the cluster size and the cluster connectivity, and the nonlinear changes in cluster size with dose thresholds. In addition, the clusters were insensitive to the shape of OARs. The results demonstrated that the cluster size could serve as an insightful index of normal tissue damage. The clinical outcome of the same dose-volume might be potentially different. Copyright © 2018 Elsevier B.V. All rights reserved.
Model-based clustering for RNA-seq data.

PubMed

Si, Yaqing; Liu, Peng; Li, Pinghua; Brutnell, Thomas P

2014-01-15

RNA-seq technology has been widely adopted as an attractive alternative to microarray-based methods to study global gene expression. However, robust statistical tools to analyze these complex datasets are still lacking. By grouping genes with similar expression profiles across treatments, cluster analysis provides insight into gene functions and networks, and hence is an important technique for RNA-seq data analysis. In this manuscript, we derive clustering algorithms based on appropriate probability models for RNA-seq data. An expectation-maximization algorithm and another two stochastic versions of expectation-maximization algorithms are described. In addition, a strategy for initialization based on likelihood is proposed to improve the clustering algorithms. Moreover, we present a model-based hybrid-hierarchical clustering method to generate a tree structure that allows visualization of relationships among clusters as well as flexibility of choosing the number of clusters. Results from both simulation studies and analysis of a maize RNA-seq dataset show that our proposed methods provide better clustering results than alternative methods such as the K-means algorithm and hierarchical clustering methods that are not based on probability models. An R package, MBCluster.Seq, has been developed to implement our proposed algorithms. This R package provides fast computation and is publicly available at http://www.r-project.org
Distributed Similarity based Clustering and Compressed Forwarding for wireless sensor networks.

PubMed

Arunraja, Muruganantham; Malathi, Veluchamy; Sakthivel, Erulappan

2015-11-01

Wireless sensor networks are engaged in various data gathering applications. The major bottleneck in wireless data gathering systems is the finite energy of sensor nodes. By conserving the on board energy, the life span of wireless sensor network can be well extended. Data communication being the dominant energy consuming activity of wireless sensor network, data reduction can serve better in conserving the nodal energy. Spatial and temporal correlation among the sensor data is exploited to reduce the data communications. Data similar cluster formation is an effective way to exploit spatial correlation among the neighboring sensors. By sending only a subset of data and estimate the rest using this subset is the contemporary way of exploiting temporal correlation. In Distributed Similarity based Clustering and Compressed Forwarding for wireless sensor networks, we construct data similar iso-clusters with minimal communication overhead. The intra-cluster communication is reduced using adaptive-normalized least mean squares based dual prediction framework. The cluster head reduces the inter-cluster data payload using a lossless compressive forwarding technique. The proposed work achieves significant data reduction in both the intra-cluster and the inter-cluster communications, with the optimal data accuracy of collected data. Copyright © 2015 ISA. Published by Elsevier Ltd. All rights reserved.
On the modification Highly Connected Subgraphs (HCS) algorithm in graph clustering for weighted graph

NASA Astrophysics Data System (ADS)

Albirri, E. R.; Sugeng, K. A.; Aldila, D.

2018-04-01

Nowadays, in the modern world, since technology and human civilization start to progress, all city in the world is almost connected. The various places in this world are easier to visit. It is an impact of transportation technology and highway construction. The cities which have been connected can be represented by graph. Graph clustering is one of ways which is used to answer some problems represented by graph. There are some methods in graph clustering to solve the problem spesifically. One of them is Highly Connected Subgraphs (HCS) method. HCS is used to identify cluster based on the graph connectivity k for graph G. The connectivity in graph G is denoted by k(G)> \\frac{n}{2} that n is the total of vertices in G, then it is called as HCS or the cluster. This research used literature review and completed with simulation of program in a software. We modified HCS algorithm by using weighted graph. The modification is located in the Process Phase. Process Phase is used to cut the connected graph G into two subgraphs H and \\bar{H}. We also made a program by using software Octave-401. Then we applied the data of Flight Routes Mapping of One of Airlines in Indonesia to our program.
2D evaluation of spectral LIBS data derived from heterogeneous materials using cluster algorithm

NASA Astrophysics Data System (ADS)

Gottlieb, C.; Millar, S.; Grothe, S.; Wilsch, G.

2017-08-01

Laser-induced Breakdown Spectroscopy (LIBS) is capable of providing spatially resolved element maps in regard to the chemical composition of the sample. The evaluation of heterogeneous materials is often a challenging task, especially in the case of phase boundaries. In order to determine information about a certain phase of a material, the need for a method that offers an objective evaluation is necessary. This paper will introduce a cluster algorithm in the case of heterogeneous building materials (concrete) to separate the spectral information of non-relevant aggregates and cement matrix. In civil engineering, the information about the quantitative ingress of harmful species like Cl-, Na+ and SO42- is of great interest in the evaluation of the remaining lifetime of structures (Millar et al., 2015; Wilsch et al., 2005). These species trigger different damage processes such as the alkali-silica reaction (ASR) or the chloride-induced corrosion of the reinforcement. Therefore, a discrimination between the different phases, mainly cement matrix and aggregates, is highly important (Weritz et al., 2006). For the 2D evaluation, the expectation-maximization-algorithm (EM algorithm; Ester and Sander, 2000) has been tested for the application presented in this work. The method has been introduced and different figures of merit have been presented according to recommendations given in Haddad et al. (2014). Advantages of this method will be highlighted. After phase separation, non-relevant information can be excluded and only the wanted phase displayed. Using a set of samples with known and unknown composition, the EM-clustering method has been validated regarding to Gustavo González and Ángeles Herrador (2007).
Truncation of the Binary Distribution Function in Globular Cluster Formation

NASA Astrophysics Data System (ADS)

Vesperini, E.; Chernoff, David F.

1996-02-01

We investigate a population of primordial binaries during the initial stage of evolution of a star cluster. For our calculations we assume that equal-mass stars form rapidly in a tidally truncated gas cloud, that ˜10% of the stars are in binaries, and that the resulting star cluster undergoes an epoch of violent relaxation. We study the collisional interaction of the binaries and single stars, in particular, the ionization of the binaries and the energy exchange between binaries and single stars. We find that for large N systems (N > 1000), even the most violent beginning leaves the binary distribution function largely intact. Hence, the binding energy originally tied up in the cloud's protostellar pairs is preserved during the relaxation process, and the binaries are available to interact at later times within the virialized cluster.
Estimating the ratios of the stationary distribution values for Markov chains modeling evolutionary algorithms.

PubMed

Mitavskiy, Boris; Cannings, Chris

2009-01-01

The evolutionary algorithm stochastic process is well-known to be Markovian. These have been under investigation in much of the theoretical evolutionary computing research. When the mutation rate is positive, the Markov chain modeling of an evolutionary algorithm is irreducible and, therefore, has a unique stationary distribution. Rather little is known about the stationary distribution. In fact, the only quantitative facts established so far tell us that the stationary distributions of Markov chains modeling evolutionary algorithms concentrate on uniform populations (i.e., those populations consisting of a repeated copy of the same individual). At the same time, knowing the stationary distribution may provide some information about the expected time it takes for the algorithm to reach a certain solution, assessment of the biases due to recombination and selection, and is of importance in population genetics to assess what is called a "genetic load" (see the introduction for more details). In the recent joint works of the first author, some bounds have been established on the rates at which the stationary distribution concentrates on the uniform populations. The primary tool used in these papers is the "quotient construction" method. It turns out that the quotient construction method can be exploited to derive much more informative bounds on ratios of the stationary distribution values of various subsets of the state space. In fact, some of the bounds obtained in the current work are expressed in terms of the parameters involved in all the three main stages of an evolutionary algorithm: namely, selection, recombination, and mutation.
The distribution of mass for spiral galaxies in clusters and in the field

DOE Office of Scientific and Technical Information (OSTI.GOV)

Forbes, D.A.; Whitmore, B.C.

1989-04-01

A comparison is made between the mass distributions of spiral galaxies in clusters and in the field using Burstein's mass-type methodology. Both the H-alpha emission-line rotation curves and more extended H I rotation curves are used. The fitting technique for determining mass types used by Burstein and coworkers has been replaced by an objective chi-sq method. Mass types are shown to be a function of both the Hubble type and luminosity, contrary to earlier results. The present data show a difference in the distribution of mass types for spiral galaxies in the field and in clusters, in the sense thatmore » mass type I galaxies, where the inner and outer velocity gradients are similar, are generally found in the field rather than in clusters. This can be understood in terms of the results of Whitmore, Forbes, and Rubin (1988), who find that the rotation curves of galaxies in the central region of clusters are generally failing, while the outer galaxies in a cluster and field galaxies tend to have flat or rising rotation curves. 15 refs.« less
Research on distributed heterogeneous data PCA algorithm based on cloud platform

NASA Astrophysics Data System (ADS)

Zhang, Jin; Huang, Gang

2018-05-01

Principal component analysis (PCA) of heterogeneous data sets can solve the problem that centralized data scalability is limited. In order to reduce the generation of intermediate data and error components of distributed heterogeneous data sets, a principal component analysis algorithm based on heterogeneous data sets under cloud platform is proposed. The algorithm performs eigenvalue processing by using Householder tridiagonalization and QR factorization to calculate the error component of the heterogeneous database associated with the public key to obtain the intermediate data set and the lost information. Experiments on distributed DBM heterogeneous datasets show that the model method has the feasibility and reliability in terms of execution time and accuracy.
Improved mine blast algorithm for optimal cost design of water distribution systems

NASA Astrophysics Data System (ADS)

Sadollah, Ali; Guen Yoo, Do; Kim, Joong Hoon

2015-12-01

The design of water distribution systems is a large class of combinatorial, nonlinear optimization problems with complex constraints such as conservation of mass and energy equations. Since feasible solutions are often extremely complex, traditional optimization techniques are insufficient. Recently, metaheuristic algorithms have been applied to this class of problems because they are highly efficient. In this article, a recently developed optimizer called the mine blast algorithm (MBA) is considered. The MBA is improved and coupled with the hydraulic simulator EPANET to find the optimal cost design for water distribution systems. The performance of the improved mine blast algorithm (IMBA) is demonstrated using the well-known Hanoi, New York tunnels and Balerma benchmark networks. Optimization results obtained using IMBA are compared to those using MBA and other optimizers in terms of their minimum construction costs and convergence rates. For the complex Balerma network, IMBA offers the cheapest network design compared to other optimization algorithms.

A distributed geo-routing algorithm for wireless sensor networks.

PubMed

Joshi, Gyanendra Prasad; Kim, Sung Won

2009-01-01

Geographic wireless sensor networks use position information for greedy routing. Greedy routing works well in dense networks, whereas in sparse networks it may fail and require a recovery algorithm. Recovery algorithms help the packet to get out of the communication void. However, these algorithms are generally costly for resource constrained position-based wireless sensor networks (WSNs). In this paper, we propose a void avoidance algorithm (VAA), a novel idea based on upgrading virtual distance. VAA allows wireless sensor nodes to remove all stuck nodes by transforming the routing graph and forwarding packets using only greedy routing. In VAA, the stuck node upgrades distance unless it finds a next hop node that is closer to the destination than it is. VAA guarantees packet delivery if there is a topologically valid path. Further, it is completely distributed, immediately responds to node failure or topology changes and does not require planarization of the network. NS-2 is used to evaluate the performance and correctness of VAA and we compare its performance to other protocols. Simulations show our proposed algorithm consumes less energy, has an efficient path and substantially less control overheads.
Dimension-Factorized Range Migration Algorithm for Regularly Distributed Array Imaging

PubMed Central

Guo, Qijia; Wang, Jie; Chang, Tianying

2017-01-01

The two-dimensional planar MIMO array is a popular approach for millimeter wave imaging applications. As a promising practical alternative, sparse MIMO arrays have been devised to reduce the number of antenna elements and transmitting/receiving channels with predictable and acceptable loss in image quality. In this paper, a high precision three-dimensional imaging algorithm is proposed for MIMO arrays of the regularly distributed type, especially the sparse varieties. Termed the Dimension-Factorized Range Migration Algorithm, the new imaging approach factorizes the conventional MIMO Range Migration Algorithm into multiple operations across the sparse dimensions. The thinner the sparse dimensions of the array, the more efficient the new algorithm will be. Advantages of the proposed approach are demonstrated by comparison with the conventional MIMO Range Migration Algorithm and its non-uniform fast Fourier transform based variant in terms of all the important characteristics of the approaches, especially the anti-noise capability. The computation cost is analyzed as well to evaluate the efficiency quantitatively. PMID:29113083
Supply chain management using fp-growth algorithm for medicine distribution

NASA Astrophysics Data System (ADS)

Wahana, A.; Maylawati, D. S.; Irfan, M.; Effendy, H.

2018-03-01

Distribution of drugs evenly in accordance with the needs of Public Health Center (Puskesmas) become one of the responsibilities by the Health Office in Indonesia. This study aims to provide recommendations for distribution of drugs from the Department of Health according to the needs of each Puskesmas. Because often the distribution of drugs is not in accordance with the needs of medicines stock of each Puskesmas. This causes the possibility of drug stock void in Puskesmas in need, while there is excess stock of drugs in Puskesmas that do not need. Supply Chain Management (SCM) is applied as a controlling drug stock at Puskesmas in order to avoid drug vacuum or excess drug that eventually unused. In addition, the Frequent Pattern Growth (FP-Growth) algorithm that generates frequent item sets is used to provide drug distribution recommendations by looking at the highest frequency of drug occurrences and frequent drug frequencies. Based on testing black box system conducted in Health Office Purwakarta regency of Indonesia which oversees 20 Puskesmas with 100 data of drug distribution transactions, it can be concluded that system functionality is running well and SCM successfully implemented to arrange distribution process of medicine well. Furthermore FP-Growth algorithm was able to provide recommendations for distribution of drugs with a high success rate. This is evidenced by the test results with various combinations of input parameters, FP-Growth is able to produce the right frequent item sets.
Scalable Algorithms for Clustering Large Geospatiotemporal Data Sets on Manycore Architectures

NASA Astrophysics Data System (ADS)

Mills, R. T.; Hoffman, F. M.; Kumar, J.; Sreepathi, S.; Sripathi, V.

2016-12-01

The increasing availability of high-resolution geospatiotemporal data sets from sources such as observatory networks, remote sensing platforms, and computational Earth system models has opened new possibilities for knowledge discovery using data sets fused from disparate sources. Traditional algorithms and computing platforms are impractical for the analysis and synthesis of data sets of this size; however, new algorithmic approaches that can effectively utilize the complex memory hierarchies and the extremely high levels of available parallelism in state-of-the-art high-performance computing platforms can enable such analysis. We describe a massively parallel implementation of accelerated k-means clustering and some optimizations to boost computational intensity and utilization of wide SIMD lanes on state-of-the art multi- and manycore processors, including the second-generation Intel Xeon Phi ("Knights Landing") processor based on the Intel Many Integrated Core (MIC) architecture, which includes several new features, including an on-package high-bandwidth memory. We also analyze the code in the context of a few practical applications to the analysis of climatic and remotely-sensed vegetation phenology data sets, and speculate on some of the new applications that such scalable analysis methods may enable.
Implementation of Automatic Clustering Algorithm and Fuzzy Time Series in Motorcycle Sales Forecasting

NASA Astrophysics Data System (ADS)

Rasim; Junaeti, E.; Wirantika, R.

2018-01-01

Accurate forecasting for the sale of a product depends on the forecasting method used. The purpose of this research is to build motorcycle sales forecasting application using Fuzzy Time Series method combined with interval determination using automatic clustering algorithm. Forecasting is done using the sales data of motorcycle sales in the last ten years. Then the error rate of forecasting is measured using Means Percentage Error (MPE) and Means Absolute Percentage Error (MAPE). The results of forecasting in the one-year period obtained in this study are included in good accuracy.
Algorithm to determine the percolation largest component in interconnected networks.

PubMed

Schneider, Christian M; Araújo, Nuno A M; Herrmann, Hans J

2013-04-01

Interconnected networks have been shown to be much more vulnerable to random and targeted failures than isolated ones, raising several interesting questions regarding the identification and mitigation of their risk. The paradigm to address these questions is the percolation model, where the resilience of the system is quantified by the dependence of the size of the largest cluster on the number of failures. Numerically, the major challenge is the identification of this cluster and the calculation of its size. Here, we propose an efficient algorithm to tackle this problem. We show that the algorithm scales as O(NlogN), where N is the number of nodes in the network, a significant improvement compared to O(N(2)) for a greedy algorithm, which permits studying much larger networks. Our new strategy can be applied to any network topology and distribution of interdependencies, as well as any sequence of failures.
UV TO FAR-IR CATALOG OF A GALAXY SAMPLE IN NEARBY CLUSTERS: SPECTRAL ENERGY DISTRIBUTIONS AND ENVIRONMENTAL TRENDS

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hernandez-Fernandez, Jonathan D.; Iglesias-Paramo, J.; Vilchez, J. M., E-mail: jonatan@iaa.es

2012-03-01

In this paper, we present a sample of cluster galaxies devoted to study the environmental influence on the star formation activity. This sample of galaxies inhabits in clusters showing a rich variety in their characteristics and have been observed by the SDSS-DR6 down to M{sub B} {approx} -18, and by the Galaxy Evolution Explorer AIS throughout sky regions corresponding to several megaparsecs. We assign the broadband and emission-line fluxes from ultraviolet to far-infrared to each galaxy performing an accurate spectral energy distribution for spectral fitting analysis. The clusters follow the general X-ray luminosity versus velocity dispersion trend of L{sub X}more » {proportional_to} {sigma}{sup 4.4}{sub c}. The analysis of the distributions of galaxy density counting up to the 5th nearest neighbor {Sigma}{sub 5} shows: (1) the virial regions and the cluster outskirts share a common range in the high density part of the distribution. This can be attributed to the presence of massive galaxy structures in the surroundings of virial regions. (2) The virial regions of massive clusters ({sigma}{sub c} > 550 km s{sup -1}) present a {Sigma}{sub 5} distribution statistically distinguishable ({approx}96%) from the corresponding distribution of low-mass clusters ({sigma}{sub c} < 550 km s{sup -1}). Both massive and low-mass clusters follow a similar density-radius trend, but the low-mass clusters avoid the high density extreme. We illustrate, with ABELL 1185, the environmental trends of galaxy populations. Maps of sky projected galaxy density show how low-luminosity star-forming galaxies appear distributed along more spread structures than their giant counterparts, whereas low-luminosity passive galaxies avoid the low-density environment. Giant passive and star-forming galaxies share rather similar sky regions with passive galaxies exhibiting more concentrated distributions.« less
Canonical PSO Based K-Means Clustering Approach for Real Datasets.

PubMed

Dey, Lopamudra; Chakraborty, Sanjay

2014-01-01

"Clustering" the significance and application of this technique is spread over various fields. Clustering is an unsupervised process in data mining, that is why the proper evaluation of the results and measuring the compactness and separability of the clusters are important issues. The procedure of evaluating the results of a clustering algorithm is known as cluster validity measure. Different types of indexes are used to solve different types of problems and indices selection depends on the kind of available data. This paper first proposes Canonical PSO based K-means clustering algorithm and also analyses some important clustering indices (intercluster, intracluster) and then evaluates the effects of those indices on real-time air pollution database, wholesale customer, wine, and vehicle datasets using typical K-means, Canonical PSO based K-means, simple PSO based K-means, DBSCAN, and Hierarchical clustering algorithms. This paper also describes the nature of the clusters and finally compares the performances of these clustering algorithms according to the validity assessment. It also defines which algorithm will be more desirable among all these algorithms to make proper compact clusters on this particular real life datasets. It actually deals with the behaviour of these clustering algorithms with respect to validation indexes and represents their results of evaluation in terms of mathematical and graphical forms.
Exploring cluster Monte Carlo updates with Boltzmann machines

NASA Astrophysics Data System (ADS)

Wang, Lei

2017-11-01

Boltzmann machines are physics informed generative models with broad applications in machine learning. They model the probability distribution of an input data set with latent variables and generate new samples accordingly. Applying the Boltzmann machines back to physics, they are ideal recommender systems to accelerate the Monte Carlo simulation of physical systems due to their flexibility and effectiveness. More intriguingly, we show that the generative sampling of the Boltzmann machines can even give different cluster Monte Carlo algorithms. The latent representation of the Boltzmann machines can be designed to mediate complex interactions and identify clusters of the physical system. We demonstrate these findings with concrete examples of the classical Ising model with and without four-spin plaquette interactions. In the future, automatic searches in the algorithm space parametrized by Boltzmann machines may discover more innovative Monte Carlo updates.
Semi-supervised clustering methods.

PubMed

Bair, Eric

2013-01-01

Cluster analysis methods seek to partition a data set into homogeneous subgroups. It is useful in a wide variety of applications, including document processing and modern genetics. Conventional clustering methods are unsupervised, meaning that there is no outcome variable nor is anything known about the relationship between the observations in the data set. In many situations, however, information about the clusters is available in addition to the values of the features. For example, the cluster labels of some observations may be known, or certain observations may be known to belong to the same cluster. In other cases, one may wish to identify clusters that are associated with a particular outcome variable. This review describes several clustering algorithms (known as "semi-supervised clustering" methods) that can be applied in these situations. The majority of these methods are modifications of the popular k-means clustering method, and several of them will be described in detail. A brief description of some other semi-supervised clustering algorithms is also provided.
Optimal colour quality of LED clusters based on memory colours.

PubMed

Smet, Kevin; Ryckaert, Wouter R; Pointer, Michael R; Deconinck, Geert; Hanselaer, Peter

2011-03-28

The spectral power distributions of tri- and tetrachromatic clusters of Light-Emitting-Diodes, composed of simulated and commercially available LEDs, were optimized with a genetic algorithm to maximize the luminous efficacy of radiation and the colour quality as assessed by the memory colour quality metric developed by the authors. The trade-off of the colour quality as assessed by the memory colour metric and the luminous efficacy of radiation was investigated by calculating the Pareto optimal front using the NSGA-II genetic algorithm. Optimal peak wavelengths and spectral widths of the LEDs were derived, and over half of them were found to be close to Thornton's prime colours. The Pareto optimal fronts of real LED clusters were always found to be smaller than those of the simulated clusters. The effect of binning on designing a real LED cluster was investigated and was found to be quite large. Finally, a real LED cluster of commercially available AlGaInP, InGaN and phosphor white LEDs was optimized to obtain a higher score on memory colour quality scale than its corresponding CIE reference illuminant.
Clustering Multiple Sclerosis Subgroups with Multifractal Methods and Self-Organizing Map Algorithm

NASA Astrophysics Data System (ADS)

Karaca, Yeliz; Cattani, Carlo

Magnetic resonance imaging (MRI) is the most sensitive method to detect chronic nervous system diseases such as multiple sclerosis (MS). In this paper, Brownian motion Hölder regularity functions (polynomial, periodic (sine), exponential) for 2D image, such as multifractal methods were applied to MR brain images, aiming to easily identify distressed regions, in MS patients. With these regions, we have proposed an MS classification based on the multifractal method by using the Self-Organizing Map (SOM) algorithm. Thus, we obtained a cluster analysis by identifying pixels from distressed regions in MR images through multifractal methods and by diagnosing subgroups of MS patients through artificial neural networks.
Quantum annealing for combinatorial clustering

NASA Astrophysics Data System (ADS)

Kumar, Vaibhaw; Bass, Gideon; Tomlin, Casey; Dulny, Joseph

2018-02-01

Clustering is a powerful machine learning technique that groups "similar" data points based on their characteristics. Many clustering algorithms work by approximating the minimization of an objective function, namely the sum of within-the-cluster distances between points. The straightforward approach involves examining all the possible assignments of points to each of the clusters. This approach guarantees the solution will be a global minimum; however, the number of possible assignments scales quickly with the number of data points and becomes computationally intractable even for very small datasets. In order to circumvent this issue, cost function minima are found using popular local search-based heuristic approaches such as k-means and hierarchical clustering. Due to their greedy nature, such techniques do not guarantee that a global minimum will be found and can lead to sub-optimal clustering assignments. Other classes of global search-based techniques, such as simulated annealing, tabu search, and genetic algorithms, may offer better quality results but can be too time-consuming to implement. In this work, we describe how quantum annealing can be used to carry out clustering. We map the clustering objective to a quadratic binary optimization problem and discuss two clustering algorithms which are then implemented on commercially available quantum annealing hardware, as well as on a purely classical solver "qbsolv." The first algorithm assigns N data points to K clusters, and the second one can be used to perform binary clustering in a hierarchical manner. We present our results in the form of benchmarks against well-known k-means clustering and discuss the advantages and disadvantages of the proposed techniques.
The rates and time-delay distribution of multiply imaged supernovae behind lensing clusters

NASA Astrophysics Data System (ADS)

Li, Xue; Hjorth, Jens; Richard, Johan

2012-11-01

Time delays of gravitationally lensed sources can be used to constrain the mass model of a deflector and determine cosmological parameters. We here present an analysis of the time-delay distribution of multiply imaged sources behind 17 strong lensing galaxy clusters with well-calibrated mass models. We find that for time delays less than 1000 days, at z = 3.0, their logarithmic probability distribution functions are well represented by P(log Δt) = 5.3 × 10-4Δttilde beta/M2502tilde beta, with tilde beta = 0.77, where M250 is the projected cluster mass inside 250 kpc (in 1014M⊙), and tilde beta is the power-law slope of the distribution. The resultant probability distribution function enables us to estimate the time-delay distribution in a lensing cluster of known mass. For a cluster with M250 = 2 × 1014M⊙, the fraction of time delays less than 1000 days is approximately 3%. Taking Abell 1689 as an example, its dark halo and brightest galaxies, with central velocity dispersions σ>=500kms-1, mainly produce large time delays, while galaxy-scale mass clumps are responsible for generating smaller time delays. We estimate the probability of observing multiple images of a supernova in the known images of Abell 1689. A two-component model of estimating the supernova rate is applied in this work. For a magnitude threshold of mAB = 26.5, the yearly rate of Type Ia (core-collapse) supernovae with time delays less than 1000 days is 0.004±0.002 (0.029±0.001). If the magnitude threshold is lowered to mAB ~ 27.0, the rate of core-collapse supernovae suitable for time delay observation is 0.044±0.015 per year.
Grain Cluster Microstructure and Grain Boundary Character Distribution in Alloy 690

NASA Astrophysics Data System (ADS)

Xia, Shuang; Zhou, Bangxin; Chen, Wenjue

2009-12-01

The effects of thermal-mechanical processing (TMP) on microstructure evolution during recrystallization and grain boundary character distribution (GBCD) in aged Alloy 690 were investigated by the electron backscatter diffraction (EBSD) technique and optical microscopy. The original grain boundaries of the deformed microstructure did not play an important role in the manipulation of the proportion of the Σ3 n ( n = 1, 2, 3…) type boundaries. Instead, the grain cluster formed by multiple twinning starting from a single nucleus during recrystallization was the key microstructural feature affecting the GBCD. All of the grains in this kind of cluster had Σ3 n mutual misorientations regardless of whether they were adjacent. A large grain cluster containing 91 grains was found in the sample after a small-strain (5 pct) and a high-temperature (1100 °C) recrystallization anneal, and twin relationships up to the ninth generation (Σ39) were found in this cluster. The ratio of cluster size over grain size (including all types of boundaries as defining individual grains) dictated the proportion of Σ3 n boundaries.
Exploratory Item Classification Via Spectral Graph Clustering

PubMed Central

Chen, Yunxiao; Li, Xiaoou; Liu, Jingchen; Xu, Gongjun; Ying, Zhiliang

2017-01-01

Large-scale assessments are supported by a large item pool. An important task in test development is to assign items into scales that measure different characteristics of individuals, and a popular approach is cluster analysis of items. Classical methods in cluster analysis, such as the hierarchical clustering, K-means method, and latent-class analysis, often induce a high computational overhead and have difficulty handling missing data, especially in the presence of high-dimensional responses. In this article, the authors propose a spectral clustering algorithm for exploratory item cluster analysis. The method is computationally efficient, effective for data with missing or incomplete responses, easy to implement, and often outperforms traditional clustering algorithms in the context of high dimensionality. The spectral clustering algorithm is based on graph theory, a branch of mathematics that studies the properties of graphs. The algorithm first constructs a graph of items, characterizing the similarity structure among items. It then extracts item clusters based on the graphical structure, grouping similar items together. The proposed method is evaluated through simulations and an application to the revised Eysenck Personality Questionnaire. PMID:29033476
Enhancing PC Cluster-Based Parallel Branch-and-Bound Algorithms for the Graph Coloring Problem

NASA Astrophysics Data System (ADS)

Taoka, Satoshi; Takafuji, Daisuke; Watanabe, Toshimasa

A branch-and-bound algorithm (BB for short) is the most general technique to deal with various combinatorial optimization problems. Even if it is used, computation time is likely to increase exponentially. So we consider its parallelization to reduce it. It has been reported that the computation time of a parallel BB heavily depends upon node-variable selection strategies. And, in case of a parallel BB, it is also necessary to prevent increase in communication time. So, it is important to pay attention to how many and what kind of nodes are to be transferred (called sending-node selection strategy). In this paper, for the graph coloring problem, we propose some sending-node selection strategies for a parallel BB algorithm by adopting MPI for parallelization and experimentally evaluate how these strategies affect computation time of a parallel BB on a PC cluster network.
Multi-agent coordination algorithms for control of distributed energy resources in smart grids

NASA Astrophysics Data System (ADS)

Cortes, Andres

Sustainable energy is a top-priority for researchers these days, since electricity and transportation are pillars of modern society. Integration of clean energy technologies such as wind, solar, and plug-in electric vehicles (PEVs), is a major engineering challenge in operation and management of power systems. This is due to the uncertain nature of renewable energy technologies and the large amount of extra load that PEVs would add to the power grid. Given the networked structure of a power system, multi-agent control and optimization strategies are natural approaches to address the various problems of interest for the safe and reliable operation of the power grid. The distributed computation in multi-agent algorithms addresses three problems at the same time: i) it allows for the handling of problems with millions of variables that a single processor cannot compute, ii) it allows certain independence and privacy to electricity customers by not requiring any usage information, and iii) it is robust to localized failures in the communication network, being able to solve problems by simply neglecting the failing section of the system. We propose various algorithms to coordinate storage, generation, and demand resources in a power grid using multi-agent computation and decentralized decision making. First, we introduce a hierarchical vehicle-one-grid (V1G) algorithm for coordination of PEVs under usage constraints, where energy only flows from the grid in to the batteries of PEVs. We then present a hierarchical vehicle-to-grid (V2G) algorithm for PEV coordination that takes into consideration line capacity constraints in the distribution grid, and where energy flows both ways, from the grid in to the batteries, and from the batteries to the grid. Next, we develop a greedy-like hierarchical algorithm for management of demand response events with on/off loads. Finally, we introduce distributed algorithms for the optimal control of distributed energy resources, i
Tools for Analyzing Computing Resource Management Strategies and Algorithms for SDR Clouds

NASA Astrophysics Data System (ADS)

Marojevic, Vuk; Gomez-Miguelez, Ismael; Gelonch, Antoni

2012-09-01

Software defined radio (SDR) clouds centralize the computing resources of base stations. The computing resource pool is shared between radio operators and dynamically loads and unloads digital signal processing chains for providing wireless communications services on demand. Each new user session request particularly requires the allocation of computing resources for executing the corresponding SDR transceivers. The huge amount of computing resources of SDR cloud data centers and the numerous session requests at certain hours of a day require an efficient computing resource management. We propose a hierarchical approach, where the data center is divided in clusters that are managed in a distributed way. This paper presents a set of computing resource management tools for analyzing computing resource management strategies and algorithms for SDR clouds. We use the tools for evaluating a different strategies and algorithms. The results show that more sophisticated algorithms can achieve higher resource occupations and that a tradeoff exists between cluster size and algorithm complexity.
The distribution of dark matter in the A2256 cluster

NASA Technical Reports Server (NTRS)

Henry, J. Patrick; Briel, Ulrich G.; Nulsen, Paul E. J.

1993-01-01

Using spatially resolved X-ray spectroscopy, it was determined that the X-ray emitting gas in the rich cluster A2256 is nearly isothermal to a radius of at least 0.76/h Mpc, or about three core radii. These data can be used to measure the distribution of the dark matter in the cluster. It was found that the total mass interior to 0.76/h Mpc and 1.5/h Mpc is (0.5 +/- 0.1 and 1.0 +/- 0.5) x 10(exp 15)/h of the solar mass respectively where the errors encompass the full range allowed by all models used. Thus, the mass appropriate to the region where spectral information was obtained is well determined, but the uncertainties become large upon extrapolating beyond that region. It is shown that the galaxy orbits are midly anisotropic which may cause the beta discrepancy in this cluster.

Inference from clustering with application to gene-expression microarrays.

PubMed

Dougherty, Edward R; Barrera, Junior; Brun, Marcel; Kim, Seungchan; Cesar, Roberto M; Chen, Yidong; Bittner, Michael; Trent, Jeffrey M

2002-01-01

There are many algorithms to cluster sample data points based on nearness or a similarity measure. Often the implication is that points in different clusters come from different underlying classes, whereas those in the same cluster come from the same class. Stochastically, the underlying classes represent different random processes. The inference is that clusters represent a partition of the sample points according to which process they belong. This paper discusses a model-based clustering toolbox that evaluates cluster accuracy. Each random process is modeled as its mean plus independent noise, sample points are generated, the points are clustered, and the clustering error is the number of points clustered incorrectly according to the generating random processes. Various clustering algorithms are evaluated based on process variance and the key issue of the rate at which algorithmic performance improves with increasing numbers of experimental replications. The model means can be selected by hand to test the separability of expected types of biological expression patterns. Alternatively, the model can be seeded by real data to test the expected precision of that output or the extent of improvement in precision that replication could provide. In the latter case, a clustering algorithm is used to form clusters, and the model is seeded with the means and variances of these clusters. Other algorithms are then tested relative to the seeding algorithm. Results are averaged over various seeds. Output includes error tables and graphs, confusion matrices, principal-component plots, and validation measures. Five algorithms are studied in detail: K-means, fuzzy C-means, self-organizing maps, hierarchical Euclidean-distance-based and correlation-based clustering. The toolbox is applied to gene-expression clustering based on cDNA microarrays using real data. Expression profile graphics are generated and error analysis is displayed within the context of these profile graphics. A
A topology visualization early warning distribution algorithm for large-scale network security incidents.

PubMed

He, Hui; Fan, Guotao; Ye, Jianwei; Zhang, Weizhe

2013-01-01

It is of great significance to research the early warning system for large-scale network security incidents. It can improve the network system's emergency response capabilities, alleviate the cyber attacks' damage, and strengthen the system's counterattack ability. A comprehensive early warning system is presented in this paper, which combines active measurement and anomaly detection. The key visualization algorithm and technology of the system are mainly discussed. The large-scale network system's plane visualization is realized based on the divide and conquer thought. First, the topology of the large-scale network is divided into some small-scale networks by the MLkP/CR algorithm. Second, the sub graph plane visualization algorithm is applied to each small-scale network. Finally, the small-scale networks' topologies are combined into a topology based on the automatic distribution algorithm of force analysis. As the algorithm transforms the large-scale network topology plane visualization problem into a series of small-scale network topology plane visualization and distribution problems, it has higher parallelism and is able to handle the display of ultra-large-scale network topology.
Research reactor loading pattern optimization using estimation of distribution algorithms

DOE Office of Scientific and Technical Information (OSTI.GOV)

Jiang, S.; Ziver, K.; AMCG Group, RM Consultants, Abingdon

2006-07-01

A new evolutionary search based approach for solving the nuclear reactor loading pattern optimization problems is presented based on the Estimation of Distribution Algorithms. The optimization technique developed is then applied to the maximization of the effective multiplication factor (K{sub eff}) of the Imperial College CONSORT research reactor (the last remaining civilian research reactor in the United Kingdom). A new elitism-guided searching strategy has been developed and applied to improve the local convergence together with some problem-dependent information based on the 'stand-alone K{sub eff} with fuel coupling calculations. A comparison study between the EDAs and a Genetic Algorithm with Heuristicmore » Tie Breaking Crossover operator has shown that the new algorithm is efficient and robust. (authors)« less
Prospects for Determining the Mass Distributions of Galaxy Clusters on Large Scales Using Weak Gravitational Lensing

NASA Astrophysics Data System (ADS)

Fong, M.; Bowyer, R.; Whitehead, A.; Lee, B.; King, L.; Applegate, D.; McCarthy, I.

2018-05-01

For more than two decades, the Navarro, Frenk, and White (NFW) model has stood the test of time; it has been used to describe the distribution of mass in galaxy clusters out to their outskirts. Stacked weak lensing measurements of clusters are now revealing the distribution of mass out to and beyond their virial radii, where the NFW model is no longer applicable. In this study we assess how well the parameterised Diemer & Kravstov (DK) density profile describes the characteristic mass distribution of galaxy clusters extracted from cosmological simulations. This is determined from stacked synthetic lensing measurements of the 50 most massive clusters extracted from the Cosmo-OWLS simulations, using the Dark Matter Only run and also the run that most closely matches observations. The characteristics of the data reflect the Weighing the Giants survey and data from the future Large Synoptic Survey Telescope (LSST). In comparison with the NFW model, the DK model favored by the stacked data, in particular for the future LSST data, where the number density of background galaxies is higher. The DK profile depends on the accretion history of clusters which is specified in the current study. Eventually however subsamples of galaxy clusters with qualities indicative of disparate accretion histories could be studied.
Heterogeneous Tensor Decomposition for Clustering via Manifold Optimization.

PubMed

Sun, Yanfeng; Gao, Junbin; Hong, Xia; Mishra, Bamdev; Yin, Baocai

2016-03-01

Tensor clustering is an important tool that exploits intrinsically rich structures in real-world multiarray or Tensor datasets. Often in dealing with those datasets, standard practice is to use subspace clustering that is based on vectorizing multiarray data. However, vectorization of tensorial data does not exploit complete structure information. In this paper, we propose a subspace clustering algorithm without adopting any vectorization process. Our approach is based on a novel heterogeneous Tucker decomposition model taking into account cluster membership information. We propose a new clustering algorithm that alternates between different modes of the proposed heterogeneous tensor model. All but the last mode have closed-form updates. Updating the last mode reduces to optimizing over the multinomial manifold for which we investigate second order Riemannian geometry and propose a trust-region algorithm. Numerical experiments show that our proposed algorithm compete effectively with state-of-the-art clustering algorithms that are based on tensor factorization.
Electron impact ionization of size selected hydrogen clusters (H2)N: ion fragment and neutral size distributions.

PubMed

Kornilov, Oleg; Toennies, J Peter

2008-05-21

Clusters consisting of normal H2 molecules, produced in a free jet expansion, are size selected by diffraction from a transmission nanograting prior to electron impact ionization. For each neutral cluster (H2)(N) (N=2-40), the relative intensities of the ion fragments Hn+ are measured with a mass spectrometer. H3+ is found to be the most abundant fragment up to N=17. With a further increase in N, the abundances of H3+, H5+, H7+, and H9+ first increase and, after passing through a maximum, approach each other. At N=40, they are about the same and more than a factor of 2 and 3 larger than for H11+ and H13+, respectively. For a given neutral cluster size, the intensities of the ion fragments follow a Poisson distribution. The fragmentation probabilities are used to determine the neutral cluster size distribution produced in the expansion at a source temperature of 30.1 K and a source pressure of 1.50 bar. The distribution shows no clear evidence of a magic number N=13 as predicted by theory and found in experiments with pure para-H2 clusters. The ion fragment distributions are also used to extract information on the internal energy distribution of the H3+ ions produced in the reaction H2+ + H2-->H3+ +H, which is initiated upon ionization of the cluster. The internal energy is assumed to be rapidly equilibrated and to determine the number of molecules subsequently evaporated. The internal energy distribution found in this way is in good agreement with data obtained in an earlier independent merged beam scattering experiment.
Canonical PSO Based K-Means Clustering Approach for Real Datasets

PubMed Central

Dey, Lopamudra; Chakraborty, Sanjay

2014-01-01

“Clustering” the significance and application of this technique is spread over various fields. Clustering is an unsupervised process in data mining, that is why the proper evaluation of the results and measuring the compactness and separability of the clusters are important issues. The procedure of evaluating the results of a clustering algorithm is known as cluster validity measure. Different types of indexes are used to solve different types of problems and indices selection depends on the kind of available data. This paper first proposes Canonical PSO based K-means clustering algorithm and also analyses some important clustering indices (intercluster, intracluster) and then evaluates the effects of those indices on real-time air pollution database, wholesale customer, wine, and vehicle datasets using typical K-means, Canonical PSO based K-means, simple PSO based K-means, DBSCAN, and Hierarchical clustering algorithms. This paper also describes the nature of the clusters and finally compares the performances of these clustering algorithms according to the validity assessment. It also defines which algorithm will be more desirable among all these algorithms to make proper compact clusters on this particular real life datasets. It actually deals with the behaviour of these clustering algorithms with respect to validation indexes and represents their results of evaluation in terms of mathematical and graphical forms. PMID:27355083
Studying the varied shapes of gold clusters by an elegant optimization algorithm that hybridizes the density functional tight-binding theory and the density functional theory

NASA Astrophysics Data System (ADS)

Yen, Tsung-Wen; Lim, Thong-Leng; Yoon, Tiem-Leong; Lai, S. K.

2017-11-01

We combined a new parametrized density functional tight-binding (DFTB) theory (Fihey et al. 2015) with an unbiased modified basin hopping (MBH) optimization algorithm (Yen and Lai 2015) and applied it to calculate the lowest energy structures of Au clusters. From the calculated topologies and their conformational changes, we find that this DFTB/MBH method is a necessary procedure for a systematic study of the structural development of Au clusters but is somewhat insufficient for a quantitative study. As a result, we propose an extended hybridized algorithm. This improved algorithm proceeds in two steps. In the first step, the DFTB theory is employed to calculate the total energy of the cluster and this step (through running DFTB/MBH optimization for given Monte-Carlo steps) is meant to efficiently bring the Au cluster near to the region of the lowest energy minimum since the cluster as a whole has explicitly considered the interactions of valence electrons with ions, albeit semi-quantitatively. Then, in the second succeeding step, the energy-minimum searching process will continue with a skilledly replacement of the energy function calculated by the DFTB theory in the first step by one calculated in the full density functional theory (DFT). In these subsequent calculations, we couple the DFT energy also with the MBH strategy and proceed with the DFT/MBH optimization until the lowest energy value is found. We checked that this extended hybridized algorithm successfully predicts the twisted pyramidal structure for the Au40 cluster and correctly confirms also the linear shape of C8 which our previous DFTB/MBH method failed to do so. Perhaps more remarkable is the topological growth of Aun: it changes from a planar (n =3-11) → an oblate-like cage (n =12-15) → a hollow-shape cage (n =16-18) and finally a pyramidal-like cage (n =19, 20). These varied forms of the cluster's shapes are consistent with those reported in the literature.
Cluster-cluster correlations and constraints on the correlation hierarchy

NASA Technical Reports Server (NTRS)

Hamilton, A. J. S.; Gott, J. R., III

1988-01-01

The hypothesis that galaxies cluster around clusters at least as strongly as they cluster around galaxies imposes constraints on the hierarchy of correlation amplitudes in hierachical clustering models. The distributions which saturate these constraints are the Rayleigh-Levy random walk fractals proposed by Mandelbrot; for these fractal distributions cluster-cluster correlations are all identically equal to galaxy-galaxy correlations. If correlation amplitudes exceed the constraints, as is observed, then cluster-cluster correlations must exceed galaxy-galaxy correlations, as is observed.
Estimating carbon cluster binding energies from measured Cn distributions, n <= 10

NASA Astrophysics Data System (ADS)

Pargellis, A. N.

1990-08-01

Experimental data are presented for the cluster distribution of sputtered negative carbon clusters, C-n, with n≤10. Additionally, clusters have been observed with masses indicating they are CsC-2n, with n≤4. The C-n data are compared with the data obtained by other groups, for neutral and charged clusters, using a variety of sources such as evaporation, sputtering, and laser ablation. The data are used to estimate the cluster binding energies En, using the universal relation, En=(n-1)ΔHn+RTe [ln(Jn/J1)+0.5 ln(n)-α-(ΔSn-ΔS1)/R], derived from basic kinetic and thermodynamic relations. The estimated values agree astonishingly well with values from the literature, varying from published values by at most a few percent. In this equation, Jn is the observed current of n-atom clusters, ΔHn is the heat of vaporization, ΔH1=7.41 eV, and Te ≊0.25 eV (2900 K) is the effective source temperature. The relative change in cluster entropy during sublimation from the solid to vapor phase is approximated to first order by the relation (ΔSn-ΔS1)/R =3.1+0.9(n-2), and is fit to published data for n between 2 and 5 and temperatures between 2000 and 4000 K. The parameter α is empirical, obtained by fitting the data to known binding energies for Cn≤5 clusters. For evaporation sources, α must be zero, but α˜7 when sputtering with Cs+ ions, indicating the sputtered clusters appear to be in thermodynamic equilibrium, but not the atoms. Several possible mechanisms for the formation of clusters during sputtering are examined. One plausible mechanism is that atoms diffuse on the graphite surface to form clusters which are then desorbed by energetic, recoil atoms created in subsequent sputtering events.
A Fast Implementation of the ISOCLUS Algorithm

NASA Technical Reports Server (NTRS)

Memarsadeghi, Nargess; Mount, David M.; Netanyahu, Nathan S.; LeMoigne, Jacqueline

2003-01-01

Unsupervised clustering is a fundamental tool in numerous image processing and remote sensing applications. For example, unsupervised clustering is often used to obtain vegetation maps of an area of interest. This approach is useful when reliable training data are either scarce or expensive, and when relatively little a priori information about the data is available. Unsupervised clustering methods play a significant role in the pursuit of unsupervised classification. One of the most popular and widely used clustering schemes for remote sensing applications is the ISOCLUS algorithm, which is based on the ISODATA method. The algorithm is given a set of n data points (or samples) in d-dimensional space, an integer k indicating the initial number of clusters, and a number of additional parameters. The general goal is to compute a set of cluster centers in d-space. Although there is no specific optimization criterion, the algorithm is similar in spirit to the well known k-means clustering method in which the objective is to minimize the average squared distance of each point to its nearest center, called the average distortion. One significant feature of ISOCLUS over k-means is that clusters may be merged or split, and so the final number of clusters may be different from the number k supplied as part of the input. This algorithm will be described in later in this paper. The ISOCLUS algorithm can run very slowly, particularly on large data sets. Given its wide use in remote sensing, its efficient computation is an important goal. We have developed a fast implementation of the ISOCLUS algorithm. Our improvement is based on a recent acceleration to the k-means algorithm, the filtering algorithm, by Kanungo et al.. They showed that, by storing the data in a kd-tree, it was possible to significantly reduce the running time of k-means. We have adapted this method for the ISOCLUS algorithm. For technical reasons, which are explained later, it is necessary to make a minor
Damage diagnosis algorithm using a sequential change point detection method with an unknown distribution for damage

NASA Astrophysics Data System (ADS)

Noh, Hae Young; Rajagopal, Ram; Kiremidjian, Anne S.

2012-04-01

This paper introduces a damage diagnosis algorithm for civil structures that uses a sequential change point detection method for the cases where the post-damage feature distribution is unknown a priori. This algorithm extracts features from structural vibration data using time-series analysis and then declares damage using the change point detection method. The change point detection method asymptotically minimizes detection delay for a given false alarm rate. The conventional method uses the known pre- and post-damage feature distributions to perform a sequential hypothesis test. In practice, however, the post-damage distribution is unlikely to be known a priori. Therefore, our algorithm estimates and updates this distribution as data are collected using the maximum likelihood and the Bayesian methods. We also applied an approximate method to reduce the computation load and memory requirement associated with the estimation. The algorithm is validated using multiple sets of simulated data and a set of experimental data collected from a four-story steel special moment-resisting frame. Our algorithm was able to estimate the post-damage distribution consistently and resulted in detection delays only a few seconds longer than the delays from the conventional method that assumes we know the post-damage feature distribution. We confirmed that the Bayesian method is particularly efficient in declaring damage with minimal memory requirement, but the maximum likelihood method provides an insightful heuristic approach.
An energy-efficient and secure hybrid algorithm for wireless sensor networks using a mobile data collector

NASA Astrophysics Data System (ADS)

Dayananda, Karanam Ravichandran; Straub, Jeremy

2017-05-01

This paper proposes a new hybrid algorithm for security, which incorporates both distributed and hierarchal approaches. It uses a mobile data collector (MDC) to collect information in order to save energy of sensor nodes in a wireless sensor network (WSN) as, in most networks, these sensor nodes have limited energy. Wireless sensor networks are prone to security problems because, among other things, it is possible to use a rogue sensor node to eavesdrop on or alter the information being transmitted. To prevent this, this paper introduces a security algorithm for MDC-based WSNs. A key use of this algorithm is to protect the confidentiality of the information sent by the sensor nodes. The sensor nodes are deployed in a random fashion and form group structures called clusters. Each cluster has a cluster head. The cluster head collects data from the other nodes using the time-division multiple access protocol. The sensor nodes send their data to the cluster head for transmission to the base station node for further processing. The MDC acts as an intermediate node between the cluster head and base station. The MDC, using its dynamic acyclic graph path, collects the data from the cluster head and sends it to base station. This approach is useful for applications including warfighting, intelligent building and medicine. To assess the proposed system, the paper presents a comparison of its performance with other approaches and algorithms that can be used for similar purposes.
A Scheduling Algorithm for the Distributed Student Registration System in Transaction-Intensive Environment

ERIC Educational Resources Information Center

Li, Wenhao

2011-01-01

Distributed workflow technology has been widely used in modern education and e-business systems. Distributed web applications have shown cross-domain and cooperative characteristics to meet the need of current distributed workflow applications. In this paper, the author proposes a dynamic and adaptive scheduling algorithm PCSA (Pre-Calculated…
Convalescing Cluster Configuration Using a Superlative Framework

PubMed Central

Sabitha, R.; Karthik, S.

2015-01-01

Competent data mining methods are vital to discover knowledge from databases which are built as a result of enormous growth of data. Various techniques of data mining are applied to obtain knowledge from these databases. Data clustering is one such descriptive data mining technique which guides in partitioning data objects into disjoint segments. K-means algorithm is a versatile algorithm among the various approaches used in data clustering. The algorithm and its diverse adaptation methods suffer certain problems in their performance. To overcome these issues a superlative algorithm has been proposed in this paper to perform data clustering. The specific feature of the proposed algorithm is discretizing the dataset, thereby improving the accuracy of clustering, and also adopting the binary search initialization method to generate cluster centroids. The generated centroids are fed as input to K-means approach which iteratively segments the data objects into respective clusters. The clustered results are measured for accuracy and validity. Experiments conducted by testing the approach on datasets from the UC Irvine Machine Learning Repository evidently show that the accuracy and validity measure is higher than the other two approaches, namely, simple K-means and Binary Search method. Thus, the proposed approach proves that discretization process will improve the efficacy of descriptive data mining tasks. PMID:26543895
Mass distribution in galaxy clusters: the role of Active Galactic Nuclei feedback

NASA Astrophysics Data System (ADS)

Teyssier, Romain; Moore, Ben; Martizzi, Davide; Dubois, Yohan; Mayer, Lucio

2011-06-01

We use 1-kpc resolution cosmological Adaptive Mesh Refinement (AMR) simulations of a Virgo-like galaxy cluster to investigate the effect of feedback from supermassive black holes on the mass distribution of dark matter, gas and stars. We compared three different models: (i) a standard galaxy formation model featuring gas cooling, star formation and supernovae feedback, (ii) a 'quenching' model for which star formation is artificially suppressed in massive haloes and finally (iii) the recently proposed active galactic nucleus (AGN) feedback model of Booth and Schaye. Without AGN feedback (even in the quenching case), our simulated cluster suffers from a strong overcooling problem, with a stellar mass fraction significantly above observed values in M87. The baryon distribution is highly concentrated, resulting in a strong adiabatic contraction (AC) of dark matter. With AGN feedback, on the contrary, the stellar mass in the brightest cluster galaxy (BCG) lies below observational estimates and the overcooling problem disappears. The stellar mass of the BCG is seen to increase with increasing mass resolution, suggesting that our stellar masses converge to the correct value from below. The gas and total mass distributions are in better agreement with observations. We also find a slight deficit (˜10 per cent) of baryons at the virial radius, due to the combined effect of AGN-driven convective motions in the inner parts and shock waves in the outer regions, pushing gas to Mpc scales and beyond. This baryon deficit results in a slight adiabatic expansion of the dark matter distribution that can be explained quantitatively by AC theory.
[A cloud detection algorithm for MODIS images combining Kmeans clustering and multi-spectral threshold method].

PubMed

Wang, Wei; Song, Wei-Guo; Liu, Shi-Xing; Zhang, Yong-Ming; Zheng, Hong-Yang; Tian, Wei

2011-04-01

An improved method for detecting cloud combining Kmeans clustering and the multi-spectral threshold approach is described. On the basis of landmark spectrum analysis, MODIS data is categorized into two major types initially by Kmeans method. The first class includes clouds, smoke and snow, and the second class includes vegetation, water and land. Then a multi-spectral threshold detection is applied to eliminate interference such as smoke and snow for the first class. The method is tested with MODIS data at different time under different underlying surface conditions. By visual method to test the performance of the algorithm, it was found that the algorithm can effectively detect smaller area of cloud pixels and exclude the interference of underlying surface, which provides a good foundation for the next fire detection approach.
Fuzzy Subspace Clustering

NASA Astrophysics Data System (ADS)

Borgelt, Christian

In clustering we often face the situation that only a subset of the available attributes is relevant for forming clusters, even though this may not be known beforehand. In such cases it is desirable to have a clustering algorithm that automatically weights attributes or even selects a proper subset. In this paper I study such an approach for fuzzy clustering, which is based on the idea to transfer an alternative to the fuzzifier (Klawonn and Höppner, What is fuzzy about fuzzy clustering? Understanding and improving the concept of the fuzzifier, In: Proc. 5th Int. Symp. on Intelligent Data Analysis, 254-264, Springer, Berlin, 2003) to attribute weighting fuzzy clustering (Keller and Klawonn, Int J Uncertain Fuzziness Knowl Based Syst 8:735-746, 2000). In addition, by reformulating Gustafson-Kessel fuzzy clustering, a scheme for weighting and selecting principal axes can be obtained. While in Borgelt (Feature weighting and feature selection in fuzzy clustering, In: Proc. 17th IEEE Int. Conf. on Fuzzy Systems, IEEE Press, Piscataway, NJ, 2008) I already presented such an approach for a global selection of attributes and principal axes, this paper extends it to a cluster-specific selection, thus arriving at a fuzzy subspace clustering algorithm (Parsons, Haque, and Liu, 2004).
Logistics Distribution Center Location Evaluation Based on Genetic Algorithm and Fuzzy Neural Network

NASA Astrophysics Data System (ADS)

Shao, Yuxiang; Chen, Qing; Wei, Zhenhua

Logistics distribution center location evaluation is a dynamic, fuzzy, open and complicated nonlinear system, which makes it difficult to evaluate the distribution center location by the traditional analysis method. The paper proposes a distribution center location evaluation system which uses the fuzzy neural network combined with the genetic algorithm. In this model, the neural network is adopted to construct the fuzzy system. By using the genetic algorithm, the parameters of the neural network are optimized and trained so as to improve the fuzzy system’s abilities of self-study and self-adaptation. At last, the sampled data are trained and tested by Matlab software. The simulation results indicate that the proposed identification model has very small errors.
Analysis of an algorithm for distributed recognition and accountability

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ko, C.; Frincke, D.A.; Goan, T. Jr.

1993-08-01

Computer and network systems are available to attacks. Abandoning the existing huge infrastructure of possibly-insecure computer and network systems is impossible, and replacing them by totally secure systems may not be feasible or cost effective. A common element in many attacks is that a single user will often attempt to intrude upon multiple resources throughout a network. Detecting the attack can become significantly easier by compiling and integrating evidence of such intrusion attempts across the network rather than attempting to assess the situation from the vantage point of only a single host. To solve this problem, we suggest an approachmore » for distributed recognition and accountability (DRA), which consists of algorithms which ``process,`` at a central location, distributed and asynchronous ``reports`` generated by computers (or a subset thereof) throughout the network. Our highest-priority objectives are to observe ways by which an individual moves around in a network of computers, including changing user names to possibly hide his/her true identity, and to associate all activities of multiple instance of the same individual to the same network-wide user. We present the DRA algorithm and a sketch of its proof under an initial set of simplifying albeit realistic assumptions. Later, we relax these assumptions to accommodate pragmatic aspects such as missing or delayed ``reports,`` clock slew, tampered ``reports,`` etc. We believe that such algorithms will have widespread applications in the future, particularly in intrusion-detection system.« less

SimBA: simulation algorithm to fit extant-population distributions.

PubMed

Parida, Laxmi; Haiminen, Niina

2015-03-14

Simulation of populations with specified characteristics such as allele frequencies, linkage disequilibrium etc., is an integral component of many studies, including in-silico breeding optimization. Since the accuracy and sensitivity of population simulation is critical to the quality of the output of the applications that use them, accurate algorithms are required to provide a strong foundation to the methods in these studies. In this paper we present SimBA (Simulation using Best-fit Algorithm) a non-generative approach, based on a combination of stochastic techniques and discrete methods. We optimize a hill climbing algorithm and extend the framework to include multiple subpopulation structures. Additionally, we show that SimBA is very sensitive to the input specifications, i.e., very similar but distinct input characteristics result in distinct outputs with high fidelity to the specified distributions. This property of the simulation is not explicitly modeled or studied by previous methods. We show that SimBA outperforms the existing population simulation methods, both in terms of accuracy as well as time-efficiency. Not only does it construct populations that meet the input specifications more stringently than other published methods, SimBA is also easy to use. It does not require explicit parameter adaptations or calibrations. Also, it can work with input specified as distributions, without an exemplar matrix or population as required by some methods. SimBA is available at http://researcher.ibm.com/project/5669 .
Clustering Millions of Faces by Identity.

PubMed

Otto, Charles; Wang, Dayong; Jain, Anil K

2018-02-01

Given a large collection of unlabeled face images, we address the problem of clustering faces into an unknown number of identities. This problem is of interest in social media, law enforcement, and other applications, where the number of faces can be of the order of hundreds of million, while the number of identities (clusters) can range from a few thousand to millions. To address the challenges of run-time complexity and cluster quality, we present an approximate Rank-Order clustering algorithm that performs better than popular clustering algorithms (k-Means and Spectral). Our experiments include clustering up to 123 million face images into over 10 million clusters. Clustering results are analyzed in terms of external (known face labels) and internal (unknown face labels) quality measures, and run-time. Our algorithm achieves an F-measure of 0.87 on the LFW benchmark (13 K faces of 5,749 individuals), which drops to 0.27 on the largest dataset considered (13 K faces in LFW + 123M distractor images). Additionally, we show that frames in the YouTube benchmark can be clustered with an F-measure of 0.71. An internal per-cluster quality measure is developed to rank individual clusters for manual exploration of high quality clusters that are compact and isolated.
CHIMERA: Top-down model for hierarchical, overlapping and directed cluster structures in directed and weighted complex networks

NASA Astrophysics Data System (ADS)

Franke, R.

2016-11-01

In many networks discovered in biology, medicine, neuroscience and other disciplines special properties like a certain degree distribution and hierarchical cluster structure (also called communities) can be observed as general organizing principles. Detecting the cluster structure of an unknown network promises to identify functional subdivisions, hierarchy and interactions on a mesoscale. It is not trivial choosing an appropriate detection algorithm because there are multiple network, cluster and algorithmic properties to be considered. Edges can be weighted and/or directed, clusters overlap or build a hierarchy in several ways. Algorithms differ not only in runtime, memory requirements but also in allowed network and cluster properties. They are based on a specific definition of what a cluster is, too. On the one hand, a comprehensive network creation model is needed to build a large variety of benchmark networks with different reasonable structures to compare algorithms. On the other hand, if a cluster structure is already known, it is desirable to separate effects of this structure from other network properties. This can be done with null model networks that mimic an observed cluster structure to improve statistics on other network features. A third important application is the general study of properties in networks with different cluster structures, possibly evolving over time. Currently there are good benchmark and creation models available. But what is left is a precise sandbox model to build hierarchical, overlapping and directed clusters for undirected or directed, binary or weighted complex random networks on basis of a sophisticated blueprint. This gap shall be closed by the model CHIMERA (Cluster Hierarchy Interconnection Model for Evaluation, Research and Analysis) which will be introduced and described here for the first time.
Automatic Clustering Using Multi-objective Particle Swarm and Simulated Annealing

PubMed Central

Abubaker, Ahmad; Baharum, Adam; Alrefaei, Mahmoud

2015-01-01

This paper puts forward a new automatic clustering algorithm based on Multi-Objective Particle Swarm Optimization and Simulated Annealing, “MOPSOSA”. The proposed algorithm is capable of automatic clustering which is appropriate for partitioning datasets to a suitable number of clusters. MOPSOSA combines the features of the multi-objective based particle swarm optimization (PSO) and the Multi-Objective Simulated Annealing (MOSA). Three cluster validity indices were optimized simultaneously to establish the suitable number of clusters and the appropriate clustering for a dataset. The first cluster validity index is centred on Euclidean distance, the second on the point symmetry distance, and the last cluster validity index is based on short distance. A number of algorithms have been compared with the MOPSOSA algorithm in resolving clustering problems by determining the actual number of clusters and optimal clustering. Computational experiments were carried out to study fourteen artificial and five real life datasets. PMID:26132309
WordCluster: detecting clusters of DNA words and genomic elements

PubMed Central

2011-01-01

Background Many k-mers (or DNA words) and genomic elements are known to be spatially clustered in the genome. Well established examples are the genes, TFBSs, CpG dinucleotides, microRNA genes and ultra-conserved non-coding regions. Currently, no algorithm exists to find these clusters in a statistically comprehensible way. The detection of clustering often relies on densities and sliding-window approaches or arbitrarily chosen distance thresholds. Results We introduce here an algorithm to detect clusters of DNA words (k-mers), or any other genomic element, based on the distance between consecutive copies and an assigned statistical significance. We implemented the method into a web server connected to a MySQL backend, which also determines the co-localization with gene annotations. We demonstrate the usefulness of this approach by detecting the clusters of CAG/CTG (cytosine contexts that can be methylated in undifferentiated cells), showing that the degree of methylation vary drastically between inside and outside of the clusters. As another example, we used WordCluster to search for statistically significant clusters of olfactory receptor (OR) genes in the human genome. Conclusions WordCluster seems to predict biological meaningful clusters of DNA words (k-mers) and genomic entities. The implementation of the method into a web server is available at http://bioinfo2.ugr.es/wordCluster/wordCluster.php including additional features like the detection of co-localization with gene regions or the annotation enrichment tool for functional analysis of overlapped genes. PMID:21261981
Distributed learning automata-based algorithm for community detection in complex networks

NASA Astrophysics Data System (ADS)

Khomami, Mohammad Mehdi Daliri; Rezvanian, Alireza; Meybodi, Mohammad Reza

2016-03-01

Community structure is an important and universal topological property of many complex networks such as social and information networks. The detection of communities of a network is a significant technique for understanding the structure and function of networks. In this paper, we propose an algorithm based on distributed learning automata for community detection (DLACD) in complex networks. In the proposed algorithm, each vertex of network is equipped with a learning automation. According to the cooperation among network of learning automata and updating action probabilities of each automaton, the algorithm interactively tries to identify high-density local communities. The performance of the proposed algorithm is investigated through a number of simulations on popular synthetic and real networks. Experimental results in comparison with popular community detection algorithms such as walk trap, Danon greedy optimization, Fuzzy community detection, Multi-resolution community detection and label propagation demonstrated the superiority of DLACD in terms of modularity, NMI, performance, min-max-cut and coverage.
Semi-supervised clustering methods

PubMed Central

Bair, Eric

2013-01-01

Cluster analysis methods seek to partition a data set into homogeneous subgroups. It is useful in a wide variety of applications, including document processing and modern genetics. Conventional clustering methods are unsupervised, meaning that there is no outcome variable nor is anything known about the relationship between the observations in the data set. In many situations, however, information about the clusters is available in addition to the values of the features. For example, the cluster labels of some observations may be known, or certain observations may be known to belong to the same cluster. In other cases, one may wish to identify clusters that are associated with a particular outcome variable. This review describes several clustering algorithms (known as “semi-supervised clustering” methods) that can be applied in these situations. The majority of these methods are modifications of the popular k-means clustering method, and several of them will be described in detail. A brief description of some other semi-supervised clustering algorithms is also provided. PMID:24729830
Information Theory and Voting Based Consensus Clustering for Combining Multiple Clusterings of Chemical Structures.

PubMed

Saeed, Faisal; Salim, Naomie; Abdo, Ammar

2013-07-01

Many consensus clustering methods have been applied in different areas such as pattern recognition, machine learning, information theory and bioinformatics. However, few methods have been used for chemical compounds clustering. In this paper, an information theory and voting based algorithm (Adaptive Cumulative Voting-based Aggregation Algorithm A-CVAA) was examined for combining multiple clusterings of chemical structures. The effectiveness of clusterings was evaluated based on the ability of the clustering method to separate active from inactive molecules in each cluster, and the results were compared with Ward's method. The chemical dataset MDL Drug Data Report (MDDR) and the Maximum Unbiased Validation (MUV) dataset were used. Experiments suggest that the adaptive cumulative voting-based consensus method can improve the effectiveness of combining multiple clusterings of chemical structures. Copyright © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Algorithms to eliminate the influence of non-uniform intensity distributions on wavefront reconstruction by quadri-wave lateral shearing interferometers

NASA Astrophysics Data System (ADS)

Chen, Xiao-jun; Dong, Li-zhi; Wang, Shuai; Yang, Ping; Xu, Bing

2017-11-01

In quadri-wave lateral shearing interferometry (QWLSI), when the intensity distribution of the incident light wave is non-uniform, part of the information of the intensity distribution will couple with the wavefront derivatives to cause wavefront reconstruction errors. In this paper, we propose two algorithms to reduce the influence of a non-uniform intensity distribution on wavefront reconstruction. Our simulation results demonstrate that the reconstructed amplitude distribution (RAD) algorithm can effectively reduce the influence of the intensity distribution on the wavefront reconstruction and that the collected amplitude distribution (CAD) algorithm can almost eliminate it.
Hybrid Tracking Algorithm Improvements and Cluster Analysis Methods.

DTIC Science & Technology

1982-02-26

UPGMA ), and Ward’s method. Ling’s papers describe a (k,r) clustering method. Each of these methods have individual characteristics which make them...Reference 7), UPGMA is probably the most frequently used clustering strategy. UPGMA tries to group new points into an existing cluster by using an
Community Detection Algorithm Combining Stochastic Block Model and Attribute Data Clustering

NASA Astrophysics Data System (ADS)

Kataoka, Shun; Kobayashi, Takuto; Yasuda, Muneki; Tanaka, Kazuyuki

2016-11-01

We propose a new algorithm to detect the community structure in a network that utilizes both the network structure and vertex attribute data. Suppose we have the network structure together with the vertex attribute data, that is, the information assigned to each vertex associated with the community to which it belongs. The problem addressed this paper is the detection of the community structure from the information of both the network structure and the vertex attribute data. Our approach is based on the Bayesian approach that models the posterior probability distribution of the community labels. The detection of the community structure in our method is achieved by using belief propagation and an EM algorithm. We numerically verified the performance of our method using computer-generated networks and real-world networks.
Improving clustering with metabolic pathway data.

PubMed

Milone, Diego H; Stegmayer, Georgina; López, Mariana; Kamenetzky, Laura; Carrari, Fernando

2014-04-10

It is a common practice in bioinformatics to validate each group returned by a clustering algorithm through manual analysis, according to a-priori biological knowledge. This procedure helps finding functionally related patterns to propose hypotheses for their behavior and the biological processes involved. Therefore, this knowledge is used only as a second step, after data are just clustered according to their expression patterns. Thus, it could be very useful to be able to improve the clustering of biological data by incorporating prior knowledge into the cluster formation itself, in order to enhance the biological value of the clusters. A novel training algorithm for clustering is presented, which evaluates the biological internal connections of the data points while the clusters are being formed. Within this training algorithm, the calculation of distances among data points and neurons centroids includes a new term based on information from well-known metabolic pathways. The standard self-organizing map (SOM) training versus the biologically-inspired SOM (bSOM) training were tested with two real data sets of transcripts and metabolites from Solanum lycopersicum and Arabidopsis thaliana species. Classical data mining validation measures were used to evaluate the clustering solutions obtained by both algorithms. Moreover, a new measure that takes into account the biological connectivity of the clusters was applied. The results of bSOM show important improvements in the convergence and performance for the proposed clustering method in comparison to standard SOM training, in particular, from the application point of view. Analyses of the clusters obtained with bSOM indicate that including biological information during training can certainly increase the biological value of the clusters found with the proposed method. It is worth to highlight that this fact has effectively improved the results, which can simplify their further analysis.The algorithm is available as a
A distributed algorithm to maintain and repair the trail networks of arboreal ants.

PubMed

Chandrasekhar, Arjun; Gordon, Deborah M; Navlakha, Saket

2018-06-18

We study how the arboreal turtle ant (Cephalotes goniodontus) solves a fundamental computing problem: maintaining a trail network and finding alternative paths to route around broken links in the network. Turtle ants form a routing backbone of foraging trails linking several nests and temporary food sources. This species travels only in the trees, so their foraging trails are constrained to lie on a natural graph formed by overlapping branches and vines in the tangled canopy. Links between branches, however, can be ephemeral, easily destroyed by wind, rain, or animal movements. Here we report a biologically feasible distributed algorithm, parameterized using field data, that can plausibly describe how turtle ants maintain the routing backbone and find alternative paths to circumvent broken links in the backbone. We validate the ability of this probabilistic algorithm to circumvent simulated breaks in synthetic and real-world networks, and we derive an analytic explanation for why certain features are crucial to improve the algorithm's success. Our proposed algorithm uses fewer computational resources than common distributed graph search algorithms, and thus may be useful in other domains, such as for swarm computing or for coordinating molecular robots.
A genetic graph-based approach for partitional clustering.

PubMed

Menéndez, Héctor D; Barrero, David F; Camacho, David

2014-05-01

Clustering is one of the most versatile tools for data analysis. In the recent years, clustering that seeks the continuity of data (in opposition to classical centroid-based approaches) has attracted an increasing research interest. It is a challenging problem with a remarkable practical interest. The most popular continuity clustering method is the spectral clustering (SC) algorithm, which is based on graph cut: It initially generates a similarity graph using a distance measure and then studies its graph spectrum to find the best cut. This approach is sensitive to the parameters of the metric, and a correct parameter choice is critical to the quality of the cluster. This work proposes a new algorithm, inspired by SC, that reduces the parameter dependency while maintaining the quality of the solution. The new algorithm, named genetic graph-based clustering (GGC), takes an evolutionary approach introducing a genetic algorithm (GA) to cluster the similarity graph. The experimental validation shows that GGC increases robustness of SC and has competitive performance in comparison with classical clustering methods, at least, in the synthetic and real dataset used in the experiments.
Lagrangian analysis by clustering. An example in the Nordic Seas.

NASA Astrophysics Data System (ADS)

Koszalka, Inga; Lacasce, Joseph H.

2010-05-01

We propose a new method for obtaining average velocities and eddy diffusivities from Lagrangian data. Rather than grouping the drifter-derived velocities in uniform geographical bins, as is commonly done, we group a specified number of nearest-neighbor velocities. This is done via a clustering algorithm operating on the instantaneous positions of the drifters. Thus it is the data distribution itself which determines the positions of the averages and the areal extent of the clusters. A major advantage is that because the number of members is essentially the same for all clusters, the statistical accuracy is more uniform than with geographical bins. We illustrate the technique using synthetic data from a stochastic model, employing a realistic mean flow. The latter is an accurate representation of the surface currents in the Nordic Seas and is strongly inhomogeneous in space. We use the clustering algorithm to extract the mean velocities and diffusivities (both of which are known from the stochastic model). We also compare the results to those obtained with fixed geographical bins. Clustering is more successful at capturing spatial variability of the mean flow and also improves convergence in the eddy diffusivity estimates. We discuss both the future prospects and shortcomings of the new method.
Application of clustering methods: Regularized Markov clustering (R-MCL) for analyzing dengue virus similarity

NASA Astrophysics Data System (ADS)

Lestari, D.; Raharjo, D.; Bustamam, A.; Abdillah, B.; Widhianto, W.

2017-07-01

Dengue virus consists of 10 different constituent proteins and are classified into 4 major serotypes (DEN 1 - DEN 4). This study was designed to perform clustering against 30 protein sequences of dengue virus taken from Virus Pathogen Database and Analysis Resource (VIPR) using Regularized Markov Clustering (R-MCL) algorithm and then we analyze the result. By using Python program 3.4, R-MCL algorithm produces 8 clusters with more than one centroid in several clusters. The number of centroid shows the density level of interaction. Protein interactions that are connected in a tissue, form a complex protein that serves as a specific biological process unit. The analysis of result shows the R-MCL clustering produces clusters of dengue virus family based on the similarity role of their constituent protein, regardless of serotypes.
Orbit Clustering Based on Transfer Cost

NASA Technical Reports Server (NTRS)

Gustafson, Eric D.; Arrieta-Camacho, Juan J.; Petropoulos, Anastassios E.

2013-01-01

We propose using cluster analysis to perform quick screening for combinatorial global optimization problems. The key missing component currently preventing cluster analysis from use in this context is the lack of a useable metric function that defines the cost to transfer between two orbits. We study several proposed metrics and clustering algorithms, including k-means and the expectation maximization algorithm. We also show that proven heuristic methods such as the Q-law can be modified to work with cluster analysis.
The applicability and effectiveness of cluster analysis

NASA Technical Reports Server (NTRS)

Ingram, D. S.; Actkinson, A. L.

1973-01-01

An insight into the characteristics which determine the performance of a clustering algorithm is presented. In order for the techniques which are examined to accurately cluster data, two conditions must be simultaneously satisfied. First the data must have a particular structure, and second the parameters chosen for the clustering algorithm must be correct. By examining the structure of the data from the Cl flight line, it is clear that no single set of parameters can be used to accurately cluster all the different crops. The effectiveness of either a noniterative or iterative clustering algorithm to accurately cluster data representative of the Cl flight line is questionable. Thus extensive a prior knowledge is required in order to use cluster analysis in its present form for applications like assisting in the definition of field boundaries and evaluating the homogeneity of a field. New or modified techniques are necessary for clustering to be a reliable tool.
Application of the LSQR algorithm in non-parametric estimation of aerosol size distribution

NASA Astrophysics Data System (ADS)

He, Zhenzong; Qi, Hong; Lew, Zhongyuan; Ruan, Liming; Tan, Heping; Luo, Kun

2016-05-01

Based on the Least Squares QR decomposition (LSQR) algorithm, the aerosol size distribution (ASD) is retrieved in non-parametric approach. The direct problem is solved by the Anomalous Diffraction Approximation (ADA) and the Lambert-Beer Law. An optimal wavelength selection method is developed to improve the retrieval accuracy of the ASD. The proposed optimal wavelength set is selected by the method which can make the measurement signals sensitive to wavelength and decrease the degree of the ill-condition of coefficient matrix of linear systems effectively to enhance the anti-interference ability of retrieval results. Two common kinds of monomodal and bimodal ASDs, log-normal (L-N) and Gamma distributions, are estimated, respectively. Numerical tests show that the LSQR algorithm can be successfully applied to retrieve the ASD with high stability in the presence of random noise and low susceptibility to the shape of distributions. Finally, the experimental measurement ASD over Harbin in China is recovered reasonably. All the results confirm that the LSQR algorithm combined with the optimal wavelength selection method is an effective and reliable technique in non-parametric estimation of ASD.
Transport on percolation clusters with power-law distributed bond strengths.

PubMed

Alava, Mikko; Moukarzel, Cristian F

2003-05-01

The simplest transport problem, namely finding the maximum flow of current, or maxflow, is investigated on critical percolation clusters in two and three dimensions, using a combination of extremal statistics arguments and exact numerical computations, for power-law distributed bond strengths of the type P(sigma) approximately sigma(-alpha). Assuming that only cutting bonds determine the flow, the maxflow critical exponent v is found to be v(alpha)=(d-1)nu+1/(1-alpha). This prediction is confirmed with excellent accuracy using large-scale numerical simulation in two and three dimensions. However, in the region of anomalous bond capacity distributions (0< or =alpha< or =1) we demonstrate that, due to cluster-structure fluctuations, it is not the cutting bonds but the blobs that set the transport properties of the backbone. This "blob dominance" avoids a crossover to a regime where structural details, the distribution of the number of red or cutting bonds, would set the scaling. The restored scaling exponents, however, still follow the simplistic red bond estimate. This is argued to be due to the existence of a hierarchy of so-called minimum cut configurations, for which cutting bonds form the lowest level, and whose transport properties scale all in the same way. We point out the relevance of our findings to other scalar transport problems (i.e., conductivity).

Some links on this page may take you to non-federal websites. Their policies may differ from this site.