Mass distributed clustering: a new algorithm for repeated measurements in gene expression data.
Matsumoto, Shinya; Aisaki, Ken-ichi; Kanno, Jun
2005-01-01
The availability of whole-genome sequence data and high-throughput techniques such as DNA microarrays enables researchers to comprehensively monitor changes in gene expression in a given organ or tissue. Gene expression data can cover more than 30,000 genes per measurement, which makes clustering methods essential for analysis. Biologists usually design experimental protocols so that statistical significance can be evaluated. Often, they conduct experiments in triplicate to generate a mean and standard deviation. Existing clustering methods usually use these mean or median values rather than the original data. They take significance into account by omitting data with large standard deviations, which eliminates potentially useful information. We propose a clustering method that treats each set of triplicate data as a probability distribution function instead of pooling the data points into a median or mean. This method permits truly unsupervised clustering of DNA microarray data. PMID:16901101
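The sketch below illustrates the idea of treating replicates as distributions instead of collapsing them to a mean. It summarises each gene's triplicate at each condition as a normal distribution and compares genes with the Bhattacharyya distance. Several choices here are assumptions made for illustration: the normal model, the Bhattacharyya distance, the `floor` on the standard deviation, and all function names. The abstract does not say which distribution or distance the authors use.

```python
import math
from statistics import mean, stdev

def gaussian_summary(replicates, floor=1e-6):
    """Summarise replicate measurements as (mean, std).
    The std is floored so identical replicates do not give zero variance."""
    return mean(replicates), max(stdev(replicates), floor)

def bhattacharyya(m1, s1, m2, s2):
    """Bhattacharyya distance between N(m1, s1^2) and N(m2, s2^2)."""
    v = s1 * s1 + s2 * s2
    return 0.25 * (m1 - m2) ** 2 / v + 0.5 * math.log(v / (2 * s1 * s2))

def gene_distance(gene_a, gene_b):
    """Each gene is a list of replicate lists, one per experimental condition.
    Per-condition distances are summed (an illustrative assumption)."""
    total = 0.0
    for reps_a, reps_b in zip(gene_a, gene_b):
        total += bhattacharyya(*gaussian_summary(reps_a), *gaussian_summary(reps_b))
    return total
```

The mean difference is divided by the pooled variance. A noisy gene is therefore pulled less strongly away from its neighbours instead of being discarded outright, which is the behaviour the abstract argues for.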
Gao, Ying; Wkram, Chris Hadri; Duan, Jiajie; Chou, Jarong
2015-01-01
To prolong the network lifetime, energy-efficient protocols adapted to the features of wireless sensor networks should be used. This paper examines the nature of heterogeneous wireless sensor networks in depth. It then proposes an algorithm that addresses the problem of finding an energy-effective clustering pathway in heterogeneous networks. The algorithm selects cluster heads according to two quantities: the degree of energy attenuation while the network is running, and the degree to which each candidate node effectively covers the whole network. The aim is even energy consumption across the network while a high degree of coverage is maintained. Simulation results show that the proposed clustering protocol adapts better to heterogeneous environments than existing clustering algorithms and prolongs the network lifetime. PMID:26690440
A clustering routing algorithm based on improved ant colony clustering for wireless sensor networks
NASA Astrophysics Data System (ADS)
Xiao, Xiaoli; Li, Yang
Taking into account the node distribution uniformity of real wireless sensor networks, this paper presents a clustering strategy based on the ant colony clustering algorithm (ACC-C). The algorithm applies ant colony clustering to non-uniform clustering. This reduces the energy consumption of the cluster heads near the base station and of the whole network. An improved route optimality degree is introduced to evaluate the performance of the chosen route. The algorithm was compared in simulation with LEACH and the improved particle swarm clustering algorithm (PSC-C). The results show that the proposed approach avoids nodes with less residual energy, which can prolong network lifetime.
Cluster algorithms and computational complexity
NASA Astrophysics Data System (ADS)
Li, Xuenan
Cluster algorithms for the 2D Ising model with a staggered field have been studied, and a new cluster algorithm for path sampling has been developed. The complexity properties of the Bak-Sneppen model and the Growing Network model have been studied using computational complexity theory. The dynamic critical behavior of the two-replica cluster algorithm is studied. Several versions of the algorithm are applied to the two-dimensional, square-lattice Ising model with a staggered field. The dynamic exponent for the full algorithm is found to be less than 0.5. Two ingredients are found to be essential for obtaining a small value of the dynamic exponent: odd translations of one replica with respect to the other, and global flips. The path sampling problem for the 1D Ising model is studied using both a local algorithm and a novel cluster algorithm. The local algorithm is extremely inefficient at low temperature, where the integrated autocorrelation time is found to be proportional to the fourth power of the correlation length. The dynamic exponent of the cluster algorithm is found to be zero, so it is much more efficient than the local algorithm. The parallel computational complexity of the Bak-Sneppen evolution model is studied. Bak-Sneppen histories are shown to be producible by a massively parallel computer in a time that is polylog in the length of the history. This means that the logical depth of producing a Bak-Sneppen history is exponentially less than the length of the history. The parallel dynamics for generating Bak-Sneppen histories is contrasted with standard Bak-Sneppen dynamics. The parallel computational complexity of the Growing Network model is studied. The growth of the network with linear kernels is shown not to be complex, and an algorithm with polylog parallel running time is found. The growth of the network with γ ≥ 2 super-linear kernels can be realized by a randomized parallel algorithm with polylog expected running time.
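The parallel algorithm is contrasted with the standard, sequential Bak-Sneppen dynamics. In those dynamics, the species with the lowest fitness and its two neighbours on a ring receive fresh uniform random fitness values at each step. The sketch below shows that textbook model only, not the polylog-time parallel algorithm from the thesis. The function name and the periodic ring are illustrative choices.

```python
import random

def bak_sneppen(n_species, n_steps, seed=0):
    """Sequential Bak-Sneppen dynamics on a ring of n_species sites.
    Returns the final fitness values and the history of minimum sites."""
    rng = random.Random(seed)
    fitness = [rng.random() for _ in range(n_species)]
    history = []  # index of the least-fit species at each step
    for _ in range(n_steps):
        i = min(range(n_species), key=fitness.__getitem__)
        history.append(i)
        # Replace the minimum and both neighbours; i - 1 == -1 wraps to the
        # last site in Python, so the ring is periodic at both ends.
        for j in (i - 1, i, (i + 1) % n_species):
            fitness[j] = rng.random()
    return fitness, history
```

Each step depends on the global minimum left by the previous step. This is why a naive simulation is inherently sequential, and it is the dependency the parallel history-generation result works around.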
Energy Aware Clustering Algorithms for Wireless Sensor Networks
NASA Astrophysics Data System (ADS)
Rakhshan, Noushin; Rafsanjani, Marjan Kuchaki; Liu, Chenglian
2011-09-01
The sensor nodes deployed in wireless sensor networks (WSNs) are extremely power-constrained, so maximizing the lifetime of the entire network is a primary design consideration. In wireless sensor networks, hierarchical network structures have the advantage of providing scalable and energy-efficient solutions. In this paper, we investigate different clustering algorithms for WSNs and compare them on the following metrics:
- clustering distribution
- cluster load balancing
- cluster head (CH) selection strategy
- CH role rotation
- node mobility
- cluster overlapping
- intra-cluster communication
- reliability
- security
- location awareness
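CH role rotation, one of the comparison criteria above, is usually illustrated with the LEACH threshold T(n) = p / (1 - p (r mod 1/p)). The threshold applies only to nodes that have not yet served as cluster head in the current epoch of 1/p rounds. The sketch below assumes 1/p is an integer and uses hypothetical helper names. It shows one well-known rotation scheme and is not a summary of every algorithm the survey compares.

```python
import random

def leach_threshold(p, r):
    """LEACH threshold T(n) for a node that is still eligible in round r.
    Assumes 1/p is an integer (the epoch length in rounds)."""
    period = round(1 / p)
    return p / (1 - p * (r % period))

def elect_cluster_heads(node_ids, p, r, last_ch_round, rng):
    """Return the nodes that become cluster heads in round r.
    last_ch_round maps node -> last round it served as CH (updated in place)."""
    period = round(1 / p)
    epoch_start = r - r % period
    heads = []
    for n in node_ids:
        # Only nodes that have not been CH in the current epoch may compete.
        if last_ch_round.get(n, -1) < epoch_start and rng.random() < leach_threshold(p, r):
            heads.append(n)
            last_ch_round[n] = r
    return heads
```

The threshold climbs to 1 in the last round of each epoch. Every node therefore serves exactly once per epoch, which spreads the energy cost of the CH role.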
Overlapping clusters for distributed computation.
Mirrokni, Vahab; Andersen, Reid; Gleich, David F.
2010-11-01
Scalable, distributed algorithms must address communication problems. We investigate overlapping clusters, or vertex partitions that intersect, for graph computations. This setup stores more of the graph than required but retains the ease of implementation of vertex-partitioned algorithms. Our hope is that this technique reduces communication in computations on a distributed graph. The motivation draws on recent work in communication-avoiding algorithms. Mohiyuddin et al. (SC09) design a matrix-powers kernel that gives rise to an overlapping partition. Fritzsche et al. (CSC2009) develop an overlapping clustering for a Schwarz method. Both techniques extend an initial partitioning with overlap, whereas our procedure generates overlap directly. Indeed, Schwarz methods are commonly used to capitalize on overlap. Elsewhere, overlapping communities (Ahn et al., Nature 2009; Mishra et al., WAW2007) are now a popular model of structure in social networks. These have long been studied in statistics (Cole and Wishart, CompJ 1970). We present two types of results: (i) an estimated swapping probability ρ∞; and (ii) the communication volume of a parallel PageRank solution (link-following α = 0.85) using an additive Schwarz method. The volume ratio is the amount of extra storage for the overlap (2 means we store the graph twice). As the ratio increases, both the swapping probability and the PageRank communication volume decrease.
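For readers unfamiliar with the PageRank computation mentioned above, the sketch below is a plain power iteration with link-following probability α = 0.85. It runs on a single machine. It does not implement the additive Schwarz method or the overlapping partition. It also assumes that dangling pages redistribute their rank uniformly.

```python
def pagerank(out_links, alpha=0.85, tol=1e-12, max_iter=1000):
    """Power iteration for PageRank.
    out_links: dict node -> list of target nodes (every target must be a key)."""
    nodes = list(out_links)
    n = len(nodes)
    x = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by pages with no out-links is spread uniformly.
        dangling = sum(x[v] for v in nodes if not out_links[v])
        nxt = {v: (1.0 - alpha) / n + alpha * dangling / n for v in nodes}
        for v in nodes:
            targets = out_links[v]
            for w in targets:
                # Each page shares alpha * its rank equally among its links.
                nxt[w] += alpha * x[v] / len(targets)
        if sum(abs(nxt[v] - x[v]) for v in nodes) < tol:
            return nxt
        x = nxt
    return x
```

In a distributed setting, the inner loop over `targets` is where communication occurs whenever `w` lives on another processor. Overlap aims to reduce exactly those cross-partition updates.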
Sparse subspace clustering: algorithm, theory, and applications.
Elhamifar, Ehsan; Vidal, René
2013-11-01
Many real-world problems deal with collections of high-dimensional data, such as images, videos, text and web documents, DNA microarray data, and more. Often, such high-dimensional data lie close to low-dimensional structures corresponding to several classes or categories to which the data belong. In this paper, we propose and study an algorithm, called sparse subspace clustering, to cluster data points that lie in a union of low-dimensional subspaces. A data point has infinitely many possible representations in terms of other points. The key idea is that a sparse representation among them corresponds to selecting a few points from the same subspace. This motivates solving a sparse optimization program whose solution is used in a spectral clustering framework to infer the clustering of the data into subspaces. Solving the sparse optimization program is in general NP-hard, so we consider a convex relaxation. We show that, under appropriate conditions on the arrangement of the subspaces and the distribution of the data, the proposed minimization program succeeds in recovering the desired sparse representations. The proposed algorithm is efficient and can handle data points near the intersections of subspaces. Another key advantage over the state of the art is that the algorithm can deal directly with data nuisances, such as noise, sparse outlying entries, and missing entries, by incorporating the model of the data into the sparse optimization program. We demonstrate the effectiveness of the proposed algorithm through experiments on synthetic data and on two real-world problems: motion segmentation and face clustering. PMID:24051734
Cluster compression algorithm: A joint clustering/data compression concept
NASA Technical Reports Server (NTRS)
Hilbert, E. E.
1977-01-01
This paper describes the Cluster Compression Algorithm (CCA), which was developed to reduce the costs of transmitting, storing, distributing, and interpreting LANDSAT multispectral image data. The CCA is a preprocessing algorithm that uses feature extraction and data compression to represent the information in the image data more efficiently. The format of the preprocessed data allows simple look-up-table decoding and direct use of the extracted features. This reduces user computation for both image reconstruction and computer interpretation of the image data. The CCA uses spatially local clustering to extract features that describe the spectral characteristics of the data set. The features can also be used to form a sequence of scalar numbers that defines each picture element in terms of the cluster features. This sequence, called the feature map, is then represented efficiently using source-encoding concepts. Various forms of the CCA are defined, and experimental results are presented to show the trade-offs and characteristics of the various implementations. Examples demonstrate the application of the cluster compression concept to multispectral images from LANDSAT and other sources.
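The CCA produces two outputs: a small table of cluster features and a per-pixel feature map that is then source-coded. The sketch below illustrates both. It assumes the codebook (the cluster means for a local block) has already been found by clustering. Run-length coding stands in for the unspecified source-encoding step, and decoding is a pure table look-up. The function names are hypothetical.

```python
def nearest(codebook, pixel):
    """Index of the codebook entry (cluster feature) closest to a pixel vector."""
    return min(range(len(codebook)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(codebook[k], pixel)))

def encode_block(pixels, codebook):
    """Feature map (cluster index per pixel), run-length coded as (index, count)."""
    runs = []
    for idx in (nearest(codebook, p) for p in pixels):
        if runs and runs[-1][0] == idx:
            runs[-1][1] += 1   # extend the current run
        else:
            runs.append([idx, 1])
    return [tuple(run) for run in runs]

def decode_block(runs, codebook):
    """Reconstruct the block by table look-up: each index maps to its feature."""
    return [codebook[idx] for idx, count in runs for _ in range(count)]
```

A user who only needs a classification can work on the indices directly and never decode the spectra. This is the reduction in user computation the abstract refers to.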
Basic firefly algorithm for document clustering
NASA Astrophysics Data System (ADS)
Mohammed, Athraa Jasim; Yusof, Yuhanis; Husni, Husniza
2015-12-01
Document clustering plays a significant role in Information Retrieval (IR), where it organizes documents before the retrieval process. Various clustering algorithms have been proposed, including K-means and Particle Swarm Optimization (PSO). These algorithms have been widely applied in many disciplines because of their simplicity. However, they tend to become trapped in local minima while searching for an optimal solution. To address this shortcoming, this paper proposes a Basic Firefly (Basic FA) algorithm to cluster text documents. The algorithm uses the Average Distance to Document Centroid (ADDC) as the objective function of the search. Experiments with the proposed algorithm were conducted on the 20Newsgroups benchmark dataset. The results show that Basic FA generates more robust and compact clusters than K-means and PSO.
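The ADDC objective averages, over the K clusters, the mean distance from each document vector to its cluster centroid. The firefly search minimises this value. The sketch below computes ADDC only; the firefly search itself is not shown. It assumes documents are already dense numeric vectors and that Euclidean distance is used. Cosine distance is also common for text.

```python
import math

def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def addc(clusters):
    """Average Distance to Document Centroid.
    clusters: list of non-empty clusters, each a list of document vectors."""
    total = 0.0
    for docs in clusters:
        c = centroid(docs)
        total += sum(math.dist(c, d) for d in docs) / len(docs)
    return total / len(clusters)
```

Lower values mean tighter clusters. A search method such as FA would evaluate `addc` for each candidate set of centroids and move toward the lower-valued (brighter) candidates.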
Self-organization and clustering algorithms
NASA Technical Reports Server (NTRS)
Bezdek, James C.
1991-01-01
Kohonen's feature-map approach to clustering is often likened to the k-means or c-means clustering algorithms. Here, the author identifies some similarities and differences between Kohonen's self-organizing approach and the hard and fuzzy c-means (HCM/FCM) or ISODATA algorithms. The author concludes that some differences are significant, but that there may also be important unknown relationships between the two methodologies. Several avenues of research are proposed.
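The contrast between hard and fuzzy c-means is easiest to see in the membership step. FCM assigns u_i = 1 / Σ_j (d_i / d_j)^(2/(m-1)), while HCM gives all membership to the nearest centre. The sketch below assumes Euclidean distance and the usual fuzzifier m > 1. Function names are illustrative.

```python
import math

def fcm_memberships(point, centers, m=2.0):
    """Fuzzy c-means membership of one point in each cluster (sums to 1)."""
    d = [math.dist(point, c) for c in centers]
    if 0.0 in d:
        # Point coincides with a centre: all membership goes to that centre.
        k = d.index(0.0)
        return [1.0 if i == k else 0.0 for i in range(len(d))]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dj) ** exponent for dj in d) for di in d]

def hcm_membership(point, centers):
    """Hard c-means: membership 1 for the nearest centre, 0 elsewhere."""
    d = [math.dist(point, c) for c in centers]
    k = d.index(min(d))
    return [1.0 if i == k else 0.0 for i in range(len(d))]
```

As m approaches 1, the FCM memberships approach the hard HCM assignment. Kohonen's update also moves a winning prototype (and its map neighbours) toward each input, which is where the comparison with c-means usually starts.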
Hierarchical link clustering algorithm in networks
NASA Astrophysics Data System (ADS)
Bodlaj, Jernej; Batagelj, Vladimir
2015-06-01
Hierarchical network clustering is an approach for finding tightly and internally connected clusters (groups or communities) of nodes in a network based on its structure. Instead of nodes, it is possible to cluster the links of the network, and the sets of nodes belonging to clusters of links can overlap. Overlapping clusters of nodes are not always expected, but they are natural in many applications. With appropriate dissimilarity measures, the clustering strategy can take into account, for example, the semantic meaning of links or nodes based on their properties. We propose a new hierarchical link clustering algorithm that, unlike existing algorithms, uses monotonic dissimilarity measures to consider node and/or link properties (descriptions, attributes) of the input network alongside its structure. The algorithm determines communities that form connected subnetworks (relational constraint) containing locally similar nodes with respect to their description. It is only implicitly based on the corresponding line graph of the input network, which reduces its space and time complexities. We investigate both complexities analytically and statistically. Using the provided dissimilarity measures, our algorithm can uncover the general overlapping community structure of input networks. It can also uncover related subregions inside these communities in the form of a hierarchy. We demonstrate this ability on real-world and artificial network examples. PMID:26172761
Color sorting algorithm based on K-means clustering algorithm
NASA Astrophysics Data System (ADS)
Zhang, BaoFeng; Huang, Qian
2009-11-01
Raisin production yields a variety of color impurities that need to be removed effectively. A new, efficient raisin color-sorting algorithm is presented here. First, threshold-based image processing was applied for pre-processing, and the gray-scale distribution characteristics of the raisin image were determined. The background image data were then subtracted from the target image data to obtain the chromatic aberration image and reduce disturbances. Second, a Haar wavelet filter was used to obtain a smoothed image of the raisins. The image characteristics were computed from differences in color, mildew, spots, and other external features, so that they fully reflect the quality differences between raisin types. After this processing, the images were analyzed by K-means clustering, which adaptively extracts statistical features. The image data were divided into categories according to these features, which made the abnormally colored categories distinct. Using this algorithm, raisins with abnormal colors and those with mottles were eliminated. The sorting rate reached 98.6%, and the ratio of normal raisins to sorted grains was less than one eighth.
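The K-means step that separates abnormal colours can be sketched on raw RGB triples as follows. The sketch makes several assumptions: pixels are already pre-processed feature vectors, k = 2, centres are seeded deterministically by farthest-first selection, and the smaller cluster is treated as abnormal. The abstract does not give the paper's actual features, value of k, or decision rule.

```python
from collections import Counter

def sqdist(a, b):
    """Squared Euclidean distance between two colour vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_first(points, k):
    """Deterministic seeding: start at the first point, then repeatedly take
    the point farthest from all centres chosen so far."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(sqdist(p, c) for c in centers))
        centers.append(list(far))
    return centers

def kmeans(points, k, iters=100):
    centers = farthest_first(points, k)
    labels = None
    for _ in range(iters):
        # Assignment step: each pixel goes to its nearest centre.
        new_labels = [min(range(k), key=lambda j: sqdist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centre to the mean of its pixels.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels

def minority_indices(labels):
    """Indices of items in the smallest cluster (treated here as 'abnormal')."""
    counts = Counter(labels)
    smallest = min(counts, key=counts.get)
    return [i for i, lab in enumerate(labels) if lab == smallest]
```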
An algorithm for spatial hierarchy clustering
NASA Technical Reports Server (NTRS)
Dejesusparada, N. (Principal Investigator); Velasco, F. R. D.
1981-01-01
A method that uses both spectral and spatial redundancy to compact and preclassify images is presented. In multispectral satellite images, neighboring image points are highly correlated and tend to occupy dense, restricted regions of the feature space. The image is divided into windows of the same size, and clustering is performed within each window. The classes obtained in neighboring windows are then clustered, and this is repeated until a single region covering the whole image is obtained. With this algorithm, only a few points are considered in each clustering, which reduces computational effort. The method is illustrated on LANDSAT images.
Coupled cluster algorithms for networks of shared memory parallel processors
NASA Astrophysics Data System (ADS)
Bentz, Jonathan L.; Olson, Ryan M.; Gordon, Mark S.; Schmidt, Michael W.; Kendall, Ricky A.
2007-05-01
SMP systems are increasingly popular as building blocks for high-performance supercomputers. This increases the need for applications that can exploit the multiple levels of parallelism available in clusters of SMPs. This paper presents a dual-layer distributed algorithm that uses both shared-memory and distributed-memory techniques. It parallelizes a very important algorithm in computational chemistry, often called the "gold standard": the single and double excitation coupled cluster method with perturbative triples, CCSD(T). The algorithm is presented within the framework of the GAMESS (General Atomic and Molecular Electronic Structure System) program suite [M.W. Schmidt, K.K. Baldridge, J.A. Boatz, S.T. Elbert, M.S. Gordon, J.J. Jensen, S. Koseki, N. Matsunaga, K.A. Nguyen, S. Su, T.L. Windus, M. Dupuis, J.A. Montgomery, General atomic and molecular electronic structure system, J. Comput. Chem. 14 (1993) 1347-1363]. It also uses the Distributed Data Interface (DDI) [M.W. Schmidt, G.D. Fletcher, B.M. Bode, M.S. Gordon, The distributed data interface in GAMESS, Comput. Phys. Comm. 128 (2000) 190]. However, the essential features of the algorithm (data distribution, load balancing, and communication overhead) can be applied to more general computational problems. Timing and performance data for our dual-level algorithm are presented for several large-scale clusters of SMPs.
Performance Comparison Of Evolutionary Algorithms For Image Clustering
NASA Astrophysics Data System (ADS)
Civicioglu, P.; Atasever, U. H.; Ozkan, C.; Besdok, E.; Karkinli, A. E.; Kesikoglu, A.
2014-09-01
Evolutionary computation tools can process real-valued numerical sets to extract suboptimal solutions to a designed problem. Data clustering algorithms have been used intensively for image segmentation in remote sensing applications. Evolutionary algorithms are widely used for data clustering, yet their clustering performance has rarely been studied with clustering validation indexes. In this paper, several recently proposed evolutionary algorithms were used to cluster images: the Artificial Bee Colony Algorithm (ABC), Gravitational Search Algorithm (GSA), Cuckoo Search Algorithm (CS), Adaptive Differential Evolution Algorithm (JADE), Differential Search Algorithm (DSA), and Backtracking Search Optimization Algorithm (BSA). Some classical image clustering techniques (k-means, FCM, and SOM networks) were also used. Their performance was compared using four clustering validation indexes. The experimental results showed that the evolutionary algorithms give more reliable cluster centers than the classical clustering techniques, but their convergence time is quite long.
Parallel Clustering Algorithms for Structured AMR
Gunney, B T; Wissink, A M; Hysom, D A
2005-10-26
We compare several different parallel implementation approaches for the clustering operations performed during adaptive gridding in patch-based structured adaptive mesh refinement (SAMR) applications. Specifically, we target the clustering algorithm of Berger and Rigoutsos (BR91), which is commonly used in many SAMR applications. The baseline for comparison is a simplistic parallel extension of the original algorithm that works well for up to O(10^2) processors. Our goal is a clustering algorithm for machines of up to O(10^5) processors, such as the 64K-processor IBM BlueGene/Light system. We first present an algorithm that avoids the unneeded communications of the simplistic approach, improving clustering speed by up to an order of magnitude. We then present a new task-parallel implementation that further reduces communication wait time, adding another order of magnitude of improvement. The new algorithms also exhibit more favorable scaling behavior for our test problems. Performance is evaluated on a number of large-scale parallel computer systems, including a 16K-processor BlueGene/Light system.
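For orientation, the serial core of the Berger-Rigoutsos (BR91) method has three parts. It shrinks a box around the flagged cells, and accepts the box if the fraction of flagged cells (the efficiency) is high enough. Otherwise it splits the box at a hole (a zero) in the row or column signature. The sketch below is serial and simplified. It replaces BR91's inflection-point (Laplacian) split with bisection of the longer side, and the 0.8 efficiency threshold is an arbitrary assumption. None of the parallel implementations compared in the paper are shown.

```python
def bounding_box(flags, r0, r1, c0, c1):
    """Shrink box [r0,r1) x [c0,c1) to the tightest box around flagged cells."""
    rows = [r for r in range(r0, r1) if any(flags[r][c0:c1])]
    if not rows:
        return None
    cols = [c for c in range(c0, c1) if any(flags[r][c] for r in range(r0, r1))]
    return rows[0], rows[-1] + 1, cols[0], cols[-1] + 1

def br_cluster(flags, box=None, min_eff=0.8):
    """Return half-open boxes (r0, r1, c0, c1) covering every flagged cell."""
    if box is None:
        box = (0, len(flags), 0, len(flags[0]))
    box = bounding_box(flags, *box)
    if box is None:
        return []
    r0, r1, c0, c1 = box
    flagged = sum(flags[r][c] for r in range(r0, r1) for c in range(c0, c1))
    if flagged >= min_eff * (r1 - r0) * (c1 - c0):
        return [box]  # efficient enough: accept this patch
    # Signatures: number of flagged cells in each row and column of the box.
    row_sig = [sum(flags[r][c0:c1]) for r in range(r0, r1)]
    col_sig = [sum(flags[r][c] for r in range(r0, r1)) for c in range(c0, c1)]
    if 0 in row_sig:  # hole in the row signature: split around it
        i = r0 + row_sig.index(0)
        parts = [(r0, i, c0, c1), (i + 1, r1, c0, c1)]
    elif 0 in col_sig:  # hole in the column signature
        j = c0 + col_sig.index(0)
        parts = [(r0, r1, c0, j), (r0, r1, j + 1, c1)]
    elif r1 - r0 >= c1 - c0:  # no hole: bisect the longer side (simplification)
        mid = (r0 + r1) // 2
        parts = [(r0, mid, c0, c1), (mid, r1, c0, c1)]
    else:
        mid = (c0 + c1) // 2
        parts = [(r0, r1, c0, mid), (r0, r1, mid, c1)]
    return [b for part in parts for b in br_cluster(flags, part, min_eff)]
```

The signature sums are cheap reductions, which is why parallel versions can compute them across processors. Communication for these reductions and for distributing the resulting boxes is what the paper's implementations try to minimise.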
Fusion and clustering algorithms for spatial data
NASA Astrophysics Data System (ADS)
Kuntala, Pavani
Spatial clustering is an approach for discovering groups of related data points in spatial data. It has attracted considerable research attention because many applications require it. It holds practical importance in domains such as geographic knowledge discovery, sensors, rare disease discovery, astronomy, and remote sensing. The motivation for this work stems from the limitations of existing spatial clustering methods. In most conventional spatial clustering algorithms, the similarity measurement mainly considers geometric attributes. However, in many real applications, users are concerned with both spatial and non-spatial attributes. Conventional spatial clustering partitions the input data set into several compact regions. Data points that are similar in their non-spatial attributes may then be scattered over different regions, which makes the corresponding objective difficult to achieve. In this dissertation, a novel clustering methodology is proposed that uses a fusion-based approach to address the clustering problem in both the spatial and non-spatial domains. The goal is to optimize a given objective function in the spatial domain while satisfying the constraint specified in the non-spatial attribute domain. Several experiments are conducted to provide insights into the proposed methodology. The algorithm first captures the spatial cores having the highest structure. It then uses an iterative, heuristic mechanism to find the optimal number of spatial cores and non-spatial clusters in the data. This fusion-based framework can handle data streams and provides a framework for comparing spatial clusters. The correctness and efficiency of the proposed clustering model are demonstrated on real-world and synthetic data sets.
A dynamic clustering algorithm in wireless sensor networks
NASA Astrophysics Data System (ADS)
Wang, Rui; Liang, Yan; Pan, Quan; Wang, Quan; Cheng, Yongmei
2005-11-01
It is essential to prolong the lifetime of wireless sensor networks (WSN) through effective cooperation among their sensor nodes. This paper presents a dynamic clustering algorithm, named DCA, that constructs a dynamic sensor cluster at each time step. DCA selects the micro-sensor nodes optimally and dynamically, based on an integrated performance index that covers information acquisition and energy consumption. In distributed target tracking with WSN, DCA is compared with information-driven sensor querying (IDSQ). It avoids the problem of "too frequent cluster head (CH) switches", saves more than 80% of the energy, and maintains almost the same tracking accuracy.
Genetic algorithm optimization of atomic clusters
Morris, J.R.; Deaven, D.M.; Ho, K.M.; Wang, C.Z.; Pan, B.C.; Wacker, J.G.; Turner, D.E.
1996-12-31
The authors have been using genetic algorithms to study the structures of atomic clusters and related problems. In this problem, local minima are easy to locate, but the barriers between the many minima are large, and the number of minima prohibits a systematic search. The authors use a novel mating algorithm that preserves some of the geometrical relationships between atoms. This makes the resulting structures likely to inherit the best features of the parent clusters. With this approach, they have found lower-energy structures than had previously been obtained. Most recently, they have reversed the building-block idea, using optimized structures from the GA to learn about systematic structural trends. They believe that an effective GA can help provide such heuristic information, and conversely that such information can be fed back into the algorithm to assist the search.
Open cluster membership probability based on K-means clustering algorithm
NASA Astrophysics Data System (ADS)
El Aziz, Mohamed Abd; Selim, I. M.; Essam, A.
2016-05-01
In galaxy images, the position of each star relative to all the other stars is used. Star cluster membership is therefore assessed by two basic criteria: one for geometric membership and the other for physical (photometric) membership. In this paper, we present a new method for determining open cluster membership based on the K-means clustering algorithm. This algorithm allows us to efficiently discriminate cluster members from field stars. To validate the method, we applied it to NGC 188 and NGC 2266 and obtained the member stars of these clusters. The color-magnitude diagram of the member stars of NGC 188 is significantly clearer and shows a well-defined main sequence and red giant branch. This allows us to better constrain the cluster members and estimate their physical parameters. The membership probabilities were calculated and compared with those obtained by other methods. The results show that the K-means clustering algorithm can effectively select probable member stars in space without any assumption about the spatial distribution of stars in the cluster or field. Our results are in good agreement with those derived in previous work.
Chaotic map clustering algorithm for EEG analysis
NASA Astrophysics Data System (ADS)
Bellotti, R.; De Carlo, F.; Stramaglia, S.
2004-03-01
The non-parametric chaotic map clustering algorithm has been applied to the analysis of electroencephalographic signals in order to recognize Huntington's disease, one of the most dangerous pathologies of the central nervous system. The performance of the method has been compared with that of parametric algorithms, such as K-means and deterministic annealing, and of a supervised multi-layer perceptron. Supervised neural networks need a training phase, performed on data tagged by the genetic test. Parametric methods require the number of classes to be chosen in advance. Chaotic map clustering instead gives natural evidence of the pathological class without any training or supervision. It thus provides a new, efficient methodology for recognizing patterns affected by Huntington's disease.
A Cross Unequal Clustering Routing Algorithm for Sensor Network
NASA Astrophysics Data System (ADS)
Tong, Wang; Jiyi, Wu; He, Xu; Jinghua, Zhu; Munyabugingo, Charles
2013-08-01
In clustering routing algorithms for wireless sensor networks, the cluster size is generally fixed, which can easily lead to the "hot spot" problem. Furthermore, most routing algorithms barely consider the high energy consumption caused by long-distance communication between adjacent cluster heads. This paper therefore proposes a new cross unequal clustering routing algorithm based on the EEUC algorithm. To address the shortcomings of EEUC, the proposed algorithm considers both the node's position and its remaining energy when calculating the competition radius. This balances the load on cluster heads more evenly. In addition, cluster-adjacent nodes are used to relay data, which reduces the energy loss of cluster heads. In simulation experiments, the algorithm was compared with LEACH and EEUC. It effectively reduces the energy loss of cluster heads, balances energy consumption among all nodes in the network, and prolongs the network lifetime.
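The proposed algorithm modifies the EEUC competition radius. That radius shrinks clusters near the base station so that cluster heads there keep energy for relaying: R = (1 - c (d_max - d) / (d_max - d_min)) R0, where d is the node's distance to the base station. The sketch implements only this baseline formula. The abstract does not give the form of the residual-energy term that the cross unequal algorithm adds, so that term is omitted. The default c = 0.5 is arbitrary.

```python
def eeuc_radius(d_to_bs, d_min, d_max, r0, c=0.5):
    """EEUC competition radius for a tentative cluster head.
    d_to_bs: node's distance to the base station; d_min/d_max: smallest and
    largest such distances in the network; r0: maximum competition radius;
    c: weight in [0, 1] controlling how much radii shrink near the base station."""
    return (1.0 - c * (d_max - d_to_bs) / (d_max - d_min)) * r0
```

The nearest nodes get a radius of (1 - c) R0 and the farthest get R0. Clusters close to the base station are therefore smaller, which leaves their heads spare energy for inter-cluster forwarding and eases the hot-spot problem.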
Random networks with tunable degree distribution and clustering
NASA Astrophysics Data System (ADS)
Volz, Erik
2004-11-01
We present an algorithm for generating random networks with arbitrary degree distribution and clustering (frequency of triadic closure). We use this algorithm to generate networks with exponential, power law, and Poisson degree distributions with variable levels of clustering. Such networks may be used as models of social networks and as a testable null hypothesis about network structure. Finally, we explore the effects of clustering on the point of the phase transition where a giant component forms in a random network, and on the size of the giant component. Some analysis of these effects is presented.
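Clustering here means the frequency of triadic closure. It is commonly measured by the global clustering coefficient: three times the number of triangles divided by the number of connected triples. The sketch below measures that quantity for an undirected graph stored as an adjacency dictionary, which is an assumed representation. It does not implement the generation algorithm.

```python
from itertools import combinations

def global_clustering(adj):
    """Global clustering coefficient of an undirected simple graph.
    adj: dict node -> set of neighbours (symmetric, no self-loops)."""
    closed = 0   # closed triples; each triangle is counted once per vertex
    triples = 0  # connected triples (paths of length 2) centred at each vertex
    for v, nbrs in adj.items():
        k = len(nbrs)
        triples += k * (k - 1) // 2
        closed += sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return closed / triples if triples else 0.0
```

Each triangle is counted once at each of its three vertices. `closed / triples` therefore equals 3 × triangles / connected triples.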
Improved Ant Colony Clustering Algorithm and Its Performance Study.
Gao, Wei
2016-01-01
Clustering analysis is used in many disciplines and applications. It is an important tool that descriptively identifies homogeneous groups of objects based on attribute values. The ant colony clustering algorithm is a swarm-intelligence method for clustering problems, inspired by the way ant colonies cluster their corpses and sort their larvae. A new abstraction ant colony clustering algorithm with a data combination mechanism is proposed to improve the computational efficiency and accuracy of the ant colony clustering algorithm. The abstraction algorithm is applied to benchmark clustering problems. Its performance is compared with that of the ant colony clustering algorithm and other methods from the existing literature. On problems of similar computational difficulty and complexity, the abstraction algorithm produces more accurate results, and produces them more efficiently, than the ant colony clustering algorithm and the other methods. The abstraction ant colony clustering algorithm can therefore be used for efficient multivariate data clustering. PMID:26839533
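The corpse-clustering behaviour mentioned above is usually modelled with the basic pick-up and drop probabilities of Deneubourg et al.: p_pick = (k1/(k1+f))^2 and p_drop = (f/(k2+f))^2. Here f is a local similarity density, computed in the Lumer-Faieta form. The sketch shows only these classic rules, not the abstraction ant colony algorithm or its data-combination mechanism. The values of k1, k2, alpha, and s are illustrative.

```python
import math

def local_density(item, neighbours, alpha, s):
    """Lumer-Faieta style similarity density f of an item within an s x s
    patch: near-identical neighbours raise f, dissimilar ones lower it."""
    total = sum(1.0 - math.dist(item, other) / alpha for other in neighbours)
    return max(0.0, total / (s * s))

def pick_probability(f, k1=0.1):
    """Isolated or dissimilar items (small f) are likely to be picked up."""
    return (k1 / (k1 + f)) ** 2

def drop_probability(f, k2=0.15):
    """Carried items are likely to be dropped where similar items are dense."""
    return (f / (k2 + f)) ** 2
```

Each simulated ant repeatedly applies these two rules while moving on a grid. Similar items gradually accumulate into piles, and the piles are the clusters.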
Cross-Clustering: A Partial Clustering Algorithm with Automatic Estimation of the Number of Clusters
Tellaroli, Paola; Bazzi, Marco; Donato, Michele; Brazzale, Alessandra R.; Drăghici, Sorin
2016-01-01
Four of the most common limitations of the many available clustering methods are: i) the lack of a proper strategy to deal with outliers; ii) the need for a good a priori estimate of the number of clusters to obtain reasonable results; iii) the lack of a method able to detect when partitioning of a specific data set is not appropriate; and iv) the dependence of the result on the initialization. Here we propose Cross-clustering (CC), a partial clustering algorithm that overcomes these four limitations by combining the principles of two well-established hierarchical clustering algorithms: Ward’s minimum variance and Complete-linkage. We validated CC by comparing it with a number of existing clustering methods, including Ward’s and Complete-linkage. We show on both simulated and real datasets that CC performs better than the other methods in terms of the identification of the correct number of clusters, the identification of outliers, and the determination of real cluster memberships. We used CC to cluster samples, in order to identify disease subtypes, and gene profiles, in order to determine groups of genes with the same behavior. Results obtained on a non-biological dataset show that the method is general enough to be successfully used in such diverse applications. The algorithm has been implemented in the statistical language R and is freely available from the CRAN contributed packages repository. PMID:27015427
A Distributed Flocking Approach for Information Stream Clustering Analysis
Cui, Xiaohui; Potok, Thomas E
2006-01-01
Intelligence analysts are currently overwhelmed by the amount of information streams generated every day, and there is a lack of comprehensive tools that can analyze these streams in real time. Document clustering analysis plays an important role in improving the accuracy of information retrieval. However, most clustering technologies can only be applied to static document collections, because they normally require a large amount of computational resources and a long time to produce accurate results. It is very difficult to cluster dynamically changing text information streams on an individual computer. Our earlier research resulted in a dynamic reactive flock clustering algorithm that can continually refine the clustering result and quickly react to changes in document contents. This characteristic makes the algorithm suitable for clustering dynamically changing document information, such as text information streams. Because the algorithm is decentralized, a distributed approach is a natural way to increase its clustering speed. In this paper, we present a distributed multi-agent flocking approach for text information stream clustering and discuss the decentralized architectures and communication schemes for load balancing and status information synchronization in this approach.
An Artificial Immune Univariate Marginal Distribution Algorithm
NASA Astrophysics Data System (ADS)
Zhang, Qingbin; Kang, Shuo; Gao, Junxiang; Wu, Song; Tian, Yanping
Hybridization is an extremely effective way of improving the performance of the Univariate Marginal Distribution Algorithm (UMDA). Owing to their diversity and memory mechanisms, artificial immune algorithms have been widely used to construct hybrid algorithms with other optimization algorithms. This paper proposes a hybrid algorithm that combines the UMDA with the principle of the general artificial immune algorithm. Experimental results on an order-3 deceptive function show that the proposed hybrid algorithm obtains more building blocks (BBs) than the UMDA.
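The plain UMDA that the hybrid above builds on can be sketched as follows. This is a minimal sketch, assuming truncation selection and the simple OneMax fitness function; the immune operators and the order-3 deceptive function used in the paper are not reproduced, and the function names and parameter values are illustrative only.

```python
import random


def onemax(bits):
    """Illustrative fitness to maximize: the number of ones in the bit string."""
    return sum(bits)


def umda(n_bits=20, pop_size=100, n_select=50, generations=50, seed=0):
    rng = random.Random(seed)
    # Univariate model: one independent Bernoulli probability per bit position.
    p = [0.5] * n_bits
    best = None
    for _ in range(generations):
        # Sample a population from the product of the marginals.
        pop = [[1 if rng.random() < p[i] else 0 for i in range(n_bits)]
               for _ in range(pop_size)]
        pop.sort(key=onemax, reverse=True)
        if best is None or onemax(pop[0]) > onemax(best):
            best = pop[0]
        selected = pop[:n_select]  # truncation selection
        # Re-estimate each marginal from the selected individuals, clamped so
        # that no position is frozen permanently at 0 or 1.
        lo, hi = 1.0 / n_bits, 1.0 - 1.0 / n_bits
        p = [min(hi, max(lo, sum(ind[i] for ind in selected) / n_select))
             for i in range(n_bits)]
    return best, p
```

Because each bit is modeled independently, UMDA cannot represent linkage between bits, which is why deceptive functions of order 3 are a common stress test for it.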
A Hybrid Monkey Search Algorithm for Clustering Analysis
Chen, Xin; Zhou, Yongquan; Luo, Qifang
2014-01-01
Clustering is a popular data analysis and data mining technique, and the k-means clustering algorithm is one of the most commonly used methods. However, k-means depends heavily on the initial solution and easily falls into a local optimum. In view of these disadvantages, this paper proposes a hybrid monkey algorithm for clustering analysis based on the search operator of the artificial bee colony algorithm. Experiments on synthetic and real-life datasets show that the algorithm performs better than the basic monkey algorithm for clustering analysis. PMID:24772039
A Distributed Polygon Retrieval Algorithm Using MapReduce
NASA Astrophysics Data System (ADS)
Guo, Q.; Palanisamy, B.; Karimi, H. A.
2015-07-01
The burst of large-scale spatial terrain data due to the proliferation of data acquisition devices such as 3D laser scanners poses challenges to spatial data analysis and computation. Among many spatial analyses and computations, polygon retrieval is a fundamental operation that is often performed under real-time constraints. However, existing sequential algorithms fail to meet this demand for larger terrain datasets. Motivated by the MapReduce programming model, a widely adopted large-scale parallel data processing technique, we present a MapReduce-based polygon retrieval algorithm designed to reduce the IO and CPU loads of spatial data processing. By indexing the data with a quad-tree, a significant amount of unneeded data is filtered out in the filtering stage, which reduces the IO overhead. The indexed data also allows the relationship between the terrain data and the query area to be determined in less time. The results of experiments performed on our Hadoop cluster demonstrate that our algorithm performs significantly better than existing distributed algorithms.
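The filter-and-refine idea behind quad-tree-indexed polygon retrieval can be sketched on a single machine as below. This is a minimal sketch, assuming 2-D points, a point quad-tree whose filter step returns candidates inside the polygon's bounding box, and an even-odd ray-casting test as the refinement step; the MapReduce distribution, the terrain data format, and all class and function names are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class QuadTree:
    """Point quad-tree over the square [x0, x0 + size) x [y0, y0 + size)."""
    x0: float
    y0: float
    size: float
    capacity: int = 4
    points: list = field(default_factory=list)
    children: list = field(default_factory=list)

    def contains(self, p):
        return (self.x0 <= p[0] < self.x0 + self.size
                and self.y0 <= p[1] < self.y0 + self.size)

    def insert(self, p):
        if not self.contains(p):
            return False
        if not self.children and (len(self.points) < self.capacity or self.size < 1e-9):
            self.points.append(p)
            return True
        if not self.children:
            self._split()
        if not any(c.insert(p) for c in self.children):
            self.points.append(p)  # floating-point edge case: keep it at this node
        return True

    def _split(self):
        h = self.size / 2
        self.children = [QuadTree(self.x0 + dx, self.y0 + dy, h, self.capacity)
                         for dx in (0, h) for dy in (0, h)]
        old, self.points = self.points, []
        for q in old:
            if not any(c.insert(q) for c in self.children):
                self.points.append(q)

    def query_box(self, xmin, ymin, xmax, ymax, out):
        """Filter step: collect points in the box, pruning disjoint subtrees."""
        if (self.x0 > xmax or self.x0 + self.size < xmin
                or self.y0 > ymax or self.y0 + self.size < ymin):
            return out
        out.extend(p for p in self.points
                   if xmin <= p[0] <= xmax and ymin <= p[1] <= ymax)
        for c in self.children:
            c.query_box(xmin, ymin, xmax, ymax, out)
        return out


def point_in_polygon(x, y, poly):
    """Refinement step: even-odd ray casting against the polygon's edges."""
    inside = False
    for i in range(len(poly)):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % len(poly)]
        if (y1 > y) != (y2 > y):
            if x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
                inside = not inside
    return inside


def polygon_retrieval(tree, poly):
    xs, ys = [v[0] for v in poly], [v[1] for v in poly]
    candidates = tree.query_box(min(xs), min(ys), max(xs), max(ys), [])
    return [p for p in candidates if point_in_polygon(p[0], p[1], poly)]
```

Only the candidates that survive the cheap bounding-box filter pay for the exact polygon test, which is the kind of IO and CPU saving the abstract attributes to the quad-tree index.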
Multi-Parent Clustering Algorithms from Stochastic Grammar Data Models
NASA Technical Reports Server (NTRS)
Mjolsness, Eric; Castano, Rebecca; Gray, Alexander
1999-01-01
We introduce a statistical data model and an associated optimization-based clustering algorithm which allows data vectors to belong to zero, one or several "parent" clusters. For each data vector the algorithm makes a discrete decision among these alternatives. Thus, a recursive version of this algorithm would place data clusters in a Directed Acyclic Graph rather than a tree. We test the algorithm with synthetic data generated according to the statistical data model. We also illustrate the algorithm using real data from large-scale gene expression assays.
An algorithm on distributed mining association rules
NASA Astrophysics Data System (ADS)
Xu, Fan
2005-12-01
With the rapid development of the Internet and intranets, distributed databases have become a widely used environment in many areas, and mining association rules in distributed databases is a critical task. Algorithms for distributed association rule mining can be divided into two classes: DD algorithms and CD algorithms. A DD algorithm focuses on optimizing the data partition to enhance efficiency. A CD algorithm, on the other hand, considers a setting in which the data is arbitrarily partitioned horizontally among the parties from the start, and focuses on parallelizing the communication. A DD algorithm is not always applicable, however: by the time the data is generated, it is often already partitioned, and in many cases it cannot be gathered and repartitioned for reasons of security and secrecy, transmission cost, or sheer efficiency. A CD algorithm may be a more appealing solution for systems that are naturally distributed over large expanses, such as stock exchange and credit card systems. The FDM algorithm enhances the CD algorithm. However, both CD and FDM are based on a net structure and execute on non-shareable resources, whereas distributed databases in practical applications are often star-structured. This paper proposes an algorithm based on star-structured networks, which are more practical in application, have lower maintenance costs, and are easier to construct. In addition, the algorithm provides high communication efficiency and good extensibility for parallel computation.
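The count-distribution (CD) idea described above can be simulated in one process as follows. This is a minimal sketch, assuming Apriori-style candidate generation: each site keeps its own horizontal partition, counts the same candidate set locally, and only the count vectors are summed. The summation is done by a single coordinator here, which resembles a star layout but is not the paper's star-structured algorithm; FDM's pruning is not included, and all names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def apriori_gen(frequent_prev, k):
    """Candidate k-itemsets all of whose (k-1)-subsets are frequent."""
    prev = set(frequent_prev)
    items = sorted({i for s in prev for i in s})
    return [c for c in combinations(items, k)
            if all(sub in prev for sub in combinations(c, k - 1))]


def count_distribution(partitions, min_support):
    """partitions: one list of transactions per site; the data never moves."""
    n_total = sum(len(part) for part in partitions)
    all_items = sorted({item for part in partitions for t in part for item in t})
    candidates = [(item,) for item in all_items]
    frequent, k = {}, 1
    while candidates:
        # Local phase: every site counts the same candidate set on its own data.
        local_counts = []
        for part in partitions:
            counts = Counter()
            for t in part:
                items = set(t)
                for cand in candidates:
                    if items.issuperset(cand):
                        counts[cand] += 1
            local_counts.append(counts)
        # Exchange phase: only the count vectors are communicated and summed.
        global_counts = sum(local_counts, Counter())
        level = {c: n for c, n in global_counts.items() if n / n_total >= min_support}
        frequent.update(level)
        k += 1
        candidates = apriori_gen(level, k)
    return frequent
```

The raw transactions never leave their site, which is the property that makes CD-style schemes attractive when data cannot be repartitioned for security or cost reasons.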
Incremental Clustering Algorithm For Earth Science Data Mining
Vatsavai, Raju
2009-01-01
Remote sensing data plays a key role in understanding complex geographic phenomena. Clustering is a useful tool for discovering interesting patterns and structures within multivariate geospatial data. One of the key issues in clustering is the specification of an appropriate number of clusters, which is not obvious in many practical situations. In this paper we provide an extension of the G-means algorithm that automatically learns the number of clusters present in the data and avoids overestimating the number of clusters. Experimental evaluation on simulated and remotely sensed image data shows the effectiveness of our algorithm.
Color Distributions of 29 Galactic Globular Clusters
NASA Astrophysics Data System (ADS)
Sohn, Young-Jong; Byun, Yong-Ik; Yim, Hong-Suh; Rhee, Myung-Hyun; Chun, Mun-Suk
1998-06-01
U, B, and V CCD images are used to investigate the radial color gradients of twenty-nine Galactic globular clusters: twenty-two King-type clusters and seven post-core-collapse (PCC) clusters, classified by their surface brightness distributions. Among the King-type clusters, eight show radial color gradients with redder centers and seven with bluer centers in (B-V); seven have redder centers in (U-B), and five show bluer centers in the same color. Among the seven PCC clusters, one shows a redder center and five show bluer centers in (B-V); two have redder centers in (U-B), and four show bluer centers in the same color. These results provide evidence that color gradients with bluer centers are not unique to PCC clusters. From Pearson's correlation coefficient tests, we found that horizontal branch morphologies are weakly correlated with the radial color gradients within globular clusters.
Clustering algorithms do not learn, but they can be learned
NASA Astrophysics Data System (ADS)
Brun, Marcel; Dougherty, Edward R.
2005-08-01
Pattern classification theory involves an error criterion, optimal classifiers, and a theory of learning. For clustering, there has historically been little theory; in particular, there has generally (but not always) been no learning. The key point is that clustering has not been grounded on a probabilistic theory. Recently, a clustering theory has been developed in the context of random sets. This paper discusses learning within that context, in particular, k-nearest-neighbor learning of clustering algorithms.
Clustering algorithms for Stokes space modulation format recognition.
Boada, Ricard; Borkowski, Robert; Monroy, Idelfonso Tafur
2015-06-15
Stokes space modulation format recognition (Stokes MFR) is a blind method enabling digital coherent receivers to infer modulation format information directly from a received polarization-division-multiplexed signal. A crucial part of Stokes MFR is the clustering algorithm, which largely determines the performance of the detection process, particularly at low signal-to-noise ratios. This paper reports on an extensive study of six clustering algorithms (k-means, expectation maximization, density-based DBSCAN and OPTICS, spectral clustering, and maximum likelihood clustering) used to discriminate between dual-polarization BPSK, QPSK, 8-PSK, 8-QAM, and 16-QAM. For each clustering algorithm and modulation format under test, we determine essential performance metrics: minimum required signal-to-noise ratio, detection accuracy, and algorithm complexity. PMID:26193532
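Of the six algorithms compared, DBSCAN is compact enough to sketch. This is a textbook DBSCAN in plain Python, assuming Euclidean distance and that min_pts counts the point itself. In Stokes MFR the inputs would be received symbols mapped into Stokes space; the toy points in the usage check are 2-D, and the eps and min_pts values are illustrative only.

```python
import math


def dbscan(points, eps, min_pts):
    """Return one cluster label per point; -1 marks noise."""
    n = len(points)
    labels = [None] * n  # None = not yet visited

    def neighbors(i):
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point -> border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:  # j is also core: keep expanding
                queue.extend(j_nbrs)
    return labels
```

Unlike k-means, DBSCAN does not need the number of clusters in advance, which matters when the constellation size is exactly what the receiver is trying to infer.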
A highly efficient multi-core algorithm for clustering extremely large datasets
2010-01-01
Background In recent years, the demand for computational power in computational biology has increased due to rapidly growing data sets from microarray and other high-throughput technologies, and this demand is likely to keep increasing. Standard algorithms for analyzing data, such as cluster algorithms, need to be parallelized for fast processing. Unfortunately, most approaches to parallelizing algorithms rely largely on network communication protocols that connect multiple computers. One answer to this problem is to use the intrinsic capabilities of current multi-core hardware to distribute the tasks among the cores of a single computer. Results We introduce a multi-core parallelization of the k-means and k-modes cluster algorithms, based on the design principles of transactional memory, for clustering gene expression microarray-type data and categorical SNP data. Our new shared-memory parallel algorithms prove to be highly efficient. We demonstrate their computational power and show their utility in cluster stability and sensitivity analysis employing repeated runs with slightly changed parameters. The computation speed of our Java-based algorithm increased by a factor of 10 for large data sets, while preserving computational accuracy, compared with single-core implementations and a recently published network-based parallelization. Conclusions Most desktop computers and even notebooks provide at least dual-core processors. Our multi-core algorithms show that, with modern algorithmic concepts, parallelization makes it possible to perform even laborious tasks such as cluster sensitivity analysis and cluster number estimation on a laboratory computer. PMID:20370922
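The paper parallelizes k-modes across cores; the sequential algorithm itself can be sketched as below. This is a minimal single-threaded sketch, assuming simple-matching (Hamming) dissimilarity and a per-attribute majority vote for the mode update. The assignment loop is the data-parallel step a multi-core version would split across threads; that parallelization and the transactional-memory design are not shown, and the function names and the optional init argument are hypothetical.

```python
import random
from collections import Counter


def hamming(a, b):
    """Simple-matching dissimilarity: number of attributes that differ."""
    return sum(x != y for x, y in zip(a, b))


def k_modes(data, k, init=None, max_iter=20, seed=0):
    rng = random.Random(seed)
    modes = [list(m) for m in (init if init is not None else rng.sample(data, k))]
    assign = None
    for _ in range(max_iter):
        # Assignment step (the data-parallel part): nearest mode for each row.
        new_assign = [min(range(k), key=lambda c: hamming(row, modes[c])) for row in data]
        if new_assign == assign:
            break
        assign = new_assign
        # Update step: each attribute of a mode becomes its most frequent category.
        for c in range(k):
            members = [row for row, a in zip(data, assign) if a == c]
            if members:
                modes[c] = [Counter(col).most_common(1)[0][0] for col in zip(*members)]
    return assign, modes
```

As with k-means, the result depends on the initial modes, which is one reason the repeated-run stability analyses mentioned in the abstract benefit from fast parallel execution.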
A scalable parallel graph coloring algorithm for distributed memory computers.
Bozdag, Doruk; Manne, Fredrik; Gebremedhin, Assefaw H.; Catalyurek, Umit; Boman, Erik Gunnar
2005-02-01
In large-scale parallel applications a graph coloring is often carried out to schedule computational tasks. In this paper, we describe a new distributed memory algorithm for doing the coloring itself in parallel. The algorithm operates in an iterative fashion; in each round vertices are speculatively colored based on limited information, and then a set of incorrectly colored vertices, to be recolored in the next round, is identified. Parallel speedup is achieved in part by reducing the frequency of communication among processors. Experimental results on a PC cluster using up to 16 processors show that the algorithm is scalable.
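The round structure described above (speculative coloring, conflict detection, recoloring) can be simulated in one process as below. This sketch rests on strong assumptions: every vertex still to be colored behaves as if it sat on a different processor and saw none of the current round's choices (the worst case for conflicts), colors are chosen first-fit, and a conflict is resolved by recoloring the endpoint with the larger id. The paper's partitioning, communication batching, and tuning are not modeled, and the function name is hypothetical.

```python
def speculative_coloring(adj):
    """adj: dict mapping each vertex id to the set of its neighbour ids."""
    color = {v: None for v in adj}
    to_color = set(adj)
    rounds = 0
    while to_color:
        rounds += 1
        # Tentative phase: all vertices in this round read the same snapshot,
        # as if coloured simultaneously on different processors.
        snapshot = dict(color)
        for v in to_color:
            used = {snapshot[u] for u in adj[v] if snapshot[u] is not None}
            c = 0
            while c in used:
                c += 1
            color[v] = c
        # Detection phase: of two adjacent same-coloured vertices coloured in
        # this round, the one with the larger id is recoloured next round.
        conflicts = {v for v in to_color
                     for u in adj[v]
                     if u in to_color and u < v and color[u] == color[v]}
        for v in conflicts:
            color[v] = None
        to_color = conflicts
    return color, rounds
```

The smallest-id vertex in each round can never be marked as a conflict, so the recolor set shrinks every round and the loop terminates with a valid coloring.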
A systematic comparison of genome-scale clustering algorithms
2012-01-01
Background A wealth of clustering algorithms has been applied to gene co-expression experiments. These algorithms cover a broad range of approaches, from conventional techniques such as k-means and hierarchical clustering to graphical approaches such as k-clique communities, weighted gene co-expression networks (WGCNA), and paraclique. Comparing these methods to evaluate their relative effectiveness provides guidance for algorithm selection, development, and implementation. Most prior work on comparative clustering evaluation has focused on parametric methods. Graph-theoretical methods are recent additions to the tool set for the global analysis and decomposition of microarray co-expression matrices and have generally not been included in earlier methodological comparisons. In the present study, a variety of parametric and graph-theoretical clustering algorithms are compared using well-characterized genome-scale transcriptomic data from Saccharomyces cerevisiae. Methods For each clustering method under study, a variety of parameters were tested. Jaccard similarity was used to measure each cluster's agreement with every GO and KEGG annotation set, and the highest Jaccard score was assigned to the cluster. Clusters were grouped into small, medium, and large bins, and the Jaccard scores of the top five scoring clusters in each bin were averaged and reported as the best average top 5 (BAT5) score for the particular method. Results Clusters produced by each method were evaluated based on their positive match to known pathways, which produces a readily interpretable ranking of the relative effectiveness of clustering on the genes. Methods were also tested to determine whether they were able to identify clusters consistent with those identified by other clustering methods. Conclusions Validation of clusters against known gene classifications demonstrates that for this data, graph-based techniques outperform conventional clustering approaches, suggesting that further
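The scoring described in the Methods can be sketched as below. Assumptions: genes and annotation sets are plain Python sets, the small/medium/large bins are passed as inclusive (lo, hi) size ranges because the abstract does not state the cut points, and a bin with no clusters scores None. If the "best" in BAT5 refers to taking the maximum over the parameter settings tested for a method, that selection is left to the caller; this reading, and all names here, are ours.

```python
def jaccard(a, b):
    """|A intersect B| / |A union B|, defined as 0.0 for two empty sets."""
    return len(a & b) / len(a | b) if a | b else 0.0


def bat5(clusters, annotations, bins):
    """
    clusters:    list of gene sets from one method/parameter setting.
    annotations: dict mapping an annotation name (GO term, KEGG pathway) to a gene set.
    bins:        list of inclusive (lo, hi) cluster-size ranges.
    Returns the average Jaccard score of the top five clusters in each bin.
    """
    # Each cluster is scored by its best-matching annotation set.
    scored = [(len(c), max(jaccard(c, s) for s in annotations.values()))
              for c in clusters]
    result = {}
    for lo, hi in bins:
        scores = sorted((j for size, j in scored if lo <= size <= hi), reverse=True)
        top = scores[:5]
        result[(lo, hi)] = sum(top) / len(top) if top else None
    return result
```

Binning by size keeps a method from scoring well simply by producing many tiny clusters that each happen to match a small annotation set.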
Adaptive link selection algorithms for distributed estimation
NASA Astrophysics Data System (ADS)
Xu, Songcen; de Lamare, Rodrigo C.; Poor, H. Vincent
2015-12-01
This paper presents adaptive link selection algorithms for distributed estimation and considers their application to wireless sensor networks and smart grids. In particular, it considers exhaustive search-based least mean squares (LMS) / recursive least squares (RLS) link selection algorithms and sparsity-inspired LMS / RLS link selection algorithms that can exploit the topology of networks with poor-quality links. The proposed link selection algorithms are then analyzed in terms of their stability, steady-state and tracking performance, and computational complexity. Compared with existing centralized or distributed estimation strategies, the key features of the proposed algorithms are that (1) they obtain more accurate estimates and converge faster, and (2) they give the network the ability to select links, which can circumvent link failures and improve estimation performance. The performance of the proposed algorithms for distributed estimation is illustrated via simulations of wireless sensor network and smart grid applications.
The Enhanced Hoshen-Kopelman Algorithm for Cluster Analysis
NASA Astrophysics Data System (ADS)
Hoshen, Joseph
1997-08-01
In 1976 Hoshen and Kopelman (J. Hoshen and R. Kopelman, Phys. Rev. B, 14, 3438 (1976)) introduced a breakthrough algorithm for cluster analysis, known today as the Hoshen-Kopelman (HK) algorithm. This algorithm revolutionized Monte Carlo cluster calculations in percolation theory, as it enables the analysis of very large lattices containing 10^11 or more sites. Initially, the HK algorithm's primary use was in the pure and basic sciences; later it found applications in diverse fields of technology and applied science. Examples of such applications are two- and three-dimensional image analysis, composite material modeling, polymers, remote sensing, brain modeling, and food processing. While the original HK algorithm provides cluster size data for only one class of sites, the Enhanced HK (EHK) algorithm presented in this paper enables calculation of cluster spatial moments (characteristics of cluster shapes) for multiple classes of sites. These enhancements preserve the time and space complexities of the original HK algorithm, so that very large lattices can still be analyzed for cluster sizes, classes, and shapes simultaneously in a single pass through the lattice.
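A minimal sketch of the original single-class HK pass on a 2-D square lattice with 4-neighbor connectivity follows. The 1976 paper keeps a "label of labels" array for the bookkeeping; this sketch uses what amounts to a union-find formulation with path halving and accumulates sizes at each root label, so cluster sizes are available after one raster scan. The EHK extensions (multiple site classes, spatial moments) are not shown, and the function name is hypothetical.

```python
def hk_cluster_sizes(grid):
    """One raster scan over a 2-D lattice; returns the sorted sizes of the
    4-connected clusters of occupied (truthy) sites."""
    rows, cols = len(grid), len(grid[0])
    labels = [[0] * cols for _ in range(rows)]  # 0 = empty site
    parent, size = [0], [0]  # index 0 is a dummy entry

    def find(k):
        while parent[k] != k:
            parent[k] = parent[parent[k]]  # path halving
            k = parent[k]
        return k

    for r in range(rows):
        for c in range(cols):
            if not grid[r][c]:
                continue
            up = labels[r - 1][c] if r > 0 else 0
            left = labels[r][c - 1] if c > 0 else 0
            if not up and not left:
                # No occupied neighbour scanned yet: new provisional label.
                parent.append(len(parent))
                size.append(0)
                root = len(parent) - 1
            elif up and left:
                # Site bridges two labels: merge them under the smaller root.
                ru, rl = find(up), find(left)
                root = min(ru, rl)
                if ru != rl:
                    other = max(ru, rl)
                    parent[other] = root
                    size[root] += size[other]
            else:
                root = find(up or left)
            labels[r][c] = root
            size[root] += 1
    return sorted(size[k] for k in range(1, len(parent)) if parent[k] == k)
```

Only the current and previous rows of labels are ever consulted during the scan, which is what lets HK-style methods handle lattices far larger than memory would otherwise allow (this sketch keeps the full label array for simplicity).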
The ordered clustered travelling salesman problem: a hybrid genetic algorithm.
Ahmed, Zakir Hussain
2014-01-01
The ordered clustered travelling salesman problem is a variation of the usual travelling salesman problem in which the vertices of the network (except the starting vertex) are divided into prespecified clusters. The objective is to find the least-cost Hamiltonian tour in which the vertices of any cluster are visited contiguously and the clusters are visited in the prespecified order. The problem is NP-hard, and it arises in practical transportation and sequencing problems. This paper develops a hybrid genetic algorithm using sequential constructive crossover, 2-opt search, and a local search for obtaining heuristic solutions to the problem. The efficiency of the algorithm has been examined against two existing algorithms on asymmetric and symmetric TSPLIB instances of various sizes. The computational results show that the proposed algorithm is very effective in terms of solution quality and computational time. Finally, we present solutions for several additional symmetric TSPLIB instances. PMID:24701148
Efficient Cluster Algorithm for Spin Glasses in Any Space Dimension.
Zhu, Zheng; Ochoa, Andrew J; Katzgraber, Helmut G
2015-08-14
Spin systems with frustration and disorder are notoriously difficult to study, both analytically and numerically. While the simulation of ferromagnetic statistical mechanical models benefits greatly from cluster algorithms, these accelerated dynamics methods remain elusive for generic spin-glass-like systems. Here, we present a cluster algorithm for Ising spin glasses that works in any space dimension and speeds up thermalization by at least one order of magnitude at temperatures where thermalization is typically difficult. Our isoenergetic cluster moves are based on the Houdayer cluster algorithm for two-dimensional spin glasses and lead to a speedup over conventional state-of-the-art methods that increases with the system size. We illustrate the benefits of the isoenergetic cluster moves in two and three space dimensions, as well as the nonplanar chimera topology found in the D-Wave Inc. quantum annealing machine. PMID:26317743
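The isoenergetic cluster moves build on the Houdayer move between two replicas at the same temperature. A minimal sketch, assuming a 2-D ±J Ising spin glass on a periodic L x L lattice: sites where the two replicas disagree (overlap q_i = -1) form clusters, and flipping one such cluster in both replicas leaves the sum of their energies unchanged, so the move can always be accepted. In the paper these moves are combined with other Monte Carlo updates; that schedule and the temperature range where the moves are applied are not modeled here, and all function names are hypothetical.

```python
import random


def random_instance(L, rng):
    """Random +/-J couplings and two independent spin replicas on an L x L torus."""
    J = {(x, y, d): rng.choice((-1, 1)) for x in range(L) for y in range(L) for d in (0, 1)}
    a = [[rng.choice((-1, 1)) for _ in range(L)] for _ in range(L)]
    b = [[rng.choice((-1, 1)) for _ in range(L)] for _ in range(L)]
    return J, a, b


def energy(s, J, L):
    """E = -sum over bonds of J_ij s_i s_j; d=0 bonds point along x, d=1 along y."""
    e = 0
    for x in range(L):
        for y in range(L):
            e -= J[(x, y, 0)] * s[x][y] * s[(x + 1) % L][y]
            e -= J[(x, y, 1)] * s[x][y] * s[x][(y + 1) % L]
    return e


def houdayer_move(a, b, L, rng):
    """Grow one cluster of q_i = -1 sites from a random seed, then flip it in
    both replicas. Returns the cluster size (0 if the replicas agree everywhere)."""
    disagree = [(x, y) for x in range(L) for y in range(L) if a[x][y] != b[x][y]]
    if not disagree:
        return 0
    seed = rng.choice(disagree)
    cluster, stack = {seed}, [seed]
    while stack:
        x, y = stack.pop()
        for nx, ny in [((x + 1) % L, y), ((x - 1) % L, y), (x, (y + 1) % L), (x, (y - 1) % L)]:
            if (nx, ny) not in cluster and a[nx][ny] != b[nx][ny]:
                cluster.add((nx, ny))
                stack.append((nx, ny))
    for x, y in cluster:
        a[x][y], b[x][y] = -a[x][y], -b[x][y]
    return len(cluster)
```

The energy bookkeeping works because every bond leaving the cluster ends on a site where both replicas agree, so that bond's contributions to the two replica energies cancel both before and after the flip.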
Efficient Cluster Algorithm for Spin Glasses in Any Space Dimension
NASA Astrophysics Data System (ADS)
Zhu, Zheng; Ochoa, Andrew J.; Katzgraber, Helmut G.
2015-08-01
Spin systems with frustration and disorder are notoriously difficult to study, both analytically and numerically. While the simulation of ferromagnetic statistical mechanical models benefits greatly from cluster algorithms, these accelerated dynamics methods remain elusive for generic spin-glass-like systems. Here, we present a cluster algorithm for Ising spin glasses that works in any space dimension and speeds up thermalization by at least one order of magnitude at temperatures where thermalization is typically difficult. Our isoenergetic cluster moves are based on the Houdayer cluster algorithm for two-dimensional spin glasses and lead to a speedup over conventional state-of-the-art methods that increases with the system size. We illustrate the benefits of the isoenergetic cluster moves in two and three space dimensions, as well as the nonplanar chimera topology found in the D-Wave Inc. quantum annealing machine.
MODELING THE METALLICITY DISTRIBUTION OF GLOBULAR CLUSTERS
Muratov, Alexander L.; Gnedin, Oleg Y.
2010-08-01
Observed metallicities of globular clusters reflect physical conditions in the interstellar medium of their high-redshift host galaxies. Globular cluster systems in most large galaxies display bimodal color and metallicity distributions, which are often interpreted as indicating two distinct modes of cluster formation. The metal-rich and metal-poor clusters have systematically different locations and kinematics in their host galaxies. However, the red and blue clusters have similar internal properties, such as their masses, sizes, and ages. It is therefore interesting to explore whether both metal-rich and metal-poor clusters could form by a common mechanism and still be consistent with the bimodal distribution. We present such a model, which prescribes the formation of globular clusters semi-analytically using galaxy assembly history from cosmological simulations coupled with observed scaling relations for the amount and metallicity of cold gas available for star formation. We assume that massive star clusters form only during mergers of massive gas-rich galaxies and tune the model parameters to reproduce the observed distribution in the Galaxy. A wide, but not the entire, range of model realizations produces metallicity distributions consistent with the data. We find that early mergers of smaller hosts create exclusively blue clusters, whereas subsequent mergers of more massive galaxies create both red and blue clusters. Thus, bimodality arises naturally as the result of a small number of late massive merger events. This conclusion is not significantly affected by the large uncertainties in our knowledge of the stellar mass and cold gas mass in high-redshift galaxies. The fraction of galactic stellar mass locked in globular clusters declines from over 10% at z > 3 to 0.1% at present.
A Fast Implementation of the ISODATA Clustering Algorithm
NASA Technical Reports Server (NTRS)
Memarsadeghi, Nargess; Mount, David M.; Netanyahu, Nathan S.; LeMoigne, Jacqueline
2005-01-01
Clustering is central to many image processing and remote sensing applications. ISODATA is one of the most popular and widely used clustering methods in geoscience applications, but it can run slowly, particularly with large data sets. We present a more efficient approach to ISODATA clustering, which achieves better running times by storing the points in a kd-tree and through a modification of the way in which the algorithm estimates the dispersion of each cluster. We also present an approximate version of the algorithm which allows the user to further improve the running time, at the expense of lower fidelity in computing the nearest cluster center to each point. We provide both theoretical and empirical justification that our modified approach produces clusterings that are very similar to those produced by the standard ISODATA approach. We also provide empirical studies on both synthetic data and remotely sensed Landsat and MODIS images that show that our approach has significantly lower running times.
Efficient Record Linkage Algorithms Using Complete Linkage Clustering
Mamun, Abdullah-Al; Aseltine, Robert; Rajasekaran, Sanguthevar
2016-01-01
Datasets from different agencies often contain records of the same individuals. Linking these datasets to identify all the records belonging to the same individual is a crucial and challenging problem, especially given the large volumes of data. Many available record linkage algorithms suffer from either time inefficiency or low accuracy in finding matches and non-matches among the records. In this paper we propose efficient and reliable sequential and parallel algorithms for the record linkage problem that employ complete linkage hierarchical clustering. In addition to hierarchical clustering, we use two other techniques: elimination of duplicate records and blocking. Our algorithms use sorting as a subroutine to identify identical copies of records. We have tested our algorithms on datasets with millions of synthetic records. Experimental results show that our algorithms achieve nearly 100% accuracy, and the parallel implementations achieve almost linear speedups. The time complexities of these algorithms do not exceed those of the previous best-known algorithms, and our proposed algorithms outperform them in accuracy while requiring reasonable run times. PMID:27124604
Clustering of Hadronic Showers with a Structural Algorithm
Charles, M.J.; /SLAC
2005-12-13
The internal structure of hadronic showers can be resolved in a high-granularity calorimeter. This structure is described in terms of simple components and an algorithm for reconstruction of hadronic clusters using these components is presented. Results from applying this algorithm to simulated hadronic Z-pole events in the SiD concept are discussed.
CCL: an algorithm for the efficient comparison of clusters
Hundt, R.; Schön, J. C.; Neelamraju, S.; Zagorac, J.; Jansen, M.
2013-01-01
The systematic comparison of the atomic structure of solids and clusters has become an important task in crystallography, chemistry, physics and materials science, in particular in the context of structure prediction and structure determination of nanomaterials. In this work, an efficient and robust algorithm for the comparison of cluster structures is presented, which is based on the mapping of the point patterns of the two clusters onto each other. This algorithm has been implemented as the module CCL in the structure visualization and analysis program KPLOT. PMID:23682193
A knowledge-based clustering algorithm driven by Gene Ontology.
Cheng, Jill; Cline, Melissa; Martin, John; Finkelstein, David; Awad, Tarif; Kulp, David; Siani-Rose, Michael A
2004-08-01
We have developed an algorithm for inferring the degree of similarity between genes by using the graph-based structure of Gene Ontology (GO). We applied this knowledge-based similarity metric to a clique-finding algorithm for detecting sets of related genes with biological classifications. We also combined it with an expression-based distance metric to produce a co-cluster analysis, which accentuates genes with both similar expression profiles and similar biological characteristics and identifies gene clusters that are more stable and biologically meaningful. These algorithms are demonstrated in the analysis of MPRO cell differentiation time series experiments. PMID:15468759
A modified density-based clustering algorithm and its implementation
NASA Astrophysics Data System (ADS)
Ban, Zhihua; Liu, Jianguo; Yuan, Lulu; Yang, Hua
2015-12-01
This paper presents an improved density-based clustering algorithm based on the method of clustering by fast search and find of density peaks. A distance threshold is introduced to economize on memory. To reduce the probability that two points share the same density value, similarity is used to define the proximity measure. We have tested the modified algorithm on a large data set, several small data sets, and shape data sets. The results show that the proposed algorithm obtains acceptable results and can be applied more widely.
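The base method being modified (clustering by fast search and find of density peaks) can be sketched as follows. This is a minimal sketch, assuming a hard cutoff kernel for the local density rho, ties in density broken by index, and a fixed number of centers taken as the points with the largest rho * delta (the densest point is always included). The paper's distance threshold and similarity-based proximity measure are not reproduced, and the function name is hypothetical. Note that the sketch builds the full n x n distance matrix, which is exactly the memory cost a distance threshold aims to reduce.

```python
import math


def density_peaks(points, d_c, n_centers):
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    # Local density: number of other points closer than the cutoff d_c.
    rho = [sum(1 for j in range(n) if j != i and d[i][j] < d_c) for i in range(n)]
    # Total order by decreasing density (index breaks ties).
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta, nearest_higher = [0.0] * n, [-1] * n
    delta[order[0]] = max(d[order[0]])  # convention for the densest point
    for rank in range(1, n):
        i = order[rank]
        # delta_i: distance to the nearest point of higher density.
        j = min(order[:rank], key=lambda k: d[i][k])
        delta[i], nearest_higher[i] = d[i][j], j
    # Centres: the densest point plus the others with the largest rho * delta.
    by_gamma = sorted(range(n), key=lambda i: rho[i] * delta[i], reverse=True)
    centers = [order[0]] + [i for i in by_gamma if i != order[0]][:n_centers - 1]
    label = [-1] * n
    for c_idx, c in enumerate(centers):
        label[c] = c_idx
    # Non-centres inherit the label of their nearest higher-density neighbour,
    # visited in decreasing density so that neighbour is already labelled.
    for i in order:
        if label[i] == -1:
            label[i] = label[nearest_higher[i]]
    return label, centers
```

The abstract's concern about points sharing the same density value is visible here: with an integer-valued cutoff density, ties are common and must be broken somehow, which this sketch does arbitrarily by index.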
A Genetic Algorithm That Exchanges Neighboring Centers for Fuzzy c-Means Clustering
ERIC Educational Resources Information Center
Chahine, Firas Safwan
2012-01-01
Clustering algorithms are widely used in pattern recognition and data mining applications. Due to their computational efficiency, partitional clustering algorithms are better suited than hierarchical clustering algorithms for applications with large datasets. K-means is among the most popular partitional clustering algorithms, but has a major…
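The truncated abstract does not describe the genetic algorithm itself, but the fuzzy c-means objective it targets is standard. A minimal sketch of the usual alternating FCM updates, assuming Euclidean distance and fuzzifier m = 2 (a common default, not taken from the paper). The optional init argument is an illustrative addition that lets centers be supplied from outside, for example by an outer search such as a GA; all names are hypothetical.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, init=None, iters=100, tol=1e-6, seed=0):
    rng = random.Random(seed)
    centers = [list(p) for p in (init if init is not None else rng.sample(points, c))]
    dim = len(points[0])
    u = []
    for _ in range(iters):
        # Membership update: u_ik = 1 / sum_j (d_ik / d_ij)^(2 / (m - 1)).
        u = []
        for p in points:
            d = [math.dist(p, ck) for ck in centers]
            if 0.0 in d:  # point coincides with a centre: crisp membership
                k0 = d.index(0.0)
                u.append([1.0 if k == k0 else 0.0 for k in range(c)])
            else:
                u.append([1.0 / sum((dk / dj) ** (2 / (m - 1)) for dj in d) for dk in d])
        # Centre update: mean of the points weighted by u_ik^m.
        new_centers = []
        for k in range(c):
            w = [row[k] ** m for row in u]
            total = sum(w)
            if total == 0.0:
                new_centers.append(centers[k])
                continue
            new_centers.append([sum(wi * p[t] for wi, p in zip(w, points)) / total
                                for t in range(dim)])
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, u
```

Like k-means, these alternating updates only reach a local optimum that depends on the starting centers, which is the weakness a population-based search over center sets is meant to address.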
Sampling Within k-Means Algorithm to Cluster Large Datasets
Bejarano, Jeremy; Bose, Koushiki; Brannan, Tyler; Thomas, Anita; Adragni, Kofi; Neerchal, Nagaraj; Ostrouchov, George
2011-08-01
Due to current data collection technology, our ability to gather data has surpassed our ability to analyze it. In particular, k-means, one of the simplest and fastest clustering algorithms, is ill-equipped to handle extremely large datasets even on the most powerful machines. Our new algorithm uses a sample from a dataset to decrease runtime by reducing the amount of data analyzed. We perform a simulation study comparing our sampling-based k-means with the standard k-means algorithm by analyzing both the speed and the accuracy of the two methods. Results show that our algorithm is significantly more efficient than the existing algorithm, with comparable accuracy. Further work on this project might include a more comprehensive study on more varied test datasets as well as on real weather datasets; this is especially important because this preliminary study was performed on rather tame datasets. Such a study should also analyze the performance of the algorithm for varied values of k. Lastly, this paper showed that the algorithm was accurate for relatively low sample sizes. We would like to analyze this further to see how accurate the algorithm is for even lower sample sizes, finding, by manipulating width and confidence level, the lowest sample sizes for which the algorithm is acceptably accurate. For our algorithm to be a success, it needs to meet two benchmarks: match the accuracy of the standard k-means algorithm and significantly reduce runtime. Both goals are accomplished for all six datasets analyzed. However, on three- and four-dimensional datasets, as the data becomes more difficult to cluster, both algorithms fail to obtain the correct classifications on some trials. Nevertheless, our algorithm consistently matches the performance of the standard algorithm while being remarkably more efficient in runtime. Therefore, we conclude that analysts can use our algorithm and expect accurate results in considerably less time.
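A minimal sketch of the sampling idea, assuming plain Lloyd iterations on a uniform random sample followed by a single pass that assigns every point to the nearest sample-derived center. The paper chooses the sample size from a target interval width and confidence level; here sample_size is simply a parameter. The farthest-first seeding is an assumption added to make the sketch deterministic on well-separated data (the paper's initialization is not stated), and all names are hypothetical.

```python
import math
import random


def farthest_first(points, k, rng):
    """Seeding heuristic (an assumption): a random first centre, then repeatedly
    the point farthest from all centres chosen so far."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers


def nearest(p, centers):
    return min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))


def lloyd(points, centers, max_iter=100):
    """Standard Lloyd iterations; stops when the centres no longer move."""
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            clusters[nearest(p, centers)].append(p)
        new = [tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centers[i]
               for i, cl in enumerate(clusters)]
        if new == centers:
            break
        centers = new
    return centers


def sampled_kmeans(points, k, sample_size, seed=0):
    rng = random.Random(seed)
    sample = rng.sample(points, sample_size)
    # Cluster only the sample ...
    centers = lloyd(sample, farthest_first(sample, k, rng))
    # ... then label the full dataset with one nearest-centre pass.
    labels = [nearest(p, centers) for p in points]
    return centers, labels
```

The expensive iterative part touches only sample_size points, while the full dataset is read once for the final assignment, which is where the runtime savings reported in the abstract would come from.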
Sharma, Ashok; Podolsky, Robert; Zhao, Jieping; McIndoe, Richard A.
2009-01-01
Motivation: As the number of publicly available microarray experiments increases, the ability to analyze extremely large datasets across multiple experiments becomes critical. There is a need for algorithms that are fast and can cluster extremely large datasets without affecting cluster quality. Clustering is an unsupervised exploratory technique applied to microarray data to find similar data structures or expression patterns. Because of the high input/output costs involved and the large distance matrices calculated, most agglomerative clustering algorithms fail on large datasets (30 000+ genes/200+ arrays). In this article, we propose a new two-stage algorithm which partitions the high-dimensional space associated with microarray data using hyperplanes. The first stage is based on the Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH) algorithm, and the second stage is a conventional k-means clustering technique. This algorithm has been implemented in a software tool (HPCluster) designed to cluster gene expression data. We compared the clustering results of the two-stage hyperplane algorithm with those of the conventional k-means algorithm from other available programs. Because the first stage traverses the data in a single scan, performance and speed increase substantially. The data reduction accomplished in the first stage reduces the memory requirements, allowing us to cluster 44 460 genes without failure, and significantly decreases the time to completion compared with popular k-means programs. The software was written in C# (.NET 1.1). Availability: The program is freely available and can be downloaded from http://www.amdcc.org/bioinformatics/bioinformatics.aspx. Contact: rmcindoe@mail.mcg.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:19261720
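Here is a rough sketch of the two-stage pattern. It assumes a flat, single-scan summarisation in place of BIRCH's CF tree: each summary holds a count and a linear sum, with a hypothetical distance `threshold`. The hyperplane partitioning is omitted, and this is not HPCluster's code. It illustrates that the expensive k-means stage only ever sees the count-weighted summaries, never the raw points.

```python
import math

def summarize(points, threshold):
    """Stage 1, one scan: each point joins the nearest summary whose centroid is
    within `threshold`, otherwise it starts a new one. A summary is [count, linear_sum]."""
    summaries = []
    for p in points:
        best, best_d = None, math.inf
        for s in summaries:
            d = math.dist(p, [x / s[0] for x in s[1]])
            if d < best_d:
                best, best_d = s, d
        if best is not None and best_d <= threshold:
            best[0] += 1
            best[1] = [x + y for x, y in zip(best[1], p)]
        else:
            summaries.append([1, list(p)])
    centers = [tuple(x / n for x in ls) for n, ls in summaries]
    weights = [n for n, _ in summaries]
    return centers, weights

def weighted_kmeans(centers, weights, k, iters=100):
    """Stage 2: k-means on the summary centroids, each weighted by its count."""
    cents = [centers[0]]
    while len(cents) < k:  # farthest-point initialisation
        cents.append(max(centers, key=lambda c: min(math.dist(c, m) for m in cents)))
    for _ in range(iters):
        tot = [0.0] * k
        sums = [[0.0] * len(centers[0]) for _ in range(k)]
        for c, w in zip(centers, weights):
            j = min(range(k), key=lambda i: math.dist(c, cents[i]))
            tot[j] += w
            sums[j] = [s + w * x for s, x in zip(sums[j], c)]
        new = [tuple(s / tot[j] for s in sums[j]) if tot[j] else cents[j] for j in range(k)]
        if new == cents:
            break
        cents = new
    return cents
```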
CACONET: Ant Colony Optimization (ACO) Based Clustering Algorithm for VANET.
Aadil, Farhan; Bajwa, Khalid Bashir; Khan, Salabat; Chaudary, Nadeem Majeed; Akram, Adeel
2016-01-01
A vehicular ad hoc network (VANET) is a wirelessly connected network of vehicular nodes. A number of techniques, such as message ferrying, data aggregation, and vehicular node clustering, aim to improve communication efficiency in VANETs. Cluster heads (CHs), selected in the process of clustering, manage inter-cluster and intra-cluster communication. The lifetime of clusters and the number of CHs determine the efficiency of the network. In this paper, a clustering algorithm based on Ant Colony Optimization (ACO) for VANETs (CACONET) is proposed. CACONET forms optimized clusters for robust communication. CACONET is compared empirically with state-of-the-art baseline techniques such as Multi-Objective Particle Swarm Optimization (MOPSO) and Comprehensive Learning Particle Swarm Optimization (CLPSO). Experiments varying the grid size of the network, the transmission range of nodes, and the number of nodes in the network were performed to evaluate the comparative effectiveness of these algorithms. For optimized clustering, the parameters considered are the transmission range, direction, and speed of the nodes. The results indicate that CACONET significantly outperforms MOPSO and CLPSO. PMID:27149517
Personalized PageRank Clustering: A graph clustering algorithm based on random walks
NASA Astrophysics Data System (ADS)
A. Tabrizi, Shayan; Shakery, Azadeh; Asadpour, Masoud; Abbasi, Maziar; Tavallaie, Mohammad Ali
2013-11-01
Graph clustering has been an essential part of many methods, and thus its accuracy has a significant effect on many applications. In addition, the exponential growth of real-world graphs such as social networks, biological networks and electrical circuits demands clustering algorithms with nearly linear time and space complexity. In this paper we propose Personalized PageRank Clustering (PPC), which employs the inherent cluster-exploratory property of random walks to reveal the clusters of a given graph. We combine random walks and modularity to precisely and efficiently reveal the clusters of a graph. PPC is a top-down algorithm, so it can reveal the inherent clusters of a graph more accurately than other nearly linear approaches, which are mainly bottom-up. It also gives a hierarchy of clusters that is useful in many applications. PPC has linear time and space complexity and has outperformed most of the available clustering algorithms on many datasets. Furthermore, its top-down approach makes it a flexible solution for clustering problems with different requirements.
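The random-walk ingredient can be illustrated with a short sketch. It assumes a small undirected graph stored as an adjacency dict. It uses the standard personalized PageRank power iteration followed by a conductance sweep cut, a well-known local clustering recipe. PPC's own top-down procedure and its use of modularity are not reproduced here, and the function names are hypothetical.

```python
def personalized_pagerank(adj, seed, alpha=0.15, iters=100):
    """Power iteration for personalized PageRank restarting at `seed`.
    adj: dict node -> list of neighbours (undirected graph, no isolated nodes)."""
    p = {v: 0.0 for v in adj}
    p[seed] = 1.0
    for _ in range(iters):
        nxt = {v: 0.0 for v in adj}
        nxt[seed] = alpha                      # teleport mass returns to the seed
        for u, nbrs in adj.items():
            share = (1 - alpha) * p[u] / len(nbrs)
            for v in nbrs:                     # remaining mass walks along edges
                nxt[v] += share
        p = nxt
    return p

def sweep_cut(adj, p):
    """Scan prefixes of nodes sorted by p[v]/deg(v); return the lowest-conductance set."""
    order = sorted(adj, key=lambda v: p[v] / len(adj[v]), reverse=True)
    total_vol = sum(len(n) for n in adj.values())
    best, best_phi = None, float("inf")
    members, vol, cut = set(), 0, 0
    for v in order[:-1]:
        members.add(v)
        vol += len(adj[v])
        inside = sum(1 for u in adj[v] if u in members and u != v)
        cut += len(adj[v]) - 2 * inside        # v's edges into the set stop being cut
        phi = cut / min(vol, total_vol - vol)
        if phi < best_phi:
            best, best_phi = set(members), phi
    return best, best_phi
```

On two triangles joined by a single edge, seeding the walk in one triangle recovers that triangle as the lowest-conductance set.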
Functional clustering algorithm for the analysis of dynamic network data
NASA Astrophysics Data System (ADS)
Feldt, S.; Waddell, J.; Hetrick, V. L.; Berke, J. D.; Żochowski, M.
2009-05-01
We formulate a technique for the detection of functional clusters in discrete event data. The advantage of this algorithm is that no prior knowledge of the number of functional groups is needed, as our procedure progressively combines data traces and derives the optimal clustering cutoff in a simple and intuitive manner through the use of surrogate data sets. In order to demonstrate the power of this algorithm to detect changes in network dynamics and connectivity, we apply it to both simulated neural spike train data and real neural data obtained from the mouse hippocampus during exploration and slow-wave sleep. Using the simulated data, we show that our algorithm performs better than existing methods. In the experimental data, we observe state-dependent clustering patterns consistent with known neurophysiological processes involved in memory consolidation.
Lee, Chongdeuk; Jeong, Taegwon
2011-01-01
Clustering is an important mechanism that efficiently provides information for mobile nodes and improves the processing capacity of routing, bandwidth allocation, and resource management and sharing. Clustering algorithms can be based on criteria such as the battery power of nodes, mobility, network size, distance, speed and direction. Above all, to achieve good clustering performance, overhead should be minimized, allowing mobile nodes to join and leave without perturbing cluster membership while preserving the current cluster structure as much as possible. This paper proposes a Fuzzy Relevance-based Cluster head selection Algorithm (FRCA) to solve problems found in existing wireless mobile ad hoc sensor networks, such as node distributions that change dynamically with mobility, flat structures, and disturbance of cluster formation. The proposed mechanism uses fuzzy relevance to select the cluster head in wireless mobile ad hoc sensor networks. In simulations implemented on the NS-2 simulator, FRCA is compared with algorithms such as the Cluster-based Routing Protocol (CBRP), the Weighted-based Adaptive Clustering Algorithm (WACA), and the Scenario-based Clustering Algorithm for Mobile ad hoc networks (SCAM). The simulation results show that FRCA achieves better performance than these existing mechanisms. PMID:22163905
Performance impact of dynamic parallelism on different clustering algorithms
NASA Astrophysics Data System (ADS)
DiMarco, Jeffrey; Taufer, Michela
2013-05-01
In this paper, we aim to quantify the performance gains of dynamic parallelism. The newest version of CUDA, CUDA 5, introduces dynamic parallelism, which allows GPU threads to create new threads without CPU intervention and to adapt to their data. Through nested kernel computations, this effectively eliminates superfluous back-and-forth communication between the GPU and CPU. The change in performance is measured using two well-known clustering algorithms that exhibit data dependencies: K-means clustering and hierarchical clustering. K-means has a sequential data dependence in which iterations occur in a linear fashion, while hierarchical clustering has a tree-like dependence that produces split tasks. Analyzing the performance of these data-dependent algorithms gives us a better understanding of the benefits and potential drawbacks of CUDA 5's new dynamic parallelism feature.
Development of clustering algorithms for Compressed Baryonic Matter experiment
NASA Astrophysics Data System (ADS)
Kozlov, G. E.; Ivanov, V. V.; Lebedev, A. A.; Vassiliev, Yu. O.
2015-05-01
A clustering problem for the coordinate detectors in the Compressed Baryonic Matter (CBM) experiment is discussed. Because of the high interaction rate and the huge datasets to be dealt with, clustering algorithms must be fast, efficient, and capable of processing events with high track multiplicity. At present there are two different approaches to the problem. In the first, each fired pad carries information about its charge, while in the second a pad is only recorded as fired or not fired, which makes separating overlapping clusters a difficult task. To deal with the latter, two different clustering algorithms were developed, integrated into the CBMROOT software environment, and tested with various types of simulated events. Both are found to be highly efficient and accurate.
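For the second (fired/not fired) readout, the simplest baseline is to group touching fired pads into connected components. The sketch below assumes a rectangular pad grid addressed by (row, col) and 8-connectivity. It is not one of the CBMROOT algorithms. It does show why overlapping clusters are hard to separate without charge information: touching clusters simply merge into one component.

```python
def cluster_fired_pads(fired):
    """fired: set of (row, col) pads that fired (no charge information).
    Groups pads touching by an edge or a corner into clusters (8-connectivity)."""
    remaining = set(fired)
    clusters = []
    while remaining:
        start = remaining.pop()
        stack, cluster = [start], {start}
        while stack:                           # depth-first flood fill
            r, c = stack.pop()
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    nb = (r + dr, c + dc)
                    if nb in remaining:
                        remaining.remove(nb)
                        cluster.add(nb)
                        stack.append(nb)
        clusters.append(cluster)
    return clusters
```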
NCUBE - A clustering algorithm based on a discretized data space
NASA Technical Reports Server (NTRS)
Eigen, D. J.; Northouse, R. A.
1974-01-01
Cluster analysis involves the unsupervised grouping of data. The process provides an automatic procedure for generating known training samples for pattern classification. NCUBE, the clustering algorithm presented, is based upon the concept of imposing a gridwork on the data space. The NCUBE computer implementation of this concept provides an easily derived form of piecewise linear discrimination. This piecewise linear discrimination permits the separation of some types of data groups that are not linearly separable.
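The gridwork concept can be sketched as follows. The abstract does not give NCUBE's rules for selecting or joining cells. The cell size, the `min_count` occupancy threshold and face-adjacency merging below are therefore assumptions. The example shows how joining occupied cells lets a non-convex (here L-shaped) group come out as a single cluster.

```python
from collections import Counter, deque

def grid_cluster(points, cell, min_count=1):
    """Label points by connected groups of occupied grid cells.
    A cell is kept if it holds >= min_count points; kept cells sharing a face
    are merged. Points in discarded cells get label -1."""
    cell_of = [tuple(int(x // cell) for x in p) for p in points]
    counts = Counter(cell_of)
    kept = {c for c, n in counts.items() if n >= min_count}
    label, next_label = {}, 0
    for start in kept:
        if start in label:
            continue
        label[start] = next_label
        queue = deque([start])
        while queue:                           # breadth-first flood fill over cells
            cur = queue.popleft()
            for dim in range(len(cur)):
                for step in (-1, 1):
                    nb = cur[:dim] + (cur[dim] + step,) + cur[dim + 1:]
                    if nb in kept and nb not in label:
                        label[nb] = next_label
                        queue.append(nb)
        next_label += 1
    return [label.get(c, -1) for c in cell_of]
```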
Fast clustering algorithm for codebook production in image vector quantization
NASA Astrophysics Data System (ADS)
Al-Otum, Hazem M.
2001-04-01
In this paper, a fast clustering algorithm (FCA) is proposed for vector quantization codebook production. This algorithm avoids iterative averaging of vectors and is based on collecting vectors with similar or closely similar characteristics to produce corresponding clusters. FCA increases the peak signal-to-noise ratio (PSNR) by about 0.3 - 1.1 dB over the LBG algorithm and reduces the computational cost of codebook production by 10% - 60% at different bit rates. Two FCA modifications are also proposed: FCA with limited cluster size 1 and 2 (FCA-LCS1 and FCA-LCS2, respectively). FCA-LCS1 tends to subdivide large clusters into smaller ones, while FCA-LCS2 reduces a predetermined threshold step by step to reach the required cluster size. FCA-LCS1 and FCA-LCS2 increase PSNR by about 0.9 - 1.0 and 0.9 - 1.1 dB, respectively, over FCA, at the expense of about 15% - 25% and 18% - 28% increases in output codebook size.
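Here is a minimal sketch of single-pass, threshold-based codebook construction, with no iterative averaging. It assumes Euclidean distance and a single user-chosen `threshold`; FCA's actual similarity criterion and cluster-size limits are not given in the abstract. It also includes the standard PSNR formula for 8-bit data, 10 * log10(255^2 / MSE), which is how codebooks like these are usually compared.

```python
import math

def threshold_codebook(vectors, threshold):
    """One pass: a vector within `threshold` of an existing codeword is covered
    by it; otherwise the vector itself becomes a new codeword."""
    codebook = []
    for v in vectors:
        if not any(math.dist(v, c) <= threshold for c in codebook):
            codebook.append(tuple(v))
    return codebook

def quantize(vectors, codebook):
    """Replace each vector by its nearest codeword."""
    return [min(codebook, key=lambda c: math.dist(v, c)) for v in vectors]

def psnr(original, reconstructed, peak=255.0):
    """PSNR in dB between two flattened pixel lists (8-bit peak by default)."""
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    return float("inf") if mse == 0 else 10 * math.log10(peak ** 2 / mse)
```

Raising `threshold` shrinks the codebook at the cost of reconstruction error, and lowering it does the opposite. This is the same kind of codebook-size versus PSNR trade-off the abstract reports for FCA-LCS1 and FCA-LCS2.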
Particle flow reconstruction based on the directed tree clustering algorithm
Chakraborty, D.; Lima, J. G. R.; McIntosh, R.; Zutshi, V.
2006-10-27
We present the status of particle flow algorithm development at Northern Illinois University. A key element in our approach is the calorimeter-based directed tree clustering algorithm. We have attempted to identify and tackle the essential challenges and analyze the effect of several different approaches to the reconstruction of jet energies and the Z-boson mass. A number of possibilities have been studied, such as analog vs. digital energy measurement, hit density-based clustering and the use of single or multiple energy thresholds. We plan to use this PFA-based reconstruction to compare some of the proposed detector technologies and geometries.
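The core of a directed-tree clusterer can be sketched as below. This is an assumed, simplified reading rather than the NIU code. Each hit's density is its number of neighbours within `radius`, each hit points to its nearest denser neighbour, and hits sharing a tree root form one cluster. Density ties are broken by index so the links cannot form a cycle. Energy weighting and energy thresholds are left out.

```python
import math

def directed_tree_clusters(hits, radius):
    """hits: list of coordinate tuples. Returns the root hit index for every hit;
    hits with the same root belong to the same cluster."""
    n = len(hits)
    nbrs = [[j for j in range(n) if j != i and math.dist(hits[i], hits[j]) <= radius]
            for i in range(n)]
    density = [len(nb) for nb in nbrs]
    parent = list(range(n))
    for i in range(n):
        # strictly "denser" neighbours; equal densities are ordered by lower index
        higher = [j for j in nbrs[i] if (density[j], -j) > (density[i], -i)]
        if higher:
            parent[i] = min(higher, key=lambda j: math.dist(hits[i], hits[j]))

    def root(i):
        while parent[i] != i:
            i = parent[i]
        return i

    return [root(i) for i in range(n)]
```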
Analysis of Network Clustering Algorithms and Cluster Quality Metrics at Scale
Kobourov, Stephen; Gallant, Mike; Börner, Katy
2016-01-01
Overview: Notions of community quality underlie the clustering of networks. While studies surrounding network clustering are increasingly common, a precise understanding of the relationship between different cluster quality metrics is lacking. In this paper, we examine the relationship between stand-alone cluster quality metrics and information recovery metrics through a rigorous analysis of four widely used network clustering algorithms: Louvain, Infomap, label propagation, and smart local moving. We consider the stand-alone quality metrics of modularity, conductance, and coverage, and the information recovery metrics of adjusted Rand score, normalized mutual information, and a variant of normalized mutual information used in previous work. Our study includes both synthetic graphs and empirical data sets with sizes ranging from 1,000 to 1,000,000 nodes. Cluster Quality Metrics: We find significant differences among the results of the different cluster quality metrics. For example, clustering algorithms can return a value of 0.4 out of 1 on modularity but score 0 out of 1 on information recovery. We find conductance, though imperfect, to be the stand-alone quality metric that best indicates performance on the information recovery metrics. Additionally, our study shows that the variant of normalized mutual information used in previous work cannot be assumed to differ only slightly from traditional normalized mutual information. Network Clustering Algorithms: Smart local moving is the overall best performing algorithm in our study, but discrepancies between cluster evaluation metrics prevent us from declaring it an absolutely superior algorithm. Interestingly, Louvain performed better than Infomap in nearly all the tests in our study, contradicting the results of previous work in which Infomap was superior to Louvain. We find that although label propagation performs poorly when clusters are less clearly defined, it scales efficiently and accurately to large
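The two stand-alone metrics found most and least informative above, modularity and conductance, are straightforward to compute. Here is a small pure-Python sketch for an unweighted, undirected graph given as an edge list. It is a textbook implementation, not the authors' evaluation code. For the information recovery metrics, scikit-learn's `adjusted_rand_score` and `normalized_mutual_info_score` are the usual off-the-shelf choices.

```python
def modularity(edges, community):
    """Newman modularity Q = sum_c [L_c / m - (d_c / 2m)^2] for an undirected,
    unweighted graph. edges: list of (u, v); community: dict node -> label."""
    m = len(edges)
    degree, internal, total = {}, {}, {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
        if community[u] == community[v]:        # edge lies inside one community
            c = community[u]
            internal[c] = internal.get(c, 0) + 1
    for node, d in degree.items():              # d_c: total degree of community c
        c = community[node]
        total[c] = total.get(c, 0) + d
    return sum(internal.get(c, 0) / m - (total[c] / (2 * m)) ** 2 for c in total)

def conductance(edges, members):
    """cut(S, rest) / min(vol(S), vol(rest)) for the node set S = members."""
    cut = vol_s = vol_rest = 0
    for u, v in edges:
        in_u, in_v = u in members, v in members
        cut += in_u != in_v                     # edge crosses the boundary
        vol_s += in_u + in_v
        vol_rest += (not in_u) + (not in_v)
    return cut / min(vol_s, vol_rest)
```

For two triangles joined by one edge, split into the two triangles, Q = 2 * (3/7 - 1/4) = 5/14 and each triangle has conductance 1/7.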
A Task-parallel Clustering Algorithm for Structured AMR
Gunney, B N; Wissink, A M
2004-11-02
A new parallel algorithm, based on the Berger-Rigoutsos algorithm for clustering grid points into logically rectangular regions, is presented. The clustering operation is frequently performed in the dynamic gridding steps of structured adaptive mesh refinement (SAMR) calculations. A previous study revealed that although the cost of clustering is generally insignificant for smaller problems run on relatively few processors, the algorithm scales inefficiently in parallel and its cost grows with problem size. Hence, clustering can become significant for large-scale problems run on very large parallel machines, such as the new BlueGene system (which has O(10^4) processors). We propose a new task-parallel algorithm designed to reduce communication wait times. Performance was assessed using dynamic SAMR re-gridding operations on up to 16K processors of currently available computers at Lawrence Livermore National Laboratory. The new algorithm was shown to be up to an order of magnitude faster than the baseline algorithm and had better scaling trends.
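A simplified, serial sketch of Berger-Rigoutsos-style clustering shows what the per-box work consists of. Assumptions:

- Tagged cells are 2-D.
- The efficiency threshold is a hypothetical 0.7.
- When a signature has a hole, the cut goes through the middle hole.
- Otherwise, the longer side is bisected. The original algorithm's inflection-point cut is omitted.

The task-parallel scheduling that is the subject of this paper is not shown.

```python
def cluster_boxes(flags, efficiency=0.7):
    """Serial Berger-Rigoutsos-style clustering of a set of tagged cells (i, j).
    Returns inclusive boxes (ilo, jlo, ihi, jhi) that together cover every tagged cell."""
    if not flags:
        return []
    ilo, ihi = min(i for i, _ in flags), max(i for i, _ in flags)
    jlo, jhi = min(j for _, j in flags), max(j for _, j in flags)
    area = (ihi - ilo + 1) * (jhi - jlo + 1)
    if len(flags) / area >= efficiency or area == 1:
        return [(ilo, jlo, ihi, jhi)]            # box is efficient enough: accept it

    def split(axis, cut, keep_cut_left):
        left = {c for c in flags if c[axis] < cut or (keep_cut_left and c[axis] == cut)}
        return cluster_boxes(left, efficiency) + cluster_boxes(flags - left, efficiency)

    for axis, lo, hi in ((0, ilo, ihi), (1, jlo, jhi)):
        signature = [0] * (hi - lo + 1)          # tagged-cell count per row / column
        for cell in flags:
            signature[cell[axis] - lo] += 1
        holes = [k for k, s in enumerate(signature) if s == 0]
        if holes:                                # cut through an empty row / column
            return split(axis, lo + holes[len(holes) // 2], False)
    # no holes: bisect the longer side at its midpoint
    axis, lo, hi = (0, ilo, ihi) if ihi - ilo >= jhi - jlo else (1, jlo, jhi)
    return split(axis, (lo + hi) // 2, True)
```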
Six clustering algorithms applied to the WAIS-R: the problem of dissimilar cluster results.
Fraboni, M; Cooper, D
1989-11-01
Clusterings of the Wechsler Adult Intelligence Scale-Revised subtests were obtained from the application of six hierarchical clustering methods (N = 113). These sets of clusters were compared for similarities using the Rand index. The calculated indices suggested similarities of cluster group membership between the Complete Linkage and Centroid methods; Complete Linkage and Ward's methods; Centroid and Ward's methods; and Single Linkage and Average Linkage Between Groups methods. Cautious use of single clustering methods is implied, though the authors suggest some advantages of knowing specific similarities and differences. If between-method comparisons consistently reveal similar cluster membership, a choice could be made from those algorithms that tend to produce similar partitions, thereby enhancing cluster interpretation. PMID:2613904
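The Rand index used to compare the partitions is simple to compute. It is the fraction of item pairs on which two clusterings agree, either placed together in both or apart in both. Here is a minimal sketch with generic labels, not the WAIS-R data:

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two partitions agree
    (same cluster in both, or different clusters in both)."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum((labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
                for i, j in pairs)
    return agree / len(pairs)
```

Cluster names do not matter, only co-membership. For example, `[0, 0, 1, 1]` and `[1, 1, 0, 0]` have a Rand index of 1.0.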
Characterising superclusters with the galaxy cluster distribution
NASA Astrophysics Data System (ADS)
Chon, Gayoung; Böhringer, Hans; Collins, Chris A.; Krause, Martin
2014-07-01
Superclusters are the largest observed matter density structures in the Universe. Recently, we presented the first supercluster catalogue constructed with a well-defined selection function based on the X-ray flux-limited cluster survey REFLEX II. To construct the sample, we proposed a concept for finding large objects with a minimum overdensity such that most of their mass can be expected to collapse in the future. Our main goal here is to support this concept with simulations, showing that, on the basis of our observational sample of X-ray clusters, we can construct a supercluster sample defined by a certain minimum overdensity. With this sample we also test how superclusters trace the underlying dark matter distribution. Our results confirm that an overdensity in the number of clusters is tightly correlated with an overdensity of the dark matter distribution. This enables us to define superclusters within which most of the mass will collapse in the future. We also obtain first-order mass estimates of superclusters on the basis of the properties of their member clusters. We further show that, in this context, the ratio of the cluster number density to the dark matter mass density is consistent with the theoretically expected cluster bias. Our previous work provided evidence that superclusters are a special environment in which the density structures of the dark matter grow differently from those in the field, as characterised by the X-ray luminosity function. Here we confirm for the first time that this originates from a top-heavy mass function, at a high statistical significance established by a Kolmogorov-Smirnov test. We also find, in close agreement with observations, that superclusters occupy only a small volume of a few per cent but contain more than half of the clusters in the present-day Universe.
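The two-sample Kolmogorov-Smirnov statistic behind the mass-function comparison is the largest vertical gap between two empirical cumulative distribution functions. The sketch below computes only the statistic D, on generic samples rather than the REFLEX II data. In practice `scipy.stats.ks_2samp` returns both D and a p-value.

```python
import bisect

def ks_two_sample(xs, ys):
    """Two-sample Kolmogorov-Smirnov statistic D = max_t |F_x(t) - F_y(t)|.
    Both empirical CDFs are step functions, so checking every sample point suffices."""
    xs, ys = sorted(xs), sorted(ys)
    d = 0.0
    for t in xs + ys:
        fx = bisect.bisect_right(xs, t) / len(xs)   # fraction of xs <= t
        fy = bisect.bisect_right(ys, t) / len(ys)
        d = max(d, abs(fx - fy))
    return d
```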
A Resampling Based Clustering Algorithm for Replicated Gene Expression Data.
Li, Han; Li, Chun; Hu, Jie; Fan, Xiaodan
2015-01-01
In gene expression data analysis, clustering is a fruitful exploratory technique to reveal the underlying molecular mechanism by identifying groups of co-expressed genes. To reduce the noise, usually multiple experimental replicates are performed. An integrative analysis of the full replicate data, instead of reducing the data to the mean profile, carries the promise of yielding more precise and robust clusters. In this paper, we propose a novel resampling based clustering algorithm for genes with replicated expression measurements. Assuming those replicates are exchangeable, we formulate the problem in the bootstrap framework, and aim to infer the consensus clustering based on the bootstrap samples of replicates. In our approach, we adopt the mixed effect model to accommodate the heterogeneous variances and implement a quasi-MCMC algorithm to conduct statistical inference. Experiments demonstrate that by taking advantage of the full replicate data, our algorithm produces more reliable clusters and has robust performance in diverse scenarios, especially when the data is subject to multiple sources of variance. PMID:26671802
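The bootstrap-over-replicates idea can be illustrated with a small consensus-clustering sketch, under the following assumptions:

- Replicates are exchangeable.
- Each gene's resampled replicates are averaged into a profile.
- The averaged profiles are clustered with plain k-means.

The sketch records how often each pair of genes lands in the same cluster. The mixed-effect model and quasi-MCMC inference of the actual method are not reproduced, and the function names are hypothetical.

```python
import math
import random

def kmeans_labels(profiles, k, iters=50):
    """Plain k-means; farthest-point initialisation keeps the sketch deterministic."""
    cents = [profiles[0]]
    while len(cents) < k:
        cents.append(max(profiles, key=lambda p: min(math.dist(p, c) for c in cents)))
    labels = None
    for _ in range(iters):
        new = [min(range(k), key=lambda c: math.dist(p, cents[c])) for p in profiles]
        if new == labels:
            break
        labels = new
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:
                cents[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels

def consensus_matrix(replicates, k, n_boot=50, seed=0):
    """replicates[g] is a list of replicate profiles (tuples) for gene g.
    Returns M[i][j] = fraction of bootstrap rounds in which genes i and j co-cluster."""
    rng = random.Random(seed)
    n = len(replicates)
    together = [[0] * n for _ in range(n)]
    for _ in range(n_boot):
        profiles = []
        for reps in replicates:
            draw = [rng.choice(reps) for _ in reps]   # resample replicates with replacement
            profiles.append(tuple(sum(x) / len(draw) for x in zip(*draw)))
        labels = kmeans_labels(profiles, k)
        for i in range(n):
            for j in range(n):
                together[i][j] += labels[i] == labels[j]
    return [[c / n_boot for c in row] for row in together]
```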
The C4 clustering algorithm: Clusters of galaxies in the Sloan Digital Sky Survey
Miller, Christopher J.; Nichol, Robert; Reichart, Dan; Wechsler, Risa H.; Evrard, August; Annis, James; McKay, Timothy; Bahcall, Neta; Bernardi, Mariangela; Boehringer, Hans; Connolly, Andrew; Goto, Tomo; Kniazev, Alexie; Lamb, Donald; Postman, Marc; Schneider, Donald; Sheth, Ravi; Voges, Wolfgang; /Cerro-Tololo InterAmerican Obs. /Portsmouth U., ICG /North Carolina U. /Chicago U., Astron. Astrophys. Ctr. /Chicago U., EFI /Michigan U. /Fermilab /Princeton U. Observ. /Garching, Max Planck Inst., MPE /Pittsburgh U. /Tokyo U., ICRR /Baltimore, Space Telescope Sci. /Penn State U. /Chicago U. /Stavropol, Astrophys. Observ. /Heidelberg, Max Planck Inst. Astron. /INI, SAO
2005-03-01
We present the "C4 Cluster Catalog", a new sample of 748 clusters of galaxies identified in the spectroscopic sample of the Second Data Release (DR2) of the Sloan Digital Sky Survey (SDSS). The C4 cluster-finding algorithm identifies clusters as overdensities in a seven-dimensional position and color space, thus minimizing projection effects that have plagued previous optical cluster selection. The present C4 catalog covers ~2600 square degrees of sky and ranges in redshift from z = 0.02 to z = 0.17. The mean cluster membership is 36 galaxies (with redshifts) brighter than r = 17.7, but the catalog includes a range of systems, from groups containing 10 members to massive clusters with over 200 cluster members with redshifts. The catalog provides a large number of measured cluster properties including sky location, mean redshift, galaxy membership, summed r-band optical luminosity (L_r), velocity dispersion, as well as quantitative measures of substructure and the surrounding large-scale environment. We use new, multi-color mock SDSS galaxy catalogs, empirically constructed from the ΛCDM Hubble Volume (HV) Sky Survey output, to investigate the sensitivity of the C4 catalog to the various algorithm parameters (detection threshold, choice of passbands and search aperture), as well as to quantify the purity and completeness of the C4 cluster catalog. These mock catalogs indicate that the C4 catalog is ≈90% complete and 95% pure above M_200 = 1 × 10^14 h^-1 M_⊙ and within 0.03 ≤ z ≤ 0.12. Using the SDSS DR2 data, we show that the C4 algorithm finds 98% of X-ray identified clusters and 90% of Abell clusters within 0.03 ≤ z ≤ 0.12. Using the mock galaxy catalogs and the full HV dark matter simulations, we show that the L_r of a cluster is a more robust estimator of the halo mass (M_200) than the galaxy line-of-sight velocity dispersion or the richness of the cluster. However, if we
Adaptive clustering algorithm for community detection in complex networks.
Ye, Zhenqing; Hu, Songnian; Yu, Jun
2008-10-01
Community structure is common in various real-world networks, and methods for detecting such communities in complex networks have attracted great attention in recent years. We introduce an adaptive clustering algorithm capable of extracting modules from complex networks with considerable accuracy and robustness. In this approach, each node in a network acts as an autonomous agent demonstrating flocking behavior, in which vertices always travel toward their preferred neighboring groups. An optimal modular structure can emerge from a collection of these active nodes during a self-organization process in which vertices constantly regroup. In addition, through intensive evaluation we show that our algorithm appears advantageous over other competing methods (e.g., the Newman-fast algorithm). Applications to three real-world networks demonstrate the superiority of our algorithm in finding communities that correspond to the actual organization of these networks. PMID:18999501
NASA Technical Reports Server (NTRS)
Lennington, R. K.; Johnson, J. K.
1979-01-01
An efficient procedure which clusters data using a completely unsupervised clustering algorithm and then uses labeled pixels to label the resulting clusters or perform a stratified estimate using the clusters as strata is developed. Three clustering algorithms, CLASSY, AMOEBA, and ISOCLS, are compared for efficiency. Three stratified estimation schemes and three labeling schemes are also considered and compared.
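Using clusters as strata leads to the standard stratified estimator, p = sum over h of W_h * p_h. Here W_h = N_h / N is the share of pixels in cluster h and p_h is the labeled-pixel proportion in that cluster. Here is a minimal sketch, assuming every cluster contains at least one labeled pixel. A majority-vote rule is included as one simple labeling scheme; it is not necessarily one of the three schemes compared in the report.

```python
def stratified_proportion(cluster_sizes, labeled):
    """Estimate the overall proportion of a class as sum_h W_h * p_h.
    cluster_sizes: {cluster: N_h pixels}; labeled: {cluster: list of 0/1 labels
    for the labeled pixels in that cluster} (each list must be non-empty)."""
    total = sum(cluster_sizes.values())
    return sum(n / total * sum(labeled[h]) / len(labeled[h])
               for h, n in cluster_sizes.items())

def majority_labels(labeled):
    """Label each cluster with the most common label among its labeled pixels
    (ties go to the smallest label)."""
    return {h: max(sorted(set(v)), key=v.count) for h, v in labeled.items()}
```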
A Likelihood-Based SLIC Superpixel Algorithm for SAR Images Using Generalized Gamma Distribution
Zou, Huanxin; Qin, Xianxiang; Zhou, Shilin; Ji, Kefeng
2016-01-01
The simple linear iterative clustering (SLIC) method is a recently proposed popular superpixel algorithm. However, this method may generate bad superpixels for synthetic aperture radar (SAR) images due to effects of speckle and the large dynamic range of pixel intensity. In this paper, an improved SLIC algorithm for SAR images is proposed. This algorithm exploits the likelihood information of SAR image pixel clusters. Specifically, a local clustering scheme combining intensity similarity with spatial proximity is proposed. Additionally, for post-processing, a local edge-evolving scheme that combines spatial context and likelihood information is introduced as an alternative to the connected components algorithm. To estimate the likelihood information of SAR image clusters, we incorporated a generalized gamma distribution (GΓD). Finally, the superiority of the proposed algorithm was validated using both simulated and real-world SAR images. PMID:27438840
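For reference, standard SLIC combines intensity and spatial terms as D = sqrt(d_c^2 + (d_s / S)^2 * m^2). Here S is the grid interval and m the compactness weight. Each pixel is compared only with centers inside a 2S x 2S window. The sketch below implements that baseline assignment step for a grayscale image. The paper's likelihood-based (GΓD) dissimilarity would replace the intensity term d_c and is not shown. The toy image and center positions are illustrative.

```python
import math

def slic_distance(pix, center, S, m):
    """pix, center: (row, col, intensity). D = sqrt(d_c^2 + (d_s / S)^2 * m^2)."""
    d_c = abs(pix[2] - center[2])                       # intensity term
    d_s = math.hypot(pix[0] - center[0], pix[1] - center[1])  # spatial term
    return math.sqrt(d_c ** 2 + (d_s / S) ** 2 * m ** 2)

def assign(image, centers, S, m):
    """image: 2-D list of intensities. Each pixel takes the index of the nearest
    center (by slic_distance) among centers within its 2S x 2S window, else -1."""
    labels = [[-1] * len(image[0]) for _ in image]
    for r, row in enumerate(image):
        for c, val in enumerate(row):
            best = None
            for k, cen in enumerate(centers):
                if abs(r - cen[0]) <= S and abs(c - cen[1]) <= S:
                    d = slic_distance((r, c, val), cen, S, m)
                    if best is None or d < best[0]:
                        best = (d, k)
            if best is not None:
                labels[r][c] = best[1]
    return labels
```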
Biologically supervised hierarchical clustering algorithms for gene expression data.
Boratyn, Grzegorz M; Datta, Susmita; Datta, Somnath
2006-01-01
Cluster analysis has become a standard part of gene expression analysis. In this paper, we propose a novel semi-supervised approach that offers the same flexibility as hierarchical clustering. Yet it utilizes, along with the experimental gene expression data, common biological information about different genes that is being compiled at various public, Web-accessible databases. We argue that such an approach is inherently superior to the standard unsupervised approach of grouping genes based on expression data alone. It is shown that our biologically supervised methods produce better clustering results than the corresponding unsupervised methods, as judged by the distance from the model temporal profiles. R code for the clustering algorithm is available from the authors upon request. PMID:17947147
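One simple way to fold biological annotation into hierarchical clustering is to blend an expression distance with an annotation distance before agglomerating. The sketch below assumes a hypothetical weighting `w`. It uses 1 minus the Jaccard overlap of annotation sets (for example, shared database terms) as the biological distance. The authors' actual supervision scheme is not described in the abstract.

```python
import math

def combined_distance(expr_a, expr_b, ann_a, ann_b, w=0.5):
    """Hypothetical blend: (1 - w) * Euclidean expression distance
    + w * (1 - Jaccard similarity) of the genes' annotation sets."""
    d_expr = math.dist(expr_a, expr_b)
    union = ann_a | ann_b
    d_bio = 1 - len(ann_a & ann_b) / len(union) if union else 0.0
    return (1 - w) * d_expr + w * d_bio

def average_linkage(dist, k):
    """Agglomerate items 0..n-1 with average linkage until k clusters remain."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = clusters.pop(b)       # b > a, so index a is unaffected
        clusters[a].extend(merged)
    return clusters
```

In the example below, w = 0 groups genes by expression alone. With w = 0.5, the shared annotation pulls gene 1 toward gene 2.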
NASA Astrophysics Data System (ADS)
Liu, Lifeng; Sun, Sam Zandong; Yu, Hongyu; Yue, Xingtong; Zhang, Dong
2016-06-01
Because the fluid distribution in carbonate reservoirs is very complicated and existing fluid prediction methods do not produce ideal results, this paper proposes a new fluid identification method for carbonate reservoirs based on a modified Fuzzy C-Means (FCM) clustering algorithm. Both the initial and the globally optimal cluster centers are produced by the Chaotic Quantum Particle Swarm Optimization (CQPSO) algorithm, which effectively avoids the traditional FCM algorithm's sensitivity to initial values and its tendency to converge to local optima. The modified algorithm is then applied to fluid identification in the carbonate X area of the Tarim Basin, China, and a mapping relation between fluid properties and pre-stack elastic parameters is built in multi-dimensional space. The results show that the modified algorithm has good fuzzy clustering ability, and its total coincidence rate of fluid prediction reaches 97.10%. In addition, the memberships of different fluids can be accumulated to obtain their respective probabilities, which can be used to evaluate the uncertainty of the fluid identification result.
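The FCM core that CQPSO initializes can be written compactly. The sketch below uses the standard update rules, starting from centers supplied by the caller:

- u_ij = 1 / sum_k (d_ij / d_kj)^(2/(m-1))
- c_i = sum_j u_ij^m x_j / sum_j u_ij^m

The CQPSO search for good initial and global centers is not reproduced.

```python
import math

def fcm(points, centers, m=2.0, iters=100, eps=1e-12):
    """Fuzzy c-means from caller-supplied initial centers.
    Returns (centers, u), where u[j][i] is the membership of point j in cluster i."""
    centers = [tuple(c) for c in centers]
    n_c, n_dim = len(centers), len(points[0])
    for _ in range(iters):
        u = []
        for x in points:
            d = [max(math.dist(x, c), eps) for c in centers]   # eps avoids 0-division
            u.append([1.0 / sum((d[i] / d[k]) ** (2 / (m - 1)) for k in range(n_c))
                      for i in range(n_c)])
        new = []
        for i in range(n_c):
            w = [row[i] ** m for row in u]                     # fuzzified weights
            new.append(tuple(sum(wj * x[dim] for wj, x in zip(w, points)) / sum(w)
                             for dim in range(n_dim)))
        shift = max(math.dist(a, b) for a, b in zip(centers, new))
        centers = new
        if shift < 1e-10:
            break
    return centers, u
```

Each row of `u` sums to 1. Summing a fluid class's memberships over samples is what yields the probabilities mentioned in the abstract.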
A new clustering algorithm for scanning electron microscope images
NASA Astrophysics Data System (ADS)
Yousef, Amr; Duraisamy, Prakash; Karim, Mohammad
2016-04-01
A scanning electron microscope (SEM) is a type of electron microscope that produces images of a sample by scanning it with a focused beam of electrons. The electrons interact with the sample atoms, producing various signals that are collected by detectors. The gathered signals contain information about the sample's surface topography and composition. The electron beam is generally scanned in a raster pattern, and the beam's position is combined with the detected signal to produce an image. The most common SEM configuration produces a single value per pixel, with the results usually rendered as grayscale images. The captured images may suffer from insufficient brightness, anomalous contrast, jagged edges, and poor quality due to a low signal-to-noise ratio, grainy topography, and poor surface detail. Segmenting SEM images is a challenging problem in the presence of these distortions. In this paper, we focus on clustering this type of image. To that end, we evaluate the performance of well-known unsupervised clustering and classification techniques such as connectivity-based clustering (hierarchical clustering), centroid-based clustering, distribution-based clustering, and density-based clustering. Furthermore, we propose a new spatial fuzzy clustering technique that works efficiently on this type of image and compare its results against these standard techniques in terms of clustering validation metrics.
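Spatial fuzzy clustering methods typically add a neighbourhood term to FCM memberships. The paper's own technique is not specified in the abstract, so the sketch below is only an illustration of one common form of that step. Each membership is reweighted by the sum of the same cluster's memberships over the pixel's 3x3 neighbourhood, with hypothetical exponents `p` and `q`, and then renormalised. A noisy pixel surrounded by confident neighbours is pulled toward their cluster.

```python
def spatial_membership(u, p=1, q=1):
    """u[k][r][c]: FCM membership of pixel (r, c) in cluster k.
    Returns u'_k = u_k^p * h_k^q / sum_j u_j^p * h_j^q, where h_k sums u_k over
    the pixel's 3x3 neighbourhood (clipped at the image border)."""
    K, R, C = len(u), len(u[0]), len(u[0][0])
    h = [[[sum(u[k][rr][cc]
               for rr in range(max(0, r - 1), min(R, r + 2))
               for cc in range(max(0, c - 1), min(C, c + 2)))
           for c in range(C)] for r in range(R)] for k in range(K)]
    out = [[[0.0] * C for _ in range(R)] for _ in range(K)]
    for r in range(R):
        for c in range(C):
            w = [u[k][r][c] ** p * h[k][r][c] ** q for k in range(K)]
            s = sum(w)
            for k in range(K):
                out[k][r][c] = w[k] / s
    return out
```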
LYDIAN: An Extensible Educational Animation Environment for Distributed Algorithms
ERIC Educational Resources Information Center
Koldehofe, Boris; Papatriantafilou, Marina; Tsigas, Philippas
2006-01-01
LYDIAN is an environment to support the teaching and learning of distributed algorithms. It provides a collection of distributed algorithms as well as continuous animations. Users can combine algorithms and animations with arbitrary network structures defining the interconnection and behavior of the distributed algorithm. Further, it facilitates…
ABCluster: the artificial bee colony algorithm for cluster global optimization.
Zhang, Jun; Dolg, Michael
2015-10-01
Global optimization of cluster geometries is of fundamental importance in chemistry and an interesting problem in applied mathematics. In this work, we introduce a relatively new swarm intelligence algorithm, the artificial bee colony (ABC) algorithm proposed in 2005, to this field. It is inspired by the foraging behavior of a bee colony, and only three parameters are needed to control it. We applied it to several potential functions of quite different nature, namely the Coulomb-Born-Mayer, Lennard-Jones, Morse, Z and Gupta potentials. The benchmarks reveal that for long-ranged potentials the ABC algorithm is very efficient in locating the global minimum, while for short-ranged ones it is sometimes trapped in a local minimum funnel on the potential energy surface of large clusters. We have released an efficient, user-friendly, and free program, "ABCluster", that implements the ABC algorithm. It is a black-box program for non-experts as well as experts and might become a useful tool for chemists studying clusters. PMID:26327507
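The ABC scheme itself is compact. Below is a minimal sketch of the generic algorithm with employed, onlooker and scout phases. It is controlled by the three parameters usually cited for ABC: the number of food sources (half the colony size), the abandonment `limit`, and the number of cycles. It is not ABCluster's code or interface. The reduced-unit Lennard-Jones energy is included only to show what the objective `f` would look like for a cluster, taking a flat list of 3N coordinates.

```python
import random

def abc_minimize(f, dim, lo, hi, n_sources=10, limit=20, cycles=300, seed=0):
    """Artificial bee colony minimisation of f over the box [lo, hi]^dim."""
    rng = random.Random(seed)
    new_source = lambda: [rng.uniform(lo, hi) for _ in range(dim)]
    foods = [new_source() for _ in range(n_sources)]
    vals = [f(x) for x in foods]
    trials = [0] * n_sources
    i0 = min(range(n_sources), key=vals.__getitem__)
    best = [foods[i0][:], vals[i0]]            # [position, value]

    def explore(i):
        """Perturb one coordinate of source i relative to a random other source."""
        k = rng.choice([j for j in range(n_sources) if j != i])
        d = rng.randrange(dim)
        cand = foods[i][:]
        cand[d] += rng.uniform(-1, 1) * (foods[i][d] - foods[k][d])
        cand[d] = min(max(cand[d], lo), hi)
        v = f(cand)
        if v < vals[i]:                        # greedy selection
            foods[i], vals[i], trials[i] = cand, v, 0
            if v < best[1]:
                best[0], best[1] = cand[:], v
        else:
            trials[i] += 1

    for _ in range(cycles):
        for i in range(n_sources):             # employed bees: one trial per source
            explore(i)
        fit = [1 / (1 + v) if v >= 0 else 1 + abs(v) for v in vals]
        for _ in range(n_sources):             # onlookers favour fitter sources
            explore(rng.choices(range(n_sources), weights=fit)[0])
        for i in range(n_sources):             # scouts replace exhausted sources
            if trials[i] > limit:
                foods[i], trials[i] = new_source(), 0
                vals[i] = f(foods[i])
                if vals[i] < best[1]:
                    best[0], best[1] = foods[i][:], vals[i]
    return best[0], best[1]

def lennard_jones(coords):
    """Reduced-unit LJ energy 4 * sum(r^-12 - r^-6) for coords [x1, y1, z1, x2, ...]."""
    pts = [coords[i:i + 3] for i in range(0, len(coords), 3)]
    e = 0.0
    for a in range(len(pts)):
        for b in range(a + 1, len(pts)):
            r2 = sum((p - q) ** 2 for p, q in zip(pts[a], pts[b]))
            inv6 = 1 / r2 ** 3
            e += 4 * (inv6 * inv6 - inv6)
    return e
```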
NASA Astrophysics Data System (ADS)
Ball, R. C.; Lee, J. R.
1996-03-01
We prove that a new, irreversible growth algorithm, Non-Deletion Reaction-Limited Cluster-cluster Aggregation (NDRLCA), produces equilibrium Branched Polymers, which are expected to exhibit Lattice Animal statistics [1]. We implement NDRLCA off-lattice as a computer simulation for embedding dimensions d = 2 and 3, obtaining values for the critical exponents, namely the fractal dimension D and the cluster mass distribution exponent τ: for d = 2, D ≈ 1.53 ± 0.05 and τ = 1.09 ± 0.06; for d = 3, D = 1.96 ± 0.04 and τ = 1.50 ± 0.04, in good agreement with theoretical LA values. The simulation results do not support recent suggestions [2] that BPs may be in the same universality class as percolation. We also obtain values for a model-dependent critical "fugacity", z_c, and investigate the finite-size effects of our simulation, quantifying notions of "inbreeding" that occur in this algorithm. Finally, we use an extension of the NDRLCA proof to show that standard Reaction-Limited Cluster-cluster Aggregation is very unlikely to be in the same universality class as Branched Polymers/Lattice Animals unless the backbone dimension for the latter is considerably less than the published value.
Pruning Neural Networks with Distribution Estimation Algorithms
Cantu-Paz, E
2003-01-15
This paper describes the application of four evolutionary algorithms to the pruning of neural networks used in classification problems. Besides a simple genetic algorithm (GA), the paper considers three distribution estimation algorithms (DEAs): a compact GA, an extended compact GA, and the Bayesian Optimization Algorithm. The objective is to determine whether the DEAs offer advantages over the simple GA in accuracy or speed on this problem. The experiments used a feed-forward neural network trained with standard backpropagation on public-domain and artificial data sets. The pruned networks appeared to have accuracy equal to or better than that of the original fully connected networks. Only in a few cases did pruning result in less accurate networks. We found few differences in the accuracy of the networks pruned by the four EAs, but found important differences in execution time. The results suggest that a simple GA with a small population might be the best algorithm for pruning networks on the data sets we tested.
Algorithm for systematic peak extraction from atomic pair distribution functions.
Granlund, L; Billinge, S J L; Duxbury, P M
2015-07-01
The study presents an algorithm, ParSCAPE, for model-independent extraction of peak positions and intensities from atomic pair distribution functions (PDFs). It provides a statistically motivated method for determining the parsimony of extracted peak models, using the information-theoretic Akaike information criterion (AIC) applied to plausible models generated within an iterative framework of clustering and chi-square fitting. All parameters the algorithm uses are in principle known or estimable from experiment, though careful judgment must be applied when estimating the PDF baseline of nanostructured materials. ParSCAPE has been implemented in the Python program SrMise. Algorithm performance is examined on synchrotron X-ray PDFs of 16 bulk crystals and two nanoparticles using AIC-based multimodeling techniques, with particular attention to the impact of experimental uncertainties on the extracted models. The algorithm is quite resistant to misidentifying spurious peaks arising from noise and termination effects, even in the absence of a constraining structural model. Structure solution from automatically extracted peaks using the Liga algorithm is demonstrated for 14 crystals and for C60. Special attention is given to the information content of the PDF, the theory and practice of the AIC, and the algorithm's limitations. PMID:26131896
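The AIC trade-off between fit quality and model size can be illustrated with a minimal sketch. It assumes a least-squares fit with known Gaussian uncertainties, so that -2 ln L equals chi-square up to a constant shared by all candidate models. The candidate chi-square values below are made up, and three parameters per peak (position, width, intensity) is an assumption. ParSCAPE's actual peak model, baseline handling, and multimodeling procedure are not reproduced.

```python
import math

def aic_from_chi_square(chi_square, n_params):
    """AIC for a least-squares fit with known Gaussian uncertainties.

    Up to an additive constant shared by all models fitted to the same data,
    -2 ln L equals chi-square, so AIC = chi_square + 2 * n_params.
    """
    return chi_square + 2 * n_params

def akaike_weights(aic_values):
    """Relative likelihood of each model, normalised to sum to one."""
    best = min(aic_values)
    rel = [math.exp(-(a - best) / 2) for a in aic_values]
    total = sum(rel)
    return [r / total for r in rel]

# Hypothetical candidates: (number of peaks, chi-square); 3 parameters per peak assumed.
candidates = [(1, 250.0), (2, 120.0), (3, 112.0), (4, 110.5)]
aics = [aic_from_chi_square(chi2, 3 * n) for n, chi2 in candidates]
best_n = candidates[aics.index(min(aics))][0]
print(aics, best_n)
print(akaike_weights(aics))
```

The four-peak model fits best in raw chi-square, but its extra parameters are not justified by the small improvement. The Akaike weights quantify how strongly the data favour each candidate rather than forcing a single choice.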
Mapping cultivable land from satellite imagery with clustering algorithms
NASA Astrophysics Data System (ADS)
Arango, R. B.; Campos, A. M.; Combarro, E. F.; Canas, E. R.; Díaz, I.
2016-07-01
Open-data satellite imagery provides valuable input for planning and decision-making processes in environmental domains. Agriculture in particular uses remote sensing in a wide range of services, from monitoring crop health to forecasting the spread of crop diseases. This paper focuses on a methodology for automatically delimiting cultivable land by means of machine learning algorithms and satellite data. The method uses a partition clustering algorithm called Partitioning Around Medoids and considers the quality of the clusters obtained for each satellite band to determine which band best identifies cultivable land. The proposed method was tested on vineyards, using the spectral and thermal bands of the Landsat 8 satellite as input. The experimental results show the great potential of this method for monitoring cultivable land from remotely sensed multispectral imagery.
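For readers unfamiliar with Partitioning Around Medoids, the following minimal sketch shows its two classic phases: a greedy BUILD step followed by SWAP steps until no swap lowers the total distance to the nearest medoid. It uses plain Euclidean distance on small tuples. It recomputes the cost naively and is not the per-band evaluation pipeline used in the paper; all names are hypothetical.

```python
import itertools
import math

def total_cost(points, medoids):
    """Sum of distances from each point to its nearest medoid."""
    return sum(min(math.dist(p, points[m]) for m in medoids) for p in points)

def pam(points, k):
    """Partitioning Around Medoids: greedy BUILD, then SWAP until no gain.

    Returns (medoid indices, cluster labels, total cost).
    """
    n = len(points)
    # BUILD: add, one at a time, the point that lowers the total cost the most.
    medoids = []
    for _ in range(k):
        best = min((i for i in range(n) if i not in medoids),
                   key=lambda i: total_cost(points, medoids + [i]))
        medoids.append(best)
    # SWAP: exchange a medoid with a non-medoid whenever that lowers the cost.
    cost = total_cost(points, medoids)
    improved = True
    while improved:
        improved = False
        for mi, o in itertools.product(range(k), range(n)):
            if o in medoids:
                continue
            trial = medoids[:mi] + [o] + medoids[mi + 1:]
            trial_cost = total_cost(points, trial)
            if trial_cost < cost - 1e-12:
                medoids, cost, improved = trial, trial_cost, True
    labels = [min(range(k), key=lambda j: math.dist(p, points[medoids[j]]))
              for p in points]
    return medoids, labels, cost

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(pam(pts, 2))
```

Because medoids are actual data points, PAM works with any dissimilarity and is less sensitive to outliers than k-means, which is one common reason to prefer it for noisy pixel data.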
Synchronous Firefly Algorithm for Cluster Head Selection in WSN.
Baskaran, Madhusudhanan; Sadagopan, Chitra
2015-01-01
A Wireless Sensor Network (WSN) consists of small, low-cost, low-power multifunctional nodes interconnected to efficiently aggregate and transmit data to a sink. Cluster-based approaches use some nodes as Cluster Heads (CHs) and organize WSNs efficiently for data aggregation and energy saving. A CH conveys information gathered by cluster nodes and aggregates/compresses the data before transmitting it to a sink. However, this additional responsibility results in a higher energy drain on the node, leading to uneven network degradation. Low Energy Adaptive Clustering Hierarchy (LEACH) offsets this by probabilistically rotating the cluster-head role among nodes with energy above a set threshold. CH selection in WSNs is NP-hard, so optimal data aggregation with efficient energy savings cannot be expected in polynomial time. In this work, a modified firefly heuristic, the synchronous firefly algorithm, is proposed to improve network performance. Extensive simulations show that the proposed technique performs well compared with LEACH and energy-efficient hierarchical clustering (EEHC). Simulations show the effectiveness of the proposed method in decreasing the packet loss ratio by an average of 9.63% and improving the energy efficiency of the network when compared with LEACH and EEHC. PMID:26495431
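The LEACH baseline's "probabilistic rotation" can be made concrete with the cluster-head threshold as it is commonly stated for the original LEACH protocol: T(n) = p / (1 - p (r mod 1/p)) for a node that has not served as CH in the last 1/p rounds, and 0 otherwise. This is a minimal sketch of that baseline only. The energy checks used by energy-aware LEACH variants and the proposed synchronous firefly method are not shown, and the function names are hypothetical.

```python
import random

def leach_threshold(p, r, eligible=True):
    """LEACH cluster-head threshold T(n) for round r (hypothetical helper).

    p: desired fraction of cluster heads per round.
    eligible: True if the node has not been a cluster head in the last 1/p rounds.
    """
    if not eligible:
        return 0.0
    period = round(1 / p)
    return p / (1 - p * (r % period))

def elect_cluster_heads(nodes, p, r, recent_heads, rng=random):
    """Each node independently becomes a CH if a uniform draw falls below T(n)."""
    return [n for n in nodes
            if rng.random() < leach_threshold(p, r, n not in recent_heads)]

print(leach_threshold(0.05, 0), leach_threshold(0.05, 19))
print(elect_cluster_heads(range(100), 0.05, 0, set(), random.Random(42)))
```

The threshold rises toward 1 as the 1/p-round epoch progresses, so every still-eligible node is almost certain to serve once per epoch. This rotation spreads the extra energy cost of the CH role.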
Large Data Visualization on Distributed Memory Multi-GPU Clusters
Childs, Henry R.
2010-03-01
Data sets of immense size are regularly generated on large-scale computing resources. Even with more traditional methods of acquiring volume data, such as MRI and CT scanners, data too large to be visualized effectively on standard workstations are now commonplace. One solution to this problem is to employ a "visualization cluster", a small- to medium-scale cluster dedicated to the visualization and analysis of massive data sets generated on larger-scale supercomputers. These clusters serve a different need than traditional supercomputers, and their design therefore mandates different hardware choices, such as increased memory and, more recently, graphics processing units (GPUs). While there has been much previous work on distributed-memory visualization as well as on GPU visualization, there is a relative dearth of algorithms that effectively use GPUs at large scale in a distributed-memory environment. In this work, we study a common visualization technique in a GPU-accelerated, distributed-memory setting and present performance characteristics when scaling to extremely large data sets.
Advanced defect detection algorithm using clustering in ultrasonic NDE
NASA Astrophysics Data System (ADS)
Gongzhang, Rui; Gachagan, Anthony
2016-02-01
A range of materials used in industry exhibit scattering properties that limit ultrasonic NDE. Many algorithms have been proposed to enhance defect detection, such as the well-known Split Spectrum Processing (SSP) technique. Scattering noise usually cannot be fully removed, and the remaining noise can easily be confused with real feature signals, becoming artefacts during the image interpretation stage. This paper presents an advanced algorithm to further reduce the influence of artefacts remaining in A-scan data after processing with a conventional defect detection algorithm. The raw A-scan data can be acquired from either a traditional single transducer or a phased array configuration. The proposed algorithm uses unsupervised machine learning to cluster segmental defect signals from pre-processed A-scans into different classes. The distinction and similarity between each class and an ensemble of randomly selected noise segments can then be observed by applying a classification algorithm. Each class is then labelled as 'legitimate reflector' or 'artefact' based on this observation, and the expected probability of detection (PoD) and probability of false alarm (PFA) are determined. To facilitate data collection and validate the proposed algorithm, a 5 MHz linear array transducer is used to collect A-scans from both austenitic steel and Inconel samples. Each pulse-echo A-scan is pre-processed using SSP, and the subsequent application of the proposed clustering algorithm further reduced PFA while maintaining PoD for both samples, compared with SSP results alone.
Non-equilibrium relaxation analysis in cluster algorithms
NASA Astrophysics Data System (ADS)
Nonomura, Yoshihiko
2014-03-01
In Monte Carlo studies of phase transitions, critical slowing down has been a serious problem. Two kinds of approaches have been proposed to overcome this difficulty. One is cluster algorithms, in which a global update scheme based on percolation theory is introduced to avoid the power-law behavior at the critical point. The other is the non-equilibrium relaxation method, in which the power-law critical relaxation process is analyzed by dynamical scaling theory to avoid time-consuming equilibration. The natural next step is to fuse these two approaches: to investigate phase transitions using the early-stage relaxation process of cluster algorithms. Since dynamical scaling theory does not hold for cluster algorithms in principle, such an attempt had been considered impossible. In the present talk, we show that such a fusion is actually possible by using an empirical scaling form obtained from the 2D Ising model instead of dynamical scaling theory. Applications to the q ≥ 3 Potts models, ±J Ising models, and others will also be explained in the presentation.
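As an illustration of what a "global update scheme based on percolation theory" looks like, here is a minimal sketch of one well-known cluster algorithm, the Wolff single-cluster update for the 2D Ising model with J = 1 and periodic boundaries. Aligned neighbours join the cluster with bond probability 1 - exp(-2β). It is not the non-equilibrium relaxation analysis of the talk, and the function names are hypothetical.

```python
import math
import random

def wolff_step(spins, L, beta, rng=random):
    """One Wolff single-cluster update on an L x L Ising lattice (J = 1).

    spins: list of lists with entries +1/-1, periodic boundaries, modified in place.
    Returns the size of the flipped cluster.
    """
    p_add = 1.0 - math.exp(-2.0 * beta)  # bond activation probability
    i, j = rng.randrange(L), rng.randrange(L)
    seed_spin = spins[i][j]
    spins[i][j] = -seed_spin  # flipping on entry also marks the site as visited
    stack = [(i, j)]
    size = 1
    while stack:
        x, y = stack.pop()
        neighbours = (((x + 1) % L, y), ((x - 1) % L, y),
                      (x, (y + 1) % L), (x, (y - 1) % L))
        for nx, ny in neighbours:
            # Only still-unflipped aligned neighbours can join the cluster.
            if spins[nx][ny] == seed_spin and rng.random() < p_add:
                spins[nx][ny] = -seed_spin
                stack.append((nx, ny))
                size += 1
    return size

def magnetization(spins):
    """Magnetization per site."""
    return sum(map(sum, spins)) / (len(spins) * len(spins[0]))

rng = random.Random(1)
L = 16
lattice = [[1] * L for _ in range(L)]
for _ in range(100):
    wolff_step(lattice, L, 0.44, rng)  # near the 2D Ising critical coupling
print(abs(magnetization(lattice)))
```

Near criticality the flipped clusters are large and follow the system's correlated regions. This lets cluster updates largely evade the critical slowing down that affects single-spin-flip dynamics.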
Comparison of cluster expansion fitting algorithms for interactions at surfaces
NASA Astrophysics Data System (ADS)
Herder, Laura M.; Bray, Jason M.; Schneider, William F.
2015-10-01
Cluster expansions (CEs) are Ising-type interaction models that are increasingly used to model interaction and ordering phenomena at surfaces, such as the adsorbate-adsorbate interactions that control coverage-dependent adsorption or the surface-vacancy interactions that control surface reconstructions. CEs are typically fit to a limited set of data derived from density functional theory (DFT) calculations. The CE fitting process involves iteratively selecting the DFT data points to include in a fit set and the interaction clusters to include in the CE. Here we compare the performance of three CE fitting algorithms against synthetic data: the MIT Ab-initio Phase Stability code (MAPS, the default in the ATAT software), a genetic algorithm (GA), and a steepest descent (SD) algorithm. The synthetic data are encoded in model Hamiltonians of varying complexity, motivated by the observed behavior of atomic adsorbates on a face-centered-cubic transition metal close-packed (111) surface. We compare the leave-one-out cross-validation score against the true fitting error, which is available from knowledge of the hidden CEs. For these systems, SD achieves the lowest overall fitting and prediction error independent of the underlying system complexity. SD also most accurately predicts cluster interaction energies, without ignoring interactions or introducing extra ones into the CE. MAPS achieves good results in fewer iterations, while the GA performs least well on these particular problems.
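Because the comparison hinges on the leave-one-out cross-validation (LOOCV) score, a minimal sketch of that score for a linear least-squares fit may help. It assumes that a CE fit can be treated as ordinary linear regression of energies on cluster correlation functions. How MAPS, the GA, and SD choose which clusters enter the fit is not shown, and the function names are hypothetical. Only the standard library is used; the solver works through the normal equations.

```python
def solve_least_squares(X, y):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    m = len(X[0])
    A = [[sum(row[a] * row[b] for row in X) for b in range(m)] for a in range(m)]
    rhs = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        rhs[col], rhs[piv] = rhs[piv], rhs[col]
        for r in range(col + 1, m):
            f = A[r][col] / A[col][col]
            for c in range(col, m):
                A[r][c] -= f * A[col][c]
            rhs[r] -= f * rhs[col]
    # Back substitution.
    w = [0.0] * m
    for r in range(m - 1, -1, -1):
        w[r] = (rhs[r] - sum(A[r][c] * w[c] for c in range(r + 1, m))) / A[r][r]
    return w

def loocv_score(X, y):
    """Root-mean-square error of predicting each point from a fit to all others."""
    sq_errors = []
    for i in range(len(X)):
        w = solve_least_squares(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        prediction = sum(wj * xj for wj, xj in zip(w, X[i]))
        sq_errors.append((prediction - y[i]) ** 2)
    return (sum(sq_errors) / len(sq_errors)) ** 0.5

# Rows are [1, correlation]; a real CE would have one column per cluster.
X = [[1.0, float(x)] for x in range(6)]
y = [1.0, 2.9, 5.2, 7.1, 8.8, 11.3]
print(loocv_score(X, y))
```

Unlike the in-sample fitting error, the LOOCV score penalizes models that fit noise. That is why it is used as a proxy for prediction error when the true interactions are unknown.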
An improved distance matrix computation algorithm for multicore clusters.
Al-Neama, Mohammed W; Reda, Naglaa M; Ghaleb, Fayed F M
2014-01-01
Distance matrices are used in diverse research areas. Computing them is an essential task in most bioinformatics applications, especially in multiple sequence alignment. The enormous growth of biological sequence databases has created an urgent need to accelerate these computations. The DistVect algorithm was introduced in the paper of Al-Neama et al. (in press) as a recent approach to vectorizing distance matrix computation. It performed efficiently in both sequential and parallel computing. Multicore cluster systems, now widely available, offer the scalability and performance/cost ratio needed for still more powerful and efficient computation. This paper proposes DistVect1, a highly efficient parallel vectorized algorithm for computing distance matrices on multicore clusters. It reformulates the DistVect1 vectorized algorithm in terms of cluster primitives and derives an efficient approach to partitioning and scheduling computations that suits this type of architecture. The implementation exploits both the MPI and OpenMP libraries. Experimental results show that the proposed method achieves a speedup of around 3-fold over SSE2. Further, it achieves speedups of more than 9 orders of magnitude compared with the publicly available parallel implementation used in ClustalW-MPI. PMID:25013779
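The basic structure of a distance matrix computation, and why partitioning it across workers is not trivial, can be sketched briefly. This sketch is generic and is not DistVect or DistVect1. It assumes pre-aligned, equal-length sequences and uses the simple p-distance, whereas ClustalW derives its distances from pairwise alignments. The row-to-worker balancing helper is a hypothetical illustration of the scheduling concern that arises when rows carry unequal numbers of pairs.

```python
def p_distance(a, b):
    """Fraction of aligned positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def distance_matrix(seqs, dist=p_distance):
    """Symmetric n x n matrix; only the upper triangle is computed."""
    n = len(seqs)
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            D[i][j] = D[j][i] = dist(seqs[i], seqs[j])
    return D

def balanced_row_blocks(n, workers):
    """Assign rows to workers so each gets a similar number of (i, j > i) pairs.

    Row i holds n - 1 - i pairs, so equal-sized contiguous row blocks would be
    unbalanced; each row goes to the currently lightest worker instead.
    """
    loads = [0] * workers
    blocks = [[] for _ in range(workers)]
    for i in range(n):
        w = loads.index(min(loads))
        blocks[w].append(i)
        loads[w] += n - 1 - i
    return blocks

seqs = ["ACGT", "ACGA", "TCGA"]
print(distance_matrix(seqs))
print(balanced_row_blocks(8, 3))
```

Each worker could then compute its assigned rows independently, for example one MPI rank per block with OpenMP threads inside it. The balancing matters because the first rows of the upper triangle carry the most pairs.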
ICANP2: Isoenergetic cluster algorithm for NP-complete Problems
NASA Astrophysics Data System (ADS)
Zhu, Zheng; Fang, Chao; Katzgraber, Helmut G.
NP-complete optimization problems with Boolean variables are of fundamental importance in computer science, mathematics and physics. Most notably, the minimization of general spin-glass-like Hamiltonians remains a difficult numerical task. There has been great interest in designing efficient heuristics to solve these computationally difficult problems. Inspired by the rejection-free isoenergetic cluster algorithm developed for Ising spin glasses, we present a generalized cluster update that can be applied to different NP-complete optimization problems with Boolean variables. The cluster updates allow widespread sampling of phase space, thus speeding up optimization. By carefully tuning the pseudo-temperature of the problem (needed to randomize the configurations), we show that the method can efficiently tackle problems on topologies with a large site-percolation threshold. We illustrate the ICANP2 heuristic on paradigmatic optimization problems, such as the satisfiability problem and the vertex cover problem.
An algorithm for point cluster generalization based on the Voronoi diagram
NASA Astrophysics Data System (ADS)
Yan, Haowen; Weibel, Robert
2008-08-01
This paper presents an algorithm for point cluster generalization. Four types of information are considered: statistical, thematic, topological, and metric. For each, a measure is selected to describe it quantitatively in the algorithm: the number of points for statistical information, the importance value for thematic information, the Voronoi neighbors for topological information, and the distribution range and relative local density for metric information. Based on these measures, an algorithm for point cluster generalization is developed. First, the point clusters are triangulated and a border polygon of the point clusters is obtained. Using the border polygon, some pseudo points are added to the original point clusters to form a new point set, and a range polygon that encloses all original points is constructed. Second, the Voronoi polygons of the new point set are computed to obtain the so-called relative local density of each point. The selection probability of each point is then computed from its relative local density and importance value, and points to be deleted are marked as 'deleted' according to their selection probabilities and Voronoi neighboring relations. Third, if the number of retained points does not yet match the number computed by the Radical Law, the points marked as 'deleted' are physically removed to form a new point set and the second step is repeated; otherwise, the pseudo points and the points marked as 'deleted' are physically removed, yielding the generalized point clusters. Owing to the use of the Voronoi diagram, the algorithm is parameter-free and fully automatic. As our experiments show, it can be used to generalize point features arranged in clusters, such as thematic dot maps and control points on cartographic maps.
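The stopping criterion relies on the Radical Law. In its basic form, commonly attributed to Töpfer, the law gives the number of objects to retain at the target scale as n_f = n_a * sqrt(M_a / M_f), where M_a and M_f are the source and target scale denominators. A minimal sketch of that basic form follows. Whether the paper uses this exact form or a modified one is not stated here, and the function name is hypothetical.

```python
import math

def radical_law(n_source, source_scale_denominator, target_scale_denominator):
    """Number of objects to retain at the target scale (basic Radical Law).

    n_f = n_a * sqrt(M_a / M_f), with M_a and M_f the scale denominators of the
    source and target maps (e.g. 10000 for a 1:10,000 map).
    """
    ratio = source_scale_denominator / target_scale_denominator
    return round(n_source * math.sqrt(ratio))

# Going from 1:10,000 to 1:40,000 keeps half of the points: 400 * sqrt(1/4).
print(radical_law(400, 10_000, 40_000))
```

In an iterative generalization scheme, this target count is compared with the number of retained points after each deletion pass. The loop ends once the count is reached.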
NIC-based Reduction Algorithms for Large-scale Clusters
Petrini, F; Moody, A T; Fernandez, J; Frachtenberg, E; Panda, D K
2004-07-30
Efficient algorithms for reduction operations across a group of processes are crucial for good performance in many large-scale parallel scientific applications. While previous algorithms limit processing to the host CPU, we use the programmable processors and local memory available on modern cluster network interface cards (NICs) to explore a new dimension in the design of reduction algorithms. In this paper, we present the benefits and challenges, design issues and solutions, analytical models, and experimental evaluations of a family of NIC-based reduction algorithms. Performance and scalability evaluations were conducted on the ASCI Linux Cluster (ALC), a 960-node, 1920-processor machine at Lawrence Livermore National Laboratory that uses the Quadrics QsNet interconnect. We find NIC-based reductions on modern interconnects to be more efficient than host-based implementations in both scalability and consistency. In particular, at large scale (1812 processes), NIC-based reductions of small integer and floating-point arrays provided speedups of 121% and 39%, respectively, over the host-based, production-level MPI implementation.
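For context on what a reduction "across a group of processes" involves, here is a minimal simulation of the generic binomial-tree reduction to rank 0. It is often used as the host-based baseline and completes in ceil(log2 p) communication rounds. It is not the NIC-based algorithm family of the paper, and the function name is hypothetical.

```python
import operator

def binomial_tree_reduce(values, op=operator.add):
    """Simulate a binomial-tree reduction of per-process values to rank 0.

    values[r] is the local contribution of process r. Returns (result, rounds).
    In the round with distance `step`, every rank that is a multiple of
    2 * step receives the partial result of rank + step (if it exists).
    """
    partial = list(values)
    p = len(partial)
    step, rounds = 1, 0
    while step < p:
        for rank in range(0, p, 2 * step):
            partner = rank + step
            if partner < p:
                partial[rank] = op(partial[rank], partial[partner])
        step *= 2
        rounds += 1
    return partial[0], rounds

# 1812 processes need ceil(log2(1812)) = 11 rounds.
print(binomial_tree_reduce(list(range(1, 1813))))
```

Each round involves a message and a combine step on the receiving side. This is the per-hop work that the paper proposes moving from the host CPU onto the NIC processor.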
Algorithm-dependent fault tolerance for distributed computing
Hough, P. D.; Goldsby, M. E.; Walsh, E. J.
2000-02-01
Large-scale distributed systems assembled from commodity parts, like CPlant, have become common tools in the distributed computing world. Because of their size and the diversity of their parts, these systems are prone to failures. Applications run on these systems have not been equipped to deal efficiently with failures, nor is there vendor support for fault tolerance. Thus, when a failure occurs, the application crashes. While most programmers use checkpoints to allow their applications to be restarted, this is cumbersome and incurs substantial overhead. In many cases, there are more efficient and more elegant ways to address failures. The goal of this project is to develop a software architecture for the detection of and recovery from faults in a cluster computing environment. The detection phase relies on the latest techniques developed in the fault tolerance community. Recovery is being addressed in an application-dependent manner, allowing the programmer to take advantage of algorithmic characteristics to reduce the overhead of fault tolerance. This architecture will allow large-scale applications to be more robust in high-performance computing environments composed of clusters of commodity computers, such as CPlant and SMP clusters.
Finding reproducible cluster partitions for the k-means algorithm
2013-01-01
K-means clustering is widely used for exploratory data analysis. While its dependence on initialisation is well known, it is common practice to assume that the partition with the lowest total sum-of-squares (SSQ), i.e. within-cluster variance, is both reproducible under repeated initialisations and the closest that k-means can come to the true structure when applied to synthetic data. We show that this is generally the case for small numbers of clusters, but for values of k that are still of theoretical and practical interest, similar values of SSQ can correspond to markedly different cluster partitions. This paper extends stability measures, previously presented in the context of finding the optimal number of clusters, into a component of a 2-D map of the local minima found by the k-means algorithm. From this map, not only can values of k be identified for further analysis but, more importantly, it becomes clear whether the best SSQ is a suitable solution or whether obtaining a consistently good partition requires further application of the stability index. The proposed method is illustrated by application to five synthetic datasets replicating a real-world breast cancer dataset with varying data density, and to a large bioinformatics dataset. PMID:23369085
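The observation that similar SSQ values can hide different partitions can be checked with a minimal sketch. It runs Lloyd's k-means from several random initialisations, sorts the runs by SSQ, and compares partitions with the plain Rand index. The Rand index is a simple stand-in chosen for illustration; it is not the paper's stability index or 2-D map, and all names are hypothetical.

```python
import itertools
import math
import random

def kmeans(points, k, rng, max_iter=100):
    """Lloyd's algorithm from k randomly chosen data points. Returns (labels, ssq)."""
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    ssq = sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, ssq

def rand_index(a, b):
    """Fraction of point pairs on which two partitions agree (label-invariant)."""
    pairs = list(itertools.combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)

def restarts(points, k, n_init=20, seed=0):
    """Run k-means n_init times and return the runs sorted by SSQ."""
    rng = random.Random(seed)
    return sorted((kmeans(points, k, rng) for _ in range(n_init)),
                  key=lambda run: run[1])

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
runs = restarts(pts, 2, n_init=10, seed=1)
best, second = runs[0], runs[1]
print(best[1], second[1], rand_index(best[0], second[0]))
```

A Rand index well below 1 between two runs with nearly equal SSQ signals the situation the abstract warns about: the lowest SSQ alone does not guarantee a reproducible partition.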
Einstein imaging observations of clusters with a bimodal mass distribution
NASA Technical Reports Server (NTRS)
Forman, W.; Bechtold, J.; Blair, W.; Giacconi, R.; Van Speybroeck, L.; Jones, C.
1981-01-01
We present Einstein imaging observations of four X-ray clusters of galaxies whose X-ray surface brightness distributions, and hence mass distributions, are double. In observations made with the Einstein Imaging Proportional Counter, the clusters A98, A115, A1750 and SC 0627-54 were found to exhibit two enhancements in their X-ray surface brightness distributions. Calculations of the probability that the clusters represent chance superpositions indicate that the double clusters are physically associated. The radial distributions of the components are inconsistent with those of single point sources and have been used to derive cluster luminosities, which are typical of rich clusters. The masses of the subclusters are also found to be typical of bound and virialized clusters, with gas contributing 10%. Within the framework of the hierarchical theory of galactic clustering, the double clusters are suggested to represent an intermediate evolutionary stage before the subclusters merge into a relaxed Coma-type cluster.
Performance Analysis of Apriori Algorithm with Different Data Structures on Hadoop Cluster
NASA Astrophysics Data System (ADS)
Singh, Sudhakar; Garg, Rakhi; Mishra, P. K.
2015-10-01
Mining frequent itemsets from massive datasets has always been one of the most important problems in data mining. Apriori is the most popular and simplest algorithm for frequent itemset mining. To enhance the efficiency and scalability of Apriori, a number of algorithms have been proposed that address the design of efficient data structures, the minimization of database scans, and parallel and distributed processing. MapReduce is an emerging parallel and distributed technology for processing big datasets on Hadoop clusters. To mine big datasets, data mining algorithms must be redesigned for this new paradigm. In this paper, we implement three variations of the Apriori algorithm on the MapReduce paradigm using three data structures: hash tree, trie, and hash table trie (a trie combined with hashing). We investigate the significance of these three data structures for the Apriori algorithm on a Hadoop cluster, a question that has so far received little attention. Experiments on both real-life and synthetic datasets show that the hash table trie performs far better than the trie and the hash tree in terms of execution time, and that the hash tree performs worst.
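For readers new to Apriori, the following minimal single-machine sketch shows its level-wise join, prune, and count cycle. Candidate supports are counted in a Python dict keyed by frozensets, which is loosely analogous to a hash table. It does not reproduce the MapReduce implementation or the hash tree, trie, and hash table trie variants compared in the paper, and the function name is hypothetical.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support_count} for every itemset with count >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        prev = list(frequent)
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a, b in combinations(prev, 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
        # Count step: one pass over the transactions per level.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result

baskets = [{"bread", "milk"},
           {"bread", "diapers", "beer", "eggs"},
           {"milk", "diapers", "beer", "cola"},
           {"bread", "milk", "diapers", "beer"},
           {"bread", "milk", "diapers", "cola"}]
for itemset, count in sorted(apriori(baskets, 3).items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), count)
```

The count step dominates the cost on large data because every candidate must be matched against every transaction. That matching is where the choice of candidate data structure matters.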
Distributed cluster management techniques for unattended ground sensor networks
NASA Astrophysics Data System (ADS)
Essawy, Magdi A.; Stelzig, Chad A.; Bevington, James E.; Minor, Sharon
2005-05-01
Smart sensor networks are becoming important target detection and tracking tools. The challenging problems in such networks include sensor fusion, data management, and communication schemes. This work discusses techniques used to distribute sensor management and multi-target tracking responsibilities across an ad hoc, self-healing cluster of sensor nodes. Although miniaturized computing resources can host complex tracking and data fusion algorithms, inherent bandwidth constraints on the RF channel remain. Therefore, special attention is paid to reducing node-to-node communications within the cluster by minimizing unsolicited messaging and by distributing the sensor fusion and tracking tasks onto local portions of the network. Several challenging problems are addressed in this work, including track initialization and conflict resolution, track ownership handling, and communication control optimization. Emphasis is also placed on increasing the overall robustness of the sensor cluster through independent decision capabilities on all sensor nodes. Track initiation is performed using collaborative sensing within a neighborhood of sensor nodes, allowing each node to independently determine whether it should assume initial track ownership. This autonomous track initiation prevents the formation of duplicate tracks while eliminating the need for a central "management" node to assign tracking responsibilities. Track update is performed as an ownership node requests sensor reports from neighboring nodes based on the track error covariance and the neighboring nodes' geo-positional locations. Track ownership is periodically recomputed using propagated track states to determine which sensing node provides the desired coverage characteristics. High-fidelity multi-target simulation results are presented, indicating the distribution of sensor management and tracking capabilities to not only reduce communication bandwidth consumption, but to also
Dynamically Incremental K-means++ Clustering Algorithm Based on Fuzzy Rough Set Theory
NASA Astrophysics Data System (ADS)
Li, Wei; Wang, Rujing; Jia, Xiufang; Jiang, Qing
Because the classic K-means++ clustering algorithm handles only static data, this paper presents a dynamically incremental K-means++ clustering algorithm (DK-Means++) based on fuzzy rough set theory. First, DK-Means++ improves the similarity formula with weights computed from the importance of the attributes retained after attribute reduction based on fuzzy rough set theory. Second, new data need only be matched against the granules already clustered by the K-means++ algorithm; only occasionally is new data clustered by the classic K-means++ algorithm over the global data. This avoids re-clustering all data in the dynamic data set each time, which improves clustering efficiency. Our experiments show that DK-Means++ can objectively and efficiently handle the clustering of dynamically incremental data.
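Since DK-Means++ builds on K-means++, a minimal sketch of the standard K-means++ seeding step (D² sampling, due to Arthur and Vassilvitskii) may help. Each new center is drawn with probability proportional to the squared distance to the nearest center chosen so far. The fuzzy-rough attribute weighting and incremental granule matching of DK-Means++ are not shown, and the function name is hypothetical.

```python
import math
import random

def kmeans_pp_seeds(points, k, rng=random):
    """Choose up to k initial centers by k-means++ (D^2) sampling."""
    centers = [rng.choice(points)]
    # d2[i]: squared distance from points[i] to its nearest chosen center
    d2 = [math.dist(p, centers[0]) ** 2 for p in points]
    while len(centers) < k:
        if sum(d2) == 0:  # every point coincides with an existing center
            break
        new = rng.choices(points, weights=d2)[0]
        centers.append(new)
        d2 = [min(old, math.dist(p, new) ** 2) for old, p in zip(d2, points)]
    return centers

pts = [(0, 0), (0, 0.1), (100, 100), (100, 100.1)]
print(kmeans_pp_seeds(pts, 2, random.Random(3)))
```

The seeds are then refined by ordinary Lloyd iterations. Because distant points are strongly favoured, well-separated groups are very likely to each receive a seed.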
A new detection algorithm for microcalcification clusters in mammographic screening
NASA Astrophysics Data System (ADS)
Xie, Weiying; Ma, Yide; Li, Yunsong
2015-05-01
A novel approach for detecting microcalcification clusters is proposed. First, we briefly analyze mammographic images with microcalcification lesions to confirm that these lesions have much greater gray values than normal regions. After summarizing the specific features of microcalcification clusters in mammographic screening, we focus on the preprocessing step, which includes eliminating the background, enhancing the image, and eliminating the pectoral muscle. Specifically, the Chan-Vese model is used to eliminate the background. Morphological and edge detection methods are then combined: after an AND operation and a Sobel filter, the Hough transform is applied, which performs well at eliminating the pectoral muscle, whose gray level is close to that of microcalcifications. The enhancement step is achieved by morphological operations. The preprocessing is designed to keep computational complexity low. Robust mammogram analysis is known to be difficult because of the low contrast between normal and lesion tissue and the considerable noise in such images. After this thorough preprocessing, a blob-detection method is applied to locate microcalcification clusters according to their specific features. The proposed algorithm uses the Laplace operator to improve the Difference of Gaussians (DoG) function on low-contrast images. A preliminary evaluation of the proposed method was performed on a well-known public database, MIAS, rather than on synthetic images. Comparison experiments and Cohen's kappa coefficients indicate that the proposed approach can potentially achieve better microcalcification cluster detection in terms of accuracy, sensitivity, and specificity.
Clustering based on conditional distributions in an auxiliary space.
Sinkkonen, Janne; Kaski, Samuel
2002-01-01
We study the problem of learning groups or categories that are local in the continuous primary space but homogeneous in terms of the distributions of an associated auxiliary random variable over a discrete auxiliary space. Assuming that variation in the auxiliary space is meaningful, the categories will emphasize similarly meaningful aspects of the primary space. From a data set consisting of pairs of primary and auxiliary items, the categories are learned by minimizing a Kullback-Leibler divergence-based distortion between (implicitly estimated) distributions of the auxiliary data, conditioned on the primary data. Still, the categories are defined in terms of the primary space. An online algorithm resembling traditional Hebb-type competitive learning is introduced for learning the categories. Minimizing the distortion criterion turns out to be equivalent to maximizing the mutual information between the categories and the auxiliary data. In addition, connections to density estimation and to the distributional clustering paradigm are outlined. The method is demonstrated by clustering yeast gene expression data from DNA chips, with biological knowledge about the functional classes of the genes as the auxiliary data. PMID:11747539
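Because the criterion is equivalent to maximizing the mutual information between categories and auxiliary data, a minimal sketch of the empirical mutual information for a hard assignment may help. It is computed from a contingency table of counts. This is a simplification for illustration: the paper's algorithm works online with implicitly estimated conditional distributions rather than with a fixed count table. The function name is hypothetical.

```python
import math

def mutual_information(counts):
    """Empirical mutual information (in nats) from a contingency table.

    counts[v][c]: number of samples assigned to category v whose auxiliary
    class (e.g. a gene's functional class) is c.
    """
    n = sum(map(sum, counts))
    p_row = [sum(row) / n for row in counts]
    p_col = [sum(row[c] for row in counts) / n for c in range(len(counts[0]))]
    mi = 0.0
    for v, row in enumerate(counts):
        for c, n_vc in enumerate(row):
            if n_vc:  # empty cells contribute nothing (0 * log 0 = 0)
                p = n_vc / n
                mi += p * math.log(p / (p_row[v] * p_col[c]))
    return mi

print(mutual_information([[5, 0], [0, 5]]))  # categories fully predict the class
print(mutual_information([[2, 2], [3, 3]]))  # categories carry no class information
```

A partition of the primary space scores highly when knowing a sample's category sharply narrows down its auxiliary class. This is the sense in which the learned categories are "homogeneous" in the auxiliary space.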
Classification of posture maintenance data with fuzzy clustering algorithms
NASA Technical Reports Server (NTRS)
Bezdek, James C.
1992-01-01
Sensory inputs from the visual, vestibular, and proprioceptive systems are integrated by the central nervous system to maintain postural equilibrium. Sustained exposure to microgravity causes neurosensory adaptation during spaceflight, which results in decreased postural stability until readaptation occurs upon return to the terrestrial environment. Data simulating sensory inputs under various sensory organization test (SOT) conditions were collected in conjunction with Johnson Space Center postural control studies using a tilt-translation device (TTD). The University of West Florida applied the fuzzy c-means (FCM) clustering algorithms to these data with a view toward identifying the various states and stages of subjects experiencing such changes. Feature analysis, time-step analysis, data pooling, subject responses, and the algorithms used are discussed.
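For reference, a minimal sketch of the standard fuzzy c-means iteration follows. Centers are weighted means with weights u^m, and memberships are updated as u_ik = 1 / sum_j (d_ik / d_jk)^(2/(m-1)). The fuzzifier m = 2, the tolerance, and the random initialisation are common defaults assumed here; the TTD feature selection is not shown, and the function name is hypothetical.

```python
import math
import random

def fuzzy_c_means(points, c, m=2.0, tol=1e-6, max_iter=300, seed=0):
    """Standard fuzzy c-means. Returns (centers, memberships).

    memberships[k][i] is the degree to which point k belongs to cluster i;
    each row sums to one.
    """
    rng = random.Random(seed)
    u = []
    for _ in points:  # random initial memberships, rows normalised to one
        row = [rng.random() for _ in range(c)]
        s = sum(row)
        u.append([x / s for x in row])
    dim = len(points[0])
    for _ in range(max_iter):
        # Center update: mean of the points weighted by membership ** m.
        centers = []
        for i in range(c):
            w = [u[k][i] ** m for k in range(len(points))]
            sw = sum(w)
            centers.append(tuple(sum(wk * p[d] for wk, p in zip(w, points)) / sw
                                 for d in range(dim)))
        # Membership update from distances to the new centers.
        new_u = []
        for p in points:
            dists = [math.dist(p, v) for v in centers]
            if 0.0 in dists:  # point coincides with a center: crisp membership
                row = [1.0 if d == 0.0 else 0.0 for d in dists]
                s = sum(row)
                row = [x / s for x in row]
            else:
                row = [1.0 / sum((di / dj) ** (2.0 / (m - 1.0)) for dj in dists)
                       for di in dists]
            new_u.append(row)
        shift = max(abs(a - b) for ru, rn in zip(u, new_u) for a, b in zip(ru, rn))
        u = new_u
        if shift < tol:
            break
    return centers, u

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, memberships = fuzzy_c_means(pts, 2)
print(centers)
```

Unlike k-means, each sample keeps a graded membership in every cluster. This is useful when, as with pre- and post-adaptation data, states may overlap or shift gradually over time.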
A Fast Clustering Algorithm for Data with a Few Labeled Instances
Yang, Jinfeng; Xiao, Yong; Wang, Jiabing; Ma, Qianli; Shen, Yanhua
2015-01-01
The diameter of a cluster is the maximum intracluster distance between pairs of instances within the same cluster, and the split of a cluster is the minimum distance between instances within the cluster and instances outside the cluster. Given a few labeled instances, this paper addresses two aspects. First, we present a simple and fast clustering algorithm with the following property: if the ratio of the minimum split to the maximum diameter (RSD) of the optimal solution is greater than one, the algorithm returns optimal solutions for three clustering criteria. Second, we study the metric learning problem: learning a distance metric that makes the RSD as large as possible. Compared with existing metric learning algorithms, one of our metric learning algorithms is computationally efficient: it is a linear programming model rather than the semidefinite programming model used by most existing algorithms. We demonstrate empirically that the supervision and the learned metric can improve clustering quality. PMID:25861252
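The diameter, split and RSD definitions translate directly into code. The sketch below uses Euclidean distance as an assumption (the paper learns a metric instead). An RSD above one means every within-cluster distance is smaller than every between-cluster distance.

```python
import math
from itertools import combinations

def diameter(cluster):
    """Maximum pairwise distance inside a cluster (0 for a singleton)."""
    return max((math.dist(a, b) for a, b in combinations(cluster, 2)), default=0.0)

def split(cluster, others):
    """Minimum distance between a point in the cluster and a point outside it."""
    return min(math.dist(a, b) for a in cluster for b in others)

def rsd(clusters):
    """Ratio of the minimum split to the maximum diameter over a partition."""
    max_diam = max(diameter(c) for c in clusters)
    min_split = min(
        split(c, [p for j, other in enumerate(clusters) if j != i for p in other])
        for i, c in enumerate(clusters)
    )
    return min_split / max_diam if max_diam > 0 else math.inf
```

For example, two vertical pairs of points one unit tall and five units apart give `rsd([[(0, 0), (0, 1)], [(5, 0), (5, 1)]]) == 5.0`.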
HST Imaging of the Globular Clusters in the Fornax Cluster: Color and Luminosity Distributions
NASA Technical Reports Server (NTRS)
Grillmair, C. J.; Forbes, D. A.; Brodie, J.; Elson, R.
1998-01-01
We examine the luminosity and B - I color distribution of globular clusters for three early-type galaxies in the Fornax cluster using imaging data from the Wide Field/Planetary Camera 2 on the Hubble Space Telescope.
Classification of posture maintenance data with fuzzy clustering algorithms
NASA Technical Reports Server (NTRS)
Bezdek, James C.
1991-01-01
Sensory inputs from the visual, vestibular, and proprioceptive systems are integrated by the central nervous system to maintain postural equilibrium. Sustained exposure to microgravity causes neurosensory adaptation during spaceflight, which results in decreased postural stability until readaptation occurs upon return to the terrestrial environment. Data which simulate sensory inputs under various conditions were collected in conjunction with JSC postural control studies using a Tilt-Translation Device (TTD). The University of West Florida proposed applying the Fuzzy C-Means Clustering (FCM) Algorithms to these data with a view toward identifying various states and stages. Data supplied by NASA/JSC were submitted to the FCM algorithms in an attempt to identify and characterize cluster substructure in a mixed ensemble of pre- and post-adaptational TTD data. Following several unsuccessful trials with FCM using the full 11-dimensional data set, a set of two channels (features) was found to enable FCM to separate pre- from post-adaptational TTD data. The main conclusions are that: (1) FCM seems able to separate pre- from post-TTD data for subject no. 2 on the one trial that was used, but only in certain subintervals of time; and (2) channels 2 (right rear transducer force) and 8 (hip sway bar) contain better discrimination information than the other supersets and combinations of the data tried so far.
jClustering, an open framework for the development of 4D clustering algorithms.
Mateos-Pérez, José María; García-Villalba, Carmen; Pascau, Javier; Desco, Manuel; Vaquero, Juan J
2013-01-01
We present jClustering, an open framework for the design of clustering algorithms in dynamic medical imaging. We developed this tool because of the difficulty involved in manually segmenting dynamic PET images and the lack of availability of source code for published segmentation algorithms. Providing an easily extensible open tool encourages publication of source code to facilitate the process of comparing algorithms and provide interested third parties with the opportunity to review code. The internal structure of the framework allows an external developer to implement new algorithms easily and quickly, focusing only on the particulars of the method being implemented and not on image data handling and preprocessing. This tool has been coded in Java and is presented as an ImageJ plugin in order to take advantage of all the functionalities offered by this imaging analysis platform. Both binary packages and source code have been published, the latter under a free software license (GNU General Public License) to allow modification if necessary. PMID:23990913
Applying various algorithms for species distribution modelling.
Li, Xinhai; Wang, Yuan
2013-06-01
Species distribution models have been used extensively in many fields, including climate change biology, landscape ecology and conservation biology. In the past 3 decades, a number of new models have been proposed, yet researchers still find it difficult to select appropriate models for their data and objectives. In this review, we aim to provide insight into the prevailing species distribution models for newcomers to the field of modelling. We compared 11 popular models, including regression models (the generalized linear model, the generalized additive model, the multivariate adaptive regression splines model and hierarchical modelling), classification models (mixture discriminant analysis, the generalized boosting model, and classification and regression tree analysis) and complex models (artificial neural network, random forest, genetic algorithm for rule set production and maximum entropy approaches). Our objectives are: (i) to compare the strengths, weaknesses and characteristics of the models and to identify suitable situations for their use (in terms of data type and species-environment relationships) and (ii) to provide guidelines for model application, including 3 steps: model selection, model formulation and parameter estimation. PMID:23731809
Jiang, Peng; Xu, Yiming; Wu, Feng
2016-01-01
Existing move-restricted node self-deployment algorithms are based on a fixed node communication radius, evaluate performance based on network coverage or connectivity rate, and do not consider the number of nodes near the sink node or the energy consumption distribution of the network topology, thereby degrading network reliability and energy consumption balance. Therefore, we propose a distributed underwater node self-deployment algorithm. First, each node begins uneven clustering based on distance on the water surface. Each cluster head node selects its next-hop node to synchronously construct a connected path to the sink node. Second, each cluster head node adjusts its depth while maintaining the layout formed by the uneven clustering and then adjusts the positions of its in-cluster nodes. The algorithm is novel in considering network reliability and energy consumption balance during node deployment, and it considers the coverage redundancy rate of all positions that a node may reach during node position adjustment. Simulation results show that, compared with the connected dominating set (CDS) based depth computation algorithm, the proposed algorithm can increase the number of nodes near the sink node and improve network reliability while guaranteeing the network connectivity rate. Moreover, it can balance energy consumption during network operation, further improve the network coverage rate and reduce energy consumption. PMID:26784193
Radial distribution of metallicity in the LMC cluster systems
NASA Technical Reports Server (NTRS)
Kontizas, M.; Kontizas, E.; Michalitsianos, A. G.
1993-01-01
New determinations of the deprojected distances to the galaxy center for 94 star clusters and their metal abundances are used to investigate the variation of metallicity across the two LMC star cluster systems (Kontizas et al. 1990). A systematic radial trend of metallicity is observed in the extended outer cluster system, the outermost clusters being significantly more metal-poor than the more central ones, with the exception of six clusters (which might lie out of the plane of the cluster system) out of 77. A radial metallicity gradient has been found, qualitatively comparable to that of the Milky Way for its system of old disk clusters. If the six clusters are taken into consideration, then the outer cluster system is well mixed up to 8 kpc. The spatial distribution of metallicities for the inner LMC cluster system, consisting of very young globulars, does not show a systematic radial trend; they are all metal rich.
Learning Based Approach for Optimal Clustering of Distributed Program's Call Flow Graph
NASA Astrophysics Data System (ADS)
Abofathi, Yousef; Zarei, Bager; Parsa, Saeed
Optimal clustering of a call flow graph to reach maximum concurrency in the execution of distributable components is an NP-complete problem. Learning automata (LAs) are search tools that are used to solve many NP-complete problems. In this paper, a learning-based algorithm is proposed for optimal clustering of the call flow graph and appropriate distribution of programs at the network level. The algorithm uses the learning capability of LAs to search the state space. It is shown that using LAs in the search process markedly increases the speed of reaching a solution and also prevents the algorithm from being trapped in local minima. Experimental results show the superiority of the proposed algorithm over others.
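The abstract does not say which learning-automaton scheme is used. As a generic illustration only, the sketch below implements a variable-structure automaton with the classic linear reward-inaction (L_RI) rule: on a reward the chosen action's probability moves toward 1 and the others shrink proportionally; on a penalty nothing changes. The reward environment in the usage note is hypothetical, and mapping actions to call-flow-graph cluster assignments is not shown.

```python
import random

class LinearRewardInaction:
    """Variable-structure learning automaton using the L_RI update rule."""

    def __init__(self, n_actions, learning_rate=0.05, seed=0):
        self.p = [1.0 / n_actions] * n_actions  # action probability vector
        self.a = learning_rate
        self.rng = random.Random(seed)

    def choose(self):
        # Sample an action index according to the current probabilities.
        return self.rng.choices(range(len(self.p)), weights=self.p)[0]

    def update(self, action, rewarded):
        # Reward: shift probability mass toward the chosen action.
        # Penalty: "inaction", the probability vector is left unchanged.
        if rewarded:
            for j in range(len(self.p)):
                if j == action:
                    self.p[j] += self.a * (1.0 - self.p[j])
                else:
                    self.p[j] *= 1.0 - self.a
```

With a hypothetical environment that rewards actions 0, 1 and 2 with probabilities 0.2, 0.5 and 0.9, repeatedly calling `choose` and `update` concentrates the probability on action 2; a smaller learning rate converges more slowly but is less likely to lock onto a suboptimal action.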
GX-Means: A model-based divide and merge algorithm for geospatial image clustering
Vatsavai, Raju; Symons, Christopher T; Chandola, Varun; Jun, Goo
2011-01-01
One of the practical issues in clustering is the specification of the appropriate number of clusters, which is not obvious when analyzing geospatial datasets, partly because they are huge (both in size and spatial extent) and high dimensional. In this paper we present a computationally efficient model-based split and merge clustering algorithm that incrementally finds model parameters and the number of clusters. Additionally, we attempt to provide insights into this problem and other data mining challenges that are encountered when clustering geospatial data. The basic algorithm we present is similar to the G-means and X-means algorithms; however, our proposed approach avoids certain limitations of these well-known clustering algorithms that are pertinent when dealing with geospatial data. We compare the performance of our approach with the G-means and X-means algorithms. Experimental evaluation on simulated data and on multispectral and hyperspectral remotely sensed image data demonstrates the effectiveness of our algorithm.
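Split-and-merge methods in the G-means/X-means family repeatedly ask whether one cluster would be better modelled as two. The sketch below shows that decision in one dimension using a BIC comparison in the spirit of X-means. It is not the GX-Means test, whose details the abstract does not give; the hard-assignment likelihood and the parameter counts are assumptions.

```python
import math

def _sse(xs):
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs)

def _gauss_loglik(xs, weight):
    # Log-likelihood of xs under one Gaussian (MLE mean and variance),
    # plus the log mixing weight of that component for every point.
    n = len(xs)
    var = max(_sse(xs) / n, 1e-12)  # floor guards against zero variance
    return n * math.log(weight) - 0.5 * n * math.log(2 * math.pi * var) - _sse(xs) / (2 * var)

def best_two_split(xs):
    """1-D 2-means: optimal clusters are contiguous runs of the sorted data,
    so scanning every cut (at least two points per side) is exhaustive."""
    if len(xs) < 4:
        raise ValueError("need at least four points")
    xs = sorted(xs)
    cut = min(range(2, len(xs) - 1), key=lambda c: _sse(xs[:c]) + _sse(xs[c:]))
    return xs[:cut], xs[cut:]

def should_split(xs):
    """True if two Gaussians have a higher BIC than one (BIC = logL - p/2 log n)."""
    n = len(xs)
    bic_one = _gauss_loglik(xs, 1.0) - (2 / 2) * math.log(n)  # mean, variance
    left, right = best_two_split(xs)
    loglik_two = _gauss_loglik(left, len(left) / n) + _gauss_loglik(right, len(right) / n)
    bic_two = loglik_two - (5 / 2) * math.log(n)  # 2 means, 2 variances, 1 weight
    return bic_two > bic_one
```

Two well-separated groups such as `[-5.5, -5, -4.5, 4.5, 5, 5.5]` trigger a split, while a single peaked group does not. Applying this test recursively is what lets such methods discover the number of clusters instead of requiring it up front.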
Applying Social Networking and Clustering Algorithms to Galaxy Groups in ALFALFA
NASA Astrophysics Data System (ADS)
Bramson, Ali; Wilcots, E. M.
2012-01-01
Because most galaxies live in groups, and the environment in which a galaxy resides affects its evolution, it is crucial to develop tools to understand how galaxies are distributed within groups. At the same time we must understand how groups are distributed and connected in the larger scale structure of the Universe. I have applied a variety of networking techniques to assess the substructure of galaxy groups, including distance matrices, agglomerative hierarchical clustering algorithms and dendrograms. We use distance matrices to locate groupings spatially in 3-D. Dendrograms created from agglomerative hierarchical clustering results allow us to quantify connections between galaxies and galaxy groups. The shape of the dendrogram reveals whether the group is spatially homogeneous or clumpy. These techniques are giving us new insight into the structure and dynamical state of galaxy groups and large scale structure. We specifically apply these techniques to the ALFALFA survey of the Coma-Abell 1367 supercluster and its resident galaxy groups.
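Agglomerative clustering of 3-D positions can be sketched in a few lines. The abstract does not state the linkage criterion, so single linkage (cluster distance = closest pair of members) and the cut-off threshold are assumptions. The recorded merge heights are the values a dendrogram would plot: many low merges followed by a few tall ones suggest a clumpy group.

```python
import math
from itertools import combinations

def single_linkage(points, threshold):
    """Naive agglomerative single-linkage clustering cut at a distance threshold.

    Returns (clusters, merges): clusters are sorted lists of point indices and
    merges lists (height, members_a, members_b) in merge order."""
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        # Find the pair of clusters whose closest members are nearest.
        (a, b), height = min(
            (((i, j), min(math.dist(points[p], points[q])
                          for p in clusters[i] for q in clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda t: t[1],
        )
        if height > threshold:
            break
        merges.append((height, sorted(clusters[a]), sorted(clusters[b])))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return [sorted(c) for c in clusters], merges
```

This naive version recomputes all pairwise distances at every step, which is fine for a single galaxy group; survey-scale data would call for a precomputed distance matrix or a library implementation.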
Dynamic Layered Dual-Cluster Heads Routing Algorithm Based on Krill Herd Optimization in UWSNs.
Jiang, Peng; Feng, Yang; Wu, Feng; Yu, Shanen; Xu, Huan
2016-01-01
To address the limited energy of nodes in underwater wireless sensor networks (UWSNs) and the heavy load on cluster heads in clustering routing algorithms, this paper proposes a dynamic layered dual-cluster routing algorithm based on Krill Herd optimization in UWSNs. Cluster size is first determined by the distance between the cluster head nodes and the sink node, and a dynamic layered mechanism is established to avoid repeatedly selecting the same cluster head nodes. The Krill Herd optimization algorithm is used to select the optimal and second-optimal cluster heads, and its Lagrange model directs nodes to a high-likelihood area. The algorithm ultimately realizes the functions of data collection and data transmission. The simulation results show that the proposed algorithm can effectively decrease cluster energy consumption, balance network energy consumption, and prolong the network lifetime. PMID:27589744
NASA Astrophysics Data System (ADS)
Park, Sang Ha; Lee, Seokjin; Sung, Koeng-Mo
Non-negative matrix factorization (NMF) is widely used for monaural musical sound source separation because of its efficiency and good performance. However, an additional clustering process is required because the musical sound mixture is separated into more signals than the number of musical tracks during NMF separation. In the conventional method, manual clustering or training-based clustering is performed with an additional learning process. Recently, a clustering algorithm based on the mel-frequency cepstrum coefficient (MFCC) was proposed for unsupervised clustering. However, MFCC clustering supplies limited information for clustering. In this paper, we propose various timbre features for unsupervised clustering and a clustering algorithm with these features. Simulation experiments are carried out using various musical sound mixtures. The results indicate that the proposed method improves clustering performance, as compared to conventional MFCC-based clustering.
Evaluation of particle clustering algorithms in the prediction of brownout dust clouds
NASA Astrophysics Data System (ADS)
Govindarajan, Bharath Madapusi
2011-07-01
A study of three Lagrangian particle clustering methods has been conducted with application to the problem of predicting brownout dust clouds that develop when rotorcraft land over surfaces covered with loose sediment. A significant impediment in performing such particle modeling simulations is the extremely large number of particles needed to obtain dust clouds of acceptable fidelity. Computing the motion of each and every individual sediment particle in a dust cloud (which can reach into tens of billions per cubic meter) is computationally prohibitive. The reported work involved the development of computationally efficient clustering algorithms that can be applied to the simulation of dilute gas-particle suspensions at low Reynolds numbers of the relative particle motion. The Gaussian distribution, k-means and Osiptsov's clustering methods were studied in detail to highlight the nuances of each method for a prototypical flow field that mimics the highly unsteady, two-phase vortical particle flow obtained when rotorcraft encounter brownout conditions. It is shown that although clustering algorithms can be problem dependent and have bounds of applicability, they offer the potential to significantly reduce computational costs while retaining the overall accuracy of a brownout dust cloud solution.
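One way a clustering method cuts the cost described above is to replace many individual sediment particles with a few representative parcels. The sketch below does this with Lloyd's k-means on particle positions, keeping each parcel's particle count as a weight. It only illustrates the general idea; how the study applied k-means inside its flow solver is not described in the abstract, and `to_parcels` is a hypothetical helper.

```python
import math
import random

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's k-means; returns (centers, labels)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centers: k randomly chosen points

    def assign(cs):
        return [min(range(k), key=lambda c: math.dist(p, cs[c])) for p in points]

    for _ in range(iters):
        labels = assign(centers)
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # An empty cluster keeps its previous center.
            new_centers.append(tuple(sum(v) / len(members) for v in zip(*members))
                               if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(centers)

def to_parcels(points, k, seed=0):
    """Represent particles by k parcels: (centroid position, particle count)."""
    centers, labels = kmeans(points, k, seed=seed)
    counts = [labels.count(c) for c in range(k)]
    return [(centers[c], counts[c]) for c in range(k) if counts[c] > 0]
```

Only the parcels then need to be advanced through the flow field, so the cost scales with k rather than with the number of physical particles; choosing k trades that saving against fidelity of the dust cloud.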
Study on 2D random medium inversion algorithm based on Fuzzy C-means Clustering theory
NASA Astrophysics Data System (ADS)
Xu, Z.; Zhu, P.; Gu, Y.; Yang, X.; Jiang, J.
2015-12-01
In seismic exploration for metal deposits, traditional seismic inversion methods based on layered homogeneous medium theory have difficulty inverting the small-scale inhomogeneity and spatial variation of the actual medium, because the physical properties of the actual medium are more likely to be randomly distributed than layered. Thus, it is necessary to investigate a random medium inversion algorithm. The velocity of a 2D random medium can be described as a function of five parameters: the background velocity (V0), the standard deviation of velocity (σ), the horizontal and vertical autocorrelation lengths (A and B), and the autocorrelation angle (θ). In this study, we propose an inversion algorithm for random media based on Fuzzy C-means Clustering (FCM) theory, whose basic idea is to use FCM to steer the inversion process in the desired direction by clustering the estimated parameters into groups. Our method can be divided into three steps: first, the three parameters (A, B, θ) are estimated from 2D post-stack seismic data using the non-stationary random medium parameter estimation method, and the estimated parameters are clustered into different groups by FCM; second, the initial random medium model is constructed from the clustered groups and the remaining two parameters (V0 and σ) obtained from well logging data; finally, inversion of the random medium is conducted to obtain velocity, impedance and random medium parameters using the Conjugate Gradient Method. Inversion experiments on synthetic seismic data show that the velocity models inverted by our algorithm are close to the real velocity distribution and that the boundaries between different media can be clearly distinguished.
Key words: random medium, inversion, FCM, parameter estimation
Algorithm to extract the spanning clusters and calculate conductivity in strip geometries
NASA Astrophysics Data System (ADS)
Babalievski, F.
1995-06-01
I present an improved algorithm to solve the random resistor problem using a transfer-matrix technique. Preconditioning by spanning cluster extraction both reduces the size of the matrix and yields faster execution times when compared to previous algorithms.
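The preconditioning step, keeping only the cluster that spans the strip, can be illustrated on a site-percolation grid. The sketch assumes nearest-neighbour connectivity and a strip that must be spanned from the left edge to the right edge; it is not the paper's transfer-matrix code. A site belongs to a spanning cluster exactly when it is reachable from both edges, so two breadth-first searches and a set intersection suffice.

```python
from collections import deque

def _reachable(grid, starts):
    """Occupied sites connected (4-neighbour) to any occupied start site."""
    rows, cols = len(grid), len(grid[0])
    seen = {s for s in starts if grid[s[0]][s[1]]}
    queue = deque(seen)
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] and (nr, nc) not in seen:
                seen.add((nr, nc))
                queue.append((nr, nc))
    return seen

def spanning_sites(grid):
    """Occupied sites in clusters that touch both the left and right edges."""
    rows, cols = len(grid), len(grid[0])
    from_left = _reachable(grid, [(r, 0) for r in range(rows)])
    from_right = _reachable(grid, [(r, cols - 1) for r in range(rows)])
    return from_left & from_right
```

Isolated and one-sided clusters carry no current between the electrodes, so dropping them shrinks the conductance matrix without changing the result. Dead-end branches of the spanning cluster are still kept here; they also carry no current but require a separate backbone extraction to remove.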
A comparison of queueing, cluster and distributed computing systems
NASA Technical Reports Server (NTRS)
Kaplan, Joseph A.; Nelson, Michael L.
1993-01-01
Using workstation clusters for distributed computing has become popular with the proliferation of inexpensive, powerful workstations. Workstation clusters offer both a cost-effective alternative to batch processing and an easy entry into parallel computing. However, a number of workstations on a network does not constitute a cluster. Cluster management software is necessary to harness the collective computing power. A variety of cluster management and queueing systems are compared: Distributed Queueing Systems (DQS), Condor, Load Leveler, Load Balancer, Load Sharing Facility (LSF - formerly Utopia), Distributed Job Manager (DJM), Computing in Distributed Networked Environments (CODINE), and NQS/Exec. The systems differ in their design philosophy and implementation. Based on published reports on the different systems and conversations with the systems' developers and vendors, the systems are compared on the integral issues of clustered computing.
Moving target tracking through distributed clustering in directional sensor networks.
Enayet, Asma; Razzaque, Md Abdur; Hassan, Mohammad Mehedi; Almogren, Ahmad; Alamri, Atif
2014-01-01
The problem of moving target tracking in directional sensor networks (DSNs) introduces new research challenges, including optimal selection of sensing and communication sectors of the directional sensor nodes, determination of the precise location of the target and an energy-efficient data collection mechanism. Existing solutions allow individual sensor nodes to detect the target's location through collaboration among neighboring nodes, where most of the sensors are activated and communicate with the sink. Therefore, they incur much overhead, loss of energy and reduced target tracking accuracy. In this paper, we have proposed a clustering algorithm, where distributed cluster heads coordinate their member nodes in optimizing the active sensing and communication directions of the nodes, precisely determining the target location by aggregating reported sensing data from multiple nodes and transferring the resultant location information to the sink. Thus, the proposed target tracking mechanism minimizes the sensing redundancy and maximizes the number of sleeping nodes in the network. We have also investigated the dynamic approach of activating sleeping nodes on-demand so that the moving target tracking accuracy can be enhanced while maximizing the network lifetime. We have carried out our extensive simulations in ns-3, and the results show that the proposed mechanism achieves higher performance compared to the state-of-the-art works. PMID:25529205
Comparing simulated and experimental molecular cluster distributions.
Olenius, Tinja; Schobesberger, Siegfried; Kupiainen-Määttä, Oona; Franchin, Alessandro; Junninen, Heikki; Ortega, Ismael K; Kurtén, Theo; Loukonen, Ville; Worsnop, Douglas R; Kulmala, Markku; Vehkamäki, Hanna
2013-01-01
Formation of secondary atmospheric aerosol particles starts with gas phase molecules forming small molecular clusters. High-resolution mass spectrometry enables the detection and chemical characterization of electrically charged clusters from the molecular scale upward, whereas the experimental detection of electrically neutral clusters, especially as a chemical composition measurement, down to 1 nm in diameter and beyond still remains challenging. In this work we simulated a set of both electrically neutral and charged small molecular clusters, consisting of sulfuric acid and ammonia molecules, with a dynamic collision and evaporation model. Collision frequencies between the clusters were calculated according to classical kinetics, and evaporation rates were derived from first principles quantum chemical calculations with no fitting parameters. We found a good agreement between the modeled steady-state concentrations of negative cluster ions and experimental results measured with the state-of-the-art Atmospheric Pressure interface Time-Of-Flight mass spectrometer (APi-TOF) in the CLOUD chamber experiments at CERN. The model can be used to interpret experimental results and give information on neutral clusters that cannot be directly measured. PMID:24600997
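As a rough illustration of "collision frequencies from classical kinetics", the hard-sphere kinetic gas theory coefficient is β_ij = π (r_i + r_j)² · sqrt(8 k_B T / (π μ_ij)), with μ_ij the reduced mass. The collision rate per unit volume between distinct cluster types i and j is then β_ij C_i C_j. The abstract does not give the exact expression or cluster radii used in the model, so the hard-sphere form and the example radius and mass below are assumptions.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K (exact SI value)

def collision_coefficient(r_i, r_j, m_i, m_j, temperature):
    """Hard-sphere kinetic-theory collision coefficient beta_ij in m^3/s.

    r_* are radii in m, m_* masses in kg, temperature in K."""
    mu = m_i * m_j / (m_i + m_j)  # reduced mass, kg
    mean_relative_speed = math.sqrt(8.0 * K_B * temperature / (math.pi * mu))
    return math.pi * (r_i + r_j) ** 2 * mean_relative_speed
```

For two molecules with an assumed radius of 0.3 nm and the mass of sulfuric acid (about 98 u) at 280 K, this gives a coefficient of order 4 × 10⁻¹⁶ m³/s; the coefficient grows with the square of the summed radii and falls with the square root of the reduced mass.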
MixSim : An R Package for Simulating Data to Study Performance of Clustering Algorithms
Melnykov, Volodymyr; Chen, Wei-Chen; Maitra, Ranjan
2012-01-01
The R package MixSim is a new tool that allows simulating mixtures of Gaussian distributions with different levels of overlap between mixture components. Pairwise overlap, defined as the sum of two misclassification probabilities, measures the degree of interaction between components and can be readily employed to control the clustering complexity of datasets simulated from mixtures. These datasets can then be used for systematic performance investigation of clustering and finite mixture modeling algorithms. Other capabilities of MixSim include computing the exact overlap for Gaussian mixtures, simulating Gaussian and non-Gaussian data, simulating outliers and noise variables, calculating various measures of agreement between two partitionings, and constructing parallel distribution plots for the graphical display of finite mixture models. All features of the package are illustrated in great detail. The utility of the package is highlighted through a small comparison study of several popular clustering algorithms.
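To make the definition concrete: the pairwise overlap is ω_ij = ω_{j|i} + ω_{i|j}, where ω_{j|i} is the probability that a point drawn from component i has a larger weighted density under component j. The Python sketch below is not MixSim's R API. It covers only the one-dimensional case, with a closed form for equal mixing proportions and a common σ (the decision boundary is then the midpoint, so each term equals Φ(-|μ_i - μ_j| / (2σ))) and a Monte Carlo estimate for general weights and variances.

```python
import random
from statistics import NormalDist

def overlap_equal_spherical(mu_i, mu_j, sigma):
    """Exact omega_ij for two 1-D Gaussians with equal weights and common sigma."""
    return 2.0 * NormalDist().cdf(-abs(mu_i - mu_j) / (2.0 * sigma))

def overlap_monte_carlo(comp_i, comp_j, n=100_000, seed=0):
    """Monte Carlo estimate of omega_{j|i} + omega_{i|j};
    each component is a tuple (weight, mean, sigma)."""
    rng = random.Random(seed)

    def misclassified(src, other):
        d_src = NormalDist(src[1], src[2])
        d_other = NormalDist(other[1], other[2])
        hits = 0
        for _ in range(n):
            x = rng.gauss(src[1], src[2])
            # Misclassified if the other component has larger weighted density.
            hits += other[0] * d_other.pdf(x) > src[0] * d_src.pdf(x)
        return hits / n

    return misclassified(comp_i, comp_j) + misclassified(comp_j, comp_i)
```

For means 0 and 2 with σ = 1 the exact overlap is 2Φ(-1) ≈ 0.317, and the Monte Carlo estimate agrees to within sampling error. Lowering the target overlap yields easier simulated clustering problems.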
Efficient algorithms for distributed simulation and related problems
Kumar, D.
1987-01-01
This thesis presents efficient algorithms for distributed simulation, and for the related problems of termination detection and sequential simulation. Distributed simulation algorithms are presented for special classes of systems, such that almost no overhead messages are required. By contrast, previous distributed simulation algorithms, although applicable to the general class of any discrete-event system, usually require too many overhead messages. First, a simple distributed simulation algorithm with nearly zero overhead messages is defined for simulating feedforward systems. An approximate method is developed to predict its performance in simulating a class of feedforward queueing networks. Performance of the scheme is evaluated in simulating specific subclasses of these queueing networks. It is shown that the scheme offers high performance for serial-parallel networks. Next, another distributed simulation scheme is defined for a class of distributed systems whose topologies may have cycles. One important problem in devising distributed simulation algorithms is that of efficient detection of termination. With this in mind, a class of termination-detection algorithms using markers is devised. Finally, a new sequential simulation algorithm is developed, based on a distributed one. This algorithm often reduces the event-list manipulations of traditional event-list-driven simulation.
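For context, "traditional event-list-driven simulation" keeps all pending events in a time-ordered list and pops them one at a time. The sketch below simulates a single FIFO server this way and counts event-list operations (pushes and pops), which is the overhead the thesis's sequential algorithm aims to reduce. It is a baseline illustration, not the thesis's algorithm.

```python
import heapq
from collections import deque

def simulate_fifo_queue(arrivals, services):
    """Event-list-driven simulation of one FIFO server.

    Returns (departure_times, event_list_operations), where the second value
    counts every push onto and pop from the event list."""
    events = []  # min-heap of (time, sequence_number, kind, customer)
    seq = ops = 0
    for cust, t in enumerate(arrivals):
        heapq.heappush(events, (t, seq, "arrival", cust))
        seq += 1
        ops += 1
    waiting = deque()
    busy = False
    departures = [None] * len(arrivals)
    while events:
        time, _, kind, cust = heapq.heappop(events)
        ops += 1
        if kind == "arrival":
            waiting.append(cust)
        else:
            departures[cust] = time
            busy = False
        # Start the next service whenever the server is idle and someone waits.
        if not busy and waiting:
            nxt = waiting.popleft()
            heapq.heappush(events, (time + services[nxt], seq, "departure", nxt))
            seq += 1
            ops += 1
            busy = True
    return departures, ops
```

With arrivals at times 0, 1, 2 and service times of 2 each, customers depart at 2, 4 and 6 after 12 event-list operations. The sequence number breaks ties between simultaneous events in insertion order.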
NASA Astrophysics Data System (ADS)
Bahrampour, Soheil; Moshiri, Behzad; Salahshoor, Karim
2009-08-01
Most process fault monitoring systems rely on offline computation and cannot cope with novel faults, which limits their applicability. This paper presents a new online fault detection and isolation (FDI) algorithm based on a distributed online clustering approach. In the proposed approach, a clustering algorithm is used for online detection of a new trend in time series data, which indicates a faulty condition. In addition, a distributed technique is used to decompose the overall monitoring task into a series of local monitoring sub-tasks so as to locally track and capture process faults. This algorithm not only solves the problem of online FDI but can also handle novel faults. The diagnostic performance of the proposed FDI approach is evaluated on the Tennessee Eastman process plant as a large-scale benchmark problem.
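The core idea of flagging a "new trend" with online clustering can be sketched with a simple distance-threshold scheme. Each sample joins the nearest existing cluster if it lies within a radius, and that cluster's running mean is updated; otherwise it opens a new cluster, which is reported as a possible new operating condition. The radius and the class name are hypothetical, and the paper's actual clustering rule is not given in the abstract. In a distributed setting, one such monitor could run per process sub-unit.

```python
import math

class OnlineClusterMonitor:
    """Threshold-based online clustering; a sample that fits no existing
    cluster opens a new one and is flagged as a possible new trend."""

    def __init__(self, radius):
        self.radius = radius
        self.centers = []  # running means of each cluster
        self.counts = []

    def update(self, x):
        """Process one sample; return True if it started a new cluster."""
        if self.centers:
            i = min(range(len(self.centers)), key=lambda c: math.dist(x, self.centers[c]))
            if math.dist(x, self.centers[i]) <= self.radius:
                self.counts[i] += 1
                n = self.counts[i]
                # Incremental mean update: center <- center + (x - center) / n
                self.centers[i] = tuple(c + (xi - c) / n for c, xi in zip(self.centers[i], x))
                return False
        self.centers.append(tuple(x))
        self.counts.append(1)
        return True
```

Because it needs only one pass and constant memory per cluster, this kind of scheme runs online, and a never-before-seen fault simply appears as a new cluster rather than requiring a pre-trained class.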
User-Based Document Clustering by Redescribing Subject Descriptions with a Genetic Algorithm.
ERIC Educational Resources Information Center
Gordon, Michael D.
1991-01-01
Discussion of clustering of documents and queries in information retrieval systems focuses on the use of a genetic algorithm to adapt subject descriptions so that documents become more effective in matching relevant queries. Various types of clustering are explained, and simulation experiments used to test the genetic algorithm are described. (27…
Contributions to "k"-Means Clustering and Regression via Classification Algorithms
ERIC Educational Resources Information Center
Salman, Raied
2012-01-01
The dissertation deals with clustering algorithms and transforming regression problems into classification problems. The main contributions of the dissertation are twofold; first, to improve (speed up) the clustering algorithms and second, to develop a strict learning environment for solving regression problems as classification tasks by using…
Burst detection in district metering areas using a data driven clustering algorithm.
Wu, Yipeng; Liu, Shuming; Wu, Xue; Liu, Youfei; Guan, Yisheng
2016-09-01
This paper describes a novel methodology for burst detection in a water distribution system. The proposed method has two stages: in the first stage, a clustering algorithm is employed for outlier detection, and the second stage identifies the presence of bursts. An important feature of this method is that data analysis is based on multiple flow meters whose measurements vary simultaneously in a district metering area (DMA). Moreover, the clustering-based method can automatically cope with non-stationary conditions in historical data; that is, the method requires no prior data selection process. An example application of this method has been implemented to confirm that relatively large bursts (simulated by flushing) with short duration can be detected effectively. Notably, the method has a low false positive rate compared with previous studies, and the occurrence of detected abnormal water usage is consistent with weather changes, showing great promise for real application to multi-inlet and multi-outlet DMAs. PMID:27176651
A contour-line color layer separation algorithm based on fuzzy clustering and region growing
NASA Astrophysics Data System (ADS)
Liu, Tiange; Miao, Qiguang; Xu, Pengfei; Tong, Yubing; Song, Jianfeng; Xia, Ge; Yang, Yun; Zhai, Xiaojie
2016-03-01
The color layers of contour lines separated from scanned topographic maps are the basis of contour-line extraction, but they are difficult to separate well because of color aliasing and mixed colors. This paper focuses on contour-line color layer separation and presents a novel approach based on fuzzy clustering and Single-prototype Region Growing for Contour-line Layer (SRGCL). The purpose of this paper is to provide a solution for processing scanned topographic maps on which contour lines are abundant and densely distributed; for example, in hilly and mountainous regions, contour lines occupy the largest proportion of linear features and contour-line separation is the most difficult task. The proposed approach proceeds as follows. First, line features are extracted from the map to reduce interference from area features in fuzzy clustering. Second, a fuzzy clustering algorithm is employed to obtain the membership matrix of pixels in the line map. Third, based on the membership matrix, we obtain the most-similar and second-most-similar prototypes of each pixel as the pixel's indicators in SRGCL. SRGCL uses the spatial relationship and the fuzzy similarity of color features to overcome inaccurate classification of ambiguous pixels. Focusing on a single contour-line layer improves the accuracy of the SRGCL contour-line segmentation result relative to general segmentation methods. We verified the algorithm on several USGS historical maps; the experimental results show that our algorithm produces contour-line color layers with good continuity and little noise, which verifies its improvement in contour-line color layer separation relative to two general segmentation methods.
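The third step, deriving each pixel's most-similar and second-most-similar prototypes from the membership matrix, amounts to taking the two largest memberships per pixel. The sketch assumes the membership matrix is laid out prototypes × pixels, as in the usual fuzzy c-means convention; how SRGCL then uses these indicators during region growing is not shown.

```python
def top_two_prototypes(membership):
    """For each pixel, return (most-similar, second-most-similar) prototype indices.

    membership[i][k] is the fuzzy membership of pixel k in prototype i."""
    n_protos = len(membership)
    if n_protos < 2:
        raise ValueError("need at least two prototypes")
    result = []
    for k in range(len(membership[0])):
        order = sorted(range(n_protos), key=lambda i: membership[i][k], reverse=True)
        result.append((order[0], order[1]))
    return result
```

A pixel whose top two memberships are close, such as the third pixel in `[[0.7, 0.1, 0.4], [0.2, 0.6, 0.35], [0.1, 0.3, 0.25]]` (0.40 vs 0.35), is the kind of ambiguous pixel for which the abstract says spatial context is brought in.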
Security clustering algorithm based on reputation in hierarchical peer-to-peer network
NASA Astrophysics Data System (ADS)
Chen, Mei; Luo, Xin; Wu, Guowen; Tan, Yang; Kita, Kenji
2013-03-01
To address the security problems of the hierarchical P2P network (HPN), this paper presents a security clustering algorithm based on reputation (CABR). In the algorithm, a reputation mechanism is used to ensure the security of transactions, and clusters are used to manage the reputation mechanism. To improve security, reduce the network cost of managing reputation and enhance cluster stability, we select reputation, historical average online time and network bandwidth as the basic factors of a node's comprehensive performance. Simulation results showed that the proposed algorithm improved security, reduced network overhead and enhanced cluster stability.
A distributed decision framework for building clusters with different heterogeneity settings
Jafari-Marandi, Ruholla; Omitaomu, Olufemi A.; Hu, Mengqi
2016-01-05
In the past few decades, extensive research has been conducted to develop operation and control strategies for smart buildings with the purpose of reducing energy consumption. Beyond single buildings, it is envisioned that next-generation buildings can freely connect with one another to share energy and exchange information in the context of the smart grid. It has been demonstrated that a network of connected buildings (a building cluster) can significantly reduce primary energy consumption and improve environmental sustainability and buildings' resilience. However, an analytic tool is missing to determine which types of buildings should form a cluster and what impact the heterogeneity of building clusters, based on energy profile, has on their energy performance. To bridge these research gaps, we propose a self-organizing map clustering algorithm to divide multiple buildings into different clusters based on their energy profiles, and a homogeneity index to evaluate the heterogeneity of different building cluster configurations. In addition, a bi-level distributed decision model is developed to study energy sharing in the building clusters. To demonstrate the effectiveness of the proposed clustering algorithm and decision model, we employ a dataset of monthly energy consumption for 30 buildings, with data collected every 15 min. It is demonstrated that the proposed decision model can achieve at least 13% cost savings for building clusters. Furthermore, the results show that the heterogeneity of energy profiles is an important factor in selecting batteries and renewable energy sources for building clusters, and that shared batteries and renewable energy are preferred for more heterogeneous building clusters.
An efficient algorithm for estimating noise covariances in distributed systems
NASA Technical Reports Server (NTRS)
Dee, D. P.; Cohn, S. E.; Ghil, M.; Dalcher, A.
1985-01-01
An efficient computational algorithm for estimating the noise covariance matrices of large linear discrete stochastic-dynamic systems is presented. Such systems typically arise from discretizing distributed-parameter systems, and their size makes computational efficiency a major consideration. The proposed adaptive filtering algorithm is based on the ideas of Belanger and is algebraically equivalent to his algorithm. However, the earlier algorithm has computational complexity proportional to p^6, where p is the number of observations of the system state, while the new algorithm has complexity proportional to only p^3. Further, the formulation of noise covariance estimation as a secondary filter, analogous to state estimation as a primary filter, suggests several generalizations of the earlier algorithm. The performance of the proposed algorithm is demonstrated for a distributed system arising in numerical weather prediction.
A Formal Algorithm for Verifying the Validity of Clustering Results Based on Model Checking
Huang, Shaobin; Cheng, Yuan; Lang, Dapeng; Chi, Ronghua; Liu, Guofeng
2014-01-01
The limitations in general methods to evaluate clustering will remain difficult to overcome if verifying the clustering validity continues to be based on clustering results and evaluation index values. This study focuses on a clustering process to analyze crisp clustering validity. First, we define the properties that must be satisfied by valid clustering processes and model clustering processes based on program graphs and transition systems. We then recast the analysis of clustering validity as the problem of verifying whether the model of clustering processes satisfies the specified properties with model checking. That is, we try to build a bridge between clustering and model checking. Experiments on several datasets indicate the effectiveness and suitability of our algorithms. Compared with traditional evaluation indices, our formal method can not only indicate whether the clustering results are valid but, in the case the results are invalid, can also detect the objects that have led to the invalidity. PMID:24608823
Measles and Rubella: Scale Free Distribution of Local Infection Clusters.
Yoshikura, Hiroshi; Takeuchi, Fumihiko
2016-07-22
This study examined the size distribution of local infection clusters (hereafter, clusters) of measles and rubella in Japan from 2008 to 2013. When the logarithm of cluster size was plotted on the x-axis and the logarithm of its frequency on the y-axis, the points fell on a straight line descending to the right; that is, the size distribution followed a power law. Because the size distribution of the clusters could be equated with that of local secondary infections initiated by one patient, it in fact represented the effective reproduction numbers at the local level. As a power-law distribution has no typical size, this suggested that measles and rubella epidemics in Japan had no typical reproduction number. The larger the population and the total number of patients, the flatter the slope of the plot, and thus the larger the proportion of large clusters. An epidemic of measles or rubella in Japan could be represented more appropriately by the cluster size frequency distribution than by the reproduction number. PMID:26567836
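The log-log plot described above amounts to fitting a straight line to (log size, log frequency) pairs: a power law f(s) = C·s^(-α) becomes log f = log C - α·log s, a line with slope -α. A minimal least-squares sketch follows; the counts in the usage line are made-up illustrative values, not the study's data, and the function name is hypothetical.

```python
import math
from typing import Dict, Tuple


def fit_power_law(freq_by_size: Dict[int, float]) -> Tuple[float, float]:
    """Least-squares fit of log10(frequency) = intercept + slope * log10(size).

    Returns (slope, intercept). For a power law f(s) = C * s**(-alpha),
    the slope estimates -alpha and the intercept estimates log10(C).
    """
    xs = [math.log10(s) for s in freq_by_size]
    ys = [math.log10(f) for f in freq_by_size.values()]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


if __name__ == "__main__":
    # Illustrative counts that follow f(s) = 10000 * s**-2 exactly.
    print(fit_power_law({1: 10000, 10: 100, 100: 1}))  # (-2.0, 4.0)
```

A flatter fitted slope (smaller α) means large clusters make up a larger share, which is the trend the abstract reports for larger populations.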
Using Clustering Algorithms to Identify Brown Dwarf Characteristics
NASA Astrophysics Data System (ADS)
Choban, Caleb
2016-06-01
Brown dwarfs are stars that are not massive enough to sustain core hydrogen fusion, and thus fade and cool over time. The molecular composition of brown dwarf atmospheres can be determined by observing absorption features in their infrared spectrum, which can be quantified using spectral indices. Comparing these indices to one another, we can determine what kind of brown dwarf it is, and if it is young or metal-poor. We explored a new method for identifying these subgroups through the expectation-maximization machine learning clustering algorithm, which provides a quantitative and statistical way of identifying index pairs which separate rare populations. We specifically quantified two statistics, completeness and concentration, to identify the best index pairs. Starting with a training set, we defined selection regions for young, metal-poor and binary brown dwarfs, and tested these on a large sample of L dwarfs. We present the results of this analysis, and demonstrate that new objects in these classes can be found through these methods.
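The expectation-maximization clustering step can be illustrated with a one-dimensional Gaussian mixture. The study works with pairs of spectral indices, so this is a simplification. The E-step computes each point's responsibility under each component, and the M-step re-estimates means, variances, and weights from those responsibilities. This is a minimal sketch under assumed inputs. It does not compute the completeness and concentration statistics, and the function name is hypothetical.

```python
import math
from typing import List, Tuple


def em_gaussian_mixture_1d(
    xs: List[float], init_means: List[float], iters: int = 200
) -> Tuple[List[float], List[float], List[float]]:
    """Fit a 1-D Gaussian mixture by expectation-maximization.

    Returns (means, variances, mixing weights).
    """
    n, k = len(xs), len(init_means)
    mu = list(init_means)
    overall_mean = sum(xs) / n
    var = [sum((x - overall_mean) ** 2 for x in xs) / n] * k
    w = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibilities, computed in log space for numerical stability.
        resp = []
        for x in xs:
            logp = [math.log(w[j]) - 0.5 * math.log(2 * math.pi * var[j])
                    - 0.5 * (x - mu[j]) ** 2 / var[j] for j in range(k)]
            top = max(logp)
            p = [math.exp(lp - top) for lp in logp]
            s = sum(p)
            resp.append([pj / s for pj in p])
        # M-step: re-estimate weights, means and variances from responsibilities.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an (almost) empty component unchanged
            mu[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var[j] = max(sum(r[j] * (x - mu[j]) ** 2 for r, x in zip(resp, xs)) / nj, 1e-6)
            w[j] = nj / n
    return mu, var, w
```

Unlike K-means, each point keeps a soft responsibility for every component, which is what allows a rare population to be described by its own mean and spread.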
Parallelization of the Wolff single-cluster algorithm.
Kaupuzs, J; Rimsāns, J; Melnik, R V N
2010-02-01
A parallel [open multiprocessing (OpenMP)] implementation of the Wolff single-cluster algorithm has been developed and tested for the three-dimensional (3D) Ising model. The developed procedure is generalizable to other lattice spin models, and its effectiveness depends on the specific application at hand. The applicability of the methodology is discussed in the context of applications where a sophisticated shuffling scheme is used to generate high-quality pseudorandom numbers and an iterative method is applied to find the critical temperature of the 3D Ising model with great accuracy. For a lattice of linear size L = 1024, we reached a speedup of about 1.79 on two processors and about 2.67 on four processors, compared with the serial code. By our estimate, a speedup of about three on four processors is achievable for the O(n) models with n ≥ 2. Furthermore, the OpenMP code allows larger lattices to be simulated owing to the greater operative (shared) memory available. PMID:20365669
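For reference, the serial Wolff single-cluster update that the paper parallelizes can be sketched as below for a small periodic 3D Ising lattice with coupling J = 1. A random seed spin is chosen, aligned neighbors are added with probability p_add = 1 - exp(-2β), and the whole cluster is flipped. This is a plain sequential sketch, not the OpenMP implementation. It uses Python's `random` module rather than the shuffling generator mentioned above, and the function name `wolff_step` is hypothetical.

```python
import math
import random
from typing import List


def wolff_step(spins: List[int], L: int, beta: float, rng: random.Random) -> int:
    """One Wolff single-cluster update on an L x L x L periodic Ising lattice (J = 1).

    spins is a flat list of +1/-1 values indexed by x + L*y + L*L*z.
    Returns the size of the flipped cluster.
    """
    p_add = 1.0 - math.exp(-2.0 * beta)
    seed = rng.randrange(L ** 3)
    s0 = spins[seed]
    stack = [seed]
    in_cluster = {seed}
    while stack:
        i = stack.pop()
        x, y, z = i % L, (i // L) % L, i // (L * L)
        neighbors = (
            (x + 1) % L + L * y + L * L * z, (x - 1) % L + L * y + L * L * z,
            x + L * ((y + 1) % L) + L * L * z, x + L * ((y - 1) % L) + L * L * z,
            x + L * y + L * L * ((z + 1) % L), x + L * y + L * L * ((z - 1) % L),
        )
        for j in neighbors:
            # Activate the bond to an aligned, not-yet-added neighbor with probability p_add.
            if j not in in_cluster and spins[j] == s0 and rng.random() < p_add:
                in_cluster.add(j)
                stack.append(j)
    for i in in_cluster:
        spins[i] = -s0  # flip the whole cluster at once
    return len(in_cluster)
```

The cluster-growth loop is the part that is hard to parallelize, because the cluster's shape is only known once the stack empties; the two limiting cases in the checks (β = 0 gives a single-spin cluster, very large β flips the whole ordered lattice) are a quick sanity test.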
A heart disease recognition embedded system with fuzzy cluster algorithm.
de Carvalho, Helton Hugo; Moreno, Robson Luiz; Pimenta, Tales Cleber; Crepaldi, Paulo C; Cintra, Evaldo
2013-06-01
This article presents the viability analysis and development of a heart disease identification embedded system. It reduces electrocardiogram (ECG) signal processing time by reducing the number of data samples, without significant loss. The goal of the system is the analysis of heart signals. ECG signals are fed into the system, which performs an initial filtering and then uses a Gustafson-Kessel fuzzy clustering algorithm for signal classification and correlation. The classification identified common heart diseases such as angina, myocardial infarction, and coronary artery disease. The system uses the European electrocardiogram ST-T Database (EDB) as a reference for tests and evaluation. The results show that the system can perform heart disease detection on a data set reduced from 213 to just 20 samples, that is, to 9.4% of the original set, while maintaining the same effectiveness. The system is validated on a Xilinx Spartan®-3A field programmable gate array (FPGA), which implements a Xilinx MicroBlaze® soft-core processor running at a 50 MHz clock rate. PMID:23394802
Logistics distribution centers location problem and algorithm under fuzzy environment
NASA Astrophysics Data System (ADS)
Yang, Lixing; Ji, Xiaoyu; Gao, Ziyou; Li, Keping
2007-11-01
The distribution centers location problem is concerned with how to select distribution centers from a potential set so that the total relevant cost is minimized. This paper investigates the problem in a fuzzy environment. Consequently, a chance-constrained programming model for the problem is designed and some properties of the model are investigated. A tabu search algorithm, a genetic algorithm, and a fuzzy simulation algorithm are integrated to seek an approximate optimal solution of the model. A numerical example is also given to show the application of the algorithm.
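The tabu search component can be illustrated on a deterministic, uncapacitated version of the location problem. Each move opens or closes one center, recently flipped centers stay tabu for a few iterations, and an aspiration rule lets a tabu move through if it beats the best cost so far. This is a hedged sketch only. The fuzzy chance constraints, fuzzy simulation, and genetic algorithm from the paper are not modeled. The cost structure, tenure, and function names are assumptions.

```python
from typing import List, Sequence, Tuple


def location_cost(is_open: Sequence[bool], fixed: Sequence[float],
                  serve: Sequence[Sequence[float]]) -> float:
    """Fixed cost of every open center plus each customer's cheapest open center."""
    if not any(is_open):
        return float("inf")  # infeasible: no center open
    total = sum(f for f, o in zip(fixed, is_open) if o)
    for row in serve:  # row[j]: cost of serving this customer from center j
        total += min(c for c, o in zip(row, is_open) if o)
    return total


def tabu_search(fixed: Sequence[float], serve: Sequence[Sequence[float]],
                iters: int = 100, tenure: int = 3) -> Tuple[List[bool], float]:
    """Tabu search over open/closed decisions using single-center flip moves."""
    n = len(fixed)
    current = [True] * n
    best, best_cost = list(current), location_cost(current, fixed, serve)
    tabu_until = [0] * n  # flipping center j stays tabu while iteration <= tabu_until[j]
    for it in range(1, iters + 1):
        candidates = []
        for j in range(n):
            neighbor = list(current)
            neighbor[j] = not neighbor[j]
            if not any(neighbor):
                continue  # skip moves that would close every center
            cost = location_cost(neighbor, fixed, serve)
            # Aspiration criterion: a tabu move is allowed if it beats the best so far.
            if it > tabu_until[j] or cost < best_cost:
                candidates.append((cost, j, neighbor))
        if not candidates:
            continue
        cost, j, current = min(candidates, key=lambda t: (t[0], t[1]))
        tabu_until[j] = it + tenure
        if cost < best_cost:
            best, best_cost = list(current), cost
    return best, best_cost
```

The search always moves to the best admissible neighbor, even when it is worse than the current solution, and the tabu list is what keeps it from immediately undoing that move.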
Modeling and convergence analysis of distributed coevolutionary algorithms.
Subbu, Raj; Sanderson, Arthur C
2004-04-01
A theoretical foundation is presented for modeling and convergence analysis of a class of distributed coevolutionary algorithms applied to optimization problems in which the variables are partitioned among p nodes. An evolutionary algorithm at each of the p nodes performs a local evolutionary search based on its own set of primary variables, and the secondary variable set at each node is clamped during this phase. An infrequent intercommunication between the nodes updates the secondary variables at each node. The local search and intercommunication phases alternate, resulting in a cooperative search by the p nodes. First, we specify a theoretical basis for a class of centralized evolutionary algorithms in terms of the construction and evolution of sampling distributions over the feasible space. Next, this foundation is extended to develop a model for a class of distributed coevolutionary algorithms. Convergence and convergence rate analyses are pursued for basic classes of objective functions. Our theoretical investigation reveals that, for certain unimodal and multimodal objectives, these algorithms can be expected to converge at a geometric rate. The distributed coevolutionary algorithms are of most interest for their performance advantage over centralized algorithms when they execute in a network environment with significant local access and internode communication delays. The relative performance of these algorithms is therefore evaluated in a distributed environment with realistic parameters of network behavior. PMID:15376831
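The alternation between local search, with secondary variables clamped, and infrequent intercommunication can be simulated serially as below. This is a hedged illustration only. Each "node" runs a simple (1+1) mutation-based search on its own block of primary variables, all nodes run in one process, and the block layout, step size, and function names are hypothetical rather than taken from the paper.

```python
import random
from typing import Callable, List


def distributed_coevolution(
    f: Callable[[List[float]], float],
    blocks: List[List[int]],
    dim: int,
    rounds: int = 50,
    local_steps: int = 20,
    sigma: float = 0.5,
    seed: int = 0,
) -> List[float]:
    """Minimize f by alternating per-node local searches and intercommunication.

    blocks[k] lists the indices of node k's primary variables; every other
    variable is a secondary variable that node k keeps clamped during its search.
    """
    rng = random.Random(seed)
    shared = [rng.uniform(-5, 5) for _ in range(dim)]  # last communicated solution
    for _ in range(rounds):
        local_results = []
        for block in blocks:
            x = list(shared)  # secondary variables clamped to the shared values
            fx = f(x)
            for _ in range(local_steps):
                y = list(x)
                for i in block:  # mutate only this node's primary variables
                    y[i] += rng.gauss(0.0, sigma)
                fy = f(y)
                if fy <= fx:  # (1+1)-style acceptance
                    x, fx = y, fy
            local_results.append(x)
        # Intercommunication phase: each node publishes its primary variables.
        for block, x in zip(blocks, local_results):
            for i in block:
                shared[i] = x[i]
    return shared


if __name__ == "__main__":
    sphere = lambda v: sum(t * t for t in v)
    result = distributed_coevolution(sphere, [[0, 1], [2, 3]], dim=4)
    print(sphere(result))  # small, close to 0
```

Raising `local_steps` relative to `rounds` mimics the "infrequent intercommunication" regime, where nodes do much more local work between exchanges.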
Clustering performance comparison using K-means and expectation maximization algorithms
Jung, Yong Gyu; Kang, Min Soo; Heo, Jun
2014-01-01
Clustering is an important means of data mining based on separating data categories by similar features. Unlike classification algorithms, clustering algorithms are unsupervised. Two representative clustering algorithms are K-means and the expectation maximization (EM) algorithm. Logistic regression extends linear regression analysis to a categorical dependent variable, modeling it through a linear combination of independent variables. A statistical approach is used to predict the probability that an event occurs. However, classifying all data by logistic regression analysis cannot guarantee accurate results. In this paper, logistic regression analysis is applied to EM clusters and to K-means clustering for quality assessment of red wine, and a method is proposed for ensuring the accuracy of the classification results. PMID:26019610
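For concreteness, the K-means side of the comparison can be sketched as Lloyd's iteration: assign each point to its nearest centroid, then recompute each centroid as the mean of its points. This minimal version takes the initial centroids as input, which is exactly where K-means' sensitivity to initialization comes from. The EM and logistic-regression stages are not shown, and the function name is hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], centroids: List[Point],
           iters: int = 100) -> Tuple[List[Point], List[int]]:
    """Lloyd's K-means. Returns (final centroids, cluster label per point)."""
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(len(centroids)),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(p, centroids[k])))
            for p in points
        ]
        # Update step: mean of the points assigned to each centroid.
        new_centroids = []
        for k, c in enumerate(centroids):
            members = [p for p, lab in zip(points, new_labels) if lab == k]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(c)  # keep an empty cluster's centroid unchanged
        if new_labels == labels and new_centroids == centroids:
            break  # converged: neither assignments nor centroids changed
        labels, centroids = new_labels, new_centroids
    return centroids, labels


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    print(kmeans(pts, [(0.0, 0.0), (10.0, 10.0)]))
    # ([(0.0, 0.5), (10.0, 10.5)], [0, 0, 1, 1])
```

EM differs mainly in the assignment step: instead of a hard nearest-centroid label, each point receives a probability of belonging to each cluster.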
The distribution of early- and late-type galaxies in the Coma cluster
NASA Technical Reports Server (NTRS)
Doi, M.; Fukugita, M.; Okamura, S.; Turner, E. L.
1995-01-01
The spatial distribution and the morphology-density relation of Coma cluster galaxies are studied using a new homogeneous photometric sample of 450 galaxies down to B = 16.0 mag with quantitative morphological classification. The sample covers a wide area (10° × 10°), extending well beyond the Coma cluster. Morphological classifications into early-type (E+S0) and late-type (S) galaxies are made by an automated algorithm using simple photometric parameters, for which the misclassification rate is expected to be approximately 10% with respect to the early and late types given in the Third Reference Catalogue of Bright Galaxies. The flattened distribution of Coma cluster galaxies, noted in previous studies, is most conspicuous when early-type galaxies are selected. Early-type galaxies are distributed in a thick filament extending from the NE to the WSW that delineates part of the large-scale structure. Spiral galaxies show a distribution with a modest density gradient toward the cluster center; at least bright spiral galaxies are present close to the center of the Coma cluster. We also examine the morphology-density relation for the Coma cluster including its surrounding regions.
Sun, Liping; Luo, Yonglong; Ding, Xintao; Zhang, Ji
2014-01-01
An important component of a spatial clustering algorithm is the distance measure between sample points in object space. In this paper, the traditional Euclidean distance measure is replaced with an obstacle distance measure for spatial clustering under obstacle constraints. First, we present a path-searching algorithm that approximates the obstacle distance between two points in the presence of obstacles and facilitators. Taking obstacle distance as the similarity metric, we then propose the artificial immune clustering with obstacle entity (AICOE) algorithm for clustering spatial point data in the presence of obstacles and facilitators. Finally, the paper presents a comparative analysis of the AICOE algorithm and classical clustering algorithms. Our clustering model based on the artificial immune system is also applied to a public facility location problem to establish the practical applicability of the approach. By using the clonal selection principle and updating the cluster centers based on the elite antibodies, the AICOE algorithm is able to achieve the global optimum and a better clustering effect. PMID:25435862
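In the simplest setting, the obstacle distance that replaces the Euclidean metric can be approximated by a shortest-path search on an occupancy grid. The sketch below uses breadth-first search over 4-connected free cells. This is an assumed simplification: the paper's own path-searching algorithm and its handling of facilitators (such as bridges across obstacles) are not reproduced, and the grid encoding and function name are hypothetical.

```python
from collections import deque
from typing import List, Optional, Tuple


def obstacle_distance(grid: List[str], start: Tuple[int, int],
                      goal: Tuple[int, int]) -> Optional[int]:
    """Shortest 4-connected path length from start to goal avoiding '#' cells.

    Returns None when the goal is unreachable (fully blocked by obstacles).
    """
    rows, cols = len(grid), len(grid[0])
    dist = {start: 0}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        if (r, c) == goal:
            return dist[(r, c)]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] != "#" and (nr, nc) not in dist):
                dist[(nr, nc)] = dist[(r, c)] + 1
                queue.append((nr, nc))
    return None


if __name__ == "__main__":
    wall = [
        ".#.",
        ".#.",
        "...",
    ]
    print(obstacle_distance(wall, (0, 0), (0, 2)))  # 6, although the cells are 2 apart
```

A clustering routine would then use this value wherever it previously used the Euclidean distance between two points.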
Parallel matrix transpose algorithms on distributed memory concurrent computers
Choi, Jaeyoung; Dongarra, J.; Walker, D.W.
1994-12-31
This paper describes parallel matrix transpose algorithms on distributed memory concurrent processors. We assume that the matrix is distributed over a P × Q processor template with a block scattered data distribution. P, Q, and the block size can be arbitrary, so the algorithms have wide applicability. The algorithms make use of non-blocking, point-to-point communication between processors. The use of non-blocking communication allows a processor to overlap the messages that it sends to different processors, thereby avoiding unnecessary synchronization. Combined with the matrix multiplication routine, C = A · B, the algorithms are used to compute parallel multiplications of transposed matrices, C = Aᵀ · Bᵀ, in the PUMMA package. Details of the parallel implementation of the algorithms are given, and results are presented for runs on the Intel Touchstone Delta computer.
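Under the usual block-cyclic convention, which is assumed here and places block (0, 0) on process (0, 0), a block scattered layout with square blocks on a P × Q template puts block (I, J) on process (I mod P, J mod Q). In the transpose it becomes block (J, I), owned by process (J mod P, I mod Q). The hypothetical helper below is a sketch, not PUMMA code. It lists, for each source process, the destination processes that receive its blocks, which is the many-to-many pattern that non-blocking point-to-point messages allow a process to overlap.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

Proc = Tuple[int, int]


def owner(block_row: int, block_col: int, P: int, Q: int) -> Proc:
    """Process (p, q) holding block (block_row, block_col) in a block-cyclic layout."""
    return (block_row % P, block_col % Q)


def transpose_destinations(mb: int, nb: int, P: int, Q: int) -> Dict[Proc, Set[Proc]]:
    """For an mb x nb grid of square blocks, map each source process to the set of
    processes that must receive at least one of its blocks in the transpose."""
    dests: Dict[Proc, Set[Proc]] = defaultdict(set)
    for I in range(mb):
        for J in range(nb):
            # Block (I, J) of A becomes block (J, I) of A^T.
            dests[owner(I, J, P, Q)].add(owner(J, I, P, Q))
    return dict(dests)


if __name__ == "__main__":
    print(transpose_destinations(4, 4, 2, 2)[(0, 1)])  # {(1, 0)}
    print(len(transpose_destinations(6, 6, 2, 3)[(0, 0)]))  # 6
```

On a square template each process exchanges with a single partner, whereas on a non-square template (2 × 3 above) one process must send to several others. That case is where overlapping the sends pays off.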
MEASURING THE MASS DISTRIBUTION IN GALAXY CLUSTERS
Geller, Margaret J.; Diaferio, Antonaldo; Rines, Kenneth J.; Serra, Ana Laura
2013-02-10
Cluster mass profiles are tests of models of structure formation. Only two current observational methods of determining the mass profile, gravitational lensing and the caustic technique, are independent of the assumption of dynamical equilibrium. Both techniques enable the determination of the extended mass profile at radii beyond the virial radius. For 19 clusters, we compare the mass profile based on the caustic technique with weak lensing measurements taken from the literature. This comparison offers a test of systematic issues in both techniques. Around the virial radius, the two methods of mass estimation agree to within ≈30%, consistent with the expected errors in the individual techniques. At small radii, the caustic technique overestimates the mass, as expected from numerical simulations. The ratio between the lensing profile and the caustic mass profile at these radii suggests that the weak lensing profiles are a good representation of the true mass profile. At radii larger than the virial radius, the extrapolated Navarro, Frenk and White fit to the lensing mass profile exceeds the caustic mass profile. Contamination of the lensing profile by unrelated structures within the lensing kernel may be an issue in some cases; we highlight the clusters MS0906+11 and A750, superposed along the line of sight, to illustrate how seriously such unrelated structures can contaminate the weak lensing signal.
C-element: a new clustering algorithm to find high quality functional modules in PPI networks.
Ghasemi, Mahdieh; Rahgozar, Maseud; Bidkhori, Gholamreza; Masoudi-Nejad, Ali
2013-01-01
Graph clustering algorithms are widely used in the analysis of biological networks; extracting functional modules in protein-protein interaction (PPI) networks is one such use. Most clustering algorithms aimed at finding functional modules try either to find clique-like subnetworks or to grow clusters starting from high-degree vertices as seeds. These algorithms do not distinguish between a biological network and any other network. In the current research, we present a new procedure to find functional modules in PPI networks. Our main idea is to model a biological concept and to use this concept to find good functional modules in PPI networks. To evaluate the quality of the obtained clusters, we compared the results of our algorithm with those of some other widely used clustering algorithms on three high-throughput PPI networks from Saccharomyces cerevisiae, Homo sapiens, and Caenorhabditis elegans, as well as on some tissue-specific networks. Gene Ontology (GO) analyses were used to compare the results of the different algorithms, and each algorithm's result was compared with GO-term-derived functional modules. We also analyzed the effect of using tissue-specific networks on the quality of the obtained clusters. The experimental results indicate that the new algorithm outperforms most of the others, and that this improvement is more significant when tissue-specific networks are used. PMID:24039752
Optimization of composite structures by estimation of distribution algorithms
NASA Astrophysics Data System (ADS)
Grosset, Laurent
The design of high-performance composite laminates, such as those used in aerospace structures, leads to complex combinatorial optimization problems that cannot be addressed by conventional methods. These problems are typically solved by stochastic algorithms, such as evolutionary algorithms. This dissertation proposes a new evolutionary algorithm for composite laminate optimization, named the Double-Distribution Optimization Algorithm (DDOA). DDOA belongs to the family of estimation of distribution algorithms (EDAs), which build a statistical model of promising regions of the design space from sets of good points and use it to guide the search. A generic framework is proposed for introducing statistical variable dependencies by making use of the physics of the problem. The algorithm uses two distributions simultaneously: the marginal distributions of the design variables, complemented by the distribution of auxiliary variables. The combination of the two generates complex distributions at a low computational cost. The dissertation demonstrates the efficiency of DDOA on several laminate optimization problems where the design variables are the fiber angles and the auxiliary variables are the lamination parameters. The results show that its reliability in finding the optima is greater than that of a simple EDA and of a standard genetic algorithm, and that its advantage increases with the problem dimension. A continuous version of the algorithm is presented and applied to a constrained quadratic problem. Finally, a modification of the algorithm incorporating probabilistic and directional search mechanisms is proposed. The modified algorithm converges faster to the optimum and opens the way to a unified framework for stochastic and directional optimization.
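The core EDA loop can be illustrated with its simplest variant, a univariate marginal model (UMDA-style) over discrete fiber angles. The loop samples a population from per-ply marginal probabilities, keeps the best points, and re-estimates the marginals from them. This is a generic EDA sketch under assumed settings, not DDOA itself. It has no auxiliary-variable distribution, the small probability floor is an added assumption, and the toy objective and function names are hypothetical.

```python
import random
from typing import Callable, List, Optional, Sequence


def umda(
    fitness: Callable[[List[int]], float],
    n_vars: int,
    values: Sequence[int],
    pop_size: int = 60,
    n_select: int = 20,
    generations: int = 40,
    seed: int = 1,
) -> List[int]:
    """Maximize fitness over vectors whose entries are drawn from `values`."""
    rng = random.Random(seed)
    # One marginal distribution per variable, initially uniform.
    probs = [[1.0 / len(values)] * len(values) for _ in range(n_vars)]
    best: Optional[List[int]] = None
    for _ in range(generations):
        pop = [[rng.choices(values, weights=probs[i])[0] for i in range(n_vars)]
               for _ in range(pop_size)]
        pop.sort(key=fitness, reverse=True)
        if best is None or fitness(pop[0]) > fitness(best):
            best = pop[0]
        selected = pop[:n_select]
        # Re-estimate marginals from the selected points, with a small floor
        # so that no value's probability collapses to exactly zero.
        for i in range(n_vars):
            counts = [sum(1 for x in selected if x[i] == v) for v in values]
            probs[i] = [(c + 0.1) / (n_select + 0.1 * len(values)) for c in counts]
    assert best is not None
    return best


if __name__ == "__main__":
    target = [0, 45, -45, 90, 90, -45, 45, 0]  # hypothetical "ideal" stacking sequence
    match = lambda x: sum(1 for a, b in zip(x, target) if a == b)
    print(umda(match, n_vars=8, values=(0, 45, -45, 90)))
```

Because each ply's marginal is estimated independently, this model cannot express dependencies between plies. DDOA's second distribution over the lamination parameters is meant to add that kind of dependence cheaply.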
A Special Local Clustering Algorithm for Identifying the Genes Associated With Alzheimer’s Disease
Pang, Chao-Yang; Hu, Wei; Hu, Ben-Qiong; Shi, Ying; Vanderburg, Charles R.; Rogers, Jack T.
2010-01-01
Clustering is the grouping of similar objects into a class. The local clustering feature refers to the phenomenon whereby one group of data is separated from another and the data within each group cluster locally. A compact class is defined as a cluster in which all similar elements cluster tightly. Herein, the essence of the local clustering feature, revealed by mathematical manipulation, leads to a novel clustering algorithm termed the special local clustering (SLC) algorithm, which was used to process gene microarray data related to Alzheimer’s disease (AD). The SLC algorithm was able to group together genes with similar expression patterns and to identify significantly varied gene expression values as isolated points. If a gene belongs to a compact class in control data and appears as an isolated point in incipient, moderate and/or severe AD gene microarray data, this gene is possibly associated with AD. Application of clustering algorithms to disease-associated gene identification, as in AD, has rarely been reported. PMID:20089478
On the distribution of galaxy ellipticity in clusters
NASA Astrophysics Data System (ADS)
D'Eugenio, F.; Houghton, R. C. W.; Davies, R. L.; Dalla Bontà, E.
2015-07-01
We study the distribution of projected ellipticity n(ɛ) for galaxies in a sample of 20 rich (Richness ≥ 2) nearby (z < 0.1) clusters of galaxies. We find no evidence of differences in n(ɛ), although the nearest cluster in the sample (the Coma Cluster) is the largest outlier (P(same) < 0.05). We then study n(ɛ) within the clusters, and find that ɛ increases with projected cluster-centric radius R (hereafter the ɛ-R relation). This trend is preserved at fixed magnitude, showing that this relation exists over and above the trend of more luminous galaxies to be both rounder and more common in the centres of clusters. The ɛ-R relation is particularly strong in the subsample of intrinsically flattened galaxies (ɛ > 0.4), therefore it is not a consequence of the increasing fraction of round slow rotator galaxies near cluster centers. Furthermore, the ɛ-R relation persists for just smooth flattened galaxies and for galaxies with de Vaucouleurs-like light profiles, suggesting that the variation of the spiral fraction with radius is not the underlying cause of the trend. We interpret our findings in light of the classification of early type galaxies (ETGs) as fast and slow rotators. We conclude that the observed trend of decreasing ɛ towards the centres of clusters is evidence for physical effects in clusters causing fast rotator ETGs to have a lower average intrinsic ellipticity near the centres of rich clusters.
Fong, Simon; Deb, Suash; Yang, Xin-She; Zhuang, Yan
2014-01-01
Traditional K-means clustering algorithms have the drawback of getting stuck at local optima that depend on the random values of the initial centroids. Optimization algorithms have the advantage of guiding iterative computation toward global optima while avoiding local optima. They help speed up the clustering process by converging early to a global optimum with multiple search agents in action. Inspired by nature, some contemporary optimization algorithms, including the Ant, Bat, Cuckoo, Firefly, and Wolf search algorithms, mimic swarming behavior, which allows them to cooperatively steer toward an optimal objective within a reasonable time. These so-called nature-inspired optimization algorithms are known to have their own characteristics, pros, and cons in different applications. When these algorithms are combined with the K-means clustering mechanism to enhance its clustering quality by avoiding local optima and finding global optima, the new hybrids are anticipated to produce unprecedented performance. In this paper, we report the results of our evaluation experiments on the integration of nature-inspired optimization methods into K-means algorithms. In addition to the standard metrics for evaluating clustering quality, the extended K-means algorithms empowered by nature-inspired optimization methods are applied to image segmentation as a case study of an application scenario. PMID:25202730
Block clustering based on difference of convex functions (DC) programming and DC algorithms.
Le, Hoai Minh; Le Thi, Hoai An; Dinh, Tao Pham; Huynh, Van Ngai
2013-10-01
We investigate difference of convex functions (DC) programming and the DC algorithm (DCA) to solve the block clustering problem in the continuous framework, which traditionally requires solving a hard combinatorial optimization problem. DC reformulation techniques and exact penalty in DC programming are developed to build an appropriate equivalent DC program of the block clustering problem. They lead to an elegant and explicit DCA scheme for the resulting DC program. Computational experiments show the robustness and efficiency of the proposed algorithm and its superiority over standard algorithms such as two-mode K-means, two-mode fuzzy clustering, and block classification EM. PMID:23777526
Ab initio study on (CO2)n clusters via electrostatics- and molecular tailoring-based algorithm
NASA Astrophysics Data System (ADS)
Jovan Jose, K. V.; Gadre, Shridhar R.
An algorithm based on the molecular electrostatic potential (MESP) and the molecular tailoring approach (MTA) for building energetically favorable molecular clusters is presented. The algorithm is tested on prototype (CO2)n clusters with n = 13, 20, and 25 to explore their structure, energetics, and properties. The most stable clusters in this series show a larger number of triangular motifs. Many-body energy decomposition analysis performed on the most stable clusters reveals that the 2-body term is the major contributor (>96%) to the total interaction energy. Vibrational frequencies and molecular electrostatic potentials are also evaluated for these large clusters through MTA. The MTA-based MESPs of these clusters show remarkably good agreement with the corresponding actual ones. The most intense MTA-based normal mode frequencies are in fair agreement with the actual ones for smaller clusters. These calculated asymmetric stretching frequencies are blue-shifted with respect to the CO2 monomer.
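For reference, the many-body energy decomposition mentioned above is conventionally written as follows for a cluster of n monomers. The quoted 2-body share corresponds to the sum of the ΔE_ij terms divided by the total interaction energy. The notation is the standard textbook form, assumed here rather than taken from the paper.

```latex
E_{\text{int}} = E_{(\mathrm{CO_2})_n} - \sum_{i=1}^{n} E_i
  = \sum_{i<j} \Delta E_{ij} + \sum_{i<j<k} \Delta E_{ijk} + \cdots,
\qquad
\Delta E_{ij} = E_{ij} - E_i - E_j,

\Delta E_{ijk} = E_{ijk} - \Delta E_{ij} - \Delta E_{ik} - \Delta E_{jk} - E_i - E_j - E_k,
\qquad
\text{2-body share} = \frac{\sum_{i<j} \Delta E_{ij}}{E_{\text{int}}} \;(> 0.96 \text{ here}).
```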
NASA Astrophysics Data System (ADS)
Ward, W. O. C.; Wilkinson, P. B.; Chambers, J. E.; Oxby, L. S.; Bai, L.
2014-04-01
A novel method for the effective identification of bedrock subsurface elevation from electrical resistivity tomography images is described. Identifying subsurface boundaries in the topographic data can be difficult because of the smoothness constraints used in inversion, so a statistical population-based approach is used that extends previous work on calculating isoresistivity surfaces. The analysis framework involves a procedure for guiding a clustering approach based on the fuzzy c-means algorithm. An approximation of the resistivity distributions, found using kernel density estimation, was used to guide the cluster centroids used to classify the data. A fuzzy method was chosen over hard clustering because of uncertainty about hard edges in the topography data, and a measure of clustering uncertainty was defined as the reciprocal of cluster membership. The algorithm was validated by direct comparison with known bedrock depths at two 3-D survey sites, using real-time GPS information on bedrock exposed by quarrying at one site and borehole logs at the other. Results show detection accuracy similar to that of a leading isosurface estimation method, while the proposed algorithm requires significantly less user input and prior site knowledge. Furthermore, the method is effectively dimension-independent and will scale to data of increased spatial dimensions without a significant effect on the runtime. A discussion of the results of automated versus supervised analysis is also presented.
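A one-dimensional fuzzy c-means sketch with an uncertainty score is given below, applied to resistivity-like scalar values. The uncertainty is taken as the reciprocal of each sample's largest membership. The sketch rests on several assumptions: initial centroids are passed in, standing in for the KDE-guided centroids described above whose construction is not reproduced. The fuzzifier m = 2 is a common default rather than the authors' setting, and the function names are hypothetical.

```python
from typing import List, Tuple


def fuzzy_c_means(xs: List[float], centroids: List[float], m: float = 2.0,
                  iters: int = 100, tol: float = 1e-9) -> Tuple[List[float], List[List[float]]]:
    """1-D fuzzy c-means. Returns (centroids, memberships[i][k])."""
    c = list(centroids)
    u: List[List[float]] = []
    for _ in range(iters):
        u = []
        for x in xs:
            d = [abs(x - ck) for ck in c]
            if 0.0 in d:
                # Sample sits exactly on a centroid: share full membership among coincident centroids.
                hits = [1.0 if dk == 0.0 else 0.0 for dk in d]
                row = [h / sum(hits) for h in hits]
            else:
                # Standard FCM membership: u_k = 1 / sum_j (d_k / d_j) ** (2 / (m - 1)).
                row = [1.0 / sum((d[k] / d[j]) ** (2.0 / (m - 1.0)) for j in range(len(c)))
                       for k in range(len(c))]
            u.append(row)
        # Centroid update: membership**m weighted mean of the samples.
        new_c = [
            sum((u[i][k] ** m) * xs[i] for i in range(len(xs)))
            / sum(u[i][k] ** m for i in range(len(xs)))
            for k in range(len(c))
        ]
        shift = max(abs(a - b) for a, b in zip(new_c, c))
        c = new_c
        if shift < tol:
            break
    return c, u


def membership_uncertainty(row: List[float]) -> float:
    """Reciprocal of the largest membership: 1.0 when certain, up to the number of clusters."""
    return 1.0 / max(row)
```

Samples lying between two resistivity populations receive split memberships and therefore high uncertainty, which is where a soft boundary such as the bedrock interface would be expected to appear.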
A Novel Method to Predict Genomic Islands Based on Mean Shift Clustering Algorithm.
de Brito, Daniel M; Maracaja-Coutinho, Vinicius; de Farias, Savio T; Batista, Leonardo V; do Rêgo, Thaís G
2016-01-01
Genomic Islands (GIs) are regions of bacterial genomes that are acquired from other organisms through horizontal transfer. These regions are often responsible for many important acquired adaptations of bacteria, with great impact on their evolution and behavior. These adaptations are usually associated with pathogenicity, antibiotic resistance, degradation, and metabolism. Identification of such regions is of medical and industrial interest. For this reason, different approaches to genomic island prediction have been proposed. However, none of them can precisely predict the complete repertoire of GIs in a genome. The difficulties arise because the performance of different algorithms varies with the nucleotide distribution of different species. In this paper, we present a novel method to predict GIs that is built upon the mean shift clustering algorithm. It does not require any information about the number of clusters, and the bandwidth parameter is calculated automatically by a heuristic approach. The method was implemented in a new user-friendly tool named MSGIP (Mean Shift Genomic Island Predictor). Genomes of bacteria with GIs discussed in other papers were used to evaluate the proposed method. The application of this tool revealed the same GIs predicted by other methods and also novel, previously unpredicted islands. A detailed investigation of the features related to typical GI elements inserted in these new regions confirmed its effectiveness. Stand-alone and user-friendly versions of this new methodology are available at http://msgip.integrativebioinformatics.me. PMID:26731657
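The mean shift idea behind MSGIP can be illustrated on one-dimensional values, for instance a per-window compositional score (an assumption here). Each point is repeatedly moved to the Gaussian-kernel-weighted mean of the data around it until it stops moving, and points that converge to the same mode form one cluster, so the number of clusters is never fixed in advance. The bandwidth uses the normal-reference rule of thumb 1.06·sd·n^(-1/5) as a stand-in, because the paper's own heuristic is not specified here. The mode-merging threshold and function names are hypothetical.

```python
import math
import statistics
from typing import List, Tuple


def rule_of_thumb_bandwidth(xs: List[float]) -> float:
    """Normal-reference bandwidth h = 1.06 * sd * n**(-1/5) (an assumed heuristic)."""
    return 1.06 * statistics.stdev(xs) * len(xs) ** (-0.2)


def mean_shift_1d(xs: List[float], h: float, iters: int = 200,
                  tol: float = 1e-6) -> Tuple[List[float], List[int]]:
    """Mean shift with a Gaussian kernel. Returns (modes, cluster label per point)."""
    modes: List[float] = []
    labels: List[int] = []
    for x in xs:
        y = x
        for _ in range(iters):
            # Move y to the kernel-weighted mean of all data points.
            w = [math.exp(-0.5 * ((y - xi) / h) ** 2) for xi in xs]
            y_new = sum(wi * xi for wi, xi in zip(w, xs)) / sum(w)
            if abs(y_new - y) < tol:
                y = y_new
                break
            y = y_new
        # Merge with an existing mode if the converged position lies within h/2 of it.
        for k, mode in enumerate(modes):
            if abs(y - mode) < h / 2:
                labels.append(k)
                break
        else:
            modes.append(y)
            labels.append(len(modes) - 1)
    return modes, labels


if __name__ == "__main__":
    data = [1.0, 1.1, 0.9, 1.05, 5.0, 5.1, 4.9, 4.95]
    modes, labels = mean_shift_1d(data, rule_of_thumb_bandwidth(data))
    print(len(modes), labels)  # 2 [0, 0, 0, 0, 1, 1, 1, 1]
```

The bandwidth plays the role that k plays in K-means: a smaller h yields more, finer modes, which is why an automatic bandwidth rule matters for a method that claims not to need the number of clusters.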
'Inter-Arrival Time' Inspired Algorithm and its Application in Clustering and Molecular Phylogeny
NASA Astrophysics Data System (ADS)
Kolekar, Pandurang S.; Kale, Mohan M.; Kulkarni-Kale, Urmila
2010-10-01
Bioinformatics, being a multidisciplinary field, applies various methods from allied areas of science to data mining using computational approaches. Clustering and molecular phylogeny are key areas in bioinformatics that help in the study of the classification and evolution of organisms. Molecular phylogeny algorithms can be divided into distance-based and character-based methods. However, most of these methods depend on pre-alignment of sequences and become computationally intensive as the size of the data increases, and hence efficient alternative approaches are needed. 'Inter-arrival time distribution' (IATD) is a popular concept in the theory of stochastic system modeling, but its potential in molecular data analysis has not been fully explored. The present study reports an application of IATD in bioinformatics for clustering and molecular phylogeny. The proposed method computes the IATDs of nucleotides in genomic sequences. A distance function based on statistical parameters of the IATDs is proposed, and the resulting distance matrix is used for clustering and molecular phylogeny. The method is applied to a dataset of 3' non-coding region (NCR) sequences of Dengue virus type 3 (DENV-3), subtype III, reported in 2008. The resulting phylogram revealed the geographical distribution of DENV-3 isolates. Sri Lankan DENV-3 isolates were further observed to cluster in two sub-clades corresponding to the groups before and after the emergence of Dengue hemorrhagic fever. These results are consistent with those reported earlier using pre-aligned sequence data as input. These findings encourage applications of the IATD-based method in molecular phylogenetic analysis in particular and data mining in general.
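The abstract does not specify which statistical parameters of the IATDs enter the distance function, so the sketch below assumes the mean and population standard deviation of each nucleotide's inter-arrival times, combined by a Euclidean distance between the resulting 8-element vectors. It illustrates why the approach is alignment-free: sequences of different lengths map to fixed-length feature vectors.

```python
import math
import statistics

def iat_features(seq):
    """Mean and population SD of inter-arrival times for A, C, G, T (assumed parameters)."""
    seq = seq.upper()
    feats = []
    for base in "ACGT":
        pos = [i for i, ch in enumerate(seq) if ch == base]
        gaps = [b - a for a, b in zip(pos, pos[1:])]  # distances between successive occurrences
        if gaps:
            feats += [statistics.mean(gaps), statistics.pstdev(gaps)]
        else:
            feats += [0.0, 0.0]  # base occurs at most once: no inter-arrival times
    return feats

def iat_distance(s1, s2):
    """Euclidean distance between IATD feature vectors (an assumed distance function)."""
    return math.dist(iat_features(s1), iat_features(s2))

def distance_matrix(seqs):
    """Pairwise distances, ready for hierarchical clustering or a distance-based tree."""
    return [[iat_distance(a, b) for b in seqs] for a in seqs]

m = distance_matrix(["ACGTACGTACGT", "AACCGGTTAACCGGTT", "ACGTTGCA"])
```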
A scalable and practical one-pass clustering algorithm for recommender system
NASA Astrophysics Data System (ADS)
Khalid, Asra; Ghazanfar, Mustansar Ali; Azam, Awais; Alahmari, Saad Ali
2015-12-01
K-means clustering-based recommendation algorithms have been proposed with the claim that they increase the scalability of recommender systems. One potential drawback of these algorithms is that they perform training offline and hence cannot accommodate incremental updates as new data arrive, making them unsuitable for dynamic environments. Following this line of research, a new clustering algorithm called One-Pass is proposed, which is simple, fast, and accurate. We show empirically that the proposed algorithm outperforms K-means in terms of recommendation and training time while maintaining a good level of accuracy.
Chinese Text Clustering Algorithm Based k-means
NASA Astrophysics Data System (ADS)
Yao, Mingyu; Pi, Dechang; Cong, Xiangxiang
Text clustering is an important means and method in text mining. This work focuses on the process of Chinese text clustering based on k-means; experiments showed that the new center of a cluster was easily affected by isolated texts. The average similarity of a cluster was therefore used as a parameter and multiplied by a modulus between 0.75 and 1.25 to obtain a similarity threshold. The texts whose similarity with the original cluster center was greater than or equal to this threshold were collected as a candidate set, and the cluster center was then updated to the center of the candidate set. Experiments show that the improved method increased purity and F value by about 10 percent on average over the original method.
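The center update described above can be sketched as follows, assuming documents are term-weight vectors compared with cosine similarity, and reading "average similarity of one cluster" as the mean similarity of the members to the current center. The function name `robust_center`, the empty-candidate fallback, and the example vectors are hypothetical.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def robust_center(members, center, modulus=1.0):
    """Recompute a center from members similar enough to the old center.

    threshold = modulus * mean similarity of members to the center
    (the abstract gives 0.75-1.25 as the modulus range).
    """
    sims = [cosine(doc, center) for doc in members]
    threshold = modulus * sum(sims) / len(sims)
    candidates = [doc for doc, s in zip(members, sims) if s >= threshold]
    if not candidates:  # possible when modulus > 1; keep the old center (assumed fallback)
        return list(center)
    return [sum(doc[i] for doc in candidates) / len(candidates) for i in range(len(center))]

members = [[1, 1, 0], [2, 2, 0], [1, 0.9, 0.1], [0, 0, 5]]  # last vector is an isolated text
center = robust_center(members, [1, 1, 0], modulus=1.0)
```

With a plain k-means update, the isolated text would drag the third coordinate of the center to 1.275; the thresholded update leaves it near 0.03.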
Distributed concurrency control performance: A study of algorithms, distribution, and replication
Carey, M.J.; Livny, M.
1988-01-01
Many concurrency control algorithms have been proposed for use in distributed database systems. Despite the large number of available algorithms, and the fact that distributed database systems are becoming a commercial reality, distributed concurrency control performance tradeoffs are still not well understood. In this paper the authors attempt to shed light on some of the important issues by studying the performance of four representative algorithms (distributed 2PL, wound-wait, basic timestamp ordering, and a distributed optimistic algorithm) using a detailed simulation model of a distributed DBMS. The authors examine the performance of these algorithms for various levels of contention, "distributedness" of the workload, and data replication. The results should prove useful to designers of future distributed database systems.
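Of the four algorithms, wound-wait has the most compact decision rule, so it is sketched below for a single-site lock table with exclusive locks only. Timestamps order transactions (smaller is older): an older requester wounds (aborts) a younger lock holder, while a younger requester waits. The class and method names are hypothetical, and the distribution and replication that the paper studies are ignored.

```python
class WoundWaitLockTable:
    """Exclusive locks resolved with the wound-wait rule (smaller timestamp = older)."""

    def __init__(self):
        self.holder = {}      # item -> timestamp of the transaction holding its lock
        self.aborted = set()  # timestamps of wounded (aborted) transactions

    def _abort(self, ts):
        self.aborted.add(ts)
        # an aborted transaction releases every lock it holds
        for item in [k for k, v in self.holder.items() if v == ts]:
            del self.holder[item]

    def request(self, ts, item):
        h = self.holder.get(item)
        if h is None or h == ts:
            self.holder[item] = ts
            return "granted"
        if ts < h:            # older requester wounds the younger holder
            self._abort(h)
            self.holder[item] = ts
            return f"granted after wounding T{h}"
        return "wait"         # younger requester waits for the older holder

locks = WoundWaitLockTable()
locks.request(2, "x")  # T2 locks x
locks.request(1, "x")  # older T1 wounds T2 and takes x
locks.request(3, "x")  # younger T3 must wait
```

Because waits only ever go from younger to older transactions, no cycle of waits can form, which is why wound-wait avoids deadlock without a detector.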
NASA Technical Reports Server (NTRS)
Mach, Douglas M.; Christian, Hugh J.; Blakeslee, Richard; Boccippio, Dennis J.; Goodman, Steve J.; Boeck, William
2006-01-01
We describe the clustering algorithm used by the Lightning Imaging Sensor (LIS) and the Optical Transient Detector (OTD) for combining the lightning pulse data into events, groups, flashes, and areas. Events are single pixels that exceed the LIS/OTD background level during a single frame (2 ms). Groups are clusters of events that occur within the same frame and in adjacent pixels. Flashes are clusters of groups that occur within 330 ms and either 5.5 km (for LIS) or 16.5 km (for OTD) of each other. Areas are clusters of flashes that occur within 16.5 km of each other. Because many investigators use the LIS/OTD flash data, we test how variations in the algorithms for event-group and group-flash clustering affect the flash count for a subset of the LIS data. We divided the subset into areas with low (1-3), medium (4-15), high (16-63), and very high (64+) flash counts to see how changes in the clustering parameters affect the flash rates in these different sizes of areas. We found that as long as the clustering parameters are within about a factor of two of their current values, the flash counts do not change by more than about 20%. Therefore, the flash clustering algorithm used by the LIS and OTD sensors produces flash rates that are relatively insensitive to reasonable variations in the clustering algorithms.
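A simplified sketch of the group-to-flash step follows. Groups, given as hypothetical (time in s, x in km, y in km) tuples, are processed in time order, and a group joins the first existing flash that contains a group within 330 ms and 5.5 km (the LIS values; 16.5 km for OTD). The operational LIS/OTD code may compare against different reference points within a flash, so this is only an approximation for experimenting with the thresholds, in the spirit of the sensitivity test above.

```python
import math

def cluster_flashes(groups, max_dt=0.330, max_dist_km=5.5):
    """Cluster groups (t_s, x_km, y_km) into flashes in time order (simplified sketch)."""
    flashes = []
    for g in sorted(groups):  # tuples sort by time first
        t, x, y = g
        for flash in flashes:
            # join the first flash holding a group that is close in both time and space
            if any(t - ft <= max_dt and math.hypot(x - fx, y - fy) <= max_dist_km
                   for ft, fx, fy in flash):
                flash.append(g)
                break
        else:
            flashes.append([g])  # no match: start a new flash
    return flashes

groups = [(0.0, 0, 0), (0.1, 2, 0), (0.2, 4, 0), (1.0, 0, 0), (0.15, 50, 50)]
lis_flashes = cluster_flashes(groups)                    # LIS spatial threshold
otd_flashes = cluster_flashes(groups, max_dist_km=16.5)  # OTD spatial threshold
```

Because a group can join through any member of a flash, a flash may chain beyond 5.5 km in total extent; varying `max_dt` and `max_dist_km` by a factor of two is a quick way to reproduce the kind of sensitivity check the abstract reports.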
A novel artificial bee colony based clustering algorithm for categorical data.
Ji, Jinchao; Pang, Wei; Zheng, Yanlin; Wang, Zhe; Ma, Zhiqiang
2015-01-01
Data with categorical attributes are ubiquitous in the real world. However, existing partitional clustering algorithms for categorical data are prone to fall into local optima. To address this issue, in this paper we propose a novel clustering algorithm, ABC-K-Modes (Artificial Bee Colony clustering based on K-Modes), based on the traditional k-modes clustering algorithm and the artificial bee colony approach. In our approach, we first introduce a one-step k-modes procedure, and then integrate this procedure with the artificial bee colony approach to deal with categorical data. In the search process performed by scout bees, we adopt the multi-source search inspired by the idea of batch processing to accelerate the convergence of ABC-K-Modes. The performance of ABC-K-Modes is evaluated by a series of experiments in comparison with that of the other popular algorithms for categorical data. PMID:25993469
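The one-step k-modes procedure that ABC-K-Modes builds on can be sketched as a single assignment pass using the simple matching dissimilarity, followed by a single mode update (the most frequent category per attribute). The bee-colony search wrapped around it is omitted, and the data are made up.

```python
from collections import Counter

def matching_distance(a, b):
    """Simple matching dissimilarity: number of attributes whose categories differ."""
    return sum(x != y for x, y in zip(a, b))

def one_step_k_modes(data, modes):
    """One assignment pass plus one mode update; returns labels, new modes, and cost."""
    labels = [min(range(len(modes)), key=lambda j: matching_distance(row, modes[j]))
              for row in data]
    new_modes = []
    for j, mode in enumerate(modes):
        members = [row for row, lab in zip(data, labels) if lab == j]
        if not members:
            new_modes.append(mode)  # keep the old mode for an empty cluster
            continue
        # per attribute, the most frequent category among the members
        new_modes.append(tuple(Counter(col).most_common(1)[0][0] for col in zip(*members)))
    cost = sum(matching_distance(row, new_modes[lab]) for row, lab in zip(data, labels))
    return labels, new_modes, cost

data = [("red", "small", "round"), ("red", "small", "oval"), ("red", "large", "round"),
        ("blue", "large", "square"), ("blue", "large", "square"), ("green", "large", "square")]
labels, modes, cost = one_step_k_modes(data, [("red", "small", "round"), ("blue", "large", "square")])
```

In ABC-K-Modes, a step like this would refine each food source (a candidate set of modes) so that the bees compare locally improved solutions, which is how the hybrid tries to escape the local optima of plain k-modes.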
Gardiner, Eleanor J; Gillet, Valerie J; Willett, Peter; Cosgrove, David A
2007-01-01
Chemical databases are routinely clustered, with the aim of grouping molecules which share similar structural features. Ideally, medicinal chemists are then able to browse a few representatives of the cluster in order to interpret the shared activity of the cluster members. However, when molecules are clustered using fingerprints, it may be difficult to decipher the structural commonalities which are present. Here, we seek to represent a cluster by means of a maximum common substructure based on the shared functionality of the cluster members. Previously, we have used reduced graphs, where each node corresponds to a generalized functional group, as topological molecular descriptors for virtual screening. In this work, we precluster a database using any clustering method. We then represent the molecules in a cluster as reduced graphs. By repeated application of a maximum common edge substructure (MCES) algorithm, we obtain one or more reduced graph cluster representatives. The sparsity of the reduced graphs means that the MCES calculations can be performed in real time. The reduced graph cluster representatives are readily interpretable in terms of functional activity and can be mapped directly back to the molecules to which they correspond, giving the chemist a rapid means of assessing potential activities contained within the cluster. Clusters of interest are then subject to a detailed R-group analysis using the same iterated MCES algorithm applied to the molecular graphs. PMID:17309248
NASA Astrophysics Data System (ADS)
Sun, Xu; Yang, Lina; Gao, Lianru; Zhang, Bing; Li, Shanshan; Li, Jun
2015-01-01
Center-oriented hyperspectral image clustering methods have been widely applied in hyperspectral remote sensing image processing; however, their drawbacks are obvious, including overly simple computing models and underutilized spatial information. In recent years, several studies have tried to improve this situation. We introduce the artificial bee colony (ABC) and Markov random field (MRF) algorithms to propose an ABC-MRF-cluster model that addresses these problems. In this model, a typical ABC algorithm framework is adopted in which the cluster centers and the results of the iterated conditional modes (ICM) algorithm serve as the feasible solutions and the objective functions, respectively, and the MRF is modified so that it can handle the clustering problem. Finally, four datasets and two indices are used to show that the ABC-cluster and ABC-MRF-cluster methods can obtain better image accuracy than conventional methods. Specifically, the ABC-cluster method is superior when evaluated by the power of spectral discrimination, whereas the ABC-MRF-cluster method provides better results when evaluated by the adjusted Rand index. In experiments on simulated images with different signal-to-noise ratios, ABC-cluster and ABC-MRF-cluster showed good stability.
Improved fuzzy clustering algorithms in segmentation of DC-enhanced breast MRI.
Kannan, S R; Ramathilagam, S; Devi, Pandiyarajan; Sathya, A
2012-02-01
Segmentation of medical images is a difficult and challenging problem because poor image contrast and artifacts result in missing or diffuse organ/tissue boundaries. Many researchers have applied various techniques; however, fuzzy c-means (FCM)-based algorithms are more effective than other methods. The objective of this work is to develop robust fuzzy clustering segmentation systems for effective segmentation of DCE breast MRI. This paper obtains robust fuzzy clustering algorithms by incorporating kernel methods, penalty terms, tolerance of the neighborhood attraction, an additional entropy term, and fuzzy parameters. The initial centers are obtained with an initialization algorithm to reduce the computational complexity and running time of the proposed algorithms. Experiments on breast images show that the proposed algorithms improve the similarity measurement, handle large amounts of noise, and give better results on data corrupted by noise and other artifacts. The clustering results of the proposed methods are validated using the silhouette method. PMID:20703716
An Improved Fuzzy c-Means Clustering Algorithm Based on Shadowed Sets and PSO
Zhang, Jian; Shen, Ling
2014-01-01
To organize a wide variety of data sets automatically and achieve accurate classification, this paper presents a modified fuzzy c-means algorithm (SP-FCM) based on particle swarm optimization (PSO) and shadowed sets to perform feature clustering. SP-FCM uses the global search property of PSO to deal with the premature convergence of conventional fuzzy clustering, uses the vagueness-balance property of shadowed sets to handle overlap among clusters, and models uncertainty in class boundaries. The method uses the Xie-Beni index as its cluster validity measure and automatically finds the optimal number of clusters within a specified range, yielding partitions with compact and well-separated clusters. Experiments show that the proposed approach significantly improves the clustering effect. PMID:25477953
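For reference, the Xie-Beni index is the ratio of fuzzy compactness to separation: the membership-weighted sum of squared point-to-center distances, divided by n times the smallest squared distance between two centers. Smaller values indicate compact, well-separated clusters. A direct implementation is sketched below, assuming memberships are given as a point-by-cluster matrix; the function name is illustrative.

```python
import math

def xie_beni(points, centers, u, m=2.0):
    """Xie-Beni validity index; u[k][i] is the membership of point k in cluster i."""
    n = len(points)
    # fuzzy compactness: sum_i sum_k u_ik^m * ||x_k - v_i||^2
    compact = sum(u[k][i] ** m * math.dist(points[k], centers[i]) ** 2
                  for k in range(n) for i in range(len(centers)))
    # separation: smallest squared distance between two distinct centers
    sep = min(math.dist(centers[i], centers[j]) ** 2
              for i in range(len(centers)) for j in range(len(centers)) if i != j)
    return compact / (n * sep)

points = [(0, 0), (0, 2), (10, 0), (10, 2)]
centers = [(0, 1), (10, 1)]
u = [[1, 0], [1, 0], [0, 1], [0, 1]]  # crisp memberships for the illustration
xb = xie_beni(points, centers, u)
```

To choose the number of clusters as SP-FCM does, one would cluster once for each candidate c in the allowed range and keep the c with the smallest index.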
A Local Scalable Distributed Expectation Maximization Algorithm for Large Peer-to-Peer Networks
NASA Technical Reports Server (NTRS)
Bhaduri, Kanishka; Srivastava, Ashok N.
2009-01-01
This paper offers a local distributed algorithm for expectation maximization in large peer-to-peer environments. The algorithm can be used for a variety of well-known data mining tasks in a distributed environment, such as clustering, anomaly detection, and target tracking. This technology is crucial for many emerging peer-to-peer applications in bioinformatics, astronomy, social networking, sensor networks, and web mining. Centralizing all or some of the data to build global models is impractical in such peer-to-peer environments because of the large number of data sources, the asynchronous nature of peer-to-peer networks, and the dynamic nature of the data and the network. The distributed algorithm developed in this paper is provably correct, i.e., it converges to the same result as a similar centralized algorithm, and it can automatically adapt to changes in the data and the network. We show that the communication overhead of the algorithm is very low due to its local nature. This monitoring algorithm is then used as a feedback loop to sample data from the network and rebuild the model when it is outdated. We present thorough experimental results to verify our theoretical claims.
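The paper's local algorithm is not described here in enough detail to reproduce, but the reason a distributed EM can reach the centralized result is easy to show: the M-step needs only sums of per-point responsibilities, so peers can compute those sums locally and add them. The sketch below does this for a 1-D Gaussian mixture. Unlike the paper's local scheme, it aggregates statistics from every peer, and the peer data are made up.

```python
import math

def local_stats(xs, weights, means, variances):
    """E-step on one peer: per-component sums of r, r*x, r*x^2 over local points."""
    k = len(means)
    s0, s1, s2 = [0.0] * k, [0.0] * k, [0.0] * k
    for x in xs:
        p = [w * math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)
             for w, mu, var in zip(weights, means, variances)]
        total = sum(p)
        for j in range(k):
            r = p[j] / total  # responsibility of component j for x
            s0[j] += r
            s1[j] += r * x
            s2[j] += r * x * x
    return s0, s1, s2

def m_step(stats_list):
    """Add the peers' sufficient statistics and re-estimate the mixture parameters."""
    k = len(stats_list[0][0])
    s0 = [sum(s[0][j] for s in stats_list) for j in range(k)]
    s1 = [sum(s[1][j] for s in stats_list) for j in range(k)]
    s2 = [sum(s[2][j] for s in stats_list) for j in range(k)]
    n = sum(s0)
    means = [b / a for a, b in zip(s0, s1)]
    variances = [c / a - mu ** 2 for a, c, mu in zip(s0, s2, means)]
    return [a / n for a in s0], means, variances

peers = [[0.1, -0.2, 0.3], [4.9, 5.2], [0.0, 5.1, 4.8]]  # each peer's local data
params = ([0.5, 0.5], [0.0, 5.0], [1.0, 1.0])            # weights, means, variances
for _ in range(10):
    params = m_step([local_stats(xs, *params) for xs in peers])
```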
Parallel matrix transpose algorithms on distributed memory concurrent computers
Choi, J.; Walker, D.W.; Dongarra, J.J.
1993-10-01
This paper describes parallel matrix transpose algorithms on distributed memory concurrent processors. It is assumed that the matrix is distributed over a $P \times Q$ processor template with a block scattered data distribution. P, Q, and the block size can be arbitrary, so the algorithms have wide applicability. The communication schemes of the algorithms are determined by the greatest common divisor (GCD) of P and Q. If P and Q are relatively prime, the matrix transpose algorithm involves complete exchange communication. If P and Q are not relatively prime, processors are divided into GCD groups and the communication operations are overlapped for different groups of processors. Processors transpose GCD wrapped diagonal blocks simultaneously, and the matrix can be transposed in LCM/GCD steps, where LCM is the least common multiple of P and Q. The algorithms make use of non-blocking, point-to-point communication between processors. The use of non-blocking communication allows a processor to overlap the messages that it sends to different processors, thereby avoiding unnecessary synchronization. Combined with the matrix multiplication routine $C = A \cdot B$, the algorithms are used to compute parallel multiplications of transposed matrices, $C = A^{T} \cdot B^{T}$, in the PUMMA package. Details of the parallel implementation of the algorithms are given, and results are presented for runs on the Intel Touchstone Delta computer.
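The block scattered (block-cyclic) layout can be sketched as follows, assuming block (i, j) is owned by processor (i mod P, j mod Q) on an n-by-n grid of blocks. The helper functions are hypothetical, not PUMMA routines. They only check the communication pattern stated above: with P = Q each processor exchanges blocks with a single partner, while relatively prime P and Q lead to a complete exchange.

```python
from math import gcd

def owner(i, j, P, Q):
    """Processor (row, col) owning block (i, j) under a block scattered distribution."""
    return (i % P, j % Q)

def transpose_destination(i, j, P, Q):
    """Block A(i, j) becomes block (j, i) of A^T, so it moves to that block's owner."""
    return owner(j, i, P, Q)

def destinations(p, q, P, Q, nblocks):
    """Processors that receive blocks from processor (p, q) during the transpose."""
    return {transpose_destination(i, j, P, Q)
            for i in range(p, nblocks, P) for j in range(q, nblocks, Q)}

def transpose_steps(P, Q):
    """Number of steps LCM/GCD quoted in the abstract."""
    g = gcd(P, Q)
    return (P * Q // g) // g

square = destinations(1, 2, 4, 4, 16)     # 4 x 4 grid: a single partner, (2, 1)
coprime = destinations(0, 0, 2, 3, 12)    # 2 x 3 grid: every processor receives data
```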
A new clustering algorithm applicable to multispectral and polarimetric SAR images
NASA Technical Reports Server (NTRS)
Wong, Yiu-Fai; Posner, Edward C.
1993-01-01
We describe an application of a scale-space clustering algorithm to the classification of a multispectral and polarimetric SAR image of an agricultural site. After the initial polarimetric and radiometric calibration and noise cancellation, we extracted a 12-dimensional feature vector for each pixel from the scattering matrix. The clustering algorithm was able to partition a set of unlabeled feature vectors from 13 selected sites, each site corresponding to a distinct crop, into 13 clusters without any supervision. The cluster parameters were then used to classify the whole image. The classification map is much less noisy and more accurate than those obtained by hierarchical rules. Starting with every point as a cluster, the algorithm works by melting the system to produce a tree of clusters in the scale space. It can cluster data in any multidimensional space and is insensitive to variability in cluster densities, sizes and ellipsoidal shapes. This algorithm, more powerful than existing ones, may be useful for remote sensing for land use.
The distribution of ejected brown dwarfs in clusters
NASA Astrophysics Data System (ADS)
Goodwin, S. P.; Hubber, D. A.; Moraux, E.; Whitworth, A. P.
2005-12-01
We examine the spatial distribution of brown dwarfs produced by the decay of small-N stellar systems as expected from the embryo ejection scenario. We model a cluster of several hundred stars grouped into 'cores' of a few stars/brown dwarfs. These cores decay, preferentially ejecting their lowest-mass members. Brown dwarfs are found to have a wider spatial distribution than stars; however, once the effects of limited survey areas and unresolved binaries are taken into account, it can be difficult to distinguish between clusters with many ejections and clusters with none. A large difference between the distributions probably indicates that ejections have occurred; however, similar distributions sometimes arise even with ejections. Thus, the spatial distribution of brown dwarfs is not necessarily a good discriminator between ejection and non-ejection scenarios.
A review of estimation of distribution algorithms in bioinformatics
Armañanzas, Rubén; Inza, Iñaki; Santana, Roberto; Saeys, Yvan; Flores, Jose Luis; Lozano, Jose Antonio; Peer, Yves Van de; Blanco, Rosa; Robles, Víctor; Bielza, Concha; Larrañaga, Pedro
2008-01-01
Evolutionary search algorithms have become an essential asset in the algorithmic toolbox for solving high-dimensional optimization problems across a broad range of bioinformatics problems. Genetic algorithms, the most well-known and representative evolutionary search technique, have been the subject of most such applications. Estimation of distribution algorithms (EDAs) offer a novel evolutionary paradigm that constitutes a natural and attractive alternative to genetic algorithms. They use a probabilistic model, learned from the promising solutions, to guide the search process. In this paper, we set out a basic taxonomy of EDA techniques, underlining the nature and complexity of the probabilistic model of each EDA variant. We review a set of innovative works that use EDA techniques to solve challenging bioinformatics problems, emphasizing the EDA paradigm's potential for further research in this domain. PMID:18822112
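As an illustration of the EDA loop described above (sample a population, select promising solutions, re-estimate the probabilistic model), here is a minimal univariate marginal distribution algorithm (UMDA), one of the simplest EDAs, on the toy OneMax problem. The population sizes, the clamping of probabilities to [1/n, 1 - 1/n], and the OneMax objective are illustrative choices, not taken from the review.

```python
import random

def umda_onemax(n_bits=20, pop_size=60, n_select=30, generations=40, seed=0):
    """UMDA maximizing the number of ones in a bit string."""
    rng = random.Random(seed)
    p = [0.5] * n_bits  # model: independent probability of a 1 at each position
    best = None
    for _ in range(generations):
        # 1. sample a population from the current model
        pop = [[1 if rng.random() < pi else 0 for pi in p] for _ in range(pop_size)]
        # 2. select the promising solutions (OneMax fitness = number of ones)
        pop.sort(key=sum, reverse=True)
        elite = pop[:n_select]
        if best is None or sum(elite[0]) > sum(best):
            best = elite[0]
        # 3. re-estimate the marginals from the elite, clamped to avoid premature fixation
        p = [min(max(sum(ind[i] for ind in elite) / n_select, 1 / n_bits), 1 - 1 / n_bits)
             for i in range(n_bits)]
    return best, p

best, model = umda_onemax()
```

UMDA ignores dependencies between variables; the taxonomy the review sets out distinguishes EDAs precisely by how much of that dependency structure the model captures (univariate, bivariate, or multivariate).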
NASA Astrophysics Data System (ADS)
Bo, Yizhou; Shifa, Naima
2013-09-01
An estimator for finding the abundance of a rare, clustered and mobile population has been introduced. This model is based on adaptive cluster sampling (ACS) to identify the location of the population and negative binomial distribution to estimate the total in each site. To identify the location of the population we consider both sampling with replacement (WR) and sampling without replacement (WOR). Some mathematical properties of the model are also developed.
Feature Subset Selection by Estimation of Distribution Algorithms
Cantu-Paz, E
2002-01-17
This paper describes the application of four evolutionary algorithms to the identification of feature subsets for classification problems. Besides a simple GA, the paper considers three estimation of distribution algorithms (EDAs): a compact GA, an extended compact GA, and the Bayesian Optimization Algorithm. The objective is to determine if the EDAs present advantages over the simple GA in terms of accuracy or speed in this problem. The experiments used a Naive Bayes classifier and public-domain and artificial data sets. In contrast with previous studies, we did not find evidence to support or reject the use of EDAs for this problem.
Impacts of Time Delays on Distributed Algorithms for Economic Dispatch
Yang, Tao; Wu, Di; Sun, Yannan; Lian, Jianming
2015-07-26
The economic dispatch problem (EDP) is an important problem in power systems. It can be formulated as an optimization problem whose objective is to minimize the total generation cost subject to the power balance constraint and generator capacity limits. Recently, several consensus-based algorithms have been proposed to solve the EDP in a distributed manner. However, the impacts of communication time delays on these distributed algorithms are not fully understood, especially when the communication network is directed, i.e., the information exchange is unidirectional. This paper investigates the effects of communication time delays on a distributed algorithm for directed communication networks. The algorithm has been tested by applying time delays to different types of information exchange. Several case studies are carried out to evaluate the effectiveness and performance of the algorithm in the presence of time delays in communication networks. Time delays are found to slow convergence, and they can even cause the algorithm to converge to an incorrect value or to fail to converge.
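The EDP formulation above can be made concrete with a centralized reference solver, assuming quadratic generator costs a·P² + b·P (plus a constant). At the optimum, every unit that is not at a capacity limit runs at one common incremental cost λ = 2aP + b, so λ can be found by bisection on the power balance. This is not the paper's distributed, delay-affected algorithm. Many consensus-based EDP methods aim to agree on this same λ without a central coordinator, so a solver like this is useful for checking what they should converge to.

```python
def dispatch(units, demand, tol=1e-9):
    """units: list of (a, b, p_min, p_max) for cost a*P^2 + b*P (constant term omitted)."""
    if not sum(u[2] for u in units) <= demand <= sum(u[3] for u in units):
        raise ValueError("demand outside total capacity")

    def output(lam):
        # each unit sets its marginal cost 2aP + b equal to lam, clipped to its limits
        return [min(max((lam - b) / (2 * a), lo), hi) for a, b, lo, hi in units]

    lo_lam = min(2 * a * lo + b for a, b, lo, hi in units)  # every unit at its minimum
    hi_lam = max(2 * a * hi + b for a, b, lo, hi in units)  # every unit at its maximum
    while hi_lam - lo_lam > tol:  # total output is nondecreasing in lam
        mid = 0.5 * (lo_lam + hi_lam)
        if sum(output(mid)) < demand:
            lo_lam = mid
        else:
            hi_lam = mid
    lam = 0.5 * (lo_lam + hi_lam)
    return lam, output(lam)

lam, out = dispatch([(0.01, 2.0, 10, 100), (0.02, 1.5, 10, 100)], demand=100)
```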
Distributed Query Plan Generation Using Multiobjective Genetic Algorithm
Panicker, Shina; Vijay Kumar, T. V.
2014-01-01
A distributed query processing strategy, which is a key performance determinant in accessing distributed databases, aims to minimize the total query processing cost. One way to achieve this is by generating efficient distributed query plans that involve fewer sites for processing a query. In distributed relational databases, the number of possible query plans increases exponentially with the number of relations accessed by the query and the number of sites where these relations reside. Consequently, computing optimal distributed query plans becomes a complex problem. This distributed query plan generation (DQPG) problem has already been addressed using a single-objective genetic algorithm, where the objective is to minimize the total query processing cost comprising the local processing cost (LPC) and the site-to-site communication cost (CC). In this paper, the DQPG problem is formulated and solved as a biobjective optimization problem with two objectives: minimizing total LPC and minimizing total CC. These objectives are optimized simultaneously using the multiobjective genetic algorithm NSGA-II. An experimental comparison of the proposed NSGA-II-based DQPG algorithm with the single-objective genetic algorithm shows that the former performs comparatively better and converges quickly towards optimal solutions for an observed crossover and mutation probability. PMID:24963513
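NSGA-II ranks candidate solutions by Pareto dominance before applying crowding distance. The sketch below shows only the first step, extracting the non-dominated plans when both objectives (total LPC, total CC) are minimized. The plan names and cost pairs are made up for illustration.

```python
def dominates(a, b):
    """a dominates b: no worse in every objective and strictly better in one (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(plans):
    """plans: dict name -> (total LPC, total CC). Returns the non-dominated plan names."""
    return sorted(name for name, cost in plans.items()
                  if not any(dominates(other, cost) for other in plans.values()))

plans = {"P1": (10, 50), "P2": (20, 20), "P3": (15, 60), "P4": (40, 10), "P5": (25, 25)}
front = pareto_front(plans)  # P3 is dominated by P1, P5 by P2
```

Unlike a single-objective GA that sums LPC and CC into one cost, the front keeps every trade-off plan, leaving the final choice between cheaper local processing and cheaper communication to the query optimizer.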
Fuzzy-Kohonen-clustering neural network trained by genetic algorithm and fuzzy competition learning
NASA Astrophysics Data System (ADS)
Xie, Weixing; Li, Wenhua; Gao, Xinbo
1995-08-01
Kohonen networks are well known for clustering analysis. Classical Kohonen networks for hard c-means clustering (trained by winner-take-all learning) have some severe drawbacks. Fuzzy Kohonen networks (FKCNN) for fuzzy c-means clustering are trained by fuzzy competition learning, and can get better clustering results than the classical Kohonen networks. However, both winner-take-all and fuzzy competition learning algorithms are in essence local search techniques that search for the optimum by using a hill-climbing technique. Thus, they often fail in the search for the global optimum. In this paper we combine genetic algorithms (GAs) with fuzzy competition learning to train the FKCNN. Our experimental results show that the proposed GA/FC learning algorithm has much higher probabilities of finding the global optimal solutions than either the winner-take-all or the fuzzy competition learning.
An Enhanced PSO-Based Clustering Energy Optimization Algorithm for Wireless Sensor Network
Vimalarani, C.; Subramanian, R.; Sivanandam, S. N.
2016-01-01
A wireless sensor network (WSN) consists of a large number of sensor nodes deployed in an application environment to monitor physical quantities in a target area, for example in temperature, water-level, and pressure monitoring, health care, and various military applications. Sensor nodes are mostly powered by their own batteries, which support their operations and communication with neighboring nodes. To maximize the lifetime of WSNs, energy conservation measures are essential for improving their performance. This paper proposes an Enhanced PSO-Based Clustering Energy Optimization (EPSO-CEO) algorithm for WSNs, in which clustering and cluster-head selection are performed with the Particle Swarm Optimization (PSO) algorithm so as to minimize power consumption in the WSN. The performance metrics are evaluated, and the results are compared with those of a competing clustering algorithm to validate the reduction in energy consumption. PMID:26881273
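The abstract does not give EPSO-CEO's fitness function, so the sketch below uses a common simplification as an explicit assumption: each particle encodes k cluster-head positions, fitness is the sum of squared distances from every node to its nearest head (a proxy for the d² free-space transmission energy), and the best positions are finally snapped to real sensor nodes. Residual battery energy, which real protocols usually weigh, is ignored, and the parameter values are standard textbook choices rather than the paper's.

```python
import random

def energy_proxy(heads, nodes):
    """Sum of squared node-to-nearest-head distances (assumed d^2 energy model)."""
    return sum(min((x - hx) ** 2 + (y - hy) ** 2 for hx, hy in heads) for x, y in nodes)

def pso_cluster_heads(nodes, k, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5, seed=1):
    rng = random.Random(seed)
    dim = 2 * k
    lo = min(min(n) for n in nodes)
    hi = max(max(n) for n in nodes)

    def decode(v):  # particle vector -> list of k (x, y) head positions
        return [(v[2 * i], v[2 * i + 1]) for i in range(k)]

    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_f = [energy_proxy(decode(p), nodes) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                vel[i][d] = (w * vel[i][d]
                             + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                             + c2 * rng.random() * (gbest[d] - pos[i][d]))
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # stay inside the field
            f = energy_proxy(decode(pos[i]), nodes)
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i][:], f
                if f < gbest_f:
                    gbest, gbest_f = pos[i][:], f
    # cluster heads must be real sensors: snap each optimized position to the nearest node
    heads = [min(nodes, key=lambda n: (n[0] - hx) ** 2 + (n[1] - hy) ** 2)
             for hx, hy in decode(gbest)]
    return heads, gbest_f  # gbest_f is the cost of the unsnapped positions

nodes = [(0, 0), (1, 0), (0, 1), (1, 1), (10, 10), (11, 10), (10, 11), (11, 11)]
heads, cost = pso_cluster_heads(nodes, k=2)
```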
An Improved Clustering Algorithm of Tunnel Monitoring Data for Cloud Computing
Zhong, Luo; Tang, KunHao; Li, Lin; Yang, Guang; Ye, JingJing
2014-01-01
With the rapid development of urban construction, the number of urban tunnels is increasing and the data they produce are becoming more and more complex. As a result, traditional clustering algorithms cannot handle the massive volume of tunnel data. To solve this problem, an improved parallel clustering algorithm based on k-means is proposed. The algorithm processes the data with MapReduce in a cloud computing environment. It is not only able to handle massive data but also more efficient. Moreover, it computes the average dissimilarity of each cluster in order to clean abnormal data. PMID:24982971
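A single-process simulation of MapReduce k-means rounds is sketched below: each mapper assigns the records of its input split to the nearest center, and the reducer sums and averages the records per center. The cleaning step is modeled, as an assumption, by flagging records farther from the center than a fixed multiple of the cluster's average dissimilarity; the abstract does not give the paper's exact rule or threshold, and the function names are hypothetical.

```python
import math
from collections import defaultdict

def map_phase(points, centers):
    """Mapper: emit (index of nearest center, (point, 1)) for every record in a split."""
    for p in points:
        j = min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
        yield j, (p, 1)

def reduce_phase(pairs):
    """Reducer: sum coordinates and counts per key, then divide to get new centers."""
    sums, counts = {}, defaultdict(int)
    for j, (p, c) in pairs:
        s = sums.setdefault(j, [0.0] * len(p))
        for d, v in enumerate(p):
            s[d] += v
        counts[j] += c
    return {j: [v / counts[j] for v in s] for j, s in sums.items()}

def kmeans_mapreduce(splits, centers, iters=10):
    for _ in range(iters):
        pairs = [kv for split in splits for kv in map_phase(split, centers)]  # one mapper per split
        updated = reduce_phase(pairs)
        centers = [updated.get(j, c) for j, c in enumerate(centers)]  # empty cluster: keep center
    return centers

def flag_outliers(points, center, factor=2.0):
    """Hypothetical cleaning rule: flag records much farther than the cluster's average."""
    avg = sum(math.dist(p, center) for p in points) / len(points)
    return [p for p in points if math.dist(p, center) > factor * avg]

splits = [[(1, 1), (1.2, 0.8)], [(9, 9), (8.8, 9.2)], [(1.1, 0.9), (9.1, 8.9)]]
centers = kmeans_mapreduce(splits, [(0, 0), (10, 10)], iters=3)
```

In a real Hadoop job, a combiner would pre-sum each split's pairs before the shuffle, so that only one partial sum per center leaves each mapper.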
A Hybrid Algorithm for Clustering of Time Series Data Based on Affinity Search Technique
Aghabozorgi, Saeed; Ying Wah, Teh; Herawan, Tutut; Jalab, Hamid A.; Shaygan, Mohammad Amin; Jalali, Alireza
2014-01-01
Time series clustering is an important solution to various problems in numerous fields of research, including business, medical science, and finance. However, conventional clustering algorithms are not practical for time series data because they are essentially designed for static data. This impracticality results in poor clustering accuracy in several systems. In this paper, a new hybrid clustering algorithm is proposed based on the similarity in shape of time series data. Time series data are first grouped as subclusters based on similarity in time. The subclusters are then merged using the k-Medoids algorithm based on similarity in shape. This model has two contributions: (1) it is more accurate than other conventional and hybrid approaches and (2) it determines the similarity in shape among time series data with a low complexity. To evaluate the accuracy of the proposed model, the model is tested extensively using synthetic and real-world time series datasets. PMID:24982966
NASA Astrophysics Data System (ADS)
Brenden, T. O.; Clark, R. D.; Wiley, M. J.; Seelbach, P. W.; Wang, L.
2005-05-01
Remote sensing and geographic information systems have made it possible to attribute variables for streams at increasingly detailed resolutions (e.g., individual river reaches). Nevertheless, management decisions still must be made at large scales because land and stream managers typically lack sufficient resources to manage on an individual reach basis. Managers thus require a method for identifying stream management units that are ecologically similar and that can be expected to respond similarly to management decisions. We have developed a spatially-constrained clustering algorithm that can merge neighboring river reaches with similar ecological characteristics into larger management units. The clustering algorithm is based on the Cluster Affinity Search Technique (CAST), which was developed for clustering gene expression data. Inputs to the clustering algorithm are the neighbor relationships of the reaches that comprise the digital river network, the ecological attributes of the reaches, and an affinity value, which identifies the minimum similarity for merging river reaches. In this presentation, we describe the clustering algorithm in greater detail and contrast its use with other methods (expert opinion, classification approach, regular clustering) for identifying management units using several Michigan watersheds as a backdrop.
Mesh Algorithms for PDE with Sieve I: Mesh Distribution
Knepley, Matthew G.; Karpeev, Dmitry A.
2009-01-01
We have developed a new programming framework, called Sieve, to support parallel numerical partial differential equation(s) (PDE) algorithms operating over distributed meshes. We have also developed a reference implementation of Sieve in C++ as a library of generic algorithms operating on distributed containers conforming to the Sieve interface. Sieve makes instances of the incidence relation, or arrows, the conceptual first-class objects represented in the containers. Further, generic algorithms acting on this arrow container are systematically used to provide natural geometric operations on the topology and also, through duality, on the data. Finally, coverings and duality are used to encode not only individual meshes, but all types of hierarchies underlying PDE data structures, including multigrid and mesh partitions. In order to demonstrate the usefulness of the framework, we show how the mesh partition data can be represented and manipulated using the same fundamental mechanisms used to represent meshes. We present the complete description of an algorithm to encode a mesh partition and then distribute a mesh, which is independent of the mesh dimension, element shape, or embedding. Moreover, data associated with the mesh can be similarly distributed with exactly the same algorithm. The use of a high level of abstraction within the Sieve leads to several benefits in terms of code reuse, simplicity, and extensibility. We discuss these benefits and compare our approach to other existing mesh libraries.
An approximation polynomial-time algorithm for a sequence bi-clustering problem
NASA Astrophysics Data System (ADS)
Kel'manov, A. V.; Khamidullin, S. A.
2015-06-01
We consider a strongly NP-hard problem of partitioning a finite sequence of vectors in Euclidean space into two clusters using the criterion of the minimal sum of the squared distances from the elements of the clusters to the centers of the clusters. The center of one of the clusters is to be optimized and is determined as the mean value over all vectors in this cluster. The center of the other cluster is fixed at the origin. Moreover, the partition is such that the difference between the indices of two successive vectors in the first cluster is bounded above and below by prescribed constants. A 2-approximation polynomial-time algorithm is proposed for this problem.
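The optimization problem can be made concrete with a small Python sketch. Indices are 0-based. The feasibility check encodes only the gap constraint stated in the abstract, and the paper's formulation may add further conditions, for example on cluster size. The exhaustive search is exponential in N and is not the 2-approximation algorithm proposed in the paper; it only shows what is being minimized. All names are hypothetical.

```python
from itertools import combinations
import numpy as np

def objective(y, idx):
    """Squared distances of cluster `idx` to its mean, plus all other vectors to the origin."""
    y = np.asarray(y, dtype=float)
    mask = np.zeros(len(y), dtype=bool)
    mask[list(idx)] = True
    inside, outside = y[mask], y[~mask]
    center = inside.mean(axis=0)  # optimized center: mean of the first cluster
    return float(((inside - center) ** 2).sum() + (outside ** 2).sum())

def feasible(idx, t_min, t_max):
    """Gaps between successive indices of the first cluster must lie in [t_min, t_max]."""
    gaps = np.diff(sorted(idx))
    return bool(np.all((gaps >= t_min) & (gaps <= t_max)))

def brute_force(y, t_min, t_max):
    """Exact minimum by enumerating every non-empty index set (tiny N only)."""
    best = (float("inf"), None)
    for size in range(1, len(y) + 1):
        for idx in combinations(range(len(y)), size):
            if feasible(idx, t_min, t_max):
                val = objective(y, idx)
                if val < best[0]:
                    best = (val, idx)
    return best
```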
Uy, D.L.
1996-02-01
An algorithm for detection and identification of image clusters or “blobs” based on color information for an autonomous mobile robot is developed. The input image data are first processed using a crisp color fuzzifier, a binary smoothing filter, and a median filter. The processed image data are then input to the image cluster detection and identification program. The program employs the concept of an “elastic rectangle” that stretches in such a way that the whole blob is finally enclosed in a rectangle. A C program was developed to test the algorithm. The algorithm was tested only on 8×8 images containing different numbers of blobs. The algorithm works very well in detecting and identifying image clusters.
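The idea of enclosing each blob in a rectangle can be approximated with a short Python sketch. Here a breadth-first flood fill over 4-connected foreground pixels grows a bounding box as pixels are found. This is a stand-in for the “elastic rectangle” procedure, not the author's program. The sketch assumes the input is already a binary image, so the color fuzzification and filtering steps are omitted. The function name is hypothetical.

```python
from collections import deque

def find_blobs(img):
    """Bounding boxes (row_min, col_min, row_max, col_max) of 4-connected
    foreground regions in a binary image given as a list of lists of 0/1."""
    rows, cols = len(img), len(img[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if img[r][c] and not seen[r][c]:
                # flood-fill one blob, stretching its rectangle to cover every pixel reached
                queue = deque([(r, c)])
                seen[r][c] = True
                r0, c0, r1, c1 = r, c, r, c
                while queue:
                    y, x = queue.popleft()
                    r0, c0 = min(r0, y), min(c0, x)
                    r1, c1 = max(r1, y), max(c1, x)
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < rows and 0 <= nx < cols and img[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                boxes.append((r0, c0, r1, c1))
    return boxes
```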
A network-assisted co-clustering algorithm to discover cancer subtypes based on gene expression
2014-01-01
Background Cancer subtype information is critically important for understanding tumor heterogeneity. Existing methods to identify cancer subtypes have primarily focused on utilizing generic clustering algorithms (such as hierarchical clustering) to identify subtypes based on gene expression data. The network-level interaction among genes, which is key to understanding the molecular perturbations in cancer, has been rarely considered during the clustering process. The motivation of our work is to develop a method that effectively incorporates molecular interaction networks into the clustering process to improve cancer subtype identification. Results We have developed a new clustering algorithm for cancer subtype identification, called “network-assisted co-clustering for the identification of cancer subtypes” (NCIS). NCIS combines gene network information to simultaneously group samples and genes into biologically meaningful clusters. Prior to clustering, we assign weights to genes based on their impact in the network. Then a new weighted co-clustering algorithm based on a semi-nonnegative matrix tri-factorization is applied. We evaluated the effectiveness of NCIS on simulated datasets as well as large-scale Breast Cancer and Glioblastoma Multiforme patient samples from The Cancer Genome Atlas (TCGA) project. Compared with consensus hierarchical clustering, NCIS better separated the patient samples into clinically distinct subtypes and achieved higher accuracy and better noise tolerance on the simulated datasets. Conclusions The weighted co-clustering approach in NCIS provides a unique solution to incorporate gene network information into the clustering process. Our tool will be useful to comprehensively identify cancer subtypes that would otherwise be obscured by cancer heterogeneity, using high-throughput and high-dimensional gene expression data. PMID:24491042
A fast density-based clustering algorithm for real-time Internet of Things stream.
Amini, Amineh; Saboohi, Hadi; Wah, Teh Ying; Herawan, Tutut
2014-01-01
Data streams are continuously generated over time from Internet of Things (IoT) devices. The faster all of this data is analyzed, its hidden trends and patterns discovered, and new strategies created, the faster action can be taken, creating greater value for organizations. Density-based methods are a prominent class of methods for clustering data streams. They can detect clusters of arbitrary shape and handle outliers, and they do not need the number of clusters in advance. Therefore, density-based clustering is a suitable choice for clustering IoT streams. Recently, several density-based algorithms have been proposed for clustering data streams. However, density-based clustering in limited time is still a challenging issue. In this paper, we propose a density-based clustering algorithm for IoT streams. The method's processing time is fast enough for real-time IoT applications. Experimental results show that the proposed approach obtains high-quality results with low computation time on real and synthetic datasets. PMID:25110753
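The three properties cited for density-based methods (arbitrary-shape clusters, outlier handling, and no preset number of clusters) can be seen in the classic batch DBSCAN algorithm. The minimal Python sketch below substitutes batch DBSCAN for the paper's streaming method, which the abstract does not detail. It uses O(n²) neighborhood computation for clarity, and the function name is hypothetical.

```python
import numpy as np

NOISE = -1

def dbscan(points, eps, min_pts):
    """Batch DBSCAN: label points by cluster id, or NOISE for outliers."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    # eps-neighborhoods (each includes the point itself)
    dists = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    neighbors = [np.flatnonzero(dists[i] <= eps) for i in range(n)]
    labels = np.full(n, NOISE)
    visited = np.zeros(n, dtype=bool)
    cluster = 0
    for i in range(n):
        if visited[i]:
            continue
        visited[i] = True
        if len(neighbors[i]) < min_pts:
            continue  # not a core point; may still become a border point later
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border or core point joins the cluster
            if not visited[j]:
                visited[j] = True
                if len(neighbors[j]) >= min_pts:
                    queue.extend(neighbors[j])  # core point: keep expanding
        cluster += 1
    return labels
```

The number of clusters is never given. It emerges from `eps` and `min_pts`, and isolated points keep the `NOISE` label.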
Efficient cluster Monte Carlo algorithm for Ising spin glasses in more than two space dimensions
NASA Astrophysics Data System (ADS)
Ochoa, Andrew J.; Zhu, Zheng; Katzgraber, Helmut G.
2015-03-01
A cluster algorithm that speeds up slow dynamics in simulations of nonplanar Ising spin glasses away from criticality is urgently needed. In theory, the cluster algorithm proposed by Houdayer offers no advantage over local moves in systems with a percolation threshold below 50%, such as cubic lattices. However, we show that the frustration present in Ising spin glasses prevents the growth of system-spanning clusters at temperatures roughly below the characteristic energy scale J of the problem. Adding Houdayer cluster moves to simulations of Ising spin glasses for T ~ J produces a speedup over conventional local moves that grows with the system size. We show results for the nonplanar quasi-two-dimensional Chimera graph of the D-Wave Two quantum annealer, as well as conventional three-dimensional Ising spin glasses, where in both cases the addition of cluster moves visibly speeds up thermalization in the physically interesting low-temperature regime.
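A Houdayer cluster move, as it is commonly described, acts on two replicas at the same temperature. It picks a random site where the replicas disagree (site overlap q_i = -1), grows the connected cluster of q = -1 sites from it, and flips that cluster in both replicas. The minimal Python sketch below applies one such move. The periodic square lattice with random ±1 couplings is a toy system chosen here for illustration, not the Chimera graph or cubic lattices studied in the paper. In practice the move is interleaved with local Monte Carlo sweeps, which are omitted. Function names are hypothetical.

```python
import random
from collections import deque

def energy(spins, couplings):
    """Ising energy E = -sum over edges (i, j) of J_ij * s_i * s_j."""
    return -sum(J * spins[i] * spins[j] for (i, j), J in couplings.items())

def houdayer_move(s1, s2, couplings, rng=random):
    """One Houdayer cluster move on replicas s1, s2 (lists of +1/-1), in place.

    Sites where the replicas disagree have overlap q_i = s1[i] * s2[i] = -1.
    A random q = -1 site seeds a cluster grown through neighboring q = -1 sites;
    the cluster is then flipped in both replicas. Returns the flipped sites.
    """
    n = len(s1)
    adj = [[] for _ in range(n)]
    for i, j in couplings:
        adj[i].append(j)
        adj[j].append(i)
    disagree = [i for i in range(n) if s1[i] != s2[i]]
    if not disagree:
        return set()  # identical replicas: nothing to flip
    start = rng.choice(disagree)
    cluster, queue = {start}, deque([start])
    while queue:
        i = queue.popleft()
        for j in adj[i]:
            if j not in cluster and s1[j] != s2[j]:
                cluster.add(j)
                queue.append(j)
    for i in cluster:
        s1[i], s2[i] = -s1[i], -s2[i]
    return cluster

def square_lattice(L, rng):
    """Periodic L x L lattice with random +/-1 couplings (toy test system, L >= 3)."""
    couplings = {}
    for x in range(L):
        for y in range(L):
            i = x * L + y
            couplings[(i, ((x + 1) % L) * L + y)] = rng.choice((-1, 1))
            couplings[(i, x * L + (y + 1) % L)] = rng.choice((-1, 1))
    return couplings
```

The move needs no acceptance test because it leaves the total energy of the two replicas unchanged. On every bond crossing the cluster boundary, the two replicas' bond terms cancel both before and after the flip.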
A Novel Coverage-Preserving Clustering Algorithm for Wireless Sensor Networks
NASA Astrophysics Data System (ADS)
Di, Xin
Sensing coverage is one of the crucial characteristics of wireless sensor networks, and it has to be considered in the design of routing protocols. LEACH (Low-Energy Adaptive Clustering Hierarchy) is a significant and representative routing protocol that organizes the sensing nodes by clustering. For LEACH, residual energy should be considered in order to overcome the unequal rates of energy dissipation. Considering the impact of these two factors on a network, we have proposed a coverage-preserving energy-based clustering algorithm (CEC), which is an improved LEACH. By improving the threshold for cluster-head selection, CEC achieved more effective results than the other baseline protocols.
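CEC is described as improving LEACH's cluster-head selection threshold, but the abstract does not give the modified form. The Python sketch below therefore shows only the standard LEACH threshold, T(n) = p / (1 - p (r mod 1/p)) for nodes that have not yet served as cluster head in the current epoch, and 0 otherwise. It assumes 1/p is an integer. Function names are hypothetical.

```python
import random

def leach_threshold(p, r, eligible):
    """Standard LEACH threshold T(n) for round r.

    p:        desired fraction of cluster heads per round (1/p assumed integer)
    r:        current round number
    eligible: True if the node has not been a cluster head in the current epoch
    """
    if not eligible:
        return 0.0
    epoch = round(1 / p)  # rounds per epoch
    return p / (1 - p * (r % epoch))

def elect_cluster_heads(eligible_nodes, p, r, rng=random):
    """Each eligible node becomes a cluster head if a uniform draw falls below T(n)."""
    t = leach_threshold(p, r, True)
    return [node for node in eligible_nodes if rng.random() < t]
```

The threshold rises through the epoch, so in the last round every node still eligible is elected. Energy- or coverage-aware variants such as CEC change how this value is computed.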
Illinois Occupational Skill Standards: Finishing and Distribution Cluster.
ERIC Educational Resources Information Center
Illinois Occupational Skill Standards and Credentialing Council, Carbondale.
This document, which is intended as a guide for work force preparation program providers, details the Illinois occupational skill standards for programs preparing students for employment in occupations in the finishing and distribution cluster. The document begins with a brief overview of the Illinois perspective on occupational skill standards…
NASA Astrophysics Data System (ADS)
Mazure, A.; Katgert, P.; den Hartog, R.; Biviano, A.; Dubath, P.; Escalera, E.; Focardi, P.; Gerbal, D.; Giuricin, G.; Jones, B.; Le Fevre, O.; Moles, M.; Perea, J.; Rhee, G.
1996-06-01
The ESO Nearby Abell Cluster Survey (the ENACS) has yielded 5634 redshifts for galaxies in the directions of 107 rich, Southern clusters selected from the ACO catalogue (Abell et al. 1989). By combining these data with another 1000 redshifts from the literature, of galaxies in 37 clusters, we construct a volume-limited sample of 128 R_ACO ≥ 1 clusters in a solid angle of 2.55 sr centered on the South Galactic Pole, out to a redshift z = 0.1. For a subset of 80 of these clusters we can calculate a reliable velocity dispersion, based on at least 10 (but very often between 30 and 150) redshifts. We deal with the main observational problem that hampers an unambiguous interpretation of the distribution of cluster velocity dispersions, namely the contamination by fore- and background galaxies. We also discuss in detail the completeness of the cluster samples for which we derive the distribution of cluster velocity dispersions. We find that a cluster sample which is complete in terms of the field-corrected richness count given in the ACO catalogue gives a result that is essentially identical to that based on a smaller and more conservative sample which is complete in terms of an intrinsic richness count that has been corrected for superposition effects. We find that the large apparent spread in the relation between velocity dispersion and richness count (based either on visual inspection or on machine counts) must be largely intrinsic; i.e. this spread is not primarily due to measurement uncertainties. One of the consequences of the (very) broad relation between cluster richness and velocity dispersion is that all samples of clusters that are defined complete with respect to richness count are unavoidably biased against low-σ_V clusters. For the richness limit of our sample this bias operates only for velocity dispersions less than ≈800 km/s. We obtain a statistically reliable distribution of global velocity dispersions which, for velocity dispersions σ_V > 800 km/s, is
Intelligent decision support algorithm for distribution system restoration.
Singh, Reetu; Mehfuz, Shabana; Kumar, Parmod
2016-01-01
The distribution system is the source of revenue for an electric utility. It needs to be restored as early as possible if any feeder or the complete system is tripped out due to a fault or any other cause. Further, load uncertainty results in variations in the distribution network's parameters. Thus, an intelligent algorithm incorporating a hybrid fuzzy-grey relation, which can take the uncertainties into account and compare the sequences, is discussed to analyse and restore the distribution system. Simulation studies are carried out to show the utility of the method by ranking the restoration plans for a typical distribution system. This algorithm also meets the smart grid requirements in terms of an automated restoration plan for a partial/full blackout of the network. PMID:27512634
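The grey relational part of such a ranking can be sketched in Python using textbook grey relational analysis. The sketch computes the deviation sequences Δ_i(k) = |x_0(k) - x_i(k)|, the coefficients ξ_i(k) = (Δ_min + ρΔ_max) / (Δ_i(k) + ρΔ_max), and an equal-weight grade per candidate. It assumes the criteria are already normalized, uses the conventional ρ = 0.5, and omits the paper's fuzzy component, whose details the abstract does not give. Function names and the example criteria values are hypothetical.

```python
import numpy as np

def grey_relational_grades(reference, candidates, rho=0.5):
    """Grey relational grade of each candidate sequence against a reference sequence.

    reference:  ideal value per criterion, shape (k,)
    candidates: one row per alternative (e.g. restoration plan), shape (m, k),
                already normalized so that rows are comparable with the reference
    rho:        distinguishing coefficient, conventionally 0.5
    """
    x0 = np.asarray(reference, dtype=float)
    xs = np.asarray(candidates, dtype=float)
    delta = np.abs(xs - x0)  # deviation sequences
    d_min, d_max = delta.min(), delta.max()
    if d_max == 0:
        return np.ones(len(xs))  # every candidate equals the reference
    coeff = (d_min + rho * d_max) / (delta + rho * d_max)  # relational coefficients
    return coeff.mean(axis=1)  # equal-weight grade per candidate

def rank(grades):
    """Candidate indices from best (highest grade) to worst."""
    return [int(i) for i in np.argsort(-np.asarray(grades))]
```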
NASA Technical Reports Server (NTRS)
Lennington, R. K.; Malek, H.
1978-01-01
A clustering method, CLASSY, was developed that alternates maximum likelihood iteration with a procedure for splitting, combining, and eliminating the resulting statistics. The method maximizes the fit of a mixture of normal distributions to the observed first through fourth central moments of the data and produces an estimate of the proportions, means, and covariances in this mixture. The mathematical model that is the basis for CLASSY and the actual operation of the algorithm are described. Data comparing the performances of CLASSY and ISOCLS on simulated and actual LACIE data are presented.
Distributed genetic algorithms for the floorplan design problem
NASA Technical Reports Server (NTRS)
Cohoon, James P.; Hegde, Shailesh U.; Martin, Worthy N.; Richards, Dana S.
1991-01-01
Designing a VLSI floorplan calls for arranging a given set of modules in the plane to minimize the weighted sum of area and wire-length measures. A method of solving the floorplan design problem using distributed genetic algorithms is presented. Distributed genetic algorithms, based on the paleontological theory of punctuated equilibria, offer a conceptual modification to the traditional genetic algorithms. Experimental results on several problem instances demonstrate the efficacy of this method and indicate the advantages of this method over other methods, such as simulated annealing. The method has performed better than the simulated annealing approach, both in terms of the average cost of the solutions found and the best-found solution, in almost all the problem instances tried.
Comparing Different Fault Identification Algorithms in Distributed Power System
NASA Astrophysics Data System (ADS)
Alkaabi, Salim
A power system is a huge, complex system that delivers electrical power from the generation units to the consumers. As the demand for electrical power increases, distributed power generation was introduced to the power system. Faults may occur in the power system at any time and in different locations. These faults can cause severe damage to the system, as they might lead to full failure of the power system. Using distributed generation in the power system has made it even harder to identify the location of faults in the system. The main objective of this work is to test different fault location identification algorithms on a power system with varying amounts of power injected by distributed generators. As faults may lead the system to full failure, this is an important area for research. In this thesis, different fault location identification algorithms have been tested and compared while varying the amount of power injected from distributed generators. The algorithms were tested on the IEEE 34-node test feeder using MATLAB, and the results were compared to determine when these algorithms might fail and how reliable they are.
Krivitsky, Pavel N.; Handcock, Mark S.; Raftery, Adrian E.; Hoff, Peter D.
2009-01-01
Social network data often involve transitivity, homophily on observed attributes, clustering, and heterogeneity of actor degrees. We propose a latent cluster random effects model to represent all of these features, and we describe a Bayesian estimation method for it. The model is applicable to both binary and non-binary network data. We illustrate the model using two real datasets. We also apply it to two simulated network datasets with the same, highly skewed, degree distribution, but very different network behavior: one unstructured and the other with transitivity and clustering. Models based on degree distributions, such as scale-free, preferential attachment and power-law models, cannot distinguish between these very different situations, but our model does. PMID:20191087
Jungle Computing: Distributed Supercomputing Beyond Clusters, Grids, and Clouds
NASA Astrophysics Data System (ADS)
Seinstra, Frank J.; Maassen, Jason; van Nieuwpoort, Rob V.; Drost, Niels; van Kessel, Timo; van Werkhoven, Ben; Urbani, Jacopo; Jacobs, Ceriel; Kielmann, Thilo; Bal, Henri E.
In recent years, the application of high-performance and distributed computing in scientific practice has become increasingly widespread. Among the most widely available platforms to scientists are clusters, grids, and cloud systems. Such infrastructures currently are undergoing revolutionary change due to the integration of many-core technologies, providing orders-of-magnitude speed improvements for selected compute kernels. With high-performance and distributed computing systems thus becoming more heterogeneous and hierarchical, programming complexity is vastly increased. Further complexities arise because the urgent desire for scalability and issues including data distribution, software heterogeneity, and ad hoc hardware availability commonly force scientists into simultaneous use of multiple platforms (e.g., clusters, grids, and clouds used concurrently). A true computing jungle.
An Effective Intrusion Detection Algorithm Based on Improved Semi-supervised Fuzzy Clustering
NASA Astrophysics Data System (ADS)
Li, Xueyong; Zhang, Baojian; Sun, Jiaxia; Yan, Shitao
An algorithm for intrusion detection based on improved evolutionary semi-supervised fuzzy clustering is proposed; it is suited to situations in which labeled data are more difficult to obtain than unlabeled data in intrusion detection systems. The algorithm requires only a small amount of labeled data and a large amount of unlabeled data. The class-label information provided by the labeled data is used to guide the evolution of each fuzzy partition on the unlabeled data, which plays the role of a chromosome. The algorithm can handle fuzzy labels, is less prone to becoming trapped in local optima, and is suited to implementation on parallel architectures. Experiments show that the algorithm can improve classification accuracy and has high detection efficiency.
Knitting distributed cluster-state ladders with spin chains
Ronke, R.; D'Amico, I.; Spiller, T. P.
2011-09-15
Recently there has been much study on the application of spin chains to quantum state transfer and communication. Here we discuss the utilization of spin chains (set up for perfect quantum state transfer) for the knitting of distributed cluster-state structures, between spin qubits repeatedly injected and extracted at the ends of the chain. The cluster states emerge from the natural evolution of the system across different excitation number sectors. We discuss the decohering effects of errors in the injection and extraction process as well as the effects of fabrication and random errors.
Distributed autonomous systems: resource management, planning, and control algorithms
NASA Astrophysics Data System (ADS)
Smith, James F., III; Nguyen, ThanhVu H.
2005-05-01
Distributed autonomous systems, i.e., systems that have separated distributed components, each of which exhibits some degree of autonomy, are increasingly providing solutions to naval and other DoD problems. Recently developed control, planning and resource allocation algorithms for two types of distributed autonomous systems will be discussed. The first distributed autonomous system (DAS) to be discussed consists of a collection of unmanned aerial vehicles (UAVs) that are under fuzzy logic control. The UAVs fly and conduct meteorological sampling in a coordinated fashion determined by their fuzzy logic controllers to determine the atmospheric index of refraction. Once in flight no human intervention is required. A fuzzy planning algorithm determines the optimal trajectory, sampling rate and pattern for the UAVs and an interferometer platform while taking into account risk, reliability, priority for sampling in certain regions, fuel limitations, mission cost, and related uncertainties. The real-time fuzzy control algorithm running on each UAV will give the UAV limited autonomy allowing it to change course immediately without consulting with any commander, request other UAVs to help it, alter its sampling pattern and rate when observing interesting phenomena, or to terminate the mission and return to base. The algorithms developed will be compared to a resource manager (RM) developed for another DAS problem related to electronic attack (EA). This RM is based on fuzzy logic and optimized by evolutionary algorithms. It allows a group of dissimilar platforms to use EA resources distributed throughout the group. For both DAS types significant theoretical and simulation results will be presented.
A distributed Canny edge detector: algorithm and FPGA implementation.
Xu, Qian; Varadarajan, Srenivas; Chakrabarti, Chaitali; Karam, Lina J
2014-07-01
The Canny edge detector is one of the most widely used edge detection algorithms due to its superior performance. Unfortunately, not only is it computationally more intensive as compared with other edge detection algorithms, but it also has a higher latency because it is based on frame-level statistics. In this paper, we propose a mechanism to implement the Canny algorithm at the block level without any loss in edge detection performance compared with the original frame-level Canny algorithm. Directly applying the original Canny algorithm at the block-level leads to excessive edges in smooth regions and to loss of significant edges in high-detailed regions since the original Canny computes the high and low thresholds based on the frame-level statistics. To solve this problem, we present a distributed Canny edge detection algorithm that adaptively computes the edge detection thresholds based on the block type and the local distribution of the gradients in the image block. In addition, the new algorithm uses a nonuniform gradient magnitude histogram to compute block-based hysteresis thresholds. The resulting block-based algorithm has a significantly reduced latency and can be easily integrated with other block-based image codecs. It is capable of supporting fast edge detection of images and videos with high resolutions, including full-HD since the latency is now a function of the block size instead of the frame size. In addition, quantitative conformance evaluations and subjective tests show that the edge detection performance of the proposed algorithm is better than the original frame-based algorithm, especially when noise is present in the images. Finally, this algorithm is implemented using a 32 computing engine architecture and is synthesized on the Xilinx Virtex-5 FPGA. The synthesized architecture takes only 0.721 ms (including the SRAM READ/WRITE time and the computation time) to detect edges of 512 × 512 images in the USC SIPI database when clocked at 100
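The hysteresis step that both the frame-level and block-level variants rely on can be sketched in Python. Pixels at or above the high threshold are edges. Pixels between the two thresholds are kept only if they are 8-connected to an edge pixel. In the distributed detector the two thresholds would be computed per block from the local gradient histogram. Here they are passed in directly, and the paper's threshold-selection rule is not reproduced. The function name is hypothetical.

```python
import numpy as np
from collections import deque

def hysteresis(mag, low, high):
    """Canny-style hysteresis on a gradient-magnitude array; returns a boolean edge map."""
    mag = np.asarray(mag, dtype=float)
    strong = mag >= high
    weak = (mag >= low) & ~strong
    edges = strong.copy()
    queue = deque(zip(*np.nonzero(strong)))  # start from every strong pixel
    h, w = mag.shape
    while queue:
        y, x = queue.popleft()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                # promote weak pixels that touch an accepted edge pixel
                if 0 <= ny < h and 0 <= nx < w and weak[ny, nx] and not edges[ny, nx]:
                    edges[ny, nx] = True
                    queue.append((ny, nx))
    return edges
```

Applied to one block at a time with that block's own `low` and `high`, this is where block-adaptive thresholds would enter.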
NASA Astrophysics Data System (ADS)
Zainuddin, Zarita; Lai, Kee Huong; Ong, Pauline
2013-04-01
Artificial neural networks (ANNs) are powerful mathematical models that are used to solve complex real-world problems. Wavelet neural networks (WNNs), which were developed based on wavelet theory, are a variant of ANNs. During the training phase of WNNs, several parameters need to be initialized, including the type of wavelet activation functions, the translation vectors, and the dilation parameter. The conventional k-means and fuzzy c-means clustering algorithms have been used to select the translation vectors. However, the solution vectors might get trapped at local minima. In this regard, the evolutionary harmony search algorithm, which is capable of searching for near-optimum solution vectors both locally and globally, is introduced to circumvent this problem. In this paper, the conventional k-means and fuzzy c-means clustering algorithms were hybridized with the metaheuristic harmony search algorithm. In addition to estimating the global minima accurately, these hybridized algorithms also offer more than one solution to a particular problem, since many possible solution vectors can be generated and stored in the harmony memory. To validate the robustness of the proposed WNNs, the real-world problem of epileptic seizure detection was used. The overall classification accuracy from the simulation showed that the hybridized metaheuristic algorithms outperformed the standard k-means and fuzzy c-means clustering algorithms.
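The conventional fuzzy c-means step that the paper hybridizes can be written compactly in Python. The sketch alternates the center update v_i = Σ_k u_ik^m x_k / Σ_k u_ik^m with the membership update u_ik = 1 / Σ_j (d_ik / d_jk)^(2/(m-1)), starting from random memberships. It is plain FCM only: the harmony search component, which the paper uses to escape local minima, is not included. The function name and defaults are hypothetical.

```python
import numpy as np

def fuzzy_c_means(x, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Plain fuzzy c-means; returns (centers, memberships of shape (c, n))."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat 1-D input as n points in one dimension
    rng = np.random.default_rng(seed)
    u = rng.random((c, len(x)))
    u /= u.sum(axis=0)  # memberships of each point sum to 1
    for _ in range(iters):
        um = u ** m
        centers = um @ x / um.sum(axis=1, keepdims=True)  # weighted means
        d = np.linalg.norm(x[None, :, :] - centers[:, None, :], axis=-1)
        d = np.fmax(d, 1e-12)  # avoid division by zero when a point sits on a center
        # u_ik = 1 / sum_j (d_ik / d_jk)^(2 / (m - 1))
        ratio = (d[:, None, :] / d[None, :, :]) ** (2.0 / (m - 1.0))
        new_u = 1.0 / ratio.sum(axis=1)
        converged = np.abs(new_u - u).max() < tol
        u = new_u
        if converged:
            break
    return centers, u
```

In the WNN setting, the returned `centers` would serve as candidate translation vectors.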
An efficient clustering algorithm for partitioning Y-short tandem repeats data
2012-01-01
Background Y-Short Tandem Repeats (Y-STR) data consist of many similar and almost similar objects. This characteristic of Y-STR data causes two problems with partitioning: non-unique centroids and local minima problems. As a result, the existing partitioning algorithms produce poor clustering results. Results Our new algorithm, called k-Approximate Modal Haplotypes (k-AMH), obtains the highest clustering accuracy scores for five out of six datasets, and produces an equal performance for the remaining dataset. Furthermore, clustering accuracy scores of 100% are achieved for two of the datasets. The k-AMH algorithm records the highest mean accuracy score of 0.93 overall, compared to that of other algorithms: k-Population (0.91), k-Modes-RVF (0.81), New Fuzzy k-Modes (0.80), k-Modes (0.76), k-Modes-Hybrid 1 (0.76), k-Modes-Hybrid 2 (0.75), Fuzzy k-Modes (0.74), and k-Modes-UAVM (0.70). Conclusions The partitioning performance of the k-AMH algorithm for Y-STR data is superior to that of other algorithms, owing to its ability to solve the non-unique centroids and local minima problems. Our algorithm is also efficient in terms of time complexity, which is recorded as O(km(n-k)) and considered to be linear. PMID:23039132
A new distributed systems scheduling algorithm: a swarm intelligence approach
NASA Astrophysics Data System (ADS)
Haghi Kashani, Mostafa; Sarvizadeh, Raheleh; Jameii, Mahdi
2011-12-01
The scheduling problem in distributed systems is known to be NP-complete, and methods based on heuristic or metaheuristic search have been proposed to obtain optimal and suboptimal solutions. Task scheduling is a key factor for distributed systems to gain better performance. In this paper, an efficient method based on a memetic algorithm is developed to solve the problem of distributed systems scheduling. To balance the load efficiently, Artificial Bee Colony (ABC) has been applied as the local search in the proposed memetic algorithm. The proposed method has been compared to an existing memetic-based approach in which the Learning Automata method has been used as the local search. The results demonstrated that the proposed method outperforms the above-mentioned method in terms of communication cost.
The distribution of dark matter in the A2256 cluster
NASA Technical Reports Server (NTRS)
Henry, J. Patrick; Briel, Ulrich G.; Nulsen, Paul E. J.
1993-01-01
Using spatially resolved X-ray spectroscopy, it was determined that the X-ray emitting gas in the rich cluster A2256 is nearly isothermal to a radius of at least 0.76/h Mpc, or about three core radii. These data can be used to measure the distribution of the dark matter in the cluster. It was found that the total masses interior to 0.76/h Mpc and 1.5/h Mpc are (0.5 ± 0.1) × 10^15/h and (1.0 ± 0.5) × 10^15/h solar masses, respectively, where the errors encompass the full range allowed by all models used. Thus, the mass appropriate to the region where spectral information was obtained is well determined, but the uncertainties become large upon extrapolating beyond that region. It is shown that the galaxy orbits are mildly anisotropic, which may cause the beta discrepancy in this cluster.
Clustering-based robust three-dimensional phase unwrapping algorithm.
Arevalillo-Herráez, Miguel; Burton, David R; Lalor, Michael J
2010-04-01
Relatively recent techniques that produce phase volumes have motivated the study of three-dimensional (3D) unwrapping algorithms that inherently incorporate the third dimension into the process. We propose a novel 3D unwrapping algorithm that can be considered to be a generalization of the minimum spanning tree (MST) approach. The technique combines characteristics of some of the most robust existing methods: it uses a quality map to guide the unwrapping process, a region growing mechanism to progressively unwrap the signal, and also cut surfaces to avoid error propagation. The approach has been evaluated in the context of noncontact measurement of dynamic objects, suggesting a better performance than MST-based approaches. PMID:20357860
The Gas Distribution in the Outer Regions of Galaxy Clusters
NASA Technical Reports Server (NTRS)
Eckert, D.; Vazza, F.; Ettori, S.; Molendi, S.; Nagai, D.; Lau, E. T.; Roncarelli, M.; Rossetti, M.; Snowden, L.; Gastaldello, F.
2012-01-01
Aims. We present our analysis of a local (z = 0.04 - 0.2) sample of 31 galaxy clusters with the aim of measuring the density of the X-ray emitting gas in cluster outskirts. We compare our results with numerical simulations to set constraints on the azimuthal symmetry and gas clumping in the outer regions of galaxy clusters. Methods. We have exploited the large field-of-view and low instrumental background of ROSAT/PSPC to trace the density of the intracluster gas out to the virial radius. We stacked the density profiles to detect a signal beyond r(sub 200) and measured the typical density and scatter in cluster outskirts. We also computed the azimuthal scatter of the profiles with respect to the mean value to look for deviations from spherical symmetry. Finally, we compared our average density and scatter profiles with the results of numerical simulations. Results. As opposed to some recent Suzaku results, and confirming previous evidence from ROSAT and Chandra, we observe a steepening of the density profiles beyond approximately r(sub 500). Comparing our density profiles with simulations, we find that non-radiative runs predict density profiles that are too steep, whereas runs including additional physics and/or treating gas clumping agree better with the observed gas distribution. We report high-confidence detection of a systematic difference between cool-core and non-cool-core clusters beyond approximately 0.3r(sub 200), which we explain by a different distribution of the gas in the two classes. Beyond approximately r(sub 500), galaxy clusters deviate significantly from spherical symmetry, with only small differences between relaxed and disturbed systems. We find good agreement between the observed and predicted scatter profiles, but only when the 1% densest clumps are filtered out in the ENZO simulations. Conclusions. Comparing our results with numerical simulations, we find that non-radiative simulations fail to reproduce the gas distribution, even well outside
The Gas Distribution in Galaxy Cluster Outer Regions
NASA Technical Reports Server (NTRS)
Eckert, D.; Vazza, F.; Ettori, S.; Molendi, S.; Nagai, D.; Laue, E. T.; Roncarelli, M.; Rossetti, M.; Snowden, S. L.; Gastaldello, F.
2012-01-01
Aims. We present the analysis of a local (z = 0.04 - 0.2) sample of 31 galaxy clusters with the aim of measuring the density of the X-ray emitting gas in cluster outskirts. We compare our results with numerical simulations to set constraints on the azimuthal symmetry and gas clumping in the outer regions of galaxy clusters. Methods. We exploit the large field-of-view and low instrumental background of ROSAT/PSPC to trace the density of the intracluster gas out to the virial radius. We stack the density profiles to detect a signal beyond r200 and measure the typical density and scatter in cluster outskirts. We also compute the azimuthal scatter of the profiles with respect to the mean value to look for deviations from spherical symmetry. Finally, we compare our average density and scatter profiles with the results of numerical simulations. Results. As opposed to some recent Suzaku results, and confirming previous evidence from ROSAT and Chandra, we observe a steepening of the density profiles beyond approximately r(sub 500). Comparing our density profiles with simulations, we find that non-radiative runs predict density profiles that are too steep, whereas runs including additional physics and/or treating gas clumping are in better agreement with the observed gas distribution. We report for the first time the high-confidence detection of a systematic difference between cool-core and non-cool-core clusters beyond 0.3r(sub 200), which we explain by a different distribution of the gas in the two classes. Beyond r(sub 500), galaxy clusters deviate significantly from spherical symmetry, with only small differences between relaxed and disturbed systems. We find good agreement between the observed and predicted scatter profiles, but only when the 1% densest clumps are filtered out in the simulations. Conclusions. Comparing our results with numerical simulations, we find that non-radiative simulations fail to reproduce the gas distribution, even well outside cluster
A Game Theory Algorithm for Intra-Cluster Data Aggregation in a Vehicular Ad Hoc Network
Chen, Yuzhong; Weng, Shining; Guo, Wenzhong; Xiong, Naixue
2016-01-01
Vehicular ad hoc networks (VANETs) play an important role in urban management and planning. The effective integration of vehicle information in VANETs is critical to traffic analysis, large-scale vehicle route planning and intelligent transportation scheduling. However, given the limited precision of the output of a single sensor and the difficulty of sharing information among sensors in a highly dynamic VANET, effective data aggregation in VANETs remains a challenge. Moreover, current studies have mainly focused on data aggregation in large-scale environments but have rarely discussed intra-cluster data aggregation in VANETs. In this study, we propose a multi-player game theory algorithm for intra-cluster data aggregation in VANETs by analyzing the competitive and cooperative relationships among sensor nodes. Several sensor-centric metrics are proposed to measure the data redundancy and stability of a cluster. We then study the utility function to achieve efficient intra-cluster data aggregation by considering both data redundancy and cluster stability. In particular, we prove the existence of a unique Nash equilibrium in the game model, and conduct extensive experiments to validate the proposed algorithm. Results demonstrate that the proposed algorithm has advantages over typical data aggregation algorithms in both accuracy and efficiency. PMID:26907272
Node Non-Uniform Deployment Based on Clustering Algorithm for Underwater Sensor Networks
Jiang, Peng; Liu, Jun; Wu, Feng
2015-01-01
A clustering-based non-uniform node deployment algorithm for underwater sensor networks (UWSNs) is proposed in this study. It is motivated by the difficulty that existing non-uniform node deployment algorithms have in optimizing network connectivity rate and network lifetime while also improving the network coverage rate of UWSNs. A high network connectivity rate is achieved by determining heterogeneous communication ranges for nodes during node clustering. Moreover, the concept of aggregate contribution degree is defined, and the nodes with lower aggregate contribution degrees are used to replace dying nodes, which decreases the total movement distance of nodes and prolongs the network lifetime. Simulation results show that the proposed algorithm achieves a better network coverage rate and network connectivity rate, while also decreasing the total movement distance of nodes and prolonging the network lifetime. PMID:26633408
The Development of FPGA-Based Pseudo-Iterative Clustering Algorithms
NASA Astrophysics Data System (ADS)
Drueke, Elizabeth; Fisher, Wade; Plucinski, Pawel
2016-03-01
The Large Hadron Collider (LHC) in Geneva, Switzerland, is set to undergo major upgrades in 2025 in the form of the High-Luminosity Large Hadron Collider (HL-LHC). In particular, several hardware upgrades are proposed for the ATLAS detector, one of the two general-purpose detectors. These hardware upgrades include, but are not limited to, a new hardware-level clustering algorithm, to be performed by a field programmable gate array, or FPGA. In this study, we develop that clustering algorithm and compare its output to that of a Python-implemented topoclustering algorithm developed at the University of Oregon. Here, we present the agreement between the FPGA output and the expected output, with particular attention to the time required by the FPGA to complete the algorithm and other limitations set by the FPGA itself.
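The abstract compares the FPGA output against a Python topoclustering reference. A minimal sketch of the generic seed/grow/boundary idea behind topological clustering is given here for orientation. The 2-D grid, 4-connectivity, and the thresholds of 4, 2 and 0 (in units of cell noise) are assumptions for illustration; this is neither the FPGA algorithm nor the Oregon implementation.

```python
from collections import deque


def topocluster(sig, seed_t=4.0, grow_t=2.0, cell_t=0.0):
    """Group grid cells into clusters with a seed/grow/boundary threshold scheme.

    sig[r][c] is assumed to be the cell's signal-to-noise ratio |E|/sigma.
    Returns a list of clusters, each a set of (row, col) tuples.
    """
    rows, cols = len(sig), len(sig[0])
    owner = {}  # cell -> cluster index; each cell joins at most one cluster
    # seed cells, strongest first
    seeds = sorted(
        ((r, c) for r in range(rows) for c in range(cols) if abs(sig[r][c]) > seed_t),
        key=lambda rc: -abs(sig[rc[0]][rc[1]]),
    )
    clusters = []
    for seed in seeds:
        if seed in owner:  # already absorbed while growing from a stronger seed
            continue
        idx = len(clusters)
        cluster = {seed}
        owner[seed] = idx
        frontier = deque([seed])
        while frontier:
            r, c = frontier.popleft()
            for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if not (0 <= nr < rows and 0 <= nc < cols) or (nr, nc) in owner:
                    continue
                s = abs(sig[nr][nc])
                if s > grow_t:
                    frontier.append((nr, nc))  # growth cell: keep expanding from it
                elif s <= cell_t:
                    continue  # below the boundary threshold: not part of any cluster
                # growth and boundary cells both join; boundary cells are not expanded
                owner[(nr, nc)] = idx
                cluster.add((nr, nc))
        clusters.append(cluster)
    return clusters
```

Because any seed reached through growth cells is absorbed into the cluster being grown, touching seeds end up merged into one cluster, which is the behaviour a hardware version would also have to reproduce.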
An Efficient Method of Key-Frame Extraction Based on a Cluster Algorithm
Zhang, Qiang; Yu, Shao-Pei; Zhou, Dong-Sheng; Wei, Xiao-Peng
2013-01-01
This paper proposes a novel method of key-frame extraction for use with motion capture data. This method is based on an unsupervised cluster algorithm. First, the motion sequence is clustered into two classes by the similarity distance of the adjacent frames so that the thresholds needed in the next step can be determined adaptively. Second, a dynamic cluster algorithm called ISODATA is used to cluster all the frames and the frames nearest to the center of each class are automatically extracted as key-frames of the sequence. Unlike many other clustering techniques, the present improved cluster algorithm can automatically address different motion types without any need for specified parameters from users. The proposed method is capable of summarizing motion capture data reliably and efficiently. The present work also provides a meaningful comparison between the results of the proposed key-frame extraction technique and other previous methods. These results are evaluated in terms of metrics that measure reconstructed motion and the mean absolute error value, which are derived from the reconstructed data and the original data. PMID:24511336
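The final step described above, taking the frame nearest each cluster centre as a key-frame, can be sketched independently of ISODATA. In this minimal version, frames are plain pose vectors, the labels may come from any clustering step, and Euclidean distance is assumed; the function name `key_frames` is illustrative.

```python
import math


def key_frames(frames, labels):
    """Return, for each cluster, the index of the frame closest to its centroid.

    frames: list of equal-length pose vectors (one per motion-capture frame).
    labels: cluster label of each frame (e.g. from ISODATA or any other clusterer).
    """
    members = {}
    for i, lab in enumerate(labels):
        members.setdefault(lab, []).append(i)
    keys = []
    for idx in members.values():
        dim = len(frames[idx[0]])
        centroid = [sum(frames[i][d] for i in idx) / len(idx) for d in range(dim)]
        # the frame nearest the centroid represents the whole cluster
        keys.append(min(idx, key=lambda i: math.dist(frames[i], centroid)))
    return sorted(keys)  # key-frames in temporal order
```

For example, with one-dimensional "poses" `[[0.0], [1.0], [2.0], [10.0], [11.0], [13.0]]` split into two clusters of three, the selected key-frames are indices 1 and 4.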
Tiganj, Z; Mboup, M
2011-12-01
In this paper, we propose a simple and straightforward algorithm for neural spike sorting. The algorithm is based on the observation that the distribution of a neural signal deviates strongly from the uniform distribution and is rather unimodal. The detected spikes to be sorted are first processed with a feature extraction technique, such as PCA, and then represented in a reduced-dimension space by keeping only the few most important features. This space is then filtered to emphasize the differences between the centers and the borders of the clusters. Using prior knowledge of the lowest level of activity of a neuron, such as the minimal firing rate, we find the number of clusters and the center of each cluster. The spikes are then sorted using a simple greedy algorithm that grabs the nearest neighbors. We have tested the proposed algorithm on real extracellular recordings and used simultaneous intracellular recordings to verify the results of the sorting. The results suggest that the algorithm is robust and reliable and compares favorably with state-of-the-art approaches. The proposed algorithm tends to be conservative; it is simple to implement and is thus suitable for both research and clinical applications as an interesting alternative to more sophisticated approaches. PMID:22064910
Quantum cluster algorithm for frustrated Ising models in a transverse field
NASA Astrophysics Data System (ADS)
Biswas, Sounak; Rakala, Geet; Damle, Kedar
2016-06-01
Working within the stochastic series expansion framework, we introduce and characterize a plaquette-based quantum cluster algorithm for quantum Monte Carlo simulations of transverse field Ising models with frustrated Ising exchange interactions. As a demonstration of the capabilities of this algorithm, we show that a relatively small ferromagnetic next-nearest-neighbor coupling drives the transverse field Ising antiferromagnet on the triangular lattice from an antiferromagnetic three-sublattice ordered state at low temperature to a ferrimagnetic three-sublattice ordered state.
Du, Tingsong; Hu, Yang; Ke, Xianting
2015-01-01
An improved quantum artificial fish swarm algorithm (IQAFSA) is proposed in this work for solving distributed network programming problems that take distributed generation into account. Building on quantum computing, which offers exponential acceleration for heuristic algorithms, IQAFSA encodes each artificial fish with quantum bits and updates the fish through a quantum rotation gate, preying behavior, following behavior, and variation of the quantum artificial fish while searching for the optimal value. We then apply the proposed algorithm, the quantum artificial fish swarm algorithm (QAFSA), the basic artificial fish swarm algorithm (BAFSA), and the global edition artificial fish swarm algorithm (GAFSA) to simulation experiments on several typical test functions. The simulation results demonstrate that the proposed algorithm escapes local extrema effectively and has higher convergence speed and better accuracy. Finally, IQAFSA is applied to distributed network problems; simulation results for a 33-bus radial distribution network show that IQAFSA achieves the minimum power loss when compared with BAFSA, GAFSA, and QAFSA. PMID:26447713
Distributed-memory Parallel Algorithms for Matching and Coloring
Catalyurek, Umit; Dobrian, Florin; Gebremedhin, Assefaw H.; Halappanavar, Mahantesh; Pothen, Alex
2011-05-31
Graph matching and coloring constitute two fundamental classes of combinatorial problems having numerous established as well as emerging applications in computational science and engineering, high-performance computing, and informatics. We provide a snapshot of ongoing work on the design and implementation of new highly scalable distributed-memory parallel algorithms for two prototypical problems from these classes, edge-weighted matching and distance-1 vertex coloring. Graph algorithms in general have low concurrency and poor data locality, making it challenging to achieve scalability on massively parallel machines. We overcome this challenge by employing a variety of techniques in concert, including approximation, speculation and iteration, optimized communication, and randomization. We present preliminary results on weak and strong scalability studies conducted on an IBM Blue Gene/P machine employing up to tens of thousands of processors. The results show that the algorithms hold strong potential for computing at petascale.
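The speculation-and-iteration idea mentioned above can be illustrated with a serial simulation of round-based distance-1 coloring. In each round, every pending vertex picks the smallest color not used by its neighbours in the previous round's snapshot; then, of each pair of same-colored neighbours, the vertex with the larger id is sent back for another round. This is a sketch of the general pattern under those assumptions, not the authors' distributed-memory implementation; a real parallel version would typically exchange only the colors of vertices on processor boundaries.

```python
def speculative_coloring(adj):
    """Serially simulate round-based speculative distance-1 vertex coloring.

    adj: dict mapping each vertex (comparable, e.g. int) to a set of neighbours.
    Returns (colors, rounds).
    """
    colors = {v: None for v in adj}
    pending = sorted(adj)
    rounds = 0
    while pending:
        rounds += 1
        snapshot = dict(colors)  # all pending vertices see only last round's colors
        for v in pending:
            forbidden = {snapshot[u] for u in adj[v] if snapshot[u] is not None}
            c = 0
            while c in forbidden:
                c += 1
            colors[v] = c  # tentative (speculative) color
        # conflict detection: the larger id of two same-colored neighbours recolors
        pending = [v for v in pending
                   if any(colors[u] == colors[v] and u < v for u in adj[v])]
    return colors, rounds
```

Progress is guaranteed because the smallest pending vertex never has a smaller conflicting neighbour, so at least one vertex is finalized per round.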
Durrell, Patrick R.; Accetta, Katharine; Côté, Patrick; Blakeslee, John P.; Ferrarese, Laura; McConnachie, Alan; Gwyn, Stephen; Peng, Eric W.; Zhang, Hongxin; Mihos, J. Christopher; Puzia, Thomas H.; Jordán, Andrés; Lançon, Ariane; Liu, Chengze; Cuillandre, Jean-Charles; Boissier, Samuel; Boselli, Alessandro; Courteau, Stéphane; Duc, Pierre-Alain; and others
2014-10-20
We report on a large-scale study of the distribution of globular clusters (GCs) throughout the Virgo cluster, based on photometry from the Next Generation Virgo Cluster Survey (NGVS), a large imaging survey covering Virgo's primary subclusters (Virgo A = M87 and Virgo B = M49) out to their virial radii. Using the g′_0, (g′ − i′)_0 color-magnitude diagram of unresolved and marginally resolved sources within the NGVS, we have constructed two-dimensional maps of the (irregular) GC distribution over 100 deg² to a depth of g′_0 = 24. We present the clearest evidence to date showing the difference in concentration between red and blue GCs over the full extent of the cluster, where the red (more metal-rich) GCs are largely located around the massive early-type galaxies in Virgo, while the blue (metal-poor) GCs have a much more extended spatial distribution with significant populations still present beyond 83′ (∼215 kpc) along the major axes of both M49 and M87. A comparison of our GC maps to the diffuse light in the outermost regions of M49 and M87 shows remarkable agreement in the shape, ellipticity, and boxiness of both luminous systems. We also find evidence for spatial enhancements of GCs surrounding M87 that may be indicative of recent interactions or an ongoing merger history. We compare the GC map to that of the locations of Virgo galaxies and the X-ray intracluster gas, and find generally good agreement between these various baryonic structures. We calculate that the Virgo cluster contains a total population of N_GC = 67,300 ± 14,400, of which 35% are located in M87 and M49 alone. For the first time, we compute a cluster-wide specific frequency S_N,CL = 2.8 ± 0.7, after correcting for Virgo's diffuse light. We also find a GC-to-baryonic mass fraction ε_b = (5.7 ± 1.1) × 10⁻⁴ and a GC-to-total cluster mass formation efficiency ε_t = (2.9 ± 0.5) × 10⁻⁵, the latter values
Muster: Massively Scalable Clustering
Energy Science and Technology Software Center (ESTSC)
2010-05-20
Muster is a framework for scalable cluster analysis. It includes implementations of classic K-Medoids partitioning algorithms, as well as infrastructure for making these algorithms run scalably on very large systems. In particular, Muster contains algorithms such as CAPEK (described in reference 1) that are capable of clustering highly distributed data sets in-place on a hundred thousand or more processes.
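For readers unfamiliar with the K-Medoids family that Muster implements, a minimal serial sketch follows. It uses the simple alternating (Voronoi-iteration) variant on a precomputed distance matrix, which is an assumption for illustration: it is neither PAM's swap search nor the parallel CAPEK algorithm, and the function and parameter names are hypothetical, not Muster's API.

```python
def k_medoids(dist, k, init=None, max_iter=100):
    """Alternating k-medoids on a precomputed n x n distance matrix.

    init: optional list of k starting medoid indices (defaults to the first k
    points, purely for determinism). Returns (medoids, labels).
    """
    n = len(dist)
    medoids = list(init) if init is not None else list(range(k))

    def assign(meds):
        # each point joins the medoid it is closest to
        return [min(range(k), key=lambda j: dist[i][meds[j]]) for i in range(n)]

    for _ in range(max_iter):
        labels = assign(medoids)
        new = []
        for j in range(k):
            members = [i for i in range(n) if labels[i] == j]
            if not members:  # empty cluster: keep the old medoid
                new.append(medoids[j])
                continue
            # the member with the smallest total distance to the others becomes the medoid
            new.append(min(members, key=lambda m: sum(dist[m][i] for i in members)))
        if new == medoids:
            break
        medoids = new
    return medoids, assign(medoids)
```

Unlike k-means, medoids are always actual data points and only pairwise distances are needed, which is what makes the method usable for arbitrary dissimilarity measures.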
The mass distribution of the Fornax dSph: constraints from its globular cluster distribution
NASA Astrophysics Data System (ADS)
Cole, David R.; Dehnen, Walter; Read, Justin I.; Wilkinson, Mark I.
2012-10-01
Uniquely among the dwarf spheroidal (dSph) satellite galaxies of the Milky Way, Fornax hosts globular clusters. It remains a puzzle as to why dynamical friction has not yet dragged any of Fornax's five globular clusters to the centre, and also why there is no evidence that any similar star cluster has been in the past (for Fornax or any other tidally undisrupted dSph). We set up a suite of 2800 N-body simulations that sample the full range of globular cluster orbits and mass models consistent with all existing observational constraints for Fornax. In agreement with previous work, we find that if Fornax has a large dark matter core, then its globular clusters remain close to their currently observed locations for long times. Furthermore, we find previously unreported behaviour for clusters that start inside the core region. These are pushed out of the core and gain orbital energy, a process we call 'dynamical buoyancy'. Thus, a cored mass distribution in Fornax will naturally lead to a shell-like globular cluster distribution near the core radius, independent of the initial conditions. By contrast, cold dark matter-type cusped mass distributions lead to the rapid infall of at least one cluster within Δt = 1-2 Gyr, except when picking unlikely initial conditions for the cluster orbits (~2 per cent probability), and almost all clusters within Δt = 10 Gyr. Alternatively, if Fornax has only a weakly cusped mass distribution, then dynamical friction is much reduced. While over Δt = 10 Gyr this still leads to the infall of one to four clusters from their present orbits, the infall of any cluster within Δt = 1-2 Gyr is much less likely (with probability 0-70 per cent, depending on Δt and the strength of the cusp). Such a solution to the timing problem requires (in addition to a shallow dark matter cusp) that in the past the globular clusters were somewhat further from Fornax than today; they most likely did not form within Fornax, but were accreted.
Improving permafrost distribution modelling using feature selection algorithms
NASA Astrophysics Data System (ADS)
Deluigi, Nicola; Lambiel, Christophe; Kanevski, Mikhail
2016-04-01
The availability of an increasing amount of spatial data on the occurrence of mountain permafrost allows the use of machine learning (ML) classification algorithms for modelling the distribution of the phenomenon. One of the major problems when dealing with high-dimensional datasets is the number of input features (variables) involved. Applying ML classification algorithms to a large number of variables risks overfitting and, consequently, poor generalization/prediction. For this reason, applying feature selection (FS) techniques helps reduce the number of factors required and improves knowledge of the adopted features and their relation to the studied phenomenon. Moreover, removing irrelevant or redundant variables from the dataset effectively improves the quality of the ML prediction. This research presents a comparative analysis of permafrost distribution models supported by FS variable importance assessment. The input dataset (dimension = 20-25, 10 m spatial resolution) was constructed using landcover maps, climate data and DEM-derived variables (altitude, aspect, slope, terrain curvature, solar radiation, etc.). It was completed with permafrost evidence (geophysical and thermal data and rock glacier inventories) that serves as permafrost training data. The FS algorithms identified the variables that appeared statistically less important for permafrost presence/absence. Three different algorithms were compared: Information Gain (IG), Correlation-based Feature Selection (CFS) and Random Forest (RF). IG is a filter technique that evaluates the worth of a predictor by measuring the information gain with respect to permafrost presence/absence. Conversely, CFS is a wrapper technique that evaluates the worth of a subset of predictors by considering the individual predictive ability of each variable along with the degree of redundancy between them. Finally, RF is an ML algorithm that performs FS as part of its
Solvation Effects on Structure and Charge Distribution in Anionic Clusters
NASA Astrophysics Data System (ADS)
Weber, J. Mathias
2015-03-01
The interaction of ions with solvent molecules modifies the properties of both solvent and solute. Solvation generally stabilizes compact charge distributions compared to more diffuse ones. In the most extreme cases, solvation will alter the very composition of the ion itself. We use infrared photodissociation spectroscopy of mass-selected ions to probe how solvation affects the structures and charge distributions of metal-CO2 cluster anions. We gratefully acknowledge the National Science Foundation for funding through Grant CHE-0845618 (for graduate student support) and for instrumentation funding through Grant PHY-1125844.
Optimizing scheduling problem using an estimation of distribution algorithm and genetic algorithm
NASA Astrophysics Data System (ADS)
Qun, Jiang; Yang, Ou; Dong, Shi-Du
2007-12-01
This paper presents a methodology for using heuristic search methods to optimize a scheduling problem. Specifically, an Estimation of Distribution Algorithm (EDA), Population-Based Incremental Learning (PBIL), and a Genetic Algorithm (GA) have been applied to finding an effective arrangement of university curriculum schedules. To our knowledge, EDAs have been applied to fewer real-world problems than GAs, and the goal of the present paper is to expand the application domain of this technique. The experimental results indicate that PBIL is well suited to optimizing the scheduling problem.
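The abstract names PBIL but does not give the curriculum encoding. The sketch below shows the core PBIL loop on bit strings: sample a population from a probability vector, then shift the vector toward the best sample. A toy OneMax fitness (the count of 1 bits, passed as `sum`) stands in for a real timetable objective; the parameter names and defaults are illustrative assumptions, and the probability-vector mutation step of the original PBIL is omitted.

```python
import random


def pbil(fitness, n_bits, pop_size=50, lr=0.1, generations=100, seed=0):
    """Population-Based Incremental Learning over bit strings.

    fitness: callable mapping a list of 0/1 ints to a number (higher is better).
    Returns (best_bits, best_fitness, probability_vector).
    """
    rng = random.Random(seed)
    p = [0.5] * n_bits  # marginal probability that each bit is 1
    best, best_f = None, float("-inf")
    for _ in range(generations):
        pop = [[1 if rng.random() < pi else 0 for pi in p] for _ in range(pop_size)]
        gen_best = max(pop, key=fitness)
        f = fitness(gen_best)
        if f > best_f:
            best, best_f = gen_best, f
        # move the distribution toward this generation's best sample
        p = [(1 - lr) * pi + lr * bi for pi, bi in zip(p, gen_best)]
    return best, best_f, p


best, best_f, probs = pbil(sum, n_bits=10)  # OneMax toy problem
```

The contrast with a GA is that no population survives between generations; all search state lives in the probability vector `p`.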
An Effective Tri-Clustering Algorithm Combining Expression Data with Gene Regulation Information
Li, Ao; Tuck, David
2009-01-01
Motivation. Bi-clustering algorithms aim to identify sets of genes sharing similar expression patterns across a subset of conditions. However, direct interpretation or prediction of gene regulatory mechanisms may be difficult, as only gene expression data is used. Information about gene regulators may also be available, most commonly about which transcription factors may bind to the promoter region and thus control the expression level of a gene. A method that integrates gene expression and gene regulation information is therefore desirable for clustering and analysis. Methods. By incorporating gene regulatory information with gene expression data, we define regulated expression values (REV) as indicators of how a gene is regulated by a specific factor. Existing bi-clustering methods are extended to a three-dimensional data space by developing a heuristic TRI-Clustering algorithm. An additional approach named the Automatic Boundary Searching algorithm (ABS) is introduced to determine the boundary threshold automatically. Results. Results based on incorporating ChIP-chip data representing transcription factor-gene interactions show that the algorithms are efficient and robust for detecting tri-clusters. Detailed analysis of the tri-cluster extracted from yeast sporulation REV data shows that genes in this cluster exhibited significant differences during the middle and late stages. The implicated regulatory network was then reconstructed for further study of defined regulatory mechanisms. Topological and statistical analysis of this network demonstrated evidence of significant changes of TF activities during the different stages of yeast sporulation, and suggests this approach might be a general way to study regulatory networks undergoing transformations. PMID:19838334
A seed expanding cluster algorithm for deriving upwelling areas on sea surface temperature images
NASA Astrophysics Data System (ADS)
Nascimento, Susana; Casca, Sérgio; Mirkin, Boris
2015-12-01
In this paper, a novel clustering algorithm is proposed as a version of the seeded region growing (SRG) approach for the automatic recognition of coastal upwelling from sea surface temperature (SST) images. The new algorithm, one seed expanding cluster (SEC), takes advantage of the concept of approximate clustering due to Mirkin (1996, 2013) to derive a homogeneity criterion in the form of a product rather than the conventional difference between a pixel value and the mean of values over the region of interest. It involves a boundary-oriented pixel labeling so that the cluster growing is performed by expanding its boundary iteratively. The starting point is a cluster consisting of just one seed, the pixel with the coldest temperature. The baseline version of the SEC algorithm uses Otsu's thresholding method to fine-tune the homogeneity threshold. Unfortunately, this method does not always lead to a satisfactory solution. Therefore, we introduce a self-tuning version of the algorithm in which the homogeneity threshold is locally derived from the approximation criterion over a window around the pixel under consideration. The window serves as a boundary regularizer. These two unsupervised versions of the algorithm have been applied to a set of 28 SST images of the western coast of mainland Portugal, and compared against a supervised version fine-tuned by maximizing the F-measure with respect to manually labeled ground-truth maps. The areas built by the unsupervised versions of the SEC algorithm are significantly coincident with the ground-truth regions in cases where the upwelling area consists of a single continuous fragment of the SST map.
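The abstract contrasts SEC's product-form homogeneity criterion with the conventional difference-from-mean test. For reference, a sketch of that conventional seeded region growing baseline is shown, seeded at the coldest pixel and expanding its boundary iteratively. The fixed tolerance `tol` is an assumption standing in for a threshold such as one chosen by Otsu's method, and SEC's own criterion is deliberately not reproduced here.

```python
def grow_from_coldest(img, tol):
    """Conventional seeded region growing on a 2-D temperature grid.

    Starts at the coldest pixel and, in each pass, absorbs the 4-connected
    boundary pixels whose value differs from the current region mean by at most tol.
    """
    rows, cols = len(img), len(img[0])
    seed = min(((r, c) for r in range(rows) for c in range(cols)),
               key=lambda rc: img[rc[0]][rc[1]])
    region = {seed}
    total = img[seed[0]][seed[1]]
    changed = True
    while changed:
        changed = False
        mean = total / len(region)  # region mean is held fixed during one pass
        # current boundary: pixels outside the region that touch it
        boundary = {(r + dr, c + dc) for r, c in region
                    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                    if 0 <= r + dr < rows and 0 <= c + dc < cols} - region
        for r, c in sorted(boundary):
            if abs(img[r][c] - mean) <= tol:
                region.add((r, c))
                total += img[r][c]
                changed = True
    return region
```

Holding the mean fixed within a pass makes the result independent of the order in which boundary pixels are visited.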
A distributed geo-routing algorithm for wireless sensor networks.
Joshi, Gyanendra Prasad; Kim, Sung Won
2009-01-01
Geographic wireless sensor networks use position information for greedy routing. Greedy routing works well in dense networks, whereas in sparse networks it may fail and require a recovery algorithm. Recovery algorithms help packets escape the communication void. However, these algorithms are generally costly for resource-constrained position-based wireless sensor networks (WSNs). In this paper, we propose a void avoidance algorithm (VAA), a novel idea based on upgrading virtual distance. VAA allows wireless sensor nodes to remove all stuck nodes by transforming the routing graph and forwarding packets using only greedy routing. In VAA, a stuck node upgrades its distance until it finds a next-hop node that is closer to the destination than it is. VAA guarantees packet delivery if there is a topologically valid path. Further, it is completely distributed, responds immediately to node failure or topology changes and does not require planarization of the network. NS-2 is used to evaluate the performance and correctness of VAA, and we compare its performance to other protocols. Simulations show that our proposed algorithm consumes less energy, finds efficient paths and has substantially lower control overhead. PMID:22408514
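The abstract does not spell out VAA's exact upgrade rule. As a hypothetical illustration only, the sketch below borrows the update rule of Learning Real-Time A* (LRTA*): each node's virtual distance starts at its Euclidean distance to the sink, a packet always moves to the neighbour minimizing hop length plus virtual distance, and the current node raises its own value to that minimum. Under this rule a node at the edge of a void stops attracting packets, which mirrors the "upgrade distance" idea in the abstract; it is not presented as VAA itself, and all names are illustrative.

```python
import math


def route(adj, pos, src, dst, virt, max_hops=1000):
    """Greedy forwarding on virtual distances, upgrading a node's value when stuck.

    adj: node -> iterable of neighbours; pos: node -> (x, y).
    virt: node -> virtual distance to dst, mutated in place so later packets
    benefit from earlier upgrades. Returns the list of visited nodes.
    """
    path, cur = [src], src
    while cur != dst and len(path) <= max_hops:
        # cost via a neighbour = hop length + neighbour's virtual distance
        nxt = min(adj[cur], key=lambda n: math.dist(pos[cur], pos[n]) + virt[n])
        cost = math.dist(pos[cur], pos[nxt]) + virt[nxt]
        # upgrade: a local minimum raises its own value so it stops attracting packets
        virt[cur] = max(virt[cur], cost)
        path.append(nxt)
        cur = nxt
    return path
```

In a small example with a dead-end node A between source S and sink T, the first packet detours S, A, S, B, C, D, T; after the upgrades, a second packet goes S, B, C, D, T directly.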
Experimental realization of the Deutsch-Jozsa algorithm with a six-qubit cluster state
Vallone, Giuseppe; Donati, Gaia; Bruno, Natalia; Chiuri, Andrea; Mataloni, Paolo
2010-05-15
We describe an experimental realization of the Deutsch-Jozsa quantum algorithm to evaluate the properties of a two-bit Boolean function in the framework of one-way quantum computation. For this purpose, a two-photon six-qubit cluster state was engineered. Its peculiar topological structure is the basis of the original measurement pattern allowing the algorithm realization. The good agreement of the experimental results with the theoretical predictions, obtained at ~1 kHz success rate, demonstrates the correct implementation of the algorithm.
Dark matter distribution in the merging cluster Abell 2163
NASA Astrophysics Data System (ADS)
Soucail, G.
2012-04-01
Context. The cluster Abell 2163 is a merging system of several subclusters with complex dynamics. It presents exceptional X-ray properties (high temperature and luminosity), suggesting that it is a very massive cluster. Recent 2D analysis of the gas distribution has revealed a complex and multiphase structure. Aims: This paper presents a wide-field weak lensing study of the dark matter distribution in the cluster in order to provide an alternative view of the merging status of the cluster. The 2D mass distribution was built and compared to the galaxy and gas distributions. Methods: A Bayesian method, implemented in the Im2shape software, was used to fit the shape parameters of the faint background galaxies and to correct for PSF smearing. A careful color selection on the background galaxies was applied to retrieve the weak lensing signal. Shear signal was measured out to more than 2 Mpc (≃12' from the center). The radial shear profile was fit with different parametric mass profiles. The 2D mass map was built from the shear distribution and used to identify the different mass components. Results: The 2D mass map agrees with the galaxy distribution, while the total mass inferred from weak lensing shows a strong discrepancy with the X-ray deduced mass. Regardless of the method used, the virial mass M200 falls in the range 8 to 14 × 10¹⁴ h₇₀⁻¹ M⊙, about half the mass deduced from X-rays. The central mass clump appears bimodal in the dark matter distribution, with a mass ratio ~3:1 between the two components. The infalling clump A2163-B is detected in weak lensing as an independent entity. All these results are interpreted in the context of a multiple merger seen less than 1 Gyr after the main crossover. Based on observations obtained with MegaPrime/MegaCam, a joint project of Canada-France-Hawaii Telescope (CFHT) and CEA/DAPNIA, at the Canada-France-Hawaii Telescope (CFHT) which is operated by the National Research Council (NRC) of
Spitzer IR Colors and ISM Distributions of Virgo Cluster Spirals
NASA Astrophysics Data System (ADS)
Kenney, Jeffrey D.; Wong, I.; Kenney, Z.; Murphy, E.; Helou, G.; Howell, J.
2012-01-01
IRAC infrared images of 44 spiral and peculiar galaxies from the Spitzer Survey of the Virgo Cluster help reveal the interactions which transform galaxies in clusters. We explore how the location of galaxies in the IR 3.6-8μm color-magnitude diagram is related to the spatial distributions of ISM/star formation, as traced by PAH emission in the 8μm band. Based on their 8μm/PAH radial distributions, we divide the galaxies into 4 groups: normal, truncated, truncated/compact, and anemic. Normal galaxies have relatively normal PAH distributions. They are the "bluest" galaxies, with the largest 8/3.6μm ratios. They are relatively unaffected by the cluster environment, and have probably never passed through the cluster core. Truncated galaxies have a relatively normal 8μm/PAH surface brightness in the inner disk, but are abruptly truncated with little or no emission in the outer disk. They have intermediate ("green") colors, while those which are more severely truncated are "redder". Most truncated galaxies have undisturbed stellar disks and many show direct evidence of active ram pressure stripping. Truncated/compact galaxies have high 8μm/PAH surface brightness in the very inner disk (central 1 kpc) but are abruptly truncated close to the center with little or no emission in the outer disk. They have intermediate global colors, similar to the other truncated galaxies. While they have the most extreme ISM truncation, they have vigorous circumnuclear star formation. Most of these have disturbed stellar disks, and they are probably produced by a combination of gravitational interaction plus ram pressure stripping. Anemic galaxies have a low 8μm/PAH surface brightness even in the inner disk. These are the "reddest" galaxies, with the smallest 8/3.6μm ratios. The origin of the anemics seems to be a combination of starvation, gravitational interactions, and long-ago ram pressure stripping.
A matching algorithm for the distribution of human pancreatic islets
Qian, Dajun; Kaddis, John; Niland, Joyce C.
2011-01-01
The success of human pancreatic islet transplantation in a subset of type 1 diabetic patients has led to an increased demand for this tissue in both clinical and basic research, yet the availability of such preparations is limited and their quality highly variable. Under the current process of islet distribution for basic science experimentation nationwide, specialized laboratories attempt to distribute islets to one or more scientists based on a list of known investigators. This Local Decision Making (LDM) process has been found to be ineffective and suboptimal. To alleviate these problems, a computerized Matching Algorithm for Islet Distribution (MAID) was developed to better match the functional, morphological, and quality characteristics of islet preparations to the criteria desired by basic research laboratories, i.e. requesters. The algorithm searches for an optimal combination of requesters using detailed screening, sorting, and search procedures. When applied to a data set of 68 human islet preparations distributed by the Islet Cell Resource (ICR) Center Consortium, MAID reduced the number of requesters that a) did not receive any islets, and b) received mis-matched shipments. These results suggest that MAID is an improved, more efficient approach to the centralized distribution of human islets within a consortium setting. PMID:22199413
A new distribution vector and its application in genome clustering.
Zhao, Bo; He, Rong L; Yau, Stephen S-T
2011-05-01
In this paper we report a novel mathematical method to transform DNA sequences into distribution vectors, which correspond to points in sixty-dimensional space. Each component of the distribution vector represents the distribution of one kind of nucleotide in k segments of the DNA sequence. The mathematical and statistical properties of the distribution vectors are demonstrated and examined with huge datasets of human DNA sequences and random sequences. The derived expectation and standard deviation make the mapping stable and practical. Moreover, we apply the distribution vectors to the clustering of the Haemagglutinin (HA) gene of 60 H1N1 viruses from Human, Swine and Avian, the complete mitochondrial genomes from 80 placental mammals and the complete genomes from 50 bacteria. The 60 H1N1 viruses, 80 placental mammals and 50 bacteria are classified accurately, and more rapidly than with multiple sequence alignment methods. The results indicate that the distribution vectors can reveal the similarity and evolutionary relationship among homologous DNA sequences based on the distances between any two of these distribution vectors. Because they are fast to compute, distribution vectors can handle huge numbers of DNA sequences efficiently. PMID:21385621
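A minimal Python sketch of one way to build such a vector. It assumes (this is not the paper's exact definition) that each sequence is cut into k = 15 equal segments and that each component is the fraction of one nucleotide's occurrences that fall in one segment, giving 4 × 15 = 60 components. Sequences are then compared by Euclidean distance between their vectors. The function names are hypothetical.

```python
import math

def distribution_vector(seq, k=15):
    """Hypothetical 4*k-dimensional vector: for each base, its share in each of k segments."""
    seq = seq.upper()
    n = len(seq)
    if n < k:
        raise ValueError("sequence is shorter than the number of segments")
    vec = []
    for base in "ACGT":
        total = seq.count(base)
        for i in range(k):
            start, end = i * n // k, (i + 1) * n // k
            # fraction of this base's occurrences inside segment i (0.0 if the base is absent)
            vec.append(seq[start:end].count(base) / total if total else 0.0)
    return vec

def euclidean(u, v):
    """Distance used to compare two sequences through their vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
```

A pairwise distance matrix built with `euclidean` can then be passed to any standard clustering or tree-building routine.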
Tame, M. S.; Kim, M. S.
2010-09-15
We show that fundamental versions of the Deutsch-Jozsa and Bernstein-Vazirani quantum algorithms can be performed using a small entangled cluster state resource of only six qubits. We then investigate the minimal resource states needed to demonstrate general n-qubit versions and a scalable method to produce them. For this purpose, we propose a versatile photonic on-chip setup.
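The cluster-state scheme itself is not reproduced here. As a point of reference, the following Python sketch classically simulates the standard circuit-model versions of the two algorithms: a Hadamard layer, a phase oracle (-1)^f(x), and a second Hadamard layer leave amplitude (1/2^n) Σ_x (-1)^(f(x) + x·y) on basis state |y⟩. The function names are illustrative only.

```python
def hadamard_oracle_hadamard(f, n):
    """Amplitudes of H^n O_f H^n |0...0>, where O_f applies the phase (-1)^f(x)."""
    size = 1 << n
    amps = []
    for y in range(size):
        total = 0
        for x in range(size):
            parity = bin(x & y).count("1") % 2  # x . y mod 2
            total += 1 if (f(x) + parity) % 2 == 0 else -1
        amps.append(total / size)
    return amps

def deutsch_jozsa_is_constant(f, n):
    # A constant f puts amplitude +/-1 on |0...0>; a balanced f puts exactly 0 there.
    return abs(hadamard_oracle_hadamard(f, n)[0]) == 1.0

def bernstein_vazirani(f, n):
    # For f(x) = s . x mod 2, all amplitude ends up on |s>, so s is the peak index.
    amps = hadamard_oracle_hadamard(f, n)
    return max(range(len(amps)), key=lambda y: abs(amps[y]))
```

For example, with n = 3 and f(x) = s·x mod 2 for s = 5, `bernstein_vazirani` returns 5 after a single (simulated) oracle query.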
Borodovsky, M; Peresetsky, A
1994-09-01
Non-homogeneous Markov chain models can represent biologically important regions of DNA sequences. The statistical pattern described by these models is usually weak and was found primarily because of strong biological indications. This paper presents a general method for extracting such patterns. The algorithm incorporates cluster analysis, multiple alignment and entropy minimization. The method was first tested on a set of DNA sequences produced by Markov chain generators. It was shown that artificial gene sequences, initially placed at random positions along the multiple alignment panels, become aligned according to the hidden triplet phase. The method was then applied to real protein-coding sequences, and the resulting alignment clearly indicated the triplet phase and produced the parameters of the optimal 3-periodic non-homogeneous Markov chain model. These Markov models have already been employed in the GeneMark gene prediction algorithm, which is used in genome sequencing projects. The algorithm can also handle the case in which the sequences to be aligned reveal different statistical patterns, such as Escherichia coli protein-coding sequences belonging to Class II and Class III. The algorithm accepts a random mix of sequences from different classes and is able to separate them into two groups (clusters), align each cluster separately, and define a non-homogeneous Markov chain model for each sequence cluster. PMID:7952897
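A simplified Python sketch of the 3-periodic idea. A first-order chain whose transition probabilities depend on the codon position of the current base is trained on sequences of known frame. It is then used to pick the phase that maximizes the log-likelihood of a new sequence. The first-order assumption, the pseudocount, and the function names are illustrative choices. This is not the GeneMark implementation, and the paper's combined clustering, alignment, and entropy-minimization procedure is not shown.

```python
import math
from itertools import product

BASES = "ACGT"

def train_periodic_chain(seqs, pseudocount=1.0):
    """Estimate P(b | a, p) for a first-order 3-periodic chain (uppercase ACGT only).

    p is the codon position (0, 1, 2) of the current base b; training sequences
    are assumed to start at codon position 0.
    """
    counts = [{(a, b): pseudocount for a, b in product(BASES, BASES)} for _ in range(3)]
    for s in seqs:
        for i in range(1, len(s)):
            counts[i % 3][(s[i - 1], s[i])] += 1
    probs = []
    for p in range(3):
        table = {}
        for a in BASES:
            row = sum(counts[p][(a, b)] for b in BASES)
            for b in BASES:
                table[(a, b)] = counts[p][(a, b)] / row
        probs.append(table)
    return probs

def log_likelihood(seq, probs, phase):
    """Log-likelihood of seq if its first base sits at codon position `phase`."""
    return sum(math.log(probs[(phase + i) % 3][(seq[i - 1], seq[i])])
               for i in range(1, len(seq)))

def best_phase(seq, probs):
    """Phase (0, 1 or 2) under which the periodic chain explains seq best."""
    return max(range(3), key=lambda ph: log_likelihood(seq, probs, ph))
```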
Solving the depth of the repeated texture areas based on the clustering algorithm
NASA Astrophysics Data System (ADS)
Xiong, Zhang; Zhang, Jun; Tian, Jinwen
2015-12-01
Reconstructing a 3D scene in monocular stereo vision requires the depth of the scene points in the image. However, image matching inevitably produces some mismatches, and many mismatches arise when the images contain large repeated-texture areas. At present, the multiple-baseline stereo imaging algorithm is commonly used to eliminate matching errors in repeated texture areas. This algorithm can resolve the ambiguity caused by common repeated textures, but it places restrictions on the baseline and is slow. In this paper, we propose an algorithm, based on clustering, for calculating the depth of matching points in repeated texture areas. First, we preprocess the images with a Gaussian filter. Second, we segment the repeated texture regions into image blocks using a superpixel-based spectral clustering segmentation algorithm and label the blocks. Then, we match the two images and solve for the depth. Finally, the depth of each image block is taken as the median of the depth values calculated at the points in that block, which gives the depth of the repeated texture areas. Experiments on a large number of images show that our algorithm performs well in calculating the depth of repeated texture areas.
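The final step can be sketched in Python as follows. It assumes a rectified pinhole stereo pair, so that depth follows from disparity as Z = f·B/d (focal length f in pixels, baseline B, disparity d). The function names and the (block label, disparity) input format are hypothetical. The median makes each block's depth robust to a minority of mismatched points.

```python
from statistics import median

def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Pinhole stereo depth Z = f * B / d for a rectified image pair."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px

def block_depths(matches, focal_px, baseline_m):
    """Give each labelled block the median depth of its matched points.

    matches: iterable of (block_label, disparity_px) pairs, e.g. produced after a
    segmentation step such as superpixel-based spectral clustering.
    """
    per_block = {}
    for label, disparity in matches:
        per_block.setdefault(label, []).append(
            depth_from_disparity(disparity, focal_px, baseline_m))
    return {label: median(depths) for label, depths in per_block.items()}
```

In a block whose matches give depths 7, 7 and 70 (the last one a mismatch), the block depth stays at 7.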
NEW MDS AND CLUSTERING BASED ALGORITHMS FOR PROTEIN MODEL QUALITY ASSESSMENT AND SELECTION
WANG, QINGGUO; SHANG, CHARLES; XU, DONG
2014-01-01
In protein tertiary structure prediction, assessing the quality of predicted models is an essential task. Over the past years, many methods have been proposed for the protein model quality assessment (QA) and selection problem. Despite significant advances, the discerning power of current methods is still unsatisfactory. In this paper, we propose two new algorithms, CC-Select and MDS-QA, based on multidimensional scaling and k-means clustering. For the model selection problem, CC-Select combines consensus with clustering techniques to select the best models from a given pool. Given a set of predicted models, CC-Select first calculates a consensus score for each structure based on its average pairwise structural similarity to other models. Then, similar structures are grouped into clusters using multidimensional scaling and clustering algorithms. In each cluster, the one with the highest consensus score is selected as a candidate model. For the QA problem, MDS-QA combines single-model scoring functions with consensus to determine a more accurate assessment score for every model in a given pool. Using extensive benchmark sets of a large collection of predicted models, we compare the two algorithms with existing state-of-the-art quality assessment methods and show significant improvement. PMID:24808625
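A minimal Python sketch of the CC-Select selection step. It assumes a symmetric pairwise similarity matrix between models (for example from a structural similarity score). It also assumes cluster labels already produced by the MDS and k-means stage, which is not reproduced here. Function names are hypothetical.

```python
def consensus_scores(sim):
    """Average similarity of each model to every other model (sim: symmetric n x n)."""
    n = len(sim)
    return [sum(sim[i][j] for j in range(n) if j != i) / (n - 1) for i in range(n)]

def select_per_cluster(sim, labels):
    """Map each cluster label to the index of its member with the highest consensus score."""
    scores = consensus_scores(sim)
    best = {}
    for i, label in enumerate(labels):
        if label not in best or scores[i] > scores[best[label]]:
            best[label] = i
    return best
```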
NEW MDS AND CLUSTERING BASED ALGORITHMS FOR PROTEIN MODEL QUALITY ASSESSMENT AND SELECTION.
Wang, Qingguo; Shang, Charles; Xu, Dong; Shang, Yi
2013-10-25
In protein tertiary structure prediction, assessing the quality of predicted models is an essential task. Over the past years, many methods have been proposed for the protein model quality assessment (QA) and selection problem. Despite significant advances, the discerning power of current methods is still unsatisfactory. In this paper, we propose two new algorithms, CC-Select and MDS-QA, based on multidimensional scaling and k-means clustering. For the model selection problem, CC-Select combines consensus with clustering techniques to select the best models from a given pool. Given a set of predicted models, CC-Select first calculates a consensus score for each structure based on its average pairwise structural similarity to other models. Then, similar structures are grouped into clusters using multidimensional scaling and clustering algorithms. In each cluster, the one with the highest consensus score is selected as a candidate model. For the QA problem, MDS-QA combines single-model scoring functions with consensus to determine a more accurate assessment score for every model in a given pool. Using extensive benchmark sets of a large collection of predicted models, we compare the two algorithms with existing state-of-the-art quality assessment methods and show significant improvement. PMID:24808625
An effective trust-based recommendation method using a novel graph clustering algorithm
NASA Astrophysics Data System (ADS)
Moradi, Parham; Ahmadian, Sajad; Akhlaghian, Fardin
2015-10-01
Recommender systems are programs that aim to provide personalized recommendations to users for specific items (e.g. music, books) in online sharing communities or on e-commerce sites. Collaborative filtering methods are an important and widely accepted type of recommender system that generates recommendations based on the ratings of like-minded users. However, these systems face several inherent issues, such as data sparsity and cold start problems, caused by the small number of known ratings relative to the unknown ratings that need to be predicted. Incorporating trust information into collaborative filtering systems is an attractive approach to resolving these problems. In this paper, we present a model-based collaborative filtering method that applies a novel graph clustering algorithm and also considers trust statements. In the proposed method, the problem space is first represented as a graph, and a sparsest-subgraph finding algorithm is applied to the graph to find the initial cluster centers. Then, the proposed graph clustering algorithm is performed to obtain the appropriate user/item clusters. Finally, the identified clusters are used as a set of neighbors to recommend unseen items to the current active user. Experimental results on three real-world datasets demonstrate that the proposed method outperforms several state-of-the-art recommender system methods.
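The final step, using cluster members as neighbors, can be sketched with a common mean-centered neighborhood prediction rule. This is an illustrative assumption, not the paper's exact formula. It ignores trust statements and the sparsest-subgraph initialization, and it weights all cluster-mates equally. The function name and data layout are hypothetical.

```python
def predict_rating(ratings, clusters, user, item):
    """Mean-centered average over the active user's cluster-mates who rated `item`.

    ratings: dict user -> dict item -> rating (every user has at least one rating);
    clusters: dict user -> cluster id. Falls back to the user's own mean rating
    when no cluster-mate has rated the item.
    """
    def mean(u):
        vals = list(ratings[u].values())
        return sum(vals) / len(vals)

    user_mean = mean(user)
    neighbors = [v for v in ratings
                 if v != user and clusters[v] == clusters[user] and item in ratings[v]]
    if not neighbors:
        return user_mean
    deviation = sum(ratings[v][item] - mean(v) for v in neighbors) / len(neighbors)
    return user_mean + deviation
```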
The Extended Spatial Distribution of Globular Clusters in the Core of the Fornax Cluster
NASA Astrophysics Data System (ADS)
D'Abrusco, R.; Cantiello, M.; Paolillo, M.; Pota, V.; Napolitano, N. R.; Limatola, L.; Spavone, M.; Grado, A.; Iodice, E.; Capaccioli, M.; Peletier, R.; Longo, G.; Hilker, M.; Mieske, S.; Grebel, E. K.; Lisker, T.; Wittmann, C.; van de Ven, G.; Schipani, P.; Fabbiano, G.
2016-03-01
We report the discovery of a complex extended density enhancement in the globular clusters (GCs) in the central ~0.5 deg² (~0.06 Mpc²) of the Fornax cluster, corresponding to ~50% of the area within 1 core radius. This overdensity connects the GC system of NGC 1399 to most of those of neighboring galaxies within ~0.6° (~210 kpc) along the W-E direction. The asymmetric density structure suggests that the galaxies in the core of the Fornax cluster experienced a lively history of interactions that have left a clear imprint on the spatial distribution of GCs. The extended central dominant structure is more prominent in the distribution of blue GCs, while red GCs show density enhancements that are more centrally concentrated on the host galaxies. We propose that the relatively small-scale density structures in the red GCs are caused by galaxy-galaxy interactions, while the extensive spatial distribution of blue GCs is due to stripping of GCs from the halos of core massive galaxies by the Fornax gravitational potential. Our investigations are based on density maps of candidate GCs extracted from the multi-band VLT Survey Telescope (VST) survey of Fornax (FDS), identified in a three-dimensional color space and further selected based on their g-band magnitude and morphology.
An X-Ray Spectral Classification Algorithm with Application to Young Stellar Clusters
NASA Astrophysics Data System (ADS)
Hojnacki, S. M.; Kastner, J. H.; Micela, G.; Feigelson, E. D.; LaLonde, S. M.
2007-04-01
A large volume of low signal-to-noise, multidimensional data is available from the CCD imaging spectrometers aboard the Chandra X-Ray Observatory and the X-Ray Multimirror Mission (XMM-Newton). To make progress in analyzing these data, it is essential to develop methods to sort, classify, and characterize the vast library of X-ray spectra in a nonparametric fashion (complementary to current parametric model fits). We have developed a spectral classification algorithm that handles large volumes of data and does not require spectral model fits. We use proven multivariate statistical techniques, including principal component analysis and an ensemble classifier consisting of agglomerative hierarchical clustering and K-means clustering, applied here for the first time to spectral classification. The algorithm positions the sources in a multidimensional spectral sequence and then groups the ordered sources into clusters based on their spectra. These clusters appear more distinct for sources with harder observed spectra. The apparent diversity of source spectra is reduced to a three-dimensional locus in principal component space, with spectral outliers falling outside this locus. The algorithm was applied to a sample of 444 strong sources selected from the 1616 X-ray emitting sources detected in deep Chandra imaging spectroscopy of the Orion Nebula Cluster. Classes form sequences in N_H, A_V, and accretion activity indicators, demonstrating that the algorithm efficiently sorts the X-ray sources into a physically meaningful sequence. The algorithm also isolates important classes of very deeply embedded, active young stellar objects, and yields trends between X-ray spectral parameters and stellar parameters for the lowest mass, pre-main-sequence stars.
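A compact Python/NumPy sketch of two of the ingredients: principal component analysis followed by K-means on the projected coordinates. It assumes each source has already been summarized as a fixed-length feature vector (for example, binned counts) and uses a deterministic farthest-point initialization. It omits the agglomerative hierarchical half of the paper's ensemble classifier. Function names are hypothetical.

```python
import numpy as np

def pca_project(X, n_components):
    """Project rows of X onto the leading principal components (SVD of centered data)."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T

def kmeans(Z, k, n_iter=100):
    """Lloyd's K-means with deterministic farthest-point initialization."""
    centres = [Z[0]]
    for _ in range(1, k):
        # distance of every point to its nearest already-chosen centre
        d = np.min([np.linalg.norm(Z - c, axis=1) for c in centres], axis=0)
        centres.append(Z[d.argmax()])
    centres = np.array(centres)
    for _ in range(n_iter):
        dists = np.linalg.norm(Z[:, None, :] - centres[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new = np.array([Z[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
                        for j in range(k)])
        if np.allclose(new, centres):
            break
        centres = new
    return labels, centres
```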
Belief-propagation algorithm and the Ising model on networks with arbitrary distributions of motifs
NASA Astrophysics Data System (ADS)
Yoon, S.; Goltsev, A. V.; Dorogovtsev, S. N.; Mendes, J. F. F.
2011-10-01
We generalize the belief-propagation algorithm to sparse random networks with arbitrary distributions of motifs (triangles, loops, etc.). Each vertex in these networks belongs to a given set of motifs (a generalization of the configuration model). These networks can be treated as sparse uncorrelated hypergraphs in which hyperedges represent motifs. Here a hypergraph is a generalization of a graph, where a hyperedge can connect any number of vertices. These uncorrelated hypergraphs are treelike (hypertrees), which crucially simplifies the problem and allows us to apply the belief-propagation algorithm to these loopy networks with arbitrary motifs. As natural examples, we consider motifs in the form of finite loops and cliques. We apply the belief-propagation algorithm to the ferromagnetic Ising model with pairwise interactions on the resulting random networks and obtain an exact solution of this model. We find an exact critical temperature of the ferromagnetic phase transition and demonstrate that, compared to ordinary treelike complex networks, the critical temperature increases as the clustering coefficient and the loop size increase. However, weak clustering does not change the critical behavior qualitatively. Our solution also gives the birth point of the giant connected component in these loopy networks.
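For reference, the baseline against which the abstract compares is the known result for the ferromagnetic Ising model on ordinary uncorrelated treelike networks: tanh(J/T_c) = ⟨k⟩/(⟨k²⟩ - ⟨k⟩), with k_B = 1. The Python sketch below evaluates this formula for a given degree sequence. It does not include the motif (loop and clique) corrections derived in the paper. The function name is hypothetical.

```python
import math

def critical_temperature_treelike(degrees, J=1.0):
    """Ising T_c on an uncorrelated treelike network with the given degree sequence.

    Solves tanh(J / T_c) = <k> / (<k^2> - <k>), which rearranges to
    T_c = 2J / ln(<k^2> / (<k^2> - 2<k>)). Needs <k^2> > 2<k> for T_c > 0.
    """
    n = len(degrees)
    k1 = sum(degrees) / n
    k2 = sum(d * d for d in degrees) / n
    if k2 <= 2 * k1:
        raise ValueError("no ordered phase at finite temperature: <k^2> <= 2<k>")
    return 2.0 * J / math.log(k2 / (k2 - 2.0 * k1))
```

For a random 4-regular graph, this gives T_c = 2J/ln 2 ≈ 2.885 J, the familiar Bethe-lattice value tanh(J/T_c) = 1/3.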
BoCluSt: Bootstrap Clustering Stability Algorithm for Community Detection
Garcia, Carlos
2016-01-01
The identification of modules or communities in sets of related variables is a key step in the analysis and modeling of biological systems. Procedures for this identification are usually designed to allow fast analyses of very large datasets and may produce suboptimal results when these sets are of a small to moderate size. This article introduces BoCluSt, a new, somewhat more computationally intensive, community detection procedure that is based on combining a clustering algorithm with a measure of stability under bootstrap resampling. Both computer simulation and analyses of experimental data showed that BoCluSt can outperform current procedures in the identification of multiple modules in data sets with a moderate number of variables. In addition, the procedure provides users with a null distribution of results to evaluate the support for the existence of community structure in the data. BoCluSt takes individual measures for a set of variables as input, and may be a valuable and robust exploratory tool of network analysis, as it provides 1) an estimation of the best partition of variables into modules, 2) a measure of the support for the existence of modular structures, and 3) an overall description of the whole structure, which may reveal hierarchical modular situations, in which modules are composed of smaller sub-modules. PMID:27258041
A Parallel Ghosting Algorithm for The Flexible Distributed Mesh Database
Mubarak, Misbah; Seol, Seegyoung; Lu, Qiukai; Shephard, Mark S.
2013-01-01
Critical to the scalability of parallel adaptive simulations are parallel control functions, including load balancing, reduced inter-process communication and optimal data decomposition. In distributed meshes, many mesh-based applications frequently access neighborhood information for computational purposes, and this information must be transmitted efficiently to avoid parallel performance degradation when the neighbors are on different processors. This article presents a parallel algorithm for creating and deleting data copies, referred to as ghost copies, which localize neighborhood data for computation purposes while minimizing inter-process communication. The key characteristics of the algorithm are: (1) It can create ghost copies of any permissible topological order in a 1D, 2D or 3D mesh based on selected adjacencies. (2) It exploits neighborhood communication patterns during the ghost creation process, thus eliminating all-to-all communication. (3) For applications that need neighbors of neighbors, the algorithm can create n ghost layers, up to the point where the whole partitioned mesh is ghosted. Strong and weak scaling results are presented for the IBM BG/P and Cray XE6 architectures up to a core count of 32,768 processors. The algorithm also yields scalable results when used in a parallel super-convergent patch recovery error estimator, an application that frequently accesses neighborhood data to carry out computation.
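A serial Python sketch of which elements a part would need as ghosts for n layers, using breadth-first search over an element adjacency graph. This illustration rests on several assumptions. It ignores the inter-process communication that the paper optimizes. The input dictionaries and the function name are hypothetical, not part of any mesh-database API.

```python
from collections import deque

def ghost_layers(adjacency, owner, part, n_layers):
    """Return {element: layer} for the off-part elements `part` must ghost.

    adjacency: dict element -> iterable of adjacent elements (e.g. via shared faces);
    owner: dict element -> owning part. Layer 1 holds neighbours of owned elements,
    layer 2 their neighbours, and so on up to n_layers.
    """
    layer = {}
    frontier = deque(e for e in adjacency if owner[e] == part)
    seen = set(frontier)
    for depth in range(1, n_layers + 1):
        next_frontier = deque()
        while frontier:
            e = frontier.popleft()
            for nb in adjacency[e]:
                if nb not in seen:
                    seen.add(nb)
                    layer[nb] = depth
                    next_frontier.append(nb)
        frontier = next_frontier
    return layer
```

On a 1D chain of six elements split evenly between two parts, two layers ghost elements 3 and 4. A large enough n ghosts the whole other part, matching point (3) above.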
The RedGOLD cluster detection algorithm and its cluster candidate catalogue for the CFHT-LS W1
NASA Astrophysics Data System (ADS)
Licitra, Rossella; Mei, Simona; Raichoor, Anand; Erben, Thomas; Hildebrandt, Hendrik
2016-01-01
We present RedGOLD (Red-sequence Galaxy Overdensity cLuster Detector), a new optical/NIR galaxy cluster detection algorithm, and apply it to the CFHT-LS W1 field. RedGOLD searches for red-sequence galaxy overdensities while minimizing contamination from dusty star-forming galaxies. It imposes a Navarro-Frenk-White profile and calculates cluster detection significance and richness. We optimize these latter two parameters using both simulations and X-ray-detected cluster catalogues, and obtain a catalogue ~80 per cent pure up to z ~ 1, and ~100 per cent (~70 per cent) complete at z ≤ 0.6 (z ≲ 1) for galaxy clusters with M ≳ 10¹⁴ M⊙ at the CFHT-LS Wide depth. In the CFHT-LS W1, we detect 11 cluster candidates per deg² out to z ~ 1.1. When we optimize both completeness and purity, RedGOLD obtains a cluster catalogue with higher completeness and purity than other public catalogues obtained using CFHT-LS W1 observations, for M ≳ 10¹⁴ M⊙. We use X-ray-detected cluster samples to extend the study of the X-ray temperature-optical richness relation to a lower mass threshold, and find a mass scatter at fixed richness of σ_lnM|λ = 0.39 ± 0.07 and σ_lnM|λ = 0.30 ± 0.13 for the Gozaliasl et al. and Mehrtens et al. samples. When considering similar mass ranges as previous work, we recover a smaller scatter in mass at fixed richness. We recover 93 per cent of the redMaPPer detections, and find that its richness estimates are on average ~40-50 per cent larger than ours at z > 0.3. RedGOLD recovers X-ray cluster spectroscopic redshifts at better than 5 per cent up to z ~ 1, and the centres within a few tens of arcseconds.
NASA Astrophysics Data System (ADS)
Rajalakshmi, N.; Padma Subramanian, D.; Thamizhavel, K.
2015-03-01
The extent of real power loss and voltage deviation associated with overloaded feeders in a radial distribution system can be reduced by reconfiguration. Reconfiguration is normally achieved by changing the open/closed state of tie/sectionalizing switches. Finding the optimal switch combination is a complicated problem, as many switching combinations are possible in a distribution system. Hence, optimization techniques are becoming increasingly important for reducing the complexity of the reconfiguration problem. This paper presents the application of the firefly algorithm (FA) to the optimal reconfiguration of a radial distribution system with distributed generators (DG). The algorithm is tested on the IEEE 33-bus system with DGs installed, and the results are compared with a binary genetic algorithm. Binary FA is found to be more effective than the binary genetic algorithm at reducing real power loss and improving the voltage profile, and hence at enhancing the performance of the radial distribution system. The best results were obtained when DGs were added to the test system, demonstrating the impact of DGs on the distribution system.
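For orientation, a minimal Python sketch of the standard continuous firefly algorithm. Each firefly moves toward every brighter (lower-cost) one with attractiveness β0·exp(-γr²), plus a small random step that shrinks over time. The paper applies a binary variant to switch states, which is not shown. The parameter defaults and the function name are illustrative assumptions.

```python
import math
import random

def firefly_minimize(f, dim, bounds, n_fireflies=15, n_iter=100,
                     beta0=1.0, gamma=1.0, alpha=0.2, seed=0):
    """Minimize f over the box [lo, hi]^dim with the basic continuous firefly algorithm."""
    rng = random.Random(seed)
    lo, hi = bounds
    xs = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_fireflies)]
    cost = [f(x) for x in xs]
    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if cost[j] < cost[i]:  # firefly j is brighter, so i moves toward it
                    r2 = sum((a - b) ** 2 for a, b in zip(xs[i], xs[j]))
                    beta = beta0 * math.exp(-gamma * r2)
                    xs[i] = [min(hi, max(lo, a + beta * (b - a) + alpha * (rng.random() - 0.5)))
                             for a, b in zip(xs[i], xs[j])]
                    cost[i] = f(xs[i])
        alpha *= 0.97  # shrink the random step as the search proceeds
    best = min(range(n_fireflies), key=lambda k: cost[k])
    return xs[best], cost[best]
```

Because the current brightest firefly never moves, the best cost found can only stay the same or improve from one iteration to the next.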
Distributed Minimal Residual (DMR) method for acceleration of iterative algorithms
NASA Technical Reports Server (NTRS)
Lee, Seungsoo; Dulikravich, George S.
1991-01-01
A new method for enhancing the convergence rate of iterative algorithms for the numerical integration of systems of partial differential equations was developed. It is termed the Distributed Minimal Residual (DMR) method and is based on general Krylov subspace methods. The DMR method differs from Krylov subspace methods in that the iterative acceleration factors differ from equation to equation in the system. At the same time, the DMR method can be viewed as an incomplete Newton iteration method. The DMR method was applied to the Euler equations of gas dynamics and the incompressible Navier-Stokes equations. All numerical test cases were obtained using either explicit four-stage Runge-Kutta or Euler implicit time integration. The formulation of the DMR method is general and can be applied to explicit and implicit iterative algorithms for arbitrary systems of partial differential equations.
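DMR generalizes a single acceleration factor to one factor per equation. As a point of comparison, the following Python sketch shows the plain scalar minimal-residual iteration for a linear system Ax = b. In each step, α = (r·Ar)/(Ar·Ar) minimizes the norm of the next residual r - αAr along the current residual direction r. This is not the DMR method itself, and the function names are hypothetical.

```python
import math

def matvec(A, x):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def minimal_residual(A, b, x0, n_iter=200, tol=1e-12):
    """Scalar minimal-residual iteration x <- x + alpha * r.

    Converges when the symmetric part of A is positive definite.
    """
    x = list(x0)
    for _ in range(n_iter):
        r = [bi - axi for bi, axi in zip(b, matvec(A, x))]
        if math.sqrt(sum(v * v for v in r)) < tol:
            break
        ar = matvec(A, r)
        alpha = sum(ri * ai for ri, ai in zip(r, ar)) / sum(v * v for v in ar)
        x = [xi + alpha * ri for xi, ri in zip(x, r)]
    return x
```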
FctClus: A Fast Clustering Algorithm for Heterogeneous Information Networks.
Yang, Jing; Chen, Limin; Zhang, Jianpei
2015-01-01
It is important to cluster heterogeneous information networks. This paper proposes a fast clustering algorithm for heterogeneous information networks with a star network schema, based on an approximate commute time embedding and exploiting the sparsity of such networks. First, a heterogeneous information network is transformed into multiple compatible bipartite graphs. Second, the approximate commute time embedding of each bipartite graph is computed using random mapping and a linear time solver. All of the indicator subsets in each embedding simultaneously determine the target dataset. Finally, a general model is formulated from these indicator subsets, and a fast algorithm is derived by simultaneously clustering all of the indicator subsets using the sum of the weighted distances over all indicators for an identical target object. Theoretical analysis and experimental verification show that the proposed algorithm, FctClus, is efficient and generalizable, with high clustering accuracy and fast computation speed. PMID:26090857
A priori data-driven multi-clustered reservoir generation algorithm for echo state network.
Li, Xiumin; Zhong, Ling; Xue, Fangzheng; Zhang, Anguo
2015-01-01
Echo state networks (ESNs) with multi-clustered reservoir topology perform better in reservoir computing and robustness than those with random reservoir topology. However, these ESNs have a complex reservoir topology, which makes reservoir generation difficult. This study focuses on the reservoir generation problem when an ESN is used in environments where sufficient a priori data are available. Accordingly, an a priori data-driven multi-cluster reservoir generation algorithm is proposed. The a priori data are used to evaluate reservoirs by calculating the precision and standard deviation of the ESNs. Reservoirs are produced using the clustering method, and a new reservoir replaces the previous one only if it has better evaluation performance. The final reservoir is obtained when its evaluation score reaches the preset requirement. Prediction experiments on the Mackey-Glass chaotic time series show that the proposed reservoir generation algorithm gives ESNs higher prediction precision and increases the structural complexity of the network. Further experiments also reveal the appropriate number of clusters and time window size for optimal performance. The information entropy of the reservoir reaches its maximum when the ESN achieves the greatest precision. PMID:25875296
Raindrop Size Distribution Observation for GPM/DPR algorithm development
NASA Astrophysics Data System (ADS)
Nakagawa, Katsuhiro; Hanado, Hiroshi; Nishikawa, Masanori; Nakamura, Kenji; Kaneko, Yuki; Kawamura, Seiji; Iwai, Hironori; Minda, Haruya; Oki, Riko
2013-04-01
In order to evaluate and improve the accuracy of rainfall intensity estimates from space-borne radars (TRMM/PR and GPM/DPR), it is important to estimate the rain attenuation, namely the k-Z relationship (k is the specific attenuation, Z is the radar reflectivity), correctly. The National Institute of Information and Communications Technology (NICT) developed a mobile precipitation observation system for the dual Ka-band radar field campaign for GPM/DPR algorithm development. The precipitation measurement instruments are installed on the roof of a container. The instruments installed for raindrop size distribution (DSD) measurements are a 2-dimensional video disdrometer (2DVD), a Joss-type disdrometer, and a laser optical disdrometer (Parsival). The 2DVD and Parsival can measure not only raindrop size distributions but also ice and snow size distributions. Observations using the mobile precipitation observation system were performed on Okinawa Island, in Tsukuba, on the slope of Mt. Fuji, in Nagaoka, and in Sapporo, Japan. Using the DSD data observed in these different regions, the characteristics of the DSD itself are analyzed and the k-Z relationship is estimated for evaluation and improvement of the TRMM/PR and GPM/DPR algorithms.
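A small Python sketch of the two quantities involved, assuming a binned DSD. The first is the Rayleigh reflectivity factor Z = Σ N(D_i) D_i⁶ ΔD_i. The second is a power law k = aZ^b fitted by least squares in log-log space. Computing k itself from the DSD requires scattering calculations at the radar frequency, which are not shown. Names and units are illustrative.

```python
import math

def reflectivity(diameters_mm, number_per_bin):
    """Rayleigh reflectivity factor Z = sum N_i D_i^6 (N_i already includes bin width)."""
    return sum(n * d ** 6 for d, n in zip(diameters_mm, number_per_bin))

def fit_power_law(Z, k):
    """Least-squares fit of log10 k = log10 a + b log10 Z; returns (a, b)."""
    xs = [math.log10(z) for z in Z]
    ys = [math.log10(v) for v in k]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = 10 ** (my - b * mx)
    return a, b
```

Given paired (Z, k) values computed from each observed DSD, `fit_power_law` returns the coefficients of a k-Z relation for that dataset. Comparing the coefficients across sites is one way to examine regional DSD differences.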