Abelian non-global logarithms from soft gluon clustering
NASA Astrophysics Data System (ADS)
Kelley, Randall; Walsh, Jonathan R.; Zuberi, Saba
2012-09-01
Most recombination-style jet algorithms cluster soft gluons in a complex way. This leads to previously identified correlations in the soft gluon phase space and introduces logarithmic corrections to jet cross sections, which are known as clustering logarithms. The leading Abelian clustering logarithms occur at least at next-to leading logarithm (NLL) in the exponent of the distribution. Using the framework of Soft Collinear Effective Theory (SCET), we show that new clustering effects contributing at NLL arise at each order. While numerical resummation of clustering logs is possible, it is unlikely that they can be analytically resummed to NLL. Clustering logarithms make the anti-kT algorithm theoretically preferred, for which they are power suppressed. They can arise in Abelian and non-Abelian terms, and we calculate the Abelian clustering logarithms at O ( {α_s^2} ) for the jet mass distribution using the Cambridge/Aachen and kT algorithms, including jet radius dependence, which extends previous results. We find that clustering logarithms can be naturally thought of as a class of non-global logarithms, which have traditionally been tied to non-Abelian correlations in soft gluon emission.
Karayiannis, N B
2000-01-01
This paper presents the development and investigates the properties of ordered weighted learning vector quantization (LVQ) and clustering algorithms. These algorithms are developed by using gradient descent to minimize reformulation functions based on aggregation operators. An axiomatic approach provides conditions for selecting aggregation operators that lead to admissible reformulation functions. Minimization of admissible reformulation functions based on ordered weighted aggregation operators produces a family of soft LVQ and clustering algorithms, which includes fuzzy LVQ and clustering algorithms as special cases. The proposed LVQ and clustering algorithms are used to perform segmentation of magnetic resonance (MR) images of the brain. The diagnostic value of the segmented MR images provides the basis for evaluating a variety of ordered weighted LVQ and clustering algorithms.
Clustering with Missing Values: No Imputation Required
NASA Technical Reports Server (NTRS)
Wagstaff, Kiri
2004-01-01
Clustering algorithms can identify groups in large data sets, such as star catalogs and hyperspectral images. In general, clustering methods cannot analyze items that have missing data values. Common solutions either fill in the missing values (imputation) or ignore the missing data (marginalization). Imputed values are treated as just as reliable as the truly observed data, but they are only as good as the assumptions used to create them. In contrast, we present a method for encoding partially observed features as a set of supplemental soft constraints and introduce the KSC algorithm, which incorporates constraints into the clustering process. In experiments on artificial data and data from the Sloan Digital Sky Survey, we show that soft constraints are an effective way to enable clustering with missing values.
Jeong, Jeong-Won; Shin, Dae C; Do, Synho; Marmarelis, Vasilis Z
2006-08-01
This paper presents a novel segmentation methodology for automated classification and differentiation of soft tissues using multiband data obtained with the newly developed system of high-resolution ultrasonic transmission tomography (HUTT) for imaging biological organs. This methodology extends and combines two existing approaches: the L-level set active contour (AC) segmentation approach and the agglomerative hierarchical kappa-means approach for unsupervised clustering (UC). To prevent the trapping of the current iterative minimization AC algorithm in a local minimum, we introduce a multiresolution approach that applies the level set functions at successively increasing resolutions of the image data. The resulting AC clusters are subsequently rearranged by the UC algorithm that seeks the optimal set of clusters yielding the minimum within-cluster distances in the feature space. The presented results from Monte Carlo simulations and experimental animal-tissue data demonstrate that the proposed methodology outperforms other existing methods without depending on heuristic parameters and provides a reliable means for soft tissue differentiation in HUTT images.
DeMaere, Matthew Z.
2016-01-01
Background Chromosome conformation capture, coupled with high throughput DNA sequencing in protocols like Hi-C and 3C-seq, has been proposed as a viable means of generating data to resolve the genomes of microorganisms living in naturally occuring environments. Metagenomic Hi-C and 3C-seq datasets have begun to emerge, but the feasibility of resolving genomes when closely related organisms (strain-level diversity) are present in the sample has not yet been systematically characterised. Methods We developed a computational simulation pipeline for metagenomic 3C and Hi-C sequencing to evaluate the accuracy of genomic reconstructions at, above, and below an operationally defined species boundary. We simulated datasets and measured accuracy over a wide range of parameters. Five clustering algorithms were evaluated (2 hard, 3 soft) using an adaptation of the extended B-cubed validation measure. Results When all genomes in a sample are below 95% sequence identity, all of the tested clustering algorithms performed well. When sequence data contains genomes above 95% identity (our operational definition of strain-level diversity), a naive soft-clustering extension of the Louvain method achieves the highest performance. Discussion Previously, only hard-clustering algorithms have been applied to metagenomic 3C and Hi-C data, yet none of these perform well when strain-level diversity exists in a metagenomic sample. Our simple extension of the Louvain method performed the best in these scenarios, however, accuracy remained well below the levels observed for samples without strain-level diversity. Strain resolution is also highly dependent on the amount of available 3C sequence data, suggesting that depth of sequencing must be carefully considered during experimental design. Finally, there appears to be great scope to improve the accuracy of strain resolution through further algorithm development. PMID:27843713
DOE Office of Scientific and Technical Information (OSTI.GOV)
Li, Tong; Gu, YuanTong, E-mail: yuantong.gu@qut.edu.au
As all-atom molecular dynamics method is limited by its enormous computational cost, various coarse-grained strategies have been developed to extend the length scale of soft matters in the modeling of mechanical behaviors. However, the classical thermostat algorithm in highly coarse-grained molecular dynamics method would underestimate the thermodynamic behaviors of soft matters (e.g. microfilaments in cells), which can weaken the ability of materials to overcome local energy traps in granular modeling. Based on all-atom molecular dynamics modeling of microfilament fragments (G-actin clusters), a new stochastic thermostat algorithm is developed to retain the representation of thermodynamic properties of microfilaments at extra coarse-grainedmore » level. The accuracy of this stochastic thermostat algorithm is validated by all-atom MD simulation. This new stochastic thermostat algorithm provides an efficient way to investigate the thermomechanical properties of large-scale soft matters.« less
Elastic K-means using posterior probability.
Zheng, Aihua; Jiang, Bo; Li, Yan; Zhang, Xuehan; Ding, Chris
2017-01-01
The widely used K-means clustering is a hard clustering algorithm. Here we propose a Elastic K-means clustering model (EKM) using posterior probability with soft capability where each data point can belong to multiple clusters fractionally and show the benefit of proposed Elastic K-means. Furthermore, in many applications, besides vector attributes information, pairwise relations (graph information) are also available. Thus we integrate EKM with Normalized Cut graph clustering into a single clustering formulation. Finally, we provide several useful matrix inequalities which are useful for matrix formulations of learning models. Based on these results, we prove the correctness and the convergence of EKM algorithms. Experimental results on six benchmark datasets demonstrate the effectiveness of proposed EKM and its integrated model.
Clustering by soft-constraint affinity propagation: applications to gene-expression data.
Leone, Michele; Sumedha; Weigt, Martin
2007-10-15
Similarity-measure-based clustering is a crucial problem appearing throughout scientific data analysis. Recently, a powerful new algorithm called Affinity Propagation (AP) based on message-passing techniques was proposed by Frey and Dueck (2007a). In AP, each cluster is identified by a common exemplar all other data points of the same cluster refer to, and exemplars have to refer to themselves. Albeit its proved power, AP in its present form suffers from a number of drawbacks. The hard constraint of having exactly one exemplar per cluster restricts AP to classes of regularly shaped clusters, and leads to suboptimal performance, e.g. in analyzing gene expression data. This limitation can be overcome by relaxing the AP hard constraints. A new parameter controls the importance of the constraints compared to the aim of maximizing the overall similarity, and allows to interpolate between the simple case where each data point selects its closest neighbor as an exemplar and the original AP. The resulting soft-constraint affinity propagation (SCAP) becomes more informative, accurate and leads to more stable clustering. Even though a new a priori free parameter is introduced, the overall dependence of the algorithm on external tuning is reduced, as robustness is increased and an optimal strategy for parameter selection emerges more naturally. SCAP is tested on biological benchmark data, including in particular microarray data related to various cancer types. We show that the algorithm efficiently unveils the hierarchical cluster structure present in the data sets. Further on, it allows to extract sparse gene expression signatures for each cluster.
Elastic K-means using posterior probability
Zheng, Aihua; Jiang, Bo; Li, Yan; Zhang, Xuehan; Ding, Chris
2017-01-01
The widely used K-means clustering is a hard clustering algorithm. Here we propose a Elastic K-means clustering model (EKM) using posterior probability with soft capability where each data point can belong to multiple clusters fractionally and show the benefit of proposed Elastic K-means. Furthermore, in many applications, besides vector attributes information, pairwise relations (graph information) are also available. Thus we integrate EKM with Normalized Cut graph clustering into a single clustering formulation. Finally, we provide several useful matrix inequalities which are useful for matrix formulations of learning models. Based on these results, we prove the correctness and the convergence of EKM algorithms. Experimental results on six benchmark datasets demonstrate the effectiveness of proposed EKM and its integrated model. PMID:29240756
Use of DAVID algorithms for gene functional classification in a non-model organism, rainbow trout
USDA-ARS?s Scientific Manuscript database
Gene functional clustering is essential in transcriptome data analysis but software programs are not always suitable for use with non-model species. The DAVID Gene Functional Classification Tool has been widely used for soft clustering in model species, but requires adaptations for use in non-model ...
A New Soft Computing Method for K-Harmonic Means Clustering.
Yeh, Wei-Chang; Jiang, Yunzhi; Chen, Yee-Fen; Chen, Zhe
2016-01-01
The K-harmonic means clustering algorithm (KHM) is a new clustering method used to group data such that the sum of the harmonic averages of the distances between each entity and all cluster centroids is minimized. Because it is less sensitive to initialization than K-means (KM), many researchers have recently been attracted to studying KHM. In this study, the proposed iSSO-KHM is based on an improved simplified swarm optimization (iSSO) and integrates a variable neighborhood search (VNS) for KHM clustering. As evidence of the utility of the proposed iSSO-KHM, we present extensive computational results on eight benchmark problems. From the computational results, the comparison appears to support the superiority of the proposed iSSO-KHM over previously developed algorithms for all experiments in the literature.
A program to compute the soft Robinson-Foulds distance between phylogenetic networks.
Lu, Bingxin; Zhang, Louxin; Leong, Hon Wai
2017-03-14
Over the past two decades, phylogenetic networks have been studied to model reticulate evolutionary events. The relationships among phylogenetic networks, phylogenetic trees and clusters serve as the basis for reconstruction and comparison of phylogenetic networks. To understand these relationships, two problems are raised: the tree containment problem, which asks whether a phylogenetic tree is displayed in a phylogenetic network, and the cluster containment problem, which asks whether a cluster is represented at a node in a phylogenetic network. Both the problems are NP-complete. A fast exponential-time algorithm for the cluster containment problem on arbitrary networks is developed and implemented in C. The resulting program is further extended into a computer program for fast computation of the Soft Robinson-Foulds distance between phylogenetic networks. Two computer programs are developed for facilitating reconstruction and validation of phylogenetic network models in evolutionary and comparative genomics. Our simulation tests indicated that they are fast enough for use in practice. Additionally, the distribution of the Soft Robinson-Foulds distance between phylogenetic networks is demonstrated to be unlikely normal by our simulation data.
Banerjee, Arindam; Ghosh, Joydeep
2004-05-01
Competitive learning mechanisms for clustering, in general, suffer from poor performance for very high-dimensional (>1000) data because of "curse of dimensionality" effects. In applications such as document clustering, it is customary to normalize the high-dimensional input vectors to unit length, and it is sometimes also desirable to obtain balanced clusters, i.e., clusters of comparable sizes. The spherical kmeans (spkmeans) algorithm, which normalizes the cluster centers as well as the inputs, has been successfully used to cluster normalized text documents in 2000+ dimensional space. Unfortunately, like regular kmeans and its soft expectation-maximization-based version, spkmeans tends to generate extremely imbalanced clusters in high-dimensional spaces when the desired number of clusters is large (tens or more). This paper first shows that the spkmeans algorithm can be derived from a certain maximum likelihood formulation using a mixture of von Mises-Fisher distributions as the generative model, and in fact, it can be considered as a batch-mode version of (normalized) competitive learning. The proposed generative model is then adapted in a principled way to yield three frequency-sensitive competitive learning variants that are applicable to static data and produced high-quality and well-balanced clusters for high-dimensional data. Like kmeans, each iteration is linear in the number of data points and in the number of clusters for all the three algorithms. A frequency-sensitive algorithm to cluster streaming data is also proposed. Experimental results on clustering of high-dimensional text data sets are provided to show the effectiveness and applicability of the proposed techniques. Index Terms-Balanced clustering, expectation maximization (EM), frequency-sensitive competitive learning (FSCL), high-dimensional clustering, kmeans, normalized data, scalable clustering, streaming data, text clustering.
Soft context clustering for F0 modeling in HMM-based speech synthesis
NASA Astrophysics Data System (ADS)
Khorram, Soheil; Sameti, Hossein; King, Simon
2015-12-01
This paper proposes the use of a new binary decision tree, which we call a soft decision tree, to improve generalization performance compared to the conventional `hard' decision tree method that is used to cluster context-dependent model parameters in statistical parametric speech synthesis. We apply the method to improve the modeling of fundamental frequency, which is an important factor in synthesizing natural-sounding high-quality speech. Conventionally, hard decision tree-clustered hidden Markov models (HMMs) are used, in which each model parameter is assigned to a single leaf node. However, this `divide-and-conquer' approach leads to data sparsity, with the consequence that it suffers from poor generalization, meaning that it is unable to accurately predict parameters for models of unseen contexts: the hard decision tree is a weak function approximator. To alleviate this, we propose the soft decision tree, which is a binary decision tree with soft decisions at the internal nodes. In this soft clustering method, internal nodes select both their children with certain membership degrees; therefore, each node can be viewed as a fuzzy set with a context-dependent membership function. The soft decision tree improves model generalization and provides a superior function approximator because it is able to assign each context to several overlapped leaves. In order to use such a soft decision tree to predict the parameters of the HMM output probability distribution, we derive the smoothest (maximum entropy) distribution which captures all partial first-order moments and a global second-order moment of the training samples. Employing such a soft decision tree architecture with maximum entropy distributions, a novel speech synthesis system is trained using maximum likelihood (ML) parameter re-estimation and synthesis is achieved via maximum output probability parameter generation. In addition, a soft decision tree construction algorithm optimizing a log-likelihood measure is developed. Both subjective and objective evaluations were conducted and indicate a considerable improvement over the conventional method.
Deterministic annealing for density estimation by multivariate normal mixtures
NASA Astrophysics Data System (ADS)
Kloppenburg, Martin; Tavan, Paul
1997-03-01
An approach to maximum-likelihood density estimation by mixtures of multivariate normal distributions for large high-dimensional data sets is presented. Conventionally that problem is tackled by notoriously unstable expectation-maximization (EM) algorithms. We remove these instabilities by the introduction of soft constraints, enabling deterministic annealing. Our developments are motivated by the proof that algorithmically stable fuzzy clustering methods that are derived from statistical physics analogs are special cases of EM procedures.
An ensemble framework for clustering protein-protein interaction networks.
Asur, Sitaram; Ucar, Duygu; Parthasarathy, Srinivasan
2007-07-01
Protein-Protein Interaction (PPI) networks are believed to be important sources of information related to biological processes and complex metabolic functions of the cell. The presence of biologically relevant functional modules in these networks has been theorized by many researchers. However, the application of traditional clustering algorithms for extracting these modules has not been successful, largely due to the presence of noisy false positive interactions as well as specific topological challenges in the network. In this article, we propose an ensemble clustering framework to address this problem. For base clustering, we introduce two topology-based distance metrics to counteract the effects of noise. We develop a PCA-based consensus clustering technique, designed to reduce the dimensionality of the consensus problem and yield informative clusters. We also develop a soft consensus clustering variant to assign multifaceted proteins to multiple functional groups. We conduct an empirical evaluation of different consensus techniques using topology-based, information theoretic and domain-specific validation metrics and show that our approaches can provide significant benefits over other state-of-the-art approaches. Our analysis of the consensus clusters obtained demonstrates that ensemble clustering can (a) produce improved biologically significant functional groupings; and (b) facilitate soft clustering by discovering multiple functional associations for proteins. Supplementary data are available at Bioinformatics online.
Segmentation of bone and soft tissue regions in digital radiographic images of extremities
NASA Astrophysics Data System (ADS)
Pakin, S. Kubilay; Gaborski, Roger S.; Barski, Lori L.; Foos, David H.; Parker, Kevin J.
2001-07-01
This paper presents an algorithm for segmentation of computed radiography (CR) images of extremities into bone and soft tissue regions. The algorithm is a region-based one in which the regions are constructed using a growing procedure with two different statistical tests. Following the growing process, tissue classification procedure is employed. The purpose of the classification is to label each region as either bone or soft tissue. This binary classification goal is achieved by using a voting procedure that consists of clustering of regions in each neighborhood system into two classes. The voting procedure provides a crucial compromise between local and global analysis of the image, which is necessary due to strong exposure variations seen on the imaging plate. Also, the existence of regions whose size is large enough such that exposure variations can be observed through them makes it necessary to use overlapping blocks during the classification. After the classification step, resulting bone and soft tissue regions are refined by fitting a 2nd order surface to each tissue, and reevaluating the label of each region according to the distance between the region and surfaces. The performance of the algorithm is tested on a variety of extremity images using manually segmented images as gold standard. The experiments showed that our algorithm provided a bone boundary with an average area overlap of 90% compared to the gold standard.
Making the most of missing values : object clustering with partial data in astronomy
NASA Technical Reports Server (NTRS)
Wagstaff, Kiri L.; Laidler, Victoria G.
2004-01-01
We demonstrate a clustering analysis algorithm, KSC, that a) uses all observed values and b) does not discard the partially observed objects. KSC uses soft constraints defined by the fully observed objects to assist in the grouping of objects with missing values. We present an analysis of objects taken from the Sloan Digital Sky Survey to demonstrate how imputing the values can be misleading and why the KSC approach can produce more appropriate results.
NASA Astrophysics Data System (ADS)
Karagiannis, Georgios Th.
2016-04-01
The development of non-destructive techniques is a reality in the field of conservation science. These techniques are usually not so accurate, as the analytical micro-sampling techniques, however, the proper development of soft-computing techniques can improve their accuracy. In this work, we propose a real-time fast acquisition spectroscopic mapping imaging system that operates from the ultraviolet to mid infrared (UV/Vis/nIR/mIR) area of the electromagnetic spectrum and it is supported by a set of soft-computing methods to identify the materials that exist in a stratigraphic structure of paint layers. Particularly, the system acquires spectra in diffuse-reflectance mode, scanning in a Region-Of-Interest (ROI), and having wavelength range from 200 up to 5000 nm. Also, a fuzzy c-means clustering algorithm, i.e., the particular soft-computing algorithm, produces the mapping images. The evaluation of the method was tested on a byzantine painted icon.
Algorithms development for the GEM-based detection system
NASA Astrophysics Data System (ADS)
Czarski, T.; Chernyshova, M.; Malinowski, K.; Pozniak, K. T.; Kasprowicz, G.; Kolasinski, P.; Krawczyk, R.; Wojenski, A.; Zabolotny, W.
2016-09-01
The measurement system based on GEM - Gas Electron Multiplier detector - is developed for soft X-ray diagnostics of tokamak plasmas. The multi-channel setup is designed for estimation of the energy and the position distribution of an Xray source. The focal measuring issue is the charge cluster identification by its value and position estimation. The fast and accurate mode of the serial data acquisition is applied for the dynamic plasma diagnostics. The charge clusters are counted in the space determined by 2D position, charge value and time intervals. Radiation source characteristics are presented by histograms for a selected range of position, time intervals and cluster charge values corresponding to the energy spectra.
A diabetic retinopathy detection method using an improved pillar K-means algorithm.
Gogula, Susmitha Valli; Divakar, Ch; Satyanarayana, Ch; Rao, Allam Appa
2014-01-01
The paper presents a new approach for medical image segmentation. Exudates are a visible sign of diabetic retinopathy that is the major reason of vision loss in patients with diabetes. If the exudates extend into the macular area, blindness may occur. Automated detection of exudates will assist ophthalmologists in early diagnosis. This segmentation process includes a new mechanism for clustering the elements of high-resolution images in order to improve precision and reduce computation time. The system applies K-means clustering to the image segmentation after getting optimized by Pillar algorithm; pillars are constructed in such a way that they can withstand the pressure. Improved pillar algorithm can optimize the K-means clustering for image segmentation in aspects of precision and computation time. This evaluates the proposed approach for image segmentation by comparing with Kmeans and Fuzzy C-means in a medical image. Using this method, identification of dark spot in the retina becomes easier and the proposed algorithm is applied on diabetic retinal images of all stages to identify hard and soft exudates, where the existing pillar K-means is more appropriate for brain MRI images. This proposed system help the doctors to identify the problem in the early stage and can suggest a better drug for preventing further retinal damage.
Random Walk Graph Laplacian-Based Smoothness Prior for Soft Decoding of JPEG Images.
Liu, Xianming; Cheung, Gene; Wu, Xiaolin; Zhao, Debin
2017-02-01
Given the prevalence of joint photographic experts group (JPEG) compressed images, optimizing image reconstruction from the compressed format remains an important problem. Instead of simply reconstructing a pixel block from the centers of indexed discrete cosine transform (DCT) coefficient quantization bins (hard decoding), soft decoding reconstructs a block by selecting appropriate coefficient values within the indexed bins with the help of signal priors. The challenge thus lies in how to define suitable priors and apply them effectively. In this paper, we combine three image priors-Laplacian prior for DCT coefficients, sparsity prior, and graph-signal smoothness prior for image patches-to construct an efficient JPEG soft decoding algorithm. Specifically, we first use the Laplacian prior to compute a minimum mean square error initial solution for each code block. Next, we show that while the sparsity prior can reduce block artifacts, limiting the size of the overcomplete dictionary (to lower computation) would lead to poor recovery of high DCT frequencies. To alleviate this problem, we design a new graph-signal smoothness prior (desired signal has mainly low graph frequencies) based on the left eigenvectors of the random walk graph Laplacian matrix (LERaG). Compared with the previous graph-signal smoothness priors, LERaG has desirable image filtering properties with low computation overhead. We demonstrate how LERaG can facilitate recovery of high DCT frequencies of a piecewise smooth signal via an interpretation of low graph frequency components as relaxed solutions to normalized cut in spectral clustering. Finally, we construct a soft decoding algorithm using the three signal priors with appropriate prior weights. Experimental results show that our proposal outperforms the state-of-the-art soft decoding algorithms in both objective and subjective evaluations noticeably.
Microstructure-based modelling of arbitrary deformation histories of filler-reinforced elastomers
NASA Astrophysics Data System (ADS)
Lorenz, H.; Klüppel, M.
2012-11-01
A physically motivated theory of rubber reinforcement based on filler cluster mechanics is presented considering the mechanical behaviour of quasi-statically loaded elastomeric materials subjected to arbitrary deformation histories. This represents an extension of a previously introduced model describing filler induced stress softening and hysteresis of highly strained elastomers. These effects are referred to the hydrodynamic reinforcement of rubber elasticity due to strain amplification by stiff filler clusters and cyclic breakdown and re-aggregation (healing) of softer, already damaged filler clusters. The theory is first developed for the special case of outer stress-strain cycles with successively increasing maximum strain. In this more simple case, all soft clusters are broken at the turning points of the cycle and the mechanical energy stored in the strained clusters is completely dissipated, i.e. only irreversible stress contributions result. Nevertheless, the description of outer cycles involves already all material parameters of the theory and hence they can be used for a fitting procedure. In the general case of an arbitrary deformation history, the cluster mechanics of the material is complicated due to the fact that not all soft clusters are broken at the turning points of a cycle. For that reason additional reversible stress contributions considering the relaxation of clusters upon retraction have to be taken into account for the description of inner cycles. A special recursive algorithm is developed constituting a frame of the mechanical response of encapsulated inner cycles. Simulation and measurement are found to be in fair agreement for CB and silica filled SBR/BR and EPDM samples, loaded in compression and tension along various deformation histories.
Optimal Alignment of Structures for Finite and Periodic Systems.
Griffiths, Matthew; Niblett, Samuel P; Wales, David J
2017-10-10
Finding the optimal alignment between two structures is important for identifying the minimum root-mean-square distance (RMSD) between them and as a starting point for calculating pathways. Most current algorithms for aligning structures are stochastic, scale exponentially with the size of structure, and the performance can be unreliable. We present two complementary methods for aligning structures corresponding to isolated clusters of atoms and to condensed matter described by a periodic cubic supercell. The first method (Go-PERMDIST), a branch and bound algorithm, locates the global minimum RMSD deterministically in polynomial time. The run time increases for larger RMSDs. The second method (FASTOVERLAP) is a heuristic algorithm that aligns structures by finding the global maximum kernel correlation between them using fast Fourier transforms (FFTs) and fast SO(3) transforms (SOFTs). For periodic systems, FASTOVERLAP scales with the square of the number of identical atoms in the system, reliably finds the best alignment between structures that are not too distant, and shows significantly better performance than existing algorithms. The expected run time for Go-PERMDIST is longer than FASTOVERLAP for periodic systems. For finite clusters, the FASTOVERLAP algorithm is competitive with existing algorithms. The expected run time for Go-PERMDIST to find the global RMSD between two structures deterministically is generally longer than for existing stochastic algorithms. However, with an earlier exit condition, Go-PERMDIST exhibits similar or better performance.
Marcek, Dusan; Durisova, Maria
2016-01-01
This paper deals with application of quantitative soft computing prediction models into financial area as reliable and accurate prediction models can be very helpful in management decision-making process. The authors suggest a new hybrid neural network which is a combination of the standard RBF neural network, a genetic algorithm, and a moving average. The moving average is supposed to enhance the outputs of the network using the error part of the original neural network. Authors test the suggested model on high-frequency time series data of USD/CAD and examine the ability to forecast exchange rate values for the horizon of one day. To determine the forecasting efficiency, they perform a comparative statistical out-of-sample analysis of the tested model with autoregressive models and the standard neural network. They also incorporate genetic algorithm as an optimizing technique for adapting parameters of ANN which is then compared with standard backpropagation and backpropagation combined with K-means clustering algorithm. Finally, the authors find out that their suggested hybrid neural network is able to produce more accurate forecasts than the standard models and can be helpful in eliminating the risk of making the bad decision in decision-making process. PMID:26977450
Falat, Lukas; Marcek, Dusan; Durisova, Maria
2016-01-01
This paper deals with application of quantitative soft computing prediction models into financial area as reliable and accurate prediction models can be very helpful in management decision-making process. The authors suggest a new hybrid neural network which is a combination of the standard RBF neural network, a genetic algorithm, and a moving average. The moving average is supposed to enhance the outputs of the network using the error part of the original neural network. Authors test the suggested model on high-frequency time series data of USD/CAD and examine the ability to forecast exchange rate values for the horizon of one day. To determine the forecasting efficiency, they perform a comparative statistical out-of-sample analysis of the tested model with autoregressive models and the standard neural network. They also incorporate genetic algorithm as an optimizing technique for adapting parameters of ANN which is then compared with standard backpropagation and backpropagation combined with K-means clustering algorithm. Finally, the authors find out that their suggested hybrid neural network is able to produce more accurate forecasts than the standard models and can be helpful in eliminating the risk of making the bad decision in decision-making process.
Qin, Jiangyi; Huang, Zhiping; Liu, Chunwu; Su, Shaojing; Zhou, Jing
2015-01-01
A novel blind recognition algorithm of frame synchronization words is proposed to recognize the frame synchronization words parameters in digital communication systems. In this paper, a blind recognition method of frame synchronization words based on the hard-decision is deduced in detail. And the standards of parameter recognition are given. Comparing with the blind recognition based on the hard-decision, utilizing the soft-decision can improve the accuracy of blind recognition. Therefore, combining with the characteristics of Quadrature Phase Shift Keying (QPSK) signal, an improved blind recognition algorithm based on the soft-decision is proposed. Meanwhile, the improved algorithm can be extended to other signal modulation forms. Then, the complete blind recognition steps of the hard-decision algorithm and the soft-decision algorithm are given in detail. Finally, the simulation results show that both the hard-decision algorithm and the soft-decision algorithm can recognize the parameters of frame synchronization words blindly. What's more, the improved algorithm can enhance the accuracy of blind recognition obviously.
High Performance Computing Based Parallel HIearchical Modal Association Clustering (HPAR HMAC)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Patlolla, Dilip R; Surendran Nair, Sujithkumar; Graves, Daniel A.
For many applications, clustering is a crucial step in order to gain insight into the makeup of a dataset. The best approach to a given problem often depends on a variety of factors, such as the size of the dataset, time restrictions, and soft clustering requirements. The HMAC algorithm seeks to combine the strengths of 2 particular clustering approaches: model-based and linkage-based clustering. One particular weakness of HMAC is its computational complexity. HMAC is not practical for mega-scale data clustering. For high-definition imagery, a user would have to wait months or years for a result; for a 16-megapixel image, themore » estimated runtime skyrockets to over a decade! To improve the execution time of HMAC, it is reasonable to consider an multi-core implementation that utilizes available system resources. An existing imple-mentation (Ray and Cheng 2014) divides the dataset into N partitions - one for each thread prior to executing the HMAC algorithm. This implementation benefits from 2 types of optimization: parallelization and divide-and-conquer. By running each partition in parallel, the program is able to accelerate computation by utilizing more system resources. Although the parallel implementation provides considerable improvement over the serial HMAC, it still suffers from poor computational complexity, O(N2). Once the maximum number of cores on a system is exhausted, the program exhibits slower behavior. We now consider a modification to HMAC that involves a recursive partitioning scheme. Our modification aims to exploit divide-and-conquer benefits seen by the parallel HMAC implementation. At each level in the recursion tree, partitions are divided into 2 sub-partitions until a threshold size is reached. When the partition can no longer be divided without falling below threshold size, the base HMAC algorithm is applied. This results in a significant speedup over the parallel HMAC.« less
Clusters, Groups, and Filaments in the Chandra Deep Field-South up to Redshift 1
NASA Astrophysics Data System (ADS)
Dehghan, S.; Johnston-Hollitt, M.
2014-03-01
We present a comprehensive structure detection analysis of the 0.3 deg2 area of the MUSYC-ACES field, which covers the Chandra Deep Field-South (CDFS). Using a density-based clustering algorithm on the MUSYC and ACES photometric and spectroscopic catalogs, we find 62 overdense regions up to redshifts of 1, including clusters, groups, and filaments. We also present the detection of a relatively small void of ~10 Mpc2 at z ~ 0.53. All structures are confirmed using the DBSCAN method, including the detection of nine structures previously reported in the literature. We present a catalog of all structures present, including their central position, mean redshift, velocity dispersions, and classification based on their morphological and spectroscopic distributions. In particular, we find 13 galaxy clusters and 6 large groups/small clusters. Comparison of these massive structures with published XMM-Newton imaging (where available) shows that 80% of these structures are associated with diffuse, soft-band (0.4-1 keV) X-ray emission, including 90% of all objects classified as clusters. The presence of soft-band X-ray emission in these massive structures (M 200 >= 4.9 × 1013 M ⊙) provides a strong independent confirmation of our methodology and classification scheme. In the closest two clusters identified (z < 0.13) high-quality optical imaging from the Deep2c field of the Garching-Bonn Deep Survey reveals the cD galaxies and demonstrates that they sit at the center of the detected X-ray emission. Nearly 60% of the clusters, groups, and filaments are detected in the known enhanced density regions of the CDFS at z ~= 0.13, 0.52, 0.68, and 0.73. Additionally, all of the clusters, bar the most distant, are found in these overdense redshift regions. Many of the clusters and groups exhibit signs of ongoing formation seen in their velocity distributions, position within the detected cosmic web, and in one case through the presence of tidally disrupted central galaxies exhibiting trails of stars. These results all provide strong support for hierarchical structure formation up to redshifts of 1.
Yang, Yuan; Quan, Nannan; Bu, Jingjing; Li, Xueping; Yu, Ningmei
2016-09-26
High order modulation and demodulation technology can solve the frequency requirement between the wireless energy transmission and data communication. In order to achieve reliable wireless data communication based on high order modulation technology for visual prosthesis, this work proposed a Reed-Solomon (RS) error correcting code (ECC) circuit on the basis of differential amplitude and phase shift keying (DAPSK) soft demodulation. Firstly, recognizing the weakness of the traditional DAPSK soft demodulation algorithm based on division that is complex for hardware implementation, an improved phase soft demodulation algorithm for visual prosthesis to reduce the hardware complexity is put forward. Based on this new algorithm, an improved RS soft decoding method is hence proposed. In this new decoding method, the combination of Chase algorithm and hard decoding algorithms is used to achieve soft decoding. In order to meet the requirements of implantable visual prosthesis, the method to calculate reliability of symbol-level based on multiplication of bit reliability is derived, which reduces the testing vectors number of Chase algorithm. The proposed algorithms are verified by MATLAB simulation and FPGA experimental results. During MATLAB simulation, the biological channel attenuation property model is added into the ECC circuit. The data rate is 8 Mbps in the MATLAB simulation and FPGA experiments. MATLAB simulation results show that the improved phase soft demodulation algorithm proposed in this paper saves hardware resources without losing bit error rate (BER) performance. Compared with the traditional demodulation circuit, the coding gain of the ECC circuit has been improved by about 3 dB under the same BER of [Formula: see text]. The FPGA experimental results show that under the condition of data demodulation error with wireless coils 3 cm away, the system can correct it. The greater the distance, the higher the BER. Then we use a bit error rate analyzer to measure BER of the demodulation circuit and the RS ECC circuit with different distance of two coils. And the experimental results show that the RS ECC circuit has about an order of magnitude lower BER than the demodulation circuit when under the same coils distance. Therefore, the RS ECC circuit has more higher reliability of the communication in the system. The improved phase soft demodulation algorithm and soft decoding algorithm proposed in this paper enables data communication that is more reliable than other demodulation system, which also provide a significant reference for further study to the visual prosthesis system.
Wang, Jie-Sheng; Han, Shuang
2015-01-01
For predicting the key technology indicators (concentrate grade and tailings recovery rate) of flotation process, a feed-forward neural network (FNN) based soft-sensor model optimized by the hybrid algorithm combining particle swarm optimization (PSO) algorithm and gravitational search algorithm (GSA) is proposed. Although GSA has better optimization capability, it has slow convergence velocity and is easy to fall into local optimum. So in this paper, the velocity vector and position vector of GSA are adjusted by PSO algorithm in order to improve its convergence speed and prediction accuracy. Finally, the proposed hybrid algorithm is adopted to optimize the parameters of FNN soft-sensor model. Simulation results show that the model has better generalization and prediction accuracy for the concentrate grade and tailings recovery rate to meet the online soft-sensor requirements of the real-time control in the flotation process. PMID:26583034
Improved Ant Colony Clustering Algorithm and Its Performance Study
Gao, Wei
2016-01-01
Clustering analysis is used in many disciplines and applications; it is an important tool that descriptively identifies homogeneous groups of objects based on attribute values. The ant colony clustering algorithm is a swarm-intelligent method used for clustering problems that is inspired by the behavior of ant colonies that cluster their corpses and sort their larvae. A new abstraction ant colony clustering algorithm using a data combination mechanism is proposed to improve the computational efficiency and accuracy of the ant colony clustering algorithm. The abstraction ant colony clustering algorithm is used to cluster benchmark problems, and its performance is compared with the ant colony clustering algorithm and other methods used in existing literature. Based on similar computational difficulties and complexities, the results show that the abstraction ant colony clustering algorithm produces results that are not only more accurate but also more efficiently determined than the ant colony clustering algorithm and the other methods. Thus, the abstraction ant colony clustering algorithm can be used for efficient multivariate data clustering. PMID:26839533
Big Data GPU-Driven Parallel Processing Spatial and Spatio-Temporal Clustering Algorithms
NASA Astrophysics Data System (ADS)
Konstantaras, Antonios; Skounakis, Emmanouil; Kilty, James-Alexander; Frantzeskakis, Theofanis; Maravelakis, Emmanuel
2016-04-01
Advances in graphics processing units' technology towards encompassing parallel architectures [1], comprised of thousands of cores and multiples of parallel threads, provide the foundation in terms of hardware for the rapid processing of various parallel applications regarding seismic big data analysis. Seismic data are normally stored as collections of vectors in massive matrices, growing rapidly in size as wider areas are covered, denser recording networks are being established and decades of data are being compiled together [2]. Yet, many processes regarding seismic data analysis are performed on each seismic event independently or as distinct tiles [3] of specific grouped seismic events within a much larger data set. Such processes, independent of one another can be performed in parallel narrowing down processing times drastically [1,3]. This research work presents the development and implementation of three parallel processing algorithms using Cuda C [4] for the investigation of potentially distinct seismic regions [5,6] present in the vicinity of the southern Hellenic seismic arc. The algorithms, programmed and executed in parallel comparatively, are the: fuzzy k-means clustering with expert knowledge [7] in assigning overall clusters' number; density-based clustering [8]; and a selves-developed spatio-temporal clustering algorithm encompassing expert [9] and empirical knowledge [10] for the specific area under investigation. Indexing terms: GPU parallel programming, Cuda C, heterogeneous processing, distinct seismic regions, parallel clustering algorithms, spatio-temporal clustering References [1] Kirk, D. and Hwu, W.: 'Programming massively parallel processors - A hands-on approach', 2nd Edition, Morgan Kaufman Publisher, 2013 [2] Konstantaras, A., Valianatos, F., Varley, M.R. and Makris, J.P.: 'Soft-Computing Modelling of Seismicity in the Southern Hellenic Arc', Geoscience and Remote Sensing Letters, vol. 5 (3), pp. 323-327, 2008 [3] Papadakis, S. and Diamantaras, K.: 'Programming and architecture of parallel processing systems', 1st Edition, Eds. Kleidarithmos, 2011 [4] NVIDIA.: 'NVidia CUDA C Programming Guide', version 5.0, NVidia (reference book) [5] Konstantaras, A.: 'Classification of Distinct Seismic Regions and Regional Temporal Modelling of Seismicity in the Vicinity of the Hellenic Seismic Arc', IEEE Selected Topics in Applied Earth Observations and Remote Sensing, vol. 6 (4), pp. 1857-1863, 2013 [6] Konstantaras, A. Varley, M.R.,. Valianatos, F., Collins, G. and Holifield, P.: 'Recognition of electric earthquake precursors using neuro-fuzzy models: methodology and simulation results', Proc. IASTED International Conference on Signal Processing Pattern Recognition and Applications (SPPRA 2002), Crete, Greece, 2002, pp 303-308, 2002 [7] Konstantaras, A., Katsifarakis, E., Maravelakis, E., Skounakis, E., Kokkinos, E. and Karapidakis, E.: 'Intelligent Spatial-Clustering of Seismicity in the Vicinity of the Hellenic Seismic Arc', Earth Science Research, vol. 1 (2), pp. 1-10, 2012 [8] Georgoulas, G., Konstantaras, A., Katsifarakis, E., Stylios, C.D., Maravelakis, E. and Vachtsevanos, G.: '"Seismic-Mass" Density-based Algorithm for Spatio-Temporal Clustering', Expert Systems with Applications, vol. 40 (10), pp. 4183-4189, 2013 [9] Konstantaras, A. J.: 'Expert knowledge-based algorithm for the dynamic discrimination of interactive natural clusters', Earth Science Informatics, 2015 (In Press, see: www.scopus.com) [10] Drakatos, G. and Latoussakis, J.: 'A catalog of aftershock sequences in Greece (1971-1997): Their spatial and temporal characteristics', Journal of Seismology, vol. 5, pp. 137-145, 2001
FPGA based charge acquisition algorithm for soft x-ray diagnostics system
NASA Astrophysics Data System (ADS)
Wojenski, A.; Kasprowicz, G.; Pozniak, K. T.; Zabolotny, W.; Byszuk, A.; Juszczyk, B.; Kolasinski, P.; Krawczyk, R. D.; Zienkiewicz, P.; Chernyshova, M.; Czarski, T.
2015-09-01
Soft X-ray (SXR) measurement systems working in tokamaks or with laser generated plasma can expect high photon fluxes. Therefore it is necessary to focus on data processing algorithms to have the best possible efficiency in term of processed photon events per second. This paper refers to recently designed algorithm and data-flow for implementation of charge data acquisition in FPGA. The algorithms are currently on implementation stage for the soft X-ray diagnostics system. In this paper despite of the charge processing algorithm is also described general firmware overview, data storage methods and other key components of the measurement system. The simulation section presents algorithm performance and expected maximum photon rate.
A novel complex networks clustering algorithm based on the core influence of nodes.
Tong, Chao; Niu, Jianwei; Dai, Bin; Xie, Zhongyu
2014-01-01
In complex networks, cluster structure, identified by the heterogeneity of nodes, has become a common and important topological property. Network clustering methods are thus significant for the study of complex networks. Currently, many typical clustering algorithms have some weakness like inaccuracy and slow convergence. In this paper, we propose a clustering algorithm by calculating the core influence of nodes. The clustering process is a simulation of the process of cluster formation in sociology. The algorithm detects the nodes with core influence through their betweenness centrality, and builds the cluster's core structure by discriminant functions. Next, the algorithm gets the final cluster structure after clustering the rest of the nodes in the network by optimizing method. Experiments on different datasets show that the clustering accuracy of this algorithm is superior to the classical clustering algorithm (Fast-Newman algorithm). It clusters faster and plays a positive role in revealing the real cluster structure of complex networks precisely.
Liquid argon TPC signal formation, signal processing and reconstruction techniques
NASA Astrophysics Data System (ADS)
Baller, B.
2017-07-01
This document describes a reconstruction chain that was developed for the ArgoNeuT and MicroBooNE experiments at Fermilab. These experiments study accelerator neutrino interactions that occur in a Liquid Argon Time Projection Chamber. Reconstructing the properties of particles produced in these interactions benefits from the knowledge of the micro-physics processes that affect the creation and transport of ionization electrons to the readout system. A wire signal deconvolution technique was developed to convert wire signals to a standard form for hit reconstruction, to remove artifacts in the electronics chain and to remove coherent noise. A unique clustering algorithm reconstructs line-like trajectories and vertices in two dimensions which are then matched to create of 3D objects. These techniques and algorithms are available to all experiments that use the LArSoft suite of software.
Microscope self-calibration based on micro laser line imaging and soft computing algorithms
NASA Astrophysics Data System (ADS)
Apolinar Muñoz Rodríguez, J.
2018-06-01
A technique to perform microscope self-calibration via micro laser line and soft computing algorithms is presented. In this technique, the microscope vision parameters are computed by means of soft computing algorithms based on laser line projection. To implement the self-calibration, a microscope vision system is constructed by means of a CCD camera and a 38 μm laser line. From this arrangement, the microscope vision parameters are represented via Bezier approximation networks, which are accomplished through the laser line position. In this procedure, a genetic algorithm determines the microscope vision parameters by means of laser line imaging. Also, the approximation networks compute the three-dimensional vision by means of the laser line position. Additionally, the soft computing algorithms re-calibrate the vision parameters when the microscope vision system is modified during the vision task. The proposed self-calibration improves accuracy of the traditional microscope calibration, which is accomplished via external references to the microscope system. The capability of the self-calibration based on soft computing algorithms is determined by means of the calibration accuracy and the micro-scale measurement error. This contribution is corroborated by an evaluation based on the accuracy of the traditional microscope calibration.
Fast and fully automatic phalanx segmentation using a grayscale-histogram morphology algorithm
NASA Astrophysics Data System (ADS)
Hsieh, Chi-Wen; Liu, Tzu-Chiang; Jong, Tai-Lang; Chen, Chih-Yen; Tiu, Chui-Mei; Chan, Din-Yuen
2011-08-01
Bone age assessment is a common radiological examination used in pediatrics to diagnose the discrepancy between the skeletal and chronological age of a child; therefore, it is beneficial to develop a computer-based bone age assessment to help junior pediatricians estimate bone age easily. Unfortunately, the phalanx on radiograms is not easily separated from the background and soft tissue. Therefore, we proposed a new method, called the grayscale-histogram morphology algorithm, to segment the phalanges fast and precisely. The algorithm includes three parts: a tri-stage sieve algorithm used to eliminate the background of hand radiograms, a centroid-edge dual scanning algorithm to frame the phalanx region, and finally a segmentation algorithm based on disk traverse-subtraction filter to segment the phalanx. Moreover, two more segmentation methods: adaptive two-mean and adaptive two-mean clustering were performed, and their results were compared with the segmentation algorithm based on disk traverse-subtraction filter using five indices comprising misclassification error, relative foreground area error, modified Hausdorff distances, edge mismatch, and region nonuniformity. In addition, the CPU time of the three segmentation methods was discussed. The result showed that our method had a better performance than the other two methods. Furthermore, satisfactory segmentation results were obtained with a low standard error.
Employing multi-GPU power for molecular dynamics simulation: an extension of GALAMOST
NASA Astrophysics Data System (ADS)
Zhu, You-Liang; Pan, Deng; Li, Zhan-Wei; Liu, Hong; Qian, Hu-Jun; Zhao, Yang; Lu, Zhong-Yuan; Sun, Zhao-Yan
2018-04-01
We describe the algorithm of employing multi-GPU power on the basis of Message Passing Interface (MPI) domain decomposition in a molecular dynamics code, GALAMOST, which is designed for the coarse-grained simulation of soft matters. The code of multi-GPU version is developed based on our previous single-GPU version. In multi-GPU runs, one GPU takes charge of one domain and runs single-GPU code path. The communication between neighbouring domains takes a similar algorithm of CPU-based code of LAMMPS, but is optimised specifically for GPUs. We employ a memory-saving design which can enlarge maximum system size at the same device condition. An optimisation algorithm is employed to prolong the update period of neighbour list. We demonstrate good performance of multi-GPU runs on the simulation of Lennard-Jones liquid, dissipative particle dynamics liquid, polymer and nanoparticle composite, and two-patch particles on workstation. A good scaling of many nodes on cluster for two-patch particles is presented.
A Genetic Algorithm That Exchanges Neighboring Centers for Fuzzy c-Means Clustering
ERIC Educational Resources Information Center
Chahine, Firas Safwan
2012-01-01
Clustering algorithms are widely used in pattern recognition and data mining applications. Due to their computational efficiency, partitional clustering algorithms are better suited for applications with large datasets than hierarchical clustering algorithms. K-means is among the most popular partitional clustering algorithm, but has a major…
SOTXTSTREAM: Density-based self-organizing clustering of text streams.
Bryant, Avory C; Cios, Krzysztof J
2017-01-01
A streaming data clustering algorithm is presented building upon the density-based self-organizing stream clustering algorithm SOSTREAM. Many density-based clustering algorithms are limited by their inability to identify clusters with heterogeneous density. SOSTREAM addresses this limitation through the use of local (nearest neighbor-based) density determinations. Additionally, many stream clustering algorithms use a two-phase clustering approach. In the first phase, a micro-clustering solution is maintained online, while in the second phase, the micro-clustering solution is clustered offline to produce a macro solution. By performing self-organization techniques on micro-clusters in the online phase, SOSTREAM is able to maintain a macro clustering solution in a single phase. Leveraging concepts from SOSTREAM, a new density-based self-organizing text stream clustering algorithm, SOTXTSTREAM, is presented that addresses several shortcomings of SOSTREAM. Gains in clustering performance of this new algorithm are demonstrated on several real-world text stream datasets.
Moats and Drawbridges: An Isolation Primitive for Reconfigurable Hardware Based Systems
2007-05-01
these systems, and after being run through an optimizing CAD tool the resulting circuit is a single entangled mess of gates and wires. To prevent the...translates MATLAB [48] algorithms into HDL, logic synthesis translates this HDL into a netlist, a synthesis tool uses a place-and-route algorithm to...Core Soft Core µ Soft P Core µP Core Hard Soft Algorithms MATLAB gcc ExecutableC Code HDL C Code Bitstream Place and Route NetlistLogic Synthesis EDK µP
Zhang, Ying; Chen, Wei; Liang, Jixing; Zheng, Bingxin; Jiang, Shengming
2015-01-01
It is expected that in the near future wireless sensor network (WSNs) will be more widely used in the mobile environment, in applications such as Autonomous Underwater Vehicles (AUVs) for marine monitoring and mobile robots for environmental investigation. The sensor nodes’ mobility can easily cause changes to the structure of a network topology, and lead to the decline in the amount of transmitted data, excessive energy consumption, and lack of security. To solve these problems, a kind of efficient Topology Control algorithm for node Mobility (TCM) is proposed. In the topology construction stage, an efficient clustering algorithm is adopted, which supports sensor node movement. It can ensure the balance of clustering, and reduce the energy consumption. In the topology maintenance stage, the digital signature authentication based on Error Correction Code (ECC) and the communication mechanism of soft handover are adopted. After verifying the legal identity of the mobile nodes, secure communications can be established, and this can increase the amount of data transmitted. Compared to some existing schemes, the proposed scheme has significant advantages regarding network topology stability, amounts of data transferred, lifetime and safety performance of the network. PMID:26633405
Zhang, Ying; Chen, Wei; Liang, Jixing; Zheng, Bingxin; Jiang, Shengming
2015-12-01
It is expected that in the near future wireless sensor network (WSNs) will be more widely used in the mobile environment, in applications such as Autonomous Underwater Vehicles (AUVs) for marine monitoring and mobile robots for environmental investigation. The sensor nodes' mobility can easily cause changes to the structure of a network topology, and lead to the decline in the amount of transmitted data, excessive energy consumption, and lack of security. To solve these problems, a kind of efficient Topology Control algorithm for node Mobility (TCM) is proposed. In the topology construction stage, an efficient clustering algorithm is adopted, which supports sensor node movement. It can ensure the balance of clustering, and reduce the energy consumption. In the topology maintenance stage, the digital signature authentication based on Error Correction Code (ECC) and the communication mechanism of soft handover are adopted. After verifying the legal identity of the mobile nodes, secure communications can be established, and this can increase the amount of data transmitted. Compared to some existing schemes, the proposed scheme has significant advantages regarding network topology stability, amounts of data transferred, lifetime and safety performance of the network.
A Scalable Framework For Segmenting Magnetic Resonance Images
Hore, Prodip; Goldgof, Dmitry B.; Gu, Yuhua; Maudsley, Andrew A.; Darkazanli, Ammar
2009-01-01
A fast, accurate and fully automatic method of segmenting magnetic resonance images of the human brain is introduced. The approach scales well allowing fast segmentations of fine resolution images. The approach is based on modifications of the soft clustering algorithm, fuzzy c-means, that enable it to scale to large data sets. Two types of modifications to create incremental versions of fuzzy c-means are discussed. They are much faster when compared to fuzzy c-means for medium to extremely large data sets because they work on successive subsets of the data. They are comparable in quality to application of fuzzy c-means to all of the data. The clustering algorithms coupled with inhomogeneity correction and smoothing are used to create a framework for automatically segmenting magnetic resonance images of the human brain. The framework is applied to a set of normal human brain volumes acquired from different magnetic resonance scanners using different head coils, acquisition parameters and field strengths. Results are compared to those from two widely used magnetic resonance image segmentation programs, Statistical Parametric Mapping and the FMRIB Software Library (FSL). The results are comparable to FSL while providing significant speed-up and better scalability to larger volumes of data. PMID:20046893
The global Minmax k-means algorithm.
Wang, Xiaoyan; Bai, Yanping
2016-01-01
The global k -means algorithm is an incremental approach to clustering that dynamically adds one cluster center at a time through a deterministic global search procedure from suitable initial positions, and employs k -means to minimize the sum of the intra-cluster variances. However the global k -means algorithm sometimes results singleton clusters and the initial positions sometimes are bad, after a bad initialization, poor local optimal can be easily obtained by k -means algorithm. In this paper, we modified the global k -means algorithm to eliminate the singleton clusters at first, and then we apply MinMax k -means clustering error method to global k -means algorithm to overcome the effect of bad initialization, proposed the global Minmax k -means algorithm. The proposed clustering method is tested on some popular data sets and compared to the k -means algorithm, the global k -means algorithm and the MinMax k -means algorithm. The experiment results show our proposed algorithm outperforms other algorithms mentioned in the paper.
Noise-enhanced clustering and competitive learning algorithms.
Osoba, Osonde; Kosko, Bart
2013-01-01
Noise can provably speed up convergence in many centroid-based clustering algorithms. This includes the popular k-means clustering algorithm. The clustering noise benefit follows from the general noise benefit for the expectation-maximization algorithm because many clustering algorithms are special cases of the expectation-maximization algorithm. Simulations show that noise also speeds up convergence in stochastic unsupervised competitive learning, supervised competitive learning, and differential competitive learning. Copyright © 2012 Elsevier Ltd. All rights reserved.
Modular analysis of the probabilistic genetic interaction network.
Hou, Lin; Wang, Lin; Qian, Minping; Li, Dong; Tang, Chao; Zhu, Yunping; Deng, Minghua; Li, Fangting
2011-03-15
Epistatic Miniarray Profiles (EMAP) has enabled the mapping of large-scale genetic interaction networks; however, the quantitative information gained from EMAP cannot be fully exploited since the data are usually interpreted as a discrete network based on an arbitrary hard threshold. To address such limitations, we adopted a mixture modeling procedure to construct a probabilistic genetic interaction network and then implemented a Bayesian approach to identify densely interacting modules in the probabilistic network. Mixture modeling has been demonstrated as an effective soft-threshold technique of EMAP measures. The Bayesian approach was applied to an EMAP dataset studying the early secretory pathway in Saccharomyces cerevisiae. Twenty-seven modules were identified, and 14 of those were enriched by gold standard functional gene sets. We also conducted a detailed comparison with state-of-the-art algorithms, hierarchical cluster and Markov clustering. The experimental results show that the Bayesian approach outperforms others in efficiently recovering biologically significant modules.
Non preemptive soft real time scheduler: High deadline meeting rate on overload
NASA Astrophysics Data System (ADS)
Khalib, Zahereel Ishwar Abdul; Ahmad, R. Badlishah; El-Shaikh, Mohamed
2015-05-01
While preemptive scheduling has gain more attention among researchers, current work in non preemptive scheduling had shown promising result in soft real time jobs scheduling. In this paper we present a non preemptive scheduling algorithm meant for soft real time applications, which is capable of producing better performance during overload while maintaining excellent performance during normal load. The approach taken by this algorithm has shown more promising results compared to other algorithms including its immediate predecessor. We will present the analysis made prior to inception of the algorithm as well as simulation results comparing our algorithm named gutEDF with EDF and gEDF. We are convinced that grouping jobs utilizing pure dynamic parameters would produce better performance.
ALP conversion and the soft X-ray excess in the outskirts of the Coma cluster
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kraljic, David; Rummel, Markus; Conlon, Joseph P., E-mail: David.Kraljic@physics.ox.ac.uk, E-mail: Markus.Rummel@physics.ox.ac.uk, E-mail: j.conlon1@physics.ox.ac.uk
2015-01-01
It was recently found that the soft X-ray excess in the center of the Coma cluster can be fitted by conversion of axion-like-particles (ALPs) of a cosmic axion background (CAB) to photons. We extend this analysis to the outskirts of Coma, including regions up to 5 Mpc from the center of the cluster. We extract the excess soft X-ray flux from ROSAT All-Sky Survey data and compare it to the expected flux from ALP to photon conversion of a CAB. The soft X-ray excess both in the center and the outskirts of Coma can be simultaneously fitted by ALP tomore » photon conversion of a CAB. Given the uncertainties of the cluster magnetic field in the outskirts we constrain the parameter space of the CAB. In particular, an upper limit on the CAB mean energy and a range of allowed ALP-photon couplings are derived.« less
Clustering PPI data by combining FA and SHC method.
Lei, Xiujuan; Ying, Chao; Wu, Fang-Xiang; Xu, Jin
2015-01-01
Clustering is one of main methods to identify functional modules from protein-protein interaction (PPI) data. Nevertheless traditional clustering methods may not be effective for clustering PPI data. In this paper, we proposed a novel method for clustering PPI data by combining firefly algorithm (FA) and synchronization-based hierarchical clustering (SHC) algorithm. Firstly, the PPI data are preprocessed via spectral clustering (SC) which transforms the high-dimensional similarity matrix into a low dimension matrix. Then the SHC algorithm is used to perform clustering. In SHC algorithm, hierarchical clustering is achieved by enlarging the neighborhood radius of synchronized objects continuously, while the hierarchical search is very difficult to find the optimal neighborhood radius of synchronization and the efficiency is not high. So we adopt the firefly algorithm to determine the optimal threshold of the neighborhood radius of synchronization automatically. The proposed algorithm is tested on the MIPS PPI dataset. The results show that our proposed algorithm is better than the traditional algorithms in precision, recall and f-measure value.
Clustering PPI data by combining FA and SHC method
2015-01-01
Clustering is one of main methods to identify functional modules from protein-protein interaction (PPI) data. Nevertheless traditional clustering methods may not be effective for clustering PPI data. In this paper, we proposed a novel method for clustering PPI data by combining firefly algorithm (FA) and synchronization-based hierarchical clustering (SHC) algorithm. Firstly, the PPI data are preprocessed via spectral clustering (SC) which transforms the high-dimensional similarity matrix into a low dimension matrix. Then the SHC algorithm is used to perform clustering. In SHC algorithm, hierarchical clustering is achieved by enlarging the neighborhood radius of synchronized objects continuously, while the hierarchical search is very difficult to find the optimal neighborhood radius of synchronization and the efficiency is not high. So we adopt the firefly algorithm to determine the optimal threshold of the neighborhood radius of synchronization automatically. The proposed algorithm is tested on the MIPS PPI dataset. The results show that our proposed algorithm is better than the traditional algorithms in precision, recall and f-measure value. PMID:25707632
Penza, Veronica; Ortiz, Jesús; Mattos, Leonardo S; Forgione, Antonello; De Momi, Elena
2016-02-01
Single-incision laparoscopic surgery decreases postoperative infections, but introduces limitations in the surgeon's maneuverability and in the surgical field of view. This work aims at enhancing intra-operative surgical visualization by exploiting the 3D information about the surgical site. An interactive guidance system is proposed wherein the pose of preoperative tissue models is updated online. A critical process involves the intra-operative acquisition of tissue surfaces. It can be achieved using stereoscopic imaging and 3D reconstruction techniques. This work contributes to this process by proposing new methods for improved dense 3D reconstruction of soft tissues, which allows a more accurate deformation identification and facilitates the registration process. Two methods for soft tissue 3D reconstruction are proposed: Method 1 follows the traditional approach of the block matching algorithm. Method 2 performs a nonparametric modified census transform to be more robust to illumination variation. The simple linear iterative clustering (SLIC) super-pixel algorithm is exploited for disparity refinement by filling holes in the disparity images. The methods were validated using two video datasets from the Hamlyn Centre, achieving an accuracy of 2.95 and 1.66 mm, respectively. A comparison with ground-truth data demonstrated the disparity refinement procedure: (1) increases the number of reconstructed points by up to 43 % and (2) does not affect the accuracy of the 3D reconstructions significantly. Both methods give results that compare favorably with the state-of-the-art methods. The computational time constraints their applicability in real time, but can be greatly improved by using a GPU implementation.
NASA Astrophysics Data System (ADS)
Johnson, Grant; Priest, Thomas; Laskin, Julia
2012-02-01
Monodisperse gold clusters have been prepared on surfaces in different charge states through soft landing of mass-selected ions. Gold clusters were synthesized in methanol solution by reduction of a gold precursor with a weak reducing agent in the presence of a diphosphine capping ligand. Electrospray ionization was used to introduce the clusters into the gas-phase and mass-selection was employed to isolate a single ionic cluster species which was delivered to surfaces at well controlled kinetic energies. Using in-situ time of flight secondary ion mass spectrometry (SIMS) it is demonstrated that the cluster retains its 3+ charge state when soft landed onto the surface of a fluorinated self assembled monolayer on gold. In contrast, when deposited onto carboxylic acid terminated and conventional alkyl thiol surfaces on gold the clusters exhibit larger relative abundances of the 2+ and 1+ charge states, respectively. The kinetics of charge reduction on the surface have been investigated using in-situ Fourier Transform Ion Cyclotron Resonance SIMS. It is shown that an extremely slow interfacial charge reduction occurs on the fluorinated monolayer surface while an almost instantaneous neutralization takes place on the surface of the alkyl thiol monolayer. Our results demonstrate that the size and charge state of small gold clusters on surfaces, both of which exert a dramatic influence on their chemical and physical properties, may be tuned through soft landing of mass-selected ions onto selected substrates.
Krzyżanowska, Dorota M.; Ossowicki, Adam; Rajewska, Magdalena; Maciąg, Tomasz; Jabłońska, Magdalena; Obuchowski, Michał; Heeb, Stephan; Jafra, Sylwia
2016-01-01
Dickeya solani and Pectobacterium carotovorum subsp. brasiliense are recently established species of bacterial plant pathogens causing black leg and soft rot of many vegetables and ornamental plants. Pseudomonas sp. strain P482 inhibits the growth of these pathogens, a desired trait considering the limited measures to combat these diseases. In this study, we determined the genetic background of the antibacterial activity of P482, and established the phylogenetic position of this strain. Pseudomonas sp. P482 was classified as Pseudomonas donghuensis. Genome mining revealed that the P482 genome does not contain genes determining the synthesis of known antimicrobials. However, the ClusterFinder algorithm, designed to detect atypical or novel classes of secondary metabolite gene clusters, predicted 18 such clusters in the genome. Screening of a Tn5 mutant library yielded an antimicrobial negative transposon mutant. The transposon insertion was located in a gene encoding an HpcH/HpaI aldolase/citrate lyase family protein. This gene is located in a hypothetical cluster predicted by the ClusterFinder, together with the downstream homologs of four nfs genes, that confer production of a non-fluorescent siderophore by P. donghuensis HYST. Site-directed inactivation of the HpcH/HpaI aldolase gene, the adjacent short chain dehydrogenase gene, as well as a homolog of an essential nfs cluster gene, all abolished the antimicrobial activity of the P482, suggesting their involvement in a common biosynthesis pathway. However, none of the mutants showed a decreased siderophore yield, neither was the antimicrobial activity of the wild type P482 compromised by high iron bioavailability. A genomic region comprising the nfs cluster and three upstream genes is involved in the antibacterial activity of P. donghuensis P482 against D. solani and P. carotovorum subsp. brasiliense. The genes studied are unique to the two known P. donghuensis strains. This study illustrates that mining of microbial genomes is a powerful approach for predictingthe presence of novel secondary-metabolite encoding genes especially when coupled with transposon mutagenesis. PMID:27303376
MIA-Clustering: a novel method for segmentation of paleontological material.
Dunmore, Christopher J; Wollny, Gert; Skinner, Matthew M
2018-01-01
Paleontological research increasingly uses high-resolution micro-computed tomography (μCT) to study the inner architecture of modern and fossil bone material to answer important questions regarding vertebrate evolution. This non-destructive method allows for the measurement of otherwise inaccessible morphology. Digital measurement is predicated on the accurate segmentation of modern or fossilized bone from other structures imaged in μCT scans, as errors in segmentation can result in inaccurate calculations of structural parameters. Several approaches to image segmentation have been proposed with varying degrees of automation, ranging from completely manual segmentation, to the selection of input parameters required for computational algorithms. Many of these segmentation algorithms provide speed and reproducibility at the cost of flexibility that manual segmentation provides. In particular, the segmentation of modern and fossil bone in the presence of materials such as desiccated soft tissue, soil matrix or precipitated crystalline material can be difficult. Here we present a free open-source segmentation algorithm application capable of segmenting modern and fossil bone, which also reduces subjective user decisions to a minimum. We compare the effectiveness of this algorithm with another leading method by using both to measure the parameters of a known dimension reference object, as well as to segment an example problematic fossil scan. The results demonstrate that the medical image analysis-clustering method produces accurate segmentations and offers more flexibility than those of equivalent precision. Its free availability, flexibility to deal with non-bone inclusions and limited need for user input give it broad applicability in anthropological, anatomical, and paleontological contexts.
Information Clustering Based on Fuzzy Multisets.
ERIC Educational Resources Information Center
Miyamoto, Sadaaki
2003-01-01
Proposes a fuzzy multiset model for information clustering with application to information retrieval on the World Wide Web. Highlights include search engines; term clustering; document clustering; algorithms for calculating cluster centers; theoretical properties concerning clustering algorithms; and examples to show how the algorithms work.…
An improved clustering algorithm based on reverse learning in intelligent transportation
NASA Astrophysics Data System (ADS)
Qiu, Guoqing; Kou, Qianqian; Niu, Ting
2017-05-01
With the development of artificial intelligence and data mining technology, big data has gradually entered people's field of vision. In the process of dealing with large data, clustering is an important processing method. By introducing the reverse learning method in the clustering process of PAM clustering algorithm, to further improve the limitations of one-time clustering in unsupervised clustering learning, and increase the diversity of clustering clusters, so as to improve the quality of clustering. The algorithm analysis and experimental results show that the algorithm is feasible.
A roadmap of clustering algorithms: finding a match for a biomedical application.
Andreopoulos, Bill; An, Aijun; Wang, Xiaogang; Schroeder, Michael
2009-05-01
Clustering is ubiquitously applied in bioinformatics with hierarchical clustering and k-means partitioning being the most popular methods. Numerous improvements of these two clustering methods have been introduced, as well as completely different approaches such as grid-based, density-based and model-based clustering. For improved bioinformatics analysis of data, it is important to match clusterings to the requirements of a biomedical application. In this article, we present a set of desirable clustering features that are used as evaluation criteria for clustering algorithms. We review 40 different clustering algorithms of all approaches and datatypes. We compare algorithms on the basis of desirable clustering features, and outline algorithms' benefits and drawbacks as a basis for matching them to biomedical applications.
Efficient clustering aggregation based on data fragments.
Wu, Ou; Hu, Weiming; Maybank, Stephen J; Zhu, Mingliang; Li, Bing
2012-06-01
Clustering aggregation, known as clustering ensembles, has emerged as a powerful technique for combining different clustering results to obtain a single better clustering. Existing clustering aggregation algorithms are applied directly to data points, in what is referred to as the point-based approach. The algorithms are inefficient if the number of data points is large. We define an efficient approach for clustering aggregation based on data fragments. In this fragment-based approach, a data fragment is any subset of the data that is not split by any of the clustering results. To establish the theoretical bases of the proposed approach, we prove that clustering aggregation can be performed directly on data fragments under two widely used goodness measures for clustering aggregation taken from the literature. Three new clustering aggregation algorithms are described. The experimental results obtained using several public data sets show that the new algorithms have lower computational complexity than three well-known existing point-based clustering aggregation algorithms (Agglomerative, Furthest, and LocalSearch); nevertheless, the new algorithms do not sacrifice the accuracy.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Johnson, Grant E.; Priest, Thomas A.; Laskin, Julia
2012-11-29
The ionic charge state of monodisperse cationic gold clusters on surfaces may be controlled by selecting the coverage of mass-selected ions soft landed onto a substrate. Polydisperse diphosphine-capped gold clusters were synthesized in solution by reduction of chloro(triphenylphosphine)gold(I) with borane tert-butylamine in the presence of 1,3-bis(diphenylphosphino)propane. The polydisperse gold clusters were introduced into the gas phase by electrospray ionization and mass selection was employed to select a multiply charged cationic cluster species (Au11L53+, m/z = 1409, L = 1,3-bis(diphenylphosphino)propane) which was delivered to the surfaces of four different self-assembled monolayers on gold (SAMs) at coverages of 1011 and 1012 clusters/mm2.more » Employing the spatial profiling capabilities of in-situ time-of-flight secondary ion mass spectrometry (TOF-SIMS) it is shown that, in addition to the chemical functionality of the monolayer (as demonstrated previously: ACS Nano, 2012, 6, 573) the coverage of cationic gold clusters on the surface may be used to control the distribution of ionic charge states of the soft-landed multiply charged clusters. In the case of a 1H,1H,2H,2H-perfluorodecanethiol SAM (FSAM) almost complete retention of charge by the deposited Au11L53+ clusters was observed at a lower coverage of 1011 clusters/mm2. In contrast, at a higher coverage of 1012 clusters/mm2, pronounced reduction of charge to Au11L52+ and Au11L5+ was observed on the FSAM. When soft landed onto 16- and 11-mercaptohexadecanoic acid surfaces on gold (16,11-COOH-SAMs), the mass-selected Au11L53+ clusters exhibited partial reduction of charge to Au11L52+ at lower coverage and additional reduction of charge to both Au11L52+ and Au11L5+ at higher coverage. The reduction of charge was found to be more pronounced on the surface of the shorter (thinner) C11 than the longer (thicker) C16-COOH-SAM. On the surface of the 1-dodecanethiol (HSAM) monolayer, the most abundant charge state was found to be Au11L52+ at lower coverage and Au11L5+ at higher coverage, respectively. A coverage-dependent electron tunneling mechanism is proposed to account for the observed reduction of charge of mass-selected multiply charged gold clusters soft landed on SAMs. The results demonstrate that one of the critical parameters that influence the chemical and physical properties of supported metal clusters, ionic charge state, may be controlled by selecting the coverage of charged species soft landed onto surfaces.« less
A clustering method of Chinese medicine prescriptions based on modified firefly algorithm.
Yuan, Feng; Liu, Hong; Chen, Shou-Qiang; Xu, Liang
2016-12-01
This paper is aimed to study the clustering method for Chinese medicine (CM) medical cases. The traditional K-means clustering algorithm had shortcomings such as dependence of results on the selection of initial value, trapping in local optimum when processing prescriptions form CM medical cases. Therefore, a new clustering method based on the collaboration of firefly algorithm and simulated annealing algorithm was proposed. This algorithm dynamically determined the iteration of firefly algorithm and simulates sampling of annealing algorithm by fitness changes, and increased the diversity of swarm through expansion of the scope of the sudden jump, thereby effectively avoiding premature problem. The results from confirmatory experiments for CM medical cases suggested that, comparing with traditional K-means clustering algorithms, this method was greatly improved in the individual diversity and the obtained clustering results, the computing results from this method had a certain reference value for cluster analysis on CM prescriptions.
Measurement of kT splitting scales in W→ℓν events at [Formula: see text] with the ATLAS detector.
Aad, G; Abajyan, T; Abbott, B; Abdallah, J; Abdel Khalek, S; Abdelalim, A A; Abdinov, O; Aben, R; Abi, B; Abolins, M; AbouZeid, O S; Abramowicz, H; Abreu, H; Acharya, B S; Adamczyk, L; Adams, D L; Addy, T N; Adelman, J; Adomeit, S; Adragna, P; Adye, T; Aefsky, S; Aguilar-Saavedra, J A; Agustoni, M; Ahlen, S P; Ahles, F; Ahmad, A; Ahsan, M; Aielli, G; Åkesson, T P A; Akimoto, G; Akimov, A V; Alam, M A; Albert, J; Albrand, S; Aleksa, M; Aleksandrov, I N; Alessandria, F; Alexa, C; Alexander, G; Alexandre, G; Alexopoulos, T; Alhroob, M; Aliev, M; Alimonti, G; Alison, J; Allbrooke, B M M; Allison, L J; Allport, P P; Allwood-Spiers, S E; Almond, J; Aloisio, A; Alon, R; Alonso, A; Alonso, F; Altheimer, A; Alvarez Gonzalez, B; Alviggi, M G; Amako, K; Amelung, C; Ammosov, V V; Amor Dos Santos, S P; Amorim, A; Amoroso, S; Amram, N; Anastopoulos, C; Ancu, L S; Andari, N; Andeen, T; Anders, C F; Anders, G; Anderson, K J; Andreazza, A; Andrei, V; Anduaga, X S; Angelidakis, S; Anger, P; Angerami, A; Anghinolfi, F; Anisenkov, A; Anjos, N; Annovi, A; Antonaki, A; Antonelli, M; Antonov, A; Antos, J; Anulli, F; Aoki, M; Aperio Bella, L; Apolle, R; Arabidze, G; Aracena, I; Arai, Y; Arce, A T H; Arfaoui, S; Arguin, J-F; Argyropoulos, S; Arik, E; Arik, M; Armbruster, A J; Arnaez, O; Arnal, V; Artamonov, A; Artoni, G; Arutinov, D; Asai, S; Ask, S; Åsman, B; Asquith, L; Assamagan, K; Astalos, R; Astbury, A; Atkinson, M; Auerbach, B; Auge, E; Augsten, K; Aurousseau, M; Avolio, G; Axen, D; Azuelos, G; Azuma, Y; Baak, M A; Baccaglioni, G; Bacci, C; Bach, A M; Bachacou, H; Bachas, K; Backes, M; Backhaus, M; Backus Mayes, J; Badescu, E; Bagnaia, P; Bai, Y; Bailey, D C; Bain, T; Baines, J T; Baker, O K; Baker, S; Balek, P; Balli, F; Banas, E; Banerjee, P; Banerjee, Sw; Banfi, D; Bangert, A; Bansal, V; Bansil, H S; Barak, L; Baranov, S P; Barber, T; Barberio, E L; Barberis, D; Barbero, M; Bardin, D Y; Barillari, T; Barisonzi, M; Barklow, T; Barlow, N; Barnett, B M; Barnett, R M; Baroncelli, A; Barone, G; Barr, A J; Barreiro, F; Barreiro Guimarães da Costa, J; Bartoldus, R; Barton, A E; Bartsch, V; Basye, A; Bates, R L; Batkova, L; Batley, J R; Battaglia, A; Battistin, M; Bauer, F; Bawa, H S; Beale, S; Beau, T; Beauchemin, P H; Beccherle, R; Bechtle, P; Beck, H P; Becker, K; Becker, S; Beckingham, M; Becks, K H; Beddall, A J; Beddall, A; Bedikian, S; Bednyakov, V A; Bee, C P; Beemster, L J; Beermann, T A; Begel, M; Behar Harpaz, S; Belanger-Champagne, C; Bell, P J; Bell, W H; Bella, G; Bellagamba, L; Bellomo, M; Belloni, A; Beloborodova, O; Belotskiy, K; Beltramello, O; Benary, O; Benchekroun, D; Bendtz, K; Benekos, N; Benhammou, Y; Benhar Noccioli, E; Benitez Garcia, J A; Benjamin, D P; Benoit, M; Bensinger, J R; Benslama, K; Bentvelsen, S; Berge, D; Bergeaas Kuutmann, E; Berger, N; Berghaus, F; Berglund, E; Beringer, J; Bernat, P; Bernhard, R; Bernius, C; Bernlochner, F U; Berry, T; Bertella, C; Bertin, A; Bertolucci, F; Besana, M I; Besjes, G J; Besson, N; Bethke, S; Bhimji, W; Bianchi, R M; Bianchini, L; Bianco, M; Biebel, O; Bieniek, S P; Bierwagen, K; Biesiada, J; Biglietti, M; Bilokon, H; Bindi, M; Binet, S; Bingul, A; Bini, C; Biscarat, C; Bittner, B; Black, C W; Black, J E; Black, K M; Blair, R E; Blanchard, J-B; Blazek, T; Bloch, I; Blocker, C; Blocki, J; Blum, W; Blumenschein, U; Bobbink, G J; Bobrovnikov, V S; Bocchetta, S S; Bocci, A; Boddy, C R; Boehler, M; Boek, J; Boek, T T; Boelaert, N; Bogaerts, J A; Bogdanchikov, A; Bogouch, A; Bohm, C; Bohm, J; Boisvert, V; Bold, T; Boldea, V; Bolnet, N M; Bomben, M; Bona, M; Boonekamp, M; Bordoni, S; Borer, C; Borisov, A; Borissov, G; Borjanovic, I; Borri, M; Borroni, S; Bortfeldt, J; Bortolotto, V; Bos, K; Boscherini, D; Bosman, M; Boterenbrood, H; Bouchami, J; Boudreau, J; Bouhova-Thacker, E V; Boumediene, D; Bourdarios, C; Bousson, N; Boutouil, S; Boveia, A; Boyd, J; Boyko, I R; Bozovic-Jelisavcic, I; Bracinik, J; Branchini, P; Brandt, A; Brandt, G; Brandt, O; Bratzler, U; Brau, B; Brau, J E; Braun, H M; Brazzale, S F; Brelier, B; Bremer, J; Brendlinger, K; Brenner, R; Bressler, S; Bristow, T M; Britton, D; Brochu, F M; Brock, I; Brock, R; Broggi, F; Bromberg, C; Bronner, J; Brooijmans, G; Brooks, T; Brooks, W K; Brown, G; Bruckman de Renstrom, P A; Bruncko, D; Bruneliere, R; Brunet, S; Bruni, A; Bruni, G; Bruschi, M; Bryngemark, L; Buanes, T; Buat, Q; Bucci, F; Buchanan, J; Buchholz, P; Buckingham, R M; Buckley, A G; Buda, S I; Budagov, I A; Budick, B; Bugge, L; Bulekov, O; Bundock, A C; Bunse, M; Buran, T; Burckhart, H; Burdin, S; Burgess, T; Burke, S; Busato, E; Büscher, V; Bussey, P; Buszello, C P; Butler, B; Butler, J M; Buttar, C M; Butterworth, J M; Buttinger, W; Byszewski, M; Cabrera Urbán, S; Caforio, D; Cakir, O; Calafiura, P; Calderini, G; Calfayan, P; Calkins, R; Caloba, L P; Caloi, R; Calvet, D; Calvet, S; Camacho Toro, R; Camarri, P; Cameron, D; Caminada, L M; Caminal Armadans, R; Campana, S; Campanelli, M; Canale, V; Canelli, F; Canepa, A; Cantero, J; Cantrill, R; Cao, T; Capeans Garrido, M D M; Caprini, I; Caprini, M; Capriotti, D; Capua, M; Caputo, R; Cardarelli, R; Carli, T; Carlino, G; Carminati, L; Caron, S; Carquin, E; Carrillo-Montoya, G D; Carter, A A; Carter, J R; Carvalho, J; Casadei, D; Casado, M P; Cascella, M; Caso, C; Castaneda-Miranda, E; Castillo Gimenez, V; Castro, N F; Cataldi, G; Catastini, P; Catinaccio, A; Catmore, J R; Cattai, A; Cattani, G; Caughron, S; Cavaliere, V; Cavalleri, P; Cavalli, D; Cavalli-Sforza, M; Cavasinni, V; Ceradini, F; Cerqueira, A S; Cerri, A; Cerrito, L; Cerutti, F; Cetin, S A; Chafaq, A; Chakraborty, D; Chalupkova, I; Chan, K; Chang, P; Chapleau, B; Chapman, J D; Chapman, J W; Charlton, D G; Chavda, V; Chavez Barajas, C A; Cheatham, S; Chekanov, S; Chekulaev, S V; Chelkov, G A; Chelstowska, M A; Chen, C; Chen, H; Chen, S; Chen, X; Chen, Y; Cheng, Y; Cheplakov, A; Cherkaoui El Moursli, R; Chernyatin, V; Cheu, E; Cheung, S L; Chevalier, L; Chiefari, G; Chikovani, L; Childers, J T; Chilingarov, A; Chiodini, G; Chisholm, A S; Chislett, R T; Chitan, A; Chizhov, M V; Choudalakis, G; Chouridou, S; Chow, B K B; Christidi, I A; Christov, A; Chromek-Burckhart, D; Chu, M L; Chudoba, J; Ciapetti, G; Ciftci, A K; Ciftci, R; Cinca, D; Cindro, V; Ciocio, A; Cirilli, M; Cirkovic, P; Citron, Z H; Citterio, M; Ciubancan, M; Clark, A; Clark, P J; Clarke, R N; Cleland, W; Clemens, J C; Clement, B; Clement, C; Coadou, Y; Cobal, M; Coccaro, A; Cochran, J; Coffey, L; Cogan, J G; Coggeshall, J; Colas, J; Cole, S; Colijn, A P; Collins, N J; Collins-Tooth, C; Collot, J; Colombo, T; Colon, G; Compostella, G; Conde Muiño, P; Coniavitis, E; Conidi, M C; Consonni, S M; Consorti, V; Constantinescu, S; Conta, C; Conti, G; Conventi, F; Cooke, M; Cooper, B D; Cooper-Sarkar, A M; Cooper-Smith, N J; Copic, K; Cornelissen, T; Corradi, M; Corriveau, F; Cortes-Gonzalez, A; Cortiana, G; Costa, G; Costa, M J; Costanzo, D; Côté, D; Cottin, G; Courneyea, L; Cowan, G; Cox, B E; Cranmer, K; Crépé-Renaudin, S; Crescioli, F; Cristinziani, M; Crosetti, G; Cuciuc, C-M; Cuenca Almenar, C; Cuhadar Donszelmann, T; Cummings, J; Curatolo, M; Curtis, C J; Cuthbert, C; Cwetanski, P; Czirr, H; Czodrowski, P; Czyczula, Z; D'Auria, S; D'Onofrio, M; D'Orazio, A; Da Cunha Sargedas De Sousa, M J; Da Via, C; Dabrowski, W; Dafinca, A; Dai, T; Dallaire, F; Dallapiccola, C; Dam, M; Damiani, D S; Danielsson, H O; Dao, V; Darbo, G; Darlea, G L; Darmora, S; Dassoulas, J A; Davey, W; Davidek, T; Davidson, N; Davidson, R; Davies, E; Davies, M; Davignon, O; Davison, A R; Davygora, Y; Dawe, E; Dawson, I; Daya-Ishmukhametova, R K; De, K; de Asmundis, R; De Castro, S; De Cecco, S; de Graat, J; De Groot, N; de Jong, P; De La Taille, C; De la Torre, H; De Lorenzi, F; De Nooij, L; De Pedis, D; De Salvo, A; De Sanctis, U; De Santo, A; De Vivie De Regie, J B; De Zorzi, G; Dearnaley, W J; Debbe, R; Debenedetti, C; Dechenaux, B; Dedovich, D V; Degenhardt, J; Del Peso, J; Del Prete, T; Delemontex, T; Deliyergiyev, M; Dell'Acqua, A; Dell'Asta, L; Della Pietra, M; Della Volpe, D; Delmastro, M; Delsart, P A; Deluca, C; Demers, S; Demichev, M; Demirkoz, B; Denisov, S P; Derendarz, D; Derkaoui, J E; Derue, F; Dervan, P; Desch, K; Deviveiros, P O; Dewhurst, A; DeWilde, B; Dhaliwal, S; Dhullipudi, R; Di Ciaccio, A; Di Ciaccio, L; Di Donato, C; Di Girolamo, A; Di Girolamo, B; Di Luise, S; Di Mattia, A; Di Micco, B; Di Nardo, R; Di Simone, A; Di Sipio, R; Diaz, M A; Diehl, E B; Dietrich, J; Dietzsch, T A; Diglio, S; Dindar Yagci, K; Dingfelder, J; Dinut, F; Dionisi, C; Dita, P; Dita, S; Dittus, F; Djama, F; Djobava, T; do Vale, M A B; Do Valle Wemans, A; Doan, T K O; Dobbs, M; Dobos, D; Dobson, E; Dodd, J; Doglioni, C; Doherty, T; Dohmae, T; Doi, Y; Dolejsi, J; Dolezal, Z; Dolgoshein, B A; Donadelli, M; Donini, J; Dopke, J; Doria, A; Dos Anjos, A; Dotti, A; Dova, M T; Doyle, A T; Dressnandt, N; Dris, M; Dubbert, J; Dube, S; Dubreuil, E; Duchovni, E; Duckeck, G; Duda, D; Dudarev, A; Dudziak, F; Duerdoth, I P; Duflot, L; Dufour, M-A; Duguid, L; Dührssen, M; Dunford, M; Duran Yildiz, H; Düren, M; Duxfield, R; Dwuznik, M; Ebenstein, W L; Ebke, J; Eckweiler, S; Edson, W; Edwards, C A; Edwards, N C; Ehrenfeld, W; Eifert, T; Eigen, G; Einsweiler, K; Eisenhandler, E; Ekelof, T; El Kacimi, M; Ellert, M; Elles, S; Ellinghaus, F; Ellis, K; Ellis, N; Elmsheuser, J; Elsing, M; Emeliyanov, D; Enari, Y; Engelmann, R; Engl, A; Epp, B; Erdmann, J; Ereditato, A; Eriksson, D; Ernst, J; Ernst, M; Ernwein, J; Errede, D; Errede, S; Ertel, E; Escalier, M; Esch, H; Escobar, C; Espinal Curull, X; Esposito, B; Etienne, F; Etienvre, A I; Etzion, E; Evangelakou, D; Evans, H; Fabbri, L; Fabre, C; Facini, G; Fakhrutdinov, R M; Falciano, S; Fang, Y; Fanti, M; Farbin, A; Farilla, A; Farley, J; Farooque, T; Farrell, S; Farrington, S M; Farthouat, P; Fassi, F; Fassnacht, P; Fassouliotis, D; Fatholahzadeh, B; Favareto, A; Fayard, L; Federic, P; Fedin, O L; Fedorko, W; Fehling-Kaschek, M; Feligioni, L; Feng, C; Feng, E J; Fenyuk, A B; Ferencei, J; Fernando, W; Ferrag, S; Ferrando, J; Ferrara, V; Ferrari, A; Ferrari, P; Ferrari, R; Ferreira de Lima, D E; Ferrer, A; Ferrere, D; Ferretti, C; Ferretto Parodi, A; Fiascaris, M; Fiedler, F; Filipčič, A; Filthaut, F; Fincke-Keeler, M; Fiolhais, M C N; Fiorini, L; Firan, A; Fischer, J; Fisher, M J; Fitzgerald, E A; Flechl, M; Fleck, I; Fleischmann, P; Fleischmann, S; Fletcher, G T; Fletcher, G; Flick, T; Floderus, A; Flores Castillo, L R; Florez Bustos, A C; Flowerdew, M J; Fonseca Martin, T; Formica, A; Forti, A; Fortin, D; Fournier, D; Fowler, A J; Fox, H; Francavilla, P; Franchini, M; Franchino, S; Francis, D; Frank, T; Franklin, M; Franz, S; Fraternali, M; Fratina, S; French, S T; Friedrich, C; Friedrich, F; Froidevaux, D; Frost, J A; Fukunaga, C; Fullana Torregrosa, E; Fulsom, B G; Fuster, J; Gabaldon, C; Gabizon, O; Gadatsch, S; Gadfort, T; Gadomski, S; Gagliardi, G; Gagnon, P; Galea, C; Galhardo, B; Gallas, E J; Gallo, V; Gallop, B J; Gallus, P; Gan, K K; Gandrajula, R P; Gao, Y S; Gaponenko, A; Garay Walls, F M; Garberson, F; García, C; García Navarro, J E; Garcia-Sciveres, M; Gardner, R W; Garelli, N; Garonne, V; Gatti, C; Gaudio, G; Gaur, B; Gauthier, L; Gauzzi, P; Gavrilenko, I L; Gay, C; Gaycken, G; Gazis, E N; Ge, P; Gecse, Z; Gee, C N P; Geerts, D A A; Geich-Gimbel, Ch; Gellerstedt, K; Gemme, C; Gemmell, A; Genest, M H; Gentile, S; George, M; George, S; Gerbaudo, D; Gerlach, P; Gershon, A; Geweniger, C; Ghazlane, H; Ghodbane, N; Giacobbe, B; Giagu, S; Giangiobbe, V; Gianotti, F; Gibbard, B; Gibson, A; Gibson, S M; Gilchriese, M; Gillam, T P S; Gillberg, D; Gillman, A R; Gingrich, D M; Ginzburg, J; Giokaris, N; Giordani, M P; Giordano, R; Giorgi, F M; Giovannini, P; Giraud, P F; Giugni, D; Giunta, M; Gjelsten, B K; Gladilin, L K; Glasman, C; Glatzer, J; Glazov, A; Glonti, G L; Goddard, J R; Godfrey, J; Godlewski, J; Goebel, M; Goeringer, C; Goldfarb, S; Golling, T; Golubkov, D; Gomes, A; Gomez Fajardo, L S; Gonçalo, R; Goncalves Pinto Firmino Da Costa, J; Gonella, L; González de la Hoz, S; Gonzalez Parra, G; Gonzalez Silva, M L; Gonzalez-Sevilla, S; Goodson, J J; Goossens, L; Göpfert, T; Gorbounov, P A; Gordon, H A; Gorelov, I; Gorfine, G; Gorini, B; Gorini, E; Gorišek, A; Gornicki, E; Goshaw, A T; Gössling, C; Gostkin, M I; Gough Eschrich, I; Gouighri, M; Goujdami, D; Goulette, M P; Goussiou, A G; Goy, C; Gozpinar, S; Graber, L; Grabowska-Bold, I; Grafström, P; Grahn, K-J; Gramstad, E; Grancagnolo, F; Grancagnolo, S; Grassi, V; Gratchev, V; Gray, H M; Gray, J A; Graziani, E; Grebenyuk, O G; Greenshaw, T; Greenwood, Z D; Gregersen, K; Gregor, I M; Grenier, P; Griffiths, J; Grigalashvili, N; Grillo, A A; Grimm, K; Grinstein, S; Gris, Ph; Grishkevich, Y V; Grivaz, J-F; Grohs, J P; Grohsjean, A; Gross, E; Grosse-Knetter, J; Groth-Jensen, J; Grybel, K; Guest, D; Gueta, O; Guicheney, C; Guido, E; Guillemin, T; Guindon, S; Gul, U; Gunther, J; Guo, B; Guo, J; Gutierrez, P; Guttman, N; Gutzwiller, O; Guyot, C; Gwenlan, C; Gwilliam, C B; Haas, A; Haas, S; Haber, C; Hadavand, H K; Hadley, D R; Haefner, P; Hajduk, Z; Hakobyan, H; Hall, D; Halladjian, G; Hamacher, K; Hamal, P; Hamano, K; Hamer, M; Hamilton, A; Hamilton, S; Han, L; Hanagaki, K; Hanawa, K; Hance, M; Handel, C; Hanke, P; Hansen, J R; Hansen, J B; Hansen, J D; Hansen, P H; Hansson, P; Hara, K; Harenberg, T; Harkusha, S; Harper, D; Harrington, R D; Harris, O M; Hartert, J; Hartjes, F; Haruyama, T; Harvey, A; Hasegawa, S; Hasegawa, Y; Hassani, S; Haug, S; Hauschild, M; Hauser, R; Havranek, M; Hawkes, C M; Hawkings, R J; Hawkins, A D; Hayakawa, T; Hayashi, T; Hayden, D; Hays, C P; Hayward, H S; Haywood, S J; Head, S J; Heck, T; Hedberg, V; Heelan, L; Heim, S; Heinemann, B; Heisterkamp, S; Helary, L; Heller, C; Heller, M; Hellman, S; Hellmich, D; Helsens, C; Henderson, R C W; Henke, M; Henrichs, A; Henriques Correia, A M; Henrot-Versille, S; Hensel, C; Hernandez, C M; Hernández Jiménez, Y; Herrberg, R; Herten, G; Hertenberger, R; Hervas, L; Hesketh, G G; Hessey, N P; Hickling, R; Higón-Rodriguez, E; Hill, J C; Hiller, K H; Hillert, S; Hillier, S J; Hinchliffe, I; Hines, E; Hirose, M; Hirsch, F; Hirschbuehl, D; Hobbs, J; Hod, N; Hodgkinson, M C; Hodgson, P; Hoecker, A; Hoeferkamp, M R; Hoffman, J; Hoffmann, D; Hohlfeld, M; Holmgren, S O; Holy, T; Holzbauer, J L; Hong, T M; Hooft van Huysduynen, L; Hostachy, J-Y; Hou, S; Hoummada, A; Howard, J; Howarth, J; Hrabovsky, M; Hristova, I; Hrivnac, J; Hryn'ova, T; Hsu, P J; Hsu, S-C; Hu, D; Hubacek, Z; Hubaut, F; Huegging, F; Huettmann, A; Huffman, T B; Hughes, E W; Hughes, G; Huhtinen, M; Hülsing, T A; Hurwitz, M; Huseynov, N; Huston, J; Huth, J; Iacobucci, G; Iakovidis, G; Ibbotson, M; Ibragimov, I; Iconomidou-Fayard, L; Idarraga, J; Iengo, P; Igonkina, O; Ikegami, Y; Ikematsu, K; Ikeno, M; Iliadis, D; Ilic, N; Ince, T; Ioannou, P; Iodice, M; Iordanidou, K; Ippolito, V; Irles Quiles, A; Isaksson, C; Ishino, M; Ishitsuka, M; Ishmukhametov, R; Issever, C; Istin, S; Ivashin, A V; Iwanski, W; Iwasaki, H; Izen, J M; Izzo, V; Jackson, B; Jackson, J N; Jackson, P; Jaekel, M R; Jain, V; Jakobs, K; Jakobsen, S; Jakoubek, T; Jakubek, J; Jamin, D O; Jana, D K; Jansen, E; Jansen, H; Janssen, J; Jantsch, A; Janus, M; Jared, R C; Jarlskog, G; Jeanty, L; Jeng, G-Y; Jen-La Plante, I; Jennens, D; Jenni, P; Jeske, C; Jež, P; Jézéquel, S; Jha, M K; Ji, H; Ji, W; Jia, J; Jiang, Y; Jimenez Belenguer, M; Jin, S; Jinnouchi, O; Joergensen, M D; Joffe, D; Johansen, M; Johansson, K E; Johansson, P; Johnert, S; Johns, K A; Jon-And, K; Jones, G; Jones, R W L; Jones, T J; Joram, C; Jorge, P M; Joshi, K D; Jovicevic, J; Jovin, T; Ju, X; Jung, C A; Jungst, R M; Juranek, V; Jussel, P; Juste Rozas, A; Kabana, S; Kaci, M; Kaczmarska, A; Kadlecik, P; Kado, M; Kagan, H; Kagan, M; Kajomovitz, E; Kalinin, S; Kama, S; Kanaya, N; Kaneda, M; Kaneti, S; Kanno, T; Kantserov, V A; Kanzaki, J; Kaplan, B; Kapliy, A; Kar, D; Karagounis, M; Karakostas, K; Karnevskiy, M; Kartvelishvili, V; Karyukhin, A N; Kashif, L; Kasieczka, G; Kass, R D; Kastanas, A; Kataoka, Y; Katzy, J; Kaushik, V; Kawagoe, K; Kawamoto, T; Kawamura, G; Kazama, S; Kazanin, V F; Kazarinov, M Y; Keeler, R; Keener, P T; Kehoe, R; Keil, M; Keller, J S; Kenyon, M; Keoshkerian, H; Kepka, O; Kerschen, N; Kerševan, B P; Kersten, S; Kessoku, K; Keung, J; Khalil-Zada, F; Khandanyan, H; Khanov, A; Kharchenko, D; Khodinov, A; Khomich, A; Khoo, T J; Khoriauli, G; Khoroshilov, A; Khovanskiy, V; Khramov, E; Khubua, J; Kim, H; Kim, S H; Kimura, N; Kind, O; King, B T; King, M; King, R S B; Kirk, J; Kiryunin, A E; Kishimoto, T; Kisielewska, D; Kitamura, T; Kittelmann, T; Kiuchi, K; Kladiva, E; Klein, M; Klein, U; Kleinknecht, K; Klemetti, M; Klier, A; Klimek, P; Klimentov, A; Klingenberg, R; Klinger, J A; Klinkby, E B; Klioutchnikova, T; Klok, P F; Klous, S; Kluge, E-E; Kluge, T; Kluit, P; Kluth, S; Kneringer, E; Knoops, E B F G; Knue, A; Ko, B R; Kobayashi, T; Kobel, M; Kocian, M; Kodys, P; Koenig, S; Koetsveld, F; Koevesarki, P; Koffas, T; Koffeman, E; Kogan, L A; Kohlmann, S; Kohn, F; Kohout, Z; Kohriki, T; Koi, T; Kolanoski, H; Kolesnikov, V; Koletsou, I; Koll, J; Komar, A A; Komori, Y; Kondo, T; Köneke, K; König, A C; Kono, T; Kononov, A I; Konoplich, R; Konstantinidis, N; Kopeliansky, R; Koperny, S; Köpke, L; Kopp, A K; Korcyl, K; Kordas, K; Korn, A; Korol, A; Korolkov, I; Korolkova, E V; Korotkov, V A; Kortner, O; Kortner, S; Kostyukhin, V V; Kotov, S; Kotov, V M; Kotwal, A; Kourkoumelis, C; Kouskoura, V; Koutsman, A; Kowalewski, R; Kowalski, T Z; Kozanecki, W; Kozhin, A S; Kral, V; Kramarenko, V A; Kramberger, G; Krasny, M W; Krasznahorkay, A; Kraus, J K; Kravchenko, A; Kreiss, S; Krejci, F; Kretzschmar, J; Kreutzfeldt, K; Krieger, N; Krieger, P; Kroeninger, K; Kroha, H; Kroll, J; Kroseberg, J; Krstic, J; Kruchonak, U; Krüger, H; Kruker, T; Krumnack, N; Krumshteyn, Z V; Kruse, M K; Kubota, T; Kuday, S; Kuehn, S; Kugel, A; Kuhl, T; Kukhtin, V; Kulchitsky, Y; Kuleshov, S; Kuna, M; Kunkle, J; Kupco, A; Kurashige, H; Kurata, M; Kurochkin, Y A; Kus, V; Kuwertz, E S; Kuze, M; Kvita, J; Kwee, R; La Rosa, A; La Rotonda, L; Labarga, L; Lablak, S; Lacasta, C; Lacava, F; Lacey, J; Lacker, H; Lacour, D; Lacuesta, V R; Ladygin, E; Lafaye, R; Laforge, B; Lagouri, T; Lai, S; Laisne, E; Lambourne, L; Lampen, C L; Lampl, W; Lançon, E; Landgraf, U; Landon, M P J; Lang, V S; Lange, C; Lankford, A J; Lanni, F; Lantzsch, K; Lanza, A; Laplace, S; Lapoire, C; Laporte, J F; Lari, T; Larner, A; Lassnig, M; Laurelli, P; Lavorini, V; Lavrijsen, W; Laycock, P; Le Dortz, O; Le Guirriec, E; Le Menedeu, E; LeCompte, T; Ledroit-Guillon, F; Lee, H; Lee, J S H; Lee, S C; Lee, L; Lefebvre, M; Legendre, M; Legger, F; Leggett, C; Lehmacher, M; Lehmann Miotto, G; Leister, A G; Leite, M A L; Leitner, R; Lellouch, D; Lemmer, B; Lendermann, V; Leney, K J C; Lenz, T; Lenzen, G; Lenzi, B; Leonhardt, K; Leontsinis, S; Lepold, F; Leroy, C; Lessard, J-R; Lester, C G; Lester, C M; Levêque, J; Levin, D; Levinson, L J; Lewis, A; Lewis, G H; Leyko, A M; Leyton, M; Li, B; Li, B; Li, H; Li, H L; Li, S; Li, X; Liang, Z; Liao, H; Liberti, B; Lichard, P; Lie, K; Liebal, J; Liebig, W; Limbach, C; Limosani, A; Limper, M; Lin, S C; Linde, F; Linnemann, J T; Lipeles, E; Lipniacka, A; Lisovyi, M; Liss, T M; Lissauer, D; Lister, A; Litke, A M; Liu, D; Liu, J B; Liu, L; Liu, M; Liu, Y; Livan, M; Livermore, S S A; Lleres, A; Llorente Merino, J; Lloyd, S L; Lo Sterzo, F; Lobodzinska, E; Loch, P; Lockman, W S; Loddenkoetter, T; Loebinger, F K; Loevschall-Jensen, A E; Loginov, A; Loh, C W; Lohse, T; Lohwasser, K; Lokajicek, M; Lombardo, V P; Long, R E; Lopes, L; Lopez Mateos, D; Lorenz, J; Lorenzo Martinez, N; Losada, M; Loscutoff, P; Losty, M J; Lou, X; Lounis, A; Loureiro, K F; Love, J; Love, P A; Lowe, A J; Lu, F; Lubatti, H J; Luci, C; Lucotte, A; Ludwig, D; Ludwig, I; Ludwig, J; Luehring, F; Lukas, W; Luminari, L; Lund, E; Lundberg, B; Lundberg, J; Lundberg, O; Lund-Jensen, B; Lundquist, J; Lungwitz, M; Lynn, D; Lysak, R; Lytken, E; Ma, H; Ma, L L; Maccarrone, G; Macchiolo, A; Maček, B; Machado Miguens, J; Macina, D; Mackeprang, R; Madar, R; Madaras, R J; Maddocks, H J; Mader, W F; Madsen, A; Maeno, M; Maeno, T; Magnoni, L; Magradze, E; Mahboubi, K; Mahlstedt, J; Mahmoud, S; Mahout, G; Maiani, C; Maidantchik, C; Maio, A; Majewski, S; Makida, Y; Makovec, N; Mal, P; Malaescu, B; Malecki, Pa; Malecki, P; Maleev, V P; Malek, F; Mallik, U; Malon, D; Malone, C; Maltezos, S; Malyshev, V; Malyukov, S; Mamuzic, J; Manabe, A; Mandelli, L; Mandić, I; Mandrysch, R; Maneira, J; Manfredini, A; Manhaes de Andrade Filho, L; Manjarres Ramos, J A; Mann, A; Manning, P M; Manousakis-Katsikakis, A; Mansoulie, B; Mantifel, R; Mapelli, A; Mapelli, L; March, L; Marchand, J F; Marchese, F; Marchiori, G; Marcisovsky, M; Marino, C P; Marroquim, F; Marshall, Z; Marti, L F; Marti-Garcia, S; Martin, B; Martin, B; Martin, J P; Martin, T A; Martin, V J; Martin Dit Latour, B; Martinez, H; Martinez, M; Martinez Outschoorn, V; Martin-Haugh, S; Martyniuk, A C; Marx, M; Marzano, F; Marzin, A; Masetti, L; Mashimo, T; Mashinistov, R; Masik, J; Maslennikov, A L; Massa, I; Massol, N; Mastrandrea, P; Mastroberardino, A; Masubuchi, T; Matsunaga, H; Matsushita, T; Mättig, P; Mättig, S; Mattravers, C; Maurer, J; Maxfield, S J; Maximov, D A; Mazini, R; Mazur, M; Mazzaferro, L; Mazzanti, M; Mc Donald, J; Mc Kee, S P; McCarn, A; McCarthy, R L; McCarthy, T G; McCubbin, N A; McFarlane, K W; Mcfayden, J A; Mchedlidze, G; Mclaughlan, T; McMahon, S J; McPherson, R A; Meade, A; Mechnich, J; Mechtel, M; Medinnis, M; Meehan, S; Meera-Lebbai, R; Meguro, T; Mehlhase, S; Mehta, A; Meier, K; Meineck, C; Meirose, B; Melachrinos, C; Mellado Garcia, B R; Meloni, F; Mendoza Navas, L; Meng, Z; Mengarelli, A; Menke, S; Meoni, E; Mercurio, K M; Meric, N; Mermod, P; Merola, L; Meroni, C; Merritt, F S; Merritt, H; Messina, A; Metcalfe, J; Mete, A S; Meyer, C; Meyer, C; Meyer, J-P; Meyer, J; Meyer, J; Michal, S; Micu, L; Middleton, R P; Migas, S; Mijović, L; Mikenberg, G; Mikestikova, M; Mikuž, M; Miller, D W; Miller, R J; Mills, W J; Mills, C; Milov, A; Milstead, D A; Milstein, D; Minaenko, A A; Miñano Moya, M; Minashvili, I A; Mincer, A I; Mindur, B; Mineev, M; Ming, Y; Mir, L M; Mirabelli, G; Mitrevski, J; Mitsou, V A; Mitsui, S; Miyagawa, P S; Mjörnmark, J U; Moa, T; Moeller, V; Mohapatra, S; Mohr, W; Moles-Valls, R; Molfetas, A; Mönig, K; Monini, C; Monk, J; Monnier, E; Montejo Berlingen, J; Monticelli, F; Monzani, S; Moore, R W; Mora Herrera, C; Moraes, A; Morange, N; Morel, J; Moreno, D; Moreno Llácer, M; Morettini, P; Morgenstern, M; Morii, M; Morley, A K; Mornacchi, G; Morris, J D; Morvaj, L; Möser, N; Moser, H G; Mosidze, M; Moss, J; Mount, R; Mountricha, E; Mouraviev, S V; Moyse, E J W; Mueller, F; Mueller, J; Mueller, K; Mueller, T; Muenstermann, D; Müller, T A; Munwes, Y; Murray, W J; Mussche, I; Musto, E; Myagkov, A G; Myska, M; Nackenhorst, O; Nadal, J; Nagai, K; Nagai, R; Nagai, Y; Nagano, K; Nagarkar, A; Nagasaka, Y; Nagel, M; Nairz, A M; Nakahama, Y; Nakamura, K; Nakamura, T; Nakano, I; Namasivayam, H; Nanava, G; Napier, A; Narayan, R; Nash, M; Nattermann, T; Naumann, T; Navarro, G; Neal, H A; Nechaeva, P Yu; Neep, T J; Negri, A; Negri, G; Negrini, M; Nektarijevic, S; Nelson, A; Nelson, T K; Nemecek, S; Nemethy, P; Nepomuceno, A A; Nessi, M; Neubauer, M S; Neumann, M; Neusiedl, A; Neves, R M; Nevski, P; Newcomer, F M; Newman, P R; Nguyen, D H; Nguyen Thi Hong, V; Nickerson, R B; Nicolaidou, R; Nicquevert, B; Niedercorn, F; Nielsen, J; Nikiforou, N; Nikiforov, A; Nikolaenko, V; Nikolic-Audit, I; Nikolics, K; Nikolopoulos, K; Nilsen, H; Nilsson, P; Ninomiya, Y; Nisati, A; Nisius, R; Nobe, T; Nodulman, L; Nomachi, M; Nomidis, I; Norberg, S; Nordberg, M; Novakova, J; Nozaki, M; Nozka, L; Nuncio-Quiroz, A-E; Nunes Hanninger, G; Nunnemann, T; Nurse, E; O'Brien, B J; O'Neil, D C; O'Shea, V; Oakes, L B; Oakham, F G; Oberlack, H; Ocariz, J; Ochi, A; Ochoa, M I; Oda, S; Odaka, S; Odier, J; Ogren, H; Oh, A; Oh, S H; Ohm, C C; Ohshima, T; Okamura, W; Okawa, H; Okumura, Y; Okuyama, T; Olariu, A; Olchevski, A G; Olivares Pino, S A; Oliveira, M; Oliveira Damazio, D; Oliver Garcia, E; Olivito, D; Olszewski, A; Olszowska, J; Onofre, A; Onyisi, P U E; Oram, C J; Oreglia, M J; Oren, Y; Orestano, D; Orlando, N; Oropeza Barrera, C; Orr, R S; Osculati, B; Ospanov, R; Osuna, C; Otero Y Garzon, G; Ottersbach, J P; Ouchrif, M; Ouellette, E A; Ould-Saada, F; Ouraou, A; Ouyang, Q; Ovcharova, A; Owen, M; Owen, S; Ozcan, V E; Ozturk, N; Pacheco Pages, A; Padilla Aranda, C; Pagan Griso, S; Paganis, E; Pahl, C; Paige, F; Pais, P; Pajchel, K; Palacino, G; Paleari, C P; Palestini, S; Pallin, D; Palma, A; Palmer, J D; Pan, Y B; Panagiotopoulou, E; Panduro Vazquez, J G; Pani, P; Panikashvili, N; Panitkin, S; Pantea, D; Papadelis, A; Papadopoulou, Th D; Paramonov, A; Paredes Hernandez, D; Park, W; Parker, M A; Parodi, F; Parsons, J A; Parzefall, U; Pashapour, S; Pasqualucci, E; Passaggio, S; Passeri, A; Pastore, F; Pastore, Fr; Pásztor, G; Pataraia, S; Patel, N D; Pater, J R; Patricelli, S; Pauly, T; Pearce, J; Pedersen, M; Pedraza Lopez, S; Pedraza Morales, M I; Peleganchuk, S V; Pelikan, D; Peng, H; Penning, B; Penson, A; Penwell, J; Perez Cavalcanti, T; Perez Codina, E; Pérez García-Estañ, M T; Perez Reale, V; Perini, L; Pernegger, H; Perrino, R; Perrodo, P; Peshekhonov, V D; Peters, K; Peters, R F Y; Petersen, B A; Petersen, J; Petersen, T C; Petit, E; Petridis, A; Petridou, C; Petrolo, E; Petrucci, F; Petschull, D; Petteni, M; Pezoa, R; Phan, A; Phillips, P W; Piacquadio, G; Pianori, E; Picazio, A; Piccaro, E; Piccinini, M; Piec, S M; Piegaia, R; Pignotti, D T; Pilcher, J E; Pilkington, A D; Pina, J; Pinamonti, M; Pinder, A; Pinfold, J L; Pingel, A; Pinto, B; Pizio, C; Pleier, M-A; Pleskot, V; Plotnikova, E; Plucinski, P; Poblaguev, A; Poddar, S; Podlyski, F; Poettgen, R; Poggioli, L; Pohl, D; Pohl, M; Polesello, G; Policicchio, A; Polifka, R; Polini, A; Poll, J; Polychronakos, V; Pomeroy, D; Pommès, K; Pontecorvo, L; Pope, B G; Popeneciu, G A; Popovic, D S; Poppleton, A; Portell Bueso, X; Pospelov, G E; Pospisil, S; Potrap, I N; Potter, C J; Potter, C T; Poulard, G; Poveda, J; Pozdnyakov, V; Prabhu, R; Pralavorio, P; Pranko, A; Prasad, S; Pravahan, R; Prell, S; Pretzl, K; Price, D; Price, J; Price, L E; Prieur, D; Primavera, M; Proissl, M; Prokofiev, K; Prokoshin, F; Protopapadaki, E; Protopopescu, S; Proudfoot, J; Prudent, X; Przybycien, M; Przysiezniak, H; Psoroulas, S; Ptacek, E; Pueschel, E; Puldon, D; Purohit, M; Puzo, P; Pylypchenko, Y; Qian, J; Quadt, A; Quarrie, D R; Quayle, W B; Quilty, D; Raas, M; Radeka, V; Radescu, V; Radloff, P; Ragusa, F; Rahal, G; Rahimi, A M; Rajagopalan, S; Rammensee, M; Rammes, M; Randle-Conde, A S; Randrianarivony, K; Rangel-Smith, C; Rao, K; Rauscher, F; Rave, T C; Ravenscroft, T; Raymond, M; Read, A L; Rebuzzi, D M; Redelbach, A; Redlinger, G; Reece, R; Reeves, K; Reinsch, A; Reisinger, I; Relich, M; Rembser, C; Ren, Z L; Renaud, A; Rescigno, M; Resconi, S; Resende, B; Reznicek, P; Rezvani, R; Richter, R; Richter-Was, E; Ridel, M; Rieck, P; Rijssenbeek, M; Rimoldi, A; Rinaldi, L; Rios, R R; Ritsch, E; Riu, I; Rivoltella, G; Rizatdinova, F; Rizvi, E; Robertson, S H; Robichaud-Veronneau, A; Robinson, D; Robinson, J E M; Robson, A; Rocha de Lima, J G; Roda, C; Roda Dos Santos, D; Roe, A; Roe, S; Røhne, O; Rolli, S; Romaniouk, A; Romano, M; Romeo, G; Romero Adam, E; Rompotis, N; Roos, L; Ros, E; Rosati, S; Rosbach, K; Rose, A; Rose, M; Rosenbaum, G A; Rosendahl, P L; Rosenthal, O; Rosselet, L; Rossetti, V; Rossi, E; Rossi, L P; Rotaru, M; Roth, I; Rothberg, J; Rousseau, D; Royon, C R; Rozanov, A; Rozen, Y; Ruan, X; Rubbo, F; Rubinskiy, I; Ruckstuhl, N; Rud, V I; Rudolph, C; Rudolph, M S; Rühr, F; Ruiz-Martinez, A; Rumyantsev, L; Rurikova, Z; Rusakovich, N A; Ruschke, A; Rutherfoord, J P; Ruthmann, N; Ruzicka, P; Ryabov, Y F; Rybar, M; Rybkin, G; Ryder, N C; Saavedra, A F; Sadeh, I; Sadrozinski, H F-W; Sadykov, R; Safai Tehrani, F; Sakamoto, H; Salamanna, G; Salamon, A; Saleem, M; Salek, D; Salihagic, D; Salnikov, A; Salt, J; Salvachua Ferrando, B M; Salvatore, D; Salvatore, F; Salvucci, A; Salzburger, A; Sampsonidis, D; Sanchez, A; Sánchez, J; Sanchez Martinez, V; Sandaker, H; Sander, H G; Sanders, M P; Sandhoff, M; Sandoval, T; Sandoval, C; Sandstroem, R; Sankey, D P C; Sansoni, A; Santamarina Rios, C; Santoni, C; Santonico, R; Santos, H; Santoyo Castillo, I; Sapp, K; Saraiva, J G; Sarangi, T; Sarkisyan-Grinbaum, E; Sarrazin, B; Sarri, F; Sartisohn, G; Sasaki, O; Sasaki, Y; Sasao, N; Satsounkevitch, I; Sauvage, G; Sauvan, E; Sauvan, J B; Savard, P; Savinov, V; Savu, D O; Sawyer, L; Saxon, D H; Saxon, J; Sbarra, C; Sbrizzi, A; Scannicchio, D A; Scarcella, M; Schaarschmidt, J; Schacht, P; Schaefer, D; Schaelicke, A; Schaepe, S; Schaetzel, S; Schäfer, U; Schaffer, A C; Schaile, D; Schamberger, R D; Scharf, V; Schegelsky, V A; Scheirich, D; Schernau, M; Scherzer, M I; Schiavi, C; Schieck, J; Schillo, C; Schioppa, M; Schlenker, S; Schmidt, E; Schmieden, K; Schmitt, C; Schmitt, C; Schmitt, S; Schneider, B; Schnellbach, Y J; Schnoor, U; Schoeffel, L; Schoening, A; Schorlemmer, A L S; Schott, M; Schouten, D; Schovancova, J; Schram, M; Schroeder, C; Schroer, N; Schultens, M J; Schultes, J; Schultz-Coulon, H-C; Schulz, H; Schumacher, M; Schumm, B A; Schune, Ph; Schwartzman, A; Schwegler, Ph; Schwemling, Ph; Schwienhorst, R; Schwindling, J; Schwindt, T; Schwoerer, M; Sciacca, F G; Scifo, E; Sciolla, G; Scott, W G; Searcy, J; Sedov, G; Sedykh, E; Seidel, S C; Seiden, A; Seifert, F; Seixas, J M; Sekhniaidze, G; Sekula, S J; Selbach, K E; Seliverstov, D M; Sellden, B; Sellers, G; Seman, M; Semprini-Cesari, N; Serfon, C; Serin, L; Serkin, L; Serre, T; Seuster, R; Severini, H; Sfyrla, A; Shabalina, E; Shamim, M; Shan, L Y; Shank, J T; Shao, Q T; Shapiro, M; Shatalov, P B; Shaw, K; Sherwood, P; Shimizu, S; Shimojima, M; Shin, T; Shiyakova, M; Shmeleva, A; Shochet, M J; Short, D; Shrestha, S; Shulga, E; Shupe, M A; Sicho, P; Sidoti, A; Siegert, F; Sijacki, Dj; Silbert, O; Silva, J; Silver, Y; Silverstein, D; Silverstein, S B; Simak, V; Simard, O; Simic, Lj; Simion, S; Simioni, E; Simmons, B; Simoniello, R; Simonyan, M; Sinervo, P; Sinev, N B; Sipica, V; Siragusa, G; Sircar, A; Sisakyan, A N; Sivoklokov, S Yu; Sjölin, J; Sjursen, T B; Skinnari, L A; Skottowe, H P; Skovpen, K; Skubic, P; Slater, M; Slavicek, T; Sliwa, K; Smakhtin, V; Smart, B H; Smestad, L; Smirnov, S Yu; Smirnov, Y; Smirnova, L N; Smirnova, O; Smith, B C; Smith, K M; Smizanska, M; Smolek, K; Snesarev, A A; Snidero, G; Snow, S W; Snow, J; Snyder, S; Sobie, R; Sodomka, J; Soffer, A; Soh, D A; Solans, C A; Solar, M; Solc, J; Soldatov, E Yu; Soldevila, U; Solfaroli Camillocci, E; Solodkov, A A; Solovyanov, O V; Solovyev, V; Soni, N; Sood, A; Sopko, V; Sopko, B; Sosebee, M; Soualah, R; Soueid, P; Soukharev, A; South, D; Spagnolo, S; Spanò, F; Spighi, R; Spigo, G; Spiwoks, R; Spousta, M; Spreitzer, T; Spurlock, B; St Denis, R D; Stahlman, J; Stamen, R; Stanecka, E; Stanek, R W; Stanescu, C; Stanescu-Bellu, M; Stanitzki, M M; Stapnes, S; Starchenko, E A; Stark, J; Staroba, P; Starovoitov, P; Staszewski, R; Staude, A; Stavina, P; Steele, G; Steinbach, P; Steinberg, P; Stekl, I; Stelzer, B; Stelzer, H J; Stelzer-Chilton, O; Stenzel, H; Stern, S; Stewart, G A; Stillings, J A; Stockton, M C; Stoebe, M; Stoerig, K; Stoicea, G; Stonjek, S; Strachota, P; Stradling, A R; Straessner, A; Strandberg, J; Strandberg, S; Strandlie, A; Strang, M; Strauss, E; Strauss, M; Strizenec, P; Ströhmer, R; Strom, D M; Strong, J A; Stroynowski, R; Stugu, B; Stumer, I; Stupak, J; Sturm, P; Styles, N A; Su, D; Subramania, Hs; Subramaniam, R; Succurro, A; Sugaya, Y; Suhr, C; Suk, M; Sulin, V V; Sultansoy, S; Sumida, T; Sun, X; Sundermann, J E; Suruliz, K; Susinno, G; Sutton, M R; Suzuki, Y; Suzuki, Y; Svatos, M; Swedish, S; Swiatlowski, M; Sykora, I; Sykora, T; Ta, D; Tackmann, K; Taffard, A; Tafirout, R; Taiblum, N; Takahashi, Y; Takai, H; Takashima, R; Takeda, H; Takeshita, T; Takubo, Y; Talby, M; Talyshev, A; Tam, J Y C; Tamsett, M C; Tan, K G; Tanaka, J; Tanaka, R; Tanaka, S; Tanaka, S; Tanasijczuk, A J; Tani, K; Tannoury, N; Tapprogge, S; Tardif, D; Tarem, S; Tarrade, F; Tartarelli, G F; Tas, P; Tasevsky, M; Tassi, E; Tayalati, Y; Taylor, C; Taylor, F E; Taylor, G N; Taylor, W; Teinturier, M; Teischinger, F A; Teixeira Dias Castanheira, M; Teixeira-Dias, P; Temming, K K; Ten Kate, H; Teng, P K; Terada, S; Terashi, K; Terron, J; Testa, M; Teuscher, R J; Therhaag, J; Theveneaux-Pelzer, T; Thoma, S; Thomas, J P; Thompson, E N; Thompson, P D; Thompson, P D; Thompson, A S; Thomsen, L A; Thomson, E; Thomson, M; Thong, W M; Thun, R P; Tian, F; Tibbetts, M J; Tic, T; Tikhomirov, V O; Tikhonov, Y A; Timoshenko, S; Tiouchichine, E; Tipton, P; Tisserant, S; Todorov, T; Todorova-Nova, S; Toggerson, B; Tojo, J; Tokár, S; Tokushuku, K; Tollefson, K; Tomlinson, L; Tomoto, M; Tompkins, L; Toms, K; Tonoyan, A; Topfel, C; Topilin, N D; Torrence, E; Torres, H; Torró Pastor, E; Toth, J; Touchard, F; Tovey, D R; Tran, H L; Trefzger, T; Tremblet, L; Tricoli, A; Trigger, I M; Trincaz-Duvoid, S; Tripiana, M F; Triplett, N; Trischuk, W; Trocmé, B; Troncon, C; Trottier-McDonald, M; Trovatelli, M; True, P; Trzebinski, M; Trzupek, A; Tsarouchas, C; Tseng, J C-L; Tsiakiris, M; Tsiareshka, P V; Tsionou, D; Tsipolitis, G; Tsiskaridze, S; Tsiskaridze, V; Tskhadadze, E G; Tsukerman, I I; Tsulaia, V; Tsung, J-W; Tsuno, S; Tsybychev, D; Tua, A; Tudorache, A; Tudorache, V; Tuggle, J M; Turala, M; Turecek, D; Turk Cakir, I; Turra, R; Tuts, P M; Tykhonov, A; Tylmad, M; Tyndel, M; Tzanakos, G; Uchida, K; Ueda, I; Ueno, R; Ughetto, M; Ugland, M; Uhlenbrock, M; Ukegawa, F; Unal, G; Undrus, A; Unel, G; Ungaro, F C; Unno, Y; Urbaniec, D; Urquijo, P; Usai, G; Vacavant, L; Vacek, V; Vachon, B; Vahsen, S; Valencic, N; Valentinetti, S; Valero, A; Valery, L; Valkar, S; Valladolid Gallego, E; Vallecorsa, S; Valls Ferrer, J A; Van Berg, R; Van Der Deijl, P C; van der Geer, R; van der Graaf, H; Van Der Leeuw, R; van der Poel, E; van der Ster, D; van Eldik, N; van Gemmeren, P; Van Nieuwkoop, J; van Vulpen, I; Vanadia, M; Vandelli, W; Vaniachine, A; Vankov, P; Vannucci, F; Vari, R; Varnes, E W; Varol, T; Varouchas, D; Vartapetian, A; Varvell, K E; Vassilakopoulos, V I; Vazeille, F; Vazquez Schroeder, T; Veloso, F; Veneziano, S; Ventura, A; Ventura, D; Venturi, M; Venturi, N; Vercesi, V; Verducci, M; Verkerke, W; Vermeulen, J C; Vest, A; Vetterli, M C; Vichou, I; Vickey, T; Vickey Boeriu, O E; Viehhauser, G H A; Viel, S; Villa, M; Villaplana Perez, M; Vilucchi, E; Vincter, M G; Vinek, E; Vinogradov, V B; Virzi, J; Vitells, O; Viti, M; Vivarelli, I; Vives Vaque, F; Vlachos, S; Vladoiu, D; Vlasak, M; Vogel, A; Vokac, P; Volpi, G; Volpi, M; Volpini, G; von der Schmitt, H; von Radziewski, H; von Toerne, E; Vorobel, V; Vorwerk, V; Vos, M; Voss, R; Vossebeld, J H; Vranjes, N; Vranjes Milosavljevic, M; Vrba, V; Vreeswijk, M; Vu Anh, T; Vuillermet, R; Vukotic, I; Vykydal, Z; Wagner, W; Wagner, P; Wahlen, H; Wahrmund, S; Wakabayashi, J; Walch, S; Walder, J; Walker, R; Walkowiak, W; Wall, R; Waller, P; Walsh, B; Wang, C; Wang, H; Wang, H; Wang, J; Wang, J; Wang, K; Wang, R; Wang, S M; Wang, T; Wang, X; Warburton, A; Ward, C P; Wardrope, D R; Warsinsky, M; Washbrook, A; Wasicki, C; Watanabe, I; Watkins, P M; Watson, A T; Watson, I J; Watson, M F; Watts, G; Watts, S; Waugh, A T; Waugh, B M; Weber, M S; Webster, J S; Weidberg, A R; Weigell, P; Weingarten, J; Weiser, C; Wells, P S; Wenaus, T; Wendland, D; Weng, Z; Wengler, T; Wenig, S; Wermes, N; Werner, M; Werner, P; Werth, M; Wessels, M; Wetter, J; Weydert, C; Whalen, K; White, A; White, M J; White, S; Whitehead, S R; Whiteson, D; Whittington, D; Wicke, D; Wickens, F J; Wiedenmann, W; Wielers, M; Wienemann, P; Wiglesworth, C; Wiik-Fuchs, L A M; Wijeratne, P A; Wildauer, A; Wildt, M A; Wilhelm, I; Wilkens, H G; Will, J Z; Williams, E; Williams, H H; Williams, S; Willis, W; Willocq, S; Wilson, J A; Wilson, M G; Wilson, A; Wingerter-Seez, I; Winkelmann, S; Winklmeier, F; Wittgen, M; Wittig, T; Wittkowski, J; Wollstadt, S J; Wolter, M W; Wolters, H; Wong, W C; Wooden, G; Wosiek, B K; Wotschack, J; Woudstra, M J; Wozniak, K W; Wraight, K; Wright, M; Wrona, B; Wu, S L; Wu, X; Wu, Y; Wulf, E; Wynne, B M; Xella, S; Xiao, M; Xie, S; Xu, C; Xu, D; Xu, L; Yabsley, B; Yacoob, S; Yamada, M; Yamaguchi, H; Yamaguchi, Y; Yamamoto, A; Yamamoto, K; Yamamoto, S; Yamamura, T; Yamanaka, T; Yamauchi, K; Yamazaki, T; Yamazaki, Y; Yan, Z; Yang, H; Yang, H; Yang, U K; Yang, Y; Yang, Z; Yanush, S; Yao, L; Yasu, Y; Yatsenko, E; Ye, J; Ye, S; Yen, A L; Yilmaz, M; Yoosoofmiya, R; Yorita, K; Yoshida, R; Yoshihara, K; Young, C; Young, C J S; Youssef, S; Yu, D; Yu, D R; Yu, J; Yu, J; Yuan, L; Yurkewicz, A; Zabinski, B; Zaidan, R; Zaitsev, A M; Zambito, S; Zanello, L; Zanzi, D; Zaytsev, A; Zeitnitz, C; Zeman, M; Zemla, A; Zenin, O; Ženiš, T; Zerwas, D; Zevi Della Porta, G; Zhang, D; Zhang, H; Zhang, J; Zhang, L; Zhang, X; Zhang, Z; Zhao, L; Zhao, Z; Zhemchugov, A; Zhong, J; Zhou, B; Zhou, N; Zhou, Y; Zhu, C G; Zhu, H; Zhu, J; Zhu, Y; Zhuang, X; Zhuravlov, V; Zibell, A; Zieminska, D; Zimin, N I; Zimmermann, R; Zimmermann, S; Zimmermann, S; Zinonos, Z; Ziolkowski, M; Zitoun, R; Živković, L; Zmouchko, V V; Zobernig, G; Zoccoli, A; Zur Nedden, M; Zutshi, V; Zwalinski, L
A measurement of splitting scales, as defined by the k T clustering algorithm, is presented for final states containing a W boson produced in proton-proton collisions at a centre-of-mass energy of 7 TeV. The measurement is based on the full 2010 data sample corresponding to an integrated luminosity of 36 pb -1 which was collected using the ATLAS detector at the CERN Large Hadron Collider. Cluster splitting scales are measured in events containing W bosons decaying to electrons or muons. The measurement comprises the four hardest splitting scales in a k T cluster sequence of the hadronic activity accompanying the W boson, and ratios of these splitting scales. Backgrounds such as multi-jet and top-quark-pair production are subtracted and the results are corrected for detector effects. Predictions from various Monte Carlo event generators at particle level are compared to the data. Overall, reasonable agreement is found with all generators, but larger deviations between the predictions and the data are evident in the soft regions of the splitting scales.
ClusterViz: A Cytoscape APP for Cluster Analysis of Biological Network.
Wang, Jianxin; Zhong, Jiancheng; Chen, Gang; Li, Min; Wu, Fang-xiang; Pan, Yi
2015-01-01
Cluster analysis of biological networks is one of the most important approaches for identifying functional modules and predicting protein functions. Furthermore, visualization of clustering results is crucial to uncover the structure of biological networks. In this paper, ClusterViz, an APP of Cytoscape 3 for cluster analysis and visualization, has been developed. In order to reduce complexity and enable extendibility for ClusterViz, we designed the architecture of ClusterViz based on the framework of Open Services Gateway Initiative. According to the architecture, the implementation of ClusterViz is partitioned into three modules including interface of ClusterViz, clustering algorithms and visualization and export. ClusterViz fascinates the comparison of the results of different algorithms to do further related analysis. Three commonly used clustering algorithms, FAG-EC, EAGLE and MCODE, are included in the current version. Due to adopting the abstract interface of algorithms in module of the clustering algorithms, more clustering algorithms can be included for the future use. To illustrate usability of ClusterViz, we provided three examples with detailed steps from the important scientific articles, which show that our tool has helped several research teams do their research work on the mechanism of the biological networks.
Jothi, R; Mohanty, Sraban Kumar; Ojha, Aparajita
2016-04-01
Gene expression data clustering is an important biological process in DNA microarray analysis. Although there have been many clustering algorithms for gene expression analysis, finding a suitable and effective clustering algorithm is always a challenging problem due to the heterogeneous nature of gene profiles. Minimum Spanning Tree (MST) based clustering algorithms have been successfully employed to detect clusters of varying shapes and sizes. This paper proposes a novel clustering algorithm using Eigenanalysis on Minimum Spanning Tree based neighborhood graph (E-MST). As MST of a set of points reflects the similarity of the points with their neighborhood, the proposed algorithm employs a similarity graph obtained from k(') rounds of MST (k(')-MST neighborhood graph). By studying the spectral properties of the similarity matrix obtained from k(')-MST graph, the proposed algorithm achieves improved clustering results. We demonstrate the efficacy of the proposed algorithm on 12 gene expression datasets. Experimental results show that the proposed algorithm performs better than the standard clustering algorithms. Copyright © 2016 Elsevier Ltd. All rights reserved.
Sun, Liping; Luo, Yonglong; Ding, Xintao; Zhang, Ji
2014-01-01
An important component of a spatial clustering algorithm is the distance measure between sample points in object space. In this paper, the traditional Euclidean distance measure is replaced with innovative obstacle distance measure for spatial clustering under obstacle constraints. Firstly, we present a path searching algorithm to approximate the obstacle distance between two points for dealing with obstacles and facilitators. Taking obstacle distance as similarity metric, we subsequently propose the artificial immune clustering with obstacle entity (AICOE) algorithm for clustering spatial point data in the presence of obstacles and facilitators. Finally, the paper presents a comparative analysis of AICOE algorithm and the classical clustering algorithms. Our clustering model based on artificial immune system is also applied to the case of public facility location problem in order to establish the practical applicability of our approach. By using the clone selection principle and updating the cluster centers based on the elite antibodies, the AICOE algorithm is able to achieve the global optimum and better clustering effect.
Fast Constrained Spectral Clustering and Cluster Ensemble with Random Projection
Liu, Wenfen
2017-01-01
Constrained spectral clustering (CSC) method can greatly improve the clustering accuracy with the incorporation of constraint information into spectral clustering and thus has been paid academic attention widely. In this paper, we propose a fast CSC algorithm via encoding landmark-based graph construction into a new CSC model and applying random sampling to decrease the data size after spectral embedding. Compared with the original model, the new algorithm has the similar results with the increase of its model size asymptotically; compared with the most efficient CSC algorithm known, the new algorithm runs faster and has a wider range of suitable data sets. Meanwhile, a scalable semisupervised cluster ensemble algorithm is also proposed via the combination of our fast CSC algorithm and dimensionality reduction with random projection in the process of spectral ensemble clustering. We demonstrate by presenting theoretical analysis and empirical results that the new cluster ensemble algorithm has advantages in terms of efficiency and effectiveness. Furthermore, the approximate preservation of random projection in clustering accuracy proved in the stage of consensus clustering is also suitable for the weighted k-means clustering and thus gives the theoretical guarantee to this special kind of k-means clustering where each point has its corresponding weight. PMID:29312447
Hierarchical Dirichlet process model for gene expression clustering
2013-01-01
Clustering is an important data processing tool for interpreting microarray data and genomic network inference. In this article, we propose a clustering algorithm based on the hierarchical Dirichlet processes (HDP). The HDP clustering introduces a hierarchical structure in the statistical model which captures the hierarchical features prevalent in biological data such as the gene express data. We develop a Gibbs sampling algorithm based on the Chinese restaurant metaphor for the HDP clustering. We apply the proposed HDP algorithm to both regulatory network segmentation and gene expression clustering. The HDP algorithm is shown to outperform several popular clustering algorithms by revealing the underlying hierarchical structure of the data. For the yeast cell cycle data, we compare the HDP result to the standard result and show that the HDP algorithm provides more information and reduces the unnecessary clustering fragments. PMID:23587447
Canonical PSO Based K-Means Clustering Approach for Real Datasets.
Dey, Lopamudra; Chakraborty, Sanjay
2014-01-01
"Clustering" the significance and application of this technique is spread over various fields. Clustering is an unsupervised process in data mining, that is why the proper evaluation of the results and measuring the compactness and separability of the clusters are important issues. The procedure of evaluating the results of a clustering algorithm is known as cluster validity measure. Different types of indexes are used to solve different types of problems and indices selection depends on the kind of available data. This paper first proposes Canonical PSO based K-means clustering algorithm and also analyses some important clustering indices (intercluster, intracluster) and then evaluates the effects of those indices on real-time air pollution database, wholesale customer, wine, and vehicle datasets using typical K-means, Canonical PSO based K-means, simple PSO based K-means, DBSCAN, and Hierarchical clustering algorithms. This paper also describes the nature of the clusters and finally compares the performances of these clustering algorithms according to the validity assessment. It also defines which algorithm will be more desirable among all these algorithms to make proper compact clusters on this particular real life datasets. It actually deals with the behaviour of these clustering algorithms with respect to validation indexes and represents their results of evaluation in terms of mathematical and graphical forms.
A hybrid monkey search algorithm for clustering analysis.
Chen, Xin; Zhou, Yongquan; Luo, Qifang
2014-01-01
Clustering is a popular data analysis and data mining technique. The k-means clustering algorithm is one of the most commonly used methods. However, it highly depends on the initial solution and is easy to fall into local optimum solution. In view of the disadvantages of the k-means method, this paper proposed a hybrid monkey algorithm based on search operator of artificial bee colony algorithm for clustering analysis and experiment on synthetic and real life datasets to show that the algorithm has a good performance than that of the basic monkey algorithm for clustering analysis.
Clustering for Binary Data Sets by Using Genetic Algorithm-Incremental K-means
NASA Astrophysics Data System (ADS)
Saharan, S.; Baragona, R.; Nor, M. E.; Salleh, R. M.; Asrah, N. M.
2018-04-01
This research was initially driven by the lack of clustering algorithms that specifically focus in binary data. To overcome this gap in knowledge, a promising technique for analysing this type of data became the main subject in this research, namely Genetic Algorithms (GA). For the purpose of this research, GA was combined with the Incremental K-means (IKM) algorithm to cluster the binary data streams. In GAIKM, the objective function was based on a few sufficient statistics that may be easily and quickly calculated on binary numbers. The implementation of IKM will give an advantage in terms of fast convergence. The results show that GAIKM is an efficient and effective new clustering algorithm compared to the clustering algorithms and to the IKM itself. In conclusion, the GAIKM outperformed other clustering algorithms such as GCUK, IKM, Scalable K-means (SKM) and K-means clustering and paves the way for future research involving missing data and outliers.
ERIC Educational Resources Information Center
Diamond, Abigail
2008-01-01
The term "soft skills" encompasses a cluster of personality traits, language abilities, personal habits and, ultimately, values and attitudes. Soft skills complement "harder", more technical, skills, such as being able to read or type a letter, but they also have a significant impact on the ability of people to do their jobs…
Canonical PSO Based K-Means Clustering Approach for Real Datasets
Dey, Lopamudra; Chakraborty, Sanjay
2014-01-01
“Clustering” the significance and application of this technique is spread over various fields. Clustering is an unsupervised process in data mining, that is why the proper evaluation of the results and measuring the compactness and separability of the clusters are important issues. The procedure of evaluating the results of a clustering algorithm is known as cluster validity measure. Different types of indexes are used to solve different types of problems and indices selection depends on the kind of available data. This paper first proposes Canonical PSO based K-means clustering algorithm and also analyses some important clustering indices (intercluster, intracluster) and then evaluates the effects of those indices on real-time air pollution database, wholesale customer, wine, and vehicle datasets using typical K-means, Canonical PSO based K-means, simple PSO based K-means, DBSCAN, and Hierarchical clustering algorithms. This paper also describes the nature of the clusters and finally compares the performances of these clustering algorithms according to the validity assessment. It also defines which algorithm will be more desirable among all these algorithms to make proper compact clusters on this particular real life datasets. It actually deals with the behaviour of these clustering algorithms with respect to validation indexes and represents their results of evaluation in terms of mathematical and graphical forms. PMID:27355083
A method of operation scheduling based on video transcoding for cluster equipment
NASA Astrophysics Data System (ADS)
Zhou, Haojie; Yan, Chun
2018-04-01
Because of the cluster technology in real-time video transcoding device, the application of facing the massive growth in the number of video assignments and resolution and bit rate of diversity, task scheduling algorithm, and analyze the current mainstream of cluster for real-time video transcoding equipment characteristics of the cluster, combination with the characteristics of the cluster equipment task delay scheduling algorithm is proposed. This algorithm enables the cluster to get better performance in the generation of the job queue and the lower part of the job queue when receiving the operation instruction. In the end, a small real-time video transcode cluster is constructed to analyze the calculation ability, running time, resource occupation and other aspects of various algorithms in operation scheduling. The experimental results show that compared with traditional clustering task scheduling algorithm, task delay scheduling algorithm has more flexible and efficient characteristics.
[Cluster analysis in biomedical researches].
Akopov, A S; Moskovtsev, A A; Dolenko, S A; Savina, G D
2013-01-01
Cluster analysis is one of the most popular methods for the analysis of multi-parameter data. The cluster analysis reveals the internal structure of the data, group the separate observations on the degree of their similarity. The review provides a definition of the basic concepts of cluster analysis, and discusses the most popular clustering algorithms: k-means, hierarchical algorithms, Kohonen networks algorithms. Examples are the use of these algorithms in biomedical research.
Data depth based clustering analysis
Jeong, Myeong -Hun; Cai, Yaping; Sullivan, Clair J.; ...
2016-01-01
Here, this paper proposes a new algorithm for identifying patterns within data, based on data depth. Such a clustering analysis has an enormous potential to discover previously unknown insights from existing data sets. Many clustering algorithms already exist for this purpose. However, most algorithms are not affine invariant. Therefore, they must operate with different parameters after the data sets are rotated, scaled, or translated. Further, most clustering algorithms, based on Euclidean distance, can be sensitive to noises because they have no global perspective. Parameter selection also significantly affects the clustering results of each algorithm. Unlike many existing clustering algorithms, themore » proposed algorithm, called data depth based clustering analysis (DBCA), is able to detect coherent clusters after the data sets are affine transformed without changing a parameter. It is also robust to noises because using data depth can measure centrality and outlyingness of the underlying data. Further, it can generate relatively stable clusters by varying the parameter. The experimental comparison with the leading state-of-the-art alternatives demonstrates that the proposed algorithm outperforms DBSCAN and HDBSCAN in terms of affine invariance, and exceeds or matches the ro-bustness to noises of DBSCAN or HDBSCAN. The robust-ness to parameter selection is also demonstrated through the case study of clustering twitter data.« less
Electron spectra of xenon clusters irradiated with a laser-driven plasma soft-x-ray laser pulse
DOE Office of Scientific and Technical Information (OSTI.GOV)
Namba, S.; Takiyama, K.; Hasegawa, N.
Xenon clusters were irradiated with plasma soft-x-ray laser pulses (having a wavelength of 13.9 nm, time duration of 7 ps, and intensities of up to 10 GW/cm{sup 2}). The laser photon energy was high enough to photoionize 4d core electrons. The cross section is large due to a giant resonance. The interaction was investigated by measuring the electron energy spectra. The photoelectron spectra for small clusters indicate that the spectral width due to the 4d hole significantly broadens with increasing cluster size. For larger clusters, the electron energy spectra evolve into a Maxwell-Boltzmann distribution, as a strongly coupled cluster nanoplasmamore » is generated.« less
Clustering analysis of moving target signatures
NASA Astrophysics Data System (ADS)
Martone, Anthony; Ranney, Kenneth; Innocenti, Roberto
2010-04-01
Previously, we developed a moving target indication (MTI) processing approach to detect and track slow-moving targets inside buildings, which successfully detected moving targets (MTs) from data collected by a low-frequency, ultra-wideband radar. Our MTI algorithms include change detection, automatic target detection (ATD), clustering, and tracking. The MTI algorithms can be implemented in a real-time or near-real-time system; however, a person-in-the-loop is needed to select input parameters for the clustering algorithm. Specifically, the number of clusters to input into the cluster algorithm is unknown and requires manual selection. A critical need exists to automate all aspects of the MTI processing formulation. In this paper, we investigate two techniques that automatically determine the number of clusters: the adaptive knee-point (KP) algorithm and the recursive pixel finding (RPF) algorithm. The KP algorithm is based on a well-known heuristic approach for determining the number of clusters. The RPF algorithm is analogous to the image processing, pixel labeling procedure. Both algorithms are used to analyze the false alarm and detection rates of three operational scenarios of personnel walking inside wood and cinderblock buildings.
Hierarchical Petascale Simulation Framework For Stress Corrosion Cracking
DOE Office of Scientific and Technical Information (OSTI.GOV)
Grama, Ananth
2013-12-18
A number of major accomplishments resulted from the project. These include: • Data Structures, Algorithms, and Numerical Methods for Reactive Molecular Dynamics. We have developed a range of novel data structures, algorithms, and solvers (amortized ILU, Spike) for use with ReaxFF and charge equilibration. • Parallel Formulations of ReactiveMD (Purdue ReactiveMolecular Dynamics Package, PuReMD, PuReMD-GPU, and PG-PuReMD) for Messaging, GPU, and GPU Cluster Platforms. We have developed efficient serial, parallel (MPI), GPU (Cuda), and GPU Cluster (MPI/Cuda) implementations. Our implementations have been demonstrated to be significantly better than the state of the art, both in terms of performance and scalability.more » • Comprehensive Validation in the Context of Diverse Applications. We have demonstrated the use of our software in diverse systems, including silica-water, silicon-germanium nanorods, and as part of other projects, extended it to applications ranging from explosives (RDX) to lipid bilayers (biomembranes under oxidative stress). • Open Source Software Packages for Reactive Molecular Dynamics. All versions of our soft- ware have been released over the public domain. There are over 100 major research groups worldwide using our software. • Implementation into the Department of Energy LAMMPS Software Package. We have also integrated our software into the Department of Energy LAMMPS software package.« less
LArSoft: toolkit for simulation, reconstruction and analysis of liquid argon TPC neutrino detectors
NASA Astrophysics Data System (ADS)
Snider, E. L.; Petrillo, G.
2017-10-01
LArSoft is a set of detector-independent software tools for the simulation, reconstruction and analysis of data from liquid argon (LAr) neutrino experiments The common features of LAr time projection chambers (TPCs) enable sharing of algorithm code across detectors of very different size and configuration. LArSoft is currently used in production simulation and reconstruction by the ArgoNeuT, DUNE, LArlAT, MicroBooNE, and SBND experiments. The software suite offers a wide selection of algorithms and utilities, including those for associated photo-detectors and the handling of auxiliary detectors outside the TPCs. Available algorithms cover the full range of simulation and reconstruction, from raw waveforms to high-level reconstructed objects, event topologies and classification. The common code within LArSoft is contributed by adopting experiments, which also provide detector-specific geometry descriptions, and code for the treatment of electronic signals. LArSoft is also a collaboration of experiments, Fermilab and associated software projects which cooperate in setting requirements, priorities, and schedules. In this talk, we outline the general architecture of the software and the interaction with external libraries and detector-specific code. We also describe the dynamics of LArSoft software development between the contributing experiments, the projects supporting the software infrastructure LArSoft relies on, and the core LArSoft support project.
Application of hybrid clustering using parallel k-means algorithm and DIANA algorithm
NASA Astrophysics Data System (ADS)
Umam, Khoirul; Bustamam, Alhadi; Lestari, Dian
2017-03-01
DNA is one of the carrier of genetic information of living organisms. Encoding, sequencing, and clustering DNA sequences has become the key jobs and routine in the world of molecular biology, in particular on bioinformatics application. There are two type of clustering, hierarchical clustering and partitioning clustering. In this paper, we combined two type clustering i.e. K-Means (partitioning clustering) and DIANA (hierarchical clustering), therefore it called Hybrid clustering. Application of hybrid clustering using Parallel K-Means algorithm and DIANA algorithm used to clustering DNA sequences of Human Papillomavirus (HPV). The clustering process is started with Collecting DNA sequences of HPV are obtained from NCBI (National Centre for Biotechnology Information), then performing characteristics extraction of DNA sequences. The characteristics extraction result is store in a matrix form, then normalize this matrix using Min-Max normalization and calculate genetic distance using Euclidian Distance. Furthermore, the hybrid clustering is applied by using implementation of Parallel K-Means algorithm and DIANA algorithm. The aim of using Hybrid Clustering is to obtain better clusters result. For validating the resulted clusters, to get optimum number of clusters, we use Davies-Bouldin Index (DBI). In this study, the result of implementation of Parallel K-Means clustering is data clustered become 5 clusters with minimal IDB value is 0.8741, and Hybrid Clustering clustered data become 13 sub-clusters with minimal IDB values = 0.8216, 0.6845, 0.3331, 0.1994 and 0.3952. The IDB value of hybrid clustering less than IBD value of Parallel K-Means clustering only that perform at 1ts stage. Its means clustering using Hybrid Clustering have the better result to clustered DNA sequence of HPV than perform parallel K-Means Clustering only.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wu, Vincent W.C., E-mail: htvinwu@polyu.edu.hk; Tse, Teddy K.H.; Ho, Cola L.M.
2013-07-01
Monte Carlo (MC) simulation is currently the most accurate dose calculation algorithm in radiotherapy planning but requires relatively long processing time. Faster model-based algorithms such as the anisotropic analytical algorithm (AAA) by the Eclipse treatment planning system and multigrid superposition (MGS) by the XiO treatment planning system are 2 commonly used algorithms. This study compared AAA and MGS against MC, as the gold standard, on brain, nasopharynx, lung, and prostate cancer patients. Computed tomography of 6 patients of each cancer type was used. The same hypothetical treatment plan using the same machine and treatment prescription was computed for each casemore » by each planning system using their respective dose calculation algorithm. The doses at reference points including (1) soft tissues only, (2) bones only, (3) air cavities only, (4) soft tissue-bone boundary (Soft/Bone), (5) soft tissue-air boundary (Soft/Air), and (6) bone-air boundary (Bone/Air), were measured and compared using the mean absolute percentage error (MAPE), which was a function of the percentage dose deviations from MC. Besides, the computation time of each treatment plan was recorded and compared. The MAPEs of MGS were significantly lower than AAA in all types of cancers (p<0.001). With regards to body density combinations, the MAPE of AAA ranged from 1.8% (soft tissue) to 4.9% (Bone/Air), whereas that of MGS from 1.6% (air cavities) to 2.9% (Soft/Bone). The MAPEs of MGS (2.6%±2.1) were significantly lower than that of AAA (3.7%±2.5) in all tissue density combinations (p<0.001). The mean computation time of AAA for all treatment plans was significantly lower than that of the MGS (p<0.001). Both AAA and MGS algorithms demonstrated dose deviations of less than 4.0% in most clinical cases and their performance was better in homogeneous tissues than at tissue boundaries. In general, MGS demonstrated relatively smaller dose deviations than AAA but required longer computation time.« less
Fuzzy logic, neural networks, and soft computing
NASA Technical Reports Server (NTRS)
Zadeh, Lofti A.
1994-01-01
The past few years have witnessed a rapid growth of interest in a cluster of modes of modeling and computation which may be described collectively as soft computing. The distinguishing characteristic of soft computing is that its primary aims are to achieve tractability, robustness, low cost, and high MIQ (machine intelligence quotient) through an exploitation of the tolerance for imprecision and uncertainty. Thus, in soft computing what is usually sought is an approximate solution to a precisely formulated problem or, more typically, an approximate solution to an imprecisely formulated problem. A simple case in point is the problem of parking a car. Generally, humans can park a car rather easily because the final position of the car is not specified exactly. If it were specified to within, say, a few millimeters and a fraction of a degree, it would take hours or days of maneuvering and precise measurements of distance and angular position to solve the problem. What this simple example points to is the fact that, in general, high precision carries a high cost. The challenge, then, is to exploit the tolerance for imprecision by devising methods of computation which lead to an acceptable solution at low cost. By its nature, soft computing is much closer to human reasoning than the traditional modes of computation. At this juncture, the major components of soft computing are fuzzy logic (FL), neural network theory (NN), and probabilistic reasoning techniques (PR), including genetic algorithms, chaos theory, and part of learning theory. Increasingly, these techniques are used in combination to achieve significant improvement in performance and adaptability. Among the important application areas for soft computing are control systems, expert systems, data compression techniques, image processing, and decision support systems. It may be argued that it is soft computing, rather than the traditional hard computing, that should be viewed as the foundation for artificial intelligence. In the years ahead, this may well become a widely held position.
Motion Estimation Using the Firefly Algorithm in Ultrasonic Image Sequence of Soft Tissue
Chao, Chih-Feng; Horng, Ming-Huwi; Chen, Yu-Chan
2015-01-01
Ultrasonic image sequence of the soft tissue is widely used in disease diagnosis; however, the speckle noises usually influenced the image quality. These images usually have a low signal-to-noise ratio presentation. The phenomenon gives rise to traditional motion estimation algorithms that are not suitable to measure the motion vectors. In this paper, a new motion estimation algorithm is developed for assessing the velocity field of soft tissue in a sequence of ultrasonic B-mode images. The proposed iterative firefly algorithm (IFA) searches for few candidate points to obtain the optimal motion vector, and then compares it to the traditional iterative full search algorithm (IFSA) via a series of experiments of in vivo ultrasonic image sequences. The experimental results show that the IFA can assess the vector with better efficiency and almost equal estimation quality compared to the traditional IFSA method. PMID:25873987
Motion estimation using the firefly algorithm in ultrasonic image sequence of soft tissue.
Chao, Chih-Feng; Horng, Ming-Huwi; Chen, Yu-Chan
2015-01-01
Ultrasonic image sequence of the soft tissue is widely used in disease diagnosis; however, the speckle noises usually influenced the image quality. These images usually have a low signal-to-noise ratio presentation. The phenomenon gives rise to traditional motion estimation algorithms that are not suitable to measure the motion vectors. In this paper, a new motion estimation algorithm is developed for assessing the velocity field of soft tissue in a sequence of ultrasonic B-mode images. The proposed iterative firefly algorithm (IFA) searches for few candidate points to obtain the optimal motion vector, and then compares it to the traditional iterative full search algorithm (IFSA) via a series of experiments of in vivo ultrasonic image sequences. The experimental results show that the IFA can assess the vector with better efficiency and almost equal estimation quality compared to the traditional IFSA method.
Clustering algorithm for determining community structure in large networks
NASA Astrophysics Data System (ADS)
Pujol, Josep M.; Béjar, Javier; Delgado, Jordi
2006-07-01
We propose an algorithm to find the community structure in complex networks based on the combination of spectral analysis and modularity optimization. The clustering produced by our algorithm is as accurate as the best algorithms on the literature of modularity optimization; however, the main asset of the algorithm is its efficiency. The best match for our algorithm is Newman’s fast algorithm, which is the reference algorithm for clustering in large networks due to its efficiency. When both algorithms are compared, our algorithm outperforms the fast algorithm both in efficiency and accuracy of the clustering, in terms of modularity. Thus, the results suggest that the proposed algorithm is a good choice to analyze the community structure of medium and large networks in the range of tens and hundreds of thousand vertices.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Laskin, Julia; Johnson, Grant E.; Prabhakaran, Venkateshkumar
Immobilization of complex molecules and clusters on supports plays an important role in a variety of disciplines including materials science, catalysis and biochemistry. In particular, deposition of clusters on surfaces has attracted considerable attention due to their non-scalable, highly size-dependent properties. The ability to precisely control the composition and morphology of clusters and small nanoparticles on surfaces is crucial for the development of next generation materials with rationally tailored properties. Soft- and reactive landing of ions onto solid or liquid surfaces introduces unprecedented selectivity into surface modification by completely eliminating the effect of solvent and sample contamination on the qualitymore » of the film. The ability to select the mass-to-charge ratio of the precursor ion, its kinetic energy and charge state along with precise control of the size, shape and position of the ion beam on the deposition target makes soft-landing an attractive approach for surface modification. High-purity uniform thin films on surfaces generated using mass-selected ion deposition facilitate understanding of critical interfacial phenomena relevant to catalysis, energy generation and storage, and materials science. Our efforts have been directed toward understanding charge retention by soft-landed metal and metal-oxide cluster ions, which may affect both their structure and reactivity. Specifically, we have examined the effect of the surface on charge retention by both positively and negatively charged cluster ions. We found that the electronic properties of the surface play an important role in charge retention by cluster cations. Meanwhile, the electron binding energy is a key factor determining charge retention by cluster anions. These findings provide the scientific foundation for the rational design of interfaces for advanced catalysts and energy storage devices. Further optimization of electrode-electrolyte interfaces for applications in energy storage and electrocatalysis may be achieved by understanding and controlling the properties of soft-landed cluster ions.« less
NASA Technical Reports Server (NTRS)
Lennington, R. K.; Johnson, J. K.
1979-01-01
An efficient procedure which clusters data using a completely unsupervised clustering algorithm and then uses labeled pixels to label the resulting clusters or perform a stratified estimate using the clusters as strata is developed. Three clustering algorithms, CLASSY, AMOEBA, and ISOCLS, are compared for efficiency. Three stratified estimation schemes and three labeling schemes are also considered and compared.
Clusternomics: Integrative context-dependent clustering for heterogeneous datasets
Wernisch, Lorenz
2017-01-01
Integrative clustering is used to identify groups of samples by jointly analysing multiple datasets describing the same set of biological samples, such as gene expression, copy number, methylation etc. Most existing algorithms for integrative clustering assume that there is a shared consistent set of clusters across all datasets, and most of the data samples follow this structure. However in practice, the structure across heterogeneous datasets can be more varied, with clusters being joined in some datasets and separated in others. In this paper, we present a probabilistic clustering method to identify groups across datasets that do not share the same cluster structure. The proposed algorithm, Clusternomics, identifies groups of samples that share their global behaviour across heterogeneous datasets. The algorithm models clusters on the level of individual datasets, while also extracting global structure that arises from the local cluster assignments. Clusters on both the local and the global level are modelled using a hierarchical Dirichlet mixture model to identify structure on both levels. We evaluated the model both on simulated and on real-world datasets. The simulated data exemplifies datasets with varying degrees of common structure. In such a setting Clusternomics outperforms existing algorithms for integrative and consensus clustering. In a real-world application, we used the algorithm for cancer subtyping, identifying subtypes of cancer from heterogeneous datasets. We applied the algorithm to TCGA breast cancer dataset, integrating gene expression, miRNA expression, DNA methylation and proteomics. The algorithm extracted clinically meaningful clusters with significantly different survival probabilities. We also evaluated the algorithm on lung and kidney cancer TCGA datasets with high dimensionality, again showing clinically significant results and scalability of the algorithm. PMID:29036190
Clusternomics: Integrative context-dependent clustering for heterogeneous datasets.
Gabasova, Evelina; Reid, John; Wernisch, Lorenz
2017-10-01
Integrative clustering is used to identify groups of samples by jointly analysing multiple datasets describing the same set of biological samples, such as gene expression, copy number, methylation etc. Most existing algorithms for integrative clustering assume that there is a shared consistent set of clusters across all datasets, and most of the data samples follow this structure. However in practice, the structure across heterogeneous datasets can be more varied, with clusters being joined in some datasets and separated in others. In this paper, we present a probabilistic clustering method to identify groups across datasets that do not share the same cluster structure. The proposed algorithm, Clusternomics, identifies groups of samples that share their global behaviour across heterogeneous datasets. The algorithm models clusters on the level of individual datasets, while also extracting global structure that arises from the local cluster assignments. Clusters on both the local and the global level are modelled using a hierarchical Dirichlet mixture model to identify structure on both levels. We evaluated the model both on simulated and on real-world datasets. The simulated data exemplifies datasets with varying degrees of common structure. In such a setting Clusternomics outperforms existing algorithms for integrative and consensus clustering. In a real-world application, we used the algorithm for cancer subtyping, identifying subtypes of cancer from heterogeneous datasets. We applied the algorithm to TCGA breast cancer dataset, integrating gene expression, miRNA expression, DNA methylation and proteomics. The algorithm extracted clinically meaningful clusters with significantly different survival probabilities. We also evaluated the algorithm on lung and kidney cancer TCGA datasets with high dimensionality, again showing clinically significant results and scalability of the algorithm.
CytoCluster: A Cytoscape Plugin for Cluster Analysis and Visualization of Biological Networks.
Li, Min; Li, Dongyan; Tang, Yu; Wu, Fangxiang; Wang, Jianxin
2017-08-31
Nowadays, cluster analysis of biological networks has become one of the most important approaches to identifying functional modules as well as predicting protein complexes and network biomarkers. Furthermore, the visualization of clustering results is crucial to display the structure of biological networks. Here we present CytoCluster, a cytoscape plugin integrating six clustering algorithms, HC-PIN (Hierarchical Clustering algorithm in Protein Interaction Networks), OH-PIN (identifying Overlapping and Hierarchical modules in Protein Interaction Networks), IPCA (Identifying Protein Complex Algorithm), ClusterONE (Clustering with Overlapping Neighborhood Expansion), DCU (Detecting Complexes based on Uncertain graph model), IPC-MCE (Identifying Protein Complexes based on Maximal Complex Extension), and BinGO (the Biological networks Gene Ontology) function. Users can select different clustering algorithms according to their requirements. The main function of these six clustering algorithms is to detect protein complexes or functional modules. In addition, BinGO is used to determine which Gene Ontology (GO) categories are statistically overrepresented in a set of genes or a subgraph of a biological network. CytoCluster can be easily expanded, so that more clustering algorithms and functions can be added to this plugin. Since it was created in July 2013, CytoCluster has been downloaded more than 9700 times in the Cytoscape App store and has already been applied to the analysis of different biological networks. CytoCluster is available from http://apps.cytoscape.org/apps/cytocluster.
CytoCluster: A Cytoscape Plugin for Cluster Analysis and Visualization of Biological Networks
Li, Min; Li, Dongyan; Tang, Yu; Wang, Jianxin
2017-01-01
Nowadays, cluster analysis of biological networks has become one of the most important approaches to identifying functional modules as well as predicting protein complexes and network biomarkers. Furthermore, the visualization of clustering results is crucial to display the structure of biological networks. Here we present CytoCluster, a cytoscape plugin integrating six clustering algorithms, HC-PIN (Hierarchical Clustering algorithm in Protein Interaction Networks), OH-PIN (identifying Overlapping and Hierarchical modules in Protein Interaction Networks), IPCA (Identifying Protein Complex Algorithm), ClusterONE (Clustering with Overlapping Neighborhood Expansion), DCU (Detecting Complexes based on Uncertain graph model), IPC-MCE (Identifying Protein Complexes based on Maximal Complex Extension), and BinGO (the Biological networks Gene Ontology) function. Users can select different clustering algorithms according to their requirements. The main function of these six clustering algorithms is to detect protein complexes or functional modules. In addition, BinGO is used to determine which Gene Ontology (GO) categories are statistically overrepresented in a set of genes or a subgraph of a biological network. CytoCluster can be easily expanded, so that more clustering algorithms and functions can be added to this plugin. Since it was created in July 2013, CytoCluster has been downloaded more than 9700 times in the Cytoscape App store and has already been applied to the analysis of different biological networks. CytoCluster is available from http://apps.cytoscape.org/apps/cytocluster. PMID:28858211
Multi-Parent Clustering Algorithms from Stochastic Grammar Data Models
NASA Technical Reports Server (NTRS)
Mjoisness, Eric; Castano, Rebecca; Gray, Alexander
1999-01-01
We introduce a statistical data model and an associated optimization-based clustering algorithm which allows data vectors to belong to zero, one or several "parent" clusters. For each data vector the algorithm makes a discrete decision among these alternatives. Thus, a recursive version of this algorithm would place data clusters in a Directed Acyclic Graph rather than a tree. We test the algorithm with synthetic data generated according to the statistical data model. We also illustrate the algorithm using real data from large-scale gene expression assays.
Fast detection of the fuzzy communities based on leader-driven algorithm
NASA Astrophysics Data System (ADS)
Fang, Changjian; Mu, Dejun; Deng, Zhenghong; Hu, Jun; Yi, Chen-He
2018-03-01
In this paper, we present the leader-driven algorithm (LDA) for learning community structure in networks. The algorithm allows one to find overlapping clusters in a network, an important aspect of real networks, especially social networks. The algorithm requires no input parameters and learns the number of clusters naturally from the network. It accomplishes this using leadership centrality in a clever manner. It identifies local minima of leadership centrality as followers which belong only to one cluster, and the remaining nodes are leaders which connect clusters. In this way, the number of clusters can be learned using only the network structure. The LDA is also an extremely fast algorithm, having runtime linear in the network size. Thus, this algorithm can be used to efficiently cluster extremely large networks.
Research on retailer data clustering algorithm based on Spark
NASA Astrophysics Data System (ADS)
Huang, Qiuman; Zhou, Feng
2017-03-01
Big data analysis is a hot topic in the IT field now. Spark is a high-reliability and high-performance distributed parallel computing framework for big data sets. K-means algorithm is one of the classical partition methods in clustering algorithm. In this paper, we study the k-means clustering algorithm on Spark. Firstly, the principle of the algorithm is analyzed, and then the clustering analysis is carried out on the supermarket customers through the experiment to find out the different shopping patterns. At the same time, this paper proposes the parallelization of k-means algorithm and the distributed computing framework of Spark, and gives the concrete design scheme and implementation scheme. This paper uses the two-year sales data of a supermarket to validate the proposed clustering algorithm and achieve the goal of subdividing customers, and then analyze the clustering results to help enterprises to take different marketing strategies for different customer groups to improve sales performance.
Discriminative clustering on manifold for adaptive transductive classification.
Zhang, Zhao; Jia, Lei; Zhang, Min; Li, Bing; Zhang, Li; Li, Fanzhang
2017-10-01
In this paper, we mainly propose a novel adaptive transductive label propagation approach by joint discriminative clustering on manifolds for representing and classifying high-dimensional data. Our framework seamlessly combines the unsupervised manifold learning, discriminative clustering and adaptive classification into a unified model. Also, our method incorporates the adaptive graph weight construction with label propagation. Specifically, our method is capable of propagating label information using adaptive weights over low-dimensional manifold features, which is different from most existing studies that usually predict the labels and construct the weights in the original Euclidean space. For transductive classification by our formulation, we first perform the joint discriminative K-means clustering and manifold learning to capture the low-dimensional nonlinear manifolds. Then, we construct the adaptive weights over the learnt manifold features, where the adaptive weights are calculated through performing the joint minimization of the reconstruction errors over features and soft labels so that the graph weights can be joint-optimal for data representation and classification. Using the adaptive weights, we can easily estimate the unknown labels of samples. After that, our method returns the updated weights for further updating the manifold features. Extensive simulations on image classification and segmentation show that our proposed algorithm can deliver the state-of-the-art performance on several public datasets. Copyright © 2017 Elsevier Ltd. All rights reserved.
Unsupervised Cryo-EM Data Clustering through Adaptively Constrained K-Means Algorithm
Xu, Yaofang; Wu, Jiayi; Yin, Chang-Cheng; Mao, Youdong
2016-01-01
In single-particle cryo-electron microscopy (cryo-EM), K-means clustering algorithm is widely used in unsupervised 2D classification of projection images of biological macromolecules. 3D ab initio reconstruction requires accurate unsupervised classification in order to separate molecular projections of distinct orientations. Due to background noise in single-particle images and uncertainty of molecular orientations, traditional K-means clustering algorithm may classify images into wrong classes and produce classes with a large variation in membership. Overcoming these limitations requires further development on clustering algorithms for cryo-EM data analysis. We propose a novel unsupervised data clustering method building upon the traditional K-means algorithm. By introducing an adaptive constraint term in the objective function, our algorithm not only avoids a large variation in class sizes but also produces more accurate data clustering. Applications of this approach to both simulated and experimental cryo-EM data demonstrate that our algorithm is a significantly improved alterative to the traditional K-means algorithm in single-particle cryo-EM analysis. PMID:27959895
Unsupervised Cryo-EM Data Clustering through Adaptively Constrained K-Means Algorithm.
Xu, Yaofang; Wu, Jiayi; Yin, Chang-Cheng; Mao, Youdong
2016-01-01
In single-particle cryo-electron microscopy (cryo-EM), K-means clustering algorithm is widely used in unsupervised 2D classification of projection images of biological macromolecules. 3D ab initio reconstruction requires accurate unsupervised classification in order to separate molecular projections of distinct orientations. Due to background noise in single-particle images and uncertainty of molecular orientations, traditional K-means clustering algorithm may classify images into wrong classes and produce classes with a large variation in membership. Overcoming these limitations requires further development on clustering algorithms for cryo-EM data analysis. We propose a novel unsupervised data clustering method building upon the traditional K-means algorithm. By introducing an adaptive constraint term in the objective function, our algorithm not only avoids a large variation in class sizes but also produces more accurate data clustering. Applications of this approach to both simulated and experimental cryo-EM data demonstrate that our algorithm is a significantly improved alterative to the traditional K-means algorithm in single-particle cryo-EM analysis.
GDPC: Gravitation-based Density Peaks Clustering algorithm
NASA Astrophysics Data System (ADS)
Jiang, Jianhua; Hao, Dehao; Chen, Yujun; Parmar, Milan; Li, Keqin
2018-07-01
The Density Peaks Clustering algorithm, which we refer to as DPC, is a novel and efficient density-based clustering approach, and it is published in Science in 2014. The DPC has advantages of discovering clusters with varying sizes and varying densities, but has some limitations of detecting the number of clusters and identifying anomalies. We develop an enhanced algorithm with an alternative decision graph based on gravitation theory and nearby distance to identify centroids and anomalies accurately. We apply our method to some UCI and synthetic data sets. We report comparative clustering performances using F-Measure and 2-dimensional vision. We also compare our method to other clustering algorithms, such as K-Means, Affinity Propagation (AP) and DPC. We present F-Measure scores and clustering accuracies of our GDPC algorithm compared to K-Means, AP and DPC on different data sets. We show that the GDPC has the superior performance in its capability of: (1) detecting the number of clusters obviously; (2) aggregating clusters with varying sizes, varying densities efficiently; (3) identifying anomalies accurately.
Mining the National Career Assessment Examination Result Using Clustering Algorithm
NASA Astrophysics Data System (ADS)
Pagudpud, M. V.; Palaoag, T. T.; Padirayon, L. M.
2018-03-01
Education is an essential process today which elicits authorities to discover and establish innovative strategies for educational improvement. This study applied data mining using clustering technique for knowledge extraction from the National Career Assessment Examination (NCAE) result in the Division of Quirino. The NCAE is an examination given to all grade 9 students in the Philippines to assess their aptitudes in the different domains. Clustering the students is helpful in identifying students’ learning considerations. With the use of the RapidMiner tool, clustering algorithms such as Density-Based Spatial Clustering of Applications with Noise (DBSCAN), k-means, k-medoid, expectation maximization clustering, and support vector clustering algorithms were analyzed. The silhouette indexes of the said clustering algorithms were compared, and the result showed that the k-means algorithm with k = 3 and silhouette index equal to 0.196 is the most appropriate clustering algorithm to group the students. Three groups were formed having 477 students in the determined group (cluster 0), 310 proficient students (cluster 1) and 396 developing students (cluster 2). The data mining technique used in this study is essential in extracting useful information from the NCAE result to better understand the abilities of students which in turn is a good basis for adopting teaching strategies.
Automatic Clustering Using Multi-objective Particle Swarm and Simulated Annealing
Abubaker, Ahmad; Baharum, Adam; Alrefaei, Mahmoud
2015-01-01
This paper puts forward a new automatic clustering algorithm based on Multi-Objective Particle Swarm Optimization and Simulated Annealing, “MOPSOSA”. The proposed algorithm is capable of automatic clustering which is appropriate for partitioning datasets to a suitable number of clusters. MOPSOSA combines the features of the multi-objective based particle swarm optimization (PSO) and the Multi-Objective Simulated Annealing (MOSA). Three cluster validity indices were optimized simultaneously to establish the suitable number of clusters and the appropriate clustering for a dataset. The first cluster validity index is centred on Euclidean distance, the second on the point symmetry distance, and the last cluster validity index is based on short distance. A number of algorithms have been compared with the MOPSOSA algorithm in resolving clustering problems by determining the actual number of clusters and optimal clustering. Computational experiments were carried out to study fourteen artificial and five real life datasets. PMID:26132309
Wavelet methodology to improve single unit isolation in primary motor cortex cells
Ortiz-Rosario, Alexis; Adeli, Hojjat; Buford, John A.
2016-01-01
The proper isolation of action potentials recorded extracellularly from neural tissue is an active area of research in the fields of neuroscience and biomedical signal processing. This paper presents an isolation methodology for neural recordings using the wavelet transform (WT), a statistical thresholding scheme, and the principal component analysis (PCA) algorithm. The effectiveness of five different mother wavelets was investigated: biorthogonal, Daubachies, discrete Meyer, symmetric, and Coifman; along with three different wavelet coefficient thresholding schemes: fixed form threshold, Stein’s unbiased estimate of risk, and minimax; and two different thresholding rules: soft and hard thresholding. The signal quality was evaluated using three different statistical measures: mean-squared error, root-mean squared, and signal to noise ratio. The clustering quality was evaluated using two different statistical measures: isolation distance, and L-ratio. This research shows that the selection of the mother wavelet has a strong influence on the clustering and isolation of single unit neural activity, with the Daubachies 4 wavelet and minimax thresholding scheme performing the best. PMID:25794461
NASA Astrophysics Data System (ADS)
Gong, Lina; Xu, Tao; Zhang, Wei; Li, Xuhong; Wang, Xia; Pan, Wenwen
2017-03-01
The traditional microblog recommendation algorithm has the problems of low efficiency and modest effect in the era of big data. In the aim of solving these issues, this paper proposed a mixed recommendation algorithm with user clustering. This paper first introduced the situation of microblog marketing industry. Then, this paper elaborates the user interest modeling process and detailed advertisement recommendation methods. Finally, this paper compared the mixed recommendation algorithm with the traditional classification algorithm and mixed recommendation algorithm without user clustering. The results show that the mixed recommendation algorithm with user clustering has good accuracy and recall rate in the microblog advertisements promotion.
Procedure of Partitioning Data Into Number of Data Sets or Data Group - A Review
NASA Astrophysics Data System (ADS)
Kim, Tai-Hoon
The goal of clustering is to decompose a dataset into similar groups based on a objective function. Some already well established clustering algorithms are there for data clustering. Objective of these data clustering algorithms are to divide the data points of the feature space into a number of groups (or classes) so that a predefined set of criteria are satisfied. The article considers the comparative study about the effectiveness and efficiency of traditional data clustering algorithms. For evaluating the performance of the clustering algorithms, Minkowski score is used here for different data sets.
A Soft Sensor-Based Three-Dimensional (3-D) Finger Motion Measurement System
Park, Wookeun; Ro, Kyongkwan; Kim, Suin; Bae, Joonbum
2017-01-01
In this study, a soft sensor-based three-dimensional (3-D) finger motion measurement system is proposed. The sensors, made of the soft material Ecoflex, comprise embedded microchannels filled with a conductive liquid metal (EGaln). The superior elasticity, light weight, and sensitivity of soft sensors allows them to be embedded in environments in which conventional sensors cannot. Complicated finger joints, such as the carpometacarpal (CMC) joint of the thumb are modeled to specify the location of the sensors. Algorithms to decouple the signals from soft sensors are proposed to extract the pure flexion, extension, abduction, and adduction joint angles. The performance of the proposed system and algorithms are verified by comparison with a camera-based motion capture system. PMID:28241414
Android Malware Classification Using K-Means Clustering Algorithm
NASA Astrophysics Data System (ADS)
Hamid, Isredza Rahmi A.; Syafiqah Khalid, Nur; Azma Abdullah, Nurul; Rahman, Nurul Hidayah Ab; Chai Wen, Chuah
2017-08-01
Malware was designed to gain access or damage a computer system without user notice. Besides, attacker exploits malware to commit crime or fraud. This paper proposed Android malware classification approach based on K-Means clustering algorithm. We evaluate the proposed model in terms of accuracy using machine learning algorithms. Two datasets were selected to demonstrate the practicing of K-Means clustering algorithms that are Virus Total and Malgenome dataset. We classify the Android malware into three clusters which are ransomware, scareware and goodware. Nine features were considered for each types of dataset such as Lock Detected, Text Detected, Text Score, Encryption Detected, Threat, Porn, Law, Copyright and Moneypak. We used IBM SPSS Statistic software for data classification and WEKA tools to evaluate the built cluster. The proposed K-Means clustering algorithm shows promising result with high accuracy when tested using Random Forest algorithm.
An extended affinity propagation clustering method based on different data density types.
Zhao, XiuLi; Xu, WeiXiang
2015-01-01
Affinity propagation (AP) algorithm, as a novel clustering method, does not require the users to specify the initial cluster centers in advance, which regards all data points as potential exemplars (cluster centers) equally and groups the clusters totally by the similar degree among the data points. But in many cases there exist some different intensive areas within the same data set, which means that the data set does not distribute homogeneously. In such situation the AP algorithm cannot group the data points into ideal clusters. In this paper, we proposed an extended AP clustering algorithm to deal with such a problem. There are two steps in our method: firstly the data set is partitioned into several data density types according to the nearest distances of each data point; and then the AP clustering method is, respectively, used to group the data points into clusters in each data density type. Two experiments are carried out to evaluate the performance of our algorithm: one utilizes an artificial data set and the other uses a real seismic data set. The experiment results show that groups are obtained more accurately by our algorithm than OPTICS and AP clustering algorithm itself.
Generating soft shadows with a depth buffer algorithm
NASA Technical Reports Server (NTRS)
Brotman, L. S.; Badler, N. I.
1984-01-01
Computer-synthesized shadows used to appear with a sharp edge when cast onto a surface. At present the production of more realistic, soft shadows is considered. However, significant costs arise in connection with such a representation. The current investigation is concerned with a pragmatic approach, which combines an existing shadowing method with a popular visible surface rendering technique, called a 'depth buffer', to generate soft shadows resulting from light sources of finite extent. The considered method represents an extension of Crow's (1977) shadow volume algorithm.
NASA Astrophysics Data System (ADS)
Dong, Feng; Heinbuch, Scott; Bernstein, Elliot; Rocca, Jorge
2006-05-01
A desk-top soft x-ray laser is applied to the study of water, methanol, ammonia, sulfur dioxide, carbon dioxide, mixed sulfur dioxide-water, and mixed carbon dioxide-water clusters through single photon ionization time of flight mass spectroscopy. Almost all of the energy above the vertical ionization energy is removed by the ejected electron. Protonated water, methanol, and ammonia clusters dominate the mass spectra for the first three systems. The temperatures of the neutral water and methanol clusters can be estimated. In the case of pure SO2 and CO2, the mass spectra are dominated by (SO2)n^+ and (CO2)n^+ cluster series. When a high or low concentration of SO2/CO2 is mixed with water, we observe (SO2/CO2)nH2O^+ or SO2/CO2(H2O)nH^+ in the mass spectra, respectively. The unimolecular dissociation rate constants for reactions involving loss of one neutral molecule are calculated for the protonated water, methanol, and ammonia clusters as well as for SO2 and CO2 clusters. We find that the 26.5 eV soft x-ray laser is a nearly ideal tool for the study of hydrogen bonded and van der Waals cluster systems and we are currently exploring its usefulness for other more strongly bound systems.
Scalable Parallel Density-based Clustering and Applications
NASA Astrophysics Data System (ADS)
Patwary, Mostofa Ali
2014-04-01
Recently, density-based clustering algorithms (DBSCAN and OPTICS) have gotten significant attention of the scientific community due to their unique capability of discovering arbitrary shaped clusters and eliminating noise data. These algorithms have several applications, which require high performance computing, including finding halos and subhalos (clusters) from massive cosmology data in astrophysics, analyzing satellite images, X-ray crystallography, and anomaly detection. However, parallelization of these algorithms are extremely challenging as they exhibit inherent sequential data access order, unbalanced workload resulting in low parallel efficiency. To break the data access sequentiality and to achieve high parallelism, we develop new parallel algorithms, both for DBSCAN and OPTICS, designed using graph algorithmic techniques. For example, our parallel DBSCAN algorithm exploits the similarities between DBSCAN and computing connected components. Using datasets containing up to a billion floating point numbers, we show that our parallel density-based clustering algorithms significantly outperform the existing algorithms, achieving speedups up to 27.5 on 40 cores on shared memory architecture and speedups up to 5,765 using 8,192 cores on distributed memory architecture. In our experiments, we found that while achieving the scalability, our algorithms produce clustering results with comparable quality to the classical algorithms.
Energy Aware Clustering Algorithms for Wireless Sensor Networks
NASA Astrophysics Data System (ADS)
Rakhshan, Noushin; Rafsanjani, Marjan Kuchaki; Liu, Chenglian
2011-09-01
The sensor nodes deployed in wireless sensor networks (WSNs) are extremely power constrained, so maximizing the lifetime of the entire networks is mainly considered in the design. In wireless sensor networks, hierarchical network structures have the advantage of providing scalable and energy efficient solutions. In this paper, we investigate different clustering algorithms for WSNs and also compare these clustering algorithms based on metrics such as clustering distribution, cluster's load balancing, Cluster Head's (CH) selection strategy, CH's role rotation, node mobility, clusters overlapping, intra-cluster communications, reliability, security and location awareness.
Removal of impulse noise clusters from color images with local order statistics
NASA Astrophysics Data System (ADS)
Ruchay, Alexey; Kober, Vitaly
2017-09-01
This paper proposes a novel algorithm for restoring images corrupted with clusters of impulse noise. The noise clusters often occur when the probability of impulse noise is very high. The proposed noise removal algorithm consists of detection of bulky impulse noise in three color channels with local order statistics followed by removal of the detected clusters by means of vector median filtering. With the help of computer simulation we show that the proposed algorithm is able to effectively remove clustered impulse noise. The performance of the proposed algorithm is compared in terms of image restoration metrics with that of common successful algorithms.
Study of parameters of the nearest neighbour shared algorithm on clustering documents
NASA Astrophysics Data System (ADS)
Mustika Rukmi, Alvida; Budi Utomo, Daryono; Imro’atus Sholikhah, Neni
2018-03-01
Document clustering is one way of automatically managing documents, extracting of document topics and fastly filtering information. Preprocess of clustering documents processed by textmining consists of: keyword extraction using Rapid Automatic Keyphrase Extraction (RAKE) and making the document as concept vector using Latent Semantic Analysis (LSA). Furthermore, the clustering process is done so that the documents with the similarity of the topic are in the same cluster, based on the preprocesing by textmining performed. Shared Nearest Neighbour (SNN) algorithm is a clustering method based on the number of "nearest neighbors" shared. The parameters in the SNN Algorithm consist of: k nearest neighbor documents, ɛ shared nearest neighbor documents and MinT minimum number of similar documents, which can form a cluster. Characteristics The SNN algorithm is based on shared ‘neighbor’ properties. Each cluster is formed by keywords that are shared by the documents. SNN algorithm allows a cluster can be built more than one keyword, if the value of the frequency of appearing keywords in document is also high. Determination of parameter values on SNN algorithm affects document clustering results. The higher parameter value k, will increase the number of neighbor documents from each document, cause similarity of neighboring documents are lower. The accuracy of each cluster is also low. The higher parameter value ε, caused each document catch only neighbor documents that have a high similarity to build a cluster. It also causes more unclassified documents (noise). The higher the MinT parameter value cause the number of clusters will decrease, since the number of similar documents can not form clusters if less than MinT. Parameter in the SNN Algorithm determine performance of clustering result and the amount of noise (unclustered documents ). The Silhouette coeffisient shows almost the same result in many experiments, above 0.9, which means that SNN algorithm works well with different parameter values.
Wang, Jie-sheng; Han, Shuang; Shen, Na-na; Li, Shu-xia
2014-01-01
For meeting the forecasting target of key technology indicators in the flotation process, a BP neural network soft-sensor model based on features extraction of flotation froth images and optimized by shuffled cuckoo search algorithm is proposed. Based on the digital image processing technique, the color features in HSI color space, the visual features based on the gray level cooccurrence matrix, and the shape characteristics based on the geometric theory of flotation froth images are extracted, respectively, as the input variables of the proposed soft-sensor model. Then the isometric mapping method is used to reduce the input dimension, the network size, and learning time of BP neural network. Finally, a shuffled cuckoo search algorithm is adopted to optimize the BP neural network soft-sensor model. Simulation results show that the model has better generalization results and prediction accuracy. PMID:25133210
Algorithms of maximum likelihood data clustering with applications
NASA Astrophysics Data System (ADS)
Giada, Lorenzo; Marsili, Matteo
2002-12-01
We address the problem of data clustering by introducing an unsupervised, parameter-free approach based on maximum likelihood principle. Starting from the observation that data sets belonging to the same cluster share a common information, we construct an expression for the likelihood of any possible cluster structure. The likelihood in turn depends only on the Pearson's coefficient of the data. We discuss clustering algorithms that provide a fast and reliable approximation to maximum likelihood configurations. Compared to standard clustering methods, our approach has the advantages that (i) it is parameter free, (ii) the number of clusters need not be fixed in advance and (iii) the interpretation of the results is transparent. In order to test our approach and compare it with standard clustering algorithms, we analyze two very different data sets: time series of financial market returns and gene expression data. We find that different maximization algorithms produce similar cluster structures whereas the outcome of standard algorithms has a much wider variability.
A new clustering algorithm applicable to multispectral and polarimetric SAR images
NASA Technical Reports Server (NTRS)
Wong, Yiu-Fai; Posner, Edward C.
1993-01-01
We describe an application of a scale-space clustering algorithm to the classification of a multispectral and polarimetric SAR image of an agricultural site. After the initial polarimetric and radiometric calibration and noise cancellation, we extracted a 12-dimensional feature vector for each pixel from the scattering matrix. The clustering algorithm was able to partition a set of unlabeled feature vectors from 13 selected sites, each site corresponding to a distinct crop, into 13 clusters without any supervision. The cluster parameters were then used to classify the whole image. The classification map is much less noisy and more accurate than those obtained by hierarchical rules. Starting with every point as a cluster, the algorithm works by melting the system to produce a tree of clusters in the scale space. It can cluster data in any multidimensional space and is insensitive to variability in cluster densities, sizes and ellipsoidal shapes. This algorithm, more powerful than existing ones, may be useful for remote sensing for land use.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Aad, G.; Abajyan, T.; Abbott, B.
2013-05-15
A measurement of splitting scales, as defined by the k T clustering algorithm, is presented for final states containing a W boson produced in proton–proton collisions at a centre-of-mass energy of 7 TeV. The measurement is based on the full 2010 data sample corresponding to an integrated luminosity of 36 pb -1 which was collected using the ATLAS detector at the CERN Large Hadron Collider. Cluster splitting scales are measured in events containing W bosons decaying to electrons or muons. The measurement comprises the four hardest splitting scales in a k T cluster sequence of the hadronic activity accompanying themore » W boson, and ratios of these splitting scales. Backgrounds such as multi-jet and top-quark-pair production are subtracted and the results are corrected for detector effects. Predictions from various Monte Carlo event generators at particle level are compared to the data. Overall, reasonable agreement is found with all generators, but larger deviations between the predictions and the data are evident in the soft regions of the splitting scales.« less
Spatial cluster detection using dynamic programming.
Sverchkov, Yuriy; Jiang, Xia; Cooper, Gregory F
2012-03-25
The task of spatial cluster detection involves finding spatial regions where some property deviates from the norm or the expected value. In a probabilistic setting this task can be expressed as finding a region where some event is significantly more likely than usual. Spatial cluster detection is of interest in fields such as biosurveillance, mining of astronomical data, military surveillance, and analysis of fMRI images. In almost all such applications we are interested both in the question of whether a cluster exists in the data, and if it exists, we are interested in finding the most accurate characterization of the cluster. We present a general dynamic programming algorithm for grid-based spatial cluster detection. The algorithm can be used for both Bayesian maximum a-posteriori (MAP) estimation of the most likely spatial distribution of clusters and Bayesian model averaging over a large space of spatial cluster distributions to compute the posterior probability of an unusual spatial clustering. The algorithm is explained and evaluated in the context of a biosurveillance application, specifically the detection and identification of Influenza outbreaks based on emergency department visits. A relatively simple underlying model is constructed for the purpose of evaluating the algorithm, and the algorithm is evaluated using the model and semi-synthetic test data. When compared to baseline methods, tests indicate that the new algorithm can improve MAP estimates under certain conditions: the greedy algorithm we compared our method to was found to be more sensitive to smaller outbreaks, while as the size of the outbreaks increases, in terms of area affected and proportion of individuals affected, our method overtakes the greedy algorithm in spatial precision and recall. The new algorithm performs on-par with baseline methods in the task of Bayesian model averaging. We conclude that the dynamic programming algorithm performs on-par with other available methods for spatial cluster detection and point to its low computational cost and extendability as advantages in favor of further research and use of the algorithm.
Spatial cluster detection using dynamic programming
2012-01-01
Background The task of spatial cluster detection involves finding spatial regions where some property deviates from the norm or the expected value. In a probabilistic setting this task can be expressed as finding a region where some event is significantly more likely than usual. Spatial cluster detection is of interest in fields such as biosurveillance, mining of astronomical data, military surveillance, and analysis of fMRI images. In almost all such applications we are interested both in the question of whether a cluster exists in the data, and if it exists, we are interested in finding the most accurate characterization of the cluster. Methods We present a general dynamic programming algorithm for grid-based spatial cluster detection. The algorithm can be used for both Bayesian maximum a-posteriori (MAP) estimation of the most likely spatial distribution of clusters and Bayesian model averaging over a large space of spatial cluster distributions to compute the posterior probability of an unusual spatial clustering. The algorithm is explained and evaluated in the context of a biosurveillance application, specifically the detection and identification of Influenza outbreaks based on emergency department visits. A relatively simple underlying model is constructed for the purpose of evaluating the algorithm, and the algorithm is evaluated using the model and semi-synthetic test data. Results When compared to baseline methods, tests indicate that the new algorithm can improve MAP estimates under certain conditions: the greedy algorithm we compared our method to was found to be more sensitive to smaller outbreaks, while as the size of the outbreaks increases, in terms of area affected and proportion of individuals affected, our method overtakes the greedy algorithm in spatial precision and recall. The new algorithm performs on-par with baseline methods in the task of Bayesian model averaging. Conclusions We conclude that the dynamic programming algorithm performs on-par with other available methods for spatial cluster detection and point to its low computational cost and extendability as advantages in favor of further research and use of the algorithm. PMID:22443103
Generalized fuzzy C-means clustering algorithm with improved fuzzy partitions.
Zhu, Lin; Chung, Fu-Lai; Wang, Shitong
2009-06-01
The fuzziness index m has important influence on the clustering result of fuzzy clustering algorithms, and it should not be forced to fix at the usual value m = 2. In view of its distinctive features in applications and its limitation in having m = 2 only, a recent advance of fuzzy clustering called fuzzy c-means clustering with improved fuzzy partitions (IFP-FCM) is extended in this paper, and a generalized algorithm called GIFP-FCM for more effective clustering is proposed. By introducing a novel membership constraint function, a new objective function is constructed, and furthermore, GIFP-FCM clustering is derived. Meanwhile, from the viewpoints of L(p) norm distance measure and competitive learning, the robustness and convergence of the proposed algorithm are analyzed. Furthermore, the classical fuzzy c-means algorithm (FCM) and IFP-FCM can be taken as two special cases of the proposed algorithm. Several experimental results including its application to noisy image texture segmentation are presented to demonstrate its average advantage over FCM and IFP-FCM in both clustering and robustness capabilities.
Incremental fuzzy C medoids clustering of time series data using dynamic time warping distance
Chen, Jingli; Wu, Shuai; Liu, Zhizhong; Chao, Hao
2018-01-01
Clustering time series data is of great significance since it could extract meaningful statistics and other characteristics. Especially in biomedical engineering, outstanding clustering algorithms for time series may help improve the health level of people. Considering data scale and time shifts of time series, in this paper, we introduce two incremental fuzzy clustering algorithms based on a Dynamic Time Warping (DTW) distance. For recruiting Single-Pass and Online patterns, our algorithms could handle large-scale time series data by splitting it into a set of chunks which are processed sequentially. Besides, our algorithms select DTW to measure distance of pair-wise time series and encourage higher clustering accuracy because DTW could determine an optimal match between any two time series by stretching or compressing segments of temporal data. Our new algorithms are compared to some existing prominent incremental fuzzy clustering algorithms on 12 benchmark time series datasets. The experimental results show that the proposed approaches could yield high quality clusters and were better than all the competitors in terms of clustering accuracy. PMID:29795600
Incremental fuzzy C medoids clustering of time series data using dynamic time warping distance.
Liu, Yongli; Chen, Jingli; Wu, Shuai; Liu, Zhizhong; Chao, Hao
2018-01-01
Clustering time series data is of great significance since it could extract meaningful statistics and other characteristics. Especially in biomedical engineering, outstanding clustering algorithms for time series may help improve the health level of people. Considering data scale and time shifts of time series, in this paper, we introduce two incremental fuzzy clustering algorithms based on a Dynamic Time Warping (DTW) distance. For recruiting Single-Pass and Online patterns, our algorithms could handle large-scale time series data by splitting it into a set of chunks which are processed sequentially. Besides, our algorithms select DTW to measure distance of pair-wise time series and encourage higher clustering accuracy because DTW could determine an optimal match between any two time series by stretching or compressing segments of temporal data. Our new algorithms are compared to some existing prominent incremental fuzzy clustering algorithms on 12 benchmark time series datasets. The experimental results show that the proposed approaches could yield high quality clusters and were better than all the competitors in terms of clustering accuracy.
Multi-Optimisation Consensus Clustering
NASA Astrophysics Data System (ADS)
Li, Jian; Swift, Stephen; Liu, Xiaohui
Ensemble Clustering has been developed to provide an alternative way of obtaining more stable and accurate clustering results. It aims to avoid the biases of individual clustering algorithms. However, it is still a challenge to develop an efficient and robust method for Ensemble Clustering. Based on an existing ensemble clustering method, Consensus Clustering (CC), this paper introduces an advanced Consensus Clustering algorithm called Multi-Optimisation Consensus Clustering (MOCC), which utilises an optimised Agreement Separation criterion and a Multi-Optimisation framework to improve the performance of CC. Fifteen different data sets are used for evaluating the performance of MOCC. The results reveal that MOCC can generate more accurate clustering results than the original CC algorithm.
Swarm Intelligence in Text Document Clustering
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cui, Xiaohui; Potok, Thomas E
2008-01-01
Social animals or insects in nature often exhibit a form of emergent collective behavior. The research field that attempts to design algorithms or distributed problem-solving devices inspired by the collective behavior of social insect colonies is called Swarm Intelligence. Compared to the traditional algorithms, the swarm algorithms are usually flexible, robust, decentralized and self-organized. These characters make the swarm algorithms suitable for solving complex problems, such as document collection clustering. The major challenge of today's information society is being overwhelmed with information on any topic they are searching for. Fast and high-quality document clustering algorithms play an important role inmore » helping users to effectively navigate, summarize, and organize the overwhelmed information. In this chapter, we introduce three nature inspired swarm intelligence clustering approaches for document clustering analysis. These clustering algorithms use stochastic and heuristic principles discovered from observing bird flocks, fish schools and ant food forage.« less
Novel density-based and hierarchical density-based clustering algorithms for uncertain data.
Zhang, Xianchao; Liu, Han; Zhang, Xiaotong
2017-09-01
Uncertain data has posed a great challenge to traditional clustering algorithms. Recently, several algorithms have been proposed for clustering uncertain data, and among them density-based techniques seem promising for handling data uncertainty. However, some issues like losing uncertain information, high time complexity and nonadaptive threshold have not been addressed well in the previous density-based algorithm FDBSCAN and hierarchical density-based algorithm FOPTICS. In this paper, we firstly propose a novel density-based algorithm PDBSCAN, which improves the previous FDBSCAN from the following aspects: (1) it employs a more accurate method to compute the probability that the distance between two uncertain objects is less than or equal to a boundary value, instead of the sampling-based method in FDBSCAN; (2) it introduces new definitions of probability neighborhood, support degree, core object probability, direct reachability probability, thus reducing the complexity and solving the issue of nonadaptive threshold (for core object judgement) in FDBSCAN. Then, we modify the algorithm PDBSCAN to an improved version (PDBSCANi), by using a better cluster assignment strategy to ensure that every object will be assigned to the most appropriate cluster, thus solving the issue of nonadaptive threshold (for direct density reachability judgement) in FDBSCAN. Furthermore, as PDBSCAN and PDBSCANi have difficulties for clustering uncertain data with non-uniform cluster density, we propose a novel hierarchical density-based algorithm POPTICS by extending the definitions of PDBSCAN, adding new definitions of fuzzy core distance and fuzzy reachability distance, and employing a new clustering framework. POPTICS can reveal the cluster structures of the datasets with different local densities in different regions better than PDBSCAN and PDBSCANi, and it addresses the issues in FOPTICS. Experimental results demonstrate the superiority of our proposed algorithms over the existing algorithms in accuracy and efficiency. Copyright © 2017 Elsevier Ltd. All rights reserved.
Heterogeneous Tensor Decomposition for Clustering via Manifold Optimization.
Sun, Yanfeng; Gao, Junbin; Hong, Xia; Mishra, Bamdev; Yin, Baocai
2016-03-01
Tensor clustering is an important tool that exploits intrinsically rich structures in real-world multiarray or Tensor datasets. Often in dealing with those datasets, standard practice is to use subspace clustering that is based on vectorizing multiarray data. However, vectorization of tensorial data does not exploit complete structure information. In this paper, we propose a subspace clustering algorithm without adopting any vectorization process. Our approach is based on a novel heterogeneous Tucker decomposition model taking into account cluster membership information. We propose a new clustering algorithm that alternates between different modes of the proposed heterogeneous tensor model. All but the last mode have closed-form updates. Updating the last mode reduces to optimizing over the multinomial manifold for which we investigate second order Riemannian geometry and propose a trust-region algorithm. Numerical experiments show that our proposed algorithm compete effectively with state-of-the-art clustering algorithms that are based on tensor factorization.
Two-loop beam and soft functions for rapidity-dependent jet vetoes
NASA Astrophysics Data System (ADS)
Gangal, Shireen; Gaunt, Jonathan R.; Stahlhofen, Maximilian; Tackmann, Frank J.
2017-02-01
Jet vetoes play an important role in many analyses at the LHC. Traditionally, jet vetoes have been imposed using a restriction on the transverse momentum p Tj of jets. Alternatively, one can also consider jet observables for which p Tj is weighted by a smooth function of the jet rapidity y j that vanishes as | y j | → ∞. Such observables are useful as they provide a natural way to impose a tight veto on central jets but a looser one at forward rapidities. We consider two such rapidity-dependent jet veto observables, T_{Bj} and {T_{Cj} , and compute the required beam and dijet soft functions for the jet-vetoed color-singlet production cross section at two loops. At this order, clustering effects from the jet algorithm become important. The dominant contributions are computed fully analytically while corrections that are subleading in the limit of small jet radii are expressed in terms of finite numerical integrals. Our results enable the full NNLL' resummation and are an important step towards N3LL resummation for cross sections with a T_{Bj} or T_{Cj} jet veto.
Anandakrishnan, Ramu; Onufriev, Alexey
2008-03-01
In statistical mechanics, the equilibrium properties of a physical system of particles can be calculated as the statistical average over accessible microstates of the system. In general, these calculations are computationally intractable since they involve summations over an exponentially large number of microstates. Clustering algorithms are one of the methods used to numerically approximate these sums. The most basic clustering algorithms first sub-divide the system into a set of smaller subsets (clusters). Then, interactions between particles within each cluster are treated exactly, while all interactions between different clusters are ignored. These smaller clusters have far fewer microstates, making the summation over these microstates, tractable. These algorithms have been previously used for biomolecular computations, but remain relatively unexplored in this context. Presented here, is a theoretical analysis of the error and computational complexity for the two most basic clustering algorithms that were previously applied in the context of biomolecular electrostatics. We derive a tight, computationally inexpensive, error bound for the equilibrium state of a particle computed via these clustering algorithms. For some practical applications, it is the root mean square error, which can be significantly lower than the error bound, that may be more important. We how that there is a strong empirical relationship between error bound and root mean square error, suggesting that the error bound could be used as a computationally inexpensive metric for predicting the accuracy of clustering algorithms for practical applications. An example of error analysis for such an application-computation of average charge of ionizable amino-acids in proteins-is given, demonstrating that the clustering algorithm can be accurate enough for practical purposes.
Soft X-ray excess in the Coma cluster from a Cosmic Axion Background
DOE Office of Scientific and Technical Information (OSTI.GOV)
Angus, Stephen; Conlon, Joseph P.; Marsh, M.C. David
2014-09-01
We show that the soft X-ray excess in the Coma cluster can be explained by a cosmic background of relativistic axion-like particles (ALPs) converting into photons in the cluster magnetic field. We provide a detailed self-contained review of the cluster soft X-ray excess, the proposed astrophysical explanations and the problems they face, and explain how a 0.1- 1 keV axion background naturally arises at reheating in many string theory models of the early universe. We study the morphology of the soft excess by numerically propagating axions through stochastic, multi-scale magnetic field models that are consistent with observations of Faraday rotation measuresmore » from Coma. By comparing to ROSAT observations of the 0.2- 0.4 keV soft excess, we find that the overall excess luminosity is easily reproduced for g{sub aγγ} ∼ 2 × 10{sup -13} Ge {sup -1}. The resulting morphology is highly sensitive to the magnetic field power spectrum. For Gaussian magnetic field models, the observed soft excess morphology prefers magnetic field spectra with most power in coherence lengths on O(3 kpc) scales over those with most power on O(12 kpc) scales. Within this scenario, we bound the mean energy of the axion background to 50 eV∼< ( E{sub a} ) ∼< 250 eV, the axion mass to m{sub a} ∼< 10{sup -12} eV, and derive a lower bound on the axion-photon coupling g{sub aγγ} ∼> √(0.5/Δ N{sub eff}) 1.4 × 10{sup -13} Ge {sup -1}.« less
A high throughput architecture for a low complexity soft-output demapping algorithm
NASA Astrophysics Data System (ADS)
Ali, I.; Wasenmüller, U.; Wehn, N.
2015-11-01
Iterative channel decoders such as Turbo-Code and LDPC decoders show exceptional performance and therefore they are a part of many wireless communication receivers nowadays. These decoders require a soft input, i.e., the logarithmic likelihood ratio (LLR) of the received bits with a typical quantization of 4 to 6 bits. For computing the LLR values from a received complex symbol, a soft demapper is employed in the receiver. The implementation cost of traditional soft-output demapping methods is relatively large in high order modulation systems, and therefore low complexity demapping algorithms are indispensable in low power receivers. In the presence of multiple wireless communication standards where each standard defines multiple modulation schemes, there is a need to have an efficient demapper architecture covering all the flexibility requirements of these standards. Another challenge associated with hardware implementation of the demapper is to achieve a very high throughput in double iterative systems, for instance, MIMO and Code-Aided Synchronization. In this paper, we present a comprehensive communication and hardware performance evaluation of low complexity soft-output demapping algorithms to select the best algorithm for implementation. The main goal of this work is to design a high throughput, flexible, and area efficient architecture. We describe architectures to execute the investigated algorithms. We implement these architectures on a FPGA device to evaluate their hardware performance. The work has resulted in a hardware architecture based on the figured out best low complexity algorithm delivering a high throughput of 166 Msymbols/second for Gray mapped 16-QAM modulation on Virtex-5. This efficient architecture occupies only 127 slice registers, 248 slice LUTs and 2 DSP48Es.
NASA Astrophysics Data System (ADS)
Abdul-Nasir, Aimi Salihah; Mashor, Mohd Yusoff; Halim, Nurul Hazwani Abd; Mohamed, Zeehaida
2015-05-01
Malaria is a life-threatening parasitic infectious disease that corresponds for nearly one million deaths each year. Due to the requirement of prompt and accurate diagnosis of malaria, the current study has proposed an unsupervised pixel segmentation based on clustering algorithm in order to obtain the fully segmented red blood cells (RBCs) infected with malaria parasites based on the thin blood smear images of P. vivax species. In order to obtain the segmented infected cell, the malaria images are first enhanced by using modified global contrast stretching technique. Then, an unsupervised segmentation technique based on clustering algorithm has been applied on the intensity component of malaria image in order to segment the infected cell from its blood cells background. In this study, cascaded moving k-means (MKM) and fuzzy c-means (FCM) clustering algorithms has been proposed for malaria slide image segmentation. After that, median filter algorithm has been applied to smooth the image as well as to remove any unwanted regions such as small background pixels from the image. Finally, seeded region growing area extraction algorithm has been applied in order to remove large unwanted regions that are still appeared on the image due to their size in which cannot be cleaned by using median filter. The effectiveness of the proposed cascaded MKM and FCM clustering algorithms has been analyzed qualitatively and quantitatively by comparing the proposed cascaded clustering algorithm with MKM and FCM clustering algorithms. Overall, the results indicate that segmentation using the proposed cascaded clustering algorithm has produced the best segmentation performances by achieving acceptable sensitivity as well as high specificity and accuracy values compared to the segmentation results provided by MKM and FCM algorithms.
Fong, Simon; Deb, Suash; Yang, Xin-She; Zhuang, Yan
2014-01-01
Traditional K-means clustering algorithms have the drawback of getting stuck at local optima that depend on the random values of initial centroids. Optimization algorithms have their advantages in guiding iterative computation to search for global optima while avoiding local optima. The algorithms help speed up the clustering process by converging into a global optimum early with multiple search agents in action. Inspired by nature, some contemporary optimization algorithms which include Ant, Bat, Cuckoo, Firefly, and Wolf search algorithms mimic the swarming behavior allowing them to cooperatively steer towards an optimal objective within a reasonable time. It is known that these so-called nature-inspired optimization algorithms have their own characteristics as well as pros and cons in different applications. When these algorithms are combined with K-means clustering mechanism for the sake of enhancing its clustering quality by avoiding local optima and finding global optima, the new hybrids are anticipated to produce unprecedented performance. In this paper, we report the results of our evaluation experiments on the integration of nature-inspired optimization methods into K-means algorithms. In addition to the standard evaluation metrics in evaluating clustering quality, the extended K-means algorithms that are empowered by nature-inspired optimization methods are applied on image segmentation as a case study of application scenario.
Deb, Suash; Yang, Xin-She
2014-01-01
Traditional K-means clustering algorithms have the drawback of getting stuck at local optima that depend on the random values of initial centroids. Optimization algorithms have their advantages in guiding iterative computation to search for global optima while avoiding local optima. The algorithms help speed up the clustering process by converging into a global optimum early with multiple search agents in action. Inspired by nature, some contemporary optimization algorithms which include Ant, Bat, Cuckoo, Firefly, and Wolf search algorithms mimic the swarming behavior allowing them to cooperatively steer towards an optimal objective within a reasonable time. It is known that these so-called nature-inspired optimization algorithms have their own characteristics as well as pros and cons in different applications. When these algorithms are combined with K-means clustering mechanism for the sake of enhancing its clustering quality by avoiding local optima and finding global optima, the new hybrids are anticipated to produce unprecedented performance. In this paper, we report the results of our evaluation experiments on the integration of nature-inspired optimization methods into K-means algorithms. In addition to the standard evaluation metrics in evaluating clustering quality, the extended K-means algorithms that are empowered by nature-inspired optimization methods are applied on image segmentation as a case study of application scenario. PMID:25202730
Lei, Yang; Yu, Dai; Bin, Zhang; Yang, Yang
2017-01-01
Clustering algorithm as a basis of data analysis is widely used in analysis systems. However, as for the high dimensions of the data, the clustering algorithm may overlook the business relation between these dimensions especially in the medical fields. As a result, usually the clustering result may not meet the business goals of the users. Then, in the clustering process, if it can combine the knowledge of the users, that is, the doctor's knowledge or the analysis intent, the clustering result can be more satisfied. In this paper, we propose an interactive K -means clustering method to improve the user's satisfactions towards the result. The core of this method is to get the user's feedback of the clustering result, to optimize the clustering result. Then, a particle swarm optimization algorithm is used in the method to optimize the parameters, especially the weight settings in the clustering algorithm to make it reflect the user's business preference as possible. After that, based on the parameter optimization and adjustment, the clustering result can be closer to the user's requirement. Finally, we take an example in the breast cancer, to testify our method. The experiments show the better performance of our algorithm.
Ckmeans.1d.dp: Optimal k-means Clustering in One Dimension by Dynamic Programming.
Wang, Haizhou; Song, Mingzhou
2011-12-01
The heuristic k -means algorithm, widely used for cluster analysis, does not guarantee optimality. We developed a dynamic programming algorithm for optimal one-dimensional clustering. The algorithm is implemented as an R package called Ckmeans.1d.dp . We demonstrate its advantage in optimality and runtime over the standard iterative k -means algorithm.
Inference from clustering with application to gene-expression microarrays.
Dougherty, Edward R; Barrera, Junior; Brun, Marcel; Kim, Seungchan; Cesar, Roberto M; Chen, Yidong; Bittner, Michael; Trent, Jeffrey M
2002-01-01
There are many algorithms to cluster sample data points based on nearness or a similarity measure. Often the implication is that points in different clusters come from different underlying classes, whereas those in the same cluster come from the same class. Stochastically, the underlying classes represent different random processes. The inference is that clusters represent a partition of the sample points according to which process they belong. This paper discusses a model-based clustering toolbox that evaluates cluster accuracy. Each random process is modeled as its mean plus independent noise, sample points are generated, the points are clustered, and the clustering error is the number of points clustered incorrectly according to the generating random processes. Various clustering algorithms are evaluated based on process variance and the key issue of the rate at which algorithmic performance improves with increasing numbers of experimental replications. The model means can be selected by hand to test the separability of expected types of biological expression patterns. Alternatively, the model can be seeded by real data to test the expected precision of that output or the extent of improvement in precision that replication could provide. In the latter case, a clustering algorithm is used to form clusters, and the model is seeded with the means and variances of these clusters. Other algorithms are then tested relative to the seeding algorithm. Results are averaged over various seeds. Output includes error tables and graphs, confusion matrices, principal-component plots, and validation measures. Five algorithms are studied in detail: K-means, fuzzy C-means, self-organizing maps, hierarchical Euclidean-distance-based and correlation-based clustering. The toolbox is applied to gene-expression clustering based on cDNA microarrays using real data. Expression profile graphics are generated and error analysis is displayed within the context of these profile graphics. A large amount of generated output is available over the web.
Implementation of spectral clustering on microarray data of carcinoma using k-means algorithm
NASA Astrophysics Data System (ADS)
Frisca, Bustamam, Alhadi; Siswantining, Titin
2017-03-01
Clustering is one of data analysis methods that aims to classify data which have similar characteristics in the same group. Spectral clustering is one of the most popular modern clustering algorithms. As an effective clustering technique, spectral clustering method emerged from the concepts of spectral graph theory. Spectral clustering method needs partitioning algorithm. There are some partitioning methods including PAM, SOM, Fuzzy c-means, and k-means. Based on the research that has been done by Capital and Choudhury in 2013, when using Euclidian distance k-means algorithm provide better accuracy than PAM algorithm. So in this paper we use k-means as our partition algorithm. The major advantage of spectral clustering is in reducing data dimension, especially in this case to reduce the dimension of large microarray dataset. Microarray data is a small-sized chip made of a glass plate containing thousands and even tens of thousands kinds of genes in the DNA fragments derived from doubling cDNA. Application of microarray data is widely used to detect cancer, for the example is carcinoma, in which cancer cells express the abnormalities in his genes. The purpose of this research is to classify the data that have high similarity in the same group and the data that have low similarity in the others. In this research, Carcinoma microarray data using 7457 genes. The result of partitioning using k-means algorithm is two clusters.
Using Grey Wolf Algorithm to Solve the Capacitated Vehicle Routing Problem
NASA Astrophysics Data System (ADS)
Korayem, L.; Khorsid, M.; Kassem, S. S.
2015-05-01
The capacitated vehicle routing problem (CVRP) is a class of the vehicle routing problems (VRPs). In CVRP a set of identical vehicles having fixed capacities are required to fulfill customers' demands for a single commodity. The main objective is to minimize the total cost or distance traveled by the vehicles while satisfying a number of constraints, such as: the capacity constraint of each vehicle, logical flow constraints, etc. One of the methods employed in solving the CVRP is the cluster-first route-second method. It is a technique based on grouping of customers into a number of clusters, where each cluster is served by one vehicle. Once clusters are formed, a route determining the best sequence to visit customers is established within each cluster. The recently bio-inspired grey wolf optimizer (GWO), introduced in 2014, has proven to be efficient in solving unconstrained, as well as, constrained optimization problems. In the current research, our main contributions are: combining GWO with the traditional K-means clustering algorithm to generate the ‘K-GWO’ algorithm, deriving a capacitated version of the K-GWO algorithm by incorporating a capacity constraint into the aforementioned algorithm, and finally, developing 2 new clustering heuristics. The resulting algorithm is used in the clustering phase of the cluster-first route-second method to solve the CVR problem. The algorithm is tested on a number of benchmark problems with encouraging results.
Chemodynamical Clustering Applied to APOGEE Data: Rediscovering Globular Clusters
NASA Astrophysics Data System (ADS)
Chen, Boquan; D’Onghia, Elena; Pardy, Stephen A.; Pasquali, Anna; Bertelli Motta, Clio; Hanlon, Bret; Grebel, Eva K.
2018-06-01
We have developed a novel technique based on a clustering algorithm that searches for kinematically and chemically clustered stars in the APOGEE DR12 Cannon data. As compared to classical chemical tagging, the kinematic information included in our methodology allows us to identify stars that are members of known globular clusters with greater confidence. We apply our algorithm to the entire APOGEE catalog of 150,615 stars whose chemical abundances are derived by the Cannon. Our methodology found anticorrelations between the elements Al and Mg, Na and O, and C and N previously identified in the optical spectra in globular clusters, even though we omit these elements in our algorithm. Our algorithm identifies globular clusters without a priori knowledge of their locations in the sky. Thus, not only does this technique promise to discover new globular clusters, but it also allows us to identify candidate streams of kinematically and chemically clustered stars in the Milky Way.
Early dynamical evolution of substructured stellar clusters
NASA Astrophysics Data System (ADS)
Dorval, Julien; Boily, Christian
2015-08-01
It is now widely accepted that stellar clusters form with a high level of substructure (Kuhn et al. 2014, Bate 2009), inherited from the molecular cloud and the star formation process. Evidence from observations and simulations also indicate the stars in such young clusters form a subvirial system (Kirk et al. 2007, Maschberger et al. 2010). The subsequent dynamical evolution can cause important mass loss, ejecting a large part of the birth population in the field. It can also imprint the stellar population and still be inferred from observations of evolved clusters. Nbody simulations allow a better understanding of these early twists and turns, given realistic initial conditions. Nowadays, substructured, clumpy young clusters are usually obtained through pseudo-fractal growth (Goodwin et al. 2004) and velocity inheritance. Such models are visually realistics and are very useful, they are however somewhat artificial in their velocity distribution. I introduce a new way to create clumpy initial conditions through a "Hubble expansion" which naturally produces self consistent clumps, velocity-wise. A velocity distribution analysis shows the new method produces realistic models, consistent with the dynamical state of the newly created cores in hydrodynamic simulation of cluster formation (Klessen & Burkert 2000). I use these initial conditions to investigate the dynamical evolution of young subvirial clusters, up to 80000 stars. I find an overall soft evolution, with hierarchical merging leading to a high level of mass segregation. I investigate the influence of the mass function on the fate of the cluster, specifically on the amount of mass loss induced by the early violent relaxation. Using a new binary detection algorithm, I also find a strong processing of the native binary population.
Clustering performance comparison using K-means and expectation maximization algorithms.
Jung, Yong Gyu; Kang, Min Soo; Heo, Jun
2014-11-14
Clustering is an important means of data mining based on separating data categories by similar features. Unlike the classification algorithm, clustering belongs to the unsupervised type of algorithms. Two representatives of the clustering algorithms are the K -means and the expectation maximization (EM) algorithm. Linear regression analysis was extended to the category-type dependent variable, while logistic regression was achieved using a linear combination of independent variables. To predict the possibility of occurrence of an event, a statistical approach is used. However, the classification of all data by means of logistic regression analysis cannot guarantee the accuracy of the results. In this paper, the logistic regression analysis is applied to EM clusters and the K -means clustering method for quality assessment of red wine, and a method is proposed for ensuring the accuracy of the classification results.
Lukashin, A V; Fuchs, R
2001-05-01
Cluster analysis of genome-wide expression data from DNA microarray hybridization studies has proved to be a useful tool for identifying biologically relevant groupings of genes and samples. In the present paper, we focus on several important issues related to clustering algorithms that have not yet been fully studied. We describe a simple and robust algorithm for the clustering of temporal gene expression profiles that is based on the simulated annealing procedure. In general, this algorithm guarantees to eventually find the globally optimal distribution of genes over clusters. We introduce an iterative scheme that serves to evaluate quantitatively the optimal number of clusters for each specific data set. The scheme is based on standard approaches used in regular statistical tests. The basic idea is to organize the search of the optimal number of clusters simultaneously with the optimization of the distribution of genes over clusters. The efficiency of the proposed algorithm has been evaluated by means of a reverse engineering experiment, that is, a situation in which the correct distribution of genes over clusters is known a priori. The employment of this statistically rigorous test has shown that our algorithm places greater than 90% genes into correct clusters. Finally, the algorithm has been tested on real gene expression data (expression changes during yeast cell cycle) for which the fundamental patterns of gene expression and the assignment of genes to clusters are well understood from numerous previous studies.
Basic firefly algorithm for document clustering
NASA Astrophysics Data System (ADS)
Mohammed, Athraa Jasim; Yusof, Yuhanis; Husni, Husniza
2015-12-01
The Document clustering plays significant role in Information Retrieval (IR) where it organizes documents prior to the retrieval process. To date, various clustering algorithms have been proposed and this includes the K-means and Particle Swarm Optimization. Even though these algorithms have been widely applied in many disciplines due to its simplicity, such an approach tends to be trapped in a local minimum during its search for an optimal solution. To address the shortcoming, this paper proposes a Basic Firefly (Basic FA) algorithm to cluster text documents. The algorithm employs the Average Distance to Document Centroid (ADDC) as the objective function of the search. Experiments utilizing the proposed algorithm were conducted on the 20Newsgroups benchmark dataset. Results demonstrate that the Basic FA generates a more robust and compact clusters than the ones produced by K-means and Particle Swarm Optimization (PSO).
Comparison and Evaluation of Clustering Algorithms for Tandem Mass Spectra.
Rieder, Vera; Schork, Karin U; Kerschke, Laura; Blank-Landeshammer, Bernhard; Sickmann, Albert; Rahnenführer, Jörg
2017-11-03
In proteomics, liquid chromatography-tandem mass spectrometry (LC-MS/MS) is established for identifying peptides and proteins. Duplicated spectra, that is, multiple spectra of the same peptide, occur both in single MS/MS runs and in large spectral libraries. Clustering tandem mass spectra is used to find consensus spectra, with manifold applications. First, it speeds up database searches, as performed for instance by Mascot. Second, it helps to identify novel peptides across species. Third, it is used for quality control to detect wrongly annotated spectra. We compare different clustering algorithms based on the cosine distance between spectra. CAST, MS-Cluster, and PRIDE Cluster are popular algorithms to cluster tandem mass spectra. We add well-known algorithms for large data sets, hierarchical clustering, DBSCAN, and connected components of a graph, as well as the new method N-Cluster. All algorithms are evaluated on real data with varied parameter settings. Cluster results are compared with each other and with peptide annotations based on validation measures such as purity. Quality control, regarding the detection of wrongly (un)annotated spectra, is discussed for exemplary resulting clusters. N-Cluster proves to be highly competitive. All clustering results benefit from the so-called DISMS2 filter that integrates additional information, for example, on precursor mass.
Weighted graph cuts without eigenvectors a multilevel approach.
Dhillon, Inderjit S; Guan, Yuqiang; Kulis, Brian
2007-11-01
A variety of clustering algorithms have recently been proposed to handle data that is not linearly separable; spectral clustering and kernel k-means are two of the main methods. In this paper, we discuss an equivalence between the objective functions used in these seemingly different methods--in particular, a general weighted kernel k-means objective is mathematically equivalent to a weighted graph clustering objective. We exploit this equivalence to develop a fast, high-quality multilevel algorithm that directly optimizes various weighted graph clustering objectives, such as the popular ratio cut, normalized cut, and ratio association criteria. This eliminates the need for any eigenvector computation for graph clustering problems, which can be prohibitive for very large graphs. Previous multilevel graph partitioning methods, such as Metis, have suffered from the restriction of equal-sized clusters; our multilevel algorithm removes this restriction by using kernel k-means to optimize weighted graph cuts. Experimental results show that our multilevel algorithm outperforms a state-of-the-art spectral clustering algorithm in terms of speed, memory usage, and quality. We demonstrate that our algorithm is applicable to large-scale clustering tasks such as image segmentation, social network analysis and gene network analysis.
Model-based clustering for RNA-seq data.
Si, Yaqing; Liu, Peng; Li, Pinghua; Brutnell, Thomas P
2014-01-15
RNA-seq technology has been widely adopted as an attractive alternative to microarray-based methods to study global gene expression. However, robust statistical tools to analyze these complex datasets are still lacking. By grouping genes with similar expression profiles across treatments, cluster analysis provides insight into gene functions and networks, and hence is an important technique for RNA-seq data analysis. In this manuscript, we derive clustering algorithms based on appropriate probability models for RNA-seq data. An expectation-maximization algorithm and another two stochastic versions of expectation-maximization algorithms are described. In addition, a strategy for initialization based on likelihood is proposed to improve the clustering algorithms. Moreover, we present a model-based hybrid-hierarchical clustering method to generate a tree structure that allows visualization of relationships among clusters as well as flexibility of choosing the number of clusters. Results from both simulation studies and analysis of a maize RNA-seq dataset show that our proposed methods provide better clustering results than alternative methods such as the K-means algorithm and hierarchical clustering methods that are not based on probability models. An R package, MBCluster.Seq, has been developed to implement our proposed algorithms. This R package provides fast computation and is publicly available at http://www.r-project.org
Two generalizations of Kohonen clustering
NASA Technical Reports Server (NTRS)
Bezdek, James C.; Pal, Nikhil R.; Tsao, Eric C. K.
1993-01-01
The relationship between the sequential hard c-means (SHCM), learning vector quantization (LVQ), and fuzzy c-means (FCM) clustering algorithms is discussed. LVQ and SHCM suffer from several major problems. For example, they depend heavily on initialization. If the initial values of the cluster centers are outside the convex hull of the input data, such algorithms, even if they terminate, may not produce meaningful results in terms of prototypes for cluster representation. This is due in part to the fact that they update only the winning prototype for every input vector. The impact and interaction of these two families with Kohonen's self-organizing feature mapping (SOFM), which is not a clustering method, but which often leads ideas to clustering algorithms is discussed. Then two generalizations of LVQ that are explicitly designed as clustering algorithms are presented; these algorithms are referred to as generalized LVQ = GLVQ; and fuzzy LVQ = FLVQ. Learning rules are derived to optimize an objective function whose goal is to produce 'good clusters'. GLVQ/FLVQ (may) update every node in the clustering net for each input vector. Neither GLVQ nor FLVQ depends upon a choice for the update neighborhood or learning rate distribution - these are taken care of automatically. Segmentation of a gray tone image is used as a typical application of these algorithms to illustrate the performance of GLVQ/FLVQ.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Maspero, M.; Meijer, G.J.; Lagendijk, J.J.W.
2015-06-15
Purpose: To develop an image processing method for MRI-based generation of electron density maps, known as pseudo-CT (pCT), without usage of model- or atlas-based segmentation, and to evaluate the method in the pelvic and head-neck region against CT. Methods: CT and MRI scans were obtained from the pelvic region of four patients in supine position using a flat table top only for CT. Stratified CT maps were generated by classifying each voxel based on HU ranges into one of four classes: air, adipose tissue, soft tissue or bone.A hierarchical region-selective algorithm, based on automatic thresholding and clustering, was used tomore » classify tissues from MR Dixon reconstructed fat, In-Phase (IP) and Opposed-Phase (OP) images. First, a body mask was obtained by thresholding the IP image. Subsequently, an automatic threshold on the Dixon fat image differentiated soft and adipose tissue. K-means clustering on IP and OP images resulted in a mask that, via a connected neighborhood analysis, allowing the user to select the components corresponding to bone structures.The pCT was estimated through assignment of bulk HU to the tissue classes. Bone-only Digital Reconstructed Radiographs (DRR) were generated as well. The pCT images were rigidly registered to the stratified CT to allow a volumetric and voxelwise comparison. Moreover, pCTs were also calculated within the head-neck region in two volunteers using the same pipeline. Results: The volumetric comparison resulted in differences <1% for each tissue class. A voxelwise comparison showed a good classification, ranging from 64% to 98%. The primary misclassified classes were adipose/soft tissue and bone/soft tissue. As the patients have been imaged on different table tops, part of the misclassification error can be explained by misregistration. Conclusion: The proposed approach does not rely on an anatomy model providing the flexibility to successfully generate the pCT in two different body sites. This research is founded by ZonMw IMDI Programme, project name: “RASOR sharp: MRI based radiotherapy planning using a single MRI sequence”, project number: 10-104003010.« less
An improved initialization center k-means clustering algorithm based on distance and density
NASA Astrophysics Data System (ADS)
Duan, Yanling; Liu, Qun; Xia, Shuyin
2018-04-01
Aiming at the problem of the random initial clustering center of k means algorithm that the clustering results are influenced by outlier data sample and are unstable in multiple clustering, a method of central point initialization method based on larger distance and higher density is proposed. The reciprocal of the weighted average of distance is used to represent the sample density, and the data sample with the larger distance and the higher density are selected as the initial clustering centers to optimize the clustering results. Then, a clustering evaluation method based on distance and density is designed to verify the feasibility of the algorithm and the practicality, the experimental results on UCI data sets show that the algorithm has a certain stability and practicality.
Diametrical clustering for identifying anti-correlated gene clusters.
Dhillon, Inderjit S; Marcotte, Edward M; Roshan, Usman
2003-09-01
Clustering genes based upon their expression patterns allows us to predict gene function. Most existing clustering algorithms cluster genes together when their expression patterns show high positive correlation. However, it has been observed that genes whose expression patterns are strongly anti-correlated can also be functionally similar. Biologically, this is not unintuitive-genes responding to the same stimuli, regardless of the nature of the response, are more likely to operate in the same pathways. We present a new diametrical clustering algorithm that explicitly identifies anti-correlated clusters of genes. Our algorithm proceeds by iteratively (i). re-partitioning the genes and (ii). computing the dominant singular vector of each gene cluster; each singular vector serving as the prototype of a 'diametric' cluster. We empirically show the effectiveness of the algorithm in identifying diametrical or anti-correlated clusters. Testing the algorithm on yeast cell cycle data, fibroblast gene expression data, and DNA microarray data from yeast mutants reveals that opposed cellular pathways can be discovered with this method. We present systems whose mRNA expression patterns, and likely their functions, oppose the yeast ribosome and proteosome, along with evidence for the inverse transcriptional regulation of a number of cellular systems.
Regionalization by fuzzy expert system based approach optimized by genetic algorithm
NASA Astrophysics Data System (ADS)
Chavoshi, Sattar; Azmin Sulaiman, Wan Nor; Saghafian, Bahram; Bin Sulaiman, Md. Nasir; Manaf, Latifah Abd
2013-04-01
SummaryIn recent years soft computing methods are being increasingly used to model complex hydrologic processes. These methods can simulate the real life processes without prior knowledge of the exact relationship between their components. The principal aim of this paper is perform hydrological regionalization based on soft computing concepts in the southern strip of the Caspian Sea basin, north of Iran. The basin with an area of 42,400 sq. km has been affected by severe floods in recent years that caused damages to human life and properties. Although some 61 hydrometric stations and 31 weather stations with 44 years of observed data (1961-2005) are operated in the study area, previous flood studies in this region have been hampered by insufficient and/or reliable observed rainfall-runoff records. In order to investigate the homogeneity (h) of catchments and overcome incompatibility that may occur on boundaries of cluster groups, a fuzzy expert system (FES) approach is used which incorporates physical and climatic characteristics, as well as flood seasonality and geographic location. Genetic algorithm (GA) was employed to adjust parameters of FES and optimize the system. In order to achieve the objective, a MATLAB programming code was developed which considers the heterogeneity criteria of less than 1 (H < 1) as the satisfying criteria. The adopted approach was found superior to the conventional hydrologic regionalization methods in the region because it employs greater number of homogeneity parameters and produces lower values of heterogeneity criteria.
A novel harmony search-K means hybrid algorithm for clustering gene expression data
Nazeer, KA Abdul; Sebastian, MP; Kumar, SD Madhu
2013-01-01
Recent progress in bioinformatics research has led to the accumulation of huge quantities of biological data at various data sources. The DNA microarray technology makes it possible to simultaneously analyze large number of genes across different samples. Clustering of microarray data can reveal the hidden gene expression patterns from large quantities of expression data that in turn offers tremendous possibilities in functional genomics, comparative genomics, disease diagnosis and drug development. The k- ¬means clustering algorithm is widely used for many practical applications. But the original k-¬means algorithm has several drawbacks. It is computationally expensive and generates locally optimal solutions based on the random choice of the initial centroids. Several methods have been proposed in the literature for improving the performance of the k-¬means algorithm. A meta-heuristic optimization algorithm named harmony search helps find out near-global optimal solutions by searching the entire solution space. Low clustering accuracy of the existing algorithms limits their use in many crucial applications of life sciences. In this paper we propose a novel Harmony Search-K means Hybrid (HSKH) algorithm for clustering the gene expression data. Experimental results show that the proposed algorithm produces clusters with better accuracy in comparison with the existing algorithms. PMID:23390351
A novel harmony search-K means hybrid algorithm for clustering gene expression data.
Nazeer, Ka Abdul; Sebastian, Mp; Kumar, Sd Madhu
2013-01-01
Recent progress in bioinformatics research has led to the accumulation of huge quantities of biological data at various data sources. The DNA microarray technology makes it possible to simultaneously analyze large number of genes across different samples. Clustering of microarray data can reveal the hidden gene expression patterns from large quantities of expression data that in turn offers tremendous possibilities in functional genomics, comparative genomics, disease diagnosis and drug development. The k- ¬means clustering algorithm is widely used for many practical applications. But the original k-¬means algorithm has several drawbacks. It is computationally expensive and generates locally optimal solutions based on the random choice of the initial centroids. Several methods have been proposed in the literature for improving the performance of the k-¬means algorithm. A meta-heuristic optimization algorithm named harmony search helps find out near-global optimal solutions by searching the entire solution space. Low clustering accuracy of the existing algorithms limits their use in many crucial applications of life sciences. In this paper we propose a novel Harmony Search-K means Hybrid (HSKH) algorithm for clustering the gene expression data. Experimental results show that the proposed algorithm produces clusters with better accuracy in comparison with the existing algorithms.
m-BIRCH: an online clustering approach for computer vision applications
NASA Astrophysics Data System (ADS)
Madan, Siddharth K.; Dana, Kristin J.
2015-03-01
We adapt a classic online clustering algorithm called Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH), to incrementally cluster large datasets of features commonly used in multimedia and computer vision. We call the adapted version modified-BIRCH (m-BIRCH). The algorithm uses only a fraction of the dataset memory to perform clustering, and updates the clustering decisions when new data comes in. Modifications made in m-BIRCH enable data driven parameter selection and effectively handle varying density regions in the feature space. Data driven parameter selection automatically controls the level of coarseness of the data summarization. Effective handling of varying density regions is necessary to well represent the different density regions in data summarization. We use m-BIRCH to cluster 840K color SIFT descriptors, and 60K outlier corrupted grayscale patches. We use the algorithm to cluster datasets consisting of challenging non-convex clustering patterns. Our implementation of the algorithm provides an useful clustering tool and is made publicly available.
Zhang, Peng; Liu, Keping; Zhao, Bo; Li, Yuanchun
2015-01-01
Optimal guidance is essential for the soft landing task. However, due to its high computational complexities, it is hardly applied to the autonomous guidance. In this paper, a computationally inexpensive optimal guidance algorithm based on the radial basis function neural network (RBFNN) is proposed. The optimization problem of the trajectory for soft landing on asteroids is formulated and transformed into a two-point boundary value problem (TPBVP). Combining the database of initial states with the relative initial co-states, an RBFNN is trained offline. The optimal trajectory of the soft landing is determined rapidly by applying the trained network in the online guidance. The Monte Carlo simulations of soft landing on the Eros433 are performed to demonstrate the effectiveness of the proposed guidance algorithm. PMID:26367382
NASA Astrophysics Data System (ADS)
Shahamatnia, Ehsan; Dorotovič, Ivan; Fonseca, Jose M.; Ribeiro, Rita A.
2016-03-01
Developing specialized software tools is essential to support studies of solar activity evolution. With new space missions such as Solar Dynamics Observatory (SDO), solar images are being produced in unprecedented volumes. To capitalize on that huge data availability, the scientific community needs a new generation of software tools for automatic and efficient data processing. In this paper a prototype of a modular framework for solar feature detection, characterization, and tracking is presented. To develop an efficient system capable of automatic solar feature tracking and measuring, a hybrid approach combining specialized image processing, evolutionary optimization, and soft computing algorithms is being followed. The specialized hybrid algorithm for tracking solar features allows automatic feature tracking while gathering characterization details about the tracked features. The hybrid algorithm takes advantages of the snake model, a specialized image processing algorithm widely used in applications such as boundary delineation, image segmentation, and object tracking. Further, it exploits the flexibility and efficiency of Particle Swarm Optimization (PSO), a stochastic population based optimization algorithm. PSO has been used successfully in a wide range of applications including combinatorial optimization, control, clustering, robotics, scheduling, and image processing and video analysis applications. The proposed tool, denoted PSO-Snake model, was already successfully tested in other works for tracking sunspots and coronal bright points. In this work, we discuss the application of the PSO-Snake algorithm for calculating the sidereal rotational angular velocity of the solar corona. To validate the results we compare them with published manual results performed by an expert.
Simulating Soft Shadows with Graphics Hardware,
1997-01-15
This radiance texture is analogous to the mesh of radiosity values computed in a radiosity algorithm. Unlike a radiosity algorithm, however, our...discretely. Several researchers have explored continuous visibility methods for soft shadow computation and radiosity mesh generation. With this approach...times of several seconds [9]. Most radiosity methods discretize each surface into a mesh of elements and then use discrete methods such as ray
A Self-Adaptive Fuzzy c-Means Algorithm for Determining the Optimal Number of Clusters
Wang, Zhihao; Yi, Jing
2016-01-01
For the shortcoming of fuzzy c-means algorithm (FCM) needing to know the number of clusters in advance, this paper proposed a new self-adaptive method to determine the optimal number of clusters. Firstly, a density-based algorithm was put forward. The algorithm, according to the characteristics of the dataset, automatically determined the possible maximum number of clusters instead of using the empirical rule n and obtained the optimal initial cluster centroids, improving the limitation of FCM that randomly selected cluster centroids lead the convergence result to the local minimum. Secondly, this paper, by introducing a penalty function, proposed a new fuzzy clustering validity index based on fuzzy compactness and separation, which ensured that when the number of clusters verged on that of objects in the dataset, the value of clustering validity index did not monotonically decrease and was close to zero, so that the optimal number of clusters lost robustness and decision function. Then, based on these studies, a self-adaptive FCM algorithm was put forward to estimate the optimal number of clusters by the iterative trial-and-error process. At last, experiments were done on the UCI, KDD Cup 1999, and synthetic datasets, which showed that the method not only effectively determined the optimal number of clusters, but also reduced the iteration of FCM with the stable clustering result. PMID:28042291
Poole, William; Leinonen, Kalle; Shmulevich, Ilya
2017-01-01
Cancer researchers have long recognized that somatic mutations are not uniformly distributed within genes. However, most approaches for identifying cancer mutations focus on either the entire-gene or single amino-acid level. We have bridged these two methodologies with a multiscale mutation clustering algorithm that identifies variable length mutation clusters in cancer genes. We ran our algorithm on 539 genes using the combined mutation data in 23 cancer types from The Cancer Genome Atlas (TCGA) and identified 1295 mutation clusters. The resulting mutation clusters cover a wide range of scales and often overlap with many kinds of protein features including structured domains, phosphorylation sites, and known single nucleotide variants. We statistically associated these multiscale clusters with gene expression and drug response data to illuminate the functional and clinical consequences of mutations in our clusters. Interestingly, we find multiple clusters within individual genes that have differential functional associations: these include PTEN, FUBP1, and CDH1. This methodology has potential implications in identifying protein regions for drug targets, understanding the biological underpinnings of cancer, and personalizing cancer treatments. Toward this end, we have made the mutation clusters and the clustering algorithm available to the public. Clusters and pathway associations can be interactively browsed at m2c.systemsbiology.net. The multiscale mutation clustering algorithm is available at https://github.com/IlyaLab/M2C. PMID:28170390
Poole, William; Leinonen, Kalle; Shmulevich, Ilya; Knijnenburg, Theo A; Bernard, Brady
2017-02-01
Cancer researchers have long recognized that somatic mutations are not uniformly distributed within genes. However, most approaches for identifying cancer mutations focus on either the entire-gene or single amino-acid level. We have bridged these two methodologies with a multiscale mutation clustering algorithm that identifies variable length mutation clusters in cancer genes. We ran our algorithm on 539 genes using the combined mutation data in 23 cancer types from The Cancer Genome Atlas (TCGA) and identified 1295 mutation clusters. The resulting mutation clusters cover a wide range of scales and often overlap with many kinds of protein features including structured domains, phosphorylation sites, and known single nucleotide variants. We statistically associated these multiscale clusters with gene expression and drug response data to illuminate the functional and clinical consequences of mutations in our clusters. Interestingly, we find multiple clusters within individual genes that have differential functional associations: these include PTEN, FUBP1, and CDH1. This methodology has potential implications in identifying protein regions for drug targets, understanding the biological underpinnings of cancer, and personalizing cancer treatments. Toward this end, we have made the mutation clusters and the clustering algorithm available to the public. Clusters and pathway associations can be interactively browsed at m2c.systemsbiology.net. The multiscale mutation clustering algorithm is available at https://github.com/IlyaLab/M2C.
The Mucciardi-Gose Clustering Algorithm and Its Applications in Automatic Pattern Recognition.
A procedure known as the Mucciardi- Gose clustering algorithm, CLUSTR, for determining the geometrical or statistical relationships among groups of N...discussion of clustering algorithms is given; the particular advantages of the Mucciardi- Gose procedure are described. The mathematical basis for, and the
NASA Astrophysics Data System (ADS)
Czarski, T.; Chernyshova, M.; Malinowski, K.; Pozniak, K. T.; Kasprowicz, G.; Kolasinski, P.; Krawczyk, R.; Wojenski, A.; Zabolotny, W.
2016-11-01
The measurement system based on gas electron multiplier detector is developed for soft X-ray diagnostics of tokamak plasmas. The multi-channel setup is designed for estimation of the energy and the position distribution of an X-ray source. The focal measuring issue is the charge cluster identification by its value and position estimation. The fast and accurate mode of the serial data acquisition is applied for the dynamic plasma diagnostics. The charge clusters are counted in the space determined by 2D position, charge value, and time intervals. Radiation source characteristics are presented by histograms for a selected range of position, time intervals, and cluster charge values corresponding to the energy spectra.
Czarski, T; Chernyshova, M; Malinowski, K; Pozniak, K T; Kasprowicz, G; Kolasinski, P; Krawczyk, R; Wojenski, A; Zabolotny, W
2016-11-01
The measurement system based on gas electron multiplier detector is developed for soft X-ray diagnostics of tokamak plasmas. The multi-channel setup is designed for estimation of the energy and the position distribution of an X-ray source. The focal measuring issue is the charge cluster identification by its value and position estimation. The fast and accurate mode of the serial data acquisition is applied for the dynamic plasma diagnostics. The charge clusters are counted in the space determined by 2D position, charge value, and time intervals. Radiation source characteristics are presented by histograms for a selected range of position, time intervals, and cluster charge values corresponding to the energy spectra.
Schröder, Helmut; Mendez, Michelle A; Ribas, Lourdes; Funtikova, Anna N; Gomez, Santiago F; Fíto, Montserrat; Aranceta, Javier; Serra-Majem, Lluis
2014-09-01
The present study assesses the impact of beverage consumption pattern on diet quality and anthropometric proxy measures for abdominal adiposity in Spanish adolescents. Data were obtained from a representative national sample of 1,149 Spanish adolescents aged 10-18 years. Height, weight, and waist circumferences were measured. Dietary assessment was performed with a 24-h recall. Beverage patterns were identified by cluster analysis. Adherence to the Mediterranean diet was measured by the KIDMED index. Three beverage clusters were identified for boys--"whole milk" (62.5 %), "low-fat milk" (17.5 %) and "soft drinks" (20.1 %)-and for girls--"whole milk" (57.8 %), "low-fat milk" (20.8 %) and juice (21.4 %), accounting for 8.3, 9.6, 13.9, 8.6, 11.5 and 12.9 % of total energy intake, respectively. Each unit of increase in the KIDMED index was associated with a 14.0 % higher (p = 0.004) and 11.0 % lower (p = 0.048) probability of membership in the "low-fat milk" and "soft drinks" cluster in girls and boys, respectively, compared with the "whole milk" cluster. Boys in the "soft drinks" cluster had a higher risk of 1-unit increase in BMI z score (29.0 %, p = 0.040), 1-cm increase in waist circumference regressed on height and age (3.0 %, p = 0.027) and 0.1-unit increase in waist/height ratio (21.4 %, p = 0.031) compared with the "whole milk" cluster. A caloric beverage pattern dominated by intake of "soft drinks" is related to general and abdominal adiposity and diet quality in Spanish male adolescents.
Security clustering algorithm based on reputation in hierarchical peer-to-peer network
NASA Astrophysics Data System (ADS)
Chen, Mei; Luo, Xin; Wu, Guowen; Tan, Yang; Kita, Kenji
2013-03-01
For the security problems of the hierarchical P2P network (HPN), the paper presents a security clustering algorithm based on reputation (CABR). In the algorithm, we take the reputation mechanism for ensuring the security of transaction and use cluster for managing the reputation mechanism. In order to improve security, reduce cost of network brought by management of reputation and enhance stability of cluster, we select reputation, the historical average online time, and the network bandwidth as the basic factors of the comprehensive performance of node. Simulation results showed that the proposed algorithm improved the security, reduced the network overhead, and enhanced stability of cluster.
Shah, Sohil Atul
2017-01-01
Clustering is a fundamental procedure in the analysis of scientific data. It is used ubiquitously across the sciences. Despite decades of research, existing clustering algorithms have limited effectiveness in high dimensions and often require tuning parameters for different domains and datasets. We present a clustering algorithm that achieves high accuracy across multiple domains and scales efficiently to high dimensions and large datasets. The presented algorithm optimizes a smooth continuous objective, which is based on robust statistics and allows heavily mixed clusters to be untangled. The continuous nature of the objective also allows clustering to be integrated as a module in end-to-end feature learning pipelines. We demonstrate this by extending the algorithm to perform joint clustering and dimensionality reduction by efficiently optimizing a continuous global objective. The presented approach is evaluated on large datasets of faces, hand-written digits, objects, newswire articles, sensor readings from the Space Shuttle, and protein expression levels. Our method achieves high accuracy across all datasets, outperforming the best prior algorithm by a factor of 3 in average rank. PMID:28851838
Determining open cluster membership. A Bayesian framework for quantitative member classification
NASA Astrophysics Data System (ADS)
Stott, Jonathan J.
2018-01-01
Aims: My goal is to develop a quantitative algorithm for assessing open cluster membership probabilities. The algorithm is designed to work with single-epoch observations. In its simplest form, only one set of program images and one set of reference images are required. Methods: The algorithm is based on a two-stage joint astrometric and photometric assessment of cluster membership probabilities. The probabilities were computed within a Bayesian framework using any available prior information. Where possible, the algorithm emphasizes simplicity over mathematical sophistication. Results: The algorithm was implemented and tested against three observational fields using published survey data. M 67 and NGC 654 were selected as cluster examples while a third, cluster-free, field was used for the final test data set. The algorithm shows good quantitative agreement with the existing surveys and has a false-positive rate significantly lower than the astrometric or photometric methods used individually.
Random Walk Quantum Clustering Algorithm Based on Space
NASA Astrophysics Data System (ADS)
Xiao, Shufen; Dong, Yumin; Ma, Hongyang
2018-01-01
In the random quantum walk, which is a quantum simulation of the classical walk, data points interacted when selecting the appropriate walk strategy by taking advantage of quantum-entanglement features; thus, the results obtained when the quantum walk is used are different from those when the classical walk is adopted. A new quantum walk clustering algorithm based on space is proposed by applying the quantum walk to clustering analysis. In this algorithm, data points are viewed as walking participants, and similar data points are clustered using the walk function in the pay-off matrix according to a certain rule. The walk process is simplified by implementing a space-combining rule. The proposed algorithm is validated by a simulation test and is proved superior to existing clustering algorithms, namely, Kmeans, PCA + Kmeans, and LDA-Km. The effects of some of the parameters in the proposed algorithm on its performance are also analyzed and discussed. Specific suggestions are provided.
A highly efficient multi-core algorithm for clustering extremely large datasets
2010-01-01
Background In recent years, the demand for computational power in computational biology has increased due to rapidly growing data sets from microarray and other high-throughput technologies. This demand is likely to increase. Standard algorithms for analyzing data, such as cluster algorithms, need to be parallelized for fast processing. Unfortunately, most approaches for parallelizing algorithms largely rely on network communication protocols connecting and requiring multiple computers. One answer to this problem is to utilize the intrinsic capabilities in current multi-core hardware to distribute the tasks among the different cores of one computer. Results We introduce a multi-core parallelization of the k-means and k-modes cluster algorithms based on the design principles of transactional memory for clustering gene expression microarray type data and categorial SNP data. Our new shared memory parallel algorithms show to be highly efficient. We demonstrate their computational power and show their utility in cluster stability and sensitivity analysis employing repeated runs with slightly changed parameters. Computation speed of our Java based algorithm was increased by a factor of 10 for large data sets while preserving computational accuracy compared to single-core implementations and a recently published network based parallelization. Conclusions Most desktop computers and even notebooks provide at least dual-core processors. Our multi-core algorithms show that using modern algorithmic concepts, parallelization makes it possible to perform even such laborious tasks as cluster sensitivity and cluster number estimation on the laboratory computer. PMID:20370922
NASA Astrophysics Data System (ADS)
Wagstaff, Kiri L.
2012-03-01
On obtaining a new data set, the researcher is immediately faced with the challenge of obtaining a high-level understanding from the observations. What does a typical item look like? What are the dominant trends? How many distinct groups are included in the data set, and how is each one characterized? Which observable values are common, and which rarely occur? Which items stand out as anomalies or outliers from the rest of the data? This challenge is exacerbated by the steady growth in data set size [11] as new instruments push into new frontiers of parameter space, via improvements in temporal, spatial, and spectral resolution, or by the desire to "fuse" observations from different modalities and instruments into a larger-picture understanding of the same underlying phenomenon. Data clustering algorithms provide a variety of solutions for this task. They can generate summaries, locate outliers, compress data, identify dense or sparse regions of feature space, and build data models. It is useful to note up front that "clusters" in this context refer to groups of items within some descriptive feature space, not (necessarily) to "galaxy clusters" which are dense regions in physical space. The goal of this chapter is to survey a variety of data clustering methods, with an eye toward their applicability to astronomical data analysis. In addition to improving the individual researcher’s understanding of a given data set, clustering has led directly to scientific advances, such as the discovery of new subclasses of stars [14] and gamma-ray bursts (GRBs) [38]. All clustering algorithms seek to identify groups within a data set that reflect some observed, quantifiable structure. Clustering is traditionally an unsupervised approach to data analysis, in the sense that it operates without any direct guidance about which items should be assigned to which clusters. There has been a recent trend in the clustering literature toward supporting semisupervised or constrained clustering, in which some partial information about item assignments or other components of the resulting output are already known and must be accommodated by the solution. Some algorithms seek a partition of the data set into distinct clusters, while others build a hierarchy of nested clusters that can capture taxonomic relationships. Some produce a single optimal solution, while others construct a probabilistic model of cluster membership. More formally, clustering algorithms operate on a data set X composed of items represented by one or more features (dimensions). These could include physical location, such as right ascension and declination, as well as other properties such as brightness, color, temporal change, size, texture, and so on. Let D be the number of dimensions used to represent each item, xi ∈ RD. The clustering goal is to produce an organization P of the items in X that optimizes an objective function f : P -> R, which quantifies the quality of solution P. Often f is defined so as to maximize similarity within a cluster and minimize similarity between clusters. To that end, many algorithms make use of a measure d : X x X -> R of the distance between two items. A partitioning algorithm produces a set of clusters P = {c1, . . . , ck} such that the clusters are nonoverlapping (c_i intersected with c_j = empty set, i != j) subsets of the data set (Union_i c_i=X). Hierarchical algorithms produce a series of partitions P = {p1, . . . , pn }. For a complete hierarchy, the number of partitions n’= n, the number of items in the data set; the top partition is a single cluster containing all items, and the bottom partition contains n clusters, each containing a single item. For model-based clustering, each cluster c_j is represented by a model m_j , such as the cluster center or a Gaussian distribution. The wide array of available clustering algorithms may seem bewildering, and covering all of them is beyond the scope of this chapter. Choosing among them for a particular application involves considerations of the kind of data being analyzed, algorithm runtime efficiency, and how much prior knowledge is available about the problem domain, which can dictate the nature of clusters sought. Fundamentally, the clustering method and its representations of clusters carries with it a definition of what a cluster is, and it is important that this be aligned with the analysis goals for the problem at hand. In this chapter, I emphasize this point by identifying for each algorithm the cluster representation as a model, m_j , even for algorithms that are not typically thought of as creating a “model.” This chapter surveys a basic collection of clustering methods useful to any practitioner who is interested in applying clustering to a new data set. The algorithms include k-means (Section 25.2), EM (Section 25.3), agglomerative (Section 25.4), and spectral (Section 25.5) clustering, with side mentions of variants such as kernel k-means and divisive clustering. The chapter also discusses each algorithm’s strengths and limitations and provides pointers to additional in-depth reading for each subject. Section 25.6 discusses methods for incorporating domain knowledge into the clustering process. This chapter concludes with a brief survey of interesting applications of clustering methods to astronomy data (Section 25.7). The chapter begins with k-means because it is both generally accessible and so widely used that understanding it can be considered a necessary prerequisite for further work in the field. EM can be viewed as a more sophisticated version of k-means that uses a generative model for each cluster and probabilistic item assignments. Agglomerative clustering is the most basic form of hierarchical clustering and provides a basis for further exploration of algorithms in that vein. Spectral clustering permits a departure from feature-vector-based clustering and can operate on data sets instead represented as affinity, or similarity matrices—cases in which only pairwise information is known. The list of algorithms covered in this chapter is representative of those most commonly in use, but it is by no means comprehensive. There is an extensive collection of existing books on clustering that provide additional background and depth. Three early books that remain useful today are Anderberg’s Cluster Analysis for Applications [3], Hartigan’s Clustering Algorithms [25], and Gordon’s Classification [22]. The latter covers basics on similarity measures, partitioning and hierarchical algorithms, fuzzy clustering, overlapping clustering, conceptual clustering, validations methods, and visualization or data reduction techniques such as principal components analysis (PCA),multidimensional scaling, and self-organizing maps. More recently, Jain et al. provided a useful and informative survey [27] of a variety of different clustering algorithms, including those mentioned here as well as fuzzy, graph-theoretic, and evolutionary clustering. Everitt’s Cluster Analysis [19] provides a modern overview of algorithms, similarity measures, and evaluation methods.
Contributions to "k"-Means Clustering and Regression via Classification Algorithms
ERIC Educational Resources Information Center
Salman, Raied
2012-01-01
The dissertation deals with clustering algorithms and transforming regression problems into classification problems. The main contributions of the dissertation are twofold; first, to improve (speed up) the clustering algorithms and second, to develop a strict learning environment for solving regression problems as classification tasks by using…
Multi scales based sparse matrix spectral clustering image segmentation
NASA Astrophysics Data System (ADS)
Liu, Zhongmin; Chen, Zhicai; Li, Zhanming; Hu, Wenjin
2018-04-01
In image segmentation, spectral clustering algorithms have to adopt the appropriate scaling parameter to calculate the similarity matrix between the pixels, which may have a great impact on the clustering result. Moreover, when the number of data instance is large, computational complexity and memory use of the algorithm will greatly increase. To solve these two problems, we proposed a new spectral clustering image segmentation algorithm based on multi scales and sparse matrix. We devised a new feature extraction method at first, then extracted the features of image on different scales, at last, using the feature information to construct sparse similarity matrix which can improve the operation efficiency. Compared with traditional spectral clustering algorithm, image segmentation experimental results show our algorithm have better degree of accuracy and robustness.
An AK-LDMeans algorithm based on image clustering
NASA Astrophysics Data System (ADS)
Chen, Huimin; Li, Xingwei; Zhang, Yongbin; Chen, Nan
2018-03-01
Clustering is an effective analytical technique for handling unmarked data for value mining. Its ultimate goal is to mark unclassified data quickly and correctly. We use the roadmap for the current image processing as the experimental background. In this paper, we propose an AK-LDMeans algorithm to automatically lock the K value by designing the Kcost fold line, and then use the long-distance high-density method to select the clustering centers to further replace the traditional initial clustering center selection method, which further improves the efficiency and accuracy of the traditional K-Means Algorithm. And the experimental results are compared with the current clustering algorithm and the results are obtained. The algorithm can provide effective reference value in the fields of image processing, machine vision and data mining.
Hierarchical trie packet classification algorithm based on expectation-maximization clustering.
Bi, Xia-An; Zhao, Junxia
2017-01-01
With the development of computer network bandwidth, packet classification algorithms which are able to deal with large-scale rule sets are in urgent need. Among the existing algorithms, researches on packet classification algorithms based on hierarchical trie have become an important packet classification research branch because of their widely practical use. Although hierarchical trie is beneficial to save large storage space, it has several shortcomings such as the existence of backtracking and empty nodes. This paper proposes a new packet classification algorithm, Hierarchical Trie Algorithm Based on Expectation-Maximization Clustering (HTEMC). Firstly, this paper uses the formalization method to deal with the packet classification problem by means of mapping the rules and data packets into a two-dimensional space. Secondly, this paper uses expectation-maximization algorithm to cluster the rules based on their aggregate characteristics, and thereby diversified clusters are formed. Thirdly, this paper proposes a hierarchical trie based on the results of expectation-maximization clustering. Finally, this paper respectively conducts simulation experiments and real-environment experiments to compare the performances of our algorithm with other typical algorithms, and analyzes the results of the experiments. The hierarchical trie structure in our algorithm not only adopts trie path compression to eliminate backtracking, but also solves the problem of low efficiency of trie updates, which greatly improves the performance of the algorithm.
Energy Aware Cluster-Based Routing in Flying Ad-Hoc Networks.
Aadil, Farhan; Raza, Ali; Khan, Muhammad Fahad; Maqsood, Muazzam; Mehmood, Irfan; Rho, Seungmin
2018-05-03
Flying ad-hoc networks (FANETs) are a very vibrant research area nowadays. They have many military and civil applications. Limited battery energy and the high mobility of micro unmanned aerial vehicles (UAVs) represent their two main problems, i.e., short flight time and inefficient routing. In this paper, we try to address both of these problems by means of efficient clustering. First, we adjust the transmission power of the UAVs by anticipating their operational requirements. Optimal transmission range will have minimum packet loss ratio (PLR) and better link quality, which ultimately save the energy consumed during communication. Second, we use a variant of the K-Means Density clustering algorithm for selection of cluster heads. Optimal cluster heads enhance the cluster lifetime and reduce the routing overhead. The proposed model outperforms the state of the art artificial intelligence techniques such as Ant Colony Optimization-based clustering algorithm and Grey Wolf Optimization-based clustering algorithm. The performance of the proposed algorithm is evaluated in term of number of clusters, cluster building time, cluster lifetime and energy consumption.
A fuzzy clustering algorithm to detect planar and quadric shapes
NASA Technical Reports Server (NTRS)
Krishnapuram, Raghu; Frigui, Hichem; Nasraoui, Olfa
1992-01-01
In this paper, we introduce a new fuzzy clustering algorithm to detect an unknown number of planar and quadric shapes in noisy data. The proposed algorithm is computationally and implementationally simple, and it overcomes many of the drawbacks of the existing algorithms that have been proposed for similar tasks. Since the clustering is performed in the original image space, and since no features need to be computed, this approach is particularly suited for sparse data. The algorithm may also be used in pattern recognition applications.
A Fast Density-Based Clustering Algorithm for Real-Time Internet of Things Stream
Ying Wah, Teh
2014-01-01
Data streams are continuously generated over time from Internet of Things (IoT) devices. The faster all of this data is analyzed, its hidden trends and patterns discovered, and new strategies created, the faster action can be taken, creating greater value for organizations. Density-based method is a prominent class in clustering data streams. It has the ability to detect arbitrary shape clusters, to handle outlier, and it does not need the number of clusters in advance. Therefore, density-based clustering algorithm is a proper choice for clustering IoT streams. Recently, several density-based algorithms have been proposed for clustering data streams. However, density-based clustering in limited time is still a challenging issue. In this paper, we propose a density-based clustering algorithm for IoT streams. The method has fast processing time to be applicable in real-time application of IoT devices. Experimental results show that the proposed approach obtains high quality results with low computation time on real and synthetic datasets. PMID:25110753
A fast density-based clustering algorithm for real-time Internet of Things stream.
Amini, Amineh; Saboohi, Hadi; Wah, Teh Ying; Herawan, Tutut
2014-01-01
Data streams are continuously generated over time from Internet of Things (IoT) devices. The faster all of this data is analyzed, its hidden trends and patterns discovered, and new strategies created, the faster action can be taken, creating greater value for organizations. Density-based method is a prominent class in clustering data streams. It has the ability to detect arbitrary shape clusters, to handle outlier, and it does not need the number of clusters in advance. Therefore, density-based clustering algorithm is a proper choice for clustering IoT streams. Recently, several density-based algorithms have been proposed for clustering data streams. However, density-based clustering in limited time is still a challenging issue. In this paper, we propose a density-based clustering algorithm for IoT streams. The method has fast processing time to be applicable in real-time application of IoT devices. Experimental results show that the proposed approach obtains high quality results with low computation time on real and synthetic datasets.
Localized Ambient Solidity Separation Algorithm Based Computer User Segmentation.
Sun, Xiao; Zhang, Tongda; Chai, Yueting; Liu, Yi
2015-01-01
Most of popular clustering methods typically have some strong assumptions of the dataset. For example, the k-means implicitly assumes that all clusters come from spherical Gaussian distributions which have different means but the same covariance. However, when dealing with datasets that have diverse distribution shapes or high dimensionality, these assumptions might not be valid anymore. In order to overcome this weakness, we proposed a new clustering algorithm named localized ambient solidity separation (LASS) algorithm, using a new isolation criterion called centroid distance. Compared with other density based isolation criteria, our proposed centroid distance isolation criterion addresses the problem caused by high dimensionality and varying density. The experiment on a designed two-dimensional benchmark dataset shows that our proposed LASS algorithm not only inherits the advantage of the original dissimilarity increments clustering method to separate naturally isolated clusters but also can identify the clusters which are adjacent, overlapping, and under background noise. Finally, we compared our LASS algorithm with the dissimilarity increments clustering method on a massive computer user dataset with over two million records that contains demographic and behaviors information. The results show that LASS algorithm works extremely well on this computer user dataset and can gain more knowledge from it.
Localized Ambient Solidity Separation Algorithm Based Computer User Segmentation
Sun, Xiao; Zhang, Tongda; Chai, Yueting; Liu, Yi
2015-01-01
Most of popular clustering methods typically have some strong assumptions of the dataset. For example, the k-means implicitly assumes that all clusters come from spherical Gaussian distributions which have different means but the same covariance. However, when dealing with datasets that have diverse distribution shapes or high dimensionality, these assumptions might not be valid anymore. In order to overcome this weakness, we proposed a new clustering algorithm named localized ambient solidity separation (LASS) algorithm, using a new isolation criterion called centroid distance. Compared with other density based isolation criteria, our proposed centroid distance isolation criterion addresses the problem caused by high dimensionality and varying density. The experiment on a designed two-dimensional benchmark dataset shows that our proposed LASS algorithm not only inherits the advantage of the original dissimilarity increments clustering method to separate naturally isolated clusters but also can identify the clusters which are adjacent, overlapping, and under background noise. Finally, we compared our LASS algorithm with the dissimilarity increments clustering method on a massive computer user dataset with over two million records that contains demographic and behaviors information. The results show that LASS algorithm works extremely well on this computer user dataset and can gain more knowledge from it. PMID:26221133
A clustering algorithm for determining community structure in complex networks
NASA Astrophysics Data System (ADS)
Jin, Hong; Yu, Wei; Li, ShiJun
2018-02-01
Clustering algorithms are attractive for the task of community detection in complex networks. DENCLUE is a representative density based clustering algorithm which has a firm mathematical basis and good clustering properties allowing for arbitrarily shaped clusters in high dimensional datasets. However, this method cannot be directly applied to community discovering due to its inability to deal with network data. Moreover, it requires a careful selection of the density parameter and the noise threshold. To solve these issues, a new community detection method is proposed in this paper. First, we use a spectral analysis technique to map the network data into a low dimensional Euclidean Space which can preserve node structural characteristics. Then, DENCLUE is applied to detect the communities in the network. A mathematical method named Sheather-Jones plug-in is chosen to select the density parameter which can describe the intrinsic clustering structure accurately. Moreover, every node on the network is meaningful so there were no noise nodes as a result the noise threshold can be ignored. We test our algorithm on both benchmark and real-life networks, and the results demonstrate the effectiveness of our algorithm over other popularity density based clustering algorithms adopted to community detection.
Collaborative filtering recommendation model based on fuzzy clustering algorithm
NASA Astrophysics Data System (ADS)
Yang, Ye; Zhang, Yunhua
2018-05-01
As one of the most widely used algorithms in recommender systems, collaborative filtering algorithm faces two serious problems, which are the sparsity of data and poor recommendation effect in big data environment. In traditional clustering analysis, the object is strictly divided into several classes and the boundary of this division is very clear. However, for most objects in real life, there is no strict definition of their forms and attributes of their class. Concerning the problems above, this paper proposes to improve the traditional collaborative filtering model through the hybrid optimization of implicit semantic algorithm and fuzzy clustering algorithm, meanwhile, cooperating with collaborative filtering algorithm. In this paper, the fuzzy clustering algorithm is introduced to fuzzy clustering the information of project attribute, which makes the project belong to different project categories with different membership degrees, and increases the density of data, effectively reduces the sparsity of data, and solves the problem of low accuracy which is resulted from the inaccuracy of similarity calculation. Finally, this paper carries out empirical analysis on the MovieLens dataset, and compares it with the traditional user-based collaborative filtering algorithm. The proposed algorithm has greatly improved the recommendation accuracy.
A Fast Projection-Based Algorithm for Clustering Big Data.
Wu, Yun; He, Zhiquan; Lin, Hao; Zheng, Yufei; Zhang, Jingfen; Xu, Dong
2018-06-07
With the fast development of various techniques, more and more data have been accumulated with the unique properties of large size (tall) and high dimension (wide). The era of big data is coming. How to understand and discover new knowledge from these data has attracted more and more scholars' attention and has become the most important task in data mining. As one of the most important techniques in data mining, clustering analysis, a kind of unsupervised learning, could group a set data into objectives(clusters) that are meaningful, useful, or both. Thus, the technique has played very important role in knowledge discovery in big data. However, when facing the large-sized and high-dimensional data, most of the current clustering methods exhibited poor computational efficiency and high requirement of computational source, which will prevent us from clarifying the intrinsic properties and discovering the new knowledge behind the data. Based on this consideration, we developed a powerful clustering method, called MUFOLD-CL. The principle of the method is to project the data points to the centroid, and then to measure the similarity between any two points by calculating their projections on the centroid. The proposed method could achieve linear time complexity with respect to the sample size. Comparison with K-Means method on very large data showed that our method could produce better accuracy and require less computational time, demonstrating that the MUFOLD-CL can serve as a valuable tool, at least may play a complementary role to other existing methods, for big data clustering. Further comparisons with state-of-the-art clustering methods on smaller datasets showed that our method was fastest and achieved comparable accuracy. For the convenience of most scholars, a free soft package was constructed.
NASA Astrophysics Data System (ADS)
Feng, Jian-xin; Tang, Jia-fu; Wang, Guang-xing
2007-04-01
On the basis of the analysis of clustering algorithm that had been proposed for MANET, a novel clustering strategy was proposed in this paper. With the trust defined by statistical hypothesis in probability theory and the cluster head selected by node trust and node mobility, this strategy can realize the function of the malicious nodes detection which was neglected by other clustering algorithms and overcome the deficiency of being incapable of implementing the relative mobility metric of corresponding nodes in the MOBIC algorithm caused by the fact that the receiving power of two consecutive HELLO packet cannot be measured. It's an effective solution to cluster MANET securely.
Parallel Clustering Algorithm for Large-Scale Biological Data Sets
Wang, Minchao; Zhang, Wu; Ding, Wang; Dai, Dongbo; Zhang, Huiran; Xie, Hao; Chen, Luonan; Guo, Yike; Xie, Jiang
2014-01-01
Backgrounds Recent explosion of biological data brings a great challenge for the traditional clustering algorithms. With increasing scale of data sets, much larger memory and longer runtime are required for the cluster identification problems. The affinity propagation algorithm outperforms many other classical clustering algorithms and is widely applied into the biological researches. However, the time and space complexity become a great bottleneck when handling the large-scale data sets. Moreover, the similarity matrix, whose constructing procedure takes long runtime, is required before running the affinity propagation algorithm, since the algorithm clusters data sets based on the similarities between data pairs. Methods Two types of parallel architectures are proposed in this paper to accelerate the similarity matrix constructing procedure and the affinity propagation algorithm. The memory-shared architecture is used to construct the similarity matrix, and the distributed system is taken for the affinity propagation algorithm, because of its large memory size and great computing capacity. An appropriate way of data partition and reduction is designed in our method, in order to minimize the global communication cost among processes. Result A speedup of 100 is gained with 128 cores. The runtime is reduced from serval hours to a few seconds, which indicates that parallel algorithm is capable of handling large-scale data sets effectively. The parallel affinity propagation also achieves a good performance when clustering large-scale gene data (microarray) and detecting families in large protein superfamilies. PMID:24705246
Density functional theory and surface reactivity study of bimetallic AgnYm (n+m = 10) clusters
NASA Astrophysics Data System (ADS)
Hussain, Riaz; Hussain, Abdullah Ijaz; Chatha, Shahzad Ali Shahid; Hussain, Riaz; Hanif, Usman; Ayub, Khurshid
2018-06-01
Density functional theory calculations have been performed on pure silver (Agn), yttrium (Ym) and bimetallic silver yttrium clusters AgnYm (n + m = 2-10) for reactivity descriptors in order to realize sites for nucleophilic and electrophilic attack. The reactivity descriptors of the clusters, studied as a function of cluster size and shape, reveal the presence of different type of reactive sites in a cluster. The size and shape of the pure silver, yttrium and bimetallic silver yttrium cluster (n = 2-10) strongly influences the number and position of active sites for an electrophilic and/or nucleophilic attack. The trends of reactivities through reactivity descriptors are confirmed through comparison with experimental data for CO binding with silver clusters. Moreover, the adsorption of CO on bimetallic silver yttrium clusters is also evaluated. The trends of binding energies support the reactivity descriptors values. Doping of pure cluster with the other element also influence the hardness, softness and chemical reactivity of the clusters. The softness increases as we increase the number of silver atoms in the cluster, whereas the hardness decreases. The chemical reactivity increases with silver doping whereas it decreases by increasing yttrium concentration. Silver atoms are nucleophilic in small clusters but changed to electrophilic in large clusters.
NASA Astrophysics Data System (ADS)
Park, Eun Ji; Choi, Chang Min; Kim, Il Hee; Kim, Jung-Hwan; Lee, Gaehang; Jin, Jong Sung; Ganteför, Gerd; Kim, Young Dok; Choi, Myoung Choul
2018-01-01
Wet-chemically synthesized Au nanoparticles were deposited on Si wafer surfaces, and the secondary ions mass spectra (SIMS) from these samples were collected using Bi3+ with an energy of 30 keV as the primary ions. In the SIMS, Au cluster cations with a well-known, even-odd alteration pattern in the signal intensity were observed. We also performed depth profile SIMS analyses, i.e., etching the surface using an Ar gas cluster ion beam (GCIB), and a subsequent Bi3+ SIMS analysis was repetitively performed. Here, two different etching conditions (Ar1600 clusters of 10 keV energy or Ar1000 of 2.5 keV denoted as "harsh" or "soft" etching conditions, respectively) were used. Etching under harsh conditions induced emission of the Au-Si binary cluster cations in the SIMS spectra of the Bi3+ primary ions. The formation of binary cluster cations can be induced by either fragmentation of Au nanoparticles or alloying of Au and Si, increasing Au-Si coordination on the sample surface during harsh GCIB etching. Alternatively, use of the soft GCIB etching conditions resulted in exclusive emission of pure Au cluster cations with nearly no Au-Si cluster cation formation. Depth profile analyses of the Bi3+ SIMS combined with soft GCIB etching can be useful for studying the chemical environments of atoms at the surface without altering the original interface structure during etching.
Measuring Constraint-Set Utility for Partitional Clustering Algorithms
NASA Technical Reports Server (NTRS)
Davidson, Ian; Wagstaff, Kiri L.; Basu, Sugato
2006-01-01
Clustering with constraints is an active area of machine learning and data mining research. Previous empirical work has convincingly shown that adding constraints to clustering improves the performance of a variety of algorithms. However, in most of these experiments, results are averaged over different randomly chosen constraint sets from a given set of labels, thereby masking interesting properties of individual sets. We demonstrate that constraint sets vary significantly in how useful they are for constrained clustering; some constraint sets can actually decrease algorithm performance. We create two quantitative measures, informativeness and coherence, that can be used to identify useful constraint sets. We show that these measures can also help explain differences in performance for four particular constrained clustering algorithms.
An Improved Clustering Algorithm of Tunnel Monitoring Data for Cloud Computing
Zhong, Luo; Tang, KunHao; Li, Lin; Yang, Guang; Ye, JingJing
2014-01-01
With the rapid development of urban construction, the number of urban tunnels is increasing and the data they produce become more and more complex. It results in the fact that the traditional clustering algorithm cannot handle the mass data of the tunnel. To solve this problem, an improved parallel clustering algorithm based on k-means has been proposed. It is a clustering algorithm using the MapReduce within cloud computing that deals with data. It not only has the advantage of being used to deal with mass data but also is more efficient. Moreover, it is able to compute the average dissimilarity degree of each cluster in order to clean the abnormal data. PMID:24982971
Efficient Record Linkage Algorithms Using Complete Linkage Clustering.
Mamun, Abdullah-Al; Aseltine, Robert; Rajasekaran, Sanguthevar
2016-01-01
Data from different agencies share data of the same individuals. Linking these datasets to identify all the records belonging to the same individuals is a crucial and challenging problem, especially given the large volumes of data. A large number of available algorithms for record linkage are prone to either time inefficiency or low-accuracy in finding matches and non-matches among the records. In this paper we propose efficient as well as reliable sequential and parallel algorithms for the record linkage problem employing hierarchical clustering methods. We employ complete linkage hierarchical clustering algorithms to address this problem. In addition to hierarchical clustering, we also use two other techniques: elimination of duplicate records and blocking. Our algorithms use sorting as a sub-routine to identify identical copies of records. We have tested our algorithms on datasets with millions of synthetic records. Experimental results show that our algorithms achieve nearly 100% accuracy. Parallel implementations achieve almost linear speedups. Time complexities of these algorithms do not exceed those of previous best-known algorithms. Our proposed algorithms outperform previous best-known algorithms in terms of accuracy consuming reasonable run times.
Efficient Record Linkage Algorithms Using Complete Linkage Clustering
Mamun, Abdullah-Al; Aseltine, Robert; Rajasekaran, Sanguthevar
2016-01-01
Data from different agencies share data of the same individuals. Linking these datasets to identify all the records belonging to the same individuals is a crucial and challenging problem, especially given the large volumes of data. A large number of available algorithms for record linkage are prone to either time inefficiency or low-accuracy in finding matches and non-matches among the records. In this paper we propose efficient as well as reliable sequential and parallel algorithms for the record linkage problem employing hierarchical clustering methods. We employ complete linkage hierarchical clustering algorithms to address this problem. In addition to hierarchical clustering, we also use two other techniques: elimination of duplicate records and blocking. Our algorithms use sorting as a sub-routine to identify identical copies of records. We have tested our algorithms on datasets with millions of synthetic records. Experimental results show that our algorithms achieve nearly 100% accuracy. Parallel implementations achieve almost linear speedups. Time complexities of these algorithms do not exceed those of previous best-known algorithms. Our proposed algorithms outperform previous best-known algorithms in terms of accuracy consuming reasonable run times. PMID:27124604
Nagaoka, Shuhei; Matsumoto, Takeshi; Okada, Eiji; Mitsui, Masaaki; Nakajima, Atsushi
2006-08-17
The adsorption state and thermal stability of V(benzene)2 sandwich clusters soft-landed onto a self-assembled monolayer of different chain-length n-alkanethiols (Cn-SAM, n = 8, 12, 16, 18, and 22) were studied by means of infrared reflection absorption spectroscopy (IRAS) and temperature-programmed desorption (TPD). The IRAS measurement confirmed that V(benzene)2 clusters are molecularly adsorbed and maintain a sandwich structure on all of the SAM substrates. In addition, the clusters supported on the SAM substrates are oriented with their molecular axes tilted 70-80 degrees off the surface normal. An Arrhenius analysis of the TPD spectra reveals that the activation energy for the desorption of the supported clusters increases linearly with the chain length of the SAMs. For the longest chain C22-SAM, the activation energy reaches approximately 150 kJ/mol, and the thermal desorption of the supported clusters can be considerably suppressed near room temperature. The clear chain-length-dependent thermal stability of the supported clusters observed here can be explained well in terms of the cluster penetration into the SAM matrixes.
Data mining in soft computing framework: a survey.
Mitra, S; Pal, S K; Mitra, P
2002-01-01
The present article provides a survey of the available literature on data mining using soft computing. A categorization has been provided based on the different soft computing tools and their hybridizations used, the data mining function implemented, and the preference criterion selected by the model. The utility of the different soft computing methodologies is highlighted. Generally fuzzy sets are suitable for handling the issues related to understandability of patterns, incomplete/noisy data, mixed media information and human interaction, and can provide approximate solutions faster. Neural networks are nonparametric, robust, and exhibit good learning and generalization capabilities in data-rich environments. Genetic algorithms provide efficient search algorithms to select a model, from mixed media data, based on some preference criterion/objective function. Rough sets are suitable for handling different types of uncertainty in data. Some challenges to data mining and the application of soft computing methodologies are indicated. An extensive bibliography is also included.
NASA Astrophysics Data System (ADS)
Thanos, Konstantinos-Georgios; Thomopoulos, Stelios C. A.
2014-06-01
The study in this paper belongs to a more general research of discovering facial sub-clusters in different ethnicity face databases. These new sub-clusters along with other metadata (such as race, sex, etc.) lead to a vector for each face in the database where each vector component represents the likelihood of participation of a given face to each cluster. This vector is then used as a feature vector in a human identification and tracking system based on face and other biometrics. The first stage in this system involves a clustering method which evaluates and compares the clustering results of five different clustering algorithms (average, complete, single hierarchical algorithm, k-means and DIGNET), and selects the best strategy for each data collection. In this paper we present the comparative performance of clustering results of DIGNET and four clustering algorithms (average, complete, single hierarchical and k-means) on fabricated 2D and 3D samples, and on actual face images from various databases, using four different standard metrics. These metrics are the silhouette figure, the mean silhouette coefficient, the Hubert test Γ coefficient, and the classification accuracy for each clustering result. The results showed that, in general, DIGNET gives more trustworthy results than the other algorithms when the metrics values are above a specific acceptance threshold. However when the evaluation results metrics have values lower than the acceptance threshold but not too low (too low corresponds to ambiguous results or false results), then it is necessary for the clustering results to be verified by the other algorithms.
Tang, Jiqiang; Yang, Wu; Zhu, Lingyun; Wang, Dong; Feng, Xin
2017-04-26
In recent years, Wireless Sensor Networks with a Mobile Sink (WSN-MS) have been an active research topic due to the widespread use of mobile devices. However, how to get the balance between data delivery latency and energy consumption becomes a key issue of WSN-MS. In this paper, we study the clustering approach by jointly considering the Route planning for mobile sink and Clustering Problem (RCP) for static sensor nodes. We solve the RCP problem by using the minimum travel route clustering approach, which applies the minimum travel route of the mobile sink to guide the clustering process. We formulate the RCP problem as an Integer Non-Linear Programming (INLP) problem to shorten the travel route of the mobile sink under three constraints: the communication hops constraint, the travel route constraint and the loop avoidance constraint. We then propose an Imprecise Induction Algorithm (IIA) based on the property that the solution with a small hop count is more feasible than that with a large hop count. The IIA algorithm includes three processes: initializing travel route planning with a Traveling Salesman Problem (TSP) algorithm, transforming the cluster head to a cluster member and transforming the cluster member to a cluster head. Extensive experimental results show that the IIA algorithm could automatically adjust cluster heads according to the maximum hops parameter and plan a shorter travel route for the mobile sink. Compared with the Shortest Path Tree-based Data-Gathering Algorithm (SPT-DGA), the IIA algorithm has the characteristics of shorter route length, smaller cluster head count and faster convergence rate.
Efficient implementation of parallel three-dimensional FFT on clusters of PCs
NASA Astrophysics Data System (ADS)
Takahashi, Daisuke
2003-05-01
In this paper, we propose a high-performance parallel three-dimensional fast Fourier transform (FFT) algorithm on clusters of PCs. The three-dimensional FFT algorithm can be altered into a block three-dimensional FFT algorithm to reduce the number of cache misses. We show that the block three-dimensional FFT algorithm improves performance by utilizing the cache memory effectively. We use the block three-dimensional FFT algorithm to implement the parallel three-dimensional FFT algorithm. We succeeded in obtaining performance of over 1.3 GFLOPS on an 8-node dual Pentium III 1 GHz PC SMP cluster.
Blessy, S A Praylin Selva; Sulochana, C Helen
2015-01-01
Segmentation of brain tumor from Magnetic Resonance Imaging (MRI) becomes very complicated due to the structural complexities of human brain and the presence of intensity inhomogeneities. To propose a method that effectively segments brain tumor from MR images and to evaluate the performance of unsupervised optimal fuzzy clustering (UOFC) algorithm for segmentation of brain tumor from MR images. Segmentation is done by preprocessing the MR image to standardize intensity inhomogeneities followed by feature extraction, feature fusion and clustering. Different validation measures are used to evaluate the performance of the proposed method using different clustering algorithms. The proposed method using UOFC algorithm produces high sensitivity (96%) and low specificity (4%) compared to other clustering methods. Validation results clearly show that the proposed method with UOFC algorithm effectively segments brain tumor from MR images.
Adaptive density trajectory cluster based on time and space distance
NASA Astrophysics Data System (ADS)
Liu, Fagui; Zhang, Zhijie
2017-10-01
There are some hotspot problems remaining in trajectory cluster for discovering mobile behavior regularity, such as the computation of distance between sub trajectories, the setting of parameter values in cluster algorithm and the uncertainty/boundary problem of data set. As a result, based on the time and space, this paper tries to define the calculation method of distance between sub trajectories. The significance of distance calculation for sub trajectories is to clearly reveal the differences in moving trajectories and to promote the accuracy of cluster algorithm. Besides, a novel adaptive density trajectory cluster algorithm is proposed, in which cluster radius is computed through using the density of data distribution. In addition, cluster centers and number are selected by a certain strategy automatically, and uncertainty/boundary problem of data set is solved by designed weighted rough c-means. Experimental results demonstrate that the proposed algorithm can perform the fuzzy trajectory cluster effectively on the basis of the time and space distance, and obtain the optimal cluster centers and rich cluster results information adaptably for excavating the features of mobile behavior in mobile and sociology network.
An incremental DPMM-based method for trajectory clustering, modeling, and retrieval.
Hu, Weiming; Li, Xi; Tian, Guodong; Maybank, Stephen; Zhang, Zhongfei
2013-05-01
Trajectory analysis is the basis for many applications, such as indexing of motion events in videos, activity recognition, and surveillance. In this paper, the Dirichlet process mixture model (DPMM) is applied to trajectory clustering, modeling, and retrieval. We propose an incremental version of a DPMM-based clustering algorithm and apply it to cluster trajectories. An appropriate number of trajectory clusters is determined automatically. When trajectories belonging to new clusters arrive, the new clusters can be identified online and added to the model without any retraining using the previous data. A time-sensitive Dirichlet process mixture model (tDPMM) is applied to each trajectory cluster for learning the trajectory pattern which represents the time-series characteristics of the trajectories in the cluster. Then, a parameterized index is constructed for each cluster. A novel likelihood estimation algorithm for the tDPMM is proposed, and a trajectory-based video retrieval model is developed. The tDPMM-based probabilistic matching method and the DPMM-based model growing method are combined to make the retrieval model scalable and adaptable. Experimental comparisons with state-of-the-art algorithms demonstrate the effectiveness of our algorithm.
NASA Astrophysics Data System (ADS)
Fatimah, F.; Rosadi, D.; Hakim, R. B. F.
2018-03-01
In this paper, we motivate and introduce probabilistic soft sets and dual probabilistic soft sets for handling decision making problem in the presence of positive and negative parameters. We propose several types of algorithms related to this problem. Our procedures are flexible and adaptable. An example on real data is also given.
NASA Astrophysics Data System (ADS)
Di, Nur Faraidah Muhammad; Satari, Siti Zanariah
2017-05-01
Outlier detection in linear data sets has been done vigorously but only a small amount of work has been done for outlier detection in circular data. In this study, we proposed multiple outliers detection in circular regression models based on the clustering algorithm. Clustering technique basically utilizes distance measure to define distance between various data points. Here, we introduce the similarity distance based on Euclidean distance for circular model and obtain a cluster tree using the single linkage clustering algorithm. Then, a stopping rule for the cluster tree based on the mean direction and circular standard deviation of the tree height is proposed. We classify the cluster group that exceeds the stopping rule as potential outlier. Our aim is to demonstrate the effectiveness of proposed algorithms with the similarity distances in detecting the outliers. It is found that the proposed methods are performed well and applicable for circular regression model.
Maximum-likelihood soft-decision decoding of block codes using the A* algorithm
NASA Technical Reports Server (NTRS)
Ekroot, L.; Dolinar, S.
1994-01-01
The A* algorithm finds the path in a finite depth binary tree that optimizes a function. Here, it is applied to maximum-likelihood soft-decision decoding of block codes where the function optimized over the codewords is the likelihood function of the received sequence given each codeword. The algorithm considers codewords one bit at a time, making use of the most reliable received symbols first and pursuing only the partially expanded codewords that might be maximally likely. A version of the A* algorithm for maximum-likelihood decoding of block codes has been implemented for block codes up to 64 bits in length. The efficiency of this algorithm makes simulations of codes up to length 64 feasible. This article details the implementation currently in use, compares the decoding complexity with that of exhaustive search and Viterbi decoding algorithms, and presents performance curves obtained with this implementation of the A* algorithm for several codes.
A Fast Implementation of the ISOCLUS Algorithm
NASA Technical Reports Server (NTRS)
Memarsadeghi, Nargess; Mount, David M.; Netanyahu, Nathan S.; LeMoigne, Jacqueline
2003-01-01
Unsupervised clustering is a fundamental tool in numerous image processing and remote sensing applications. For example, unsupervised clustering is often used to obtain vegetation maps of an area of interest. This approach is useful when reliable training data are either scarce or expensive, and when relatively little a priori information about the data is available. Unsupervised clustering methods play a significant role in the pursuit of unsupervised classification. One of the most popular and widely used clustering schemes for remote sensing applications is the ISOCLUS algorithm, which is based on the ISODATA method. The algorithm is given a set of n data points (or samples) in d-dimensional space, an integer k indicating the initial number of clusters, and a number of additional parameters. The general goal is to compute a set of cluster centers in d-space. Although there is no specific optimization criterion, the algorithm is similar in spirit to the well known k-means clustering method in which the objective is to minimize the average squared distance of each point to its nearest center, called the average distortion. One significant feature of ISOCLUS over k-means is that clusters may be merged or split, and so the final number of clusters may be different from the number k supplied as part of the input. This algorithm will be described in later in this paper. The ISOCLUS algorithm can run very slowly, particularly on large data sets. Given its wide use in remote sensing, its efficient computation is an important goal. We have developed a fast implementation of the ISOCLUS algorithm. Our improvement is based on a recent acceleration to the k-means algorithm, the filtering algorithm, by Kanungo et al.. They showed that, by storing the data in a kd-tree, it was possible to significantly reduce the running time of k-means. We have adapted this method for the ISOCLUS algorithm. For technical reasons, which are explained later, it is necessary to make a minor modification to the ISOCLUS specification. We provide empirical evidence, on both synthetic and Landsat image data sets, that our algorithm's performance is essentially the same as that of ISOCLUS, but with significantly lower running times. We show that our algorithm runs from 3 to 30 times faster than a straightforward implementation of ISOCLUS. Our adaptation of the filtering algorithm involves the efficient computation of a number of cluster statistics that are needed for ISOCLUS, but not for k-means.
Generation of strongly coupled Xe cluster nanoplasmas by low intensive soft x-ray laser irradiation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Namba, S.; Hasegawa, N.; Kishimoto, M.
A seeding gas jet including Xe clusters was irradiated with a laser-driven plasma soft x-ray laser pulse ({lambda}=13.9 nm, {approx}7 ps, {<=}5 Multiplication-Sign 10{sup 9} W/cm{sup 2}), where the laser photon energy is high enough to ionize 4d core electrons. In order to clarify how the innershell ionization followed by the Auger electron emission is affected under the intense laser irradiation, the electron energy distribution was measured. Photoelectron spectra showed that the peak position attributed to 4d hole shifted to lower energy and the spectral width was broadened with increasing cluster size. Moreover, the energy distribution exhibited that a stronglymore » coupled cluster nanoplasma with several eV was generated.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Czarski, T., E-mail: tomasz.czarski@ifpilm.pl; Chernyshova, M.; Malinowski, K.
2016-11-15
The measurement system based on gas electron multiplier detector is developed for soft X-ray diagnostics of tokamak plasmas. The multi-channel setup is designed for estimation of the energy and the position distribution of an X-ray source. The focal measuring issue is the charge cluster identification by its value and position estimation. The fast and accurate mode of the serial data acquisition is applied for the dynamic plasma diagnostics. The charge clusters are counted in the space determined by 2D position, charge value, and time intervals. Radiation source characteristics are presented by histograms for a selected range of position, time intervals,more » and cluster charge values corresponding to the energy spectra.« less
Reducing Earth Topography Resolution for SMAP Mission Ground Tracks Using K-Means Clustering
NASA Technical Reports Server (NTRS)
Rizvi, Farheen
2013-01-01
The K-means clustering algorithm is used to reduce Earth topography resolution for the SMAP mission ground tracks. As SMAP propagates in orbit, knowledge of the radar antenna footprints on Earth is required for the antenna misalignment calibration. Each antenna footprint contains a latitude and longitude location pair on the Earth surface. There are 400 pairs in one data set for the calibration model. It is computationally expensive to calculate corresponding Earth elevation for these data pairs. Thus, the antenna footprint resolution is reduced. Similar topographical data pairs are grouped together with the K-means clustering algorithm. The resolution is reduced to the mean of each topographical cluster called the cluster centroid. The corresponding Earth elevation for each cluster centroid is assigned to the entire group. Results show that 400 data points are reduced to 60 while still maintaining algorithm performance and computational efficiency. In this work, sensitivity analysis is also performed to show a trade-off between algorithm performance versus computational efficiency as the number of cluster centroids and algorithm iterations are increased.
Wavelet methodology to improve single unit isolation in primary motor cortex cells.
Ortiz-Rosario, Alexis; Adeli, Hojjat; Buford, John A
2015-05-15
The proper isolation of action potentials recorded extracellularly from neural tissue is an active area of research in the fields of neuroscience and biomedical signal processing. This paper presents an isolation methodology for neural recordings using the wavelet transform (WT), a statistical thresholding scheme, and the principal component analysis (PCA) algorithm. The effectiveness of five different mother wavelets was investigated: biorthogonal, Daubachies, discrete Meyer, symmetric, and Coifman; along with three different wavelet coefficient thresholding schemes: fixed form threshold, Stein's unbiased estimate of risk, and minimax; and two different thresholding rules: soft and hard thresholding. The signal quality was evaluated using three different statistical measures: mean-squared error, root-mean squared, and signal to noise ratio. The clustering quality was evaluated using two different statistical measures: isolation distance, and L-ratio. This research shows that the selection of the mother wavelet has a strong influence on the clustering and isolation of single unit neural activity, with the Daubachies 4 wavelet and minimax thresholding scheme performing the best. Copyright © 2015. Published by Elsevier B.V.
Accurate Grid-based Clustering Algorithm with Diagonal Grid Searching and Merging
NASA Astrophysics Data System (ADS)
Liu, Feng; Ye, Chengcheng; Zhu, Erzhou
2017-09-01
Due to the advent of big data, data mining technology has attracted more and more attentions. As an important data analysis method, grid clustering algorithm is fast but with relatively lower accuracy. This paper presents an improved clustering algorithm combined with grid and density parameters. The algorithm first divides the data space into the valid meshes and invalid meshes through grid parameters. Secondly, from the starting point located at the first point of the diagonal of the grids, the algorithm takes the direction of “horizontal right, vertical down” to merge the valid meshes. Furthermore, by the boundary grid processing, the invalid grids are searched and merged when the adjacent left, above, and diagonal-direction grids are all the valid ones. By doing this, the accuracy of clustering is improved. The experimental results have shown that the proposed algorithm is accuracy and relatively faster when compared with some popularly used algorithms.
NASA Astrophysics Data System (ADS)
Chuan, Zun Liang; Ismail, Noriszura; Shinyie, Wendy Ling; Lit Ken, Tan; Fam, Soo-Fen; Senawi, Azlyna; Yusoff, Wan Nur Syahidah Wan
2018-04-01
Due to the limited of historical precipitation records, agglomerative hierarchical clustering algorithms widely used to extrapolate information from gauged to ungauged precipitation catchments in yielding a more reliable projection of extreme hydro-meteorological events such as extreme precipitation events. However, identifying the optimum number of homogeneous precipitation catchments accurately based on the dendrogram resulted using agglomerative hierarchical algorithms are very subjective. The main objective of this study is to propose an efficient regionalized algorithm to identify the homogeneous precipitation catchments for non-stationary precipitation time series. The homogeneous precipitation catchments are identified using average linkage hierarchical clustering algorithm associated multi-scale bootstrap resampling, while uncentered correlation coefficient as the similarity measure. The regionalized homogeneous precipitation is consolidated using K-sample Anderson Darling non-parametric test. The analysis result shows the proposed regionalized algorithm performed more better compared to the proposed agglomerative hierarchical clustering algorithm in previous studies.
Flumignan, Danilo Luiz; Boralle, Nivaldo; Oliveira, José Eduardo de
2010-06-30
In this work, the combination of carbon nuclear magnetic resonance ((13)C NMR) fingerprinting with pattern-recognition analyses provides an original and alternative approach to screening commercial gasoline quality. Soft Independent Modelling of Class Analogy (SIMCA) was performed on spectroscopic fingerprints to classify representative commercial gasoline samples, which were selected by Hierarchical Cluster Analyses (HCA) over several months in retails services of gas stations, into previously quality-defined classes. Following optimized (13)C NMR-SIMCA algorithm, sensitivity values were obtained in the training set (99.0%), with leave-one-out cross-validation, and external prediction set (92.0%). Governmental laboratories could employ this method as a rapid screening analysis to discourage adulteration practices. Copyright 2010 Elsevier B.V. All rights reserved.
Robust MST-Based Clustering Algorithm.
Liu, Qidong; Zhang, Ruisheng; Zhao, Zhili; Wang, Zhenghai; Jiao, Mengyao; Wang, Guangjing
2018-06-01
Minimax similarity stresses the connectedness of points via mediating elements rather than favoring high mutual similarity. The grouping principle yields superior clustering results when mining arbitrarily-shaped clusters in data. However, it is not robust against noises and outliers in the data. There are two main problems with the grouping principle: first, a single object that is far away from all other objects defines a separate cluster, and second, two connected clusters would be regarded as two parts of one cluster. In order to solve such problems, we propose robust minimum spanning tree (MST)-based clustering algorithm in this letter. First, we separate the connected objects by applying a density-based coarsening phase, resulting in a low-rank matrix in which the element denotes the supernode by combining a set of nodes. Then a greedy method is presented to partition those supernodes through working on the low-rank matrix. Instead of removing the longest edges from MST, our algorithm groups the data set based on the minimax similarity. Finally, the assignment of all data points can be achieved through their corresponding supernodes. Experimental results on many synthetic and real-world data sets show that our algorithm consistently outperforms compared clustering algorithms.
NASA Astrophysics Data System (ADS)
Brenden, T. O.; Clark, R. D.; Wiley, M. J.; Seelbach, P. W.; Wang, L.
2005-05-01
Remote sensing and geographic information systems have made it possible to attribute variables for streams at increasingly detailed resolutions (e.g., individual river reaches). Nevertheless, management decisions still must be made at large scales because land and stream managers typically lack sufficient resources to manage on an individual reach basis. Managers thus require a method for identifying stream management units that are ecologically similar and that can be expected to respond similarly to management decisions. We have developed a spatially-constrained clustering algorithm that can merge neighboring river reaches with similar ecological characteristics into larger management units. The clustering algorithm is based on the Cluster Affinity Search Technique (CAST), which was developed for clustering gene expression data. Inputs to the clustering algorithm are the neighbor relationships of the reaches that comprise the digital river network, the ecological attributes of the reaches, and an affinity value, which identifies the minimum similarity for merging river reaches. In this presentation, we describe the clustering algorithm in greater detail and contrast its use with other methods (expert opinion, classification approach, regular clustering) for identifying management units using several Michigan watersheds as a backdrop.
On the Accuracy and Parallelism of GPGPU-Powered Incremental Clustering Algorithms.
Chen, Chunlei; He, Li; Zhang, Huixiang; Zheng, Hao; Wang, Lei
2017-01-01
Incremental clustering algorithms play a vital role in various applications such as massive data analysis and real-time data processing. Typical application scenarios of incremental clustering raise high demand on computing power of the hardware platform. Parallel computing is a common solution to meet this demand. Moreover, General Purpose Graphic Processing Unit (GPGPU) is a promising parallel computing device. Nevertheless, the incremental clustering algorithm is facing a dilemma between clustering accuracy and parallelism when they are powered by GPGPU. We formally analyzed the cause of this dilemma. First, we formalized concepts relevant to incremental clustering like evolving granularity. Second, we formally proved two theorems. The first theorem proves the relation between clustering accuracy and evolving granularity. Additionally, this theorem analyzes the upper and lower bounds of different-to-same mis-affiliation. Fewer occurrences of such mis-affiliation mean higher accuracy. The second theorem reveals the relation between parallelism and evolving granularity. Smaller work-depth means superior parallelism. Through the proofs, we conclude that accuracy of an incremental clustering algorithm is negatively related to evolving granularity while parallelism is positively related to the granularity. Thus the contradictory relations cause the dilemma. Finally, we validated the relations through a demo algorithm. Experiment results verified theoretical conclusions.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Solomon, Justin, E-mail: justin.solomon@duke.edu; Samei, Ehsan
2014-09-15
Purpose: Quantum noise properties of CT images are generally assessed using simple geometric phantoms with uniform backgrounds. Such phantoms may be inadequate when assessing nonlinear reconstruction or postprocessing algorithms. The purpose of this study was to design anatomically informed textured phantoms and use the phantoms to assess quantum noise properties across two clinically available reconstruction algorithms, filtered back projection (FBP) and sinogram affirmed iterative reconstruction (SAFIRE). Methods: Two phantoms were designed to represent lung and soft-tissue textures. The lung phantom included intricate vessel-like structures along with embedded nodules (spherical, lobulated, and spiculated). The soft tissue phantom was designed based onmore » a three-dimensional clustered lumpy background with included low-contrast lesions (spherical and anthropomorphic). The phantoms were built using rapid prototyping (3D printing) technology and, along with a uniform phantom of similar size, were imaged on a Siemens SOMATOM Definition Flash CT scanner and reconstructed with FBP and SAFIRE. Fifty repeated acquisitions were acquired for each background type and noise was assessed by estimating pixel-value statistics, such as standard deviation (i.e., noise magnitude), autocorrelation, and noise power spectrum. Noise stationarity was also assessed by examining the spatial distribution of noise magnitude. The noise properties were compared across background types and between the two reconstruction algorithms. Results: In FBP and SAFIRE images, noise was globally nonstationary for all phantoms. In FBP images of all phantoms, and in SAFIRE images of the uniform phantom, noise appeared to be locally stationary (within a reasonably small region of interest). Noise was locally nonstationary in SAFIRE images of the textured phantoms with edge pixels showing higher noise magnitude compared to pixels in more homogenous regions. For pixels in uniform regions, noise magnitude was reduced by an average of 60% in SAFIRE images compared to FBP. However, for edge pixels, noise magnitude ranged from 20% higher to 40% lower in SAFIRE images compared to FBP. SAFIRE images of the lung phantom exhibited distinct regions with varying noise texture (i.e., noise autocorrelation/power spectra). Conclusions: Quantum noise properties observed in uniform phantoms may not be representative of those in actual patients for nonlinear reconstruction algorithms. Anatomical texture should be considered when evaluating the performance of CT systems that use such nonlinear algorithms.« less
Hierarchical trie packet classification algorithm based on expectation-maximization clustering
Bi, Xia-an; Zhao, Junxia
2017-01-01
With the development of computer network bandwidth, packet classification algorithms which are able to deal with large-scale rule sets are in urgent need. Among the existing algorithms, researches on packet classification algorithms based on hierarchical trie have become an important packet classification research branch because of their widely practical use. Although hierarchical trie is beneficial to save large storage space, it has several shortcomings such as the existence of backtracking and empty nodes. This paper proposes a new packet classification algorithm, Hierarchical Trie Algorithm Based on Expectation-Maximization Clustering (HTEMC). Firstly, this paper uses the formalization method to deal with the packet classification problem by means of mapping the rules and data packets into a two-dimensional space. Secondly, this paper uses expectation-maximization algorithm to cluster the rules based on their aggregate characteristics, and thereby diversified clusters are formed. Thirdly, this paper proposes a hierarchical trie based on the results of expectation-maximization clustering. Finally, this paper respectively conducts simulation experiments and real-environment experiments to compare the performances of our algorithm with other typical algorithms, and analyzes the results of the experiments. The hierarchical trie structure in our algorithm not only adopts trie path compression to eliminate backtracking, but also solves the problem of low efficiency of trie updates, which greatly improves the performance of the algorithm. PMID:28704476
Size dependent fragmentation of argon clusters in the soft x-ray ionization regime
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gisselbrecht, Mathieu; Lindgren, Andreas; Burmeister, Florian
Photofragmentation of argon clusters of average size ranging from 10 up to 1000 atoms is studied using soft x-ray radiation below the 2p threshold and multicoincidence mass spectroscopy technique. For small clusters (
A Giant Warm Baryonic Halo for the Coma Cluster
NASA Technical Reports Server (NTRS)
Bonamente, Max; Lieu, Richard; Joy, Marshall K.; Six, N. Frank (Technical Monitor)
2002-01-01
Several deep PSPC observations of the Coma cluster unveil a very large-scale halo of soft X-ray emission, substantially in excess of the well know radiation from the hot intra-cluster medium. The excess emission, previously reported in the central cluster regions through lower-sensitivity EUVE and ROSAT data, is now evident out to a radius of 2.5 Mpc, demonstrating that the soft excess radiation from clusters is a phenomenon of cosmological significance. The spectrum at these large radii cannot be modeled non-thermally, but is consistent with the original scenario of thermal emission at warm temperatures. The mass of this plasma is at least on par with that of the hot X-ray emitting plasma, and significantly more massive if the plasma resides in low-density filamentary structures. Thus the data lend vital support to current theories of cosmic evolution, which predict greater than 50 percent by mass of today's baryons reside in warm-hot filaments converging at clusters of galaxies.
A fast parallel clustering algorithm for molecular simulation trajectories.
Zhao, Yutong; Sheong, Fu Kit; Sun, Jian; Sander, Pedro; Huang, Xuhui
2013-01-15
We implemented a GPU-powered parallel k-centers algorithm to perform clustering on the conformations of molecular dynamics (MD) simulations. The algorithm is up to two orders of magnitude faster than the CPU implementation. We tested our algorithm on four protein MD simulation datasets ranging from the small Alanine Dipeptide to a 370-residue Maltose Binding Protein (MBP). It is capable of grouping 250,000 conformations of the MBP into 4000 clusters within 40 seconds. To achieve this, we effectively parallelized the code on the GPU and utilize the triangle inequality of metric spaces. Furthermore, the algorithm's running time is linear with respect to the number of cluster centers. In addition, we found the triangle inequality to be less effective in higher dimensions and provide a mathematical rationale. Finally, using Alanine Dipeptide as an example, we show a strong correlation between cluster populations resulting from the k-centers algorithm and the underlying density. © 2012 Wiley Periodicals, Inc. Copyright © 2012 Wiley Periodicals, Inc.
A Class of Manifold Regularized Multiplicative Update Algorithms for Image Clustering.
Yang, Shangming; Yi, Zhang; He, Xiaofei; Li, Xuelong
2015-12-01
Multiplicative update algorithms are important tools for information retrieval, image processing, and pattern recognition. However, when the graph regularization is added to the cost function, different classes of sample data may be mapped to the same subspace, which leads to the increase of data clustering error rate. In this paper, an improved nonnegative matrix factorization (NMF) cost function is introduced. Based on the cost function, a class of novel graph regularized NMF algorithms is developed, which results in a class of extended multiplicative update algorithms with manifold structure regularization. Analysis shows that in the learning, the proposed algorithms can efficiently minimize the rank of the data representation matrix. Theoretical results presented in this paper are confirmed by simulations. For different initializations and data sets, variation curves of cost functions and decomposition data are presented to show the convergence features of the proposed update rules. Basis images, reconstructed images, and clustering results are utilized to present the efficiency of the new algorithms. Last, the clustering accuracies of different algorithms are also investigated, which shows that the proposed algorithms can achieve state-of-the-art performance in applications of image clustering.
Soft computing methods in design of superalloys
NASA Technical Reports Server (NTRS)
Cios, K. J.; Berke, L.; Vary, A.; Sharma, S.
1995-01-01
Soft computing techniques of neural networks and genetic algorithms are used in the design of superalloys. The cyclic oxidation attack parameter K(sub a), generated from tests at NASA Lewis Research Center, is modeled as a function of the superalloy chemistry and test temperature using a neural network. This model is then used in conjunction with a genetic algorithm to obtain an optimized superalloy composition resulting in low K(sub a) values.
Soft Computing Methods in Design of Superalloys
NASA Technical Reports Server (NTRS)
Cios, K. J.; Berke, L.; Vary, A.; Sharma, S.
1996-01-01
Soft computing techniques of neural networks and genetic algorithms are used in the design of superalloys. The cyclic oxidation attack parameter K(sub a), generated from tests at NASA Lewis Research Center, is modelled as a function of the superalloy chemistry and test temperature using a neural network. This model is then used in conjunction with a genetic algorithm to obtain an optimized superalloy composition resulting in low K(sub a) values.
Long-term surface EMG monitoring using K-means clustering and compressive sensing
NASA Astrophysics Data System (ADS)
Balouchestani, Mohammadreza; Krishnan, Sridhar
2015-05-01
In this work, we present an advanced K-means clustering algorithm based on Compressed Sensing theory (CS) in combination with the K-Singular Value Decomposition (K-SVD) method for Clustering of long-term recording of surface Electromyography (sEMG) signals. The long-term monitoring of sEMG signals aims at recording of the electrical activity produced by muscles which are very useful procedure for treatment and diagnostic purposes as well as for detection of various pathologies. The proposed algorithm is examined for three scenarios of sEMG signals including healthy person (sEMG-Healthy), a patient with myopathy (sEMG-Myopathy), and a patient with neuropathy (sEMG-Neuropathr), respectively. The proposed algorithm can easily scan large sEMG datasets of long-term sEMG recording. We test the proposed algorithm with Principal Component Analysis (PCA) and Linear Correlation Coefficient (LCC) dimensionality reduction methods. Then, the output of the proposed algorithm is fed to K-Nearest Neighbours (K-NN) and Probabilistic Neural Network (PNN) classifiers in order to calclute the clustering performance. The proposed algorithm achieves a classification accuracy of 99.22%. This ability allows reducing 17% of Average Classification Error (ACE), 9% of Training Error (TE), and 18% of Root Mean Square Error (RMSE). The proposed algorithm also reduces 14% clustering energy consumption compared to the existing K-Means clustering algorithm.
NASA Astrophysics Data System (ADS)
Fan, Tian-E.; Shao, Gui-Fang; Ji, Qing-Shuang; Zheng, Ji-Wen; Liu, Tun-dong; Wen, Yu-Hua
2016-11-01
Theoretically, the determination of the structure of a cluster is to search the global minimum on its potential energy surface. The global minimization problem is often nondeterministic-polynomial-time (NP) hard and the number of local minima grows exponentially with the cluster size. In this article, a multi-populations multi-strategies differential evolution algorithm has been proposed to search the globally stable structure of Fe and Cr nanoclusters. The algorithm combines a multi-populations differential evolution with an elite pool scheme to keep the diversity of the solutions and avoid prematurely trapping into local optima. Moreover, multi-strategies such as growing method in initialization and three differential strategies in mutation are introduced to improve the convergence speed and lower the computational cost. The accuracy and effectiveness of our algorithm have been verified by comparing the results of Fe clusters with Cambridge Cluster Database. Meanwhile, the performance of our algorithm has been analyzed by comparing the convergence rate and energy evaluations with the classical DE algorithm. The multi-populations, multi-strategies mutation and growing method in initialization in our algorithm have been considered respectively. Furthermore, the structural growth pattern of Cr clusters has been predicted by this algorithm. The results show that the lowest-energy structure of Cr clusters contains many icosahedra, and the number of the icosahedral rings rises with increasing size.
Response to "Comparison and Evaluation of Clustering Algorithms for Tandem Mass Spectra".
Griss, Johannes; Perez-Riverol, Yasset; The, Matthew; Käll, Lukas; Vizcaíno, Juan Antonio
2018-05-04
In the recent benchmarking article entitled "Comparison and Evaluation of Clustering Algorithms for Tandem Mass Spectra", Rieder et al. compared several different approaches to cluster MS/MS spectra. While we certainly recognize the value of the manuscript, here, we report some shortcomings detected in the original analyses. For most analyses, the authors clustered only single MS/MS runs. In one of the reported analyses, three MS/MS runs were processed together, which already led to computational performance issues in many of the tested approaches. This fact highlights the difficulties of using many of the tested algorithms on the nowadays produced average proteomics data sets. Second, the authors only processed identified spectra when merging MS runs. Thereby, all unidentified spectra that are of lower quality were already removed from the data set and could not influence the clustering results. Next, we found that the authors did not analyze the effect of chimeric spectra on the clustering results. In our analysis, we found that 3% of the spectra in the used data sets were chimeric, and this had marked effects on the behavior of the different clustering algorithms tested. Finally, the authors' choice to evaluate the MS-Cluster and spectra-cluster algorithms using a precursor tolerance of 5 Da for high-resolution Orbitrap data only was, in our opinion, not adequate to assess the performance of MS/MS clustering approaches.
A Novel Artificial Bee Colony Based Clustering Algorithm for Categorical Data
2015-01-01
Data with categorical attributes are ubiquitous in the real world. However, existing partitional clustering algorithms for categorical data are prone to fall into local optima. To address this issue, in this paper we propose a novel clustering algorithm, ABC-K-Modes (Artificial Bee Colony clustering based on K-Modes), based on the traditional k-modes clustering algorithm and the artificial bee colony approach. In our approach, we first introduce a one-step k-modes procedure, and then integrate this procedure with the artificial bee colony approach to deal with categorical data. In the search process performed by scout bees, we adopt the multi-source search inspired by the idea of batch processing to accelerate the convergence of ABC-K-Modes. The performance of ABC-K-Modes is evaluated by a series of experiments in comparison with that of the other popular algorithms for categorical data. PMID:25993469
A dynamic scheduling algorithm for singe-arm two-cluster tools with flexible processing times
NASA Astrophysics Data System (ADS)
Li, Xin; Fung, Richard Y. K.
2018-02-01
This article presents a dynamic algorithm for job scheduling in two-cluster tools producing multi-type wafers with flexible processing times. Flexible processing times mean that the actual times for processing wafers should be within given time intervals. The objective of the work is to minimize the completion time of the newly inserted wafer. To deal with this issue, a two-cluster tool is decomposed into three reduced single-cluster tools (RCTs) in a series based on a decomposition approach proposed in this article. For each single-cluster tool, a dynamic scheduling algorithm based on temporal constraints is developed to schedule the newly inserted wafer. Three experiments have been carried out to test the dynamic scheduling algorithm proposed, comparing with the results the 'earliest starting time' heuristic (EST) adopted in previous literature. The results show that the dynamic algorithm proposed in this article is effective and practical.
A novel artificial bee colony based clustering algorithm for categorical data.
Ji, Jinchao; Pang, Wei; Zheng, Yanlin; Wang, Zhe; Ma, Zhiqiang
2015-01-01
Data with categorical attributes are ubiquitous in the real world. However, existing partitional clustering algorithms for categorical data are prone to fall into local optima. To address this issue, in this paper we propose a novel clustering algorithm, ABC-K-Modes (Artificial Bee Colony clustering based on K-Modes), based on the traditional k-modes clustering algorithm and the artificial bee colony approach. In our approach, we first introduce a one-step k-modes procedure, and then integrate this procedure with the artificial bee colony approach to deal with categorical data. In the search process performed by scout bees, we adopt the multi-source search inspired by the idea of batch processing to accelerate the convergence of ABC-K-Modes. The performance of ABC-K-Modes is evaluated by a series of experiments in comparison with that of the other popular algorithms for categorical data.
Zhu, Bohui; Ding, Yongsheng; Hao, Kuangrong
2013-01-01
This paper presents a novel maximum margin clustering method with immune evolution (IEMMC) for automatic diagnosis of electrocardiogram (ECG) arrhythmias. This diagnostic system consists of signal processing, feature extraction, and the IEMMC algorithm for clustering of ECG arrhythmias. First, raw ECG signal is processed by an adaptive ECG filter based on wavelet transforms, and waveform of the ECG signal is detected; then, features are extracted from ECG signal to cluster different types of arrhythmias by the IEMMC algorithm. Three types of performance evaluation indicators are used to assess the effect of the IEMMC method for ECG arrhythmias, such as sensitivity, specificity, and accuracy. Compared with K-means and iterSVR algorithms, the IEMMC algorithm reflects better performance not only in clustering result but also in terms of global search ability and convergence ability, which proves its effectiveness for the detection of ECG arrhythmias. PMID:23690875
The Ordered Clustered Travelling Salesman Problem: A Hybrid Genetic Algorithm
Ahmed, Zakir Hussain
2014-01-01
The ordered clustered travelling salesman problem is a variation of the usual travelling salesman problem in which a set of vertices (except the starting vertex) of the network is divided into some prespecified clusters. The objective is to find the least cost Hamiltonian tour in which vertices of any cluster are visited contiguously and the clusters are visited in the prespecified order. The problem is NP-hard, and it arises in practical transportation and sequencing problems. This paper develops a hybrid genetic algorithm using sequential constructive crossover, 2-opt search, and a local search for obtaining heuristic solution to the problem. The efficiency of the algorithm has been examined against two existing algorithms for some asymmetric and symmetric TSPLIB instances of various sizes. The computational results show that the proposed algorithm is very effective in terms of solution quality and computational time. Finally, we present solution to some more symmetric TSPLIB instances. PMID:24701148
A genetic graph-based approach for partitional clustering.
Menéndez, Héctor D; Barrero, David F; Camacho, David
2014-05-01
Clustering is one of the most versatile tools for data analysis. In the recent years, clustering that seeks the continuity of data (in opposition to classical centroid-based approaches) has attracted an increasing research interest. It is a challenging problem with a remarkable practical interest. The most popular continuity clustering method is the spectral clustering (SC) algorithm, which is based on graph cut: It initially generates a similarity graph using a distance measure and then studies its graph spectrum to find the best cut. This approach is sensitive to the parameters of the metric, and a correct parameter choice is critical to the quality of the cluster. This work proposes a new algorithm, inspired by SC, that reduces the parameter dependency while maintaining the quality of the solution. The new algorithm, named genetic graph-based clustering (GGC), takes an evolutionary approach introducing a genetic algorithm (GA) to cluster the similarity graph. The experimental validation shows that GGC increases robustness of SC and has competitive performance in comparison with classical clustering methods, at least, in the synthetic and real dataset used in the experiments.
Exploratory Item Classification Via Spectral Graph Clustering
Chen, Yunxiao; Li, Xiaoou; Liu, Jingchen; Xu, Gongjun; Ying, Zhiliang
2017-01-01
Large-scale assessments are supported by a large item pool. An important task in test development is to assign items into scales that measure different characteristics of individuals, and a popular approach is cluster analysis of items. Classical methods in cluster analysis, such as the hierarchical clustering, K-means method, and latent-class analysis, often induce a high computational overhead and have difficulty handling missing data, especially in the presence of high-dimensional responses. In this article, the authors propose a spectral clustering algorithm for exploratory item cluster analysis. The method is computationally efficient, effective for data with missing or incomplete responses, easy to implement, and often outperforms traditional clustering algorithms in the context of high dimensionality. The spectral clustering algorithm is based on graph theory, a branch of mathematics that studies the properties of graphs. The algorithm first constructs a graph of items, characterizing the similarity structure among items. It then extracts item clusters based on the graphical structure, grouping similar items together. The proposed method is evaluated through simulations and an application to the revised Eysenck Personality Questionnaire. PMID:29033476
Symmetric nonnegative matrix factorization: algorithms and applications to probabilistic clustering.
He, Zhaoshui; Xie, Shengli; Zdunek, Rafal; Zhou, Guoxu; Cichocki, Andrzej
2011-12-01
Nonnegative matrix factorization (NMF) is an unsupervised learning method useful in various applications including image processing and semantic analysis of documents. This paper focuses on symmetric NMF (SNMF), which is a special case of NMF decomposition. Three parallel multiplicative update algorithms using level 3 basic linear algebra subprograms directly are developed for this problem. First, by minimizing the Euclidean distance, a multiplicative update algorithm is proposed, and its convergence under mild conditions is proved. Based on it, we further propose another two fast parallel methods: α-SNMF and β -SNMF algorithms. All of them are easy to implement. These algorithms are applied to probabilistic clustering. We demonstrate their effectiveness for facial image clustering, document categorization, and pattern clustering in gene expression.
Machine-learned cluster identification in high-dimensional data.
Ultsch, Alfred; Lötsch, Jörn
2017-02-01
High-dimensional biomedical data are frequently clustered to identify subgroup structures pointing at distinct disease subtypes. It is crucial that the used cluster algorithm works correctly. However, by imposing a predefined shape on the clusters, classical algorithms occasionally suggest a cluster structure in homogenously distributed data or assign data points to incorrect clusters. We analyzed whether this can be avoided by using emergent self-organizing feature maps (ESOM). Data sets with different degrees of complexity were submitted to ESOM analysis with large numbers of neurons, using an interactive R-based bioinformatics tool. On top of the trained ESOM the distance structure in the high dimensional feature space was visualized in the form of a so-called U-matrix. Clustering results were compared with those provided by classical common cluster algorithms including single linkage, Ward and k-means. Ward clustering imposed cluster structures on cluster-less "golf ball", "cuboid" and "S-shaped" data sets that contained no structure at all (random data). Ward clustering also imposed structures on permuted real world data sets. By contrast, the ESOM/U-matrix approach correctly found that these data contain no cluster structure. However, ESOM/U-matrix was correct in identifying clusters in biomedical data truly containing subgroups. It was always correct in cluster structure identification in further canonical artificial data. Using intentionally simple data sets, it is shown that popular clustering algorithms typically used for biomedical data sets may fail to cluster data correctly, suggesting that they are also likely to perform erroneously on high dimensional biomedical data. The present analyses emphasized that generally established classical hierarchical clustering algorithms carry a considerable tendency to produce erroneous results. By contrast, unsupervised machine-learned analysis of cluster structures, applied using the ESOM/U-matrix method, is a viable, unbiased method to identify true clusters in the high-dimensional space of complex data. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.
Kamali, Tahereh; Stashuk, Daniel
2016-10-01
Robust and accurate segmentation of brain white matter (WM) fiber bundles assists in diagnosing and assessing progression or remission of neuropsychiatric diseases such as schizophrenia, autism and depression. Supervised segmentation methods are infeasible in most applications since generating gold standards is too costly. Hence, there is a growing interest in designing unsupervised methods. However, most conventional unsupervised methods require the number of clusters be known in advance which is not possible in most applications. The purpose of this study is to design an unsupervised segmentation algorithm for brain white matter fiber bundles which can automatically segment fiber bundles using intrinsic diffusion tensor imaging data information without considering any prior information or assumption about data distributions. Here, a new density based clustering algorithm called neighborhood distance entropy consistency (NDEC), is proposed which discovers natural clusters within data by simultaneously utilizing both local and global density information. The performance of NDEC is compared with other state of the art clustering algorithms including chameleon, spectral clustering, DBSCAN and k-means using Johns Hopkins University publicly available diffusion tensor imaging data. The performance of NDEC and other employed clustering algorithms were evaluated using dice ratio as an external evaluation criteria and density based clustering validation (DBCV) index as an internal evaluation metric. Across all employed clustering algorithms, NDEC obtained the highest average dice ratio (0.94) and DBCV value (0.71). NDEC can find clusters with arbitrary shapes and densities and consequently can be used for WM fiber bundle segmentation where there is no distinct boundary between various bundles. NDEC may also be used as an effective tool in other pattern recognition and medical diagnostic systems in which discovering natural clusters within data is a necessity. Copyright © 2016 Elsevier B.V. All rights reserved.
Tang, Jiqiang; Yang, Wu; Zhu, Lingyun; Wang, Dong; Feng, Xin
2017-01-01
In recent years, Wireless Sensor Networks with a Mobile Sink (WSN-MS) have been an active research topic due to the widespread use of mobile devices. However, how to get the balance between data delivery latency and energy consumption becomes a key issue of WSN-MS. In this paper, we study the clustering approach by jointly considering the Route planning for mobile sink and Clustering Problem (RCP) for static sensor nodes. We solve the RCP problem by using the minimum travel route clustering approach, which applies the minimum travel route of the mobile sink to guide the clustering process. We formulate the RCP problem as an Integer Non-Linear Programming (INLP) problem to shorten the travel route of the mobile sink under three constraints: the communication hops constraint, the travel route constraint and the loop avoidance constraint. We then propose an Imprecise Induction Algorithm (IIA) based on the property that the solution with a small hop count is more feasible than that with a large hop count. The IIA algorithm includes three processes: initializing travel route planning with a Traveling Salesman Problem (TSP) algorithm, transforming the cluster head to a cluster member and transforming the cluster member to a cluster head. Extensive experimental results show that the IIA algorithm could automatically adjust cluster heads according to the maximum hops parameter and plan a shorter travel route for the mobile sink. Compared with the Shortest Path Tree-based Data-Gathering Algorithm (SPT-DGA), the IIA algorithm has the characteristics of shorter route length, smaller cluster head count and faster convergence rate. PMID:28445434
Nidheesh, N; Abdul Nazeer, K A; Ameer, P M
2017-12-01
Clustering algorithms with steps involving randomness usually give different results on different executions for the same dataset. This non-deterministic nature of algorithms such as the K-Means clustering algorithm limits their applicability in areas such as cancer subtype prediction using gene expression data. It is hard to sensibly compare the results of such algorithms with those of other algorithms. The non-deterministic nature of K-Means is due to its random selection of data points as initial centroids. We propose an improved, density based version of K-Means, which involves a novel and systematic method for selecting initial centroids. The key idea of the algorithm is to select data points which belong to dense regions and which are adequately separated in feature space as the initial centroids. We compared the proposed algorithm to a set of eleven widely used single clustering algorithms and a prominent ensemble clustering algorithm which is being used for cancer data classification, based on the performances on a set of datasets comprising ten cancer gene expression datasets. The proposed algorithm has shown better overall performance than the others. There is a pressing need in the Biomedical domain for simple, easy-to-use and more accurate Machine Learning tools for cancer subtype prediction. The proposed algorithm is simple, easy-to-use and gives stable results. Moreover, it provides comparatively better predictions of cancer subtypes from gene expression data. Copyright © 2017 Elsevier Ltd. All rights reserved.
Soft functions for generic jet algorithms and observables at hadron colliders
Bertolini, Daniele; Kolodrubetz, Daniel; Neill, Duff Austin; ...
2017-07-20
Here, we introduce a method to compute one-loop soft functions for exclusive N - jet processes at hadron colliders, allowing for different definitions of the algorithm that determines the jet regions and of the measurements in those regions. In particular, we generalize the N -jettiness hemisphere decomposition of ref. [1] in a manner that separates the dependence on the jet boundary from the observables measured inside the jet and beam regions. Results are given for several factorizable jet definitions, including anti- kT , XCone, and other geometric partitionings. We calculate explicitly the soft functions for angularity measurements, including jet massmore » and jet broadening, in pp → L + 1 jet and explore the differences for various jet vetoes and algorithms. This includes a consistent treatment of rapidity divergences when applicable. We also compute analytic results for these soft functions in an expansion for a small jet radius R. We find that the small- R results, including corrections up to O(R 2), accurately capture the full behavior over a large range of R.« less
Soft functions for generic jet algorithms and observables at hadron colliders
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bertolini, Daniele; Kolodrubetz, Daniel; Neill, Duff Austin
Here, we introduce a method to compute one-loop soft functions for exclusive N - jet processes at hadron colliders, allowing for different definitions of the algorithm that determines the jet regions and of the measurements in those regions. In particular, we generalize the N -jettiness hemisphere decomposition of ref. [1] in a manner that separates the dependence on the jet boundary from the observables measured inside the jet and beam regions. Results are given for several factorizable jet definitions, including anti- kT , XCone, and other geometric partitionings. We calculate explicitly the soft functions for angularity measurements, including jet massmore » and jet broadening, in pp → L + 1 jet and explore the differences for various jet vetoes and algorithms. This includes a consistent treatment of rapidity divergences when applicable. We also compute analytic results for these soft functions in an expansion for a small jet radius R. We find that the small- R results, including corrections up to O(R 2), accurately capture the full behavior over a large range of R.« less
Determining the Number of Clusters in a Data Set Without Graphical Interpretation
NASA Technical Reports Server (NTRS)
Aguirre, Nathan S.; Davies, Misty D.
2011-01-01
Cluster analysis is a data mining technique that is meant ot simplify the process of classifying data points. The basic clustering process requires an input of data points and the number of clusters wanted. The clustering algorithm will then pick starting C points for the clusters, which can be either random spatial points or random data points. It then assigns each data point to the nearest C point where "nearest usually means Euclidean distance, but some algorithms use another criterion. The next step is determining whether the clustering arrangement this found is within a certain tolerance. If it falls within this tolerance, the process ends. Otherwise the C points are adjusted based on how many data points are in each cluster, and the steps repeat until the algorithm converges,
Bhattacharya, Anindya; De, Rajat K
2010-08-01
Distance based clustering algorithms can group genes that show similar expression values under multiple experimental conditions. They are unable to identify a group of genes that have similar pattern of variation in their expression values. Previously we developed an algorithm called divisive correlation clustering algorithm (DCCA) to tackle this situation, which is based on the concept of correlation clustering. But this algorithm may also fail for certain cases. In order to overcome these situations, we propose a new clustering algorithm, called average correlation clustering algorithm (ACCA), which is able to produce better clustering solution than that produced by some others. ACCA is able to find groups of genes having more common transcription factors and similar pattern of variation in their expression values. Moreover, ACCA is more efficient than DCCA with respect to the time of execution. Like DCCA, we use the concept of correlation clustering concept introduced by Bansal et al. ACCA uses the correlation matrix in such a way that all genes in a cluster have the highest average correlation values with the genes in that cluster. We have applied ACCA and some well-known conventional methods including DCCA to two artificial and nine gene expression datasets, and compared the performance of the algorithms. The clustering results of ACCA are found to be more significantly relevant to the biological annotations than those of the other methods. Analysis of the results show the superiority of ACCA over some others in determining a group of genes having more common transcription factors and with similar pattern of variation in their expression profiles. Availability of the software: The software has been developed using C and Visual Basic languages, and can be executed on the Microsoft Windows platforms. The software may be downloaded as a zip file from http://www.isical.ac.in/~rajat. Then it needs to be installed. Two word files (included in the zip file) need to be consulted before installation and execution of the software. Copyright 2010 Elsevier Inc. All rights reserved.
Cluster beam targets for laser plasma extreme ultraviolet and soft x-ray sources
Kublak, G.D.; Richardson, M.C.
1996-11-19
Method and apparatus for producing extreme ultraviolet (EUV) and soft x-ray radiation from an ultra-low debris plasma source are disclosed. Targets are produced by the free jet expansion of various gases through a temperature controlled nozzle to form molecular clusters. These target clusters are subsequently irradiated with commercially available lasers of moderate intensity (10{sup 11}--10{sup 12} watts/cm{sup 2}) to produce a plasma radiating in the region of 0.5 to 100 nanometers. By appropriate adjustment of the experimental conditions the laser focus can be moved 10--30 mm from the nozzle thereby eliminating debris produced by plasma erosion of the nozzle. 5 figs.
Cluster beam targets for laser plasma extreme ultraviolet and soft x-ray sources
Kublak, Glenn D.; Richardson, Martin C. (CREOL
1996-01-01
Method and apparatus for producing extreme ultra violet (EUV) and soft x-ray radiation from an ultra-low debris plasma source are disclosed. Targets are produced by the free jet expansion of various gases through a temperature controlled nozzle to form molecular clusters. These target clusters are subsequently irradiated with commercially available lasers of moderate intensity (10.sup.11 -10.sup.12 watts/cm.sup.2) to produce a plasma radiating in the region of 0.5 to 100 nanometers. By appropriate adjustment of the experimental conditions the laser focus can be moved 10-30 mm from the nozzle thereby eliminating debris produced by plasma erosion of the nozzle.
Reducing the time requirement of k-means algorithm.
Osamor, Victor Chukwudi; Adebiyi, Ezekiel Femi; Oyelade, Jelilli Olarenwaju; Doumbia, Seydou
2012-01-01
Traditional k-means and most k-means variants are still computationally expensive for large datasets, such as microarray data, which have large datasets with large dimension size d. In k-means clustering, we are given a set of n data points in d-dimensional space R(d) and an integer k. The problem is to determine a set of k points in R(d), called centers, so as to minimize the mean squared distance from each data point to its nearest center. In this work, we develop a novel k-means algorithm, which is simple but more efficient than the traditional k-means and the recent enhanced k-means. Our new algorithm is based on the recently established relationship between principal component analysis and the k-means clustering. We provided the correctness proof for this algorithm. Results obtained from testing the algorithm on three biological data and six non-biological data (three of these data are real, while the other three are simulated) also indicate that our algorithm is empirically faster than other known k-means algorithms. We assessed the quality of our algorithm clusters against the clusters of a known structure using the Hubert-Arabie Adjusted Rand index (ARI(HA)). We found that when k is close to d, the quality is good (ARI(HA)>0.8) and when k is not close to d, the quality of our new k-means algorithm is excellent (ARI(HA)>0.9). In this paper, emphases are on the reduction of the time requirement of the k-means algorithm and its application to microarray data due to the desire to create a tool for clustering and malaria research. However, the new clustering algorithm can be used for other clustering needs as long as an appropriate measure of distance between the centroids and the members is used. This has been demonstrated in this work on six non-biological data.
Reducing the Time Requirement of k-Means Algorithm
Osamor, Victor Chukwudi; Adebiyi, Ezekiel Femi; Oyelade, Jelilli Olarenwaju; Doumbia, Seydou
2012-01-01
Traditional k-means and most k-means variants are still computationally expensive for large datasets, such as microarray data, which have large datasets with large dimension size d. In k-means clustering, we are given a set of n data points in d-dimensional space Rd and an integer k. The problem is to determine a set of k points in Rd, called centers, so as to minimize the mean squared distance from each data point to its nearest center. In this work, we develop a novel k-means algorithm, which is simple but more efficient than the traditional k-means and the recent enhanced k-means. Our new algorithm is based on the recently established relationship between principal component analysis and the k-means clustering. We provided the correctness proof for this algorithm. Results obtained from testing the algorithm on three biological data and six non-biological data (three of these data are real, while the other three are simulated) also indicate that our algorithm is empirically faster than other known k-means algorithms. We assessed the quality of our algorithm clusters against the clusters of a known structure using the Hubert-Arabie Adjusted Rand index (ARIHA). We found that when k is close to d, the quality is good (ARIHA>0.8) and when k is not close to d, the quality of our new k-means algorithm is excellent (ARIHA>0.9). In this paper, emphases are on the reduction of the time requirement of the k-means algorithm and its application to microarray data due to the desire to create a tool for clustering and malaria research. However, the new clustering algorithm can be used for other clustering needs as long as an appropriate measure of distance between the centroids and the members is used. This has been demonstrated in this work on six non-biological data. PMID:23239974
NASA Technical Reports Server (NTRS)
Eigen, D. J.; Fromm, F. R.; Northouse, R. A.
1974-01-01
A new clustering algorithm is presented that is based on dimensional information. The algorithm includes an inherent feature selection criterion, which is discussed. Further, a heuristic method for choosing the proper number of intervals for a frequency distribution histogram, a feature necessary for the algorithm, is presented. The algorithm, although usable as a stand-alone clustering technique, is then utilized as a global approximator. Local clustering techniques and configuration of a global-local scheme are discussed, and finally the complete global-local and feature selector configuration is shown in application to a real-time adaptive classification scheme for the analysis of remote sensed multispectral scanner data.
Block clustering based on difference of convex functions (DC) programming and DC algorithms.
Le, Hoai Minh; Le Thi, Hoai An; Dinh, Tao Pham; Huynh, Van Ngai
2013-10-01
We investigate difference of convex functions (DC) programming and the DC algorithm (DCA) to solve the block clustering problem in the continuous framework, which traditionally requires solving a hard combinatorial optimization problem. DC reformulation techniques and exact penalty in DC programming are developed to build an appropriate equivalent DC program of the block clustering problem. They lead to an elegant and explicit DCA scheme for the resulting DC program. Computational experiments show the robustness and efficiency of the proposed algorithm and its superiority over standard algorithms such as two-mode K-means, two-mode fuzzy clustering, and block classification EM.
Online clustering algorithms for radar emitter classification.
Liu, Jun; Lee, Jim P Y; Senior; Li, Lingjie; Luo, Zhi-Quan; Wong, K Max
2005-08-01
Radar emitter classification is a special application of data clustering for classifying unknown radar emitters from received radar pulse samples. The main challenges of this task are the high dimensionality of radar pulse samples, small sample group size, and closely located radar pulse clusters. In this paper, two new online clustering algorithms are developed for radar emitter classification: One is model-based using the Minimum Description Length (MDL) criterion and the other is based on competitive learning. Computational complexity is analyzed for each algorithm and then compared. Simulation results show the superior performance of the model-based algorithm over competitive learning in terms of better classification accuracy, flexibility, and stability.
CAMPAIGN: an open-source library of GPU-accelerated data clustering algorithms.
Kohlhoff, Kai J; Sosnick, Marc H; Hsu, William T; Pande, Vijay S; Altman, Russ B
2011-08-15
Data clustering techniques are an essential component of a good data analysis toolbox. Many current bioinformatics applications are inherently compute-intense and work with very large datasets. Sequential algorithms are inadequate for providing the necessary performance. For this reason, we have created Clustering Algorithms for Massively Parallel Architectures, Including GPU Nodes (CAMPAIGN), a central resource for data clustering algorithms and tools that are implemented specifically for execution on massively parallel processing architectures. CAMPAIGN is a library of data clustering algorithms and tools, written in 'C for CUDA' for Nvidia GPUs. The library provides up to two orders of magnitude speed-up over respective CPU-based clustering algorithms and is intended as an open-source resource. New modules from the community will be accepted into the library and the layout of it is such that it can easily be extended to promising future platforms such as OpenCL. Releases of the CAMPAIGN library are freely available for download under the LGPL from https://simtk.org/home/campaign. Source code can also be obtained through anonymous subversion access as described on https://simtk.org/scm/?group_id=453. kjk33@cantab.net.
Research on the precise positioning of customers in large data environment
NASA Astrophysics Data System (ADS)
Zhou, Xu; He, Lili
2018-04-01
Customer positioning has always been a problem that enterprises focus on. In this paper, FCM clustering algorithm is used to cluster customer groups. However, due to the traditional FCM clustering algorithm, which is susceptible to the influence of the initial clustering center and easy to fall into the local optimal problem, the short board of FCM is solved by the gray optimization algorithm (GWO) to achieve efficient and accurate handling of a large number of retailer data.
An Enhanced K-Means Algorithm for Water Quality Analysis of The Haihe River in China.
Zou, Hui; Zou, Zhihong; Wang, Xiaojing
2015-11-12
The increase and the complexity of data caused by the uncertain environment is today's reality. In order to identify water quality effectively and reliably, this paper presents a modified fast clustering algorithm for water quality analysis. The algorithm has adopted a varying weights K-means cluster algorithm to analyze water monitoring data. The varying weights scheme was the best weighting indicator selected by a modified indicator weight self-adjustment algorithm based on K-means, which is named MIWAS-K-means. The new clustering algorithm avoids the margin of the iteration not being calculated in some cases. With the fast clustering analysis, we can identify the quality of water samples. The algorithm is applied in water quality analysis of the Haihe River (China) data obtained by the monitoring network over a period of eight years (2006-2013) with four indicators at seven different sites (2078 samples). Both the theoretical and simulated results demonstrate that the algorithm is efficient and reliable for water quality analysis of the Haihe River. In addition, the algorithm can be applied to more complex data matrices with high dimensionality.
Huang, Mingzhi; Wan, Jinquan; Hu, Kang; Ma, Yongwen; Wang, Yan
2013-12-01
An on-line hybrid fuzzy-neural soft-sensing model-based control system was developed to optimize dissolved oxygen concentration in a bench-scale anaerobic/anoxic/oxic (A(2)/O) process. In order to improve the performance of the control system, a self-adapted fuzzy c-means clustering algorithm and adaptive network-based fuzzy inference system (ANFIS) models were employed. The proposed control system permits the on-line implementation of every operating strategy of the experimental system. A set of experiments involving variable hydraulic retention time (HRT), influent pH (pH), dissolved oxygen in the aerobic reactor (DO), and mixed-liquid return ratio (r) was carried out. Using the proposed system, the amount of COD in the effluent stabilized at the set-point and below. The improvement was achieved with optimum dissolved oxygen concentration because the performance of the treatment process was optimized using operating rules implemented in real time. The system allows various expert operational approaches to be deployed with the goal of minimizing organic substances in the outlet while using the minimum amount of energy.
Cooperative network clustering and task allocation for heterogeneous small satellite network
NASA Astrophysics Data System (ADS)
Qin, Jing
The research of small satellite has emerged as a hot topic in recent years because of its economical prospects and convenience in launching and design. Due to the size and energy constraints of small satellites, forming a small satellite network(SSN) in which all the satellites cooperate with each other to finish tasks is an efficient and effective way to utilize them. In this dissertation, I designed and evaluated a weight based dominating set clustering algorithm, which efficiently organizes the satellites into stable clusters. The traditional clustering algorithms of large monolithic satellite networks, such as formation flying and satellite swarm, are often limited on automatic formation of clusters. Therefore, a novel Distributed Weight based Dominating Set(DWDS) clustering algorithm is designed to address the clustering problems in the stochastically deployed SSNs. Considering the unique features of small satellites, this algorithm is able to form the clusters efficiently and stably. In this algorithm, satellites are separated into different groups according to their spatial characteristics. A minimum dominating set is chosen as the candidate cluster head set based on their weights, which is a weighted combination of residual energy and connection degree. Then the cluster heads admit new neighbors that accept their invitations into the cluster, until the maximum cluster size is reached. Evaluated by the simulation results, in a SSN with 200 to 800 nodes, the algorithm is able to efficiently cluster more than 90% of nodes in 3 seconds. The Deadline Based Resource Balancing (DBRB) task allocation algorithm is designed for efficient task allocations in heterogeneous LEO small satellite networks. In the task allocation process, the dispatcher needs to consider the deadlines of the tasks as well as the residue energy of different resources for best energy utilization. We assume the tasks adopt a Map-Reduce framework, in which a task can consist of multiple subtasks. The DBRB algorithm is deployed on the head node of a cluster. It gathers the status from each cluster member and calculates their Node Importance Factors (NIFs) from the carried resources, residue power and compute capacity. The algorithm calculates the number of concurrent subtasks based on the deadlines, and allocates the subtasks to the nodes according to their NIF values. The simulation results show that when cluster members carry multiple resources, resource are more balanced and rare resources serve longer in DBRB than in the Earliest Deadline First algorithm. We also show that the algorithm performs well in service isolation by serving multiple tasks with different deadlines. Moreover, the average task response time with various cluster size settings is well controlled within deadlines as well. Except non-realtime tasks, small satellites may execute realtime tasks as well. The location-dependent tasks, such as image capturing, data transmission and remote sensing tasks are realtime tasks that are required to be started / finished on specific time. The resource energy balancing algorithm for realtime and non-realtime mixed workload is developed to efficiently schedule the tasks for best system performance. It calculates the residue energy for each resource type and tries to preserve resources and node availability when distributing tasks. Non-realtime tasks can be preempted by realtime tasks to provide better QoS to realtime tasks. I compared the performance of proposed algorithm with a random-priority scheduling algorithm, with only realtime tasks, non-realtime tasks and mixed tasks. It shows the resource energy reservation algorithm outperforms the latter one with both balanced and imbalanced workloads. Although the resource energy balancing task allocation algorithm for mixed workload provides preemption mechanism for realtime tasks, realtime tasks can still fail due to resource exhaustion. For LEO small satellite flies around the earth on stable orbits, the location-dependent realtime tasks can be considered as periodical tasks. Therefore, it is possible to reserve energy for these realtime tasks. The resource energy reservation algorithm preserves energy for the realtime tasks when the execution routine of periodical realtime tasks is known. In order to reserve energy for tasks starting very early in each period that the node does not have enough energy charged, an energy wrapping mechanism is also designed to calculate the residue energy from the previous period. The simulation results show that without energy reservation, realtime task failure rate can reach more than 60% when the workload is highly imbalanced. In contrast, the resource energy reservation produces zero RT task failures and leads to equal or better aggregate system throughput than the non-reservation algorithm. The proposed algorithm also preserves more energy because it avoids task preemption. (Abstract shortened by ProQuest.).
SoftLab: A Soft-Computing Software for Experimental Research with Commercialization Aspects
NASA Technical Reports Server (NTRS)
Akbarzadeh-T, M.-R.; Shaikh, T. S.; Ren, J.; Hubbell, Rob; Kumbla, K. K.; Jamshidi, M
1998-01-01
SoftLab is a software environment for research and development in intelligent modeling/control using soft-computing paradigms such as fuzzy logic, neural networks, genetic algorithms, and genetic programs. SoftLab addresses the inadequacies of the existing soft-computing software by supporting comprehensive multidisciplinary functionalities from management tools to engineering systems. Furthermore, the built-in features help the user process/analyze information more efficiently by a friendly yet powerful interface, and will allow the user to specify user-specific processing modules, hence adding to the standard configuration of the software environment.
Analysis of Network Clustering Algorithms and Cluster Quality Metrics at Scale
Kobourov, Stephen; Gallant, Mike; Börner, Katy
2016-01-01
Overview Notions of community quality underlie the clustering of networks. While studies surrounding network clustering are increasingly common, a precise understanding of the realtionship between different cluster quality metrics is unknown. In this paper, we examine the relationship between stand-alone cluster quality metrics and information recovery metrics through a rigorous analysis of four widely-used network clustering algorithms—Louvain, Infomap, label propagation, and smart local moving. We consider the stand-alone quality metrics of modularity, conductance, and coverage, and we consider the information recovery metrics of adjusted Rand score, normalized mutual information, and a variant of normalized mutual information used in previous work. Our study includes both synthetic graphs and empirical data sets of sizes varying from 1,000 to 1,000,000 nodes. Cluster Quality Metrics We find significant differences among the results of the different cluster quality metrics. For example, clustering algorithms can return a value of 0.4 out of 1 on modularity but score 0 out of 1 on information recovery. We find conductance, though imperfect, to be the stand-alone quality metric that best indicates performance on the information recovery metrics. Additionally, our study shows that the variant of normalized mutual information used in previous work cannot be assumed to differ only slightly from traditional normalized mutual information. Network Clustering Algorithms Smart local moving is the overall best performing algorithm in our study, but discrepancies between cluster evaluation metrics prevent us from declaring it an absolutely superior algorithm. Interestingly, Louvain performed better than Infomap in nearly all the tests in our study, contradicting the results of previous work in which Infomap was superior to Louvain. We find that although label propagation performs poorly when clusters are less clearly defined, it scales efficiently and accurately to large graphs with well-defined clusters. PMID:27391786
Luo, Ze; Baoping, Yan; Takekawa, John Y.; Prosser, Diann J.
2012-01-01
We propose a new method to help ornithologists and ecologists discover shared segments on the migratory pathway of the bar-headed geese by time-based plane-sweeping trajectory clustering. We present a density-based time parameterized line segment clustering algorithm, which extends traditional comparable clustering algorithms from temporal and spatial dimensions. We present a time-based plane-sweeping trajectory clustering algorithm to reveal the dynamic evolution of spatial-temporal object clusters and discover common motion patterns of bar-headed geese in the process of migration. Experiments are performed on GPS-based satellite telemetry data from bar-headed geese and results demonstrate our algorithms can correctly discover shared segments of the bar-headed geese migratory pathway. We also present findings on the migratory behavior of bar-headed geese determined from this new analytical approach.
Computational gene expression profiling under salt stress reveals patterns of co-expression
Sanchita; Sharma, Ashok
2016-01-01
Plants respond differently to environmental conditions. Among various abiotic stresses, salt stress is a condition where excess salt in soil causes inhibition of plant growth. To understand the response of plants to the stress conditions, identification of the responsible genes is required. Clustering is a data mining technique used to group the genes with similar expression. The genes of a cluster show similar expression and function. We applied clustering algorithms on gene expression data of Solanum tuberosum showing differential expression in Capsicum annuum under salt stress. The clusters, which were common in multiple algorithms were taken further for analysis. Principal component analysis (PCA) further validated the findings of other cluster algorithms by visualizing their clusters in three-dimensional space. Functional annotation results revealed that most of the genes were involved in stress related responses. Our findings suggest that these algorithms may be helpful in the prediction of the function of co-expressed genes. PMID:26981411
A scalable and practical one-pass clustering algorithm for recommender system
NASA Astrophysics Data System (ADS)
Khalid, Asra; Ghazanfar, Mustansar Ali; Azam, Awais; Alahmari, Saad Ali
2015-12-01
KMeans clustering-based recommendation algorithms have been proposed claiming to increase the scalability of recommender systems. One potential drawback of these algorithms is that they perform training offline and hence cannot accommodate the incremental updates with the arrival of new data, making them unsuitable for the dynamic environments. From this line of research, a new clustering algorithm called One-Pass is proposed, which is a simple, fast, and accurate. We show empirically that the proposed algorithm outperforms K-Means in terms of recommendation and training time while maintaining a good level of accuracy.
Li, Xiaofang; Xu, Lizhong; Wang, Huibin; Song, Jie; Yang, Simon X.
2010-01-01
The traditional Low Energy Adaptive Cluster Hierarchy (LEACH) routing protocol is a clustering-based protocol. The uneven selection of cluster heads results in premature death of cluster heads and premature blind nodes inside the clusters, thus reducing the overall lifetime of the network. With a full consideration of information on energy and distance distribution of neighboring nodes inside the clusters, this paper proposes a new routing algorithm based on differential evolution (DE) to improve the LEACH routing protocol. To meet the requirements of monitoring applications in outdoor environments such as the meteorological, hydrological and wetland ecological environments, the proposed algorithm uses the simple and fast search features of DE to optimize the multi-objective selection of cluster heads and prevent blind nodes for improved energy efficiency and system stability. Simulation results show that the proposed new LEACH routing algorithm has better performance, effectively extends the working lifetime of the system, and improves the quality of the wireless sensor networks. PMID:22219670
Gao, Ying; Wkram, Chris Hadri; Duan, Jiajie; Chou, Jarong
2015-01-01
In order to prolong the network lifetime, energy-efficient protocols adapted to the features of wireless sensor networks should be used. This paper explores in depth the nature of heterogeneous wireless sensor networks, and finally proposes an algorithm to address the problem of finding an effective pathway for heterogeneous clustering energy. The proposed algorithm implements cluster head selection according to the degree of energy attenuation during the network’s running and the degree of candidate nodes’ effective coverage on the whole network, so as to obtain an even energy consumption over the whole network for the situation with high degree of coverage. Simulation results show that the proposed clustering protocol has better adaptability to heterogeneous environments than existing clustering algorithms in prolonging the network lifetime. PMID:26690440
Ju, Chunhua; Xu, Chonghuan
2013-01-01
Although there are many good collaborative recommendation methods, it is still a challenge to increase the accuracy and diversity of these methods to fulfill users' preferences. In this paper, we propose a novel collaborative filtering recommendation approach based on K-means clustering algorithm. In the process of clustering, we use artificial bee colony (ABC) algorithm to overcome the local optimal problem caused by K-means. After that we adopt the modified cosine similarity to compute the similarity between users in the same clusters. Finally, we generate recommendation results for the corresponding target users. Detailed numerical analysis on a benchmark dataset MovieLens and a real-world dataset indicates that our new collaborative filtering approach based on users clustering algorithm outperforms many other recommendation methods.
Ju, Chunhua
2013-01-01
Although there are many good collaborative recommendation methods, it is still a challenge to increase the accuracy and diversity of these methods to fulfill users' preferences. In this paper, we propose a novel collaborative filtering recommendation approach based on K-means clustering algorithm. In the process of clustering, we use artificial bee colony (ABC) algorithm to overcome the local optimal problem caused by K-means. After that we adopt the modified cosine similarity to compute the similarity between users in the same clusters. Finally, we generate recommendation results for the corresponding target users. Detailed numerical analysis on a benchmark dataset MovieLens and a real-world dataset indicates that our new collaborative filtering approach based on users clustering algorithm outperforms many other recommendation methods. PMID:24381525
Convalescing Cluster Configuration Using a Superlative Framework
Sabitha, R.; Karthik, S.
2015-01-01
Competent data mining methods are vital to discover knowledge from databases which are built as a result of enormous growth of data. Various techniques of data mining are applied to obtain knowledge from these databases. Data clustering is one such descriptive data mining technique which guides in partitioning data objects into disjoint segments. K-means algorithm is a versatile algorithm among the various approaches used in data clustering. The algorithm and its diverse adaptation methods suffer certain problems in their performance. To overcome these issues a superlative algorithm has been proposed in this paper to perform data clustering. The specific feature of the proposed algorithm is discretizing the dataset, thereby improving the accuracy of clustering, and also adopting the binary search initialization method to generate cluster centroids. The generated centroids are fed as input to K-means approach which iteratively segments the data objects into respective clusters. The clustered results are measured for accuracy and validity. Experiments conducted by testing the approach on datasets from the UC Irvine Machine Learning Repository evidently show that the accuracy and validity measure is higher than the other two approaches, namely, simple K-means and Binary Search method. Thus, the proposed approach proves that discretization process will improve the efficacy of descriptive data mining tasks. PMID:26543895
On the Accuracy and Parallelism of GPGPU-Powered Incremental Clustering Algorithms
He, Li; Zheng, Hao; Wang, Lei
2017-01-01
Incremental clustering algorithms play a vital role in various applications such as massive data analysis and real-time data processing. Typical application scenarios of incremental clustering raise high demand on computing power of the hardware platform. Parallel computing is a common solution to meet this demand. Moreover, General Purpose Graphic Processing Unit (GPGPU) is a promising parallel computing device. Nevertheless, the incremental clustering algorithm is facing a dilemma between clustering accuracy and parallelism when they are powered by GPGPU. We formally analyzed the cause of this dilemma. First, we formalized concepts relevant to incremental clustering like evolving granularity. Second, we formally proved two theorems. The first theorem proves the relation between clustering accuracy and evolving granularity. Additionally, this theorem analyzes the upper and lower bounds of different-to-same mis-affiliation. Fewer occurrences of such mis-affiliation mean higher accuracy. The second theorem reveals the relation between parallelism and evolving granularity. Smaller work-depth means superior parallelism. Through the proofs, we conclude that accuracy of an incremental clustering algorithm is negatively related to evolving granularity while parallelism is positively related to the granularity. Thus the contradictory relations cause the dilemma. Finally, we validated the relations through a demo algorithm. Experiment results verified theoretical conclusions. PMID:29123546
A Case for Soft Error Detection and Correction in Computational Chemistry.
van Dam, Hubertus J J; Vishnu, Abhinav; de Jong, Wibe A
2013-09-10
High performance computing platforms are expected to deliver 10(18) floating operations per second by the year 2022 through the deployment of millions of cores. Even if every core is highly reliable the sheer number of them will mean that the mean time between failures will become so short that most application runs will suffer at least one fault. In particular soft errors caused by intermittent incorrect behavior of the hardware are a concern as they lead to silent data corruption. In this paper we investigate the impact of soft errors on optimization algorithms using Hartree-Fock as a particular example. Optimization algorithms iteratively reduce the error in the initial guess to reach the intended solution. Therefore they may intuitively appear to be resilient to soft errors. Our results show that this is true for soft errors of small magnitudes but not for large errors. We suggest error detection and correction mechanisms for different classes of data structures. The results obtained with these mechanisms indicate that we can correct more than 95% of the soft errors at moderate increases in the computational cost.
Putzer, David; Klug, Sebastian; Moctezuma, Jose Luis; Nogler, Michael
2014-12-01
Time-of-flight (TOF) cameras can guide surgical robots or provide soft tissue information for augmented reality in the medical field. In this study, a method to automatically track the soft tissue envelope of a minimally invasive hip approach in a cadaver study is described. An algorithm for the TOF camera was developed and 30 measurements on 8 surgical situs (direct anterior approach) were carried out. The results were compared to a manual measurement of the soft tissue envelope. The TOF camera showed an overall recognition rate of the soft tissue envelope of 75%. On comparing the results from the algorithm with the manual measurements, a significant difference was found (P > .005). In this preliminary study, we have presented a method for automatically recognizing the soft tissue envelope of the surgical field in a real-time application. Further improvements could result in a robotic navigation device for minimally invasive hip surgery. © The Author(s) 2014.
A curvature-based weighted fuzzy c-means algorithm for point clouds de-noising
NASA Astrophysics Data System (ADS)
Cui, Xin; Li, Shipeng; Yan, Xiutian; He, Xinhua
2018-04-01
In order to remove the noise of three-dimensional scattered point cloud and smooth the data without damnify the sharp geometric feature simultaneity, a novel algorithm is proposed in this paper. The feature-preserving weight is added to fuzzy c-means algorithm which invented a curvature weighted fuzzy c-means clustering algorithm. Firstly, the large-scale outliers are removed by the statistics of r radius neighboring points. Then, the algorithm estimates the curvature of the point cloud data by using conicoid parabolic fitting method and calculates the curvature feature value. Finally, the proposed clustering algorithm is adapted to calculate the weighted cluster centers. The cluster centers are regarded as the new points. The experimental results show that this approach is efficient to different scale and intensities of noise in point cloud with a high precision, and perform a feature-preserving nature at the same time. Also it is robust enough to different noise model.
Liu, L L; Liu, M J; Ma, M
2015-09-28
The central task of this study was to mine the gene-to-medium relationship. Adequate knowledge of this relationship could potentially improve the accuracy of differentially expressed gene mining. One of the approaches to differentially expressed gene mining uses conventional clustering algorithms to identify the gene-to-medium relationship. Compared to conventional clustering algorithms, self-organization maps (SOMs) identify the nonlinear aspects of the gene-to-medium relationships by mapping the input space into another higher dimensional feature space. However, SOMs are not suitable for huge datasets consisting of millions of samples. Therefore, a new computational model, the Function Clustering Self-Organization Maps (FCSOMs), was developed. FCSOMs take advantage of the theory of granular computing as well as advanced statistical learning methodologies, and are built specifically for each information granule (a function cluster of genes), which are intelligently partitioned by the clustering algorithm provided by the DAVID_6.7 software platform. However, only the gene functions, and not their expression values, are considered in the fuzzy clustering algorithm of DAVID. Compared to the clustering algorithm of DAVID, these experimental results show a marked improvement in the accuracy of classification with the application of FCSOMs. FCSOMs can handle huge datasets and their complex classification problems, as each FCSOM (modeled for each function cluster) can be easily parallelized.
An agglomerative hierarchical clustering approach to visualisation in Bayesian clustering problems
Dawson, Kevin J.; Belkhir, Khalid
2009-01-01
Clustering problems (including the clustering of individuals into outcrossing populations, hybrid generations, full-sib families and selfing lines) have recently received much attention in population genetics. In these clustering problems, the parameter of interest is a partition of the set of sampled individuals, - the sample partition. In a fully Bayesian approach to clustering problems of this type, our knowledge about the sample partition is represented by a probability distribution on the space of possible sample partitions. Since the number of possible partitions grows very rapidly with the sample size, we can not visualise this probability distribution in its entirety, unless the sample is very small. As a solution to this visualisation problem, we recommend using an agglomerative hierarchical clustering algorithm, which we call the exact linkage algorithm. This algorithm is a special case of the maximin clustering algorithm that we introduced previously. The exact linkage algorithm is now implemented in our software package Partition View. The exact linkage algorithm takes the posterior co-assignment probabilities as input, and yields as output a rooted binary tree, - or more generally, a forest of such trees. Each node of this forest defines a set of individuals, and the node height is the posterior co-assignment probability of this set. This provides a useful visual representation of the uncertainty associated with the assignment of individuals to categories. It is also a useful starting point for a more detailed exploration of the posterior distribution in terms of the co-assignment probabilities. PMID:19337306
A Self-Organizing Spatial Clustering Approach to Support Large-Scale Network RTK Systems.
Shen, Lili; Guo, Jiming; Wang, Lei
2018-06-06
The network real-time kinematic (RTK) technique can provide centimeter-level real time positioning solutions and play a key role in geo-spatial infrastructure. With ever-increasing popularity, network RTK systems will face issues in the support of large numbers of concurrent users. In the past, high-precision positioning services were oriented towards professionals and only supported a few concurrent users. Currently, precise positioning provides a spatial foundation for artificial intelligence (AI), and countless smart devices (autonomous cars, unmanned aerial-vehicles (UAVs), robotic equipment, etc.) require precise positioning services. Therefore, the development of approaches to support large-scale network RTK systems is urgent. In this study, we proposed a self-organizing spatial clustering (SOSC) approach which automatically clusters online users to reduce the computational load on the network RTK system server side. The experimental results indicate that both the SOSC algorithm and the grid algorithm can reduce the computational load efficiently, while the SOSC algorithm gives a more elastic and adaptive clustering solution with different datasets. The SOSC algorithm determines the cluster number and the mean distance to cluster center (MDTCC) according to the data set, while the grid approaches are all predefined. The side-effects of clustering algorithms on the user side are analyzed with real global navigation satellite system (GNSS) data sets. The experimental results indicate that 10 km can be safely used as the cluster radius threshold for the SOSC algorithm without significantly reducing the positioning precision and reliability on the user side.
NASA Astrophysics Data System (ADS)
Zhang, Tianzhen; Wang, Xiumei; Gao, Xinbo
2018-04-01
Nowadays, several datasets are demonstrated by multi-view, which usually include shared and complementary information. Multi-view clustering methods integrate the information of multi-view to obtain better clustering results. Nonnegative matrix factorization has become an essential and popular tool in clustering methods because of its interpretation. However, existing nonnegative matrix factorization based multi-view clustering algorithms do not consider the disagreement between views and neglects the fact that different views will have different contributions to the data distribution. In this paper, we propose a new multi-view clustering method, named adaptive multi-view clustering based on nonnegative matrix factorization and pairwise co-regularization. The proposed algorithm can obtain the parts-based representation of multi-view data by nonnegative matrix factorization. Then, pairwise co-regularization is used to measure the disagreement between views. There is only one parameter to auto learning the weight values according to the contribution of each view to data distribution. Experimental results show that the proposed algorithm outperforms several state-of-the-arts algorithms for multi-view clustering.
The applicability and effectiveness of cluster analysis
NASA Technical Reports Server (NTRS)
Ingram, D. S.; Actkinson, A. L.
1973-01-01
An insight into the characteristics which determine the performance of a clustering algorithm is presented. In order for the techniques which are examined to accurately cluster data, two conditions must be simultaneously satisfied. First the data must have a particular structure, and second the parameters chosen for the clustering algorithm must be correct. By examining the structure of the data from the Cl flight line, it is clear that no single set of parameters can be used to accurately cluster all the different crops. The effectiveness of either a noniterative or iterative clustering algorithm to accurately cluster data representative of the Cl flight line is questionable. Thus extensive a prior knowledge is required in order to use cluster analysis in its present form for applications like assisting in the definition of field boundaries and evaluating the homogeneity of a field. New or modified techniques are necessary for clustering to be a reliable tool.
A synthetic dataset for evaluating soft and hard fusion algorithms
NASA Astrophysics Data System (ADS)
Graham, Jacob L.; Hall, David L.; Rimland, Jeffrey
2011-06-01
There is an emerging demand for the development of data fusion techniques and algorithms that are capable of combining conventional "hard" sensor inputs such as video, radar, and multispectral sensor data with "soft" data including textual situation reports, open-source web information, and "hard/soft" data such as image or video data that includes human-generated annotations. New techniques that assist in sense-making over a wide range of vastly heterogeneous sources are critical to improving tactical situational awareness in counterinsurgency (COIN) and other asymmetric warfare situations. A major challenge in this area is the lack of realistic datasets available for test and evaluation of such algorithms. While "soft" message sets exist, they tend to be of limited use for data fusion applications due to the lack of critical message pedigree and other metadata. They also lack corresponding hard sensor data that presents reasonable "fusion opportunities" to evaluate the ability to make connections and inferences that span the soft and hard data sets. This paper outlines the design methodologies, content, and some potential use cases of a COIN-based synthetic soft and hard dataset created under a United States Multi-disciplinary University Research Initiative (MURI) program funded by the U.S. Army Research Office (ARO). The dataset includes realistic synthetic reports from a variety of sources, corresponding synthetic hard data, and an extensive supporting database that maintains "ground truth" through logical grouping of related data into "vignettes." The supporting database also maintains the pedigree of messages and other critical metadata.
Hsu, Arthur L; Tang, Sen-Lin; Halgamuge, Saman K
2003-11-01
Current Self-Organizing Maps (SOMs) approaches to gene expression pattern clustering require the user to predefine the number of clusters likely to be expected. Hierarchical clustering methods used in this area do not provide unique partitioning of data. We describe an unsupervised dynamic hierarchical self-organizing approach, which suggests an appropriate number of clusters, to perform class discovery and marker gene identification in microarray data. In the process of class discovery, the proposed algorithm identifies corresponding sets of predictor genes that best distinguish one class from other classes. The approach integrates merits of hierarchical clustering with robustness against noise known from self-organizing approaches. The proposed algorithm applied to DNA microarray data sets of two types of cancers has demonstrated its ability to produce the most suitable number of clusters. Further, the corresponding marker genes identified through the unsupervised algorithm also have a strong biological relationship to the specific cancer class. The algorithm tested on leukemia microarray data, which contains three leukemia types, was able to determine three major and one minor cluster. Prediction models built for the four clusters indicate that the prediction strength for the smaller cluster is generally low, therefore labelled as uncertain cluster. Further analysis shows that the uncertain cluster can be subdivided further, and the subdivisions are related to two of the original clusters. Another test performed using colon cancer microarray data has automatically derived two clusters, which is consistent with the number of classes in data (cancerous and normal). JAVA software of dynamic SOM tree algorithm is available upon request for academic use. A comparison of rectangular and hexagonal topologies for GSOM is available from http://www.mame.mu.oz.au/mechatronics/journalinfo/Hsu2003supp.pdf
Internal Cluster Validation on Earthquake Data in the Province of Bengkulu
NASA Astrophysics Data System (ADS)
Rini, D. S.; Novianti, P.; Fransiska, H.
2018-04-01
K-means method is an algorithm for cluster n object based on attribute to k partition, where k < n. There is a deficiency of algorithms that is before the algorithm is executed, k points are initialized randomly so that the resulting data clustering can be different. If the random value for initialization is not good, the clustering becomes less optimum. Cluster validation is a technique to determine the optimum cluster without knowing prior information from data. There are two types of cluster validation, which are internal cluster validation and external cluster validation. This study aims to examine and apply some internal cluster validation, including the Calinski-Harabasz (CH) Index, Sillhouette (S) Index, Davies-Bouldin (DB) Index, Dunn Index (D), and S-Dbw Index on earthquake data in the Bengkulu Province. The calculation result of optimum cluster based on internal cluster validation is CH index, S index, and S-Dbw index yield k = 2, DB Index with k = 6 and Index D with k = 15. Optimum cluster (k = 6) based on DB Index gives good results for clustering earthquake in the Bengkulu Province.
NASA Technical Reports Server (NTRS)
Mach, Douglas M.; Christian, Hugh J.; Blakeslee, Richard; Boccippio, Dennis J.; Goodman, Steve J.; Boeck, William
2006-01-01
We describe the clustering algorithm used by the Lightning Imaging Sensor (LIS) and the Optical Transient Detector (OTD) for combining the lightning pulse data into events, groups, flashes, and areas. Events are single pixels that exceed the LIS/OTD background level during a single frame (2 ms). Groups are clusters of events that occur within the same frame and in adjacent pixels. Flashes are clusters of groups that occur within 330 ms and either 5.5 km (for LIS) or 16.5 km (for OTD) of each other. Areas are clusters of flashes that occur within 16.5 km of each other. Many investigators are utilizing the LIS/OTD flash data; therefore, we test how variations in the algorithms for the event group and group-flash clustering affect the flash count for a subset of the LIS data. We divided the subset into areas with low (1-3), medium (4-15), high (16-63), and very high (64+) flashes to see how changes in the clustering parameters affect the flash rates in these different sizes of areas. We found that as long as the cluster parameters are within about a factor of two of the current values, the flash counts do not change by more than about 20%. Therefore, the flash clustering algorithm used by the LIS and OTD sensors create flash rates that are relatively insensitive to reasonable variations in the clustering algorithms.
NASA Astrophysics Data System (ADS)
Rahman, Md. Habibur; Matin, M. A.; Salma, Umma
2017-12-01
The precipitation patterns of seventeen locations in Bangladesh from 1961 to 2014 were studied using a cluster analysis and metric multidimensional scaling. In doing so, the current research applies four major hierarchical clustering methods to precipitation in conjunction with different dissimilarity measures and metric multidimensional scaling. A variety of clustering algorithms were used to provide multiple clustering dendrograms for a mixture of distance measures. The dendrogram of pre-monsoon rainfall for the seventeen locations formed five clusters. The pre-monsoon precipitation data for the areas of Srimangal and Sylhet were located in two clusters across the combination of five dissimilarity measures and four hierarchical clustering algorithms. The single linkage algorithm with Euclidian and Manhattan distances, the average linkage algorithm with the Minkowski distance, and Ward's linkage algorithm provided similar results with regard to monsoon precipitation. The results of the post-monsoon and winter precipitation data are shown in different types of dendrograms with disparate combinations of sub-clusters. The schematic geometrical representations of the precipitation data using metric multidimensional scaling showed that the post-monsoon rainfall of Cox's Bazar was located far from those of the other locations. The results of a box-and-whisker plot, different clustering techniques, and metric multidimensional scaling indicated that the precipitation behaviour of Srimangal and Sylhet during the pre-monsoon season, Cox's Bazar and Sylhet during the monsoon season, Maijdi Court and Cox's Bazar during the post-monsoon season, and Cox's Bazar and Khulna during the winter differed from those at other locations in Bangladesh.
An adaptive clustering algorithm for image matching based on corner feature
NASA Astrophysics Data System (ADS)
Wang, Zhe; Dong, Min; Mu, Xiaomin; Wang, Song
2018-04-01
The traditional image matching algorithm always can not balance the real-time and accuracy better, to solve the problem, an adaptive clustering algorithm for image matching based on corner feature is proposed in this paper. The method is based on the similarity of the matching pairs of vector pairs, and the adaptive clustering is performed on the matching point pairs. Harris corner detection is carried out first, the feature points of the reference image and the perceived image are extracted, and the feature points of the two images are first matched by Normalized Cross Correlation (NCC) function. Then, using the improved algorithm proposed in this paper, the matching results are clustered to reduce the ineffective operation and improve the matching speed and robustness. Finally, the Random Sample Consensus (RANSAC) algorithm is used to match the matching points after clustering. The experimental results show that the proposed algorithm can effectively eliminate the most wrong matching points while the correct matching points are retained, and improve the accuracy of RANSAC matching, reduce the computation load of whole matching process at the same time.
Service-Aware Clustering: An Energy-Efficient Model for the Internet-of-Things
Bagula, Antoine; Abidoye, Ademola Philip; Zodi, Guy-Alain Lusilao
2015-01-01
Current generation wireless sensor routing algorithms and protocols have been designed based on a myopic routing approach, where the motes are assumed to have the same sensing and communication capabilities. Myopic routing is not a natural fit for the IoT, as it may lead to energy imbalance and subsequent short-lived sensor networks, routing the sensor readings over the most service-intensive sensor nodes, while leaving the least active nodes idle. This paper revisits the issue of energy efficiency in sensor networks to propose a clustering model where sensor devices’ service delivery is mapped into an energy awareness model, used to design a clustering algorithm that finds service-aware clustering (SAC) configurations in IoT settings. The performance evaluation reveals the relative energy efficiency of the proposed SAC algorithm compared to related routing algorithms in terms of energy consumption, the sensor nodes’ life span and its traffic engineering efficiency in terms of throughput and delay. These include the well-known low energy adaptive clustering hierarchy (LEACH) and LEACH-centralized (LEACH-C) algorithms, as well as the most recent algorithms, such as DECSA and MOCRN. PMID:26703619
Service-Aware Clustering: An Energy-Efficient Model for the Internet-of-Things.
Bagula, Antoine; Abidoye, Ademola Philip; Zodi, Guy-Alain Lusilao
2015-12-23
Current generation wireless sensor routing algorithms and protocols have been designed based on a myopic routing approach, where the motes are assumed to have the same sensing and communication capabilities. Myopic routing is not a natural fit for the IoT, as it may lead to energy imbalance and subsequent short-lived sensor networks, routing the sensor readings over the most service-intensive sensor nodes, while leaving the least active nodes idle. This paper revisits the issue of energy efficiency in sensor networks to propose a clustering model where sensor devices' service delivery is mapped into an energy awareness model, used to design a clustering algorithm that finds service-aware clustering (SAC) configurations in IoT settings. The performance evaluation reveals the relative energy efficiency of the proposed SAC algorithm compared to related routing algorithms in terms of energy consumption, the sensor nodes' life span and its traffic engineering efficiency in terms of throughput and delay. These include the well-known low energy adaptive clustering hierarchy (LEACH) and LEACH-centralized (LEACH-C) algorithms, as well as the most recent algorithms, such as DECSA and MOCRN.
Convex Clustering: An Attractive Alternative to Hierarchical Clustering
Chen, Gary K.; Chi, Eric C.; Ranola, John Michael O.; Lange, Kenneth
2015-01-01
The primary goal in cluster analysis is to discover natural groupings of objects. The field of cluster analysis is crowded with diverse methods that make special assumptions about data and address different scientific aims. Despite its shortcomings in accuracy, hierarchical clustering is the dominant clustering method in bioinformatics. Biologists find the trees constructed by hierarchical clustering visually appealing and in tune with their evolutionary perspective. Hierarchical clustering operates on multiple scales simultaneously. This is essential, for instance, in transcriptome data, where one may be interested in making qualitative inferences about how lower-order relationships like gene modules lead to higher-order relationships like pathways or biological processes. The recently developed method of convex clustering preserves the visual appeal of hierarchical clustering while ameliorating its propensity to make false inferences in the presence of outliers and noise. The solution paths generated by convex clustering reveal relationships between clusters that are hidden by static methods such as k-means clustering. The current paper derives and tests a novel proximal distance algorithm for minimizing the objective function of convex clustering. The algorithm separates parameters, accommodates missing data, and supports prior information on relationships. Our program CONVEXCLUSTER incorporating the algorithm is implemented on ATI and nVidia graphics processing units (GPUs) for maximal speed. Several biological examples illustrate the strengths of convex clustering and the ability of the proximal distance algorithm to handle high-dimensional problems. CONVEXCLUSTER can be freely downloaded from the UCLA Human Genetics web site at http://www.genetics.ucla.edu/software/ PMID:25965340
Convex clustering: an attractive alternative to hierarchical clustering.
Chen, Gary K; Chi, Eric C; Ranola, John Michael O; Lange, Kenneth
2015-05-01
The primary goal in cluster analysis is to discover natural groupings of objects. The field of cluster analysis is crowded with diverse methods that make special assumptions about data and address different scientific aims. Despite its shortcomings in accuracy, hierarchical clustering is the dominant clustering method in bioinformatics. Biologists find the trees constructed by hierarchical clustering visually appealing and in tune with their evolutionary perspective. Hierarchical clustering operates on multiple scales simultaneously. This is essential, for instance, in transcriptome data, where one may be interested in making qualitative inferences about how lower-order relationships like gene modules lead to higher-order relationships like pathways or biological processes. The recently developed method of convex clustering preserves the visual appeal of hierarchical clustering while ameliorating its propensity to make false inferences in the presence of outliers and noise. The solution paths generated by convex clustering reveal relationships between clusters that are hidden by static methods such as k-means clustering. The current paper derives and tests a novel proximal distance algorithm for minimizing the objective function of convex clustering. The algorithm separates parameters, accommodates missing data, and supports prior information on relationships. Our program CONVEXCLUSTER incorporating the algorithm is implemented on ATI and nVidia graphics processing units (GPUs) for maximal speed. Several biological examples illustrate the strengths of convex clustering and the ability of the proximal distance algorithm to handle high-dimensional problems. CONVEXCLUSTER can be freely downloaded from the UCLA Human Genetics web site at http://www.genetics.ucla.edu/software/.
Optimization and real-time control for laser treatment of heterogeneous soft tissues.
Feng, Yusheng; Fuentes, David; Hawkins, Andrea; Bass, Jon M; Rylander, Marissa Nichole
2009-01-01
Predicting the outcome of thermotherapies in cancer treatment requires an accurate characterization of the bioheat transfer processes in soft tissues. Due to the biological and structural complexity of tumor (soft tissue) composition and vasculature, it is often very difficult to obtain reliable tissue properties that is one of the key factors for the accurate treatment outcome prediction. Efficient algorithms employing in vivo thermal measurements to determine heterogeneous thermal tissues properties in conjunction with a detailed sensitivity analysis can produce essential information for model development and optimal control. The goals of this paper are to present a general formulation of the bioheat transfer equation for heterogeneous soft tissues, review models and algorithms developed for cell damage, heat shock proteins, and soft tissues with nanoparticle inclusion, and demonstrate an overall computational strategy for developing a laser treatment framework with the ability to perform real-time robust calibrations and optimal control. This computational strategy can be applied to other thermotherapies using the heat source such as radio frequency or high intensity focused ultrasound.
Mustapha, Ibrahim; Ali, Borhanuddin Mohd; Rasid, Mohd Fadlee A.; Sali, Aduwati; Mohamad, Hafizal
2015-01-01
It is well-known that clustering partitions network into logical groups of nodes in order to achieve energy efficiency and to enhance dynamic channel access in cognitive radio through cooperative sensing. While the topic of energy efficiency has been well investigated in conventional wireless sensor networks, the latter has not been extensively explored. In this paper, we propose a reinforcement learning-based spectrum-aware clustering algorithm that allows a member node to learn the energy and cooperative sensing costs for neighboring clusters to achieve an optimal solution. Each member node selects an optimal cluster that satisfies pairwise constraints, minimizes network energy consumption and enhances channel sensing performance through an exploration technique. We first model the network energy consumption and then determine the optimal number of clusters for the network. The problem of selecting an optimal cluster is formulated as a Markov Decision Process (MDP) in the algorithm and the obtained simulation results show convergence, learning and adaptability of the algorithm to dynamic environment towards achieving an optimal solution. Performance comparisons of our algorithm with the Groupwise Spectrum Aware (GWSA)-based algorithm in terms of Sum of Square Error (SSE), complexity, network energy consumption and probability of detection indicate improved performance from the proposed approach. The results further reveal that an energy savings of 9% and a significant Primary User (PU) detection improvement can be achieved with the proposed approach. PMID:26287191
Mustapha, Ibrahim; Mohd Ali, Borhanuddin; Rasid, Mohd Fadlee A; Sali, Aduwati; Mohamad, Hafizal
2015-08-13
It is well-known that clustering partitions network into logical groups of nodes in order to achieve energy efficiency and to enhance dynamic channel access in cognitive radio through cooperative sensing. While the topic of energy efficiency has been well investigated in conventional wireless sensor networks, the latter has not been extensively explored. In this paper, we propose a reinforcement learning-based spectrum-aware clustering algorithm that allows a member node to learn the energy and cooperative sensing costs for neighboring clusters to achieve an optimal solution. Each member node selects an optimal cluster that satisfies pairwise constraints, minimizes network energy consumption and enhances channel sensing performance through an exploration technique. We first model the network energy consumption and then determine the optimal number of clusters for the network. The problem of selecting an optimal cluster is formulated as a Markov Decision Process (MDP) in the algorithm and the obtained simulation results show convergence, learning and adaptability of the algorithm to dynamic environment towards achieving an optimal solution. Performance comparisons of our algorithm with the Groupwise Spectrum Aware (GWSA)-based algorithm in terms of Sum of Square Error (SSE), complexity, network energy consumption and probability of detection indicate improved performance from the proposed approach. The results further reveal that an energy savings of 9% and a significant Primary User (PU) detection improvement can be achieved with the proposed approach.
Balouchestani, Mohammadreza; Krishnan, Sridhar
2014-01-01
Long-term recording of Electrocardiogram (ECG) signals plays an important role in health care systems for diagnostic and treatment purposes of heart diseases. Clustering and classification of collecting data are essential parts for detecting concealed information of P-QRS-T waves in the long-term ECG recording. Currently used algorithms do have their share of drawbacks: 1) clustering and classification cannot be done in real time; 2) they suffer from huge energy consumption and load of sampling. These drawbacks motivated us in developing novel optimized clustering algorithm which could easily scan large ECG datasets for establishing low power long-term ECG recording. In this paper, we present an advanced K-means clustering algorithm based on Compressed Sensing (CS) theory as a random sampling procedure. Then, two dimensionality reduction methods: Principal Component Analysis (PCA) and Linear Correlation Coefficient (LCC) followed by sorting the data using the K-Nearest Neighbours (K-NN) and Probabilistic Neural Network (PNN) classifiers are applied to the proposed algorithm. We show our algorithm based on PCA features in combination with K-NN classifier shows better performance than other methods. The proposed algorithm outperforms existing algorithms by increasing 11% classification accuracy. In addition, the proposed algorithm illustrates classification accuracy for K-NN and PNN classifiers, and a Receiver Operating Characteristics (ROC) area of 99.98%, 99.83%, and 99.75% respectively.
Kim, Hyoungrae; Jang, Cheongyun; Yadav, Dharmendra K; Kim, Mi-Hyun
2017-03-23
The accuracy of any 3D-QSAR, Pharmacophore and 3D-similarity based chemometric target fishing models are highly dependent on a reasonable sample of active conformations. Since a number of diverse conformational sampling algorithm exist, which exhaustively generate enough conformers, however model building methods relies on explicit number of common conformers. In this work, we have attempted to make clustering algorithms, which could find reasonable number of representative conformer ensembles automatically with asymmetric dissimilarity matrix generated from openeye tool kit. RMSD was the important descriptor (variable) of each column of the N × N matrix considered as N variables describing the relationship (network) between the conformer (in a row) and the other N conformers. This approach used to evaluate the performance of the well-known clustering algorithms by comparison in terms of generating representative conformer ensembles and test them over different matrix transformation functions considering the stability. In the network, the representative conformer group could be resampled for four kinds of algorithms with implicit parameters. The directed dissimilarity matrix becomes the only input to the clustering algorithms. Dunn index, Davies-Bouldin index, Eta-squared values and omega-squared values were used to evaluate the clustering algorithms with respect to the compactness and the explanatory power. The evaluation includes the reduction (abstraction) rate of the data, correlation between the sizes of the population and the samples, the computational complexity and the memory usage as well. Every algorithm could find representative conformers automatically without any user intervention, and they reduced the data to 14-19% of the original values within 1.13 s per sample at the most. The clustering methods are simple and practical as they are fast and do not ask for any explicit parameters. RCDTC presented the maximum Dunn and omega-squared values of the four algorithms in addition to consistent reduction rate between the population size and the sample size. The performance of the clustering algorithms was consistent over different transformation functions. Moreover, the clustering method can also be applied to molecular dynamics sampling simulation results.
A Modified MinMax k-Means Algorithm Based on PSO.
Wang, Xiaoyan; Bai, Yanping
The MinMax k -means algorithm is widely used to tackle the effect of bad initialization by minimizing the maximum intraclustering errors. Two parameters, including the exponent parameter and memory parameter, are involved in the executive process. Since different parameters have different clustering errors, it is crucial to choose appropriate parameters. In the original algorithm, a practical framework is given. Such framework extends the MinMax k -means to automatically adapt the exponent parameter to the data set. It has been believed that if the maximum exponent parameter has been set, then the programme can reach the lowest intraclustering errors. However, our experiments show that this is not always correct. In this paper, we modified the MinMax k -means algorithm by PSO to determine the proper values of parameters which can subject the algorithm to attain the lowest clustering errors. The proposed clustering method is tested on some favorite data sets in several different initial situations and is compared to the k -means algorithm and the original MinMax k -means algorithm. The experimental results indicate that our proposed algorithm can reach the lowest clustering errors automatically.
Testing a Firefly-Inspired Synchronization Algorithm in a Complex Wireless Sensor Network
Hao, Chuangbo; Song, Ping; Yang, Cheng; Liu, Xiongjun
2017-01-01
Data acquisition is the foundation of soft sensor and data fusion. Distributed data acquisition and its synchronization are the important technologies to ensure the accuracy of soft sensors. As a research topic in bionic science, the firefly-inspired algorithm has attracted widespread attention as a new synchronization method. Aiming at reducing the design difficulty of firefly-inspired synchronization algorithms for Wireless Sensor Networks (WSNs) with complex topologies, this paper presents a firefly-inspired synchronization algorithm based on a multiscale discrete phase model that can optimize the performance tradeoff between the network scalability and synchronization capability in a complex wireless sensor network. The synchronization process can be regarded as a Markov state transition, which ensures the stability of this algorithm. Compared with the Miroll and Steven model and Reachback Firefly Algorithm, the proposed algorithm obtains better stability and performance. Finally, its practicality has been experimentally confirmed using 30 nodes in a real multi-hop topology with low quality links. PMID:28282899
Testing a Firefly-Inspired Synchronization Algorithm in a Complex Wireless Sensor Network.
Hao, Chuangbo; Song, Ping; Yang, Cheng; Liu, Xiongjun
2017-03-08
Data acquisition is the foundation of soft sensor and data fusion. Distributed data acquisition and its synchronization are the important technologies to ensure the accuracy of soft sensors. As a research topic in bionic science, the firefly-inspired algorithm has attracted widespread attention as a new synchronization method. Aiming at reducing the design difficulty of firefly-inspired synchronization algorithms for Wireless Sensor Networks (WSNs) with complex topologies, this paper presents a firefly-inspired synchronization algorithm based on a multiscale discrete phase model that can optimize the performance tradeoff between the network scalability and synchronization capability in a complex wireless sensor network. The synchronization process can be regarded as a Markov state transition, which ensures the stability of this algorithm. Compared with the Miroll and Steven model and Reachback Firefly Algorithm, the proposed algorithm obtains better stability and performance. Finally, its practicality has been experimentally confirmed using 30 nodes in a real multi-hop topology with low quality links.
Learner Typologies Development Using OIndex and Data Mining Based Clustering Techniques
ERIC Educational Resources Information Center
Luan, Jing
2004-01-01
This explorative data mining project used distance based clustering algorithm to study 3 indicators, called OIndex, of student behavioral data and stabilized at a 6-cluster scenario following an exhaustive explorative study of 4, 5, and 6 cluster scenarios produced by K-Means and TwoStep algorithms. Using principles in data mining, the study…
Self-organization and clustering algorithms
NASA Technical Reports Server (NTRS)
Bezdek, James C.
1991-01-01
Kohonen's feature maps approach to clustering is often likened to the k or c-means clustering algorithms. Here, the author identifies some similarities and differences between the hard and fuzzy c-Means (HCM/FCM) or ISODATA algorithms and Kohonen's self-organizing approach. The author concludes that some differences are significant, but at the same time there may be some important unknown relationships between the two methodologies. Several avenues of research are proposed.
Analysis of Network Clustering Algorithms and Cluster Quality Metrics at Scale.
Emmons, Scott; Kobourov, Stephen; Gallant, Mike; Börner, Katy
2016-01-01
Notions of community quality underlie the clustering of networks. While studies surrounding network clustering are increasingly common, a precise understanding of the realtionship between different cluster quality metrics is unknown. In this paper, we examine the relationship between stand-alone cluster quality metrics and information recovery metrics through a rigorous analysis of four widely-used network clustering algorithms-Louvain, Infomap, label propagation, and smart local moving. We consider the stand-alone quality metrics of modularity, conductance, and coverage, and we consider the information recovery metrics of adjusted Rand score, normalized mutual information, and a variant of normalized mutual information used in previous work. Our study includes both synthetic graphs and empirical data sets of sizes varying from 1,000 to 1,000,000 nodes. We find significant differences among the results of the different cluster quality metrics. For example, clustering algorithms can return a value of 0.4 out of 1 on modularity but score 0 out of 1 on information recovery. We find conductance, though imperfect, to be the stand-alone quality metric that best indicates performance on the information recovery metrics. Additionally, our study shows that the variant of normalized mutual information used in previous work cannot be assumed to differ only slightly from traditional normalized mutual information. Smart local moving is the overall best performing algorithm in our study, but discrepancies between cluster evaluation metrics prevent us from declaring it an absolutely superior algorithm. Interestingly, Louvain performed better than Infomap in nearly all the tests in our study, contradicting the results of previous work in which Infomap was superior to Louvain. We find that although label propagation performs poorly when clusters are less clearly defined, it scales efficiently and accurately to large graphs with well-defined clusters.
Reconstruction of a digital core containing clay minerals based on a clustering algorithm.
He, Yanlong; Pu, Chunsheng; Jing, Cheng; Gu, Xiaoyu; Chen, Qingdong; Liu, Hongzhi; Khan, Nasir; Dong, Qiaoling
2017-10-01
It is difficult to obtain a core sample and information for digital core reconstruction of mature sandstone reservoirs around the world, especially for an unconsolidated sandstone reservoir. Meanwhile, reconstruction and division of clay minerals play a vital role in the reconstruction of the digital cores, although the two-dimensional data-based reconstruction methods are specifically applicable as the microstructure reservoir simulation methods for the sandstone reservoir. However, reconstruction of clay minerals is still challenging from a research viewpoint for the better reconstruction of various clay minerals in the digital cores. In the present work, the content of clay minerals was considered on the basis of two-dimensional information about the reservoir. After application of the hybrid method, and compared with the model reconstructed by the process-based method, the digital core containing clay clusters without the labels of the clusters' number, size, and texture were the output. The statistics and geometry of the reconstruction model were similar to the reference model. In addition, the Hoshen-Kopelman algorithm was used to label various connected unclassified clay clusters in the initial model and then the number and size of clay clusters were recorded. At the same time, the K-means clustering algorithm was applied to divide the labeled, large connecting clusters into smaller clusters on the basis of difference in the clusters' characteristics. According to the clay minerals' characteristics, such as types, textures, and distributions, the digital core containing clay minerals was reconstructed by means of the clustering algorithm and the clay clusters' structure judgment. The distributions and textures of the clay minerals of the digital core were reasonable. The clustering algorithm improved the digital core reconstruction and provided an alternative method for the simulation of different clay minerals in the digital cores.
Validating clustering of molecular dynamics simulations using polymer models.
Phillips, Joshua L; Colvin, Michael E; Newsam, Shawn
2011-11-14
Molecular dynamics (MD) simulation is a powerful technique for sampling the meta-stable and transitional conformations of proteins and other biomolecules. Computational data clustering has emerged as a useful, automated technique for extracting conformational states from MD simulation data. Despite extensive application, relatively little work has been done to determine if the clustering algorithms are actually extracting useful information. A primary goal of this paper therefore is to provide such an understanding through a detailed analysis of data clustering applied to a series of increasingly complex biopolymer models. We develop a novel series of models using basic polymer theory that have intuitive, clearly-defined dynamics and exhibit the essential properties that we are seeking to identify in MD simulations of real biomolecules. We then apply spectral clustering, an algorithm particularly well-suited for clustering polymer structures, to our models and MD simulations of several intrinsically disordered proteins. Clustering results for the polymer models provide clear evidence that the meta-stable and transitional conformations are detected by the algorithm. The results for the polymer models also help guide the analysis of the disordered protein simulations by comparing and contrasting the statistical properties of the extracted clusters. We have developed a framework for validating the performance and utility of clustering algorithms for studying molecular biopolymer simulations that utilizes several analytic and dynamic polymer models which exhibit well-behaved dynamics including: meta-stable states, transition states, helical structures, and stochastic dynamics. We show that spectral clustering is robust to anomalies introduced by structural alignment and that different structural classes of intrinsically disordered proteins can be reliably discriminated from the clustering results. To our knowledge, our framework is the first to utilize model polymers to rigorously test the utility of clustering algorithms for studying biopolymers.
Validating clustering of molecular dynamics simulations using polymer models
2011-01-01
Background Molecular dynamics (MD) simulation is a powerful technique for sampling the meta-stable and transitional conformations of proteins and other biomolecules. Computational data clustering has emerged as a useful, automated technique for extracting conformational states from MD simulation data. Despite extensive application, relatively little work has been done to determine if the clustering algorithms are actually extracting useful information. A primary goal of this paper therefore is to provide such an understanding through a detailed analysis of data clustering applied to a series of increasingly complex biopolymer models. Results We develop a novel series of models using basic polymer theory that have intuitive, clearly-defined dynamics and exhibit the essential properties that we are seeking to identify in MD simulations of real biomolecules. We then apply spectral clustering, an algorithm particularly well-suited for clustering polymer structures, to our models and MD simulations of several intrinsically disordered proteins. Clustering results for the polymer models provide clear evidence that the meta-stable and transitional conformations are detected by the algorithm. The results for the polymer models also help guide the analysis of the disordered protein simulations by comparing and contrasting the statistical properties of the extracted clusters. Conclusions We have developed a framework for validating the performance and utility of clustering algorithms for studying molecular biopolymer simulations that utilizes several analytic and dynamic polymer models which exhibit well-behaved dynamics including: meta-stable states, transition states, helical structures, and stochastic dynamics. We show that spectral clustering is robust to anomalies introduced by structural alignment and that different structural classes of intrinsically disordered proteins can be reliably discriminated from the clustering results. To our knowledge, our framework is the first to utilize model polymers to rigorously test the utility of clustering algorithms for studying biopolymers. PMID:22082218
NASA Astrophysics Data System (ADS)
Bonamente, Massimiliano; Nevalainen, Jukka
2011-09-01
We present spatially resolved spectroscopy of the galaxy cluster AS1101, also known as Sèrsic 159-03, with Chandra, XMM-Newton, and ROSAT, and investigate the presence of soft X-ray excess emission above the contribution from the hot intracluster medium. In earlier papers we reported an extremely bright soft excess component that reached 100% of the thermal radiation in the R2 ROSAT band (0.2-0.4 keV), using the H I column density measurement by Dickey and Lockman. In this paper we use the newer Leiden-Argentine-Bonn survey measurements of the H I column density toward AS1101, significantly lower than the previous value, and show that the soft excess emission in AS1101 is now at the level of 10%-20% of the hot gas emission, in line with those of a large sample of clusters analyzed by Bonamente et al. in 2002. The ROSAT soft excess emission is detected regardless of calibration uncertainties between Chandra and XMM-Newton. This new analysis of AS1101 indicates that the 1/4 keV band emission is compatible with the presence of warm-hot intergalactic medium (WHIM) filaments connected to the cluster and extending outward into the intergalactic medium; the temperatures we find in this study are typically lower than those of the WHIM probed in other X-ray studies. We also show that the soft excess emission is compatible with a non-thermal origin as the inverse Compton scattering of relativistic electrons off the cosmic microwave background, with pressure less than 1% of the thermal electrons.
An Enhanced K-Means Algorithm for Water Quality Analysis of The Haihe River in China
Zou, Hui; Zou, Zhihong; Wang, Xiaojing
2015-01-01
The increase and the complexity of data caused by the uncertain environment is today’s reality. In order to identify water quality effectively and reliably, this paper presents a modified fast clustering algorithm for water quality analysis. The algorithm has adopted a varying weights K-means cluster algorithm to analyze water monitoring data. The varying weights scheme was the best weighting indicator selected by a modified indicator weight self-adjustment algorithm based on K-means, which is named MIWAS-K-means. The new clustering algorithm avoids the margin of the iteration not being calculated in some cases. With the fast clustering analysis, we can identify the quality of water samples. The algorithm is applied in water quality analysis of the Haihe River (China) data obtained by the monitoring network over a period of eight years (2006–2013) with four indicators at seven different sites (2078 samples). Both the theoretical and simulated results demonstrate that the algorithm is efficient and reliable for water quality analysis of the Haihe River. In addition, the algorithm can be applied to more complex data matrices with high dimensionality. PMID:26569283
Growth of Ni nanoclusters on irradiated graphene: a molecular dynamics study.
Valencia, F J; Hernandez-Vazquez, E E; Bringa, E M; Moran-Lopez, J L; Rogan, J; Gonzalez, R I; Munoz, F
2018-04-23
We studied the soft landing of Ni atoms on a previously damaged graphene sheet by means of molecular dynamics simulations. We found a monotonic decrease of the cluster frequency as a function of its size, but few big clusters comprise an appreciable fraction of the total number of Ni atoms. The aggregation of Ni atoms is also modeled by means of a simple phenomenological model. The results are in clear contrast with the case of hard or energetic landing of metal atoms, where there is a tendency to form mono-disperse metal clusters. This behavior is attributed to the high diffusion of unattached Ni atoms, together with vacancies acting as capture centers. The findings of this work show that a simple study of the energetics of the system is not enough in the soft landing regime, where it is unavoidable to also consider the growth process of metal clusters.
Eyler, Lauren; Hubbard, Alan; Juillard, Catherine
2016-10-01
Low and middle-income countries (LMICs) and the world's poor bear a disproportionate share of the global burden of injury. Data regarding disparities in injury are vital to inform injury prevention and trauma systems strengthening interventions targeted towards vulnerable populations, but are limited in LMICs. We aim to facilitate injury disparities research by generating a standardized methodology for assessing economic status in resource-limited country trauma registries where complex metrics such as income, expenditures, and wealth index are infeasible to assess. To address this need, we developed a cluster analysis-based algorithm for generating simple population-specific metrics of economic status using nationally representative Demographic and Health Surveys (DHS) household assets data. For a limited number of variables, g, our algorithm performs weighted k-medoids clustering of the population using all combinations of g asset variables and selects the combination of variables and number of clusters that maximize average silhouette width (ASW). In simulated datasets containing both randomly distributed variables and "true" population clusters defined by correlated categorical variables, the algorithm selected the correct variable combination and appropriate cluster numbers unless variable correlation was very weak. When used with 2011 Cameroonian DHS data, our algorithm identified twenty economic clusters with ASW 0.80, indicating well-defined population clusters. This economic model for assessing health disparities will be used in the new Cameroonian six-hospital centralized trauma registry. By describing our standardized methodology and algorithm for generating economic clustering models, we aim to facilitate measurement of health disparities in other trauma registries in resource-limited countries. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
An effective fuzzy kernel clustering analysis approach for gene expression data.
Sun, Lin; Xu, Jiucheng; Yin, Jiaojiao
2015-01-01
Fuzzy clustering is an important tool for analyzing microarray data. A major problem in applying fuzzy clustering method to microarray gene expression data is the choice of parameters with cluster number and centers. This paper proposes a new approach to fuzzy kernel clustering analysis (FKCA) that identifies desired cluster number and obtains more steady results for gene expression data. First of all, to optimize characteristic differences and estimate optimal cluster number, Gaussian kernel function is introduced to improve spectrum analysis method (SAM). By combining subtractive clustering with max-min distance mean, maximum distance method (MDM) is proposed to determine cluster centers. Then, the corresponding steps of improved SAM (ISAM) and MDM are given respectively, whose superiority and stability are illustrated through performing experimental comparisons on gene expression data. Finally, by introducing ISAM and MDM into FKCA, an effective improved FKCA algorithm is proposed. Experimental results from public gene expression data and UCI database show that the proposed algorithms are feasible for cluster analysis, and the clustering accuracy is higher than the other related clustering algorithms.
NASA Astrophysics Data System (ADS)
Ebrahimi, A.; Pahlavani, P.; Masoumi, Z.
2017-09-01
Traffic monitoring and managing in urban intelligent transportation systems (ITS) can be carried out based on vehicular sensor networks. In a vehicular sensor network, vehicles equipped with sensors such as GPS, can act as mobile sensors for sensing the urban traffic and sending the reports to a traffic monitoring center (TMC) for traffic estimation. The energy consumption by the sensor nodes is a main problem in the wireless sensor networks (WSNs); moreover, it is the most important feature in designing these networks. Clustering the sensor nodes is considered as an effective solution to reduce the energy consumption of WSNs. Each cluster should have a Cluster Head (CH), and a number of nodes located within its supervision area. The cluster heads are responsible for gathering and aggregating the information of clusters. Then, it transmits the information to the data collection center. Hence, the use of clustering decreases the volume of transmitting information, and, consequently, reduces the energy consumption of network. In this paper, Fuzzy C-Means (FCM) and Fuzzy Subtractive algorithms are employed to cluster sensors and investigate their performance on the energy consumption of sensors. It can be seen that the FCM algorithm and Fuzzy Subtractive have been reduced energy consumption of vehicle sensors up to 90.68% and 92.18%, respectively. Comparing the performance of the algorithms implies the 1.5 percent improvement in Fuzzy Subtractive algorithm in comparison.
NASA Astrophysics Data System (ADS)
Arimbi, Mentari Dian; Bustamam, Alhadi; Lestari, Dian
2017-03-01
Data clustering can be executed through partition or hierarchical method for many types of data including DNA sequences. Both clustering methods can be combined by processing partition algorithm in the first level and hierarchical in the second level, called hybrid clustering. In the partition phase some popular methods such as PAM, K-means, or Fuzzy c-means methods could be applied. In this study we selected partitioning around medoids (PAM) in our partition stage. Furthermore, following the partition algorithm, in hierarchical stage we applied divisive analysis algorithm (DIANA) in order to have more specific clusters and sub clusters structures. The number of main clusters is determined using Davies Bouldin Index (DBI) value. We choose the optimal number of clusters if the results minimize the DBI value. In this work, we conduct the clustering on 1252 HPV DNA sequences data from GenBank. The characteristic extraction is initially performed, followed by normalizing and genetic distance calculation using Euclidean distance. In our implementation, we used the hybrid PAM and DIANA using the R open source programming tool. In our results, we obtained 3 main clusters with average DBI value is 0.979, using PAM in the first stage. After executing DIANA in the second stage, we obtained 4 sub clusters for Cluster-1, 9 sub clusters for Cluster-2 and 2 sub clusters in Cluster-3, with the BDI value 0.972, 0.771, and 0.768 for each main cluster respectively. Since the second stage produce lower DBI value compare to the DBI value in the first stage, we conclude that this hybrid approach can improve the accuracy of our clustering results.
Detection of protein complex from protein-protein interaction network using Markov clustering
NASA Astrophysics Data System (ADS)
Ochieng, P. J.; Kusuma, W. A.; Haryanto, T.
2017-05-01
Detection of complexes, or groups of functionally related proteins, is an important challenge while analysing biological networks. However, existing algorithms to identify protein complexes are insufficient when applied to dense networks of experimentally derived interaction data. Therefore, we introduced a graph clustering method based on Markov clustering algorithm to identify protein complex within highly interconnected protein-protein interaction networks. Protein-protein interaction network was first constructed to develop geometrical network, the network was then partitioned using Markov clustering to detect protein complexes. The interest of the proposed method was illustrated by its application to Human Proteins associated to type II diabetes mellitus. Flow simulation of MCL algorithm was initially performed and topological properties of the resultant network were analysed for detection of the protein complex. The results indicated the proposed method successfully detect an overall of 34 complexes with 11 complexes consisting of overlapping modules and 20 non-overlapping modules. The major complex consisted of 102 proteins and 521 interactions with cluster modularity and density of 0.745 and 0.101 respectively. The comparison analysis revealed MCL out perform AP, MCODE and SCPS algorithms with high clustering coefficient (0.751) network density and modularity index (0.630). This demonstrated MCL was the most reliable and efficient graph clustering algorithm for detection of protein complexes from PPI networks.
Zhou, Jianyong; Luo, Zu; Li, Chunquan; Deng, Mi
2018-01-01
When the meshless method is used to establish the mathematical-mechanical model of human soft tissues, it is necessary to define the space occupied by human tissues as the problem domain and the boundary of the domain as the surface of those tissues. Nodes should be distributed in both the problem domain and on the boundaries. Under external force, the displacement of the node is computed by the meshless method to represent the deformation of biological soft tissues. However, computation by the meshless method consumes too much time, which will affect the simulation of real-time deformation of human tissues in virtual surgery. In this article, the Marquardt's Algorithm is proposed to fit the nodal displacement at the problem domain's boundary and obtain the relationship between surface deformation and force. When different external forces are applied, the deformation of soft tissues can be quickly obtained based on this relationship. The analysis and discussion show that the improved model equations with Marquardt's Algorithm not only can simulate the deformation in real-time but also preserve the authenticity of the deformation model's physical properties. Copyright © 2017 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Kumar, Rohit; Puri, Rajeev K.
2018-03-01
Employing the quantum molecular dynamics (QMD) approach for nucleus-nucleus collisions, we test the predictive power of the energy-based clusterization algorithm, i.e., the simulating annealing clusterization algorithm (SACA), to describe the experimental data of charge distribution and various event-by-event correlations among fragments. The calculations are constrained into the Fermi-energy domain and/or mildly excited nuclear matter. Our detailed study spans over different system masses, and system-mass asymmetries of colliding partners show the importance of the energy-based clusterization algorithm for understanding multifragmentation. The present calculations are also compared with the other available calculations, which use one-body models, statistical models, and/or hybrid models.
Optimized data fusion for K-means Laplacian clustering
Yu, Shi; Liu, Xinhai; Tranchevent, Léon-Charles; Glänzel, Wolfgang; Suykens, Johan A. K.; De Moor, Bart; Moreau, Yves
2011-01-01
Motivation: We propose a novel algorithm to combine multiple kernels and Laplacians for clustering analysis. The new algorithm is formulated on a Rayleigh quotient objective function and is solved as a bi-level alternating minimization procedure. Using the proposed algorithm, the coefficients of kernels and Laplacians can be optimized automatically. Results: Three variants of the algorithm are proposed. The performance is systematically validated on two real-life data fusion applications. The proposed Optimized Kernel Laplacian Clustering (OKLC) algorithms perform significantly better than other methods. Moreover, the coefficients of kernels and Laplacians optimized by OKLC show some correlation with the rank of performance of individual data source. Though in our evaluation the K values are predefined, in practical studies, the optimal cluster number can be consistently estimated from the eigenspectrum of the combined kernel Laplacian matrix. Availability: The MATLAB code of algorithms implemented in this paper is downloadable from http://homes.esat.kuleuven.be/~sistawww/bioi/syu/oklc.html. Contact: shiyu@uchicago.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:20980271
Improved fuzzy clustering algorithms in segmentation of DC-enhanced breast MRI.
Kannan, S R; Ramathilagam, S; Devi, Pandiyarajan; Sathya, A
2012-02-01
Segmentation of medical images is a difficult and challenging problem due to poor image contrast and artifacts that result in missing or diffuse organ/tissue boundaries. Many researchers have applied various techniques however fuzzy c-means (FCM) based algorithms is more effective compared to other methods. The objective of this work is to develop some robust fuzzy clustering segmentation systems for effective segmentation of DCE - breast MRI. This paper obtains the robust fuzzy clustering algorithms by incorporating kernel methods, penalty terms, tolerance of the neighborhood attraction, additional entropy term and fuzzy parameters. The initial centers are obtained using initialization algorithm to reduce the computation complexity and running time of proposed algorithms. Experimental works on breast images show that the proposed algorithms are effective to improve the similarity measurement, to handle large amount of noise, to have better results in dealing the data corrupted by noise, and other artifacts. The clustering results of proposed methods are validated using Silhouette Method.
An Interesting Review on Soft Skills and Dental Practice
Ishaquddin, Syed; Ghadage, Mahesh; Hatte, Geeta
2015-01-01
In today’s world of education, we concentrate on teaching activities and academic knowledge. We are taught to improve our clinical skills. Soft skills refer to the cluster of personality traits, social graces, and personal habits, facility with language, friendliness and personal habits that mark people to varying degrees. Soft Skills are interpersonal, psychological, self-promoted and non-technical qualities for every practitioner and academician, whereas hard skills are new tools or equipment and professional knowledge. Hence, more and more clinicians now days consider soft skills as important job criteria. An increase in service industry and competitive practices emphasizes the need for soft skills. Soft Skills are very important and useful in personal and professional life. PMID:25954720
An interesting review on soft skills and dental practice.
Dalaya, Maya; Ishaquddin, Syed; Ghadage, Mahesh; Hatte, Geeta
2015-03-01
In today's world of education, we concentrate on teaching activities and academic knowledge. We are taught to improve our clinical skills. Soft skills refer to the cluster of personality traits, social graces, and personal habits, facility with language, friendliness and personal habits that mark people to varying degrees. Soft Skills are interpersonal, psychological, self-promoted and non-technical qualities for every practitioner and academician, whereas hard skills are new tools or equipment and professional knowledge. Hence, more and more clinicians now days consider soft skills as important job criteria. An increase in service industry and competitive practices emphasizes the need for soft skills. Soft Skills are very important and useful in personal and professional life.
A hybrid algorithm for clustering of time series data based on affinity search technique.
Aghabozorgi, Saeed; Ying Wah, Teh; Herawan, Tutut; Jalab, Hamid A; Shaygan, Mohammad Amin; Jalali, Alireza
2014-01-01
Time series clustering is an important solution to various problems in numerous fields of research, including business, medical science, and finance. However, conventional clustering algorithms are not practical for time series data because they are essentially designed for static data. This impracticality results in poor clustering accuracy in several systems. In this paper, a new hybrid clustering algorithm is proposed based on the similarity in shape of time series data. Time series data are first grouped as subclusters based on similarity in time. The subclusters are then merged using the k-Medoids algorithm based on similarity in shape. This model has two contributions: (1) it is more accurate than other conventional and hybrid approaches and (2) it determines the similarity in shape among time series data with a low complexity. To evaluate the accuracy of the proposed model, the model is tested extensively using syntactic and real-world time series datasets.
Zhang, Junfeng; Chen, Wei; Gao, Mingyi; Shen, Gangxiang
2017-10-30
In this work, we proposed two k-means-clustering-based algorithms to mitigate the fiber nonlinearity for 64-quadrature amplitude modulation (64-QAM) signal, the training-sequence assisted k-means algorithm and the blind k-means algorithm. We experimentally demonstrated the proposed k-means-clustering-based fiber nonlinearity mitigation techniques in 75-Gb/s 64-QAM coherent optical communication system. The proposed algorithms have reduced clustering complexity and low data redundancy and they are able to quickly find appropriate initial centroids and select correctly the centroids of the clusters to obtain the global optimal solutions for large k value. We measured the bit-error-ratio (BER) performance of 64-QAM signal with different launched powers into the 50-km single mode fiber and the proposed techniques can greatly mitigate the signal impairments caused by the amplified spontaneous emission noise and the fiber Kerr nonlinearity and improve the BER performance.
A Hybrid Algorithm for Clustering of Time Series Data Based on Affinity Search Technique
Aghabozorgi, Saeed; Ying Wah, Teh; Herawan, Tutut; Jalab, Hamid A.; Shaygan, Mohammad Amin; Jalali, Alireza
2014-01-01
Time series clustering is an important solution to various problems in numerous fields of research, including business, medical science, and finance. However, conventional clustering algorithms are not practical for time series data because they are essentially designed for static data. This impracticality results in poor clustering accuracy in several systems. In this paper, a new hybrid clustering algorithm is proposed based on the similarity in shape of time series data. Time series data are first grouped as subclusters based on similarity in time. The subclusters are then merged using the k-Medoids algorithm based on similarity in shape. This model has two contributions: (1) it is more accurate than other conventional and hybrid approaches and (2) it determines the similarity in shape among time series data with a low complexity. To evaluate the accuracy of the proposed model, the model is tested extensively using syntactic and real-world time series datasets. PMID:24982966
An Enhanced PSO-Based Clustering Energy Optimization Algorithm for Wireless Sensor Network.
Vimalarani, C; Subramanian, R; Sivanandam, S N
2016-01-01
Wireless Sensor Network (WSN) is a network which formed with a maximum number of sensor nodes which are positioned in an application environment to monitor the physical entities in a target area, for example, temperature monitoring environment, water level, monitoring pressure, and health care, and various military applications. Mostly sensor nodes are equipped with self-supported battery power through which they can perform adequate operations and communication among neighboring nodes. Maximizing the lifetime of the Wireless Sensor networks, energy conservation measures are essential for improving the performance of WSNs. This paper proposes an Enhanced PSO-Based Clustering Energy Optimization (EPSO-CEO) algorithm for Wireless Sensor Network in which clustering and clustering head selection are done by using Particle Swarm Optimization (PSO) algorithm with respect to minimizing the power consumption in WSN. The performance metrics are evaluated and results are compared with competitive clustering algorithm to validate the reduction in energy consumption.
NASA Astrophysics Data System (ADS)
Adya Zizwan, Putra; Zarlis, Muhammad; Budhiarti Nababan, Erna
2017-12-01
The determination of Centroid on K-Means Algorithm directly affects the quality of the clustering results. Determination of centroid by using random numbers has many weaknesses. The GenClust algorithm that combines the use of Genetic Algorithms and K-Means uses a genetic algorithm to determine the centroid of each cluster. The use of the GenClust algorithm uses 50% chromosomes obtained through deterministic calculations and 50% is obtained from the generation of random numbers. This study will modify the use of the GenClust algorithm in which the chromosomes used are 100% obtained through deterministic calculations. The results of this study resulted in performance comparisons expressed in Mean Square Error influenced by centroid determination on K-Means method by using GenClust method, modified GenClust method and also classic K-Means.
Hebbian self-organizing integrate-and-fire networks for data clustering.
Landis, Florian; Ott, Thomas; Stoop, Ruedi
2010-01-01
We propose a Hebbian learning-based data clustering algorithm using spiking neurons. The algorithm is capable of distinguishing between clusters and noisy background data and finds an arbitrary number of clusters of arbitrary shape. These properties render the approach particularly useful for visual scene segmentation into arbitrarily shaped homogeneous regions. We present several application examples, and in order to highlight the advantages and the weaknesses of our method, we systematically compare the results with those from standard methods such as the k-means and Ward's linkage clustering. The analysis demonstrates that not only the clustering ability of the proposed algorithm is more powerful than those of the two concurrent methods, the time complexity of the method is also more modest than that of its generally used strongest competitor.
Impact of heuristics in clustering large biological networks.
Shafin, Md Kishwar; Kabir, Kazi Lutful; Ridwan, Iffatur; Anannya, Tasmiah Tamzid; Karim, Rashid Saadman; Hoque, Mohammad Mozammel; Rahman, M Sohel
2015-12-01
Traditional clustering algorithms often exhibit poor performance for large networks. On the contrary, greedy algorithms are found to be relatively efficient while uncovering functional modules from large biological networks. The quality of the clusters produced by these greedy techniques largely depends on the underlying heuristics employed. Different heuristics based on different attributes and properties perform differently in terms of the quality of the clusters produced. This motivates us to design new heuristics for clustering large networks. In this paper, we have proposed two new heuristics and analyzed the performance thereof after incorporating those with three different combinations in a recently celebrated greedy clustering algorithm named SPICi. We have extensively analyzed the effectiveness of these new variants. The results are found to be promising. Copyright © 2015 Elsevier Ltd. All rights reserved.
Mai, Xiaofeng; Liu, Jie; Wu, Xiong; Zhang, Qun; Guo, Changjian; Yang, Yanfu; Li, Zhaohui
2017-02-06
A Stokes-space modulation format classification (MFC) technique is proposed for coherent optical receivers by using a non-iterative clustering algorithm. In the clustering algorithm, two simple parameters are calculated to help find the density peaks of the data points in Stokes space and no iteration is required. Correct MFC can be realized in numerical simulations among PM-QPSK, PM-8QAM, PM-16QAM, PM-32QAM and PM-64QAM signals within practical optical signal-to-noise ratio (OSNR) ranges. The performance of the proposed MFC algorithm is also compared with those of other schemes based on clustering algorithms. The simulation results show that good classification performance can be achieved using the proposed MFC scheme with moderate time complexity. Proof-of-concept experiments are finally implemented to demonstrate MFC among PM-QPSK/16QAM/64QAM signals, which confirm the feasibility of our proposed MFC scheme.
Optimization of wireless sensor networks based on chicken swarm optimization algorithm
NASA Astrophysics Data System (ADS)
Wang, Qingxi; Zhu, Lihua
2017-05-01
In order to reduce the energy consumption of wireless sensor network and improve the survival time of network, the clustering routing protocol of wireless sensor networks based on chicken swarm optimization algorithm was proposed. On the basis of LEACH agreement, it was improved and perfected that the points on the cluster and the selection of cluster head using the chicken group optimization algorithm, and update the location of chicken which fall into the local optimum by Levy flight, enhance population diversity, ensure the global search capability of the algorithm. The new protocol avoided the die of partial node of intensive using by making balanced use of the network nodes, improved the survival time of wireless sensor network. The simulation experiments proved that the protocol is better than LEACH protocol on energy consumption, also is better than that of clustering routing protocol based on particle swarm optimization algorithm.
Predicting the random drift of MEMS gyroscope based on K-means clustering and OLS RBF Neural Network
NASA Astrophysics Data System (ADS)
Wang, Zhen-yu; Zhang, Li-jie
2017-10-01
Measure error of the sensor can be effectively compensated with prediction. Aiming at large random drift error of MEMS(Micro Electro Mechanical System))gyroscope, an improved learning algorithm of Radial Basis Function(RBF) Neural Network(NN) based on K-means clustering and Orthogonal Least-Squares (OLS) is proposed in this paper. The algorithm selects the typical samples as the initial cluster centers of RBF NN firstly, candidates centers with K-means algorithm secondly, and optimizes the candidate centers with OLS algorithm thirdly, which makes the network structure simpler and makes the prediction performance better. Experimental results show that the proposed K-means clustering OLS learning algorithm can predict the random drift of MEMS gyroscope effectively, the prediction error of which is 9.8019e-007°/s and the prediction time of which is 2.4169e-006s
Gemperline, Paul J; Cash, Eric
2003-08-15
A new algorithm for self-modeling curve resolution (SMCR) that yields improved results by incorporating soft constraints is described. The method uses least squares penalty functions to implement constraints in an alternating least squares algorithm, including nonnegativity, unimodality, equality, and closure constraints. By using least squares penalty functions, soft constraints are formulated rather than hard constraints. Significant benefits are (obtained using soft constraints, especially in the form of fewer distortions due to noise in resolved profiles. Soft equality constraints can also be used to introduce incomplete or partial reference information into SMCR solutions. Four different examples demonstrating application of the new method are presented, including resolution of overlapped HPLC-DAD peaks, flow injection analysis data, and batch reaction data measured by UV/visible and near-infrared spectroscopy (NIR). Each example was selected to show one aspect of the significant advantages of soft constraints over traditionally used hard constraints. Incomplete or partial reference information into self-modeling curve resolution models is described. The method offers a substantial improvement in the ability to resolve time-dependent concentration profiles from mixture spectra recorded as a function of time.
A Modified MinMax k-Means Algorithm Based on PSO
2016-01-01
The MinMax k-means algorithm is widely used to tackle the effect of bad initialization by minimizing the maximum intraclustering errors. Two parameters, including the exponent parameter and memory parameter, are involved in the executive process. Since different parameters have different clustering errors, it is crucial to choose appropriate parameters. In the original algorithm, a practical framework is given. Such framework extends the MinMax k-means to automatically adapt the exponent parameter to the data set. It has been believed that if the maximum exponent parameter has been set, then the programme can reach the lowest intraclustering errors. However, our experiments show that this is not always correct. In this paper, we modified the MinMax k-means algorithm by PSO to determine the proper values of parameters which can subject the algorithm to attain the lowest clustering errors. The proposed clustering method is tested on some favorite data sets in several different initial situations and is compared to the k-means algorithm and the original MinMax k-means algorithm. The experimental results indicate that our proposed algorithm can reach the lowest clustering errors automatically. PMID:27656201
NASA Astrophysics Data System (ADS)
Ma, Xiaoke; Wang, Bingbo; Yu, Liang
2018-01-01
Community detection is fundamental for revealing the structure-functionality relationship in complex networks, which involves two issues-the quantitative function for community as well as algorithms to discover communities. Despite significant research on either of them, few attempt has been made to establish the connection between the two issues. To attack this problem, a generalized quantification function is proposed for community in weighted networks, which provides a framework that unifies several well-known measures. Then, we prove that the trace optimization of the proposed measure is equivalent with the objective functions of algorithms such as nonnegative matrix factorization, kernel K-means as well as spectral clustering. It serves as the theoretical foundation for designing algorithms for community detection. On the second issue, a semi-supervised spectral clustering algorithm is developed by exploring the equivalence relation via combining the nonnegative matrix factorization and spectral clustering. Different from the traditional semi-supervised algorithms, the partial supervision is integrated into the objective of the spectral algorithm. Finally, through extensive experiments on both artificial and real world networks, we demonstrate that the proposed method improves the accuracy of the traditional spectral algorithms in community detection.
Beverage consumption patterns of Canadian adults aged 19 to 65 years.
Nikpartow, Nooshin; Danyliw, Adrienne D; Whiting, Susan J; Lim, Hyun J; Vatanparast, Hassanali
2012-12-01
To investigate the beverage intake patterns of Canadian adults and explore characteristics of participants in different beverage clusters. Analyses of nationally representative data with cross-sectional complex stratified design. Canadian Community Health Survey, Cycle 2.2 (2004). A total of 14 277 participants aged 19-65 years, in whom dietary intake was assessed using a single 24 h recall, were included in the study. After determining total intake and the contribution of beverages to total energy intake among age/sex groups, cluster analysis (K-means method) was used to classify males and females into distinct clusters based on the dominant pattern of beverage intakes. To test differences across clusters, χ2 tests and 95 % confidence intervals of the mean intakes were used. Six beverage clusters in women and seven beverage clusters in men were identified. 'Sugar-sweetened' beverage clusters - regular soft drinks and fruit drinks - as well as a 'beer' cluster, appeared for both men and women. No 'milk' cluster appeared among women. The mean consumption of the dominant beverage in each cluster was higher among men than women. The 'soft drink' cluster in men had the lowest proportion of the higher levels of education, and in women the highest proportion of inactivity, compared with other beverage clusters. Patterns of beverage intake in Canadian women indicate high consumption of sugar-sweetened beverages particularly fruit drinks, low intake of milk and high intake of beer. These patterns in women have implications for poor bone health, risk of obesity and other morbidities.
Dynamic soft variable structure control of singular systems
NASA Astrophysics Data System (ADS)
Liu, Yunlong; Zhang, Caihong; Gao, Cunchen
2012-08-01
The dynamic soft variable structure control (VSC) of singular systems is discussed in this paper. The definition of soft VSC and the design of its controller modes are given. The stability of singular systems with the dynamic soft VSC is proposed. The dynamic soft variable structure controller is designed, and the concrete algorithm on the dynamic soft VSC is given. The dynamic soft VSC of singular systems which was developed for the purpose of intentionally precluding chattering, achieving high regulation rates and shortening settling times enhanced the dynamic quality of the systems. It is illustrated the feasibility and validity of the proposed strategy by a simulation example, and an outlook on its auspicious further development is presented.
A Comparative Evaluation of Anomaly Detection Algorithms for Maritime Video Surveillance
2011-01-01
of k-means clustering and the k- NN Localized p-value Estimator ( KNN -LPE). K-means is a popular distance-based clustering algorithm while KNN -LPE...implemented the sparse cluster identification rule we described in Section 3.1. 2. k-NN Localized p-value Estimator ( KNN -LPE): We implemented this using...Average Density ( KNN -NAD): This was implemented as described in Section 3.4. Algorithm Parameter Settings The global and local density-based anomaly
NASA Astrophysics Data System (ADS)
Chang, Bingguo; Chen, Xiaofei
2018-05-01
Ultrasonography is an important examination for the diagnosis of chronic liver disease. The doctor gives the liver indicators and suggests the patient's condition according to the description of ultrasound report. With the rapid increase in the amount of data of ultrasound report, the workload of professional physician to manually distinguish ultrasound results significantly increases. In this paper, we use the spectral clustering method to cluster analysis of the description of the ultrasound report, and automatically generate the ultrasonic diagnostic diagnosis by machine learning. 110 groups ultrasound examination report of chronic liver disease were selected as test samples in this experiment, and the results were validated by spectral clustering and compared with k-means clustering algorithm. The results show that the accuracy of spectral clustering is 92.73%, which is higher than that of k-means clustering algorithm, which provides a powerful ultrasound-assisted diagnosis for patients with chronic liver disease.
Jiang, Peng; Xu, Yiming; Wu, Feng
2016-01-01
Existing move-restricted node self-deployment algorithms are based on a fixed node communication radius, evaluate the performance based on network coverage or the connectivity rate and do not consider the number of nodes near the sink node and the energy consumption distribution of the network topology, thereby degrading network reliability and the energy consumption balance. Therefore, we propose a distributed underwater node self-deployment algorithm. First, each node begins the uneven clustering based on the distance on the water surface. Each cluster head node selects its next-hop node to synchronously construct a connected path to the sink node. Second, the cluster head node adjusts its depth while maintaining the layout formed by the uneven clustering and then adjusts the positions of in-cluster nodes. The algorithm originally considers the network reliability and energy consumption balance during node deployment and considers the coverage redundancy rate of all positions that a node may reach during the node position adjustment. Simulation results show, compared to the connected dominating set (CDS) based depth computation algorithm, that the proposed algorithm can increase the number of the nodes near the sink node and improve network reliability while guaranteeing the network connectivity rate. Moreover, it can balance energy consumption during network operation, further improve network coverage rate and reduce energy consumption. PMID:26784193
Orbit Clustering Based on Transfer Cost
NASA Technical Reports Server (NTRS)
Gustafson, Eric D.; Arrieta-Camacho, Juan J.; Petropoulos, Anastassios E.
2013-01-01
We propose using cluster analysis to perform quick screening for combinatorial global optimization problems. The key missing component currently preventing cluster analysis from use in this context is the lack of a useable metric function that defines the cost to transfer between two orbits. We study several proposed metrics and clustering algorithms, including k-means and the expectation maximization algorithm. We also show that proven heuristic methods such as the Q-law can be modified to work with cluster analysis.
A soft-hard combination-based cooperative spectrum sensing scheme for cognitive radio networks.
Do, Nhu Tri; An, Beongku
2015-02-13
In this paper we propose a soft-hard combination scheme, called SHC scheme, for cooperative spectrum sensing in cognitive radio networks. The SHC scheme deploys a cluster based network in which Likelihood Ratio Test (LRT)-based soft combination is applied at each cluster, and weighted decision fusion rule-based hard combination is utilized at the fusion center. The novelties of the SHC scheme are as follows: the structure of the SHC scheme reduces the complexity of cooperative detection which is an inherent limitation of soft combination schemes. By using the LRT, we can detect primary signals in a low signal-to-noise ratio regime (around an average of -15 dB). In addition, the computational complexity of the LRT is reduced since we derive the closed-form expression of the probability density function of LRT value. The SHC scheme also takes into account the different effects of large scale fading on different users in the wide area network. The simulation results show that the SHC scheme not only provides the better sensing performance compared to the conventional hard combination schemes, but also reduces sensing overhead in terms of reporting time compared to the conventional soft combination scheme using the LRT.
Support vector machine firefly algorithm based optimization of lens system.
Shamshirband, Shahaboddin; Petković, Dalibor; Pavlović, Nenad T; Ch, Sudheer; Altameem, Torki A; Gani, Abdullah
2015-01-01
Lens system design is an important factor in image quality. The main aspect of the lens system design methodology is the optimization procedure. Since optimization is a complex, nonlinear task, soft computing optimization algorithms can be used. There are many tools that can be employed to measure optical performance, but the spot diagram is the most useful. The spot diagram gives an indication of the image of a point object. In this paper, the spot size radius is considered an optimization criterion. Intelligent soft computing scheme support vector machines (SVMs) coupled with the firefly algorithm (FFA) are implemented. The performance of the proposed estimators is confirmed with the simulation results. The result of the proposed SVM-FFA model has been compared with support vector regression (SVR), artificial neural networks, and generic programming methods. The results show that the SVM-FFA model performs more accurately than the other methodologies. Therefore, SVM-FFA can be used as an efficient soft computing technique in the optimization of lens system designs.
The lucky image-motion prediction for simple scene observation based soft-sensor technology
NASA Astrophysics Data System (ADS)
Li, Yan; Su, Yun; Hu, Bin
2015-08-01
High resolution is important to earth remote sensors, while the vibration of the platforms of the remote sensors is a major factor restricting high resolution imaging. The image-motion prediction and real-time compensation are key technologies to solve this problem. For the reason that the traditional autocorrelation image algorithm cannot meet the demand for the simple scene image stabilization, this paper proposes to utilize soft-sensor technology in image-motion prediction, and focus on the research of algorithm optimization in imaging image-motion prediction. Simulations results indicate that the improving lucky image-motion stabilization algorithm combining the Back Propagation Network (BP NN) and support vector machine (SVM) is the most suitable for the simple scene image stabilization. The relative error of the image-motion prediction based the soft-sensor technology is below 5%, the training computing speed of the mathematical predication model is as fast as the real-time image stabilization in aerial photography.
Gharghan, Sadik Kamel; Nordin, Rosdiadee; Ismail, Mahamod
2016-08-06
In this paper, we propose two soft computing localization techniques for wireless sensor networks (WSNs). The two techniques, Neural Fuzzy Inference System (ANFIS) and Artificial Neural Network (ANN), focus on a range-based localization method which relies on the measurement of the received signal strength indicator (RSSI) from the three ZigBee anchor nodes distributed throughout the track cycling field. The soft computing techniques aim to estimate the distance between bicycles moving on the cycle track for outdoor and indoor velodromes. In the first approach the ANFIS was considered, whereas in the second approach the ANN was hybridized individually with three optimization algorithms, namely Particle Swarm Optimization (PSO), Gravitational Search Algorithm (GSA), and Backtracking Search Algorithm (BSA). The results revealed that the hybrid GSA-ANN outperforms the other methods adopted in this paper in terms of accuracy localization and distance estimation accuracy. The hybrid GSA-ANN achieves a mean absolute distance estimation error of 0.02 m and 0.2 m for outdoor and indoor velodromes, respectively.
A Wireless Sensor Network with Soft Computing Localization Techniques for Track Cycling Applications
Gharghan, Sadik Kamel; Nordin, Rosdiadee; Ismail, Mahamod
2016-01-01
In this paper, we propose two soft computing localization techniques for wireless sensor networks (WSNs). The two techniques, Neural Fuzzy Inference System (ANFIS) and Artificial Neural Network (ANN), focus on a range-based localization method which relies on the measurement of the received signal strength indicator (RSSI) from the three ZigBee anchor nodes distributed throughout the track cycling field. The soft computing techniques aim to estimate the distance between bicycles moving on the cycle track for outdoor and indoor velodromes. In the first approach the ANFIS was considered, whereas in the second approach the ANN was hybridized individually with three optimization algorithms, namely Particle Swarm Optimization (PSO), Gravitational Search Algorithm (GSA), and Backtracking Search Algorithm (BSA). The results revealed that the hybrid GSA-ANN outperforms the other methods adopted in this paper in terms of accuracy localization and distance estimation accuracy. The hybrid GSA-ANN achieves a mean absolute distance estimation error of 0.02 m and 0.2 m for outdoor and indoor velodromes, respectively. PMID:27509495
Clustering by reordering of similarity and Laplacian matrices: Application to galaxy clusters
NASA Astrophysics Data System (ADS)
Mahmoud, E.; Shoukry, A.; Takey, A.
2018-04-01
Similarity metrics, kernels and similarity-based algorithms have gained much attention due to their increasing applications in information retrieval, data mining, pattern recognition and machine learning. Similarity Graphs are often adopted as the underlying representation of similarity matrices and are at the origin of known clustering algorithms such as spectral clustering. Similarity matrices offer the advantage of working in object-object (two-dimensional) space where visualization of clusters similarities is available instead of object-features (multi-dimensional) space. In this paper, sparse ɛ-similarity graphs are constructed and decomposed into strong components using appropriate methods such as Dulmage-Mendelsohn permutation (DMperm) and/or Reverse Cuthill-McKee (RCM) algorithms. The obtained strong components correspond to groups (clusters) in the input (feature) space. Parameter ɛi is estimated locally, at each data point i from a corresponding narrow range of the number of nearest neighbors. Although more advanced clustering techniques are available, our method has the advantages of simplicity, better complexity and direct visualization of the clusters similarities in a two-dimensional space. Also, no prior information about the number of clusters is needed. We conducted our experiments on two and three dimensional, low and high-sized synthetic datasets as well as on an astronomical real-dataset. The results are verified graphically and analyzed using gap statistics over a range of neighbors to verify the robustness of the algorithm and the stability of the results. Combining the proposed algorithm with gap statistics provides a promising tool for solving clustering problems. An astronomical application is conducted for confirming the existence of 45 galaxy clusters around the X-ray positions of galaxy clusters in the redshift range [0.1..0.8]. We re-estimate the photometric redshifts of the identified galaxy clusters and obtain acceptable values compared to published spectroscopic redshifts with a 0.029 standard deviation of their differences.
Improving clustering with metabolic pathway data.
Milone, Diego H; Stegmayer, Georgina; López, Mariana; Kamenetzky, Laura; Carrari, Fernando
2014-04-10
It is a common practice in bioinformatics to validate each group returned by a clustering algorithm through manual analysis, according to a-priori biological knowledge. This procedure helps finding functionally related patterns to propose hypotheses for their behavior and the biological processes involved. Therefore, this knowledge is used only as a second step, after data are just clustered according to their expression patterns. Thus, it could be very useful to be able to improve the clustering of biological data by incorporating prior knowledge into the cluster formation itself, in order to enhance the biological value of the clusters. A novel training algorithm for clustering is presented, which evaluates the biological internal connections of the data points while the clusters are being formed. Within this training algorithm, the calculation of distances among data points and neurons centroids includes a new term based on information from well-known metabolic pathways. The standard self-organizing map (SOM) training versus the biologically-inspired SOM (bSOM) training were tested with two real data sets of transcripts and metabolites from Solanum lycopersicum and Arabidopsis thaliana species. Classical data mining validation measures were used to evaluate the clustering solutions obtained by both algorithms. Moreover, a new measure that takes into account the biological connectivity of the clusters was applied. The results of bSOM show important improvements in the convergence and performance for the proposed clustering method in comparison to standard SOM training, in particular, from the application point of view. Analyses of the clusters obtained with bSOM indicate that including biological information during training can certainly increase the biological value of the clusters found with the proposed method. It is worth to highlight that this fact has effectively improved the results, which can simplify their further analysis.The algorithm is available as a web-demo at http://fich.unl.edu.ar/sinc/web-demo/bsom-lite/. The source code and the data sets supporting the results of this article are available at http://sourceforge.net/projects/sourcesinc/files/bsom.
NASA Astrophysics Data System (ADS)
Chen, Xiao; Li, Yaan; Yu, Jing; Li, Yuxing
2018-01-01
For fast and more effective implementation of tracking multiple targets in a cluttered environment, we propose a multiple targets tracking (MTT) algorithm called maximum entropy fuzzy c-means clustering joint probabilistic data association that combines fuzzy c-means clustering and the joint probabilistic data association (PDA) algorithm. The algorithm uses the membership value to express the probability of the target originating from measurement. The membership value is obtained through fuzzy c-means clustering objective function optimized by the maximum entropy principle. When considering the effect of the public measurement, we use a correction factor to adjust the association probability matrix to estimate the state of the target. As this algorithm avoids confirmation matrix splitting, it can solve the high computational load problem of the joint PDA algorithm. The results of simulations and analysis conducted for tracking neighbor parallel targets and cross targets in a different density cluttered environment show that the proposed algorithm can realize MTT quickly and efficiently in a cluttered environment. Further, the performance of the proposed algorithm remains constant with increasing process noise variance. The proposed algorithm has the advantages of efficiency and low computational load, which can ensure optimum performance when tracking multiple targets in a dense cluttered environment.
A similarity based agglomerative clustering algorithm in networks
NASA Astrophysics Data System (ADS)
Liu, Zhiyuan; Wang, Xiujuan; Ma, Yinghong
2018-04-01
The detection of clusters is benefit for understanding the organizations and functions of networks. Clusters, or communities, are usually groups of nodes densely interconnected but sparsely linked with any other clusters. To identify communities, an efficient and effective community agglomerative algorithm based on node similarity is proposed. The proposed method initially calculates similarities between each pair of nodes, and form pre-partitions according to the principle that each node is in the same community as its most similar neighbor. After that, check each partition whether it satisfies community criterion. For the pre-partitions who do not satisfy, incorporate them with others that having the biggest attraction until there are no changes. To measure the attraction ability of a partition, we propose an attraction index that based on the linked node's importance in networks. Therefore, our proposed method can better exploit the nodes' properties and network's structure. To test the performance of our algorithm, both synthetic and empirical networks ranging in different scales are tested. Simulation results show that the proposed algorithm can obtain superior clustering results compared with six other widely used community detection algorithms.
Segmentation of bone pixels from EROI Image using clustering method for bone age assessment
NASA Astrophysics Data System (ADS)
Bakthula, Rajitha; Agarwal, Suneeta
2016-03-01
The bone age of a human can be identified using carpal and epiphysis bones ossification, which is limited to teen age. The accurate age estimation depends on best separation of bone pixels and soft tissue pixels in the ROI image. The traditional approaches like canny, sobel, clustering, region growing and watershed can be applied, but these methods requires proper pre-processing and accurate initial seed point estimation to provide accurate results. Therefore this paper proposes new approach to segment the bone from soft tissue and background pixels. First pixels are enhanced using BPE and the edges are identified by HIPI. Later a K-Means clustering is applied for segmentation. The performance of the proposed approach has been evaluated and compared with the existing methods.
Lin, Nan; Jiang, Junhai; Guo, Shicheng; Xiong, Momiao
2015-01-01
Due to the advancement in sensor technology, the growing large medical image data have the ability to visualize the anatomical changes in biological tissues. As a consequence, the medical images have the potential to enhance the diagnosis of disease, the prediction of clinical outcomes and the characterization of disease progression. But in the meantime, the growing data dimensions pose great methodological and computational challenges for the representation and selection of features in image cluster analysis. To address these challenges, we first extend the functional principal component analysis (FPCA) from one dimension to two dimensions to fully capture the space variation of image the signals. The image signals contain a large number of redundant features which provide no additional information for clustering analysis. The widely used methods for removing the irrelevant features are sparse clustering algorithms using a lasso-type penalty to select the features. However, the accuracy of clustering using a lasso-type penalty depends on the selection of the penalty parameters and the threshold value. In practice, they are difficult to determine. Recently, randomized algorithms have received a great deal of attentions in big data analysis. This paper presents a randomized algorithm for accurate feature selection in image clustering analysis. The proposed method is applied to both the liver and kidney cancer histology image data from the TCGA database. The results demonstrate that the randomized feature selection method coupled with functional principal component analysis substantially outperforms the current sparse clustering algorithms in image cluster analysis. PMID:26196383
Clustering approaches to identifying gene expression patterns from DNA microarray data.
Do, Jin Hwan; Choi, Dong-Kug
2008-04-30
The analysis of microarray data is essential for large amounts of gene expression data. In this review we focus on clustering techniques. The biological rationale for this approach is the fact that many co-expressed genes are co-regulated, and identifying co-expressed genes could aid in functional annotation of novel genes, de novo identification of transcription factor binding sites and elucidation of complex biological pathways. Co-expressed genes are usually identified in microarray experiments by clustering techniques. There are many such methods, and the results obtained even for the same datasets may vary considerably depending on the algorithms and metrics for dissimilarity measures used, as well as on user-selectable parameters such as desired number of clusters and initial values. Therefore, biologists who want to interpret microarray data should be aware of the weakness and strengths of the clustering methods used. In this review, we survey the basic principles of clustering of DNA microarray data from crisp clustering algorithms such as hierarchical clustering, K-means and self-organizing maps, to complex clustering algorithms like fuzzy clustering.
Identification of chronic rhinosinusitis phenotypes using cluster analysis.
Soler, Zachary M; Hyer, J Madison; Ramakrishnan, Viswanathan; Smith, Timothy L; Mace, Jess; Rudmik, Luke; Schlosser, Rodney J
2015-05-01
Current clinical classifications of chronic rhinosinusitis (CRS) have been largely defined based upon preconceived notions of factors thought to be important, such as polyp or eosinophil status. Unfortunately, these classification systems have little correlation with symptom severity or treatment outcomes. Unsupervised clustering can be used to identify phenotypic subgroups of CRS patients, describe clinical differences in these clusters and define simple algorithms for classification. A multi-institutional, prospective study of 382 patients with CRS who had failed initial medical therapy completed the Sino-Nasal Outcome Test (SNOT-22), Rhinosinusitis Disability Index (RSDI), Medical Outcomes Study Short Form-12 (SF-12), Pittsburgh Sleep Quality Index (PSQI), and Patient Health Questionnaire (PHQ-2). Objective measures of CRS severity included Brief Smell Identification Test (B-SIT), CT, and endoscopy scoring. All variables were reduced and unsupervised hierarchical clustering was performed. After clusters were defined, variations in medication usage were analyzed. Discriminant analysis was performed to develop a simplified, clinically useful algorithm for clustering. Clustering was largely determined by age, severity of patient reported outcome measures, depression, and fibromyalgia. CT and endoscopy varied somewhat among clusters. Traditional clinical measures, including polyp/atopic status, prior surgery, B-SIT and asthma, did not vary among clusters. A simplified algorithm based upon productivity loss, SNOT-22 score, and age predicted clustering with 89% accuracy. Medication usage among clusters did vary significantly. A simplified algorithm based upon hierarchical clustering is able to classify CRS patients and predict medication usage. Further studies are warranted to determine if such clustering predicts treatment outcomes. © 2015 ARS-AAOA, LLC.
Quantum annealing for combinatorial clustering
NASA Astrophysics Data System (ADS)
Kumar, Vaibhaw; Bass, Gideon; Tomlin, Casey; Dulny, Joseph
2018-02-01
Clustering is a powerful machine learning technique that groups "similar" data points based on their characteristics. Many clustering algorithms work by approximating the minimization of an objective function, namely the sum of within-the-cluster distances between points. The straightforward approach involves examining all the possible assignments of points to each of the clusters. This approach guarantees the solution will be a global minimum; however, the number of possible assignments scales quickly with the number of data points and becomes computationally intractable even for very small datasets. In order to circumvent this issue, cost function minima are found using popular local search-based heuristic approaches such as k-means and hierarchical clustering. Due to their greedy nature, such techniques do not guarantee that a global minimum will be found and can lead to sub-optimal clustering assignments. Other classes of global search-based techniques, such as simulated annealing, tabu search, and genetic algorithms, may offer better quality results but can be too time-consuming to implement. In this work, we describe how quantum annealing can be used to carry out clustering. We map the clustering objective to a quadratic binary optimization problem and discuss two clustering algorithms which are then implemented on commercially available quantum annealing hardware, as well as on a purely classical solver "qbsolv." The first algorithm assigns N data points to K clusters, and the second one can be used to perform binary clustering in a hierarchical manner. We present our results in the form of benchmarks against well-known k-means clustering and discuss the advantages and disadvantages of the proposed techniques.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mackey, Lester; Nachman, Benjamin; Schwartzman, Ariel
Collimated streams of particles produced in high energy physics experiments are organized using clustering algorithms to form jets . To construct jets, the experimental collaborations based at the Large Hadron Collider (LHC) primarily use agglomerative hierarchical clustering schemes known as sequential recombination. We propose a new class of algorithms for clustering jets that use infrared and collinear safe mixture models. These new algorithms, known as fuzzy jets , are clustered using maximum likelihood techniques and can dynamically determine various properties of jets like their size. We show that the fuzzy jet size adds additional information to conventional jet tagging variablesmore » in boosted topologies. Furthermore, we study the impact of pileup and show that with some slight modifications to the algorithm, fuzzy jets can be stable up to high pileup interaction multiplicities.« less
Mackey, Lester; Nachman, Benjamin; Schwartzman, Ariel; ...
2016-06-01
Collimated streams of particles produced in high energy physics experiments are organized using clustering algorithms to form jets . To construct jets, the experimental collaborations based at the Large Hadron Collider (LHC) primarily use agglomerative hierarchical clustering schemes known as sequential recombination. We propose a new class of algorithms for clustering jets that use infrared and collinear safe mixture models. These new algorithms, known as fuzzy jets , are clustered using maximum likelihood techniques and can dynamically determine various properties of jets like their size. We show that the fuzzy jet size adds additional information to conventional jet tagging variablesmore » in boosted topologies. Furthermore, we study the impact of pileup and show that with some slight modifications to the algorithm, fuzzy jets can be stable up to high pileup interaction multiplicities.« less
Blooming Trees: Substructures and Surrounding Groups of Galaxy Clusters
NASA Astrophysics Data System (ADS)
Yu, Heng; Diaferio, Antonaldo; Serra, Ana Laura; Baldi, Marco
2018-06-01
We develop the Blooming Tree Algorithm, a new technique that uses spectroscopic redshift data alone to identify the substructures and the surrounding groups of galaxy clusters, along with their member galaxies. Based on the estimated binding energy of galaxy pairs, the algorithm builds a binary tree that hierarchically arranges all of the galaxies in the field of view. The algorithm searches for buds, corresponding to gravitational potential minima on the binary tree branches; for each bud, the algorithm combines the number of galaxies, their velocity dispersion, and their average pairwise distance into a parameter that discriminates between the buds that do not correspond to any substructure or group, and thus eventually die, and the buds that correspond to substructures and groups, and thus bloom into the identified structures. We test our new algorithm with a sample of 300 mock redshift surveys of clusters in different dynamical states; the clusters are extracted from a large cosmological N-body simulation of a ΛCDM model. We limit our analysis to substructures and surrounding groups identified in the simulation with mass larger than 1013 h ‑1 M ⊙. With mock redshift surveys with 200 galaxies within 6 h ‑1 Mpc from the cluster center, the technique recovers 80% of the real substructures and 60% of the surrounding groups; in 57% of the identified structures, at least 60% of the member galaxies of the substructures and groups belong to the same real structure. These results improve by roughly a factor of two the performance of the best substructure identification algorithm currently available, the σ plateau algorithm, and suggest that our Blooming Tree Algorithm can be an invaluable tool for detecting substructures of galaxy clusters and investigating their complex dynamics.
PCA based clustering for brain tumor segmentation of T1w MRI images.
Kaya, Irem Ersöz; Pehlivanlı, Ayça Çakmak; Sekizkardeş, Emine Gezmez; Ibrikci, Turgay
2017-03-01
Medical images are huge collections of information that are difficult to store and process consuming extensive computing time. Therefore, the reduction techniques are commonly used as a data pre-processing step to make the image data less complex so that a high-dimensional data can be identified by an appropriate low-dimensional representation. PCA is one of the most popular multivariate methods for data reduction. This paper is focused on T1-weighted MRI images clustering for brain tumor segmentation with dimension reduction by different common Principle Component Analysis (PCA) algorithms. Our primary aim is to present a comparison between different variations of PCA algorithms on MRIs for two cluster methods. Five most common PCA algorithms; namely the conventional PCA, Probabilistic Principal Component Analysis (PPCA), Expectation Maximization Based Principal Component Analysis (EM-PCA), Generalize Hebbian Algorithm (GHA), and Adaptive Principal Component Extraction (APEX) were applied to reduce dimensionality in advance of two clustering algorithms, K-Means and Fuzzy C-Means. In the study, the T1-weighted MRI images of the human brain with brain tumor were used for clustering. In addition to the original size of 512 lines and 512 pixels per line, three more different sizes, 256 × 256, 128 × 128 and 64 × 64, were included in the study to examine their effect on the methods. The obtained results were compared in terms of both the reconstruction errors and the Euclidean distance errors among the clustered images containing the same number of principle components. According to the findings, the PPCA obtained the best results among all others. Furthermore, the EM-PCA and the PPCA assisted K-Means algorithm to accomplish the best clustering performance in the majority as well as achieving significant results with both clustering algorithms for all size of T1w MRI images. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Weighted community detection and data clustering using message passing
NASA Astrophysics Data System (ADS)
Shi, Cheng; Liu, Yanchen; Zhang, Pan
2018-03-01
Grouping objects into clusters based on the similarities or weights between them is one of the most important problems in science and engineering. In this work, by extending message-passing algorithms and spectral algorithms proposed for an unweighted community detection problem, we develop a non-parametric method based on statistical physics, by mapping the problem to the Potts model at the critical temperature of spin-glass transition and applying belief propagation to solve the marginals corresponding to the Boltzmann distribution. Our algorithm is robust to over-fitting and gives a principled way to determine whether there are significant clusters in the data and how many clusters there are. We apply our method to different clustering tasks. In the community detection problem in weighted and directed networks, we show that our algorithm significantly outperforms existing algorithms. In the clustering problem, where the data were generated by mixture models in the sparse regime, we show that our method works all the way down to the theoretical limit of detectability and gives accuracy very close to that of the optimal Bayesian inference. In the semi-supervised clustering problem, our method only needs several labels to work perfectly in classic datasets. Finally, we further develop Thouless-Anderson-Palmer equations which heavily reduce the computation complexity in dense networks but give almost the same performance as belief propagation.
Automatic detection of erythemato-squamous diseases using k-means clustering.
Ubeyli, Elif Derya; Doğdu, Erdoğan
2010-04-01
A new approach based on the implementation of k-means clustering is presented for automated detection of erythemato-squamous diseases. The purpose of clustering techniques is to find a structure for the given data by finding similarities between data according to data characteristics. The studied domain contained records of patients with known diagnosis. The k-means clustering algorithm's task was to classify the data points, in this case the patients with attribute data, to one of the five clusters. The algorithm was used to detect the five erythemato-squamous diseases when 33 features defining five disease indications were used. The purpose is to determine an optimum classification scheme for this problem. The present research demonstrated that the features well represent the erythemato-squamous diseases and the k-means clustering algorithm's task achieved high classification accuracies for only five erythemato-squamous diseases.
Loewenstein, Yaniv; Portugaly, Elon; Fromer, Menachem; Linial, Michal
2008-07-01
UPGMA (average linking) is probably the most popular algorithm for hierarchical data clustering, especially in computational biology. However, UPGMA requires the entire dissimilarity matrix in memory. Due to this prohibitive requirement, UPGMA is not scalable to very large datasets. We present a novel class of memory-constrained UPGMA (MC-UPGMA) algorithms. Given any practical memory size constraint, this framework guarantees the correct clustering solution without explicitly requiring all dissimilarities in memory. The algorithms are general and are applicable to any dataset. We present a data-dependent characterization of hardness and clustering efficiency. The presented concepts are applicable to any agglomerative clustering formulation. We apply our algorithm to the entire collection of protein sequences, to automatically build a comprehensive evolutionary-driven hierarchy of proteins from sequence alone. The newly created tree captures protein families better than state-of-the-art large-scale methods such as CluSTr, ProtoNet4 or single-linkage clustering. We demonstrate that leveraging the entire mass embodied in all sequence similarities allows to significantly improve on current protein family clusterings which are unable to directly tackle the sheer mass of this data. Furthermore, we argue that non-metric constraints are an inherent complexity of the sequence space and should not be overlooked. The robustness of UPGMA allows significant improvement, especially for multidomain proteins, and for large or divergent families. A comprehensive tree built from all UniProt sequence similarities, together with navigation and classification tools will be made available as part of the ProtoNet service. A C++ implementation of the algorithm is available on request.
Gaur, Pallavi; Chaturvedi, Anoop
2017-07-22
The clustering pattern and motifs give immense information about any biological data. An application of machine learning algorithms for clustering and candidate motif detection in miRNAs derived from exosomes is depicted in this paper. Recent progress in the field of exosome research and more particularly regarding exosomal miRNAs has led much bioinformatic-based research to come into existence. The information on clustering pattern and candidate motifs in miRNAs of exosomal origin would help in analyzing existing, as well as newly discovered miRNAs within exosomes. Along with obtaining clustering pattern and candidate motifs in exosomal miRNAs, this work also elaborates the usefulness of the machine learning algorithms that can be efficiently used and executed on various programming languages/platforms. Data were clustered and sequence candidate motifs were detected successfully. The results were compared and validated with some available web tools such as 'BLASTN' and 'MEME suite'. The machine learning algorithms for aforementioned objectives were applied successfully. This work elaborated utility of machine learning algorithms and language platforms to achieve the tasks of clustering and candidate motif detection in exosomal miRNAs. With the information on mentioned objectives, deeper insight would be gained for analyses of newly discovered miRNAs in exosomes which are considered to be circulating biomarkers. In addition, the execution of machine learning algorithms on various language platforms gives more flexibility to users to try multiple iterations according to their requirements. This approach can be applied to other biological data-mining tasks as well.
Wehmeyer, Christoph; Falk von Rudorff, Guido; Wolf, Sebastian; Kabbe, Gabriel; Schärf, Daniel; Kühne, Thomas D; Sebastiani, Daniel
2012-11-21
We present a stochastic, swarm intelligence-based optimization algorithm for the prediction of global minima on potential energy surfaces of molecular cluster structures. Our optimization approach is a modification of the artificial bee colony (ABC) algorithm which is inspired by the foraging behavior of honey bees. We apply our modified ABC algorithm to the problem of global geometry optimization of molecular cluster structures and show its performance for clusters with 2-57 particles and different interatomic interaction potentials.
NASA Astrophysics Data System (ADS)
Wehmeyer, Christoph; Falk von Rudorff, Guido; Wolf, Sebastian; Kabbe, Gabriel; Schärf, Daniel; Kühne, Thomas D.; Sebastiani, Daniel
2012-11-01
We present a stochastic, swarm intelligence-based optimization algorithm for the prediction of global minima on potential energy surfaces of molecular cluster structures. Our optimization approach is a modification of the artificial bee colony (ABC) algorithm which is inspired by the foraging behavior of honey bees. We apply our modified ABC algorithm to the problem of global geometry optimization of molecular cluster structures and show its performance for clusters with 2-57 particles and different interatomic interaction potentials.
An adaptive tracker for ShipIR/NTCS
NASA Astrophysics Data System (ADS)
Ramaswamy, Srinivasan; Vaitekunas, David A.
2015-05-01
A key component in any image-based tracking system is the adaptive tracking algorithm used to segment the image into potential targets, rank-and-select the best candidate target, and the gating of the selected target to further improve tracker performance. This paper will describe a new adaptive tracker algorithm added to the naval threat countermeasure simulator (NTCS) of the NATO-standard ship signature model (ShipIR). The new adaptive tracking algorithm is an optional feature used with any of the existing internal NTCS or user-defined seeker algorithms (e.g., binary centroid, intensity centroid, and threshold intensity centroid). The algorithm segments the detected pixels into clusters, and the smallest set of clusters that meet the detection criterion is obtained by using a knapsack algorithm to identify the set of clusters that should not be used. The rectangular area containing the chosen clusters defines an inner boundary, from which a weighted centroid is calculated as the aim-point. A track-gate is then positioned around the clusters, taking into account the rate of change of the bounding area and compensating for any gimbal displacement. A sequence of scenarios is used to test the new tracking algorithm on a generic unclassified DDG ShipIR model, with and without flares, and demonstrate how some of the key seeker signals are impacted by both the ship and flare intrinsic signatures.
Xue, Zhong; Shen, Dinggang; Li, Hai; Wong, Stephen
2010-01-01
The traditional fuzzy clustering algorithm and its extensions have been successfully applied in medical image segmentation. However, because of the variability of tissues and anatomical structures, the clustering results might be biased by the tissue population and intensity differences. For example, clustering-based algorithms tend to over-segment white matter tissues of MR brain images. To solve this problem, we introduce a tissue probability map constrained clustering algorithm and apply it to serial MR brain image segmentation, i.e., a series of 3-D MR brain images of the same subject at different time points. Using the new serial image segmentation algorithm in the framework of the CLASSIC framework, which iteratively segments the images and estimates the longitudinal deformations, we improved both accuracy and robustness for serial image computing, and at the mean time produced longitudinally consistent segmentation and stable measures. In the algorithm, the tissue probability maps consist of both the population-based and subject-specific segmentation priors. Experimental study using both simulated longitudinal MR brain data and the Alzheimer’s Disease Neuroimaging Initiative (ADNI) data confirmed that using both priors more accurate and robust segmentation results can be obtained. The proposed algorithm can be applied in longitudinal follow up studies of MR brain imaging with subtle morphological changes for neurological disorders. PMID:26566399
Mathematical Fundamentals of Probabilistic Semantics for High-Level Fusion
2013-12-02
understanding of the fundamental aspects of uncertainty representation and reasoning that a theory of hard and soft high-level fusion must encompass...representation and reasoning that a theory of hard and soft high-level fusion must encompass. Successful completion requires an unbiased, in-depth...and soft information is the lack of a fundamental HLIF theory , backed by a consistent mathematical framework and supporting algorithms. Although there
Mining subspace clusters from DNA microarray data using large itemset techniques.
Chang, Ye-In; Chen, Jiun-Rung; Tsai, Yueh-Chi
2009-05-01
Mining subspace clusters from the DNA microarrays could help researchers identify those genes which commonly contribute to a disease, where a subspace cluster indicates a subset of genes whose expression levels are similar under a subset of conditions. Since in a DNA microarray, the number of genes is far larger than the number of conditions, those previous proposed algorithms which compute the maximum dimension sets (MDSs) for any two genes will take a long time to mine subspace clusters. In this article, we propose the Large Itemset-Based Clustering (LISC) algorithm for mining subspace clusters. Instead of constructing MDSs for any two genes, we construct only MDSs for any two conditions. Then, we transform the task of finding the maximal possible gene sets into the problem of mining large itemsets from the condition-pair MDSs. Since we are only interested in those subspace clusters with gene sets as large as possible, it is desirable to pay attention to those gene sets which have reasonable large support values in the condition-pair MDSs. From our simulation results, we show that the proposed algorithm needs shorter processing time than those previous proposed algorithms which need to construct gene-pair MDSs.
NASA Astrophysics Data System (ADS)
Nguyen, Sy Dzung; Nguyen, Quoc Hung; Choi, Seung-Bok
2015-01-01
This paper presents a new algorithm for building an adaptive neuro-fuzzy inference system (ANFIS) from a training data set called B-ANFIS. In order to increase accuracy of the model, the following issues are executed. Firstly, a data merging rule is proposed to build and perform a data-clustering strategy. Subsequently, a combination of clustering processes in the input data space and in the joint input-output data space is presented. Crucial reason of this task is to overcome problems related to initialization and contradictory fuzzy rules, which usually happen when building ANFIS. The clustering process in the input data space is accomplished based on a proposed merging-possibilistic clustering (MPC) algorithm. The effectiveness of this process is evaluated to resume a clustering process in the joint input-output data space. The optimal parameters obtained after completion of the clustering process are used to build ANFIS. Simulations based on a numerical data, 'Daily Data of Stock A', and measured data sets of a smart damper are performed to analyze and estimate accuracy. In addition, convergence and robustness of the proposed algorithm are investigated based on both theoretical and testing approaches.
NASA Astrophysics Data System (ADS)
Connor, H. K.; Carter, J. A.
2017-12-01
Soft X-rays can be emitted when highly charged solar wind ions and exospheric neutrals exchange electrons. Astrophysics missions, such as XMM-Newton and ROSAT X-ray telescopes, have found that such solar wind charge exchange happens at the Earth's exosphere. The Earth's magnetosphere can be imaged via soft X-rays in order to understand its interaction with solar wind. Consequently, two soft X-ray telescope missions (CuPID and SMILE) are scheduled to launch in 2019 and 2021. They will provide wide field-of-view soft X-ray images of the Earth's dayside magnetosphere. The imagers will track the location and movement of the cusps, magnetopause, and bow shock in response to solar wind variations. To support these missions, an understanding of exospheric neutral density profile is needed. The neutral density is one of the controlling factors of soft X-ray signals. Strong neutral density can help to obtain high-resolution and high-cadence of soft X-ray images. In this study, we estimate the exospheric neutral density at 10 RE subsolar point using XMM X-ray observations, Cluster plasma observations, and OpenGGCM global magnetosphere - ionosphere MHD model. XMM-Newton observes line-of-sight, narrow field-of-view, integrated soft X-ray emissions when it looks through the dayside magnetosphere. OpenGGCM reproduces soft X-ray signals seen by the XMM spacecraft, assuming exospheric neutral density as a function of the neutral density at the 10RE subsolar point and the radial distance. Cluster observations are used to confirm OpenGGCM plasma results. Finally, we deduce the neutral density at 10 RE subsolar point by adjusting the model results to the XMM-Newton soft X-ray observations.
Blocked inverted indices for exact clustering of large chemical spaces.
Thiel, Philipp; Sach-Peltason, Lisa; Ottmann, Christian; Kohlbacher, Oliver
2014-09-22
The calculation of pairwise compound similarities based on fingerprints is one of the fundamental tasks in chemoinformatics. Methods for efficient calculation of compound similarities are of the utmost importance for various applications like similarity searching or library clustering. With the increasing size of public compound databases, exact clustering of these databases is desirable, but often computationally prohibitively expensive. We present an optimized inverted index algorithm for the calculation of all pairwise similarities on 2D fingerprints of a given data set. In contrast to other algorithms, it neither requires GPU computing nor yields a stochastic approximation of the clustering. The algorithm has been designed to work well with multicore architectures and shows excellent parallel speedup. As an application example of this algorithm, we implemented a deterministic clustering application, which has been designed to decompose virtual libraries comprising tens of millions of compounds in a short time on current hardware. Our results show that our implementation achieves more than 400 million Tanimoto similarity calculations per second on a common desktop CPU. Deterministic clustering of the available chemical space thus can be done on modern multicore machines within a few days.
A Game Theory Algorithm for Intra-Cluster Data Aggregation in a Vehicular Ad Hoc Network
Chen, Yuzhong; Weng, Shining; Guo, Wenzhong; Xiong, Naixue
2016-01-01
Vehicular ad hoc networks (VANETs) have an important role in urban management and planning. The effective integration of vehicle information in VANETs is critical to traffic analysis, large-scale vehicle route planning and intelligent transportation scheduling. However, given the limitations in the precision of the output information of a single sensor and the difficulty of information sharing among various sensors in a highly dynamic VANET, effectively performing data aggregation in VANETs remains a challenge. Moreover, current studies have mainly focused on data aggregation in large-scale environments but have rarely discussed the issue of intra-cluster data aggregation in VANETs. In this study, we propose a multi-player game theory algorithm for intra-cluster data aggregation in VANETs by analyzing the competitive and cooperative relationships among sensor nodes. Several sensor-centric metrics are proposed to measure the data redundancy and stability of a cluster. We then study the utility function to achieve efficient intra-cluster data aggregation by considering both data redundancy and cluster stability. In particular, we prove the existence of a unique Nash equilibrium in the game model, and conduct extensive experiments to validate the proposed algorithm. Results demonstrate that the proposed algorithm has advantages over typical data aggregation algorithms in both accuracy and efficiency. PMID:26907272
A Game Theory Algorithm for Intra-Cluster Data Aggregation in a Vehicular Ad Hoc Network.
Chen, Yuzhong; Weng, Shining; Guo, Wenzhong; Xiong, Naixue
2016-02-19
Vehicular ad hoc networks (VANETs) have an important role in urban management and planning. The effective integration of vehicle information in VANETs is critical to traffic analysis, large-scale vehicle route planning and intelligent transportation scheduling. However, given the limitations in the precision of the output information of a single sensor and the difficulty of information sharing among various sensors in a highly dynamic VANET, effectively performing data aggregation in VANETs remains a challenge. Moreover, current studies have mainly focused on data aggregation in large-scale environments but have rarely discussed the issue of intra-cluster data aggregation in VANETs. In this study, we propose a multi-player game theory algorithm for intra-cluster data aggregation in VANETs by analyzing the competitive and cooperative relationships among sensor nodes. Several sensor-centric metrics are proposed to measure the data redundancy and stability of a cluster. We then study the utility function to achieve efficient intra-cluster data aggregation by considering both data redundancy and cluster stability. In particular, we prove the existence of a unique Nash equilibrium in the game model, and conduct extensive experiments to validate the proposed algorithm. Results demonstrate that the proposed algorithm has advantages over typical data aggregation algorithms in both accuracy and efficiency.
Real Time Intelligent Target Detection and Analysis with Machine Vision
NASA Technical Reports Server (NTRS)
Howard, Ayanna; Padgett, Curtis; Brown, Kenneth
2000-01-01
We present an algorithm for detecting a specified set of targets for an Automatic Target Recognition (ATR) application. ATR involves processing images for detecting, classifying, and tracking targets embedded in a background scene. We address the problem of discriminating between targets and nontarget objects in a scene by evaluating 40x40 image blocks belonging to an image. Each image block is first projected onto a set of templates specifically designed to separate images of targets embedded in a typical background scene from those background images without targets. These filters are found using directed principal component analysis which maximally separates the two groups. The projected images are then clustered into one of n classes based on a minimum distance to a set of n cluster prototypes. These cluster prototypes have previously been identified using a modified clustering algorithm based on prior sensed data. Each projected image pattern is then fed into the associated cluster's trained neural network for classification. A detailed description of our algorithm will be given in this paper. We outline our methodology for designing the templates, describe our modified clustering algorithm, and provide details on the neural network classifiers. Evaluation of the overall algorithm demonstrates that our detection rates approach 96% with a false positive rate of less than 0.03%.
Computer algorithm for coding gain
NASA Technical Reports Server (NTRS)
Dodd, E. E.
1974-01-01
Development of a computer algorithm for coding gain for use in an automated communications link design system. Using an empirical formula which defines coding gain as used in space communications engineering, an algorithm is constructed on the basis of available performance data for nonsystematic convolutional encoding with soft-decision (eight-level) Viterbi decoding.
A Deep Near-Infrared Survey of the N 49 Region around the Soft Gamma-Ray Repeater 0526-66
NASA Technical Reports Server (NTRS)
Klose, S.; Henden, A. A.; Geppert, U.; Greiner, J.; Guetter, H. H.; Hartmann, D. H.; Kouveliotou, C.; Luginbuhl, C. B.; Stecklurn, B.; Vrba, F. J.
2004-01-01
We report the results of a deep near-infrared survey of the vicinity of supernova remnant N49 in the Large Magellanic Cloud (LMC), which contains the soft gamma-ray repeater (SGR) 0526-66. Two of the four confirmed SGRs are potentially associated with compact stellar clusters. We thus searched for a similar association of SGR0526-66, and find the unexplored young stellar cluster SL 463 at a projected distance of approx. 30 pc from the SGR. This constitutes the third cluster-SGR link, and lends support to scenarios in which SGR progenitors originate in young, embedded clusters. If real, the cluster-SGR association constrains the age and thus the initial mass of these stars. In addition, our high-resolution images of the super- nova remnant N49 reveal an area of excess K-band flux in the southeastern part of the SNR. This feature coincides with the maximum flux area at 8.28 microns as detected by the Midcourse Space Experiment (MSX satellite), which we identify with IRAS 052594607.
A Massive Warm Baryonic Halo in the Coma Cluster
NASA Technical Reports Server (NTRS)
Bonamente, Massimiliano; Joy, Marshall K.; Lieu, Richard
2003-01-01
Several deep PSPC observations of the Coma Cluster reveal a very large scale halo of soft X-ray emission, substantially in excess of the well-known radiation from the hot intracluster medium. The excess emission, previously reported in the central region of the cluster using lower sensitivity Extreme Ultraviolet Explorer (EUVE) and ROSAT data, is now evident out to a radius of 2.6 Mpc, demonstrating that the soft excess radiation from clusters is a phenomenon of cosmological significance. The X-ray spectrum at these large radii cannot be modeled nonthermally but is consistent with the original scenario of thermal emission from warm gas at approx. 10(exp 6) K. The mass of the warm gas is on par with that of the hot X-ray-emitting plasma and significantly more massive if the warm gas resides in low-density filamentary structures. Thus, the data lend vital support to current theories of cosmic evolution, which predict that at low redshift approx. 30%-40% of the baryons reside in warm filaments converging at clusters of galaxies.
Adaptive block online learning target tracking based on super pixel segmentation
NASA Astrophysics Data System (ADS)
Cheng, Yue; Li, Jianzeng
2018-04-01
Video target tracking technology under the unremitting exploration of predecessors has made big progress, but there are still lots of problems not solved. This paper proposed a new algorithm of target tracking based on image segmentation technology. Firstly we divide the selected region using simple linear iterative clustering (SLIC) algorithm, after that, we block the area with the improved density-based spatial clustering of applications with noise (DBSCAN) clustering algorithm. Each sub-block independently trained classifier and tracked, then the algorithm ignore the failed tracking sub-block while reintegrate the rest of the sub-blocks into tracking box to complete the target tracking. The experimental results show that our algorithm can work effectively under occlusion interference, rotation change, scale change and many other problems in target tracking compared with the current mainstream algorithms.
Di Pietro, C; Di Pietro, V; Emmanuele, G; Ferro, A; Maugeri, T; Modica, E; Pigola, G; Pulvirenti, A; Purrello, M; Ragusa, M; Scalia, M; Shasha, D; Travali, S; Zimmitti, V
2003-01-01
In this paper we present a new Multiple Sequence Alignment (MSA) algorithm called AntiClusAl. The method makes use of the commonly use idea of aligning homologous sequences belonging to classes generated by some clustering algorithm, and then continue the alignment process ina bottom-up way along a suitable tree structure. The final result is then read at the root of the tree. Multiple sequence alignment in each cluster makes use of the progressive alignment with the 1-median (center) of the cluster. The 1-median of set S of sequences is the element of S which minimizes the average distance from any other sequence in S. Its exact computation requires quadratic time. The basic idea of our proposed algorithm is to make use of a simple and natural algorithmic technique based on randomized tournaments which has been successfully applied to large size search problems in general metric spaces. In particular a clustering algorithm called Antipole tree and an approximate linear 1-median computation are used. Our algorithm compared with Clustal W, a widely used tool to MSA, shows a better running time results with fully comparable alignment quality. A successful biological application showing high aminoacid conservation during evolution of Xenopus laevis SOD2 is also cited.
NASA Technical Reports Server (NTRS)
Wharton, S. W.
1980-01-01
An Interactive Cluster Analysis Procedure (ICAP) was developed to derive classifier training statistics from remotely sensed data. The algorithm interfaces the rapid numerical processing capacity of a computer with the human ability to integrate qualitative information. Control of the clustering process alternates between the algorithm, which creates new centroids and forms clusters and the analyst, who evaluate and elect to modify the cluster structure. Clusters can be deleted or lumped pairwise, or new centroids can be added. A summary of the cluster statistics can be requested to facilitate cluster manipulation. The ICAP was implemented in APL (A Programming Language), an interactive computer language. The flexibility of the algorithm was evaluated using data from different LANDSAT scenes to simulate two situations: one in which the analyst is assumed to have no prior knowledge about the data and wishes to have the clusters formed more or less automatically; and the other in which the analyst is assumed to have some knowledge about the data structure and wishes to use that information to closely supervise the clustering process. For comparison, an existing clustering method was also applied to the two data sets.
Multimodal Estimation of Distribution Algorithms.
Yang, Qiang; Chen, Wei-Neng; Li, Yun; Chen, C L Philip; Xu, Xiang-Min; Zhang, Jun
2016-02-15
Taking the advantage of estimation of distribution algorithms (EDAs) in preserving high diversity, this paper proposes a multimodal EDA. Integrated with clustering strategies for crowding and speciation, two versions of this algorithm are developed, which operate at the niche level. Then these two algorithms are equipped with three distinctive techniques: 1) a dynamic cluster sizing strategy; 2) an alternative utilization of Gaussian and Cauchy distributions to generate offspring; and 3) an adaptive local search. The dynamic cluster sizing affords a potential balance between exploration and exploitation and reduces the sensitivity to the cluster size in the niching methods. Taking advantages of Gaussian and Cauchy distributions, we generate the offspring at the niche level through alternatively using these two distributions. Such utilization can also potentially offer a balance between exploration and exploitation. Further, solution accuracy is enhanced through a new local search scheme probabilistically conducted around seeds of niches with probabilities determined self-adaptively according to fitness values of these seeds. Extensive experiments conducted on 20 benchmark multimodal problems confirm that both algorithms can achieve competitive performance compared with several state-of-the-art multimodal algorithms, which is supported by nonparametric tests. Especially, the proposed algorithms are very promising for complex problems with many local optima.
NASA Astrophysics Data System (ADS)
Zhou, Shuguang; Zhou, Kefa; Wang, Jinlin; Yang, Genfang; Wang, Shanshan
2017-12-01
Cluster analysis is a well-known technique that is used to analyze various types of data. In this study, cluster analysis is applied to geochemical data that describe 1444 stream sediment samples collected in northwestern Xinjiang with a sample spacing of approximately 2 km. Three algorithms (the hierarchical, k-means, and fuzzy c-means algorithms) and six data transformation methods (the z-score standardization, ZST; the logarithmic transformation, LT; the additive log-ratio transformation, ALT; the centered log-ratio transformation, CLT; the isometric log-ratio transformation, ILT; and no transformation, NT) are compared in terms of their effects on the cluster analysis of the geochemical compositional data. The study shows that, on the one hand, the ZST does not affect the results of column- or variable-based (R-type) cluster analysis, whereas the other methods, including the LT, the ALT, and the CLT, have substantial effects on the results. On the other hand, the results of the row- or observation-based (Q-type) cluster analysis obtained from the geochemical data after applying NT and the ZST are relatively poor. However, we derive some improved results from the geochemical data after applying the CLT, the ILT, the LT, and the ALT. Moreover, the k-means and fuzzy c-means clustering algorithms are more reliable than the hierarchical algorithm when they are used to cluster the geochemical data. We apply cluster analysis to the geochemical data to explore for Au deposits within the study area, and we obtain a good correlation between the results retrieved by combining the CLT or the ILT with the k-means or fuzzy c-means algorithms and the potential zones of Au mineralization. Therefore, we suggest that the combination of the CLT or the ILT with the k-means or fuzzy c-means algorithms is an effective tool to identify potential zones of mineralization from geochemical data.
A ground truth based comparative study on clustering of gene expression data.
Zhu, Yitan; Wang, Zuyi; Miller, David J; Clarke, Robert; Xuan, Jianhua; Hoffman, Eric P; Wang, Yue
2008-05-01
Given the variety of available clustering methods for gene expression data analysis, it is important to develop an appropriate and rigorous validation scheme to assess the performance and limitations of the most widely used clustering algorithms. In this paper, we present a ground truth based comparative study on the functionality, accuracy, and stability of five data clustering methods, namely hierarchical clustering, K-means clustering, self-organizing maps, standard finite normal mixture fitting, and a caBIG toolkit (VIsual Statistical Data Analyzer--VISDA), tested on sample clustering of seven published microarray gene expression datasets and one synthetic dataset. We examined the performance of these algorithms in both data-sufficient and data-insufficient cases using quantitative performance measures, including cluster number detection accuracy and mean and standard deviation of partition accuracy. The experimental results showed that VISDA, an interactive coarse-to-fine maximum likelihood fitting algorithm, is a solid performer on most of the datasets, while K-means clustering and self-organizing maps optimized by the mean squared compactness criterion generally produce more stable solutions than the other methods.
Clustering Millions of Faces by Identity.
Otto, Charles; Wang, Dayong; Jain, Anil K
2018-02-01
Given a large collection of unlabeled face images, we address the problem of clustering faces into an unknown number of identities. This problem is of interest in social media, law enforcement, and other applications, where the number of faces can be of the order of hundreds of million, while the number of identities (clusters) can range from a few thousand to millions. To address the challenges of run-time complexity and cluster quality, we present an approximate Rank-Order clustering algorithm that performs better than popular clustering algorithms (k-Means and Spectral). Our experiments include clustering up to 123 million face images into over 10 million clusters. Clustering results are analyzed in terms of external (known face labels) and internal (unknown face labels) quality measures, and run-time. Our algorithm achieves an F-measure of 0.87 on the LFW benchmark (13 K faces of 5,749 individuals), which drops to 0.27 on the largest dataset considered (13 K faces in LFW + 123M distractor images). Additionally, we show that frames in the YouTube benchmark can be clustered with an F-measure of 0.71. An internal per-cluster quality measure is developed to rank individual clusters for manual exploration of high quality clusters that are compact and isolated.
X-ray spectral observations of clusters of galaxies undergoing merger events
NASA Astrophysics Data System (ADS)
Henriksen, Mark J.
1993-09-01
We have analyzed the HEAO 1 A2 observations of two clusters whose optical and X-ray isophotes are suggestive of merging subclusters, A119 and A754, and find evidence of nonisothermal X-ray emission from both clusters. The X-ray spectrum of both clusters, when fitted with a single isothermal model, shows residual soft X-ray emission. There is a statistically significant reduction in chi-squared (98 percent probability based on the F-test) when a second temperature component is added. If the asymmetric isophotes seen in the soft X-ray image are indicative of merging subclusters, then our analysis of the Einstein IPC spectra and Solid State Spectrometer observations of A754, which provide some spatial and spectral resolution, suggests that the two temperature components seen in the HEAO 1 A2 spectra are associated with gas trapped in the subcluster potential wells. The implied subcluster isothermal masses suggest that a more massive cluster is accreting a less massive companion in A754. The present observations cannot rule out the alternative possibility that the cooler gas is associated with the outer cluster atmosphere rather than individual subclusters, as appears to be the case for A119. Astro D observations will be necessary to distinguish between these two possibilities for both clusters.
Application of artificial intelligence to search ground-state geometry of clusters
NASA Astrophysics Data System (ADS)
Lemes, Maurício Ruv; Marim, L. R.; dal Pino, A.
2002-08-01
We introduce a global optimization procedure, the neural-assisted genetic algorithm (NAGA). It combines the power of an artificial neural network (ANN) with the versatility of the genetic algorithm. This method is suitable to solve optimization problems that depend on some kind of heuristics to limit the search space. If a reasonable amount of data is available, the ANN can ``understand'' the problem and provide the genetic algorithm with a selected population of elements that will speed up the search for the optimum solution. We tested the method in a search for the ground-state geometry of silicon clusters. We trained the ANN with information about the geometry and energetics of small silicon clusters. Next, the ANN learned how to restrict the configurational space for larger silicon clusters. For Si10 and Si20, we noticed that the NAGA is at least three times faster than the ``pure'' genetic algorithm. As the size of the cluster increases, it is expected that the gain in terms of time will increase as well.
Automatic Clustering Using FSDE-Forced Strategy Differential Evolution
NASA Astrophysics Data System (ADS)
Yasid, A.
2018-01-01
Clustering analysis is important in datamining for unsupervised data, cause no adequate prior knowledge. One of the important tasks is defining the number of clusters without user involvement that is known as automatic clustering. This study intends on acquiring cluster number automatically utilizing forced strategy differential evolution (AC-FSDE). Two mutation parameters, namely: constant parameter and variable parameter are employed to boost differential evolution performance. Four well-known benchmark datasets were used to evaluate the algorithm. Moreover, the result is compared with other state of the art automatic clustering methods. The experiment results evidence that AC-FSDE is better or competitive with other existing automatic clustering algorithm.
Hybrid fuzzy cluster ensemble framework for tumor clustering from biomolecular data.
Yu, Zhiwen; Chen, Hantao; You, Jane; Han, Guoqiang; Li, Le
2013-01-01
Cancer class discovery using biomolecular data is one of the most important tasks for cancer diagnosis and treatment. Tumor clustering from gene expression data provides a new way to perform cancer class discovery. Most of the existing research works adopt single-clustering algorithms to perform tumor clustering is from biomolecular data that lack robustness, stability, and accuracy. To further improve the performance of tumor clustering from biomolecular data, we introduce the fuzzy theory into the cluster ensemble framework for tumor clustering from biomolecular data, and propose four kinds of hybrid fuzzy cluster ensemble frameworks (HFCEF), named as HFCEF-I, HFCEF-II, HFCEF-III, and HFCEF-IV, respectively, to identify samples that belong to different types of cancers. The difference between HFCEF-I and HFCEF-II is that they adopt different ensemble generator approaches to generate a set of fuzzy matrices in the ensemble. Specifically, HFCEF-I applies the affinity propagation algorithm (AP) to perform clustering on the sample dimension and generates a set of fuzzy matrices in the ensemble based on the fuzzy membership function and base samples selected by AP. HFCEF-II adopts AP to perform clustering on the attribute dimension, generates a set of subspaces, and obtains a set of fuzzy matrices in the ensemble by performing fuzzy c-means on subspaces. Compared with HFCEF-I and HFCEF-II, HFCEF-III and HFCEF-IV consider the characteristics of HFCEF-I and HFCEF-II. HFCEF-III combines HFCEF-I and HFCEF-II in a serial way, while HFCEF-IV integrates HFCEF-I and HFCEF-II in a concurrent way. HFCEFs adopt suitable consensus functions, such as the fuzzy c-means algorithm or the normalized cut algorithm (Ncut), to summarize generated fuzzy matrices, and obtain the final results. The experiments on real data sets from UCI machine learning repository and cancer gene expression profiles illustrate that 1) the proposed hybrid fuzzy cluster ensemble frameworks work well on real data sets, especially biomolecular data, and 2) the proposed approaches are able to provide more robust, stable, and accurate results when compared with the state-of-the-art single clustering algorithms and traditional cluster ensemble approaches.
ABCluster: the artificial bee colony algorithm for cluster global optimization.
Zhang, Jun; Dolg, Michael
2015-10-07
Global optimization of cluster geometries is of fundamental importance in chemistry and an interesting problem in applied mathematics. In this work, we introduce a relatively new swarm intelligence algorithm, i.e. the artificial bee colony (ABC) algorithm proposed in 2005, to this field. It is inspired by the foraging behavior of a bee colony, and only three parameters are needed to control it. We applied it to several potential functions of quite different nature, i.e., the Coulomb-Born-Mayer, Lennard-Jones, Morse, Z and Gupta potentials. The benchmarks reveal that for long-ranged potentials the ABC algorithm is very efficient in locating the global minimum, while for short-ranged ones it is sometimes trapped into a local minimum funnel on a potential energy surface of large clusters. We have released an efficient, user-friendly, and free program "ABCluster" to realize the ABC algorithm. It is a black-box program for non-experts as well as experts and might become a useful tool for chemists to study clusters.
Hybrid approach of selecting hyperparameters of support vector machine for regression.
Jeng, Jin-Tsong
2006-06-01
To select the hyperparameters of the support vector machine for regression (SVR), a hybrid approach is proposed to determine the kernel parameter of the Gaussian kernel function and the epsilon value of Vapnik's epsilon-insensitive loss function. The proposed hybrid approach includes a competitive agglomeration (CA) clustering algorithm and a repeated SVR (RSVR) approach. Since the CA clustering algorithm is used to find the nearly "optimal" number of clusters and the centers of clusters in the clustering process, the CA clustering algorithm is applied to select the Gaussian kernel parameter. Additionally, an RSVR approach that relies on the standard deviation of a training error is proposed to obtain an epsilon in the loss function. Finally, two functions, one real data set (i.e., a time series of quarterly unemployment rate for West Germany) and an identification of nonlinear plant are used to verify the usefulness of the hybrid approach.
Fuzzy Document Clustering Approach using WordNet Lexical Categories
NASA Astrophysics Data System (ADS)
Gharib, Tarek F.; Fouad, Mohammed M.; Aref, Mostafa M.
Text mining refers generally to the process of extracting interesting information and knowledge from unstructured text. This area is growing rapidly mainly because of the strong need for analysing the huge and large amount of textual data that reside on internal file systems and the Web. Text document clustering provides an effective navigation mechanism to organize this large amount of data by grouping their documents into a small number of meaningful classes. In this paper we proposed a fuzzy text document clustering approach using WordNet lexical categories and Fuzzy c-Means algorithm. Some experiments are performed to compare efficiency of the proposed approach with the recently reported approaches. Experimental results show that Fuzzy clustering leads to great performance results. Fuzzy c-means algorithm overcomes other classical clustering algorithms like k-means and bisecting k-means in both clustering quality and running time efficiency.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Johnson, Grant E.; Gunaratne, Don; Laskin, Julia
2015-04-16
Soft- and reactive landing of mass-selected ions is gaining attention as a promising approach for the precisely-controlled preparation of materials on surfaces that are not amenable to deposition using conventional methods. A broad range of ionization sources and mass-filters are available that make ion soft-landing a versatile tool for surface modification using beams of hyperthermal (< 100 eV) ions. The ability to select the mass-to-charge ratio of the ion, its kinetic energy and charge state, along with precise control of the size, shape, and position of the ion beam on the deposition target distinguishes ion soft landing from other surfacemore » modification techniques. Soft- and reactive landing have been used to prepare interfaces for practical applications as well as precisely-defined model surfaces for fundamental investigations in chemistry, physics, and materials science. For instance, soft- and reactive landing have been applied to study the surface chemistry of ions isolated in the gas-phase, prepare arrays of proteins for high-throughput biological screening, produce novel carbon-based and polymer materials, enrich the secondary structure of peptides and the chirality of organic molecules, immobilize electrochemically-active proteins and organometallics on electrodes, create thin films of complex molecules, and immobilize catalytically active organometallics as well as ligated metal clusters. In addition, soft landing has enabled investigation of the size-dependent behavior of bare metal clusters in the critical subnanometer size regime where chemical and physical properties do not scale predictably with size. The morphology, aggregation, and immobilization of larger bare metal nanoparticles, which are directly relevant to the design of catalysts as well as improved memory and electronic devices, have also been studied using ion soft landing. This review article begins in section 1 with a brief introduction to the existing applications of ion soft- and reactive landing. Section 2 provides an overview of the ionization sources and mass filters that have been used to date for soft landing of mass-selected ions. A discussion of the competing processes that occur during ion deposition as well as the types of ions and surfaces that have been investigated follows in section 3. Section 4 discusses the physical phenomena that occur during and after ion soft landing including retention and reduction of ionic charge along with factors that impact the efficiency of ion deposition. The influence of soft landing on the secondary structure and biological activity of complex ions is addressed in section 5. Lastly, an overview of the structure and mobility as well as the catalytic, optical, magnetic, and redox properties of bare ionic clusters and nanoparticles deposited onto surfaces is presented in section 6.« less
2015-01-01
Background Though cluster analysis has become a routine analytic task for bioinformatics research, it is still arduous for researchers to assess the quality of a clustering result. To select the best clustering method and its parameters for a dataset, researchers have to run multiple clustering algorithms and compare them. However, such a comparison task with multiple clustering results is cognitively demanding and laborious. Results In this paper, we present XCluSim, a visual analytics tool that enables users to interactively compare multiple clustering results based on the Visual Information Seeking Mantra. We build a taxonomy for categorizing existing techniques of clustering results visualization in terms of the Gestalt principles of grouping. Using the taxonomy, we choose the most appropriate interactive visualizations for presenting individual clustering results from different types of clustering algorithms. The efficacy of XCluSim is shown through case studies with a bioinformatician. Conclusions Compared to other relevant tools, XCluSim enables users to compare multiple clustering results in a more scalable manner. Moreover, XCluSim supports diverse clustering algorithms and dedicated visualizations and interactions for different types of clustering results, allowing more effective exploration of details on demand. Through case studies with a bioinformatics researcher, we received positive feedback on the functionalities of XCluSim, including its ability to help identify stably clustered items across multiple clustering results. PMID:26328893
Clustering Binary Data in the Presence of Masking Variables
ERIC Educational Resources Information Center
Brusco, Michael J.
2004-01-01
A number of important applications require the clustering of binary data sets. Traditional nonhierarchical cluster analysis techniques, such as the popular K-means algorithm, can often be successfully applied to these data sets. However, the presence of masking variables in a data set can impede the ability of the K-means algorithm to recover the…
NASA Technical Reports Server (NTRS)
Bonamente, Massimiliano; Lieu, Richard; Mittaz, Jonathan P. D.; Kaastra, Jelle S.; Nevalainen, Jukka
2005-01-01
Several nearby clusters exhibit an excess of soft X-ray radiation which cannot be attributed to the hot virialized intra-cluster medium. There is no consensus to date on the origin of the excess emission: it could be either of thermal origin, or due to an inverse Compton scattering of the cosmic microwave background. Using high resolution XMM-Newton data of Sersic 159-03 we first show that strong soft excess emission is detected out to a radial distance of 0.9 Mpc. The data are interpreted using the two viable models available, i.e., by invoking a warm reservoir of thermal gas, or relativistic electrons which are part of a cosmic ray population. The thermal model leads to a better goodness-of-fit, and the emitting warm gas must be high in mass and low in metallicity.
RNA secondary structure prediction using soft computing.
Ray, Shubhra Sankar; Pal, Sankar K
2013-01-01
Prediction of RNA structure is invaluable in creating new drugs and understanding genetic diseases. Several deterministic algorithms and soft computing-based techniques have been developed for more than a decade to determine the structure from a known RNA sequence. Soft computing gained importance with the need to get approximate solutions for RNA sequences by considering the issues related with kinetic effects, cotranscriptional folding, and estimation of certain energy parameters. A brief description of some of the soft computing-based techniques, developed for RNA secondary structure prediction, is presented along with their relevance. The basic concepts of RNA and its different structural elements like helix, bulge, hairpin loop, internal loop, and multiloop are described. These are followed by different methodologies, employing genetic algorithms, artificial neural networks, and fuzzy logic. The role of various metaheuristics, like simulated annealing, particle swarm optimization, ant colony optimization, and tabu search is also discussed. A relative comparison among different techniques, in predicting 12 known RNA secondary structures, is presented, as an example. Future challenging issues are then mentioned.
Wang, Jie-sheng; Han, Shuang; Shen, Na-na
2014-01-01
For predicting the key technology indicators (concentrate grade and tailings recovery rate) of flotation process, an echo state network (ESN) based fusion soft-sensor model optimized by the improved glowworm swarm optimization (GSO) algorithm is proposed. Firstly, the color feature (saturation and brightness) and texture features (angular second moment, sum entropy, inertia moment, etc.) based on grey-level co-occurrence matrix (GLCM) are adopted to describe the visual characteristics of the flotation froth image. Then the kernel principal component analysis (KPCA) method is used to reduce the dimensionality of the high-dimensional input vector composed by the flotation froth image characteristics and process datum and extracts the nonlinear principal components in order to reduce the ESN dimension and network complex. The ESN soft-sensor model of flotation process is optimized by the GSO algorithm with congestion factor. Simulation results show that the model has better generalization and prediction accuracy to meet the online soft-sensor requirements of the real-time control in the flotation process. PMID:24982935
NASA Astrophysics Data System (ADS)
Yang, Kai; Chen, Xiangguang; Wang, Li; Jin, Huaiping
2017-01-01
In rubber mixing process, the key parameter (Mooney viscosity), which is used to evaluate the property of the product, can only be obtained with 4-6h delay offline. It is quite helpful for the industry, if the parameter can be estimate on line. Various data driven soft sensors have been used to prediction in the rubber mixing. However, it always not functions well due to the phase and nonlinear property in the process. The purpose of this paper is to develop an efficient soft sensing algorithm to solve the problem. Based on the proposed GMMD local sample selecting criterion, the phase information is extracted in the local modeling. Using the Gaussian local modeling method within Just-in-time (JIT) learning framework, nonlinearity of the process is well handled. Efficiency of the new method is verified by comparing the performance with various mainstream soft sensors, using the samples from real industrial rubber mixing process.
Wolf, Antje; Kirschner, Karl N
2013-02-01
With improvements in computer speed and algorithm efficiency, MD simulations are sampling larger amounts of molecular and biomolecular conformations. Being able to qualitatively and quantitatively sift these conformations into meaningful groups is a difficult and important task, especially when considering the structure-activity paradigm. Here we present a study that combines two popular techniques, principal component (PC) analysis and clustering, for revealing major conformational changes that occur in molecular dynamics (MD) simulations. Specifically, we explored how clustering different PC subspaces effects the resulting clusters versus clustering the complete trajectory data. As a case example, we used the trajectory data from an explicitly solvated simulation of a bacteria's L11·23S ribosomal subdomain, which is a target of thiopeptide antibiotics. Clustering was performed, using K-means and average-linkage algorithms, on data involving the first two to the first five PC subspace dimensions. For the average-linkage algorithm we found that data-point membership, cluster shape, and cluster size depended on the selected PC subspace data. In contrast, K-means provided very consistent results regardless of the selected subspace. Since we present results on a single model system, generalization concerning the clustering of different PC subspaces of other molecular systems is currently premature. However, our hope is that this study illustrates a) the complexities in selecting the appropriate clustering algorithm, b) the complexities in interpreting and validating their results, and c) by combining PC analysis with subsequent clustering valuable dynamic and conformational information can be obtained.
An image-space parallel convolution filtering algorithm based on shadow map
NASA Astrophysics Data System (ADS)
Li, Hua; Yang, Huamin; Zhao, Jianping
2017-07-01
Shadow mapping is commonly used in real-time rendering. In this paper, we presented an accurate and efficient method of soft shadows generation from planar area lights. First this method generated a depth map from light's view, and analyzed the depth-discontinuities areas as well as shadow boundaries. Then these areas were described as binary values in the texture map called binary light-visibility map, and a parallel convolution filtering algorithm based on GPU was enforced to smooth out the boundaries with a box filter. Experiments show that our algorithm is an effective shadow map based method that produces perceptually accurate soft shadows in real time with more details of shadow boundaries compared with the previous works.
A Multilevel Gamma-Clustering Layout Algorithm for Visualization of Biological Networks
Hruz, Tomas; Lucas, Christoph; Laule, Oliver; Zimmermann, Philip
2013-01-01
Visualization of large complex networks has become an indispensable part of systems biology, where organisms need to be considered as one complex system. The visualization of the corresponding network is challenging due to the size and density of edges. In many cases, the use of standard visualization algorithms can lead to high running times and poorly readable visualizations due to many edge crossings. We suggest an approach that analyzes the structure of the graph first and then generates a new graph which contains specific semantic symbols for regular substructures like dense clusters. We propose a multilevel gamma-clustering layout visualization algorithm (MLGA) which proceeds in three subsequent steps: (i) a multilevel γ-clustering is used to identify the structure of the underlying network, (ii) the network is transformed to a tree, and (iii) finally, the resulting tree which shows the network structure is drawn using a variation of a force-directed algorithm. The algorithm has a potential to visualize very large networks because it uses modern clustering heuristics which are optimized for large graphs. Moreover, most of the edges are removed from the visual representation which allows keeping the overview over complex graphs with dense subgraphs. PMID:23864855
Iterative Track Fitting Using Cluster Classification in Multi Wire Proportional Chamber
NASA Astrophysics Data System (ADS)
Primor, David; Mikenberg, Giora; Etzion, Erez; Messer, Hagit
2007-10-01
This paper addresses the problem of track fitting of a charged particle in a multi wire proportional chamber (MWPC) using cathode readout strips. When a charged particle crosses a MWPC, a positive charge is induced on a cluster of adjacent strips. In the presence of high radiation background, the cluster charge measurements may be contaminated due to background particles, leading to less accurate hit position estimation. The least squares method for track fitting assumes the same position error distribution for all hits and thus loses its optimal properties on contaminated data. For this reason, a new robust algorithm is proposed. The algorithm first uses the known spatial charge distribution caused by a single charged particle over the strips, and classifies the clusters into ldquocleanrdquo and ldquodirtyrdquo clusters. Then, using the classification results, it performs an iterative weighted least squares fitting procedure, updating its optimal weights each iteration. The performance of the suggested algorithm is compared to other track fitting techniques using a simulation of tracks with radiation background. It is shown that the algorithm improves the track fitting performance significantly. A practical implementation of the algorithm is presented for muon track fitting in the cathode strip chamber (CSC) of the ATLAS experiment.
A Fast Implementation of the ISODATA Clustering Algorithm
NASA Technical Reports Server (NTRS)
Memarsadeghi, Nargess; Mount, David M.; Netanyahu, Nathan S.; LeMoigne, Jacqueline
2005-01-01
Clustering is central to many image processing and remote sensing applications. ISODATA is one of the most popular and widely used clustering methods in geoscience applications, but it can run slowly, particularly with large data sets. We present a more efficient approach to ISODATA clustering, which achieves better running times by storing the points in a kd-tree and through a modification of the way in which the algorithm estimates the dispersion of each cluster. We also present an approximate version of the algorithm which allows the user to further improve the running time, at the expense of lower fidelity in computing the nearest cluster center to each point. We provide both theoretical and empirical justification that our modified approach produces clusterings that are very similar to those produced by the standard ISODATA approach. We also provide empirical studies on both synthetic data and remotely sensed Landsat and MODIS images that show that our approach has significantly lower running times.
A Fast Implementation of the Isodata Clustering Algorithm
NASA Technical Reports Server (NTRS)
Memarsadeghi, Nargess; Le Moigne, Jacqueline; Mount, David M.; Netanyahu, Nathan S.
2007-01-01
Clustering is central to many image processing and remote sensing applications. ISODATA is one of the most popular and widely used clustering methods in geoscience applications, but it can run slowly, particularly with large data sets. We present a more efficient approach to IsoDATA clustering, which achieves better running times by storing the points in a kd-tree and through a modification of the way in which the algorithm estimates the dispersion of each cluster. We also present an approximate version of the algorithm which allows the user to further improve the running time, at the expense of lower fidelity in computing the nearest cluster center to each point. We provide both theoretical and empirical justification that our modified approach produces clusterings that are very similar to those produced by the standard ISODATA approach. We also provide empirical studies on both synthetic data and remotely sensed Landsat and MODIS images that show that our approach has significantly lower running times.
Optical Assessment of Soft Contact Lens Edge-Thickness.
Tankam, Patrice; Won, Jungeun; Canavesi, Cristina; Cox, Ian; Rolland, Jannick P
2016-08-01
To assess the edge shape of soft contact lenses using Gabor-Domain Optical Coherence Microscopy (GD-OCM) with a 2-μm imaging resolution in three dimensions and to generate edge-thickness profiles at different distances from the edge tip of soft contact lenses. A high-speed custom-designed GD-OCM system was used to produce 3D images of the edge of an experimental soft contact lens (Bausch + Lomb, Rochester, NY) in four different configurations: in air, submerged into water, submerged into saline with contrast agent, and placed onto the cornea of a porcine eyeball. An algorithm to compute the edge-thickness was developed and applied to cross-sectional images. The proposed algorithm includes the accurate detection of the interfaces between the lens and the environment, and the correction of the refraction error. The sharply defined edge tip of a soft contact lens was visualized in 3D. Results showed precise thickness measurement of the contact lens edge profile. Fifty cross-sectional image frames for each configuration were used to test the robustness of the algorithm in evaluating the edge-thickness at any distance from the edge tip. The precision of the measurements was less than 0.2 μm. The results confirmed the ability of GD-OCM to provide high-definition images of soft contact lens edges. As a nondestructive, precise, and fast metrology tool for soft contact lens measurement, the integration of GD-OCM in the design and manufacturing of contact lenses will be beneficial for further improvement in edge design and quality control. In the clinical perspective, the in vivo evaluation of the lens fitted onto the cornea will advance our understanding of how the edge interacts with the ocular surface. The latter will provide insights into the impact of long-term use of contact lenses on the visual performance.
Optical Assessment of Soft Contact Lens Edge-Thickness
Tankam, Patrice; Won, Jungeun; Canavesi, Cristina; Cox, Ian; Rolland, Jannick P.
2016-01-01
Purpose To assess the edge shape of soft contact lenses using Gabor-Domain Optical Coherence Microscopy (GD-OCM) with a 2 μm imaging resolution in three dimensions, and to generate edge-thickness profiles at different distances from the edge tip of soft contact lenses. Methods A high-speed custom-designed GD-OCM system was used to produce 3D images of the edge of an experimental soft contact lens (Bausch + Lomb, Rochester NY) in four different configurations: in air, submerged into water, submerged into saline with contrast agent, and placed onto the cornea of a porcine eyeball. An algorithm to compute the edge-thickness was developed and applied to cross-sectional images. The proposed algorithm includes the accurate detection of the interfaces between the lens and the environment, and the correction of the refraction error. Results The sharply defined edge tip of a soft contact lens was visualized in 3D. Results showed precise thickness measurement of the contact lens edge profile. 50 cross-sectional image frames for each configuration were used to test the robustness of the algorithm in evaluating the edge-thickness at any distance from the edge tip. The precision of the measurements was less than 0.2 μm. Conclusions The results confirmed the ability of GD-OCM to provide high definition images of soft contact lens edges. As a non-destructive, precise, and fast metrology tool for soft contact lens measurement, the integration of GD-OCM in the design and manufacturing of contact lenses will be beneficial for further improvement in edge design and quality control. In the clinical perspective, the in-vivo evaluation of the lens fitted onto the cornea will advance our understanding of how the edge interacts with the ocular surface. The latter will provide insights into the impact of long-term use of contact lenses on the visual performance. PMID:27232902
NASA Astrophysics Data System (ADS)
Dayananda, Karanam Ravichandran; Straub, Jeremy
2017-05-01
This paper proposes a new hybrid algorithm for security, which incorporates both distributed and hierarchal approaches. It uses a mobile data collector (MDC) to collect information in order to save energy of sensor nodes in a wireless sensor network (WSN) as, in most networks, these sensor nodes have limited energy. Wireless sensor networks are prone to security problems because, among other things, it is possible to use a rogue sensor node to eavesdrop on or alter the information being transmitted. To prevent this, this paper introduces a security algorithm for MDC-based WSNs. A key use of this algorithm is to protect the confidentiality of the information sent by the sensor nodes. The sensor nodes are deployed in a random fashion and form group structures called clusters. Each cluster has a cluster head. The cluster head collects data from the other nodes using the time-division multiple access protocol. The sensor nodes send their data to the cluster head for transmission to the base station node for further processing. The MDC acts as an intermediate node between the cluster head and base station. The MDC, using its dynamic acyclic graph path, collects the data from the cluster head and sends it to base station. This approach is useful for applications including warfighting, intelligent building and medicine. To assess the proposed system, the paper presents a comparison of its performance with other approaches and algorithms that can be used for similar purposes.
Unsupervised classification of multivariate geostatistical data: Two algorithms
NASA Astrophysics Data System (ADS)
Romary, Thomas; Ors, Fabien; Rivoirard, Jacques; Deraisme, Jacques
2015-12-01
With the increasing development of remote sensing platforms and the evolution of sampling facilities in mining and oil industry, spatial datasets are becoming increasingly large, inform a growing number of variables and cover wider and wider areas. Therefore, it is often necessary to split the domain of study to account for radically different behaviors of the natural phenomenon over the domain and to simplify the subsequent modeling step. The definition of these areas can be seen as a problem of unsupervised classification, or clustering, where we try to divide the domain into homogeneous domains with respect to the values taken by the variables in hand. The application of classical clustering methods, designed for independent observations, does not ensure the spatial coherence of the resulting classes. Image segmentation methods, based on e.g. Markov random fields, are not adapted to irregularly sampled data. Other existing approaches, based on mixtures of Gaussian random functions estimated via the expectation-maximization algorithm, are limited to reasonable sample sizes and a small number of variables. In this work, we propose two algorithms based on adaptations of classical algorithms to multivariate geostatistical data. Both algorithms are model free and can handle large volumes of multivariate, irregularly spaced data. The first one proceeds by agglomerative hierarchical clustering. The spatial coherence is ensured by a proximity condition imposed for two clusters to merge. This proximity condition relies on a graph organizing the data in the coordinates space. The hierarchical algorithm can then be seen as a graph-partitioning algorithm. Following this interpretation, a spatial version of the spectral clustering algorithm is also proposed. The performances of both algorithms are assessed on toy examples and a mining dataset.
Loewenstein, Yaniv; Portugaly, Elon; Fromer, Menachem; Linial, Michal
2008-01-01
Motivation: UPGMA (average linking) is probably the most popular algorithm for hierarchical data clustering, especially in computational biology. However, UPGMA requires the entire dissimilarity matrix in memory. Due to this prohibitive requirement, UPGMA is not scalable to very large datasets. Application: We present a novel class of memory-constrained UPGMA (MC-UPGMA) algorithms. Given any practical memory size constraint, this framework guarantees the correct clustering solution without explicitly requiring all dissimilarities in memory. The algorithms are general and are applicable to any dataset. We present a data-dependent characterization of hardness and clustering efficiency. The presented concepts are applicable to any agglomerative clustering formulation. Results: We apply our algorithm to the entire collection of protein sequences, to automatically build a comprehensive evolutionary-driven hierarchy of proteins from sequence alone. The newly created tree captures protein families better than state-of-the-art large-scale methods such as CluSTr, ProtoNet4 or single-linkage clustering. We demonstrate that leveraging the entire mass embodied in all sequence similarities allows to significantly improve on current protein family clusterings which are unable to directly tackle the sheer mass of this data. Furthermore, we argue that non-metric constraints are an inherent complexity of the sequence space and should not be overlooked. The robustness of UPGMA allows significant improvement, especially for multidomain proteins, and for large or divergent families. Availability: A comprehensive tree built from all UniProt sequence similarities, together with navigation and classification tools will be made available as part of the ProtoNet service. A C++ implementation of the algorithm is available on request. Contact: lonshy@cs.huji.ac.il PMID:18586742
Competitive learning with pairwise constraints.
Covões, Thiago F; Hruschka, Eduardo R; Ghosh, Joydeep
2013-01-01
Constrained clustering has been an active research topic since the last decade. Most studies focus on batch-mode algorithms. This brief introduces two algorithms for on-line constrained learning, named on-line linear constrained vector quantization error (O-LCVQE) and constrained rival penalized competitive learning (C-RPCL). The former is a variant of the LCVQE algorithm for on-line settings, whereas the latter is an adaptation of the (on-line) RPCL algorithm to deal with constrained clustering. The accuracy results--in terms of the normalized mutual information (NMI)--from experiments with nine datasets show that the partitions induced by O-LCVQE are competitive with those found by the (batch-mode) LCVQE. Compared with this formidable baseline algorithm, it is surprising that C-RPCL can provide better partitions (in terms of the NMI) for most of the datasets. Also, experiments on a large dataset show that on-line algorithms for constrained clustering can significantly reduce the computational time.
An Integrated Intrusion Detection Model of Cluster-Based Wireless Sensor Network
Sun, Xuemei; Yan, Bo; Zhang, Xinzhong; Rong, Chuitian
2015-01-01
Considering wireless sensor network characteristics, this paper combines anomaly and mis-use detection and proposes an integrated detection model of cluster-based wireless sensor network, aiming at enhancing detection rate and reducing false rate. Adaboost algorithm with hierarchical structures is used for anomaly detection of sensor nodes, cluster-head nodes and Sink nodes. Cultural-Algorithm and Artificial-Fish–Swarm-Algorithm optimized Back Propagation is applied to mis-use detection of Sink node. Plenty of simulation demonstrates that this integrated model has a strong performance of intrusion detection. PMID:26447696
An Integrated Intrusion Detection Model of Cluster-Based Wireless Sensor Network.
Sun, Xuemei; Yan, Bo; Zhang, Xinzhong; Rong, Chuitian
2015-01-01
Considering wireless sensor network characteristics, this paper combines anomaly and mis-use detection and proposes an integrated detection model of cluster-based wireless sensor network, aiming at enhancing detection rate and reducing false rate. Adaboost algorithm with hierarchical structures is used for anomaly detection of sensor nodes, cluster-head nodes and Sink nodes. Cultural-Algorithm and Artificial-Fish-Swarm-Algorithm optimized Back Propagation is applied to mis-use detection of Sink node. Plenty of simulation demonstrates that this integrated model has a strong performance of intrusion detection.
Interactive visual exploration and refinement of cluster assignments.
Kern, Michael; Lex, Alexander; Gehlenborg, Nils; Johnson, Chris R
2017-09-12
With ever-increasing amounts of data produced in biology research, scientists are in need of efficient data analysis methods. Cluster analysis, combined with visualization of the results, is one such method that can be used to make sense of large data volumes. At the same time, cluster analysis is known to be imperfect and depends on the choice of algorithms, parameters, and distance measures. Most clustering algorithms don't properly account for ambiguity in the source data, as records are often assigned to discrete clusters, even if an assignment is unclear. While there are metrics and visualization techniques that allow analysts to compare clusterings or to judge cluster quality, there is no comprehensive method that allows analysts to evaluate, compare, and refine cluster assignments based on the source data, derived scores, and contextual data. In this paper, we introduce a method that explicitly visualizes the quality of cluster assignments, allows comparisons of clustering results and enables analysts to manually curate and refine cluster assignments. Our methods are applicable to matrix data clustered with partitional, hierarchical, and fuzzy clustering algorithms. Furthermore, we enable analysts to explore clustering results in context of other data, for example, to observe whether a clustering of genomic data results in a meaningful differentiation in phenotypes. Our methods are integrated into Caleydo StratomeX, a popular, web-based, disease subtype analysis tool. We show in a usage scenario that our approach can reveal ambiguities in cluster assignments and produce improved clusterings that better differentiate genotypes and phenotypes.
MULTI-K: accurate classification of microarray subtypes using ensemble k-means clustering
Kim, Eun-Youn; Kim, Seon-Young; Ashlock, Daniel; Nam, Dougu
2009-01-01
Background Uncovering subtypes of disease from microarray samples has important clinical implications such as survival time and sensitivity of individual patients to specific therapies. Unsupervised clustering methods have been used to classify this type of data. However, most existing methods focus on clusters with compact shapes and do not reflect the geometric complexity of the high dimensional microarray clusters, which limits their performance. Results We present a cluster-number-based ensemble clustering algorithm, called MULTI-K, for microarray sample classification, which demonstrates remarkable accuracy. The method amalgamates multiple k-means runs by varying the number of clusters and identifies clusters that manifest the most robust co-memberships of elements. In addition to the original algorithm, we newly devised the entropy-plot to control the separation of singletons or small clusters. MULTI-K, unlike the simple k-means or other widely used methods, was able to capture clusters with complex and high-dimensional structures accurately. MULTI-K outperformed other methods including a recently developed ensemble clustering algorithm in tests with five simulated and eight real gene-expression data sets. Conclusion The geometric complexity of clusters should be taken into account for accurate classification of microarray data, and ensemble clustering applied to the number of clusters tackles the problem very well. The C++ code and the data sets tested are available from the authors. PMID:19698124
Consensus-Based Sorting of Neuronal Spike Waveforms
Fournier, Julien; Mueller, Christian M.; Shein-Idelson, Mark; Hemberger, Mike
2016-01-01
Optimizing spike-sorting algorithms is difficult because sorted clusters can rarely be checked against independently obtained “ground truth” data. In most spike-sorting algorithms in use today, the optimality of a clustering solution is assessed relative to some assumption on the distribution of the spike shapes associated with a particular single unit (e.g., Gaussianity) and by visual inspection of the clustering solution followed by manual validation. When the spatiotemporal waveforms of spikes from different cells overlap, the decision as to whether two spikes should be assigned to the same source can be quite subjective, if it is not based on reliable quantitative measures. We propose a new approach, whereby spike clusters are identified from the most consensual partition across an ensemble of clustering solutions. Using the variability of the clustering solutions across successive iterations of the same clustering algorithm (template matching based on K-means clusters), we estimate the probability of spikes being clustered together and identify groups of spikes that are not statistically distinguishable from one another. Thus, we identify spikes that are most likely to be clustered together and therefore correspond to consistent spike clusters. This method has the potential advantage that it does not rely on any model of the spike shapes. It also provides estimates of the proportion of misclassified spikes for each of the identified clusters. We tested our algorithm on several datasets for which there exists a ground truth (simultaneous intracellular data), and show that it performs close to the optimum reached by a support vector machine trained on the ground truth. We also show that the estimated rate of misclassification matches the proportion of misclassified spikes measured from the ground truth data. PMID:27536990
Consensus-Based Sorting of Neuronal Spike Waveforms.
Fournier, Julien; Mueller, Christian M; Shein-Idelson, Mark; Hemberger, Mike; Laurent, Gilles
2016-01-01
Optimizing spike-sorting algorithms is difficult because sorted clusters can rarely be checked against independently obtained "ground truth" data. In most spike-sorting algorithms in use today, the optimality of a clustering solution is assessed relative to some assumption on the distribution of the spike shapes associated with a particular single unit (e.g., Gaussianity) and by visual inspection of the clustering solution followed by manual validation. When the spatiotemporal waveforms of spikes from different cells overlap, the decision as to whether two spikes should be assigned to the same source can be quite subjective, if it is not based on reliable quantitative measures. We propose a new approach, whereby spike clusters are identified from the most consensual partition across an ensemble of clustering solutions. Using the variability of the clustering solutions across successive iterations of the same clustering algorithm (template matching based on K-means clusters), we estimate the probability of spikes being clustered together and identify groups of spikes that are not statistically distinguishable from one another. Thus, we identify spikes that are most likely to be clustered together and therefore correspond to consistent spike clusters. This method has the potential advantage that it does not rely on any model of the spike shapes. It also provides estimates of the proportion of misclassified spikes for each of the identified clusters. We tested our algorithm on several datasets for which there exists a ground truth (simultaneous intracellular data), and show that it performs close to the optimum reached by a support vector machine trained on the ground truth. We also show that the estimated rate of misclassification matches the proportion of misclassified spikes measured from the ground truth data.
NASA Astrophysics Data System (ADS)
Franke, R.
2016-11-01
In many networks discovered in biology, medicine, neuroscience and other disciplines special properties like a certain degree distribution and hierarchical cluster structure (also called communities) can be observed as general organizing principles. Detecting the cluster structure of an unknown network promises to identify functional subdivisions, hierarchy and interactions on a mesoscale. It is not trivial choosing an appropriate detection algorithm because there are multiple network, cluster and algorithmic properties to be considered. Edges can be weighted and/or directed, clusters overlap or build a hierarchy in several ways. Algorithms differ not only in runtime, memory requirements but also in allowed network and cluster properties. They are based on a specific definition of what a cluster is, too. On the one hand, a comprehensive network creation model is needed to build a large variety of benchmark networks with different reasonable structures to compare algorithms. On the other hand, if a cluster structure is already known, it is desirable to separate effects of this structure from other network properties. This can be done with null model networks that mimic an observed cluster structure to improve statistics on other network features. A third important application is the general study of properties in networks with different cluster structures, possibly evolving over time. Currently there are good benchmark and creation models available. But what is left is a precise sandbox model to build hierarchical, overlapping and directed clusters for undirected or directed, binary or weighted complex random networks on basis of a sophisticated blueprint. This gap shall be closed by the model CHIMERA (Cluster Hierarchy Interconnection Model for Evaluation, Research and Analysis) which will be introduced and described here for the first time.
Cloud classification from satellite data using a fuzzy sets algorithm: A polar example
NASA Technical Reports Server (NTRS)
Key, J. R.; Maslanik, J. A.; Barry, R. G.
1988-01-01
Where spatial boundaries between phenomena are diffuse, classification methods which construct mutually exclusive clusters seem inappropriate. The Fuzzy c-means (FCM) algorithm assigns each observation to all clusters, with membership values as a function of distance to the cluster center. The FCM algorithm is applied to AVHRR data for the purpose of classifying polar clouds and surfaces. Careful analysis of the fuzzy sets can provide information on which spectral channels are best suited to the classification of particular features, and can help determine likely areas of misclassification. General agreement in the resulting classes and cloud fraction was found between the FCM algorithm, a manual classification, and an unsupervised maximum likelihood classifier.
Hybrid employment recommendation algorithm based on Spark
NASA Astrophysics Data System (ADS)
Li, Zuoquan; Lin, Yubei; Zhang, Xingming
2017-08-01
Aiming at the real-time application of collaborative filtering employment recommendation algorithm (CF), a clustering collaborative filtering recommendation algorithm (CCF) is developed, which applies hierarchical clustering to CF and narrows the query range of neighbour items. In addition, to solve the cold-start problem of content-based recommendation algorithm (CB), a content-based algorithm with users’ information (CBUI) is introduced for job recommendation. Furthermore, a hybrid recommendation algorithm (HRA) which combines CCF and CBUI algorithms is proposed, and implemented on Spark platform. The experimental results show that HRA can overcome the problems of cold start and data sparsity, and achieve good recommendation accuracy and scalability for employment recommendation.
Buried landmine detection using multivariate normal clustering
NASA Astrophysics Data System (ADS)
Duston, Brian M.
2001-10-01
A Bayesian classification algorithm is presented for discriminating buried land mines from buried and surface clutter in Ground Penetrating Radar (GPR) signals. This algorithm is based on multivariate normal (MVN) clustering, where feature vectors are used to identify populations (clusters) of mines and clutter objects. The features are extracted from two-dimensional images created from ground penetrating radar scans. MVN clustering is used to determine the number of clusters in the data and to create probability density models for target and clutter populations, producing the MVN clustering classifier (MVNCC). The Bayesian Information Criteria (BIC) is used to evaluate each model to determine the number of clusters in the data. An extension of the MVNCC allows the model to adapt to local clutter distributions by treating each of the MVN cluster components as a Poisson process and adaptively estimating the intensity parameters. The algorithm is developed using data collected by the Mine Hunter/Killer Close-In Detector (MH/K CID) at prepared mine lanes. The Mine Hunter/Killer is a prototype mine detecting and neutralizing vehicle developed for the U.S. Army to clear roads of anti-tank mines.
Seman, Ali; Sapawi, Azizian Mohd; Salleh, Mohd Zaki
2015-06-01
Y-chromosome short tandem repeats (Y-STRs) are genetic markers with practical applications in human identification. However, where mass identification is required (e.g., in the aftermath of disasters with significant fatalities), the efficiency of the process could be improved with new statistical approaches. Clustering applications are relatively new tools for large-scale comparative genotyping, and the k-Approximate Modal Haplotype (k-AMH), an efficient algorithm for clustering large-scale Y-STR data, represents a promising method for developing these tools. In this study we improved the k-AMH and produced three new algorithms: the Nk-AMH I (including a new initial cluster center selection), the Nk-AMH II (including a new dominant weighting value), and the Nk-AMH III (combining I and II). The Nk-AMH III was the superior algorithm, with mean clustering accuracy that increased in four out of six datasets and remained at 100% in the other two. Additionally, the Nk-AMH III achieved a 2% higher overall mean clustering accuracy score than the k-AMH, as well as optimal accuracy for all datasets (0.84-1.00). With inclusion of the two new methods, the Nk-AMH III produced an optimal solution for clustering Y-STR data; thus, the algorithm has potential for further development towards fully automatic clustering of any large-scale genotypic data.
A search for soft X-ray emission from red-giant coronae
NASA Technical Reports Server (NTRS)
Margon, B.; Mason, K. O.; Sanford, P. W.
1974-01-01
Hills has pointed out that if red-giant coronae are weak sources of soft X-rays, then the problems of the identification of the local component of the soft X-ray background and the observed lack of gas in globular clusters may be simultaneously resolved. Using instrumentation aboard OAO Copernicus, we have searched unsuccessfully for emission in the 10-100 A band from four nearby red giants. In all cases, our upper limits are of the order of the minimum theoretically predicted fluxes.
Image Registration Algorithm Based on Parallax Constraint and Clustering Analysis
NASA Astrophysics Data System (ADS)
Wang, Zhe; Dong, Min; Mu, Xiaomin; Wang, Song
2018-01-01
To resolve the problem of slow computation speed and low matching accuracy in image registration, a new image registration algorithm based on parallax constraint and clustering analysis is proposed. Firstly, Harris corner detection algorithm is used to extract the feature points of two images. Secondly, use Normalized Cross Correlation (NCC) function to perform the approximate matching of feature points, and the initial feature pair is obtained. Then, according to the parallax constraint condition, the initial feature pair is preprocessed by K-means clustering algorithm, which is used to remove the feature point pairs with obvious errors in the approximate matching process. Finally, adopt Random Sample Consensus (RANSAC) algorithm to optimize the feature points to obtain the final feature point matching result, and the fast and accurate image registration is realized. The experimental results show that the image registration algorithm proposed in this paper can improve the accuracy of the image matching while ensuring the real-time performance of the algorithm.
NASA Astrophysics Data System (ADS)
Gu, Hui; Zhu, Hongxia; Cui, Yanfeng; Si, Fengqi; Xue, Rui; Xi, Han; Zhang, Jiayu
2018-06-01
An integrated combustion optimization scheme is proposed for the combined considering the restriction in coal-fired boiler combustion efficiency and outlet NOx emissions. Continuous attribute discretization and reduction techniques are handled as optimization preparation by E-Cluster and C_RED methods, in which the segmentation numbers don't need to be provided in advance and can be continuously adapted with data characters. In order to obtain results of multi-objections with clustering method for mixed data, a modified K-prototypes algorithm is then proposed. This algorithm can be divided into two stages as K-prototypes algorithm for clustering number self-adaptation and clustering for multi-objective optimization, respectively. Field tests were carried out at a 660 MW coal-fired boiler to provide real data as a case study for controllable attribute discretization and reduction in boiler system and obtaining optimization parameters considering [ maxηb, minyNOx ] multi-objective rule.
Image Segmentation Method Using Fuzzy C Mean Clustering Based on Multi-Objective Optimization
NASA Astrophysics Data System (ADS)
Chen, Jinlin; Yang, Chunzhi; Xu, Guangkui; Ning, Li
2018-04-01
Image segmentation is not only one of the hottest topics in digital image processing, but also an important part of computer vision applications. As one kind of image segmentation algorithms, fuzzy C-means clustering is an effective and concise segmentation algorithm. However, the drawback of FCM is that it is sensitive to image noise. To solve the problem, this paper designs a novel fuzzy C-mean clustering algorithm based on multi-objective optimization. We add a parameter λ to the fuzzy distance measurement formula to improve the multi-objective optimization. The parameter λ can adjust the weights of the pixel local information. In the algorithm, the local correlation of neighboring pixels is added to the improved multi-objective mathematical model to optimize the clustering cent. Two different experimental results show that the novel fuzzy C-means approach has an efficient performance and computational time while segmenting images by different type of noises.
Mass Spectroscopy of Neutral Metal Oxide Clusters Using a Desk-Top Soft X-Ray Laser
NASA Astrophysics Data System (ADS)
Dong, F.; Heinbuch, S.; Bernstein, E. R.; Rocca, J. J.
We report the use of a compact 46.9 nm capillary discharge soft x-ray laser in the study of metal-oxide nanoclusters using mass spectroscopy. Transition metal oxides are widely used as heterogeneous catalysts and catalytic supports in industrial processes. There are numerous applications for transition metal oxide catalysts, and although they are widely used, there is a lack of fundamental understanding of the complicated processes that occur on the metal oxide surface during catalysis. Conventional nanocluster spectroscopy techniques have used 193 nm radiation from an ArF excimer laser corresponding to a photon energy of 6.4 eV in order to photoionize a sample. Typical metal oxide nanocluster ionization energies fall into the range of 7-12 eV while some have even higher energies. Therefore a single 6.4 eV photon can not ionize the cluster making multiphoton processes the dominant ionization method. A major problem associated with mass spectroscopy can become evident during the multiphoton ionization of clusters. Specifically, the clusters may fragment during the ionization process and the identification of the neutral parent cluster can become difficult. In the present experiment neutral vanadium, niobium and tantalum oxide clusters are studied by single photon ionization with the 26.5 eV photons produced by a capillary discharge soft x-ray laser.1 During ionization, the metal oxide clusters are observed to be almost free of serious fragmentation. The most stable neutral cluster of vanadium, niobium, and tantalum oxide growth in a saturated oxygen condition are identified as MO2, M2O4/M2O5, M3O7, M4O10, M5O12, M6O15, M7O17, M8O20, and M9O22, which can be represented as a form (MO2)0,1(M2O5)y. M2O5 is identified as a basic unit to build-up the three kinds of metal oxide clusters. In the case of niobium and tantalum oxide clusters, the oxygen-deficient clusters with a structure of (MO2)2(M2O5)y are detected for groups that contain an even number of metal atoms. For vanadium oxide clusters, the oxygen-deficient clusters are detected for every family, indicating a stable structure of (VO2)x(V2O5)y. The stoichiometry of oxygen-rich clusters can be expressed as (MO2)0,1(M2O5)yO1-3 and their structures are consistent with chemically bonded species.
A Fast Implementation of the ISOCLUS Algorithm
NASA Technical Reports Server (NTRS)
Memarsadeghi, Nargess; Mount, David M.; Netanyahu, Nathan S.; LeMoigne, Jacqueline
2003-01-01
Unsupervised clustering is a fundamental building block in numerous image processing applications. One of the most popular and widely used clustering schemes for remote sensing applications is the ISOCLUS algorithm, which is based on the ISODATA method. The algorithm is given a set of n data points in d-dimensional space, an integer k indicating the initial number of clusters, and a number of additional parameters. The general goal is to compute the coordinates of a set of cluster centers in d-space, such that those centers minimize the mean squared distance from each data point to its nearest center. This clustering algorithm is similar to another well-known clustering method, called k-means. One significant feature of ISOCLUS over k-means is that the actual number of clusters reported might be fewer or more than the number supplied as part of the input. The algorithm uses different heuristics to determine whether to merge lor split clusters. As ISOCLUS can run very slowly, particularly on large data sets, there has been a growing .interest in the remote sensing community in computing it efficiently. We have developed a faster implementation of the ISOCLUS algorithm. Our improvement is based on a recent acceleration to the k-means algorithm of Kanungo, et al. They showed that, by using a kd-tree data structure for storing the data, it is possible to reduce the running time of k-means. We have adapted this method for the ISOCLUS algorithm, and we show that it is possible to achieve essentially the same results as ISOCLUS on large data sets, but with significantly lower running times. This adaptation involves computing a number of cluster statistics that are needed for ISOCLUS but not for k-means. Both the k-means and ISOCLUS algorithms are based on iterative schemes, in which nearest neighbors are calculated until some convergence criterion is satisfied. Each iteration requires that the nearest center for each data point be computed. Naively, this requires O(kn) time, where k denotes the current number of centers. Traditional techniques for accelerating nearest neighbor searching involve storing the k centers in a data structure. However, because of the iterative nature of the algorithm, this data structure would need to be rebuilt with each new iteration. Our approach is to store the data points in a kd-tree data structure. The assignment of points to nearest neighbors is carried out by a filtering process, which successively eliminates centers that can not possibly be the nearest neighbor for a given region of space. This algorithm is significantly faster, because large groups of data points can be assigned to their nearest center in a single operation. Preliminary results on a number of real Landsat datasets show that our revised ISOCLUS-like scheme runs about twice as fast.
ERIC Educational Resources Information Center
Xu, Beijie; Recker, Mimi; Qi, Xiaojun; Flann, Nicholas; Ye, Lei
2013-01-01
This article examines clustering as an educational data mining method. In particular, two clustering algorithms, the widely used K-means and the model-based Latent Class Analysis, are compared, using usage data from an educational digital library service, the Instructional Architect (IA.usu.edu). Using a multi-faceted approach and multiple data…
NASA Astrophysics Data System (ADS)
Basalto, Nicolas; Bellotti, Roberto; de Carlo, Francesco; Facchi, Paolo; Pantaleo, Ester; Pascazio, Saverio
2008-10-01
A clustering algorithm based on the Hausdorff distance is analyzed and compared to the single, complete, and average linkage algorithms. The four clustering procedures are applied to a toy example and to the time series of financial data. The dendrograms are scrutinized and their features compared. The Hausdorff linkage relies on firm mathematical grounds and turns out to be very effective when one has to discriminate among complex structures.
Fast clustering using adaptive density peak detection.
Wang, Xiao-Feng; Xu, Yifan
2017-12-01
Common limitations of clustering methods include the slow algorithm convergence, the instability of the pre-specification on a number of intrinsic parameters, and the lack of robustness to outliers. A recent clustering approach proposed a fast search algorithm of cluster centers based on their local densities. However, the selection of the key intrinsic parameters in the algorithm was not systematically investigated. It is relatively difficult to estimate the "optimal" parameters since the original definition of the local density in the algorithm is based on a truncated counting measure. In this paper, we propose a clustering procedure with adaptive density peak detection, where the local density is estimated through the nonparametric multivariate kernel estimation. The model parameter is then able to be calculated from the equations with statistical theoretical justification. We also develop an automatic cluster centroid selection method through maximizing an average silhouette index. The advantage and flexibility of the proposed method are demonstrated through simulation studies and the analysis of a few benchmark gene expression data sets. The method only needs to perform in one single step without any iteration and thus is fast and has a great potential to apply on big data analysis. A user-friendly R package ADPclust is developed for public use.
Prediction of Tibial Rotation Pathologies Using Particle Swarm Optimization and K-Means Algorithms.
Sari, Murat; Tuna, Can; Akogul, Serkan
2018-03-28
The aim of this article is to investigate pathological subjects from a population through different physical factors. To achieve this, particle swarm optimization (PSO) and K-means (KM) clustering algorithms have been combined (PSO-KM). Datasets provided by the literature were divided into three clusters based on age and weight parameters and each one of right tibial external rotation (RTER), right tibial internal rotation (RTIR), left tibial external rotation (LTER), and left tibial internal rotation (LTIR) values were divided into three types as Type 1, Type 2 and Type 3 (Type 2 is non-pathological (normal) and the other two types are pathological (abnormal)), respectively. The rotation values of every subject in any cluster were noted. Then the algorithm was run and the produced values were also considered. The values of the produced algorithm, the PSO-KM, have been compared with the real values. The hybrid PSO-KM algorithm has been very successful on the optimal clustering of the tibial rotation types through the physical criteria. In this investigation, Type 2 (pathological subjects) is of especially high predictability and the PSO-KM algorithm has been very successful as an operation system for clustering and optimizing the tibial motion data assessments. These research findings are expected to be very useful for health providers, such as physiotherapists, orthopedists, and so on, in which this consequence may help clinicians to appropriately designing proper treatment schedules for patients.
Martín, Andrés; Barrientos, Antonio; Del Cerro, Jaime
2018-03-22
This article presents a new method to solve the inverse kinematics problem of hyper-redundant and soft manipulators. From an engineering perspective, this kind of robots are underdetermined systems. Therefore, they exhibit an infinite number of solutions for the inverse kinematics problem, and to choose the best one can be a great challenge. A new algorithm based on the cyclic coordinate descent (CCD) and named as natural-CCD is proposed to solve this issue. It takes its name as a result of generating very harmonious robot movements and trajectories that also appear in nature, such as the golden spiral. In addition, it has been applied to perform continuous trajectories, to develop whole-body movements, to analyze motion planning in complex environments, and to study fault tolerance, even for both prismatic and rotational joints. The proposed algorithm is very simple, precise, and computationally efficient. It works for robots either in two or three spatial dimensions and handles a large amount of degrees-of-freedom. Because of this, it is aimed to break down barriers between discrete hyper-redundant and continuum soft robots.
NASA Astrophysics Data System (ADS)
Sun, Benyuan; Yue, Shihong; Cui, Ziqiang; Wang, Huaxiang
2015-12-01
As an advanced measurement technique of non-radiant, non-intrusive, rapid response, and low cost, the electrical tomography (ET) technique has developed rapidly in recent decades. The ET imaging algorithm plays an important role in the ET imaging process. Linear back projection (LBP) is the most used ET algorithm due to its advantages of dynamic imaging process, real-time response, and easy realization. But the LBP algorithm is of low spatial resolution due to the natural ‘soft field’ effect and ‘ill-posed solution’ problems; thus its applicable ranges are greatly limited. In this paper, an original data decomposition method is proposed, and every ET measuring data are decomposed into two independent new data based on the positive and negative sensing areas of the measuring data. Consequently, the number of total measuring data is extended to twice as many as the number of the original data, thus effectively reducing the ‘ill-posed solution’. On the other hand, an index to measure the ‘soft field’ effect is proposed. The index shows that the decomposed data can distinguish between different contributions of various units (pixels) for any ET measuring data, and can efficiently reduce the ‘soft field’ effect of the ET imaging process. In light of the data decomposition method, a new linear back projection algorithm is proposed to improve the spatial resolution of the ET image. A series of simulations and experiments are applied to validate the proposed algorithm by the real-time performances and the progress of spatial resolutions.
Clustervision: Visual Supervision of Unsupervised Clustering.
Kwon, Bum Chul; Eysenbach, Ben; Verma, Janu; Ng, Kenney; De Filippi, Christopher; Stewart, Walter F; Perer, Adam
2018-01-01
Clustering, the process of grouping together similar items into distinct partitions, is a common type of unsupervised machine learning that can be useful for summarizing and aggregating complex multi-dimensional data. However, data can be clustered in many ways, and there exist a large body of algorithms designed to reveal different patterns. While having access to a wide variety of algorithms is helpful, in practice, it is quite difficult for data scientists to choose and parameterize algorithms to get the clustering results relevant for their dataset and analytical tasks. To alleviate this problem, we built Clustervision, a visual analytics tool that helps ensure data scientists find the right clustering among the large amount of techniques and parameters available. Our system clusters data using a variety of clustering techniques and parameters and then ranks clustering results utilizing five quality metrics. In addition, users can guide the system to produce more relevant results by providing task-relevant constraints on the data. Our visual user interface allows users to find high quality clustering results, explore the clusters using several coordinated visualization techniques, and select the cluster result that best suits their task. We demonstrate this novel approach using a case study with a team of researchers in the medical domain and showcase that our system empowers users to choose an effective representation of their complex data.
NASA Astrophysics Data System (ADS)
Chang, Chun; Huang, Benxiong; Xu, Zhengguang; Li, Bin; Zhao, Nan
2018-02-01
Three soft-input-soft-output (SISO) detection methods for dual-polarized quadrature duobinary (DP-QDB), including maximum-logarithmic-maximum-a-posteriori-probability-algorithm (Max-log-MAP)-based detection, soft-output-Viterbi-algorithm (SOVA)-based detection, and a proposed SISO detection, which can all be combined with SISO decoding, are presented. The three detection methods are investigated at 128 Gb/s in five-channel wavelength-division-multiplexing uncoded and low-density-parity-check (LDPC) coded DP-QDB systems by simulations. Max-log-MAP-based detection needs the returning-to-initial-states (RTIS) process despite having the best performance. When the LDPC code with a code rate of 0.83 is used, the detecting-and-decoding scheme with the SISO detection does not need RTIS and has better bit error rate (BER) performance than the scheme with SOVA-based detection. The former can reduce the optical signal-to-noise ratio (OSNR) requirement (at BER=10-5) by 2.56 dB relative to the latter. The application of the SISO iterative detection in LDPC-coded DP-QDB systems makes a good trade-off between requirements on transmission efficiency, OSNR requirement, and transmission distance, compared with the other two SISO methods.
Chen, Zhaoxue; Yu, Haizhong; Chen, Hao
2013-12-01
To solve the problem of traditional K-means clustering in which initial clustering centers are selected randomly, we proposed a new K-means segmentation algorithm based on robustly selecting 'peaks' standing for White Matter, Gray Matter and Cerebrospinal Fluid in multi-peaks gray histogram of MRI brain image. The new algorithm takes gray value of selected histogram 'peaks' as the initial K-means clustering center and can segment the MRI brain image into three parts of tissue more effectively, accurately, steadily and successfully. Massive experiments have proved that the proposed algorithm can overcome many shortcomings caused by traditional K-means clustering method such as low efficiency, veracity, robustness and time consuming. The histogram 'peak' selecting idea of the proposed segmentootion method is of more universal availability.
Yock, Adam D; Kim, Gwe-Ya
2017-09-01
To present the k-means clustering algorithm as a tool to address treatment planning considerations characteristic of stereotactic radiosurgery using a single isocenter for multiple targets. For 30 patients treated with stereotactic radiosurgery for multiple brain metastases, the geometric centroids and radii of each met were determined from the treatment planning system. In-house software used this as well as weighted and unweighted versions of the k-means clustering algorithm to group the targets to be treated with a single isocenter, and to position each isocenter. The algorithm results were evaluated using within-cluster sum of squares as well as a minimum target coverage metric that considered the effect of target size. Both versions of the algorithm were applied to an example patient to demonstrate the prospective determination of the appropriate number and location of isocenters. Both weighted and unweighted versions of the k-means algorithm were applied successfully to determine the number and position of isocenters. Comparing the two, both the within-cluster sum of squares metric and the minimum target coverage metric resulting from the unweighted version were less than those from the weighted version. The average magnitudes of the differences were small (-0.2 cm 2 and 0.1% for the within cluster sum of squares and minimum target coverage, respectively) but statistically significant (Wilcoxon signed-rank test, P < 0.01). The differences between the versions of the k-means clustering algorithm represented an advantage of the unweighted version for the within-cluster sum of squares metric, and an advantage of the weighted version for the minimum target coverage metric. While additional treatment planning considerations have a large influence on the final treatment plan quality, both versions of the k-means algorithm provide automatic, consistent, quantitative, and objective solutions to the tasks associated with SRS treatment planning using a single isocenter for multiple targets. © 2017 The Authors. Journal of Applied Clinical Medical Physics published by Wiley Periodicals, Inc. on behalf of American Association of Physicists in Medicine.
A system for learning statistical motion patterns.
Hu, Weiming; Xiao, Xuejuan; Fu, Zhouyu; Xie, Dan; Tan, Tieniu; Maybank, Steve
2006-09-01
Analysis of motion patterns is an effective approach for anomaly detection and behavior prediction. Current approaches for the analysis of motion patterns depend on known scenes, where objects move in predefined ways. It is highly desirable to automatically construct object motion patterns which reflect the knowledge of the scene. In this paper, we present a system for automatically learning motion patterns for anomaly detection and behavior prediction based on a proposed algorithm for robustly tracking multiple objects. In the tracking algorithm, foreground pixels are clustered using a fast accurate fuzzy K-means algorithm. Growing and prediction of the cluster centroids of foreground pixels ensure that each cluster centroid is associated with a moving object in the scene. In the algorithm for learning motion patterns, trajectories are clustered hierarchically using spatial and temporal information and then each motion pattern is represented with a chain of Gaussian distributions. Based on the learned statistical motion patterns, statistical methods are used to detect anomalies and predict behaviors. Our system is tested using image sequences acquired, respectively, from a crowded real traffic scene and a model traffic scene. Experimental results show the robustness of the tracking algorithm, the efficiency of the algorithm for learning motion patterns, and the encouraging performance of algorithms for anomaly detection and behavior prediction.
Infrared small target tracking based on SOPC
NASA Astrophysics Data System (ADS)
Hu, Taotao; Fan, Xiang; Zhang, Yu-Jin; Cheng, Zheng-dong; Zhu, Bin
2011-01-01
The paper presents a low cost FPGA based solution for a real-time infrared small target tracking system. A specialized architecture is presented based on a soft RISC processor capable of running kernel based mean shift tracking algorithm. Mean shift tracking algorithm is realized in NIOS II soft-core with SOPC (System on a Programmable Chip) technology. Though mean shift algorithm is widely used for target tracking, the original mean shift algorithm can not be directly used for infrared small target tracking. As infrared small target only has intensity information, so an improved mean shift algorithm is presented in this paper. How to describe target will determine whether target can be tracked by mean shift algorithm. Because color target can be tracked well by mean shift algorithm, imitating color image expression, spatial component and temporal component are advanced to describe target, which forms pseudo-color image. In order to improve the processing speed parallel technology and pipeline technology are taken. Two RAM are taken to stored images separately by ping-pong technology. A FLASH is used to store mass temp data. The experimental results show that infrared small target is tracked stably in complicated background.
Identify High-Quality Protein Structural Models by Enhanced K-Means.
Wu, Hongjie; Li, Haiou; Jiang, Min; Chen, Cheng; Lv, Qiang; Wu, Chuang
2017-01-01
Background. One critical issue in protein three-dimensional structure prediction using either ab initio or comparative modeling involves identification of high-quality protein structural models from generated decoys. Currently, clustering algorithms are widely used to identify near-native models; however, their performance is dependent upon different conformational decoys, and, for some algorithms, the accuracy declines when the decoy population increases. Results. Here, we proposed two enhanced K -means clustering algorithms capable of robustly identifying high-quality protein structural models. The first one employs the clustering algorithm SPICKER to determine the initial centroids for basic K -means clustering ( SK -means), whereas the other employs squared distance to optimize the initial centroids ( K -means++). Our results showed that SK -means and K -means++ were more robust as compared with SPICKER alone, detecting 33 (59%) and 42 (75%) of 56 targets, respectively, with template modeling scores better than or equal to those of SPICKER. Conclusions. We observed that the classic K -means algorithm showed a similar performance to that of SPICKER, which is a widely used algorithm for protein-structure identification. Both SK -means and K -means++ demonstrated substantial improvements relative to results from SPICKER and classical K -means.
Identify High-Quality Protein Structural Models by Enhanced K-Means
Li, Haiou; Chen, Cheng; Lv, Qiang; Wu, Chuang
2017-01-01
Background. One critical issue in protein three-dimensional structure prediction using either ab initio or comparative modeling involves identification of high-quality protein structural models from generated decoys. Currently, clustering algorithms are widely used to identify near-native models; however, their performance is dependent upon different conformational decoys, and, for some algorithms, the accuracy declines when the decoy population increases. Results. Here, we proposed two enhanced K-means clustering algorithms capable of robustly identifying high-quality protein structural models. The first one employs the clustering algorithm SPICKER to determine the initial centroids for basic K-means clustering (SK-means), whereas the other employs squared distance to optimize the initial centroids (K-means++). Our results showed that SK-means and K-means++ were more robust as compared with SPICKER alone, detecting 33 (59%) and 42 (75%) of 56 targets, respectively, with template modeling scores better than or equal to those of SPICKER. Conclusions. We observed that the classic K-means algorithm showed a similar performance to that of SPICKER, which is a widely used algorithm for protein-structure identification. Both SK-means and K-means++ demonstrated substantial improvements relative to results from SPICKER and classical K-means. PMID:28421198
DOE Office of Scientific and Technical Information (OSTI.GOV)
Not Available
1994-02-02
This report consists of three separate but related reports. They are (1) Human Resource Development, (2) Carbon-based Structural Materials Research Cluster, and (3) Data Parallel Algorithms for Scientific Computing. To meet the objectives of the Human Resource Development plan, the plan includes K--12 enrichment activities, undergraduate research opportunities for students at the state`s two Historically Black Colleges and Universities, graduate research through cluster assistantships and through a traineeship program targeted specifically to minorities, women and the disabled, and faculty development through participation in research clusters. One research cluster is the chemistry and physics of carbon-based materials. The objective of thismore » cluster is to develop a self-sustaining group of researchers in carbon-based materials research within the institutions of higher education in the state of West Virginia. The projects will involve analysis of cokes, graphites and other carbons in order to understand the properties that provide desirable structural characteristics including resistance to oxidation, levels of anisotropy and structural characteristics of the carbons themselves. In the proposed cluster on parallel algorithms, research by four WVU faculty and three state liberal arts college faculty are: (1) modeling of self-organized critical systems by cellular automata; (2) multiprefix algorithms and fat-free embeddings; (3) offline and online partitioning of data computation; and (4) manipulating and rendering three dimensional objects. This cluster furthers the state Experimental Program to Stimulate Competitive Research plan by building on existing strengths at WVU in parallel algorithms.« less
Keshtkaran, Mohammad Reza; Yang, Zhi
2017-06-01
Spike sorting is a fundamental preprocessing step for many neuroscience studies which rely on the analysis of spike trains. Most of the feature extraction and dimensionality reduction techniques that have been used for spike sorting give a projection subspace which is not necessarily the most discriminative one. Therefore, the clusters which appear inherently separable in some discriminative subspace may overlap if projected using conventional feature extraction approaches leading to a poor sorting accuracy especially when the noise level is high. In this paper, we propose a noise-robust and unsupervised spike sorting algorithm based on learning discriminative spike features for clustering. The proposed algorithm uses discriminative subspace learning to extract low dimensional and most discriminative features from the spike waveforms and perform clustering with automatic detection of the number of the clusters. The core part of the algorithm involves iterative subspace selection using linear discriminant analysis and clustering using Gaussian mixture model with outlier detection. A statistical test in the discriminative subspace is proposed to automatically detect the number of the clusters. Comparative results on publicly available simulated and real in vivo datasets demonstrate that our algorithm achieves substantially improved cluster distinction leading to higher sorting accuracy and more reliable detection of clusters which are highly overlapping and not detectable using conventional feature extraction techniques such as principal component analysis or wavelets. By providing more accurate information about the activity of more number of individual neurons with high robustness to neural noise and outliers, the proposed unsupervised spike sorting algorithm facilitates more detailed and accurate analysis of single- and multi-unit activities in neuroscience and brain machine interface studies.
NASA Astrophysics Data System (ADS)
Keshtkaran, Mohammad Reza; Yang, Zhi
2017-06-01
Objective. Spike sorting is a fundamental preprocessing step for many neuroscience studies which rely on the analysis of spike trains. Most of the feature extraction and dimensionality reduction techniques that have been used for spike sorting give a projection subspace which is not necessarily the most discriminative one. Therefore, the clusters which appear inherently separable in some discriminative subspace may overlap if projected using conventional feature extraction approaches leading to a poor sorting accuracy especially when the noise level is high. In this paper, we propose a noise-robust and unsupervised spike sorting algorithm based on learning discriminative spike features for clustering. Approach. The proposed algorithm uses discriminative subspace learning to extract low dimensional and most discriminative features from the spike waveforms and perform clustering with automatic detection of the number of the clusters. The core part of the algorithm involves iterative subspace selection using linear discriminant analysis and clustering using Gaussian mixture model with outlier detection. A statistical test in the discriminative subspace is proposed to automatically detect the number of the clusters. Main results. Comparative results on publicly available simulated and real in vivo datasets demonstrate that our algorithm achieves substantially improved cluster distinction leading to higher sorting accuracy and more reliable detection of clusters which are highly overlapping and not detectable using conventional feature extraction techniques such as principal component analysis or wavelets. Significance. By providing more accurate information about the activity of more number of individual neurons with high robustness to neural noise and outliers, the proposed unsupervised spike sorting algorithm facilitates more detailed and accurate analysis of single- and multi-unit activities in neuroscience and brain machine interface studies.
Identifying People with Soft-Biometrics at Fleet Week
2013-03-01
onboard sensors. This included: Color Camera: Located in the right eye, Octavia stored 640x480 RGB images at ~4 Hz from a Point Grey Firefly camera. A...Face Detection The Fleet Week experiments demonstrated the potential of soft biometrics for recognition, but all of the existing algorithms currently
A Stochastic-Variational Model for Soft Mumford-Shah Segmentation
2006-01-01
In contemporary image and vision analysis, stochastic approaches demonstrate great flexibility in representing and modeling complex phenomena, while variational-PDE methods gain enormous computational advantages over Monte Carlo or other stochastic algorithms. In combination, the two can lead to much more powerful novel models and efficient algorithms. In the current work, we propose a stochastic-variational model for soft (or fuzzy) Mumford-Shah segmentation of mixture image patterns. Unlike the classical hard Mumford-Shah segmentation, the new model allows each pixel to belong to each image pattern with some probability. Soft segmentation could lead to hard segmentation, and hence is more general. The modeling procedure, mathematical analysis on the existence of optimal solutions, and computational implementation of the new model are explored in detail, and numerical examples of both synthetic and natural images are presented. PMID:23165059
DOE Office of Scientific and Technical Information (OSTI.GOV)
Neylon, J., E-mail: jneylon@mednet.ucla.edu; Qi, X.; Sheng, K.
Purpose: Validating the usage of deformable image registration (DIR) for daily patient positioning is critical for adaptive radiotherapy (RT) applications pertaining to head and neck (HN) radiotherapy. The authors present a methodology for generating biomechanically realistic ground-truth data for validating DIR algorithms for HN anatomy by (a) developing a high-resolution deformable biomechanical HN model from a planning CT, (b) simulating deformations for a range of interfraction posture changes and physiological regression, and (c) generating subsequent CT images representing the deformed anatomy. Methods: The biomechanical model was developed using HN kVCT datasets and the corresponding structure contours. The voxels inside amore » given 3D contour boundary were clustered using a graphics processing unit (GPU) based algorithm that accounted for inconsistencies and gaps in the boundary to form a volumetric structure. While the bony anatomy was modeled as rigid body, the muscle and soft tissue structures were modeled as mass–spring-damper models with elastic material properties that corresponded to the underlying contoured anatomies. Within a given muscle structure, the voxels were classified using a uniform grid and a normalized mass was assigned to each voxel based on its Hounsfield number. The soft tissue deformation for a given skeletal actuation was performed using an implicit Euler integration with each iteration split into two substeps: one for the muscle structures and the other for the remaining soft tissues. Posture changes were simulated by articulating the skeletal structure and enabling the soft structures to deform accordingly. Physiological changes representing tumor regression were simulated by reducing the target volume and enabling the surrounding soft structures to deform accordingly. Finally, the authors also discuss a new approach to generate kVCT images representing the deformed anatomy that accounts for gaps and antialiasing artifacts that may be caused by the biomechanical deformation process. Accuracy and stability of the model response were validated using ground-truth simulations representing soft tissue behavior under local and global deformations. Numerical accuracy of the HN deformations was analyzed by applying nonrigid skeletal transformations acquired from interfraction kVCT images to the model’s skeletal structures and comparing the subsequent soft tissue deformations of the model with the clinical anatomy. Results: The GPU based framework enabled the model deformation to be performed at 60 frames/s, facilitating simulations of posture changes and physiological regressions at interactive speeds. The soft tissue response was accurate with a R{sup 2} value of >0.98 when compared to ground-truth global and local force deformation analysis. The deformation of the HN anatomy by the model agreed with the clinically observed deformations with an average correlation coefficient of 0.956. For a clinically relevant range of posture and physiological changes, the model deformations stabilized with an uncertainty of less than 0.01 mm. Conclusions: Documenting dose delivery for HN radiotherapy is essential accounting for posture and physiological changes. The biomechanical model discussed in this paper was able to deform in real-time, allowing interactive simulations and visualization of such changes. The model would allow patient specific validations of the DIR method and has the potential to be a significant aid in adaptive radiotherapy techniques.« less
GOClonto: an ontological clustering approach for conceptualizing PubMed abstracts.
Zheng, Hai-Tao; Borchert, Charles; Kim, Hong-Gee
2010-02-01
Concurrent with progress in biomedical sciences, an overwhelming of textual knowledge is accumulating in the biomedical literature. PubMed is the most comprehensive database collecting and managing biomedical literature. To help researchers easily understand collections of PubMed abstracts, numerous clustering methods have been proposed to group similar abstracts based on their shared features. However, most of these methods do not explore the semantic relationships among groupings of documents, which could help better illuminate the groupings of PubMed abstracts. To address this issue, we proposed an ontological clustering method called GOClonto for conceptualizing PubMed abstracts. GOClonto uses latent semantic analysis (LSA) and gene ontology (GO) to identify key gene-related concepts and their relationships as well as allocate PubMed abstracts based on these key gene-related concepts. Based on two PubMed abstract collections, the experimental results show that GOClonto is able to identify key gene-related concepts and outperforms the STC (suffix tree clustering) algorithm, the Lingo algorithm, the Fuzzy Ants algorithm, and the clustering based TRS (tolerance rough set) algorithm. Moreover, the two ontologies generated by GOClonto show significant informative conceptual structures.
Reducing the Volume of NASA Earth-Science Data
NASA Technical Reports Server (NTRS)
Lee, Seungwon; Braverman, Amy J.; Guillaume, Alexandre
2010-01-01
A computer program reduces data generated by NASA Earth-science missions into representative clusters characterized by centroids and membership information, thereby reducing the large volume of data to a level more amenable to analysis. The program effects an autonomous data-reduction/clustering process to produce a representative distribution and joint relationships of the data, without assuming a specific type of distribution and relationship and without resorting to domain-specific knowledge about the data. The program implements a combination of a data-reduction algorithm known as the entropy-constrained vector quantization (ECVQ) and an optimization algorithm known as the differential evolution (DE). The combination of algorithms generates the Pareto front of clustering solutions that presents the compromise between the quality of the reduced data and the degree of reduction. Similar prior data-reduction computer programs utilize only a clustering algorithm, the parameters of which are tuned manually by users. In the present program, autonomous optimization of the parameters by means of the DE supplants the manual tuning of the parameters. Thus, the program determines the best set of clustering solutions without human intervention.
A clustering algorithm for sample data based on environmental pollution characteristics
NASA Astrophysics Data System (ADS)
Chen, Mei; Wang, Pengfei; Chen, Qiang; Wu, Jiadong; Chen, Xiaoyun
2015-04-01
Environmental pollution has become an issue of serious international concern in recent years. Among the receptor-oriented pollution models, CMB, PMF, UNMIX, and PCA are widely used as source apportionment models. To improve the accuracy of source apportionment and classify the sample data for these models, this study proposes an easy-to-use, high-dimensional EPC algorithm that not only organizes all of the sample data into different groups according to the similarities in pollution characteristics such as pollution sources and concentrations but also simultaneously detects outliers. The main clustering process consists of selecting the first unlabelled point as the cluster centre, then assigning each data point in the sample dataset to its most similar cluster centre according to both the user-defined threshold and the value of similarity function in each iteration, and finally modifying the clusters using a method similar to k-Means. The validity and accuracy of the algorithm are tested using both real and synthetic datasets, which makes the EPC algorithm practical and effective for appropriately classifying sample data for source apportionment models and helpful for better understanding and interpreting the sources of pollution.
Soft-Collinear Mode for Jet Rates in Soft-Collinear Effective Theory
Chien, Yang-Ting; Lee, Christopher; Hornig, Andrew
2016-01-29
We propose the addition of a new "soft-collinear" mode to soft collinear effective theory (SCET) below the usual soft scale to factorize and resum logarithms of jet radii R in jet cross sections. We consider exclusive 2-jet cross sections in e +e - collisions with an energy veto Λ on additional jets. The key observation is that there are actually two pairs of energy scales whose ratio is R: the transverse momentum QR of the energetic particles inside jets and their total energy Q, and the transverse momentum ΛR of soft particles that are cut out of the jet cones and their energy Λ. The soft-collinear mode is necessary to factorize and resum logarithms of the latter hierarchy. We show how this factorization occurs in the jet thrust cross section for cone and k T-type algorithms at O(α s) and using the thrust cone algorithm at O(αmore » $$2\\atop{s}$$). We identify the presence of hard-collinear, in-jet soft, global (veto) soft, and soft-collinear modes in the jet thrust cross section. We also observe here that the in-jet soft modes measured with thrust are actually the "csoft" modes of the theory SCET +. We dub the new theory with both csoft and soft-collinear modes "SCET ++". We go on to explain the relation between the "unmeasured" jet function appearing in total exclusive jet cross sections and the hard-collinear and csoft functions in measured jet thrust cross sections. We do not resum logs that are non-global in origin, arising from the ratio of the scales of soft radiation whose thrust is measured at Q$${{\\tau}}$$/R and of the soft-collinear radiation at 2ΛR. Their resummation would require the introduction of additional operators beyond those we consider here. The steps we outline here are a necessary part of summing logs of R that are global in nature and have not been factorized and resummed beyond leading-log level previously.« less
Soft-Collinear Mode for Jet Rates in Soft-Collinear Effective Theory
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chien, Yang-Ting; Lee, Christopher; Hornig, Andrew
We propose the addition of a new "soft-collinear" mode to soft collinear effective theory (SCET) below the usual soft scale to factorize and resum logarithms of jet radii R in jet cross sections. We consider exclusive 2-jet cross sections in e +e - collisions with an energy veto Λ on additional jets. The key observation is that there are actually two pairs of energy scales whose ratio is R: the transverse momentum QR of the energetic particles inside jets and their total energy Q, and the transverse momentum ΛR of soft particles that are cut out of the jet cones and their energy Λ. The soft-collinear mode is necessary to factorize and resum logarithms of the latter hierarchy. We show how this factorization occurs in the jet thrust cross section for cone and k T-type algorithms at O(α s) and using the thrust cone algorithm at O(αmore » $$2\\atop{s}$$). We identify the presence of hard-collinear, in-jet soft, global (veto) soft, and soft-collinear modes in the jet thrust cross section. We also observe here that the in-jet soft modes measured with thrust are actually the "csoft" modes of the theory SCET +. We dub the new theory with both csoft and soft-collinear modes "SCET ++". We go on to explain the relation between the "unmeasured" jet function appearing in total exclusive jet cross sections and the hard-collinear and csoft functions in measured jet thrust cross sections. We do not resum logs that are non-global in origin, arising from the ratio of the scales of soft radiation whose thrust is measured at Q$${{\\tau}}$$/R and of the soft-collinear radiation at 2ΛR. Their resummation would require the introduction of additional operators beyond those we consider here. The steps we outline here are a necessary part of summing logs of R that are global in nature and have not been factorized and resummed beyond leading-log level previously.« less
Walton, Barbara L; Verbeck, Guido F
2014-08-19
Matrix-assisted laser desorption ionization (MALDI) imaging is gaining popularity, but matrix effects such as mass spectral interference and damage to the sample limit its applications. Replacing traditional matrices with silver particles capable of equivalent or increased photon energy absorption from the incoming laser has proven to be beneficial for low mass analysis. Not only can silver clusters be advantageous for low mass compound detection, but they can be used for imaging as well. Conventional matrix application methods can obstruct samples, such as fingerprints, rendering them useless after mass analysis. The ability to image latent fingerprints without causing damage to the ridge pattern is important as it allows for further characterization of the print. The application of silver clusters by soft-landing ion mobility allows for enhanced MALDI and preservation of fingerprint integrity.
Wu, Guorong; Yap, Pew-Thian; Kim, Minjeong; Shen, Dinggang
2010-02-01
We present an improved MR brain image registration algorithm, called TPS-HAMMER, which is based on the concepts of attribute vectors and hierarchical landmark selection scheme proposed in the highly successful HAMMER registration algorithm. We demonstrate that TPS-HAMMER algorithm yields better registration accuracy, robustness, and speed over HAMMER owing to (1) the employment of soft correspondence matching and (2) the utilization of thin-plate splines (TPS) for sparse-to-dense deformation field generation. These two aspects can be integrated into a unified framework to refine the registration iteratively by alternating between soft correspondence matching and dense deformation field estimation. Compared with HAMMER, TPS-HAMMER affords several advantages: (1) unlike the Gaussian propagation mechanism employed in HAMMER, which can be slow and often leaves unreached blotches in the deformation field, the deformation interpolation in the non-landmark points can be obtained immediately with TPS in our algorithm; (2) the smoothness of deformation field is preserved due to the nice properties of TPS; (3) possible misalignments can be alleviated by allowing the matching of the landmarks with a number of possible candidate points and enforcing more exact matches in the final stages of the registration. Extensive experiments have been conducted, using the original HAMMER as a comparison baseline, to validate the merits of TPS-HAMMER. The results show that TPS-HAMMER yields significant improvement in both accuracy and speed, indicating high applicability for the clinical scenario. Copyright (c) 2009 Elsevier Inc. All rights reserved.
A local search for a graph clustering problem
NASA Astrophysics Data System (ADS)
Navrotskaya, Anna; Il'ev, Victor
2016-10-01
In the clustering problems one has to partition a given set of objects (a data set) into some subsets (called clusters) taking into consideration only similarity of the objects. One of most visual formalizations of clustering is graph clustering, that is grouping the vertices of a graph into clusters taking into consideration the edge structure of the graph whose vertices are objects and edges represent similarities between the objects. In the graph k-clustering problem the number of clusters does not exceed k and the goal is to minimize the number of edges between clusters and the number of missing edges within clusters. This problem is NP-hard for any k ≥ 2. We propose a polynomial time (2k-1)-approximation algorithm for graph k-clustering. Then we apply a local search procedure to the feasible solution found by this algorithm and hold experimental research of obtained heuristics.
Real-time simulation of the nonlinear visco-elastic deformations of soft tissues.
Basafa, Ehsan; Farahmand, Farzam
2011-05-01
Mass-spring-damper (MSD) models are often used for real-time surgery simulation due to their fast response and fairly realistic deformation replication. An improved real time simulation model of soft tissue deformation due to a laparoscopic surgical indenter was developed and tested. The mechanical realization of conventional MSD models was improved using nonlinear springs and nodal dampers, while their high computational efficiency was maintained using an adapted implicit integration algorithm. New practical algorithms for model parameter tuning, collision detection, and simulation were incorporated. The model was able to replicate complex biological soft tissue mechanical properties under large deformations, i.e., the nonlinear and viscoelastic behaviors. The simulated response of the model after tuning of its parameters to the experimental data of a deer liver sample, closely tracked the reference data with high correlation and maximum relative differences of less than 5 and 10%, for the tuning and testing data sets respectively. Finally, implementation of the proposed model and algorithms in a graphical environment resulted in a real-time simulation with update rates of 150 Hz for interactive deformation and haptic manipulation, and 30 Hz for visual rendering. The proposed real time simulation model of soft tissue deformation due to a laparoscopic surgical indenter was efficient, realistic, and accurate in ex vivo testing. This model is a suitable candidate for testing in vivo during laparoscopic surgery.
Quark cluster model for deep-inelastic lepton-deuteron scattering
NASA Astrophysics Data System (ADS)
Yen, G.; Vary, J. P.; Harindranath, A.; Pirner, H. J.
1990-10-01
We evaluate the contribution of quasifree nucleon knockout and of inelastic lepton-nucleon scattering in inclusive electron-deuteron reactions at large momentum transfer. We examine the degree of quantitative agreement with deuteron wave functions from the Reid soft-core and Bonn realistic nucleon-nucleon interactions. For the range of data available there is strong sensitivity to the tensor correlations which are distinctively different in these two deuteron models. At this stage of the analyses the Reid soft-core wave function provides a reasonable description of the data while the Bonn wave function does not. We then include a six-quark cluster component whose relative contribution is based on an overlap criterion and obtain a good description of all the data with both interactions. The critical separation at which overlap occurs (formation of six-quark clusters) is taken to be 1.0 fm and the six-quark cluster probability is 4.7% for Reid and 5.4% for Bonn. As a consequence the quark cluster model with either Reid or Bonn wave function describe the SLAC inclusive electron-deuteron scattering data equally well. We then show how additional data would be decisive in resolving which model is ultimately more correct.
Adaptive fuzzy system for 3-D vision
NASA Technical Reports Server (NTRS)
Mitra, Sunanda
1993-01-01
An adaptive fuzzy system using the concept of the Adaptive Resonance Theory (ART) type neural network architecture and incorporating fuzzy c-means (FCM) system equations for reclassification of cluster centers was developed. The Adaptive Fuzzy Leader Clustering (AFLC) architecture is a hybrid neural-fuzzy system which learns on-line in a stable and efficient manner. The system uses a control structure similar to that found in the Adaptive Resonance Theory (ART-1) network to identify the cluster centers initially. The initial classification of an input takes place in a two stage process; a simple competitive stage and a distance metric comparison stage. The cluster prototypes are then incrementally updated by relocating the centroid positions from Fuzzy c-Means (FCM) system equations for the centroids and the membership values. The operational characteristics of AFLC and the critical parameters involved in its operation are discussed. The performance of the AFLC algorithm is presented through application of the algorithm to the Anderson Iris data, and laser-luminescent fingerprint image data. The AFLC algorithm successfully classifies features extracted from real data, discrete or continuous, indicating the potential strength of this new clustering algorithm in analyzing complex data sets. The hybrid neuro-fuzzy AFLC algorithm will enhance analysis of a number of difficult recognition and control problems involved with Tethered Satellite Systems and on-orbit space shuttle attitude controller.
Cleaning by clustering: methodology for addressing data quality issues in biomedical metadata.
Hu, Wei; Zaveri, Amrapali; Qiu, Honglei; Dumontier, Michel
2017-09-18
The ability to efficiently search and filter datasets depends on access to high quality metadata. While most biomedical repositories require data submitters to provide a minimal set of metadata, some such as the Gene Expression Omnibus (GEO) allows users to specify additional metadata in the form of textual key-value pairs (e.g. sex: female). However, since there is no structured vocabulary to guide the submitter regarding the metadata terms to use, consequently, the 44,000,000+ key-value pairs in GEO suffer from numerous quality issues including redundancy, heterogeneity, inconsistency, and incompleteness. Such issues hinder the ability of scientists to hone in on datasets that meet their requirements and point to a need for accurate, structured and complete description of the data. In this study, we propose a clustering-based approach to address data quality issues in biomedical, specifically gene expression, metadata. First, we present three different kinds of similarity measures to compare metadata keys. Second, we design a scalable agglomerative clustering algorithm to cluster similar keys together. Our agglomerative cluster algorithm identified metadata keys that were similar, based on (i) name, (ii) core concept and (iii) value similarities, to each other and grouped them together. We evaluated our method using a manually created gold standard in which 359 keys were grouped into 27 clusters based on six types of characteristics: (i) age, (ii) cell line, (iii) disease, (iv) strain, (v) tissue and (vi) treatment. As a result, the algorithm generated 18 clusters containing 355 keys (four clusters with only one key were excluded). In the 18 clusters, there were keys that were identified correctly to be related to that cluster, but there were 13 keys which were not related to that cluster. We compared our approach with four other published methods. Our approach significantly outperformed them for most metadata keys and achieved the best average F-Score (0.63). Our algorithm identified keys that were similar to each other and grouped them together. Our intuition that underpins cleaning by clustering is that, dividing keys into different clusters resolves the scalability issues for data observation and cleaning, and keys in the same cluster with duplicates and errors can easily be found. Our algorithm can also be applied to other biomedical data types.
DOE Office of Scientific and Technical Information (OSTI.GOV)
A parallelization of the k-means++ seed selection algorithm on three distinct hardware platforms: GPU, multicore CPU, and multithreaded architecture. K-means++ was developed by David Arthur and Sergei Vassilvitskii in 2007 as an extension of the k-means data clustering technique. These algorithms allow people to cluster multidimensional data, by attempting to minimize the mean distance of data points within a cluster. K-means++ improved upon traditional k-means by using a more intelligent approach to selecting the initial seeds for the clustering process. While k-means++ has become a popular alternative to traditional k-means clustering, little work has been done to parallelize this technique.more » We have developed original C++ code for parallelizing the algorithm on three unique hardware architectures: GPU using NVidia's CUDA/Thrust framework, multicore CPU using OpenMP, and the Cray XMT multithreaded architecture. By parallelizing the process for these platforms, we are able to perform k-means++ clustering much more quickly than it could be done before.« less
Yang, Yan-Pu; Chen, Deng-Kai; Gu, Rong; Gu, Yu-Feng; Yu, Sui-Huai
2016-01-01
Consumers' Kansei needs reflect their perception about a product and always consist of a large number of adjectives. Reducing the dimension complexity of these needs to extract primary words not only enables the target product to be explicitly positioned, but also provides a convenient design basis for designers engaging in design work. Accordingly, this study employs a numerical design structure matrix (NDSM) by parameterizing a conventional DSM and integrating genetic algorithms to find optimum Kansei clusters. A four-point scale method is applied to assign link weights of every two Kansei adjectives as values of cells when constructing an NDSM. Genetic algorithms are used to cluster the Kansei NDSM and find optimum clusters. Furthermore, the process of the proposed method is presented. The details of the proposed approach are illustrated using an example of electronic scooter for Kansei needs clustering. The case study reveals that the proposed method is promising for clustering Kansei needs adjectives in product emotional design.
Chen, Deng-kai; Gu, Rong; Gu, Yu-feng; Yu, Sui-huai
2016-01-01
Consumers' Kansei needs reflect their perception about a product and always consist of a large number of adjectives. Reducing the dimension complexity of these needs to extract primary words not only enables the target product to be explicitly positioned, but also provides a convenient design basis for designers engaging in design work. Accordingly, this study employs a numerical design structure matrix (NDSM) by parameterizing a conventional DSM and integrating genetic algorithms to find optimum Kansei clusters. A four-point scale method is applied to assign link weights of every two Kansei adjectives as values of cells when constructing an NDSM. Genetic algorithms are used to cluster the Kansei NDSM and find optimum clusters. Furthermore, the process of the proposed method is presented. The details of the proposed approach are illustrated using an example of electronic scooter for Kansei needs clustering. The case study reveals that the proposed method is promising for clustering Kansei needs adjectives in product emotional design. PMID:27630709
Danaci, Hasan Fehmi; Cetin-Atalay, Rengul; Atalay, Volkan
2018-03-26
Visualizing large-scale data produced by the high throughput experiments as a biological graph leads to better understanding and analysis. This study describes a customized force-directed layout algorithm, EClerize, for biological graphs that represent pathways in which the nodes are associated with Enzyme Commission (EC) attributes. The nodes with the same EC class numbers are treated as members of the same cluster. Positions of nodes are then determined based on both the biological similarity and the connection structure. EClerize minimizes the intra-cluster distance, that is the distance between the nodes of the same EC cluster and maximizes the inter-cluster distance, that is the distance between two distinct EC clusters. EClerize is tested on a number of biological pathways and the improvement brought in is presented with respect to the original algorithm. EClerize is available as a plug-in to cytoscape ( http://apps.cytoscape.org/apps/eclerize ).
Load Balancing in Distributed Web Caching: A Novel Clustering Approach
NASA Astrophysics Data System (ADS)
Tiwari, R.; Kumar, K.; Khan, G.
2010-11-01
The World Wide Web suffers from scaling and reliability problems due to overloaded and congested proxy servers. Caching at local proxy servers helps, but cannot satisfy more than a third to half of requests; more requests are still sent to original remote origin servers. In this paper we have developed an algorithm for Distributed Web Cache, which incorporates cooperation among proxy servers of one cluster. This algorithm uses Distributed Web Cache concepts along with static hierarchies with geographical based clusters of level one proxy server with dynamic mechanism of proxy server during the congestion of one cluster. Congestion and scalability problems are being dealt by clustering concept used in our approach. This results in higher hit ratio of caches, with lesser latency delay for requested pages. This algorithm also guarantees data consistency between the original server objects and the proxy cache objects.
Study on Data Clustering and Intelligent Decision Algorithm of Indoor Localization
NASA Astrophysics Data System (ADS)
Liu, Zexi
2018-01-01
Indoor positioning technology enables the human beings to have the ability of positional perception in architectural space, and there is a shortage of single network coverage and the problem of location data redundancy. So this article puts forward the indoor positioning data clustering algorithm and intelligent decision-making research, design the basic ideas of multi-source indoor positioning technology, analyzes the fingerprint localization algorithm based on distance measurement, position and orientation of inertial device integration. By optimizing the clustering processing of massive indoor location data, the data normalization pretreatment, multi-dimensional controllable clustering center and multi-factor clustering are realized, and the redundancy of locating data is reduced. In addition, the path is proposed based on neural network inference and decision, design the sparse data input layer, the dynamic feedback hidden layer and output layer, low dimensional results improve the intelligent navigation path planning.
Computer program documentation: ISOCLS iterative self-organizing clustering program, program C094
NASA Technical Reports Server (NTRS)
Minter, R. T. (Principal Investigator)
1972-01-01
The author has identified the following significant results. This program implements an algorithm which, ideally, sorts a given set of multivariate data points into similar groups or clusters. The program is intended for use in the evaluation of multispectral scanner data; however, the algorithm could be used for other data types as well. The user may specify a set of initial estimated cluster means to begin the procedure, or he may begin with the assumption that all the data belongs to one cluster. The procedure is initiatized by assigning each data point to the nearest (in absolute distance) cluster mean. If no initial cluster means were input, all of the data is assigned to cluster 1. The means and standard deviations are calculated for each cluster.
NASA Astrophysics Data System (ADS)
Kel'manov, A. V.; Motkova, A. V.
2018-01-01
A strongly NP-hard problem of partitioning a finite set of points of Euclidean space into two clusters is considered. The solution criterion is the minimum of the sum (over both clusters) of weighted sums of squared distances from the elements of each cluster to its geometric center. The weights of the sums are equal to the cardinalities of the desired clusters. The center of one cluster is given as input, while the center of the other is unknown and is determined as the point of space equal to the mean of the cluster elements. A version of the problem is analyzed in which the cardinalities of the clusters are given as input. A polynomial-time 2-approximation algorithm for solving the problem is constructed.
The Twinkling Fractal Theory of the Glass Transition: Applications to Soft Matter
NASA Astrophysics Data System (ADS)
Wool, Richard
2012-02-01
The Twinkling Fractal Theory (TFT) of the glass transition has recently been demonstrated experimentally [J.F. Stanzione et al., J. Non Cryst. Sol., (2011, 357,311]. The hard to-soft matter transition is characterized by the presence of solid fractal clusters with liquid-like pools that are dynamically interchanging via their anharmonic intermolecular potentials with Boltzmann energy populations with a characteristic temperature dependent vibrational density of states g(φ) ˜ φ^df . The twinkling fractal frequencies φ cover a range of 10^12 Hz to 10-10Hz and the fractal solid clusters of size R have a lifetime τ ˜ R^Df/df, where the fractal dimension Df 2.4 and the fracton dimension df = 4/3. Here we explore its application to a number of soft matter issues. These include (a) Confinement effects on Tg reduction in thin films of thickness h, where by virtue of large cluster exclusion, δTg ˜ 1/h^Df/df; (b) Tg gradients near bulk surfaces, where the smaller clusters on the surface have a faster relaxation time; (c) Effect of twinkling surfaces on cell growth, where at T Tg + 20 C, there exists a twinkling fractal range that leads to bell-shaped enhancement of cell growth and chemical up-regulation via the twinkling surfaces ``communicating `` with the cells through their vibrations; and (d) adhesion above and below Tg where topological fluctuations associated with g(φ) promotes the development of nano-nails at the interface.
Mwangi, Benson; Soares, Jair C; Hasan, Khader M
2014-10-30
Neuroimaging machine learning studies have largely utilized supervised algorithms - meaning they require both neuroimaging scan data and corresponding target variables (e.g. healthy vs. diseased) to be successfully 'trained' for a prediction task. Noticeably, this approach may not be optimal or possible when the global structure of the data is not well known and the researcher does not have an a priori model to fit the data. We set out to investigate the utility of an unsupervised machine learning technique; t-distributed stochastic neighbour embedding (t-SNE) in identifying 'unseen' sample population patterns that may exist in high-dimensional neuroimaging data. Multimodal neuroimaging scans from 92 healthy subjects were pre-processed using atlas-based methods, integrated and input into the t-SNE algorithm. Patterns and clusters discovered by the algorithm were visualized using a 2D scatter plot and further analyzed using the K-means clustering algorithm. t-SNE was evaluated against classical principal component analysis. Remarkably, based on unlabelled multimodal scan data, t-SNE separated study subjects into two very distinct clusters which corresponded to subjects' gender labels (cluster silhouette index value=0.79). The resulting clusters were used to develop an unsupervised minimum distance clustering model which identified 93.5% of subjects' gender. Notably, from a neuropsychiatric perspective this method may allow discovery of data-driven disease phenotypes or sub-types of treatment responders. Copyright © 2014 Elsevier B.V. All rights reserved.
A Dimensionality Reduction-Based Multi-Step Clustering Method for Robust Vessel Trajectory Analysis
Liu, Jingxian; Wu, Kefeng
2017-01-01
The Shipboard Automatic Identification System (AIS) is crucial for navigation safety and maritime surveillance, data mining and pattern analysis of AIS information have attracted considerable attention in terms of both basic research and practical applications. Clustering of spatio-temporal AIS trajectories can be used to identify abnormal patterns and mine customary route data for transportation safety. Thus, the capacities of navigation safety and maritime traffic monitoring could be enhanced correspondingly. However, trajectory clustering is often sensitive to undesirable outliers and is essentially more complex compared with traditional point clustering. To overcome this limitation, a multi-step trajectory clustering method is proposed in this paper for robust AIS trajectory clustering. In particular, the Dynamic Time Warping (DTW), a similarity measurement method, is introduced in the first step to measure the distances between different trajectories. The calculated distances, inversely proportional to the similarities, constitute a distance matrix in the second step. Furthermore, as a widely-used dimensional reduction method, Principal Component Analysis (PCA) is exploited to decompose the obtained distance matrix. In particular, the top k principal components with above 95% accumulative contribution rate are extracted by PCA, and the number of the centers k is chosen. The k centers are found by the improved center automatically selection algorithm. In the last step, the improved center clustering algorithm with k clusters is implemented on the distance matrix to achieve the final AIS trajectory clustering results. In order to improve the accuracy of the proposed multi-step clustering algorithm, an automatic algorithm for choosing the k clusters is developed according to the similarity distance. Numerous experiments on realistic AIS trajectory datasets in the bridge area waterway and Mississippi River have been implemented to compare our proposed method with traditional spectral clustering and fast affinity propagation clustering. Experimental results have illustrated its superior performance in terms of quantitative and qualitative evaluations. PMID:28777353
Geographic atrophy phenotype identification by cluster analysis.
Monés, Jordi; Biarnés, Marc
2018-03-01
To identify ocular phenotypes in patients with geographic atrophy secondary to age-related macular degeneration (GA) using a data-driven cluster analysis. This was a retrospective analysis of data from a prospective, natural history study of patients with GA who were followed for ≥6 months. Cluster analysis was used to identify subgroups within the population based on the presence of several phenotypic features: soft drusen, reticular pseudodrusen (RPD), primary foveal atrophy, increased fundus autofluorescence (FAF), greyish FAF appearance and subfoveal choroidal thickness (SFCT). A comparison of features between the subgroups was conducted, and a qualitative description of the new phenotypes was proposed. The atrophy growth rate between phenotypes was then compared. Data were analysed from 77 eyes of 77 patients with GA. Cluster analysis identified three groups: phenotype 1 was characterised by high soft drusen load, foveal atrophy and slow growth; phenotype 3 showed high RPD load, extrafoveal and greyish FAF appearance and thin SFCT; the characteristics of phenotype 2 were midway between phenotypes 1 and 3. Phenotypes differed in all measured features (p≤0.013), with decreases in the presence of soft drusen, foveal atrophy and SFCT seen from phenotypes 1 to 3 and corresponding increases in high RPD load, high FAF and greyish FAF appearance. Atrophy growth rate differed between phenotypes 1, 2 and 3 (0.63, 1.91 and 1.73 mm 2 /year, respectively, p=0.0005). Cluster analysis identified three distinct phenotypes in GA. One of them showed a particularly slow growth pattern. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
Aissa, J; Thomas, C; Sawicki, L M; Caspers, J; Kröpil, P; Antoch, G; Boos, J
2017-05-01
To investigate the value of dedicated computed tomography (CT) iterative metal artefact reduction (iMAR) algorithms in patients after spinal instrumentation. Post-surgical spinal CT images of 24 patients performed between March 2015 and July 2016 were retrospectively included. Images were reconstructed with standard weighted filtered back projection (WFBP) and with two dedicated iMAR algorithms (iMAR-Algo1, adjusted to spinal instrumentations and iMAR-Algo2, adjusted to large metallic hip implants) using a medium smooth kernel (B30f) and a sharp kernel (B70f). Frequencies of density changes were quantified to assess objective image quality. Image quality was rated subjectively by evaluating the visibility of critical anatomical structures including the central canal, the spinal cord, neural foramina, and vertebral bone. Both iMAR algorithms significantly reduced artefacts from metal compared with WFBP (p<0.0001). Results of subjective image analysis showed that both iMAR algorithms led to an improvement in visualisation of soft-tissue structures (median iMAR-Algo1=3; interquartile range [IQR]:1.5-3; iMAR-Algo2=4; IQR: 3.5-4) and bone structures (iMAR-Algo1=3; IQR:3-4; iMAR-Algo2=4; IQR:4-5) compared to WFBP (soft tissue: median 2; IQR: 0.5-2 and bone structures: median 2; IQR: 1-3; p<0.0001). Compared with iMAR-Algo1, objective artefact reduction and subjective visualisation of soft-tissue and bone structures were improved with iMAR-Algo2 (p<0.0001). Both iMAR algorithms reduced artefacts compared with WFBP, however, the iMAR algorithm with dedicated settings for large metallic implants was superior to the algorithm specifically adjusted to spinal implants. Copyright © 2016 The Royal College of Radiologists. Published by Elsevier Ltd. All rights reserved.
Zone-Based Routing Protocol for Wireless Sensor Networks
Venkateswarlu Kumaramangalam, Muni; Adiyapatham, Kandasamy; Kandasamy, Chandrasekaran
2014-01-01
Extensive research happening across the globe witnessed the importance of Wireless Sensor Network in the present day application world. In the recent past, various routing algorithms have been proposed to elevate WSN network lifetime. Clustering mechanism is highly successful in conserving energy resources for network activities and has become promising field for researches. However, the problem of unbalanced energy consumption is still open because the cluster head activities are tightly coupled with role and location of a particular node in the network. Several unequal clustering algorithms are proposed to solve this wireless sensor network multihop hot spot problem. Current unequal clustering mechanisms consider only intra- and intercluster communication cost. Proper organization of wireless sensor network into clusters enables efficient utilization of limited resources and enhances lifetime of deployed sensor nodes. This paper considers a novel network organization scheme, energy-efficient edge-based network partitioning scheme, to organize sensor nodes into clusters of equal size. Also, it proposes a cluster-based routing algorithm, called zone-based routing protocol (ZBRP), for elevating sensor network lifetime. Experimental results show that ZBRP out-performs interims of network lifetime and energy conservation with its uniform energy consumption among the cluster heads. PMID:27437455
Zone-Based Routing Protocol for Wireless Sensor Networks.
Venkateswarlu Kumaramangalam, Muni; Adiyapatham, Kandasamy; Kandasamy, Chandrasekaran
2014-01-01
Extensive research happening across the globe witnessed the importance of Wireless Sensor Network in the present day application world. In the recent past, various routing algorithms have been proposed to elevate WSN network lifetime. Clustering mechanism is highly successful in conserving energy resources for network activities and has become promising field for researches. However, the problem of unbalanced energy consumption is still open because the cluster head activities are tightly coupled with role and location of a particular node in the network. Several unequal clustering algorithms are proposed to solve this wireless sensor network multihop hot spot problem. Current unequal clustering mechanisms consider only intra- and intercluster communication cost. Proper organization of wireless sensor network into clusters enables efficient utilization of limited resources and enhances lifetime of deployed sensor nodes. This paper considers a novel network organization scheme, energy-efficient edge-based network partitioning scheme, to organize sensor nodes into clusters of equal size. Also, it proposes a cluster-based routing algorithm, called zone-based routing protocol (ZBRP), for elevating sensor network lifetime. Experimental results show that ZBRP out-performs interims of network lifetime and energy conservation with its uniform energy consumption among the cluster heads.
Unsupervised, Robust Estimation-based Clustering for Multispectral Images
NASA Technical Reports Server (NTRS)
Netanyahu, Nathan S.
1997-01-01
To prepare for the challenge of handling the archiving and querying of terabyte-sized scientific spatial databases, the NASA Goddard Space Flight Center's Applied Information Sciences Branch (AISB, Code 935) developed a number of characterization algorithms that rely on supervised clustering techniques. The research reported upon here has been aimed at continuing the evolution of some of these supervised techniques, namely the neural network and decision tree-based classifiers, plus extending the approach to incorporating unsupervised clustering algorithms, such as those based on robust estimation (RE) techniques. The algorithms developed under this task should be suited for use by the Intelligent Information Fusion System (IIFS) metadata extraction modules, and as such these algorithms must be fast, robust, and anytime in nature. Finally, so that the planner/schedule module of the IlFS can oversee the use and execution of these algorithms, all information required by the planner/scheduler must be provided to the IIFS development team to ensure the timely integration of these algorithms into the overall system.
Srinivasan, A.; Galbán, C.J.; Johnson, T.D.; Chenevert, T.L.; Ross, B.D.; Mukherji, S.K.
2014-01-01
Purpose The objective of our study was to analyze the differences between apparent diffusion coefficient (ADC) partitions (created using the K-Means algorithm) between benign and malignant neck lesions and evaluate its benefit in distinguishing these entities. Material and methods MRI studies of 10 benign and 10 malignant proven neck pathologies were post-processed on a PC using in-house software developed in MATLAB (The MathWorks, Inc., Natick, MA). Lesions were manually contoured by two neuroradiologists with the ADC values within each lesion clustered into two (low ADC-ADCL, high ADC-ADCH) and three partitions (ADCL, intermediate ADC-ADCI, ADCH) using the K-Means clustering algorithm. An unpaired two-tailed Student’s t-test was performed for all metrics to determine statistical differences in the means between the benign and malignant pathologies. Results Statistically significant difference between the mean ADCL clusters in benign and malignant pathologies was seen in the 3 cluster models of both readers (p=0.03, 0.022 respectively) and the 2 cluster model of reader 2 (p=0.04) with the other metrics (ADCH, ADCI, whole lesion mean ADC) not revealing any significant differences. Receiver operating characteristics curves demonstrated the quantitative difference in mean ADCH and ADCL in both the 2 and 3 cluster models to be predictive of malignancy (2 clusters: p=0.008, area under curve=0.850, 3 clusters: p=0.01, area under curve=0.825). Conclusion The K-Means clustering algorithm that generates partitions of large datasets may provide a better characterization of neck pathologies and may be of additional benefit in distinguishing benign and malignant neck pathologies compared to whole lesion mean ADC alone. PMID:20007723
Srinivasan, A; Galbán, C J; Johnson, T D; Chenevert, T L; Ross, B D; Mukherji, S K
2010-04-01
Does the K-means algorithm do a better job of differentiating benign and malignant neck pathologies compared to only mean ADC? The objective of our study was to analyze the differences between ADC partitions to evaluate whether the K-means technique can be of additional benefit to whole-lesion mean ADC alone in distinguishing benign and malignant neck pathologies. MR imaging studies of 10 benign and 10 malignant proved neck pathologies were postprocessed on a PC by using in-house software developed in Matlab. Two neuroradiologists manually contoured the lesions, with the ADC values within each lesion clustered into 2 (low, ADC-ADC(L); high, ADC-ADC(H)) and 3 partitions (ADC(L); intermediate, ADC-ADC(I); ADC(H)) by using the K-means clustering algorithm. An unpaired 2-tailed Student t test was performed for all metrics to determine statistical differences in the means of the benign and malignant pathologies. A statistically significant difference between the mean ADC(L) clusters in benign and malignant pathologies was seen in the 3-cluster models of both readers (P = .03 and .022, respectively) and the 2-cluster model of reader 2 (P = .04), with the other metrics (ADC(H), ADC(I); whole-lesion mean ADC) not revealing any significant differences. ROC curves demonstrated the quantitative differences in mean ADC(H) and ADC(L) in both the 2- and 3-cluster models to be predictive of malignancy (2 clusters: P = .008, area under curve = 0.850; 3 clusters: P = .01, area under curve = 0.825). The K-means clustering algorithm that generates partitions of large datasets may provide a better characterization of neck pathologies and may be of additional benefit in distinguishing benign and malignant neck pathologies compared with whole-lesion mean ADC alone.
An improved K-means clustering algorithm in agricultural image segmentation
NASA Astrophysics Data System (ADS)
Cheng, Huifeng; Peng, Hui; Liu, Shanmei
Image segmentation is the first important step to image analysis and image processing. In this paper, according to color crops image characteristics, we firstly transform the color space of image from RGB to HIS, and then select proper initial clustering center and cluster number in application of mean-variance approach and rough set theory followed by clustering calculation in such a way as to automatically segment color component rapidly and extract target objects from background accurately, which provides a reliable basis for identification, analysis, follow-up calculation and process of crops images. Experimental results demonstrate that improved k-means clustering algorithm is able to reduce the computation amounts and enhance precision and accuracy of clustering.
Scoring clustering solutions by their biological relevance.
Gat-Viks, I; Sharan, R; Shamir, R
2003-12-12
A central step in the analysis of gene expression data is the identification of groups of genes that exhibit similar expression patterns. Clustering gene expression data into homogeneous groups was shown to be instrumental in functional annotation, tissue classification, regulatory motif identification, and other applications. Although there is a rich literature on clustering algorithms for gene expression analysis, very few works addressed the systematic comparison and evaluation of clustering results. Typically, different clustering algorithms yield different clustering solutions on the same data, and there is no agreed upon guideline for choosing among them. We developed a novel statistically based method for assessing a clustering solution according to prior biological knowledge. Our method can be used to compare different clustering solutions or to optimize the parameters of a clustering algorithm. The method is based on projecting vectors of biological attributes of the clustered elements onto the real line, such that the ratio of between-groups and within-group variance estimators is maximized. The projected data are then scored using a non-parametric analysis of variance test, and the score's confidence is evaluated. We validate our approach using simulated data and show that our scoring method outperforms several extant methods, including the separation to homogeneity ratio and the silhouette measure. We apply our method to evaluate results of several clustering methods on yeast cell-cycle gene expression data. The software is available from the authors upon request.
Optimization of self-interstitial clusters in 3C-SiC with genetic algorithm
NASA Astrophysics Data System (ADS)
Ko, Hyunseok; Kaczmarowski, Amy; Szlufarska, Izabela; Morgan, Dane
2017-08-01
Under irradiation, SiC develops damage commonly referred to as black spot defects, which are speculated to be self-interstitial atom clusters. To understand the evolution of these defect clusters and their impacts (e.g., through radiation induced swelling) on the performance of SiC in nuclear applications, it is important to identify the cluster composition, structure, and shape. In this work the genetic algorithm code StructOpt was utilized to identify groundstate cluster structures in 3C-SiC. The genetic algorithm was used to explore clusters of up to ∼30 interstitials of C-only, Si-only, and Si-C mixtures embedded in the SiC lattice. We performed the structure search using Hamiltonians from both density functional theory and empirical potentials. The thermodynamic stability of clusters was investigated in terms of their composition (with a focus on Si-only, C-only, and stoichiometric) and shape (spherical vs. planar), as a function of the cluster size (n). Our results suggest that large Si-only clusters are likely unstable, and clusters are predominantly C-only for n ≤ 10 and stoichiometric for n > 10. The results imply that there is an evolution of the shape of the most stable clusters, where small clusters are stable in more spherical geometries while larger clusters are stable in more planar configurations. We also provide an estimated energy vs. size relationship, E(n), for use in future analysis.
Scaling Deep Learning on GPU and Knights Landing clusters
You, Yang; Buluc, Aydin; Demmel, James
2017-09-26
The speed of deep neural networks training has become a big bottleneck of deep learning research and development. For example, training GoogleNet by ImageNet dataset on one Nvidia K20 GPU needs 21 days. To speed up the training process, the current deep learning systems heavily rely on the hardware accelerators. However, these accelerators have limited on-chip memory compared with CPUs. To handle large datasets, they need to fetch data from either CPU memory or remote processors. We use both self-hosted Intel Knights Landing (KNL) clusters and multi-GPU clusters as our target platforms. From an algorithm aspect, current distributed machine learningmore » systems are mainly designed for cloud systems. These methods are asynchronous because of the slow network and high fault-tolerance requirement on cloud systems. We focus on Elastic Averaging SGD (EASGD) to design algorithms for HPC clusters. Original EASGD used round-robin method for communication and updating. The communication is ordered by the machine rank ID, which is inefficient on HPC clusters. First, we redesign four efficient algorithms for HPC systems to improve EASGD's poor scaling on clusters. Async EASGD, Async MEASGD, and Hogwild EASGD are faster \\textcolor{black}{than} their existing counterparts (Async SGD, Async MSGD, and Hogwild SGD, resp.) in all the comparisons. Finally, we design Sync EASGD, which ties for the best performance among all the methods while being deterministic. In addition to the algorithmic improvements, we use some system-algorithm codesign techniques to scale up the algorithms. By reducing the percentage of communication from 87% to 14%, our Sync EASGD achieves 5.3x speedup over original EASGD on the same platform. We get 91.5% weak scaling efficiency on 4253 KNL cores, which is higher than the state-of-the-art implementation.« less
Scaling Deep Learning on GPU and Knights Landing clusters
DOE Office of Scientific and Technical Information (OSTI.GOV)
You, Yang; Buluc, Aydin; Demmel, James
The speed of deep neural networks training has become a big bottleneck of deep learning research and development. For example, training GoogleNet by ImageNet dataset on one Nvidia K20 GPU needs 21 days. To speed up the training process, the current deep learning systems heavily rely on the hardware accelerators. However, these accelerators have limited on-chip memory compared with CPUs. To handle large datasets, they need to fetch data from either CPU memory or remote processors. We use both self-hosted Intel Knights Landing (KNL) clusters and multi-GPU clusters as our target platforms. From an algorithm aspect, current distributed machine learningmore » systems are mainly designed for cloud systems. These methods are asynchronous because of the slow network and high fault-tolerance requirement on cloud systems. We focus on Elastic Averaging SGD (EASGD) to design algorithms for HPC clusters. Original EASGD used round-robin method for communication and updating. The communication is ordered by the machine rank ID, which is inefficient on HPC clusters. First, we redesign four efficient algorithms for HPC systems to improve EASGD's poor scaling on clusters. Async EASGD, Async MEASGD, and Hogwild EASGD are faster \\textcolor{black}{than} their existing counterparts (Async SGD, Async MSGD, and Hogwild SGD, resp.) in all the comparisons. Finally, we design Sync EASGD, which ties for the best performance among all the methods while being deterministic. In addition to the algorithmic improvements, we use some system-algorithm codesign techniques to scale up the algorithms. By reducing the percentage of communication from 87% to 14%, our Sync EASGD achieves 5.3x speedup over original EASGD on the same platform. We get 91.5% weak scaling efficiency on 4253 KNL cores, which is higher than the state-of-the-art implementation.« less
Advances in Artificial Neural Networks - Methodological Development and Application
USDA-ARS?s Scientific Manuscript database
Artificial neural networks as a major soft-computing technology have been extensively studied and applied during the last three decades. Research on backpropagation training algorithms for multilayer perceptron networks has spurred development of other neural network training algorithms for other ne...
Maximum Margin Clustering of Hyperspectral Data
NASA Astrophysics Data System (ADS)
Niazmardi, S.; Safari, A.; Homayouni, S.
2013-09-01
In recent decades, large margin methods such as Support Vector Machines (SVMs) are supposed to be the state-of-the-art of supervised learning methods for classification of hyperspectral data. However, the results of these algorithms mainly depend on the quality and quantity of available training data. To tackle down the problems associated with the training data, the researcher put effort into extending the capability of large margin algorithms for unsupervised learning. One of the recent proposed algorithms is Maximum Margin Clustering (MMC). The MMC is an unsupervised SVMs algorithm that simultaneously estimates both the labels and the hyperplane parameters. Nevertheless, the optimization of the MMC algorithm is a non-convex problem. Most of the existing MMC methods rely on the reformulating and the relaxing of the non-convex optimization problem as semi-definite programs (SDP), which are computationally very expensive and only can handle small data sets. Moreover, most of these algorithms are two-class classification, which cannot be used for classification of remotely sensed data. In this paper, a new MMC algorithm is used that solve the original non-convex problem using Alternative Optimization method. This algorithm is also extended for multi-class classification and its performance is evaluated. The results of the proposed algorithm show that the algorithm has acceptable results for hyperspectral data clustering.
Information Filtering via Clustering Coefficients of User-Object Bipartite Networks
NASA Astrophysics Data System (ADS)
Guo, Qiang; Leng, Rui; Shi, Kerui; Liu, Jian-Guo
The clustering coefficient of user-object bipartite networks is presented to evaluate the overlap percentage of neighbors rating lists, which could be used to measure interest correlations among neighbor sets. The collaborative filtering (CF) information filtering algorithm evaluates a given user's interests in terms of his/her friends' opinions, which has become one of the most successful technologies for recommender systems. In this paper, different from the object clustering coefficient, users' clustering coefficients of user-object bipartite networks are introduced to improve the user similarity measurement. Numerical results for MovieLens and Netflix data sets show that users' clustering effects could enhance the algorithm performance. For MovieLens data set, the algorithmic accuracy, measured by the average ranking score, can be improved by 12.0% and the diversity could be improved by 18.2% and reach 0.649 when the recommendation list equals to 50. For Netflix data set, the accuracy could be improved by 14.5% at the optimal case and the popularity could be reduced by 13.4% comparing with the standard CF algorithm. Finally, we investigate the sparsity effect on the performance. This work indicates the user clustering coefficients is an effective factor to measure the user similarity, meanwhile statistical properties of user-object bipartite networks should be investigated to estimate users' tastes.
Wu, Jiayi; Ma, Yong-Bei; Congdon, Charles; Brett, Bevin; Chen, Shuobing; Xu, Yaofang; Ouyang, Qi
2017-01-01
Structural heterogeneity in single-particle cryo-electron microscopy (cryo-EM) data represents a major challenge for high-resolution structure determination. Unsupervised classification may serve as the first step in the assessment of structural heterogeneity. However, traditional algorithms for unsupervised classification, such as K-means clustering and maximum likelihood optimization, may classify images into wrong classes with decreasing signal-to-noise-ratio (SNR) in the image data, yet demand increased computational costs. Overcoming these limitations requires further development of clustering algorithms for high-performance cryo-EM data processing. Here we introduce an unsupervised single-particle clustering algorithm derived from a statistical manifold learning framework called generative topographic mapping (GTM). We show that unsupervised GTM clustering improves classification accuracy by about 40% in the absence of input references for data with lower SNRs. Applications to several experimental datasets suggest that our algorithm can detect subtle structural differences among classes via a hierarchical clustering strategy. After code optimization over a high-performance computing (HPC) environment, our software implementation was able to generate thousands of reference-free class averages within hours in a massively parallel fashion, which allows a significant improvement on ab initio 3D reconstruction and assists in the computational purification of homogeneous datasets for high-resolution visualization. PMID:28786986
Wu, Jiayi; Ma, Yong-Bei; Congdon, Charles; Brett, Bevin; Chen, Shuobing; Xu, Yaofang; Ouyang, Qi; Mao, Youdong
2017-01-01
Structural heterogeneity in single-particle cryo-electron microscopy (cryo-EM) data represents a major challenge for high-resolution structure determination. Unsupervised classification may serve as the first step in the assessment of structural heterogeneity. However, traditional algorithms for unsupervised classification, such as K-means clustering and maximum likelihood optimization, may classify images into wrong classes with decreasing signal-to-noise-ratio (SNR) in the image data, yet demand increased computational costs. Overcoming these limitations requires further development of clustering algorithms for high-performance cryo-EM data processing. Here we introduce an unsupervised single-particle clustering algorithm derived from a statistical manifold learning framework called generative topographic mapping (GTM). We show that unsupervised GTM clustering improves classification accuracy by about 40% in the absence of input references for data with lower SNRs. Applications to several experimental datasets suggest that our algorithm can detect subtle structural differences among classes via a hierarchical clustering strategy. After code optimization over a high-performance computing (HPC) environment, our software implementation was able to generate thousands of reference-free class averages within hours in a massively parallel fashion, which allows a significant improvement on ab initio 3D reconstruction and assists in the computational purification of homogeneous datasets for high-resolution visualization.
An effective trust-based recommendation method using a novel graph clustering algorithm
NASA Astrophysics Data System (ADS)
Moradi, Parham; Ahmadian, Sajad; Akhlaghian, Fardin
2015-10-01
Recommender systems are programs that aim to provide personalized recommendations to users for specific items (e.g. music, books) in online sharing communities or on e-commerce sites. Collaborative filtering methods are important and widely accepted types of recommender systems that generate recommendations based on the ratings of like-minded users. On the other hand, these systems confront several inherent issues such as data sparsity and cold start problems, caused by fewer ratings against the unknowns that need to be predicted. Incorporating trust information into the collaborative filtering systems is an attractive approach to resolve these problems. In this paper, we present a model-based collaborative filtering method by applying a novel graph clustering algorithm and also considering trust statements. In the proposed method first of all, the problem space is represented as a graph and then a sparsest subgraph finding algorithm is applied on the graph to find the initial cluster centers. Then, the proposed graph clustering algorithm is performed to obtain the appropriate users/items clusters. Finally, the identified clusters are used as a set of neighbors to recommend unseen items to the current active user. Experimental results based on three real-world datasets demonstrate that the proposed method outperforms several state-of-the-art recommender system methods.
Modification of Hazen's equation in coarse grained soils by soft computing techniques
NASA Astrophysics Data System (ADS)
Kaynar, Oguz; Yilmaz, Isik; Marschalko, Marian; Bednarik, Martin; Fojtova, Lucie
2013-04-01
Hazen first proposed a Relationship between coefficient of permeability (k) and effective grain size (d10) was first proposed by Hazen, and it was then extended by some other researchers. However many attempts were done for estimation of k, correlation coefficients (R2) of the models were generally lower than ~0.80 and whole grain size distribution curves were not included in the assessments. Soft computing techniques such as; artificial neural networks, fuzzy inference systems, genetic algorithms, etc. and their hybrids are now being successfully used as an alternative tool. In this study, use of some soft computing techniques such as Artificial Neural Networks (ANNs) (MLP, RBF, etc.) and Adaptive Neuro-Fuzzy Inference System (ANFIS) for prediction of permeability of coarse grained soils was described, and Hazen's equation was then modificated. It was found that the soft computing models exhibited high performance in prediction of permeability coefficient. However four different kinds of ANN algorithms showed similar prediction performance, results of MLP was found to be relatively more accurate than RBF models. The most reliable prediction was obtained from ANFIS model.
Structure Analysis of Jungle-Gym-Type Gels by Brownian Dynamics Simulation
NASA Astrophysics Data System (ADS)
Ohta, Noriyoshi; Ono, Kohki; Takasu, Masako; Furukawa, Hidemitsu
2008-02-01
We investigated the structure and the formation process of two kinds of gels by Brownian dynamics simulation. The effect of flexibility of main chain oligomer was studied. From our results, hard gel with rigid main chain forms more homogeneous network structure than soft gel with flexible main chain. In soft gel, many small loops are formed, and clusters tend to shrink. This heterogeneous network structure may be caused by microgels. In the low density case, soft gel shows more heterogeneity than the high density case.
Automatic detection of larynx cancer from contrast-enhanced magnetic resonance images
NASA Astrophysics Data System (ADS)
Doshi, Trushali; Soraghan, John; Grose, Derek; MacKenzie, Kenneth; Petropoulakis, Lykourgos
2015-03-01
Detection of larynx cancer from medical imaging is important for the quantification and for the definition of target volumes in radiotherapy treatment planning (RTP). Magnetic resonance imaging (MRI) is being increasingly used in RTP due to its high resolution and excellent soft tissue contrast. Manually detecting larynx cancer from sequential MRI is time consuming and subjective. The large diversity of cancer in terms of geometry, non-distinct boundaries combined with the presence of normal anatomical regions close to the cancer regions necessitates the development of automatic and robust algorithms for this task. A new automatic algorithm for the detection of larynx cancer from 2D gadoliniumenhanced T1-weighted (T1+Gd) MRI to assist clinicians in RTP is presented. The algorithm employs edge detection using spatial neighborhood information of pixels and incorporates this information in a fuzzy c-means clustering process to robustly separate different tissues types. Furthermore, it utilizes the information of the expected cancerous location for cancer regions labeling. Comparison of this automatic detection system with manual clinical detection on real T1+Gd axial MRI slices of 2 patients (24 MRI slices) with visible larynx cancer yields an average dice similarity coefficient of 0.78+/-0.04 and average root mean square error of 1.82+/-0.28 mm. Preliminary results show that this fully automatic system can assist clinicians in RTP by obtaining quantifiable and non-subjective repeatable detection results in a particular time-efficient and unbiased fashion.
Regional shape-based feature space for segmenting biomedical images using neural networks
NASA Astrophysics Data System (ADS)
Sundaramoorthy, Gopal; Hoford, John D.; Hoffman, Eric A.
1993-07-01
In biomedical images, structure of interest, particularly the soft tissue structures, such as the heart, airways, bronchial and arterial trees often have grey-scale and textural characteristics similar to other structures in the image, making it difficult to segment them using only gray- scale and texture information. However, these objects can be visually recognized by their unique shapes and sizes. In this paper we discuss, what we believe to be, a novel, simple scheme for extracting features based on regional shapes. To test the effectiveness of these features for image segmentation (classification), we use an artificial neural network and a statistical cluster analysis technique. The proposed shape-based feature extraction algorithm computes regional shape vectors (RSVs) for all pixels that meet a certain threshold criteria. The distance from each such pixel to a boundary is computed in 8 directions (or in 26 directions for a 3-D image). Together, these 8 (or 26) values represent the pixel's (or voxel's) RSV. All RSVs from an image are used to train a multi-layered perceptron neural network which uses these features to 'learn' a suitable classification strategy. To clearly distinguish the desired object from other objects within an image, several examples from inside and outside the desired object are used for training. Several examples are presented to illustrate the strengths and weaknesses of our algorithm. Both synthetic and actual biomedical images are considered. Future extensions to this algorithm are also discussed.
A novel unsupervised spike sorting algorithm for intracranial EEG.
Yadav, R; Shah, A K; Loeb, J A; Swamy, M N S; Agarwal, R
2011-01-01
This paper presents a novel, unsupervised spike classification algorithm for intracranial EEG. The method combines template matching and principal component analysis (PCA) for building a dynamic patient-specific codebook without a priori knowledge of the spike waveforms. The problem of misclassification due to overlapping classes is resolved by identifying similar classes in the codebook using hierarchical clustering. Cluster quality is visually assessed by projecting inter- and intra-clusters onto a 3D plot. Intracranial EEG from 5 patients was utilized to optimize the algorithm. The resulting codebook retains 82.1% of the detected spikes in non-overlapping and disjoint clusters. Initial results suggest a definite role of this method for both rapid review and quantitation of interictal spikes that could enhance both clinical treatment and research studies on epileptic patients.
A graph-Laplacian-based feature extraction algorithm for neural spike sorting.
Ghanbari, Yasser; Spence, Larry; Papamichalis, Panos
2009-01-01
Analysis of extracellular neural spike recordings is highly dependent upon the accuracy of neural waveform classification, commonly referred to as spike sorting. Feature extraction is an important stage of this process because it can limit the quality of clustering which is performed in the feature space. This paper proposes a new feature extraction method (which we call Graph Laplacian Features, GLF) based on minimizing the graph Laplacian and maximizing the weighted variance. The algorithm is compared with Principal Components Analysis (PCA, the most commonly-used feature extraction method) using simulated neural data. The results show that the proposed algorithm produces more compact and well-separated clusters compared to PCA. As an added benefit, tentative cluster centers are output which can be used to initialize a subsequent clustering stage.
An Island Grouping Genetic Algorithm for Fuzzy Partitioning Problems
Salcedo-Sanz, S.; Del Ser, J.; Geem, Z. W.
2014-01-01
This paper presents a novel fuzzy clustering technique based on grouping genetic algorithms (GGAs), which are a class of evolutionary algorithms especially modified to tackle grouping problems. Our approach hinges on a GGA devised for fuzzy clustering by means of a novel encoding of individuals (containing elements and clusters sections), a new fitness function (a superior modification of the Davies Bouldin index), specially tailored crossover and mutation operators, and the use of a scheme based on a local search and a parallelization process, inspired from an island-based model of evolution. The overall performance of our approach has been assessed over a number of synthetic and real fuzzy clustering problems with different objective functions and distance measures, from which it is concluded that the proposed approach shows excellent performance in all cases. PMID:24977235
Unsupervised spike sorting based on discriminative subspace learning.
Keshtkaran, Mohammad Reza; Yang, Zhi
2014-01-01
Spike sorting is a fundamental preprocessing step for many neuroscience studies which rely on the analysis of spike trains. In this paper, we present two unsupervised spike sorting algorithms based on discriminative subspace learning. The first algorithm simultaneously learns the discriminative feature subspace and performs clustering. It uses histogram of features in the most discriminative projection to detect the number of neurons. The second algorithm performs hierarchical divisive clustering that learns a discriminative 1-dimensional subspace for clustering in each level of the hierarchy until achieving almost unimodal distribution in the subspace. The algorithms are tested on synthetic and in-vivo data, and are compared against two widely used spike sorting methods. The comparative results demonstrate that our spike sorting methods can achieve substantially higher accuracy in lower dimensional feature space, and they are highly robust to noise. Moreover, they provide significantly better cluster separability in the learned subspace than in the subspace obtained by principal component analysis or wavelet transform.
VizieR Online Data Catalog: Proper motions of PM2000 open clusters (Krone-Martins+, 2010)
NASA Astrophysics Data System (ADS)
Krone-Martins, A.; Soubiran, C.; Ducourant, C.; Teixeira, R.; Le Campion, J. F.
2010-04-01
We present lists of proper-motions and kinematic membership probabilities in the region of 49 open clusters or possible open clusters. The stellar proper motions were taken from the Bordeaux PM2000 catalogue. The segregation between cluster and field stars and the assignment of membership probabilities was accomplished by applying a fully automated method based on parametrisations for the probability distribution functions and genetic algorithm optimisation heuristics associated with a derivative-based hill climbing algorithm for the likelihood optimization. (3 data files).
2010-01-01
Background Irregularly shaped spatial clusters are difficult to delineate. A cluster found by an algorithm often spreads through large portions of the map, impacting its geographical meaning. Penalized likelihood methods for Kulldorff's spatial scan statistics have been used to control the excessive freedom of the shape of clusters. Penalty functions based on cluster geometry and non-connectivity have been proposed recently. Another approach involves the use of a multi-objective algorithm to maximize two objectives: the spatial scan statistics and the geometric penalty function. Results & Discussion We present a novel scan statistic algorithm employing a function based on the graph topology to penalize the presence of under-populated disconnection nodes in candidate clusters, the disconnection nodes cohesion function. A disconnection node is defined as a region within a cluster, such that its removal disconnects the cluster. By applying this function, the most geographically meaningful clusters are sifted through the immense set of possible irregularly shaped candidate cluster solutions. To evaluate the statistical significance of solutions for multi-objective scans, a statistical approach based on the concept of attainment function is used. In this paper we compared different penalized likelihoods employing the geometric and non-connectivity regularity functions and the novel disconnection nodes cohesion function. We also build multi-objective scans using those three functions and compare them with the previous penalized likelihood scans. An application is presented using comprehensive state-wide data for Chagas' disease in puerperal women in Minas Gerais state, Brazil. Conclusions We show that, compared to the other single-objective algorithms, multi-objective scans present better performance, regarding power, sensitivity and positive predicted value. The multi-objective non-connectivity scan is faster and better suited for the detection of moderately irregularly shaped clusters. The multi-objective cohesion scan is most effective for the detection of highly irregularly shaped clusters. PMID:21034451
A semi-supervised classification algorithm using the TAD-derived background as training data
NASA Astrophysics Data System (ADS)
Fan, Lei; Ambeau, Brittany; Messinger, David W.
2013-05-01
In general, spectral image classification algorithms fall into one of two categories: supervised and unsupervised. In unsupervised approaches, the algorithm automatically identifies clusters in the data without a priori information about those clusters (except perhaps the expected number of them). Supervised approaches require an analyst to identify training data to learn the characteristics of the clusters such that they can then classify all other pixels into one of the pre-defined groups. The classification algorithm presented here is a semi-supervised approach based on the Topological Anomaly Detection (TAD) algorithm. The TAD algorithm defines background components based on a mutual k-Nearest Neighbor graph model of the data, along with a spectral connected components analysis. Here, the largest components produced by TAD are used as regions of interest (ROI's),or training data for a supervised classification scheme. By combining those ROI's with a Gaussian Maximum Likelihood (GML) or a Minimum Distance to the Mean (MDM) algorithm, we are able to achieve a semi supervised classification method. We test this classification algorithm against data collected by the HyMAP sensor over the Cooke City, MT area and University of Pavia scene.
An algorithm for spatial heirarchy clustering
NASA Technical Reports Server (NTRS)
Dejesusparada, N. (Principal Investigator); Velasco, F. R. D.
1981-01-01
A method for utilizing both spectral and spatial redundancy in compacting and preclassifying images is presented. In multispectral satellite images, a high correlation exists between neighboring image points which tend to occupy dense and restricted regions of the feature space. The image is divided into windows of the same size where the clustering is made. The classes obtained in several neighboring windows are clustered, and then again successively clustered until only one region corresponding to the whole image is obtained. By employing this algorithm only a few points are considered in each clustering, thus reducing computational effort. The method is illustrated as applied to LANDSAT images.
NASA Astrophysics Data System (ADS)
Lestari, D.; Raharjo, D.; Bustamam, A.; Abdillah, B.; Widhianto, W.
2017-07-01
Dengue virus consists of 10 different constituent proteins and are classified into 4 major serotypes (DEN 1 - DEN 4). This study was designed to perform clustering against 30 protein sequences of dengue virus taken from Virus Pathogen Database and Analysis Resource (VIPR) using Regularized Markov Clustering (R-MCL) algorithm and then we analyze the result. By using Python program 3.4, R-MCL algorithm produces 8 clusters with more than one centroid in several clusters. The number of centroid shows the density level of interaction. Protein interactions that are connected in a tissue, form a complex protein that serves as a specific biological process unit. The analysis of result shows the R-MCL clustering produces clusters of dengue virus family based on the similarity role of their constituent protein, regardless of serotypes.
Saeed, Faisal; Salim, Naomie; Abdo, Ammar
2013-07-01
Many consensus clustering methods have been applied in different areas such as pattern recognition, machine learning, information theory and bioinformatics. However, few methods have been used for chemical compounds clustering. In this paper, an information theory and voting based algorithm (Adaptive Cumulative Voting-based Aggregation Algorithm A-CVAA) was examined for combining multiple clusterings of chemical structures. The effectiveness of clusterings was evaluated based on the ability of the clustering method to separate active from inactive molecules in each cluster, and the results were compared with Ward's method. The chemical dataset MDL Drug Data Report (MDDR) and the Maximum Unbiased Validation (MUV) dataset were used. Experiments suggest that the adaptive cumulative voting-based consensus method can improve the effectiveness of combining multiple clusterings of chemical structures. Copyright © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Gohier, Francis; Dellimore, Kiran; Scheffer, Cornie
2013-01-01
The quality of cardiopulmonary resuscitation (CPR) is often inconsistent and frequently fails to meet recommended guidelines. One promising approach to address this problem is for clinicians to use an active feedback device during CPR. However, one major deficiency of existing feedback systems is that they fail to account for the displacement of the back support surface during chest compression (CC), which can be important when CPR is performed on a soft surface. In this study we present the development of a real-time CPR feedback system based on an algorithm which uses force and dual-accelerometer measurements to provide accurate estimation of the CC depth on a soft surface, without assuming full chest decompression. Based on adult CPR manikin tests it was found that the accuracy of the estimated CC depth for a dual accelerometer feedback system is significantly better (7.3% vs. 24.4%) than for a single accelerometer system on soft back support surfaces, in the absence or presence of a backboard. In conclusion, the algorithm used was found to be suitable for a real-time, dual accelerometer CPR feedback application since it yielded reasonable accuracy in terms of CC depth estimation, even when used on a soft back support surface.
Distributed Multihop Clustering Approach for Wireless Sensor Networks
NASA Astrophysics Data System (ADS)
Israr, Nauman; Awan, Irfan
Prolonging the life time of Wireless Sensor Networks (WSNs) has been the focus of current research. One of the issues that needs to be addressed along with prolonging the network life time is to ensure uniform energy consumption across the network in WSNs especially in case of random network deployment. Cluster based routing algorithms are believed to be the best choice for WSNs because they work on the principle of divide and conquer and also improve the network life time considerably compared to flat based routing schemes. In this paper we propose a new routing strategy based on two layers clustering which exploits the redundancy property of the network in order to minimise duplicate data transmission and also make the intercluster and intracluster communication multihop. The proposed algorithm makes use of the nodes in a network whose area coverage is covered by the neighbouring nodes. These nodes are marked as temporary cluster heads and later use these temporary cluster heads randomly for multihop intercluster communication. Performance studies indicate that the proposed algorithm solves effectively the problem of load balancing across the network and is more energy efficient compared to the enhanced version of widely used Leach algorithm.
Wang, Yuliang; Zhang, Zaicheng; Wang, Huimin; Bi, Shusheng
2015-01-01
Cell image segmentation plays a central role in numerous biology studies and clinical applications. As a result, the development of cell image segmentation algorithms with high robustness and accuracy is attracting more and more attention. In this study, an automated cell image segmentation algorithm is developed to get improved cell image segmentation with respect to cell boundary detection and segmentation of the clustered cells for all cells in the field of view in negative phase contrast images. A new method which combines the thresholding method and edge based active contour method was proposed to optimize cell boundary detection. In order to segment clustered cells, the geographic peaks of cell light intensity were utilized to detect numbers and locations of the clustered cells. In this paper, the working principles of the algorithms are described. The influence of parameters in cell boundary detection and the selection of the threshold value on the final segmentation results are investigated. At last, the proposed algorithm is applied to the negative phase contrast images from different experiments. The performance of the proposed method is evaluated. Results show that the proposed method can achieve optimized cell boundary detection and highly accurate segmentation for clustered cells. PMID:26066315
Adaptive and Resilient Soft Tensegrity Robots.
Rieffel, John; Mouret, Jean-Baptiste
2018-04-17
Living organisms intertwine soft (e.g., muscle) and hard (e.g., bones) materials, giving them an intrinsic flexibility and resiliency often lacking in conventional rigid robots. The emerging field of soft robotics seeks to harness these same properties to create resilient machines. The nature of soft materials, however, presents considerable challenges to aspects of design, construction, and control-and up until now, the vast majority of gaits for soft robots have been hand-designed through empirical trial-and-error. This article describes an easy-to-assemble tensegrity-based soft robot capable of highly dynamic locomotive gaits and demonstrating structural and behavioral resilience in the face of physical damage. Enabling this is the use of a machine learning algorithm able to discover effective gaits with a minimal number of physical trials. These results lend further credence to soft-robotic approaches that seek to harness the interaction of complex material dynamics to generate a wealth of dynamical behaviors.
ICAP - An Interactive Cluster Analysis Procedure for analyzing remotely sensed data
NASA Technical Reports Server (NTRS)
Wharton, S. W.; Turner, B. J.
1981-01-01
An Interactive Cluster Analysis Procedure (ICAP) was developed to derive classifier training statistics from remotely sensed data. ICAP differs from conventional clustering algorithms by allowing the analyst to optimize the cluster configuration by inspection, rather than by manipulating process parameters. Control of the clustering process alternates between the algorithm, which creates new centroids and forms clusters, and the analyst, who can evaluate and elect to modify the cluster structure. Clusters can be deleted, or lumped together pairwise, or new centroids can be added. A summary of the cluster statistics can be requested to facilitate cluster manipulation. The principal advantage of this approach is that it allows prior information (when available) to be used directly in the analysis, since the analyst interacts with ICAP in a straightforward manner, using basic terms with which he is more likely to be familiar. Results from testing ICAP showed that an informed use of ICAP can improve classification, as compared to an existing cluster analysis procedure.
Semi-supervised clustering methods.
Bair, Eric
2013-01-01
Cluster analysis methods seek to partition a data set into homogeneous subgroups. It is useful in a wide variety of applications, including document processing and modern genetics. Conventional clustering methods are unsupervised, meaning that there is no outcome variable nor is anything known about the relationship between the observations in the data set. In many situations, however, information about the clusters is available in addition to the values of the features. For example, the cluster labels of some observations may be known, or certain observations may be known to belong to the same cluster. In other cases, one may wish to identify clusters that are associated with a particular outcome variable. This review describes several clustering algorithms (known as "semi-supervised clustering" methods) that can be applied in these situations. The majority of these methods are modifications of the popular k-means clustering method, and several of them will be described in detail. A brief description of some other semi-supervised clustering algorithms is also provided.
Open source clustering software.
de Hoon, M J L; Imoto, S; Nolan, J; Miyano, S
2004-06-12
We have implemented k-means clustering, hierarchical clustering and self-organizing maps in a single multipurpose open-source library of C routines, callable from other C and C++ programs. Using this library, we have created an improved version of Michael Eisen's well-known Cluster program for Windows, Mac OS X and Linux/Unix. In addition, we generated a Python and a Perl interface to the C Clustering Library, thereby combining the flexibility of a scripting language with the speed of C. The C Clustering Library and the corresponding Python C extension module Pycluster were released under the Python License, while the Perl module Algorithm::Cluster was released under the Artistic License. The GUI code Cluster 3.0 for Windows, Macintosh and Linux/Unix, as well as the corresponding command-line program, were released under the same license as the original Cluster code. The complete source code is available at http://bonsai.ims.u-tokyo.ac.jp/mdehoon/software/cluster. Alternatively, Algorithm::Cluster can be downloaded from CPAN, while Pycluster is also available as part of the Biopython distribution.
Applications of modern statistical methods to analysis of data in physical science
NASA Astrophysics Data System (ADS)
Wicker, James Eric
Modern methods of statistical and computational analysis offer solutions to dilemmas confronting researchers in physical science. Although the ideas behind modern statistical and computational analysis methods were originally introduced in the 1970's, most scientists still rely on methods written during the early era of computing. These researchers, who analyze increasingly voluminous and multivariate data sets, need modern analysis methods to extract the best results from their studies. The first section of this work showcases applications of modern linear regression. Since the 1960's, many researchers in spectroscopy have used classical stepwise regression techniques to derive molecular constants. However, problems with thresholds of entry and exit for model variables plagues this analysis method. Other criticisms of this kind of stepwise procedure include its inefficient searching method, the order in which variables enter or leave the model and problems with overfitting data. We implement an information scoring technique that overcomes the assumptions inherent in the stepwise regression process to calculate molecular model parameters. We believe that this kind of information based model evaluation can be applied to more general analysis situations in physical science. The second section proposes new methods of multivariate cluster analysis. The K-means algorithm and the EM algorithm, introduced in the 1960's and 1970's respectively, formed the basis of multivariate cluster analysis methodology for many years. However, several shortcomings of these methods include strong dependence on initial seed values and inaccurate results when the data seriously depart from hypersphericity. We propose new cluster analysis methods based on genetic algorithms that overcomes the strong dependence on initial seed values. In addition, we propose a generalization of the Genetic K-means algorithm which can accurately identify clusters with complex hyperellipsoidal covariance structures. We then use this new algorithm in a genetic algorithm based Expectation-Maximization process that can accurately calculate parameters describing complex clusters in a mixture model routine. Using the accuracy of this GEM algorithm, we assign information scores to cluster calculations in order to best identify the number of mixture components in a multivariate data set. We will showcase how these algorithms can be used to process multivariate data from astronomical observations.
Open-Source Sequence Clustering Methods Improve the State Of the Art.
Kopylova, Evguenia; Navas-Molina, Jose A; Mercier, Céline; Xu, Zhenjiang Zech; Mahé, Frédéric; He, Yan; Zhou, Hong-Wei; Rognes, Torbjørn; Caporaso, J Gregory; Knight, Rob
2016-01-01
Sequence clustering is a common early step in amplicon-based microbial community analysis, when raw sequencing reads are clustered into operational taxonomic units (OTUs) to reduce the run time of subsequent analysis steps. Here, we evaluated the performance of recently released state-of-the-art open-source clustering software products, namely, OTUCLUST, Swarm, SUMACLUST, and SortMeRNA, against current principal options (UCLUST and USEARCH) in QIIME, hierarchical clustering methods in mothur, and USEARCH's most recent clustering algorithm, UPARSE. All the latest open-source tools showed promising results, reporting up to 60% fewer spurious OTUs than UCLUST, indicating that the underlying clustering algorithm can vastly reduce the number of these derived OTUs. Furthermore, we observed that stringent quality filtering, such as is done in UPARSE, can cause a significant underestimation of species abundance and diversity, leading to incorrect biological results. Swarm, SUMACLUST, and SortMeRNA have been included in the QIIME 1.9.0 release. IMPORTANCE Massive collections of next-generation sequencing data call for fast, accurate, and easily accessible bioinformatics algorithms to perform sequence clustering. A comprehensive benchmark is presented, including open-source tools and the popular USEARCH suite. Simulated, mock, and environmental communities were used to analyze sensitivity, selectivity, species diversity (alpha and beta), and taxonomic composition. The results demonstrate that recent clustering algorithms can significantly improve accuracy and preserve estimated diversity without the application of aggressive filtering. Moreover, these tools are all open source, apply multiple levels of multithreading, and scale to the demands of modern next-generation sequencing data, which is essential for the analysis of massive multidisciplinary studies such as the Earth Microbiome Project (EMP) (J. A. Gilbert, J. K. Jansson, and R. Knight, BMC Biol 12:69, 2014, http://dx.doi.org/10.1186/s12915-014-0069-1).
Segmentation of MRI Brain Images with an Improved Harmony Searching Algorithm.
Yang, Zhang; Shufan, Ye; Li, Guo; Weifeng, Ding
2016-01-01
The harmony searching (HS) algorithm is a kind of optimization search algorithm currently applied in many practical problems. The HS algorithm constantly revises variables in the harmony database and the probability of different values that can be used to complete iteration convergence to achieve the optimal effect. Accordingly, this study proposed a modified algorithm to improve the efficiency of the algorithm. First, a rough set algorithm was employed to improve the convergence and accuracy of the HS algorithm. Then, the optimal value was obtained using the improved HS algorithm. The optimal value of convergence was employed as the initial value of the fuzzy clustering algorithm for segmenting magnetic resonance imaging (MRI) brain images. Experimental results showed that the improved HS algorithm attained better convergence and more accurate results than those of the original HS algorithm. In our study, the MRI image segmentation effect of the improved algorithm was superior to that of the original fuzzy clustering method.