subspace clustering algorithm: Topics by Science.gov

Sample records for subspace clustering algorithm

Unsupervised spike sorting based on discriminative subspace learning.

PubMed

Keshtkaran, Mohammad Reza; Yang, Zhi

2014-01-01

Spike sorting is a fundamental preprocessing step for many neuroscience studies which rely on the analysis of spike trains. In this paper, we present two unsupervised spike sorting algorithms based on discriminative subspace learning. The first algorithm simultaneously learns the discriminative feature subspace and performs clustering. It uses histogram of features in the most discriminative projection to detect the number of neurons. The second algorithm performs hierarchical divisive clustering that learns a discriminative 1-dimensional subspace for clustering in each level of the hierarchy until achieving almost unimodal distribution in the subspace. The algorithms are tested on synthetic and in-vivo data, and are compared against two widely used spike sorting methods. The comparative results demonstrate that our spike sorting methods can achieve substantially higher accuracy in lower dimensional feature space, and they are highly robust to noise. Moreover, they provide significantly better cluster separability in the learned subspace than in the subspace obtained by principal component analysis or wavelet transform.
Mining subspace clusters from DNA microarray data using large itemset techniques.

PubMed

Chang, Ye-In; Chen, Jiun-Rung; Tsai, Yueh-Chi

2009-05-01

Mining subspace clusters from the DNA microarrays could help researchers identify those genes which commonly contribute to a disease, where a subspace cluster indicates a subset of genes whose expression levels are similar under a subset of conditions. Since in a DNA microarray, the number of genes is far larger than the number of conditions, those previous proposed algorithms which compute the maximum dimension sets (MDSs) for any two genes will take a long time to mine subspace clusters. In this article, we propose the Large Itemset-Based Clustering (LISC) algorithm for mining subspace clusters. Instead of constructing MDSs for any two genes, we construct only MDSs for any two conditions. Then, we transform the task of finding the maximal possible gene sets into the problem of mining large itemsets from the condition-pair MDSs. Since we are only interested in those subspace clusters with gene sets as large as possible, it is desirable to pay attention to those gene sets which have reasonable large support values in the condition-pair MDSs. From our simulation results, we show that the proposed algorithm needs shorter processing time than those previous proposed algorithms which need to construct gene-pair MDSs.
Subspace K-means clustering.

PubMed

Timmerman, Marieke E; Ceulemans, Eva; De Roover, Kim; Van Leeuwen, Karla

2013-12-01

To achieve an insightful clustering of multivariate data, we propose subspace K-means. Its central idea is to model the centroids and cluster residuals in reduced spaces, which allows for dealing with a wide range of cluster types and yields rich interpretations of the clusters. We review the existing related clustering methods, including deterministic, stochastic, and unsupervised learning approaches. To evaluate subspace K-means, we performed a comparative simulation study, in which we manipulated the overlap of subspaces, the between-cluster variance, and the error variance. The study shows that the subspace K-means algorithm is sensitive to local minima but that the problem can be reasonably dealt with by using partitions of various cluster procedures as a starting point for the algorithm. Subspace K-means performs very well in recovering the true clustering across all conditions considered and appears to be superior to its competitor methods: K-means, reduced K-means, factorial K-means, mixtures of factor analyzers (MFA), and MCLUST. The best competitor method, MFA, showed a performance similar to that of subspace K-means in easy conditions but deteriorated in more difficult ones. Using data from a study on parental behavior, we show that subspace K-means analysis provides a rich insight into the cluster characteristics, in terms of both the relative positions of the clusters (via the centroids) and the shape of the clusters (via the within-cluster residuals).
Noise-robust unsupervised spike sorting based on discriminative subspace learning with outlier handling.

PubMed

Keshtkaran, Mohammad Reza; Yang, Zhi

2017-06-01

Spike sorting is a fundamental preprocessing step for many neuroscience studies which rely on the analysis of spike trains. Most of the feature extraction and dimensionality reduction techniques that have been used for spike sorting give a projection subspace which is not necessarily the most discriminative one. Therefore, the clusters which appear inherently separable in some discriminative subspace may overlap if projected using conventional feature extraction approaches leading to a poor sorting accuracy especially when the noise level is high. In this paper, we propose a noise-robust and unsupervised spike sorting algorithm based on learning discriminative spike features for clustering. The proposed algorithm uses discriminative subspace learning to extract low dimensional and most discriminative features from the spike waveforms and perform clustering with automatic detection of the number of the clusters. The core part of the algorithm involves iterative subspace selection using linear discriminant analysis and clustering using Gaussian mixture model with outlier detection. A statistical test in the discriminative subspace is proposed to automatically detect the number of the clusters. Comparative results on publicly available simulated and real in vivo datasets demonstrate that our algorithm achieves substantially improved cluster distinction leading to higher sorting accuracy and more reliable detection of clusters which are highly overlapping and not detectable using conventional feature extraction techniques such as principal component analysis or wavelets. By providing more accurate information about the activity of more number of individual neurons with high robustness to neural noise and outliers, the proposed unsupervised spike sorting algorithm facilitates more detailed and accurate analysis of single- and multi-unit activities in neuroscience and brain machine interface studies.
Noise-robust unsupervised spike sorting based on discriminative subspace learning with outlier handling

NASA Astrophysics Data System (ADS)

Keshtkaran, Mohammad Reza; Yang, Zhi

2017-06-01

Objective. Spike sorting is a fundamental preprocessing step for many neuroscience studies which rely on the analysis of spike trains. Most of the feature extraction and dimensionality reduction techniques that have been used for spike sorting give a projection subspace which is not necessarily the most discriminative one. Therefore, the clusters which appear inherently separable in some discriminative subspace may overlap if projected using conventional feature extraction approaches leading to a poor sorting accuracy especially when the noise level is high. In this paper, we propose a noise-robust and unsupervised spike sorting algorithm based on learning discriminative spike features for clustering. Approach. The proposed algorithm uses discriminative subspace learning to extract low dimensional and most discriminative features from the spike waveforms and perform clustering with automatic detection of the number of the clusters. The core part of the algorithm involves iterative subspace selection using linear discriminant analysis and clustering using Gaussian mixture model with outlier detection. A statistical test in the discriminative subspace is proposed to automatically detect the number of the clusters. Main results. Comparative results on publicly available simulated and real in vivo datasets demonstrate that our algorithm achieves substantially improved cluster distinction leading to higher sorting accuracy and more reliable detection of clusters which are highly overlapping and not detectable using conventional feature extraction techniques such as principal component analysis or wavelets. Significance. By providing more accurate information about the activity of more number of individual neurons with high robustness to neural noise and outliers, the proposed unsupervised spike sorting algorithm facilitates more detailed and accurate analysis of single- and multi-unit activities in neuroscience and brain machine interface studies.
Principal component and clustering analysis on molecular dynamics data of the ribosomal L11·23S subdomain.

PubMed

Wolf, Antje; Kirschner, Karl N

2013-02-01

With improvements in computer speed and algorithm efficiency, MD simulations are sampling larger amounts of molecular and biomolecular conformations. Being able to qualitatively and quantitatively sift these conformations into meaningful groups is a difficult and important task, especially when considering the structure-activity paradigm. Here we present a study that combines two popular techniques, principal component (PC) analysis and clustering, for revealing major conformational changes that occur in molecular dynamics (MD) simulations. Specifically, we explored how clustering different PC subspaces effects the resulting clusters versus clustering the complete trajectory data. As a case example, we used the trajectory data from an explicitly solvated simulation of a bacteria's L11·23S ribosomal subdomain, which is a target of thiopeptide antibiotics. Clustering was performed, using K-means and average-linkage algorithms, on data involving the first two to the first five PC subspace dimensions. For the average-linkage algorithm we found that data-point membership, cluster shape, and cluster size depended on the selected PC subspace data. In contrast, K-means provided very consistent results regardless of the selected subspace. Since we present results on a single model system, generalization concerning the clustering of different PC subspaces of other molecular systems is currently premature. However, our hope is that this study illustrates a) the complexities in selecting the appropriate clustering algorithm, b) the complexities in interpreting and validating their results, and c) by combining PC analysis with subsequent clustering valuable dynamic and conformational information can be obtained.
Sparse subspace clustering for data with missing entries and high-rank matrix completion.

PubMed

Fan, Jicong; Chow, Tommy W S

2017-09-01

Many methods have recently been proposed for subspace clustering, but they are often unable to handle incomplete data because of missing entries. Using matrix completion methods to recover missing entries is a common way to solve the problem. Conventional matrix completion methods require that the matrix should be of low-rank intrinsically, but most matrices are of high-rank or even full-rank in practice, especially when the number of subspaces is large. In this paper, a new method called Sparse Representation with Missing Entries and Matrix Completion is proposed to solve the problems of incomplete-data subspace clustering and high-rank matrix completion. The proposed algorithm alternately computes the matrix of sparse representation coefficients and recovers the missing entries of a data matrix. The proposed algorithm recovers missing entries through minimizing the representation coefficients, representation errors, and matrix rank. Thorough experimental study and comparative analysis based on synthetic data and natural images were conducted. The presented results demonstrate that the proposed algorithm is more effective in subspace clustering and matrix completion compared with other existing methods. Copyright © 2017 Elsevier Ltd. All rights reserved.
Heterogeneous Tensor Decomposition for Clustering via Manifold Optimization.

PubMed

Sun, Yanfeng; Gao, Junbin; Hong, Xia; Mishra, Bamdev; Yin, Baocai

2016-03-01

Tensor clustering is an important tool that exploits intrinsically rich structures in real-world multiarray or Tensor datasets. Often in dealing with those datasets, standard practice is to use subspace clustering that is based on vectorizing multiarray data. However, vectorization of tensorial data does not exploit complete structure information. In this paper, we propose a subspace clustering algorithm without adopting any vectorization process. Our approach is based on a novel heterogeneous Tucker decomposition model taking into account cluster membership information. We propose a new clustering algorithm that alternates between different modes of the proposed heterogeneous tensor model. All but the last mode have closed-form updates. Updating the last mode reduces to optimizing over the multinomial manifold for which we investigate second order Riemannian geometry and propose a trust-region algorithm. Numerical experiments show that our proposed algorithm compete effectively with state-of-the-art clustering algorithms that are based on tensor factorization.
Reweighted mass center based object-oriented sparse subspace clustering for hyperspectral images

NASA Astrophysics Data System (ADS)

Zhai, Han; Zhang, Hongyan; Zhang, Liangpei; Li, Pingxiang

2016-10-01

Considering the inevitable obstacles faced by the pixel-based clustering methods, such as salt-and-pepper noise, high computational complexity, and the lack of spatial information, a reweighted mass center based object-oriented sparse subspace clustering (RMC-OOSSC) algorithm for hyperspectral images (HSIs) is proposed. First, the mean-shift segmentation method is utilized to oversegment the HSI to obtain meaningful objects. Second, a distance reweighted mass center learning model is presented to extract the representative and discriminative features for each object. Third, assuming that all the objects are sampled from a union of subspaces, it is natural to apply the SSC algorithm to the HSI. Faced with the high correlation among the hyperspectral objects, a weighting scheme is adopted to ensure that the highly correlated objects are preferred in the procedure of sparse representation, to reduce the representation errors. Two widely used hyperspectral datasets were utilized to test the performance of the proposed RMC-OOSSC algorithm, obtaining high clustering accuracies (overall accuracy) of 71.98% and 89.57%, respectively. The experimental results show that the proposed method clearly improves the clustering performance with respect to the other state-of-the-art clustering methods, and it significantly reduces the computational time.
Constructing the L2-Graph for Robust Subspace Learning and Subspace Clustering.

PubMed

Peng, Xi; Yu, Zhiding; Yi, Zhang; Tang, Huajin

2017-04-01

Under the framework of graph-based learning, the key to robust subspace clustering and subspace learning is to obtain a good similarity graph that eliminates the effects of errors and retains only connections between the data points from the same subspace (i.e., intrasubspace data points). Recent works achieve good performance by modeling errors into their objective functions to remove the errors from the inputs. However, these approaches face the limitations that the structure of errors should be known prior and a complex convex problem must be solved. In this paper, we present a novel method to eliminate the effects of the errors from the projection space (representation) rather than from the input space. We first prove that l 1 -, l 2 -, l ∞ -, and nuclear-norm-based linear projection spaces share the property of intrasubspace projection dominance, i.e., the coefficients over intrasubspace data points are larger than those over intersubspace data points. Based on this property, we introduce a method to construct a sparse similarity graph, called L2-graph. The subspace clustering and subspace learning algorithms are developed upon L2-graph. We conduct comprehensive experiment on subspace learning, image clustering, and motion segmentation and consider several quantitative benchmarks classification/clustering accuracy, normalized mutual information, and running time. Results show that L2-graph outperforms many state-of-the-art methods in our experiments, including L1-graph, low rank representation (LRR), and latent LRR, least square regression, sparse subspace clustering, and locally linear representation.
Automated modal parameter estimation using correlation analysis and bootstrap sampling

NASA Astrophysics Data System (ADS)

Yaghoubi, Vahid; Vakilzadeh, Majid K.; Abrahamsson, Thomas J. S.

2018-02-01

The estimation of modal parameters from a set of noisy measured data is a highly judgmental task, with user expertise playing a significant role in distinguishing between estimated physical and noise modes of a test-piece. Various methods have been developed to automate this procedure. The common approach is to identify models with different orders and cluster similar modes together. However, most proposed methods based on this approach suffer from high-dimensional optimization problems in either the estimation or clustering step. To overcome this problem, this study presents an algorithm for autonomous modal parameter estimation in which the only required optimization is performed in a three-dimensional space. To this end, a subspace-based identification method is employed for the estimation and a non-iterative correlation-based method is used for the clustering. This clustering is at the heart of the paper. The keys to success are correlation metrics that are able to treat the problems of spatial eigenvector aliasing and nonunique eigenvectors of coalescent modes simultaneously. The algorithm commences by the identification of an excessively high-order model from frequency response function test data. The high number of modes of this model provides bases for two subspaces: one for likely physical modes of the tested system and one for its complement dubbed the subspace of noise modes. By employing the bootstrap resampling technique, several subsets are generated from the same basic dataset and for each of them a model is identified to form a set of models. Then, by correlation analysis with the two aforementioned subspaces, highly correlated modes of these models which appear repeatedly are clustered together and the noise modes are collected in a so-called Trashbox cluster. Stray noise modes attracted to the mode clusters are trimmed away in a second step by correlation analysis. The final step of the algorithm is a fuzzy c-means clustering procedure applied to a three-dimensional feature space to assign a degree of physicalness to each cluster. The proposed algorithm is applied to two case studies: one with synthetic data and one with real test data obtained from a hammer impact test. The results indicate that the algorithm successfully clusters similar modes and gives a reasonable quantification of the extent to which each cluster is physical.
Blind equalization and automatic modulation classification based on subspace for subcarrier MPSK optical communications

NASA Astrophysics Data System (ADS)

Chen, Dan; Guo, Lin-yuan; Wang, Chen-hao; Ke, Xi-zheng

2017-07-01

Equalization can compensate channel distortion caused by channel multipath effects, and effectively improve convergent of modulation constellation diagram in optical wireless system. In this paper, the subspace blind equalization algorithm is used to preprocess M-ary phase shift keying (MPSK) subcarrier modulation signal in receiver. Mountain clustering is adopted to get the clustering centers of MPSK modulation constellation diagram, and the modulation order is automatically identified through the k-nearest neighbor (KNN) classifier. The experiment has been done under four different weather conditions. Experimental results show that the convergent of constellation diagram is improved effectively after using the subspace blind equalization algorithm, which means that the accuracy of modulation recognition is increased. The correct recognition rate of 16PSK can be up to 85% in any kind of weather condition which is mentioned in paper. Meanwhile, the correct recognition rate is the highest in cloudy and the lowest in heavy rain condition.
Online Low-Rank Representation Learning for Joint Multi-subspace Recovery and Clustering.

PubMed

Li, Bo; Liu, Risheng; Cao, Junjie; Zhang, Jie; Lai, Yu-Kun; Liua, Xiuping

2017-10-06

Benefiting from global rank constraints, the lowrank representation (LRR) method has been shown to be an effective solution to subspace learning. However, the global mechanism also means that the LRR model is not suitable for handling large-scale data or dynamic data. For large-scale data, the LRR method suffers from high time complexity, and for dynamic data, it has to recompute a complex rank minimization for the entire data set whenever new samples are dynamically added, making it prohibitively expensive. Existing attempts to online LRR either take a stochastic approach or build the representation purely based on a small sample set and treat new input as out-of-sample data. The former often requires multiple runs for good performance and thus takes longer time to run, and the latter formulates online LRR as an out-ofsample classification problem and is less robust to noise. In this paper, a novel online low-rank representation subspace learning method is proposed for both large-scale and dynamic data. The proposed algorithm is composed of two stages: static learning and dynamic updating. In the first stage, the subspace structure is learned from a small number of data samples. In the second stage, the intrinsic principal components of the entire data set are computed incrementally by utilizing the learned subspace structure, and the low-rank representation matrix can also be incrementally solved by an efficient online singular value decomposition (SVD) algorithm. The time complexity is reduced dramatically for large-scale data, and repeated computation is avoided for dynamic problems. We further perform theoretical analysis comparing the proposed online algorithm with the batch LRR method. Finally, experimental results on typical tasks of subspace recovery and subspace clustering show that the proposed algorithm performs comparably or better than batch methods including the batch LRR, and significantly outperforms state-of-the-art online methods.
Fuzzy Subspace Clustering

NASA Astrophysics Data System (ADS)

Borgelt, Christian

In clustering we often face the situation that only a subset of the available attributes is relevant for forming clusters, even though this may not be known beforehand. In such cases it is desirable to have a clustering algorithm that automatically weights attributes or even selects a proper subset. In this paper I study such an approach for fuzzy clustering, which is based on the idea to transfer an alternative to the fuzzifier (Klawonn and Höppner, What is fuzzy about fuzzy clustering? Understanding and improving the concept of the fuzzifier, In: Proc. 5th Int. Symp. on Intelligent Data Analysis, 254-264, Springer, Berlin, 2003) to attribute weighting fuzzy clustering (Keller and Klawonn, Int J Uncertain Fuzziness Knowl Based Syst 8:735-746, 2000). In addition, by reformulating Gustafson-Kessel fuzzy clustering, a scheme for weighting and selecting principal axes can be obtained. While in Borgelt (Feature weighting and feature selection in fuzzy clustering, In: Proc. 17th IEEE Int. Conf. on Fuzzy Systems, IEEE Press, Piscataway, NJ, 2008) I already presented such an approach for a global selection of attributes and principal axes, this paper extends it to a cluster-specific selection, thus arriving at a fuzzy subspace clustering algorithm (Parsons, Haque, and Liu, 2004).
Hybrid fuzzy cluster ensemble framework for tumor clustering from biomolecular data.

PubMed

Yu, Zhiwen; Chen, Hantao; You, Jane; Han, Guoqiang; Li, Le

2013-01-01

Cancer class discovery using biomolecular data is one of the most important tasks for cancer diagnosis and treatment. Tumor clustering from gene expression data provides a new way to perform cancer class discovery. Most of the existing research works adopt single-clustering algorithms to perform tumor clustering is from biomolecular data that lack robustness, stability, and accuracy. To further improve the performance of tumor clustering from biomolecular data, we introduce the fuzzy theory into the cluster ensemble framework for tumor clustering from biomolecular data, and propose four kinds of hybrid fuzzy cluster ensemble frameworks (HFCEF), named as HFCEF-I, HFCEF-II, HFCEF-III, and HFCEF-IV, respectively, to identify samples that belong to different types of cancers. The difference between HFCEF-I and HFCEF-II is that they adopt different ensemble generator approaches to generate a set of fuzzy matrices in the ensemble. Specifically, HFCEF-I applies the affinity propagation algorithm (AP) to perform clustering on the sample dimension and generates a set of fuzzy matrices in the ensemble based on the fuzzy membership function and base samples selected by AP. HFCEF-II adopts AP to perform clustering on the attribute dimension, generates a set of subspaces, and obtains a set of fuzzy matrices in the ensemble by performing fuzzy c-means on subspaces. Compared with HFCEF-I and HFCEF-II, HFCEF-III and HFCEF-IV consider the characteristics of HFCEF-I and HFCEF-II. HFCEF-III combines HFCEF-I and HFCEF-II in a serial way, while HFCEF-IV integrates HFCEF-I and HFCEF-II in a concurrent way. HFCEFs adopt suitable consensus functions, such as the fuzzy c-means algorithm or the normalized cut algorithm (Ncut), to summarize generated fuzzy matrices, and obtain the final results. The experiments on real data sets from UCI machine learning repository and cancer gene expression profiles illustrate that 1) the proposed hybrid fuzzy cluster ensemble frameworks work well on real data sets, especially biomolecular data, and 2) the proposed approaches are able to provide more robust, stable, and accurate results when compared with the state-of-the-art single clustering algorithms and traditional cluster ensemble approaches.
Greedy subspace clustering.

DOT National Transportation Integrated Search

2016-09-01

We consider the problem of subspace clustering: given points that lie on or near the union of many low-dimensional linear subspaces, recover the subspaces. To this end, one first identifies sets of points close to the same subspace and uses the sets ...
Seismic noise attenuation using an online subspace tracking algorithm

NASA Astrophysics Data System (ADS)

Zhou, Yatong; Li, Shuhua; Zhang, Dong; Chen, Yangkang

2018-02-01

We propose a new low-rank based noise attenuation method using an efficient algorithm for tracking subspaces from highly corrupted seismic observations. The subspace tracking algorithm requires only basic linear algebraic manipulations. The algorithm is derived by analysing incremental gradient descent on the Grassmannian manifold of subspaces. When the multidimensional seismic data are mapped to a low-rank space, the subspace tracking algorithm can be directly applied to the input low-rank matrix to estimate the useful signals. Since the subspace tracking algorithm is an online algorithm, it is more robust to random noise than traditional truncated singular value decomposition (TSVD) based subspace tracking algorithm. Compared with the state-of-the-art algorithms, the proposed denoising method can obtain better performance. More specifically, the proposed method outperforms the TSVD-based singular spectrum analysis method in causing less residual noise and also in saving half of the computational cost. Several synthetic and field data examples with different levels of complexities demonstrate the effectiveness and robustness of the presented algorithm in rejecting different types of noise including random noise, spiky noise, blending noise, and coherent noise.
A Class of Manifold Regularized Multiplicative Update Algorithms for Image Clustering.

PubMed

Yang, Shangming; Yi, Zhang; He, Xiaofei; Li, Xuelong

2015-12-01

Multiplicative update algorithms are important tools for information retrieval, image processing, and pattern recognition. However, when the graph regularization is added to the cost function, different classes of sample data may be mapped to the same subspace, which leads to the increase of data clustering error rate. In this paper, an improved nonnegative matrix factorization (NMF) cost function is introduced. Based on the cost function, a class of novel graph regularized NMF algorithms is developed, which results in a class of extended multiplicative update algorithms with manifold structure regularization. Analysis shows that in the learning, the proposed algorithms can efficiently minimize the rank of the data representation matrix. Theoretical results presented in this paper are confirmed by simulations. For different initializations and data sets, variation curves of cost functions and decomposition data are presented to show the convergence features of the proposed update rules. Basis images, reconstructed images, and clustering results are utilized to present the efficiency of the new algorithms. Last, the clustering accuracies of different algorithms are also investigated, which shows that the proposed algorithms can achieve state-of-the-art performance in applications of image clustering.
A novel framework for feature extraction in multi-sensor action potential sorting.

PubMed

Wu, Shun-Chi; Swindlehurst, A Lee; Nenadic, Zoran

2015-09-30

Extracellular recordings of multi-unit neural activity have become indispensable in neuroscience research. The analysis of the recordings begins with the detection of the action potentials (APs), followed by a classification step where each AP is associated with a given neural source. A feature extraction step is required prior to classification in order to reduce the dimensionality of the data and the impact of noise, allowing source clustering algorithms to work more efficiently. In this paper, we propose a novel framework for multi-sensor AP feature extraction based on the so-called Matched Subspace Detector (MSD), which is shown to be a natural generalization of standard single-sensor algorithms. Clustering using both simulated data and real AP recordings taken in the locust antennal lobe demonstrates that the proposed approach yields features that are discriminatory and lead to promising results. Unlike existing methods, the proposed algorithm finds joint spatio-temporal feature vectors that match the dominant subspace observed in the two-dimensional data without needs for a forward propagation model and AP templates. The proposed MSD approach provides more discriminatory features for unsupervised AP sorting applications. Copyright © 2015 Elsevier B.V. All rights reserved.
Mining biological information from 3D short time-series gene expression data: the OPTricluster algorithm.

PubMed

Tchagang, Alain B; Phan, Sieu; Famili, Fazel; Shearer, Heather; Fobert, Pierre; Huang, Yi; Zou, Jitao; Huang, Daiqing; Cutler, Adrian; Liu, Ziying; Pan, Youlian

2012-04-04

Nowadays, it is possible to collect expression levels of a set of genes from a set of biological samples during a series of time points. Such data have three dimensions: gene-sample-time (GST). Thus they are called 3D microarray gene expression data. To take advantage of the 3D data collected, and to fully understand the biological knowledge hidden in the GST data, novel subspace clustering algorithms have to be developed to effectively address the biological problem in the corresponding space. We developed a subspace clustering algorithm called Order Preserving Triclustering (OPTricluster), for 3D short time-series data mining. OPTricluster is able to identify 3D clusters with coherent evolution from a given 3D dataset using a combinatorial approach on the sample dimension, and the order preserving (OP) concept on the time dimension. The fusion of the two methodologies allows one to study similarities and differences between samples in terms of their temporal expression profile. OPTricluster has been successfully applied to four case studies: immune response in mice infected by malaria (Plasmodium chabaudi), systemic acquired resistance in Arabidopsis thaliana, similarities and differences between inner and outer cotyledon in Brassica napus during seed development, and to Brassica napus whole seed development. These studies showed that OPTricluster is robust to noise and is able to detect the similarities and differences between biological samples. Our analysis showed that OPTricluster generally outperforms other well known clustering algorithms such as the TRICLUSTER, gTRICLUSTER and K-means; it is robust to noise and can effectively mine the biological knowledge hidden in the 3D short time-series gene expression data.

Dual signal subspace projection (DSSP): a novel algorithm for removing large interference in biomagnetic measurements

NASA Astrophysics Data System (ADS)

Sekihara, Kensuke; Kawabata, Yuya; Ushio, Shuta; Sumiya, Satoshi; Kawabata, Shigenori; Adachi, Yoshiaki; Nagarajan, Srikantan S.

2016-06-01

Objective. In functional electrophysiological imaging, signals are often contaminated by interference that can be of considerable magnitude compared to the signals of interest. This paper proposes a novel algorithm for removing such interferences that does not require separate noise measurements. Approach. The algorithm is based on a dual definition of the signal subspace in the spatial- and time-domains. Since the algorithm makes use of this duality, it is named the dual signal subspace projection (DSSP). The DSSP algorithm first projects the columns of the measured data matrix onto the inside and outside of the spatial-domain signal subspace, creating a set of two preprocessed data matrices. The intersection of the row spans of these two matrices is estimated as the time-domain interference subspace. The original data matrix is projected onto the subspace that is orthogonal to this interference subspace. Main results. The DSSP algorithm is validated by using the computer simulation, and using two sets of real biomagnetic data: spinal cord evoked field data measured from a healthy volunteer and magnetoencephalography data from a patient with a vagus nerve stimulator. Significance. The proposed DSSP algorithm is effective for removing overlapped interference in a wide variety of biomagnetic measurements.
Robust subspace clustering via joint weighted Schatten-p norm and Lq norm minimization

NASA Astrophysics Data System (ADS)

Zhang, Tao; Tang, Zhenmin; Liu, Qing

2017-05-01

Low-rank representation (LRR) has been successfully applied to subspace clustering. However, the nuclear norm in the standard LRR is not optimal for approximating the rank function in many real-world applications. Meanwhile, the L21 norm in LRR also fails to characterize various noises properly. To address the above issues, we propose an improved LRR method, which achieves low rank property via the new formulation with weighted Schatten-p norm and Lq norm (WSPQ). Specifically, the nuclear norm is generalized to be the Schatten-p norm and different weights are assigned to the singular values, and thus it can approximate the rank function more accurately. In addition, Lq norm is further incorporated into WSPQ to model different noises and improve the robustness. An efficient algorithm based on the inexact augmented Lagrange multiplier method is designed for the formulated problem. Extensive experiments on face clustering and motion segmentation clearly demonstrate the superiority of the proposed WSPQ over several state-of-the-art methods.
Formulating face verification with semidefinite programming.

PubMed

Yan, Shuicheng; Liu, Jianzhuang; Tang, Xiaoou; Huang, Thomas S

2007-11-01

This paper presents a unified solution to three unsolved problems existing in face verification with subspace learning techniques: selection of verification threshold, automatic determination of subspace dimension, and deducing feature fusing weights. In contrast to previous algorithms which search for the projection matrix directly, our new algorithm investigates a similarity metric matrix (SMM). With a certain verification threshold, this matrix is learned by a semidefinite programming approach, along with the constraints of the kindred pairs with similarity larger than the threshold, and inhomogeneous pairs with similarity smaller than the threshold. Then, the subspace dimension and the feature fusing weights are simultaneously inferred from the singular value decomposition of the derived SMM. In addition, the weighted and tensor extensions are proposed to further improve the algorithmic effectiveness and efficiency, respectively. Essentially, the verification is conducted within an affine subspace in this new algorithm and is, hence, called the affine subspace for verification (ASV). Extensive experiments show that the ASV can achieve encouraging face verification accuracy in comparison to other subspace algorithms, even without the need to explore any parameters.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Krause, Josua; Dasgupta, Aritra; Fekete, Jean-Daniel

Dealing with the curse of dimensionality is a key challenge in high-dimensional data visualization. We present SeekAView to address three main gaps in the existing research literature. First, automated methods like dimensionality reduction or clustering suffer from a lack of transparency in letting analysts interact with their outputs in real-time to suit their exploration strategies. The results often suffer from a lack of interpretability, especially for domain experts not trained in statistics and machine learning. Second, exploratory visualization techniques like scatter plots or parallel coordinates suffer from a lack of visual scalability: it is difficult to present a coherent overviewmore » of interesting combinations of dimensions. Third, the existing techniques do not provide a flexible workflow that allows for multiple perspectives into the analysis process by automatically detecting and suggesting potentially interesting subspaces. In SeekAView we address these issues using suggestion based visual exploration of interesting patterns for building and refining multidimensional subspaces. Compared to the state-of-the-art in subspace search and visualization methods, we achieve higher transparency in showing not only the results of the algorithms, but also interesting dimensions calibrated against different metrics. We integrate a visually scalable design space with an iterative workflow guiding the analysts by choosing the starting points and letting them slice and dice through the data to find interesting subspaces and detect correlations, clusters, and outliers. We present two usage scenarios for demonstrating how SeekAView can be applied in real-world data analysis scenarios.« less
The Wang Landau parallel algorithm for the simple grids. Optimizing OpenMPI parallel implementation

NASA Astrophysics Data System (ADS)

Kussainov, A. S.

2017-12-01

The Wang Landau Monte Carlo algorithm to calculate density of states for the different simple spin lattices was implemented. The energy space was split between the individual threads and balanced according to the expected runtime for the individual processes. Custom spin clustering mechanism, necessary for overcoming of the critical slowdown in the certain energy subspaces, was devised. Stable reconstruction of the density of states was of primary importance. Some data post-processing techniques were involved to produce the expected smooth density of states.
A hyperspectral imagery anomaly detection algorithm based on local three-dimensional orthogonal subspace projection

NASA Astrophysics Data System (ADS)

Zhang, Xing; Wen, Gongjian

2015-10-01

Anomaly detection (AD) becomes increasingly important in hyperspectral imagery analysis with many practical applications. Local orthogonal subspace projection (LOSP) detector is a popular anomaly detector which exploits local endmembers/eigenvectors around the pixel under test (PUT) to construct background subspace. However, this subspace only takes advantage of the spectral information, but the spatial correlat ion of the background clutter is neglected, which leads to the anomaly detection result sensitive to the accuracy of the estimated subspace. In this paper, a local three dimensional orthogonal subspace projection (3D-LOSP) algorithm is proposed. Firstly, under the jointly use of both spectral and spatial information, three directional background subspaces are created along the image height direction, the image width direction and the spectral direction, respectively. Then, the three corresponding orthogonal subspaces are calculated. After that, each vector along three direction of the local cube is projected onto the corresponding orthogonal subspace. Finally, a composite score is given through the three direction operators. In 3D-LOSP, the anomalies are redefined as the target not only spectrally different to the background, but also spatially distinct. Thanks to the addition of the spatial information, the robustness of the anomaly detection result has been improved greatly by the proposed 3D-LOSP algorithm. It is noteworthy that the proposed algorithm is an expansion of LOSP and this ideology can inspire many other spectral-based anomaly detection methods. Experiments with real hyperspectral images have proved the stability of the detection result.
Joint fMRI analysis and subject clustering using sparse dictionary learning

NASA Astrophysics Data System (ADS)

Kim, Seung-Jun; Dontaraju, Krishna K.

2017-08-01

Multi-subject fMRI data analysis methods based on sparse dictionary learning are proposed. In addition to identifying the component spatial maps by exploiting the sparsity of the maps, clusters of the subjects are learned by postulating that the fMRI volumes admit a subspace clustering structure. Furthermore, in order to tune the associated hyper-parameters systematically, a cross-validation strategy is developed based on entry-wise sampling of the fMRI dataset. Efficient algorithms for solving the proposed constrained dictionary learning formulations are developed. Numerical tests performed on synthetic fMRI data show promising results and provides insights into the proposed technique.
Identifying Optimal Measurement Subspace for the Ensemble Kalman Filter

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zhou, Ning; Huang, Zhenyu; Welch, Greg

2012-05-24

To reduce the computational load of the ensemble Kalman filter while maintaining its efficacy, an optimization algorithm based on the generalized eigenvalue decomposition method is proposed for identifying the most informative measurement subspace. When the number of measurements is large, the proposed algorithm can be used to make an effective tradeoff between computational complexity and estimation accuracy. This algorithm also can be extended to other Kalman filters for measurement subspace selection.
General subspace learning with corrupted training data via graph embedding.

PubMed

Bao, Bing-Kun; Liu, Guangcan; Hong, Richang; Yan, Shuicheng; Xu, Changsheng

2013-11-01

We address the following subspace learning problem: supposing we are given a set of labeled, corrupted training data points, how to learn the underlying subspace, which contains three components: an intrinsic subspace that captures certain desired properties of a data set, a penalty subspace that fits the undesired properties of the data, and an error container that models the gross corruptions possibly existing in the data. Given a set of data points, these three components can be learned by solving a nuclear norm regularized optimization problem, which is convex and can be efficiently solved in polynomial time. Using the method as a tool, we propose a new discriminant analysis (i.e., supervised subspace learning) algorithm called Corruptions Tolerant Discriminant Analysis (CTDA), in which the intrinsic subspace is used to capture the features with high within-class similarity, the penalty subspace takes the role of modeling the undesired features with high between-class similarity, and the error container takes charge of fitting the possible corruptions in the data. We show that CTDA can well handle the gross corruptions possibly existing in the training data, whereas previous linear discriminant analysis algorithms arguably fail in such a setting. Extensive experiments conducted on two benchmark human face data sets and one object recognition data set show that CTDA outperforms the related algorithms.
Human Guidance Behavior Decomposition and Modeling

NASA Astrophysics Data System (ADS)

Feit, Andrew James

Trained humans are capable of high performance, adaptable, and robust first-person dynamic motion guidance behavior. This behavior is exhibited in a wide variety of activities such as driving, piloting aircraft, skiing, biking, and many others. Human performance in such activities far exceeds the current capability of autonomous systems in terms of adaptability to new tasks, real-time motion planning, robustness, and trading safety for performance. The present work investigates the structure of human dynamic motion guidance that enables these performance qualities. This work uses a first-person experimental framework that presents a driving task to the subject, measuring control inputs, vehicle motion, and operator visual gaze movement. The resulting data is decomposed into subspace segment clusters that form primitive elements of action-perception interactive behavior. Subspace clusters are defined by both agent-environment system dynamic constraints and operator control strategies. A key contribution of this work is to define transitions between subspace cluster segments, or subgoals, as points where the set of active constraints, either system or operator defined, changes. This definition provides necessary conditions to determine transition points for a given task-environment scenario that allow a solution trajectory to be planned from known behavior elements. In addition, human gaze behavior during this task contains predictive behavior elements, indicating that the identified control modes are internally modeled. Based on these ideas, a generative, autonomous guidance framework is introduced that efficiently generates optimal dynamic motion behavior in new tasks. The new subgoal planning algorithm is shown to generate solutions to certain tasks more quickly than existing approaches currently used in robotics.
Time-oriented hierarchical method for computation of principal components using subspace learning algorithm.

PubMed

Jankovic, Marko; Ogawa, Hidemitsu

2004-10-01

Principal Component Analysis (PCA) and Principal Subspace Analysis (PSA) are classic techniques in statistical data analysis, feature extraction and data compression. Given a set of multivariate measurements, PCA and PSA provide a smaller set of "basis vectors" with less redundancy, and a subspace spanned by them, respectively. Artificial neurons and neural networks have been shown to perform PSA and PCA when gradient ascent (descent) learning rules are used, which is related to the constrained maximization (minimization) of statistical objective functions. Due to their low complexity, such algorithms and their implementation in neural networks are potentially useful in cases of tracking slow changes of correlations in the input data or in updating eigenvectors with new samples. In this paper we propose PCA learning algorithm that is fully homogeneous with respect to neurons. The algorithm is obtained by modification of one of the most famous PSA learning algorithms--Subspace Learning Algorithm (SLA). Modification of the algorithm is based on Time-Oriented Hierarchical Method (TOHM). The method uses two distinct time scales. On a faster time scale PSA algorithm is responsible for the "behavior" of all output neurons. On a slower scale, output neurons will compete for fulfillment of their "own interests". On this scale, basis vectors in the principal subspace are rotated toward the principal eigenvectors. At the end of the paper it will be briefly analyzed how (or why) time-oriented hierarchical method can be used for transformation of any of the existing neural network PSA method, into PCA method.
On iterative processes in the Krylov-Sonneveld subspaces

NASA Astrophysics Data System (ADS)

Ilin, Valery P.

2016-10-01

The iterative Induced Dimension Reduction (IDR) methods are considered for solving large systems of linear algebraic equations (SLAEs) with nonsingular nonsymmetric matrices. These approaches are investigated by many authors and are charachterized sometimes as the alternative to the classical processes of Krylov type. The key moments of the IDR algorithms consist in the construction of the embedded Sonneveld subspaces, which have the decreasing dimensions and use the orthogonalization to some fixed subspace. Other independent approaches for research and optimization of the iterations are based on the augmented and modified Krylov subspaces by using the aggregation and deflation procedures with present various low rank approximations of the original matrices. The goal of this paper is to show, that IDR method in Sonneveld subspaces present an original interpretation of the modified algorithms in the Krylov subspaces. In particular, such description is given for the multi-preconditioned semi-conjugate direction methods which are actual for the parallel algebraic domain decomposition approaches.
New Parallel Algorithms for Structural Analysis and Design of Aerospace Structures

NASA Technical Reports Server (NTRS)

Nguyen, Duc T.

1998-01-01

Subspace and Lanczos iterations have been developed, well documented, and widely accepted as efficient methods for obtaining p-lowest eigen-pair solutions of large-scale, practical engineering problems. The focus of this paper is to incorporate recent developments in vectorized sparse technologies in conjunction with Subspace and Lanczos iterative algorithms for computational enhancements. Numerical performance, in terms of accuracy and efficiency of the proposed sparse strategies for Subspace and Lanczos algorithm, is demonstrated by solving for the lowest frequencies and mode shapes of structural problems on the IBM-R6000/590 and SunSparc 20 workstations.
Multiple Source DF (Direction Finding) Signal Processing: An Experimental System,

DTIC Science & Technology

The MUltiple SIgnal Characterization ( MUSIC ) algorithm is an implementation of the Signal Subspace Approach to provide parameter estimates of...the signal subspace (obtained from the received data) and the array manifold (obtained via array calibration). The MUSIC algorithm has been
Fast divide-and-conquer algorithm for evaluating polarization in classical force fields

NASA Astrophysics Data System (ADS)

Nocito, Dominique; Beran, Gregory J. O.

2017-03-01

Evaluation of the self-consistent polarization energy forms a major computational bottleneck in polarizable force fields. In large systems, the linear polarization equations are typically solved iteratively with techniques based on Jacobi iterations (JI) or preconditioned conjugate gradients (PCG). Two new variants of JI are proposed here that exploit domain decomposition to accelerate the convergence of the induced dipoles. The first, divide-and-conquer JI (DC-JI), is a block Jacobi algorithm which solves the polarization equations within non-overlapping sub-clusters of atoms directly via Cholesky decomposition, and iterates to capture interactions between sub-clusters. The second, fuzzy DC-JI, achieves further acceleration by employing overlapping blocks. Fuzzy DC-JI is analogous to an additive Schwarz method, but with distance-based weighting when averaging the fuzzy dipoles from different blocks. Key to the success of these algorithms is the use of K-means clustering to identify natural atomic sub-clusters automatically for both algorithms and to determine the appropriate weights in fuzzy DC-JI. The algorithm employs knowledge of the 3-D spatial interactions to group important elements in the 2-D polarization matrix. When coupled with direct inversion in the iterative subspace (DIIS) extrapolation, fuzzy DC-JI/DIIS in particular converges in a comparable number of iterations as PCG, but with lower computational cost per iteration. In the end, the new algorithms demonstrated here accelerate the evaluation of the polarization energy by 2-3 fold compared to existing implementations of PCG or JI/DIIS.
Subspace-based interference removal methods for a multichannel biomagnetic sensor array.

PubMed

Sekihara, Kensuke; Nagarajan, Srikantan S

2017-10-01

In biomagnetic signal processing, the theory of the signal subspace has been applied to removing interfering magnetic fields, and a representative algorithm is the signal space projection algorithm, in which the signal/interference subspace is defined in the spatial domain as the span of signal/interference-source lead field vectors. This paper extends the notion of this conventional (spatial domain) signal subspace by introducing a new definition of signal subspace in the time domain. It defines the time-domain signal subspace as the span of row vectors that contain the source time course values. This definition leads to symmetric relationships between the time-domain and the conventional (spatial-domain) signal subspaces. As a review, this article shows that the notion of the time-domain signal subspace provides useful insights over existing interference removal methods from a unified perspective. Main results and significance. Using the time-domain signal subspace, it is possible to interpret a number of interference removal methods as the time domain signal space projection. Such methods include adaptive noise canceling, sensor noise suppression, the common temporal subspace projection, the spatio-temporal signal space separation, and the recently-proposed dual signal subspace projection. Our analysis using the notion of the time domain signal space projection reveals implicit assumptions these methods rely on, and shows that the difference between these methods results only from the manner of deriving the interference subspace. Numerical examples that illustrate the results of our arguments are provided.
Subspace-based interference removal methods for a multichannel biomagnetic sensor array

NASA Astrophysics Data System (ADS)

Sekihara, Kensuke; Nagarajan, Srikantan S.

2017-10-01

Objective. In biomagnetic signal processing, the theory of the signal subspace has been applied to removing interfering magnetic fields, and a representative algorithm is the signal space projection algorithm, in which the signal/interference subspace is defined in the spatial domain as the span of signal/interference-source lead field vectors. This paper extends the notion of this conventional (spatial domain) signal subspace by introducing a new definition of signal subspace in the time domain. Approach. It defines the time-domain signal subspace as the span of row vectors that contain the source time course values. This definition leads to symmetric relationships between the time-domain and the conventional (spatial-domain) signal subspaces. As a review, this article shows that the notion of the time-domain signal subspace provides useful insights over existing interference removal methods from a unified perspective. Main results and significance. Using the time-domain signal subspace, it is possible to interpret a number of interference removal methods as the time domain signal space projection. Such methods include adaptive noise canceling, sensor noise suppression, the common temporal subspace projection, the spatio-temporal signal space separation, and the recently-proposed dual signal subspace projection. Our analysis using the notion of the time domain signal space projection reveals implicit assumptions these methods rely on, and shows that the difference between these methods results only from the manner of deriving the interference subspace. Numerical examples that illustrate the results of our arguments are provided.
Data-driven modeling and predictive control for boiler-turbine unit using fuzzy clustering and subspace methods.

PubMed

Wu, Xiao; Shen, Jiong; Li, Yiguo; Lee, Kwang Y

2014-05-01

This paper develops a novel data-driven fuzzy modeling strategy and predictive controller for boiler-turbine unit using fuzzy clustering and subspace identification (SID) methods. To deal with the nonlinear behavior of boiler-turbine unit, fuzzy clustering is used to provide an appropriate division of the operation region and develop the structure of the fuzzy model. Then by combining the input data with the corresponding fuzzy membership functions, the SID method is extended to extract the local state-space model parameters. Owing to the advantages of the both methods, the resulting fuzzy model can represent the boiler-turbine unit very closely, and a fuzzy model predictive controller is designed based on this model. As an alternative approach, a direct data-driven fuzzy predictive control is also developed following the same clustering and subspace methods, where intermediate subspace matrices developed during the identification procedure are utilized directly as the predictor. Simulation results show the advantages and effectiveness of the proposed approach. Copyright © 2014 ISA. Published by Elsevier Ltd. All rights reserved.
Multigrid techniques for nonlinear eigenvalue probems: Solutions of a nonlinear Schroedinger eigenvalue problem in 2D and 3D

NASA Technical Reports Server (NTRS)

Costiner, Sorin; Taasan, Shlomo

1994-01-01

This paper presents multigrid (MG) techniques for nonlinear eigenvalue problems (EP) and emphasizes an MG algorithm for a nonlinear Schrodinger EP. The algorithm overcomes the mentioned difficulties combining the following techniques: an MG projection coupled with backrotations for separation of solutions and treatment of difficulties related to clusters of close and equal eigenvalues; MG subspace continuation techniques for treatment of the nonlinearity; an MG simultaneous treatment of the eigenvectors at the same time with the nonlinearity and with the global constraints. The simultaneous MG techniques reduce the large number of self consistent iterations to only a few or one MG simultaneous iteration and keep the solutions in a right neighborhood where the algorithm converges fast.
Subspace Clustering via Learning an Adaptive Low-Rank Graph.

PubMed

Yin, Ming; Xie, Shengli; Wu, Zongze; Zhang, Yun; Gao, Junbin

2018-08-01

By using a sparse representation or low-rank representation of data, the graph-based subspace clustering has recently attracted considerable attention in computer vision, given its capability and efficiency in clustering data. However, the graph weights built using the representation coefficients are not the exact ones as the traditional definition is in a deterministic way. The two steps of representation and clustering are conducted in an independent manner, thus an overall optimal result cannot be guaranteed. Furthermore, it is unclear how the clustering performance will be affected by using this graph. For example, the graph parameters, i.e., the weights on edges, have to be artificially pre-specified while it is very difficult to choose the optimum. To this end, in this paper, a novel subspace clustering via learning an adaptive low-rank graph affinity matrix is proposed, where the affinity matrix and the representation coefficients are learned in a unified framework. As such, the pre-computed graph regularizer is effectively obviated and better performance can be achieved. Experimental results on several famous databases demonstrate that the proposed method performs better against the state-of-the-art approaches, in clustering.

EEG and MEG source localization using recursively applied (RAP) MUSIC

DOE Office of Scientific and Technical Information (OSTI.GOV)

Mosher, J.C.; Leahy, R.M.

1996-12-31

The multiple signal characterization (MUSIC) algorithm locates multiple asynchronous dipolar sources from electroencephalography (EEG) and magnetoencephalography (MEG) data. A signal subspace is estimated from the data, then the algorithm scans a single dipole model through a three-dimensional head volume and computes projections onto this subspace. To locate the sources, the user must search the head volume for local peaks in the projection metric. Here we describe a novel extension of this approach which we refer to as RAP (Recursively APplied) MUSIC. This new procedure automatically extracts the locations of the sources through a recursive use of subspace projections, which usesmore » the metric of principal correlations as a multidimensional form of correlation analysis between the model subspace and the data subspace. The dipolar orientations, a form of `diverse polarization,` are easily extracted using the associated principal vectors.« less
Limited Memory Block Krylov Subspace Optimization for Computing Dominant Singular Value Decompositions

DTIC Science & Technology

2012-03-22

with performance profiles, Math. Program., 91 (2002), pp. 201–213. [6] P. DRINEAS, R. KANNAN, AND M. W. MAHONEY , Fast Monte Carlo algorithms for matrices...computing invariant subspaces of non-Hermitian matri- ces, Numer. Math., 25 ( 1975 /76), pp. 123–136. [25] , Matrix algorithms Vol. II: Eigensystems
Application of higher order SVD to vibration-based system identification and damage detection

NASA Astrophysics Data System (ADS)

Chao, Shu-Hsien; Loh, Chin-Hsiung; Weng, Jian-Huang

2012-04-01

Singular value decomposition (SVD) is a powerful linear algebra tool. It is widely used in many different signal processing methods, such principal component analysis (PCA), singular spectrum analysis (SSA), frequency domain decomposition (FDD), subspace identification and stochastic subspace identification method ( SI and SSI ). In each case, the data is arranged appropriately in matrix form and SVD is used to extract the feature of the data set. In this study three different algorithms on signal processing and system identification are proposed: SSA, SSI-COV and SSI-DATA. Based on the extracted subspace and null-space from SVD of data matrix, damage detection algorithms can be developed. The proposed algorithm is used to process the shaking table test data of the 6-story steel frame. Features contained in the vibration data are extracted by the proposed method. Damage detection can then be investigated from the test data of the frame structure through subspace-based and nullspace-based damage indices.
An efficient linear-scaling CCSD(T) method based on local natural orbitals.

PubMed

Rolik, Zoltán; Szegedy, Lóránt; Ladjánszki, István; Ladóczki, Bence; Kállay, Mihály

2013-09-07

An improved version of our general-order local coupled-cluster (CC) approach [Z. Rolik and M. Kállay, J. Chem. Phys. 135, 104111 (2011)] and its efficient implementation at the CC singles and doubles with perturbative triples [CCSD(T)] level is presented. The method combines the cluster-in-molecule approach of Li and co-workers [J. Chem. Phys. 131, 114109 (2009)] with frozen natural orbital (NO) techniques. To break down the unfavorable fifth-power scaling of our original approach a two-level domain construction algorithm has been developed. First, an extended domain of localized molecular orbitals (LMOs) is assembled based on the spatial distance of the orbitals. The necessary integrals are evaluated and transformed in these domains invoking the density fitting approximation. In the second step, for each occupied LMO of the extended domain a local subspace of occupied and virtual orbitals is constructed including approximate second-order Mo̸ller-Plesset NOs. The CC equations are solved and the perturbative corrections are calculated in the local subspace for each occupied LMO using a highly-efficient CCSD(T) code, which was optimized for the typical sizes of the local subspaces. The total correlation energy is evaluated as the sum of the individual contributions. The computation time of our approach scales linearly with the system size, while its memory and disk space requirements are independent thereof. Test calculations demonstrate that currently our method is one of the most efficient local CCSD(T) approaches and can be routinely applied to molecules of up to 100 atoms with reasonable basis sets.
Subarray Processing for Projection-based RFI Mitigation in Radio Astronomical Interferometers

NASA Astrophysics Data System (ADS)

Burnett, Mitchell C.; Jeffs, Brian D.; Black, Richard A.; Warnick, Karl F.

2018-04-01

Radio Frequency Interference (RFI) is a major problem for observations in Radio Astronomy (RA). Adaptive spatial filtering techniques such as subspace projection are promising candidates for RFI mitigation; however, for radio interferometric imaging arrays, these have primarily been used in engineering demonstration experiments rather than mainstream scientific observations. This paper considers one reason that adoption of such algorithms is limited: RFI decorrelates across the interferometric array because of long baseline lengths. This occurs when the relative RFI time delay along a baseline is large compared to the frequency channel inverse bandwidth used in the processing chain. Maximum achievable excision of the RFI is limited by covariance matrix estimation error when identifying interference subspace parameters, and decorrelation of the RFI introduces errors that corrupt the subspace estimate, rendering subspace projection ineffective over the entire array. In this work, we present an algorithm that overcomes this challenge of decorrelation by applying subspace projection via subarray processing (SP-SAP). Each subarray is designed to have a set of elements with high mutual correlation in the interferer for better estimation of subspace parameters. In an RFI simulation scenario for the proposed ngVLA interferometric imaging array with 15 kHz channel bandwidth for correlator processing, we show that compared to the former approach of applying subspace projection on the full array, SP-SAP improves mitigation of the RFI on the order of 9 dB. An example of improved image synthesis and reduced RFI artifacts for a simulated image “phantom” using the SP-SAP algorithm is presented.
Mode Shape Estimation Algorithms Under Ambient Conditions: A Comparative Review

DOE Office of Scientific and Technical Information (OSTI.GOV)

Dosiek, Luke; Zhou, Ning; Pierre, John W.

Abstract—This paper provides a comparative review of five existing ambient electromechanical mode shape estimation algorithms, i.e., the Transfer Function (TF), Spectral, Frequency Domain Decomposition (FDD), Channel Matching, and Subspace Methods. It is also shown that the TF Method is a general approach to estimating mode shape and that the Spectral, FDD, and Channel Matching Methods are actually special cases of it. Additionally, some of the variations of the Subspace Method are reviewed and the Numerical algorithm for Subspace State Space System IDentification (N4SID) is implemented. The five algorithms are then compared using data simulated from a 17-machine model of themore » Western Electricity Coordinating Council (WECC) under ambient conditions with both low and high damping, as well as during the case where ambient data is disrupted by an oscillatory ringdown. The performance of the algorithms is compared using the statistics from Monte Carlo Simulations and results from measured WECC data, and a discussion of the practical issues surrounding their implementation, including cases where power system probing is an option, is provided. The paper concludes with some recommendations as to the appropriate use of the various techniques. Index Terms—Electromechanical mode shape, small-signal stability, phasor measurement units (PMU), system identification, N4SID, subspace.« less
Target detection using the background model from the topological anomaly detection algorithm

NASA Astrophysics Data System (ADS)

Dorado Munoz, Leidy P.; Messinger, David W.; Ziemann, Amanda K.

2013-05-01

The Topological Anomaly Detection (TAD) algorithm has been used as an anomaly detector in hyperspectral and multispectral images. TAD is an algorithm based on graph theory that constructs a topological model of the background in a scene, and computes an anomalousness ranking for all of the pixels in the image with respect to the background in order to identify pixels with uncommon or strange spectral signatures. The pixels that are modeled as background are clustered into groups or connected components, which could be representative of spectral signatures of materials present in the background. Therefore, the idea of using the background components given by TAD in target detection is explored in this paper. In this way, these connected components are characterized in three different approaches, where the mean signature and endmembers for each component are calculated and used as background basis vectors in Orthogonal Subspace Projection (OSP) and Adaptive Subspace Detector (ASD). Likewise, the covariance matrix of those connected components is estimated and used in detectors: Constrained Energy Minimization (CEM) and Adaptive Coherence Estimator (ACE). The performance of these approaches and the different detectors is compared with a global approach, where the background characterization is derived directly from the image. Experiments and results using self-test data set provided as part of the RIT blind test target detection project are shown.
Manifold learning-based subspace distance for machinery damage assessment

NASA Astrophysics Data System (ADS)

Sun, Chuang; Zhang, Zhousuo; He, Zhengjia; Shen, Zhongjie; Chen, Binqiang

2016-03-01

Damage assessment is very meaningful to keep safety and reliability of machinery components, and vibration analysis is an effective way to carry out the damage assessment. In this paper, a damage index is designed by performing manifold distance analysis on vibration signal. To calculate the index, vibration signal is collected firstly, and feature extraction is carried out to obtain statistical features that can capture signal characteristics comprehensively. Then, manifold learning algorithm is utilized to decompose feature matrix to be a subspace, that is, manifold subspace. The manifold learning algorithm seeks to keep local relationship of the feature matrix, which is more meaningful for damage assessment. Finally, Grassmann distance between manifold subspaces is defined as a damage index. The Grassmann distance reflecting manifold structure is a suitable metric to measure distance between subspaces in the manifold. The defined damage index is applied to damage assessment of a rotor and the bearing, and the result validates its effectiveness for damage assessment of machinery component.
Globally convergent techniques in nonlinear Newton-Krylov

NASA Technical Reports Server (NTRS)

Brown, Peter N.; Saad, Youcef

1989-01-01

Some convergence theory is presented for nonlinear Krylov subspace methods. The basic idea of these methods is to use variants of Newton's iteration in conjunction with a Krylov subspace method for solving the Jacobian linear systems. These methods are variants of inexact Newton methods where the approximate Newton direction is taken from a subspace of small dimensions. The main focus is to analyze these methods when they are combined with global strategies such as linesearch techniques and model trust region algorithms. Most of the convergence results are formulated for projection onto general subspaces rather than just Krylov subspaces.
Anomaly Detection in Moving-Camera Video Sequences Using Principal Subspace Analysis

DOE PAGES

Thomaz, Lucas A.; Jardim, Eric; da Silva, Allan F.; ...

2017-10-16

This study presents a family of algorithms based on sparse decompositions that detect anomalies in video sequences obtained from slow moving cameras. These algorithms start by computing the union of subspaces that best represents all the frames from a reference (anomaly free) video as a low-rank projection plus a sparse residue. Then, they perform a low-rank representation of a target (possibly anomalous) video by taking advantage of both the union of subspaces and the sparse residue computed from the reference video. Such algorithms provide good detection results while at the same time obviating the need for previous video synchronization. However,more » this is obtained at the cost of a large computational complexity, which hinders their applicability. Another contribution of this paper approaches this problem by using intrinsic properties of the obtained data representation in order to restrict the search space to the most relevant subspaces, providing computational complexity gains of up to two orders of magnitude. The developed algorithms are shown to cope well with videos acquired in challenging scenarios, as verified by the analysis of 59 videos from the VDAO database that comprises videos with abandoned objects in a cluttered industrial scenario.« less
Anomaly Detection in Moving-Camera Video Sequences Using Principal Subspace Analysis

DOE Office of Scientific and Technical Information (OSTI.GOV)

Thomaz, Lucas A.; Jardim, Eric; da Silva, Allan F.

This study presents a family of algorithms based on sparse decompositions that detect anomalies in video sequences obtained from slow moving cameras. These algorithms start by computing the union of subspaces that best represents all the frames from a reference (anomaly free) video as a low-rank projection plus a sparse residue. Then, they perform a low-rank representation of a target (possibly anomalous) video by taking advantage of both the union of subspaces and the sparse residue computed from the reference video. Such algorithms provide good detection results while at the same time obviating the need for previous video synchronization. However,more » this is obtained at the cost of a large computational complexity, which hinders their applicability. Another contribution of this paper approaches this problem by using intrinsic properties of the obtained data representation in order to restrict the search space to the most relevant subspaces, providing computational complexity gains of up to two orders of magnitude. The developed algorithms are shown to cope well with videos acquired in challenging scenarios, as verified by the analysis of 59 videos from the VDAO database that comprises videos with abandoned objects in a cluttered industrial scenario.« less
Adaptive bearing estimation and tracking of multiple targets in a realistic passive sonar scenario

NASA Astrophysics Data System (ADS)

Rajagopal, R.; Challa, Subhash; Faruqi, Farhan A.; Rao, P. R.

1997-06-01

In a realistic passive sonar environment, the received signal consists of multipath arrivals from closely separated moving targets. The signals are contaminated by spatially correlated noise. The differential MUSIC has been proposed to estimate the DOAs in such a scenario. This method estimates the 'noise subspace' in order to estimate the DOAs. However, the 'noise subspace' estimate has to be updated as and when new data become available. In order to save the computational costs, a new adaptive noise subspace estimation algorithm is proposed in this paper. The salient features of the proposed algorithm are: (1) Noise subspace estimation is done by QR decomposition of the difference matrix which is formed from the data covariance matrix. Thus, as compared to standard eigen-decomposition based methods which require O(N3) computations, the proposed method requires only O(N2) computations. (2) Noise subspace is updated by updating the QR decomposition. (3) The proposed algorithm works in a realistic sonar environment. In the second part of the paper, the estimated bearing values are used to track multiple targets. In order to achieve this, the nonlinear system/linear measurement extended Kalman filtering proposed is applied. Computer simulation results are also presented to support the theory.
Fast dimension reduction and integrative clustering of multi-omics data using low-rank approximation: application to cancer molecular classification.

PubMed

Wu, Dingming; Wang, Dongfang; Zhang, Michael Q; Gu, Jin

2015-12-01

One major goal of large-scale cancer omics study is to identify molecular subtypes for more accurate cancer diagnoses and treatments. To deal with high-dimensional cancer multi-omics data, a promising strategy is to find an effective low-dimensional subspace of the original data and then cluster cancer samples in the reduced subspace. However, due to data-type diversity and big data volume, few methods can integrative and efficiently find the principal low-dimensional manifold of the high-dimensional cancer multi-omics data. In this study, we proposed a novel low-rank approximation based integrative probabilistic model to fast find the shared principal subspace across multiple data types: the convexity of the low-rank regularized likelihood function of the probabilistic model ensures efficient and stable model fitting. Candidate molecular subtypes can be identified by unsupervised clustering hundreds of cancer samples in the reduced low-dimensional subspace. On testing datasets, our method LRAcluster (low-rank approximation based multi-omics data clustering) runs much faster with better clustering performances than the existing method. Then, we applied LRAcluster on large-scale cancer multi-omics data from TCGA. The pan-cancer analysis results show that the cancers of different tissue origins are generally grouped as independent clusters, except squamous-like carcinomas. While the single cancer type analysis suggests that the omics data have different subtyping abilities for different cancer types. LRAcluster is a very useful method for fast dimension reduction and unsupervised clustering of large-scale multi-omics data. LRAcluster is implemented in R and freely available via http://bioinfo.au.tsinghua.edu.cn/software/lracluster/ .
Slope angle estimation method based on sparse subspace clustering for probe safe landing

NASA Astrophysics Data System (ADS)

Li, Haibo; Cao, Yunfeng; Ding, Meng; Zhuang, Likui

2018-06-01

To avoid planetary probes landing on steep slopes where they may slip or tip over, a new method of slope angle estimation based on sparse subspace clustering is proposed to improve accuracy. First, a coordinate system is defined and established to describe the measured data of light detection and ranging (LIDAR). Second, this data is processed and expressed with a sparse representation. Third, on this basis, the data is made to cluster to determine which subspace it belongs to. Fourth, eliminating outliers in subspace, the correct data points are used for the fitting planes. Finally, the vectors normal to the planes are obtained using the plane model, and the angle between the normal vectors is obtained through calculation. Based on the geometric relationship, this angle is equal in value to the slope angle. The proposed method was tested in a series of experiments. The experimental results show that this method can effectively estimate the slope angle, can overcome the influence of noise and obtain an exact slope angle. Compared with other methods, this method can minimize the measuring errors and further improve the estimation accuracy of the slope angle.
A General Algorithm for Reusing Krylov Subspace Information. I. Unsteady Navier-Stokes

NASA Technical Reports Server (NTRS)

Carpenter, Mark H.; Vuik, C.; Lucas, Peter; vanGijzen, Martin; Bijl, Hester

2010-01-01

A general algorithm is developed that reuses available information to accelerate the iterative convergence of linear systems with multiple right-hand sides A x = b (sup i), which are commonly encountered in steady or unsteady simulations of nonlinear equations. The algorithm is based on the classical GMRES algorithm with eigenvector enrichment but also includes a Galerkin projection preprocessing step and several novel Krylov subspace reuse strategies. The new approach is applied to a set of test problems, including an unsteady turbulent airfoil, and is shown in some cases to provide significant improvement in computational efficiency relative to baseline approaches.
Subspace-Aware Index Codes

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kailkhura, Bhavya; Theagarajan, Lakshmi Narasimhan; Varshney, Pramod K.

In this paper, we generalize the well-known index coding problem to exploit the structure in the source-data to improve system throughput. In many applications (e.g., multimedia), the data to be transmitted may lie (or can be well approximated) in a low-dimensional subspace. We exploit this low-dimensional structure of the data using an algebraic framework to solve the index coding problem (referred to as subspace-aware index coding) as opposed to the traditional index coding problem which is subspace-unaware. Also, we propose an efficient algorithm based on the alternating minimization approach to obtain near optimal index codes for both subspace-aware and -unawaremore » cases. In conclusion, our simulations indicate that under certain conditions, a significant throughput gain (about 90%) can be achieved by subspace-aware index codes over conventional subspace-unaware index codes.« less
Subspace-Aware Index Codes

DOE PAGES

Kailkhura, Bhavya; Theagarajan, Lakshmi Narasimhan; Varshney, Pramod K.

2017-04-12

In this paper, we generalize the well-known index coding problem to exploit the structure in the source-data to improve system throughput. In many applications (e.g., multimedia), the data to be transmitted may lie (or can be well approximated) in a low-dimensional subspace. We exploit this low-dimensional structure of the data using an algebraic framework to solve the index coding problem (referred to as subspace-aware index coding) as opposed to the traditional index coding problem which is subspace-unaware. Also, we propose an efficient algorithm based on the alternating minimization approach to obtain near optimal index codes for both subspace-aware and -unawaremore » cases. In conclusion, our simulations indicate that under certain conditions, a significant throughput gain (about 90%) can be achieved by subspace-aware index codes over conventional subspace-unaware index codes.« less
A Poisson nonnegative matrix factorization method with parameter subspace clustering constraint for endmember extraction in hyperspectral imagery

NASA Astrophysics Data System (ADS)

Sun, Weiwei; Ma, Jun; Yang, Gang; Du, Bo; Zhang, Liangpei

2017-06-01

A new Bayesian method named Poisson Nonnegative Matrix Factorization with Parameter Subspace Clustering Constraint (PNMF-PSCC) has been presented to extract endmembers from Hyperspectral Imagery (HSI). First, the method integrates the liner spectral mixture model with the Bayesian framework and it formulates endmember extraction into a Bayesian inference problem. Second, the Parameter Subspace Clustering Constraint (PSCC) is incorporated into the statistical program to consider the clustering of all pixels in the parameter subspace. The PSCC could enlarge differences among ground objects and helps finding endmembers with smaller spectrum divergences. Meanwhile, the PNMF-PSCC method utilizes the Poisson distribution as the prior knowledge of spectral signals to better explain the quantum nature of light in imaging spectrometer. Third, the optimization problem of PNMF-PSCC is formulated into maximizing the joint density via the Maximum A Posterior (MAP) estimator. The program is finally solved by iteratively optimizing two sub-problems via the Alternating Direction Method of Multipliers (ADMM) framework and the FURTHESTSUM initialization scheme. Five state-of-the art methods are implemented to make comparisons with the performance of PNMF-PSCC on both the synthetic and real HSI datasets. Experimental results show that the PNMF-PSCC outperforms all the five methods in Spectral Angle Distance (SAD) and Root-Mean-Square-Error (RMSE), and especially it could identify good endmembers for ground objects with smaller spectrum divergences.
Cluster state generation in one-dimensional Kitaev honeycomb model via shortcut to adiabaticity

NASA Astrophysics Data System (ADS)

Kyaw, Thi Ha; Kwek, Leong-Chuan

2018-04-01

We propose a mean to obtain computationally useful resource states also known as cluster states, for measurement-based quantum computation, via transitionless quantum driving algorithm. The idea is to cool the system to its unique ground state and tune some control parameters to arrive at computationally useful resource state, which is in one of the degenerate ground states. Even though there is set of conserved quantities already present in the model Hamiltonian, which prevents the instantaneous state to go to any other eigenstate subspaces, one cannot quench the control parameters to get the desired state. In that case, the state will not evolve. With involvement of the shortcut Hamiltonian, we obtain cluster states in fast-forward manner. We elaborate our proposal in the one-dimensional Kitaev honeycomb model, and show that the auxiliary Hamiltonian needed for the counterdiabatic driving is of M-body interaction.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Mosher, J.C.; Leahy, R.M.

A new method for source localization is described that is based on a modification of the well known multiple signal classification (MUSIC) algorithm. In classical MUSIC, the array manifold vector is projected onto an estimate of the signal subspace, but errors in the estimate can make location of multiple sources difficult. Recursively applied and projected (RAP) MUSIC uses each successively located source to form an intermediate array gain matrix, and projects both the array manifold and the signal subspace estimate into its orthogonal complement. The MUSIC projection is then performed in this reduced subspace. Using the metric of principal angles,more » the authors describe a general form of the RAP-MUSIC algorithm for the case of diversely polarized sources. Through a uniform linear array simulation, the authors demonstrate the improved Monte Carlo performance of RAP-MUSIC relative to MUSIC and two other sequential subspace methods, S and IES-MUSIC.« less

ASCS online fault detection and isolation based on an improved MPCA

NASA Astrophysics Data System (ADS)

Peng, Jianxin; Liu, Haiou; Hu, Yuhui; Xi, Junqiang; Chen, Huiyan

2014-09-01

Multi-way principal component analysis (MPCA) has received considerable attention and been widely used in process monitoring. A traditional MPCA algorithm unfolds multiple batches of historical data into a two-dimensional matrix and cut the matrix along the time axis to form subspaces. However, low efficiency of subspaces and difficult fault isolation are the common disadvantages for the principal component model. This paper presents a new subspace construction method based on kernel density estimation function that can effectively reduce the storage amount of the subspace information. The MPCA model and the knowledge base are built based on the new subspace. Then, fault detection and isolation with the squared prediction error (SPE) statistic and the Hotelling ( T 2) statistic are also realized in process monitoring. When a fault occurs, fault isolation based on the SPE statistic is achieved by residual contribution analysis of different variables. For fault isolation of subspace based on the T 2 statistic, the relationship between the statistic indicator and state variables is constructed, and the constraint conditions are presented to check the validity of fault isolation. Then, to improve the robustness of fault isolation to unexpected disturbances, the statistic method is adopted to set the relation between single subspace and multiple subspaces to increase the corrective rate of fault isolation. Finally fault detection and isolation based on the improved MPCA is used to monitor the automatic shift control system (ASCS) to prove the correctness and effectiveness of the algorithm. The research proposes a new subspace construction method to reduce the required storage capacity and to prove the robustness of the principal component model, and sets the relationship between the state variables and fault detection indicators for fault isolation.
SIAM Conference on Parallel Processing for Scientific Computing, 4th, Chicago, IL, Dec. 11-13, 1989, Proceedings

NASA Technical Reports Server (NTRS)

Dongarra, Jack (Editor); Messina, Paul (Editor); Sorensen, Danny C. (Editor); Voigt, Robert G. (Editor)

1990-01-01

Attention is given to such topics as an evaluation of block algorithm variants in LAPACK and presents a large-grain parallel sparse system solver, a multiprocessor method for the solution of the generalized Eigenvalue problem on an interval, and a parallel QR algorithm for iterative subspace methods on the CM2. A discussion of numerical methods includes the topics of asynchronous numerical solutions of PDEs on parallel computers, parallel homotopy curve tracking on a hypercube, and solving Navier-Stokes equations on the Cedar Multi-Cluster system. A section on differential equations includes a discussion of a six-color procedure for the parallel solution of elliptic systems using the finite quadtree structure, data parallel algorithms for the finite element method, and domain decomposition methods in aerodynamics. Topics dealing with massively parallel computing include hypercube vs. 2-dimensional meshes and massively parallel computation of conservation laws. Performance and tools are also discussed.
An Eccentricity Based Data Routing Protocol with Uniform Node Distribution in 3D WSN.

PubMed

Hosen, A S M Sanwar; Cho, Gi Hwan; Ra, In-Ho

2017-09-16

Due to nonuniform node distribution, the energy consumption of nodes are imbalanced in clustering-based wireless sensor networks (WSNs). It might have more impact when nodes are deployed in a three-dimensional (3D) environment. In this regard, we propose the eccentricity based data routing (EDR) protocol in a 3D WSN with uniform node distribution. It includes network partitions called 3D subspaces/clusters of equal member nodes, an energy-efficient routing centroid (RC) nodes election and data routing algorithm. The RC nodes election conducts in a quasi-static nature until a certain period unlike the periodic cluster heads election of typical clustering-based routing. It not only reduces the energy consumption of nodes during the election phase, but also in intra-communication. At the same time, the routing algorithm selects a forwarding node in such a way that balances the energy consumption among RC nodes and reduces the number of hops towards the sink. The simulation results validate and ensure the performance supremacy of the EDR protocol compared to existing protocols in terms of various metrics such as steady state and network lifetime in particular. Meanwhile, the results show the EDR is more robust in uniform node distribution compared to nonuniform.
An Eccentricity Based Data Routing Protocol with Uniform Node Distribution in 3D WSN

PubMed Central

Hosen, A. S. M. Sanwar; Cho, Gi Hwan; Ra, In-Ho

2017-01-01

Due to nonuniform node distribution, the energy consumption of nodes are imbalanced in clustering-based wireless sensor networks (WSNs). It might have more impact when nodes are deployed in a three-dimensional (3D) environment. In this regard, we propose the eccentricity based data routing (EDR) protocol in a 3D WSN with uniform node distribution. It includes network partitions called 3D subspaces/clusters of equal member nodes, an energy-efficient routing centroid (RC) nodes election and data routing algorithm. The RC nodes election conducts in a quasi-static nature until a certain period unlike the periodic cluster heads election of typical clustering-based routing. It not only reduces the energy consumption of nodes during the election phase, but also in intra-communication. At the same time, the routing algorithm selects a forwarding node in such a way that balances the energy consumption among RC nodes and reduces the number of hops towards the sink. The simulation results validate and ensure the performance supremacy of the EDR protocol compared to existing protocols in terms of various metrics such as steady state and network lifetime in particular. Meanwhile, the results show the EDR is more robust in uniform node distribution compared to nonuniform. PMID:28926958
Visual tracking based on the sparse representation of the PCA subspace

NASA Astrophysics Data System (ADS)

Chen, Dian-bing; Zhu, Ming; Wang, Hui-li

2017-09-01

We construct a collaborative model of the sparse representation and the subspace representation. First, we represent the tracking target in the principle component analysis (PCA) subspace, and then we employ an L 1 regularization to restrict the sparsity of the residual term, an L 2 regularization term to restrict the sparsity of the representation coefficients, and an L 2 norm to restrict the distance between the reconstruction and the target. Then we implement the algorithm in the particle filter framework. Furthermore, an iterative method is presented to get the global minimum of the residual and the coefficients. Finally, an alternative template update scheme is adopted to avoid the tracking drift which is caused by the inaccurate update. In the experiment, we test the algorithm on 9 sequences, and compare the results with 5 state-of-art methods. According to the results, we can conclude that our algorithm is more robust than the other methods.
Improving M-SBL for Joint Sparse Recovery Using a Subspace Penalty

NASA Astrophysics Data System (ADS)

Ye, Jong Chul; Kim, Jong Min; Bresler, Yoram

2015-12-01

The multiple measurement vector problem (MMV) is a generalization of the compressed sensing problem that addresses the recovery of a set of jointly sparse signal vectors. One of the important contributions of this paper is to reveal that the seemingly least related state-of-art MMV joint sparse recovery algorithms - M-SBL (multiple sparse Bayesian learning) and subspace-based hybrid greedy algorithms - have a very important link. More specifically, we show that replacing the $\\log\\det(\\cdot)$ term in M-SBL by a rank proxy that exploits the spark reduction property discovered in subspace-based joint sparse recovery algorithms, provides significant improvements. In particular, if we use the Schatten-$p$ quasi-norm as the corresponding rank proxy, the global minimiser of the proposed algorithm becomes identical to the true solution as $p \\rightarrow 0$. Furthermore, under the same regularity conditions, we show that the convergence to a local minimiser is guaranteed using an alternating minimization algorithm that has closed form expressions for each of the minimization steps, which are convex. Numerical simulations under a variety of scenarios in terms of SNR, and condition number of the signal amplitude matrix demonstrate that the proposed algorithm consistently outperforms M-SBL and other state-of-the art algorithms.
Primary decomposition of zero-dimensional ideals over finite fields

NASA Astrophysics Data System (ADS)

Gao, Shuhong; Wan, Daqing; Wang, Mingsheng

2009-03-01

A new algorithm is presented for computing primary decomposition of zero-dimensional ideals over finite fields. Like Berlekamp's algorithm for univariate polynomials, the new method is based on the invariant subspace of the Frobenius map acting on the quotient algebra. The dimension of the invariant subspace equals the number of primary components, and a basis of the invariant subspace yields a complete decomposition. Unlike previous approaches for decomposing multivariate polynomial systems, the new method does not need primality testing nor any generic projection, instead it reduces the general decomposition problem directly to root finding of univariate polynomials over the ground field. Also, it is shown how Groebner basis structure can be used to get partial primary decomposition without any root finding.
Krylov subspace methods - Theory, algorithms, and applications

NASA Technical Reports Server (NTRS)

Sad, Youcef

1990-01-01

Projection methods based on Krylov subspaces for solving various types of scientific problems are reviewed. The main idea of this class of methods when applied to a linear system Ax = b, is to generate in some manner an approximate solution to the original problem from the so-called Krylov subspace span. Thus, the original problem of size N is approximated by one of dimension m, typically much smaller than N. Krylov subspace methods have been very successful in solving linear systems and eigenvalue problems and are now becoming popular for solving nonlinear equations. The main ideas in Krylov subspace methods are shown and their use in solving linear systems, eigenvalue problems, parabolic partial differential equations, Liapunov matrix equations, and nonlinear system of equations are discussed.
Computing interior eigenvalues of nonsymmetric matrices: application to three-dimensional metamaterial composites.

PubMed

Terao, Takamichi

2010-08-01

We propose a numerical method to calculate interior eigenvalues and corresponding eigenvectors for nonsymmetric matrices. Based on the subspace projection technique onto expanded Ritz subspace, it becomes possible to obtain eigenvalues and eigenvectors with sufficiently high precision. This method overcomes the difficulties of the traditional nonsymmetric Lanczos algorithm, and improves the accuracy of the obtained interior eigenvalues and eigenvectors. Using this algorithm, we investigate three-dimensional metamaterial composites consisting of positive and negative refractive index materials, and it is demonstrated that the finite-difference frequency-domain algorithm is applicable to analyze these metamaterial composites.
Single and multiple object tracking using log-euclidean Riemannian subspace and block-division appearance model.

PubMed

Hu, Weiming; Li, Xi; Luo, Wenhan; Zhang, Xiaoqin; Maybank, Stephen; Zhang, Zhongfei

2012-12-01

Object appearance modeling is crucial for tracking objects, especially in videos captured by nonstationary cameras and for reasoning about occlusions between multiple moving objects. Based on the log-euclidean Riemannian metric on symmetric positive definite matrices, we propose an incremental log-euclidean Riemannian subspace learning algorithm in which covariance matrices of image features are mapped into a vector space with the log-euclidean Riemannian metric. Based on the subspace learning algorithm, we develop a log-euclidean block-division appearance model which captures both the global and local spatial layout information about object appearances. Single object tracking and multi-object tracking with occlusion reasoning are then achieved by particle filtering-based Bayesian state inference. During tracking, incremental updating of the log-euclidean block-division appearance model captures changes in object appearance. For multi-object tracking, the appearance models of the objects can be updated even in the presence of occlusions. Experimental results demonstrate that the proposed tracking algorithm obtains more accurate results than six state-of-the-art tracking algorithms.
Modulated Hebb-Oja learning rule--a method for principal subspace analysis.

PubMed

Jankovic, Marko V; Ogawa, Hidemitsu

2006-03-01

This paper presents analysis of the recently proposed modulated Hebb-Oja (MHO) method that performs linear mapping to a lower-dimensional subspace. Principal component subspace is the method that will be analyzed. Comparing to some other well-known methods for yielding principal component subspace (e.g., Oja's Subspace Learning Algorithm), the proposed method has one feature that could be seen as desirable from the biological point of view--synaptic efficacy learning rule does not need the explicit information about the value of the other efficacies to make individual efficacy modification. Also, the simplicity of the "neural circuits" that perform global computations and a fact that their number does not depend on the number of input and output neurons, could be seen as good features of the proposed method.
Integrated feature extraction and selection for neuroimage classification

NASA Astrophysics Data System (ADS)

Fan, Yong; Shen, Dinggang

2009-02-01

Feature extraction and selection are of great importance in neuroimage classification for identifying informative features and reducing feature dimensionality, which are generally implemented as two separate steps. This paper presents an integrated feature extraction and selection algorithm with two iterative steps: constrained subspace learning based feature extraction and support vector machine (SVM) based feature selection. The subspace learning based feature extraction focuses on the brain regions with higher possibility of being affected by the disease under study, while the possibility of brain regions being affected by disease is estimated by the SVM based feature selection, in conjunction with SVM classification. This algorithm can not only take into account the inter-correlation among different brain regions, but also overcome the limitation of traditional subspace learning based feature extraction methods. To achieve robust performance and optimal selection of parameters involved in feature extraction, selection, and classification, a bootstrapping strategy is used to generate multiple versions of training and testing sets for parameter optimization, according to the classification performance measured by the area under the ROC (receiver operating characteristic) curve. The integrated feature extraction and selection method is applied to a structural MR image based Alzheimer's disease (AD) study with 98 non-demented and 100 demented subjects. Cross-validation results indicate that the proposed algorithm can improve performance of the traditional subspace learning based classification.
Subspace-based optimization method for inverse scattering problems with an inhomogeneous background medium

NASA Astrophysics Data System (ADS)

Chen, Xudong

2010-07-01

This paper proposes a version of the subspace-based optimization method to solve the inverse scattering problem with an inhomogeneous background medium where the known inhomogeneities are bounded in a finite domain. Although the background Green's function at each discrete point in the computational domain is not directly available in an inhomogeneous background scenario, the paper uses the finite element method to simultaneously obtain the Green's function at all discrete points. The essence of the subspace-based optimization method is that part of the contrast source is determined from the spectrum analysis without using any optimization, whereas the orthogonally complementary part is determined by solving a lower dimension optimization problem. This feature significantly speeds up the convergence of the algorithm and at the same time makes it robust against noise. Numerical simulations illustrate the efficacy of the proposed algorithm. The algorithm presented in this paper finds wide applications in nondestructive evaluation, such as through-wall imaging.
Application of Earthquake Subspace Detectors at Kilauea and Mauna Loa Volcanoes, Hawai`i

NASA Astrophysics Data System (ADS)

Okubo, P.; Benz, H.; Yeck, W.

2016-12-01

Recent studies have demonstrated the capabilities of earthquake subspace detectors for detailed cataloging and tracking of seismicity in a number of regions and settings. We are exploring the application of subspace detectors at the United States Geological Survey's Hawaiian Volcano Observatory (HVO) to analyze seismicity at Kilauea and Mauna Loa volcanoes. Elevated levels of microseismicity and occasional swarms of earthquakes associated with active volcanism here present cataloging challenges due the sheer numbers of earthquakes and an intrinsically low signal-to-noise environment featuring oceanic microseism and volcanic tremor in the ambient seismic background. With high-quality continuous recording of seismic data at HVO, we apply subspace detectors (Harris and Dodge, 2011, Bull. Seismol. Soc. Am., doi: 10.1785/0120100103) during intervals of noteworthy seismicity. Waveform templates are drawn from Magnitude 2 and larger earthquakes within clusters of earthquakes cataloged in the HVO seismic database. At Kilauea, we focus on seismic swarms in the summit caldera region where, despite continuing eruptions from vents in the summit region and in the east rift zone, geodetic measurements reflect a relatively inflated volcanic state. We also focus on seismicity beneath and adjacent to Mauna Loa's summit caldera that appears to be associated with geodetic expressions of gradual volcanic inflation, and where precursory seismicity clustered prior to both Mauna Loa's most recent eruptions in 1975 and 1984. We recover several times more earthquakes with the subspace detectors - down to roughly 2 magnitude units below the templates, based on relative amplitudes - compared to the numbers of cataloged earthquakes. The increased numbers of detected earthquakes in these clusters, and the ability to associate and locate them, allow us to infer details of the spatial and temporal distributions and possible variations in stresses within these key regions of the volcanoes.
Riemannian multi-manifold modeling and clustering in brain networks

NASA Astrophysics Data System (ADS)

Slavakis, Konstantinos; Salsabilian, Shiva; Wack, David S.; Muldoon, Sarah F.; Baidoo-Williams, Henry E.; Vettel, Jean M.; Cieslak, Matthew; Grafton, Scott T.

2017-08-01

This paper introduces Riemannian multi-manifold modeling in the context of brain-network analytics: Brainnetwork time-series yield features which are modeled as points lying in or close to a union of a finite number of submanifolds within a known Riemannian manifold. Distinguishing disparate time series amounts thus to clustering multiple Riemannian submanifolds. To this end, two feature-generation schemes for brain-network time series are put forth. The first one is motivated by Granger-causality arguments and uses an auto-regressive moving average model to map low-rank linear vector subspaces, spanned by column vectors of appropriately defined observability matrices, to points into the Grassmann manifold. The second one utilizes (non-linear) dependencies among network nodes by introducing kernel-based partial correlations to generate points in the manifold of positivedefinite matrices. Based on recently developed research on clustering Riemannian submanifolds, an algorithm is provided for distinguishing time series based on their Riemannian-geometry properties. Numerical tests on time series, synthetically generated from real brain-network structural connectivity matrices, reveal that the proposed scheme outperforms classical and state-of-the-art techniques in clustering brain-network states/structures.
Learning Robust and Discriminative Subspace With Low-Rank Constraints.

PubMed

Li, Sheng; Fu, Yun

2016-11-01

In this paper, we aim at learning robust and discriminative subspaces from noisy data. Subspace learning is widely used in extracting discriminative features for classification. However, when data are contaminated with severe noise, the performance of most existing subspace learning methods would be limited. Recent advances in low-rank modeling provide effective solutions for removing noise or outliers contained in sample sets, which motivates us to take advantage of low-rank constraints in order to exploit robust and discriminative subspace for classification. In particular, we present a discriminative subspace learning method called the supervised regularization-based robust subspace (SRRS) approach, by incorporating the low-rank constraint. SRRS seeks low-rank representations from the noisy data, and learns a discriminative subspace from the recovered clean data jointly. A supervised regularization function is designed to make use of the class label information, and therefore to enhance the discriminability of subspace. Our approach is formulated as a constrained rank-minimization problem. We design an inexact augmented Lagrange multiplier optimization algorithm to solve it. Unlike the existing sparse representation and low-rank learning methods, our approach learns a low-dimensional subspace from recovered data, and explicitly incorporates the supervised information. Our approach and some baselines are evaluated on the COIL-100, ALOI, Extended YaleB, FERET, AR, and KinFace databases. The experimental results demonstrate the effectiveness of our approach, especially when the data contain considerable noise or variations.
Hierarchical Discriminant Analysis.

PubMed

Lu, Di; Ding, Chuntao; Xu, Jinliang; Wang, Shangguang

2018-01-18

The Internet of Things (IoT) generates lots of high-dimensional sensor intelligent data. The processing of high-dimensional data (e.g., data visualization and data classification) is very difficult, so it requires excellent subspace learning algorithms to learn a latent subspace to preserve the intrinsic structure of the high-dimensional data, and abandon the least useful information in the subsequent processing. In this context, many subspace learning algorithms have been presented. However, in the process of transforming the high-dimensional data into the low-dimensional space, the huge difference between the sum of inter-class distance and the sum of intra-class distance for distinct data may cause a bias problem. That means that the impact of intra-class distance is overwhelmed. To address this problem, we propose a novel algorithm called Hierarchical Discriminant Analysis (HDA). It minimizes the sum of intra-class distance first, and then maximizes the sum of inter-class distance. This proposed method balances the bias from the inter-class and that from the intra-class to achieve better performance. Extensive experiments are conducted on several benchmark face datasets. The results reveal that HDA obtains better performance than other dimensionality reduction algorithms.
Beyond union of subspaces: Subspace pursuit on Grassmann manifold for data representation

DOE PAGES

Shen, Xinyue; Krim, Hamid; Gu, Yuantao

2016-03-01

Discovering the underlying structure of a high-dimensional signal or big data has always been a challenging topic, and has become harder to tackle especially when the observations are exposed to arbitrary sparse perturbations. Here in this paper, built on the model of a union of subspaces (UoS) with sparse outliers and inspired by a basis pursuit strategy, we exploit the fundamental structure of a Grassmann manifold, and propose a new technique of pursuing the subspaces systematically by solving a non-convex optimization problem using the alternating direction method of multipliers. This problem as noted is further complicated by non-convex constraints onmore » the Grassmann manifold, as well as the bilinearity in the penalty caused by the subspace bases and coefficients. Nevertheless, numerical experiments verify that the proposed algorithm, which provides elegant solutions to the sub-problems in each step, is able to de-couple the subspaces and pursue each of them under time-efficient parallel computation.« less
N-Screen Aware Multicriteria Hybrid Recommender System Using Weight Based Subspace Clustering

PubMed Central

Ullah, Farman; Lee, Sungchang

2014-01-01

This paper presents a recommender system for N-screen services in which users have multiple devices with different capabilities. In N-screen services, a user can use various devices in different locations and time and can change a device while the service is running. N-screen aware recommendation seeks to improve the user experience with recommended content by considering the user N-screen device attributes such as screen resolution, media codec, remaining battery time, and access network and the user temporal usage pattern information that are not considered in existing recommender systems. For N-screen aware recommendation support, this work introduces a user device profile collaboration agent, manager, and N-screen control server to acquire and manage the user N-screen devices profile. Furthermore, a multicriteria hybrid framework is suggested that incorporates the N-screen devices information with user preferences and demographics. In addition, we propose an individual feature and subspace weight based clustering (IFSWC) to assign different weights to each subspace and each feature within a subspace in the hybrid framework. The proposed system improves the accuracy, precision, scalability, sparsity, and cold start issues. The simulation results demonstrate the effectiveness and prove the aforementioned statements. PMID:25152921
Incorporating signal-dependent noise for hyperspectral target detection

NASA Astrophysics Data System (ADS)

Morman, Christopher J.; Meola, Joseph

2015-05-01

The majority of hyperspectral target detection algorithms are developed from statistical data models employing stationary background statistics or white Gaussian noise models. Stationary background models are inaccurate as a result of two separate physical processes. First, varying background classes often exist in the imagery that possess different clutter statistics. Many algorithms can account for this variability through the use of subspaces or clustering techniques. The second physical process, which is often ignored, is a signal-dependent sensor noise term. For photon counting sensors that are often used in hyperspectral imaging systems, sensor noise increases as the measured signal level increases as a result of Poisson random processes. This work investigates the impact of this sensor noise on target detection performance. A linear noise model is developed describing sensor noise variance as a linear function of signal level. The linear noise model is then incorporated for detection of targets using data collected at Wright Patterson Air Force Base.

Subspace-based analysis of the ERT inverse problem

NASA Astrophysics Data System (ADS)

Ben Hadj Miled, Mohamed Khames; Miller, Eric L.

2004-05-01

In a previous work, we proposed a source-type formulation to the electrical resistance tomography (ERT) problem. Specifically, we showed that inhomogeneities in the medium can be viewed as secondary sources embedded in the homogeneous background medium and located at positions associated with variation in electrical conductivity. Assuming a piecewise constant conductivity distribution, the support of equivalent sources is equal to the boundary of the inhomogeneity. The estimation of the anomaly shape takes the form of an inverse source-type problem. In this paper, we explore the use of subspace methods to localize the secondary equivalent sources associated with discontinuities in the conductivity distribution. Our first alternative is the multiple signal classification (MUSIC) algorithm which is commonly used in the localization of multiple sources. The idea is to project a finite collection of plausible pole (or dipole) sources onto an estimated signal subspace and select those with largest correlations. In ERT, secondary sources are excited simultaneously but in different ways, i.e. with distinct amplitude patterns, depending on the locations and amplitudes of primary sources. If the number of receivers is "large enough", different source configurations can lead to a set of observation vectors that span the data subspace. However, since sources that are spatially close to each other have highly correlated signatures, seperation of such signals becomes very difficult in the presence of noise. To overcome this problem we consider iterative MUSIC algorithms like R-MUSIC and RAP-MUSIC. These recursive algorithms pose a computational burden as they require multiple large combinatorial searches. Results obtained with these algorithms using simulated data of different conductivity patterns are presented.
Active subspace: toward scalable low-rank learning.

PubMed

Liu, Guangcan; Yan, Shuicheng

2012-12-01

We address the scalability issues in low-rank matrix learning problems. Usually these problems resort to solving nuclear norm regularized optimization problems (NNROPs), which often suffer from high computational complexities if based on existing solvers, especially in large-scale settings. Based on the fact that the optimal solution matrix to an NNROP is often low rank, we revisit the classic mechanism of low-rank matrix factorization, based on which we present an active subspace algorithm for efficiently solving NNROPs by transforming large-scale NNROPs into small-scale problems. The transformation is achieved by factorizing the large solution matrix into the product of a small orthonormal matrix (active subspace) and another small matrix. Although such a transformation generally leads to nonconvex problems, we show that a suboptimal solution can be found by the augmented Lagrange alternating direction method. For the robust PCA (RPCA) (Candès, Li, Ma, & Wright, 2009 ) problem, a typical example of NNROPs, theoretical results verify the suboptimality of the solution produced by our algorithm. For the general NNROPs, we empirically show that our algorithm significantly reduces the computational complexity without loss of optimality.
A new modulated Hebbian learning rule--biologically plausible method for local computation of a principal subspace.

PubMed

Jankovic, Marko; Ogawa, Hidemitsu

2003-08-01

This paper presents one possible implementation of a transformation that performs linear mapping to a lower-dimensional subspace. Principal component subspace will be the one that will be analyzed. Idea implemented in this paper represents generalization of the recently proposed infinity OH neural method for principal component extraction. The calculations in the newly proposed method are performed locally--a feature which is usually considered as desirable from the biological point of view. Comparing to some other wellknown methods, proposed synaptic efficacy learning rule requires less information about the value of the other efficacies to make single efficacy modification. Synaptic efficacies are modified by implementation of Modulated Hebb-type (MH) learning rule. Slightly modified MH algorithm named Modulated Hebb Oja (MHO) algorithm, will be also introduced. Structural similarity of the proposed network with part of the retinal circuit will be presented, too.
[Orthogonal Vector Projection Algorithm for Spectral Unmixing].

PubMed

Song, Mei-ping; Xu, Xing-wei; Chang, Chein-I; An, Ju-bai; Yao, Li

2015-12-01

Spectrum unmixing is an important part of hyperspectral technologies, which is essential for material quantity analysis in hyperspectral imagery. Most linear unmixing algorithms require computations of matrix multiplication and matrix inversion or matrix determination. These are difficult for programming, especially hard for realization on hardware. At the same time, the computation costs of the algorithms increase significantly as the number of endmembers grows. Here, based on the traditional algorithm Orthogonal Subspace Projection, a new method called. Orthogonal Vector Projection is prompted using orthogonal principle. It simplifies this process by avoiding matrix multiplication and inversion. It firstly computes the final orthogonal vector via Gram-Schmidt process for each endmember spectrum. And then, these orthogonal vectors are used as projection vector for the pixel signature. The unconstrained abundance can be obtained directly by projecting the signature to the projection vectors, and computing the ratio of projected vector length and orthogonal vector length. Compared to the Orthogonal Subspace Projection and Least Squares Error algorithms, this method does not need matrix inversion, which is much computation costing and hard to implement on hardware. It just completes the orthogonalization process by repeated vector operations, easy for application on both parallel computation and hardware. The reasonability of the algorithm is proved by its relationship with Orthogonal Sub-space Projection and Least Squares Error algorithms. And its computational complexity is also compared with the other two algorithms', which is the lowest one. At last, the experimental results on synthetic image and real image are also provided, giving another evidence for effectiveness of the method.
An alternative subspace approach to EEG dipole source localization

NASA Astrophysics Data System (ADS)

Xu, Xiao-Liang; Xu, Bobby; He, Bin

2004-01-01

In the present study, we investigate a new approach to electroencephalography (EEG) three-dimensional (3D) dipole source localization by using a non-recursive subspace algorithm called FINES. In estimating source dipole locations, the present approach employs projections onto a subspace spanned by a small set of particular vectors (FINES vector set) in the estimated noise-only subspace instead of the entire estimated noise-only subspace in the case of classic MUSIC. The subspace spanned by this vector set is, in the sense of principal angle, closest to the subspace spanned by the array manifold associated with a particular brain region. By incorporating knowledge of the array manifold in identifying FINES vector sets in the estimated noise-only subspace for different brain regions, the present approach is able to estimate sources with enhanced accuracy and spatial resolution, thus enhancing the capability of resolving closely spaced sources and reducing estimation errors. The present computer simulations show, in EEG 3D dipole source localization, that compared to classic MUSIC, FINES has (1) better resolvability of two closely spaced dipolar sources and (2) better estimation accuracy of source locations. In comparison with RAP-MUSIC, FINES' performance is also better for the cases studied when the noise level is high and/or correlations among dipole sources exist.
Non-Convex Sparse and Low-Rank Based Robust Subspace Segmentation for Data Mining.

PubMed

Cheng, Wenlong; Zhao, Mingbo; Xiong, Naixue; Chui, Kwok Tai

2017-07-15

Parsimony, including sparsity and low-rank, has shown great importance for data mining in social networks, particularly in tasks such as segmentation and recognition. Traditionally, such modeling approaches rely on an iterative algorithm that minimizes an objective function with convex l ₁-norm or nuclear norm constraints. However, the obtained results by convex optimization are usually suboptimal to solutions of original sparse or low-rank problems. In this paper, a novel robust subspace segmentation algorithm has been proposed by integrating l p -norm and Schatten p -norm constraints. Our so-obtained affinity graph can better capture local geometrical structure and the global information of the data. As a consequence, our algorithm is more generative, discriminative and robust. An efficient linearized alternating direction method is derived to realize our model. Extensive segmentation experiments are conducted on public datasets. The proposed algorithm is revealed to be more effective and robust compared to five existing algorithms.
Cluster ensemble based on Random Forests for genetic data.

PubMed

Alhusain, Luluah; Hafez, Alaaeldin M

2017-01-01

Clustering plays a crucial role in several application domains, such as bioinformatics. In bioinformatics, clustering has been extensively used as an approach for detecting interesting patterns in genetic data. One application is population structure analysis, which aims to group individuals into subpopulations based on shared genetic variations, such as single nucleotide polymorphisms. Advances in DNA sequencing technology have facilitated the obtainment of genetic datasets with exceptional sizes. Genetic data usually contain hundreds of thousands of genetic markers genotyped for thousands of individuals, making an efficient means for handling such data desirable. Random Forests (RFs) has emerged as an efficient algorithm capable of handling high-dimensional data. RFs provides a proximity measure that can capture different levels of co-occurring relationships between variables. RFs has been widely considered a supervised learning method, although it can be converted into an unsupervised learning method. Therefore, RF-derived proximity measure combined with a clustering technique may be well suited for determining the underlying structure of unlabeled data. This paper proposes, RFcluE, a cluster ensemble approach for determining the underlying structure of genetic data based on RFs. The approach comprises a cluster ensemble framework to combine multiple runs of RF clustering. Experiments were conducted on high-dimensional, real genetic dataset to evaluate the proposed approach. The experiments included an examination of the impact of parameter changes, comparing RFcluE performance against other clustering methods, and an assessment of the relationship between the diversity and quality of the ensemble and its effect on RFcluE performance. This paper proposes, RFcluE, a cluster ensemble approach based on RF clustering to address the problem of population structure analysis and demonstrate the effectiveness of the approach. The paper also illustrates that applying a cluster ensemble approach, combining multiple RF clusterings, produces more robust and higher-quality results as a consequence of feeding the ensemble with diverse views of high-dimensional genetic data obtained through bagging and random subspace, the two key features of the RF algorithm.
A multimembership catalogue for 1876 open clusters using UCAC4 data

NASA Astrophysics Data System (ADS)

Sampedro, L.; Dias, W. S.; Alfaro, E. J.; Monteiro, H.; Molino, A.

2017-10-01

The main objective of this work is to determine the cluster members of 1876 open clusters, using positions and proper motions of the astrometric fourth United States Naval Observatory (USNO) CCD Astrograph Catalog (UCAC4). For this purpose, we apply three different methods, all based on a Bayesian approach, but with different formulations: a purely parametric method, another completely non-parametric algorithm and a third, recently developed by Sampedro & Alfaro, using both formulations at different steps of the whole process. The first and second statistical moments of the members' phase-space subspace, obtained after applying the three methods, are compared for every cluster. Although, on average, the three methods yield similar results, there are also specific differences between them, as well as for some particular clusters. The comparison with other published catalogues shows good agreement. We have also estimated, for the first time, the mean proper motion for a sample of 18 clusters. The results are organized in a single catalogue formed by two main files, one with the most relevant information for each cluster, partially including that in UCAC4, and the other showing the individual membership probabilities for each star in the cluster area. The final catalogue, with an interface design that enables an easy interaction with the user, is available in electronic format at the Stellar Systems Group (SSG-IAA) web site (http://ssg.iaa.es/en/content/sampedro-cluster-catalog).
Visual exploration of high-dimensional data through subspace analysis and dynamic projections

DOE PAGES

Liu, S.; Wang, B.; Thiagarajan, J. J.; ...

2015-06-01

Here, we introduce a novel interactive framework for visualizing and exploring high-dimensional datasets based on subspace analysis and dynamic projections. We assume the high-dimensional dataset can be represented by a mixture of low-dimensional linear subspaces with mixed dimensions, and provide a method to reliably estimate the intrinsic dimension and linear basis of each subspace extracted from the subspace clustering. Subsequently, we use these bases to define unique 2D linear projections as viewpoints from which to visualize the data. To understand the relationships among the different projections and to discover hidden patterns, we connect these projections through dynamic projections that createmore » smooth animated transitions between pairs of projections. We introduce the view transition graph, which provides flexible navigation among these projections to facilitate an intuitive exploration. Finally, we provide detailed comparisons with related systems, and use real-world examples to demonstrate the novelty and usability of our proposed framework.« less
Visual Exploration of High-Dimensional Data through Subspace Analysis and Dynamic Projections

DOE Office of Scientific and Technical Information (OSTI.GOV)

Liu, S.; Wang, B.; Thiagarajan, Jayaraman J.

2015-06-01

We introduce a novel interactive framework for visualizing and exploring high-dimensional datasets based on subspace analysis and dynamic projections. We assume the high-dimensional dataset can be represented by a mixture of low-dimensional linear subspaces with mixed dimensions, and provide a method to reliably estimate the intrinsic dimension and linear basis of each subspace extracted from the subspace clustering. Subsequently, we use these bases to define unique 2D linear projections as viewpoints from which to visualize the data. To understand the relationships among the different projections and to discover hidden patterns, we connect these projections through dynamic projections that create smoothmore » animated transitions between pairs of projections. We introduce the view transition graph, which provides flexible navigation among these projections to facilitate an intuitive exploration. Finally, we provide detailed comparisons with related systems, and use real-world examples to demonstrate the novelty and usability of our proposed framework.« less
SubspaceEM: A Fast Maximum-a-posteriori Algorithm for Cryo-EM Single Particle Reconstruction

PubMed Central

Dvornek, Nicha C.; Sigworth, Fred J.; Tagare, Hemant D.

2015-01-01

Single particle reconstruction methods based on the maximum-likelihood principle and the expectation-maximization (E–M) algorithm are popular because of their ability to produce high resolution structures. However, these algorithms are computationally very expensive, requiring a network of computational servers. To overcome this computational bottleneck, we propose a new mathematical framework for accelerating maximum-likelihood reconstructions. The speedup is by orders of magnitude and the proposed algorithm produces similar quality reconstructions compared to the standard maximum-likelihood formulation. Our approach uses subspace approximations of the cryo-electron microscopy (cryo-EM) data and projection images, greatly reducing the number of image transformations and comparisons that are computed. Experiments using simulated and actual cryo-EM data show that speedup in overall execution time compared to traditional maximum-likelihood reconstruction reaches factors of over 300. PMID:25839831
Simultaneous multigrid techniques for nonlinear eigenvalue problems: Solutions of the nonlinear Schrödinger-Poisson eigenvalue problem in two and three dimensions

NASA Astrophysics Data System (ADS)

Costiner, Sorin; Ta'asan, Shlomo

1995-07-01

Algorithms for nonlinear eigenvalue problems (EP's) often require solving self-consistently a large number of EP's. Convergence difficulties may occur if the solution is not sought in an appropriate region, if global constraints have to be satisfied, or if close or equal eigenvalues are present. Multigrid (MG) algorithms for nonlinear problems and for EP's obtained from discretizations of partial differential EP have often been shown to be more efficient than single level algorithms. This paper presents MG techniques and a MG algorithm for nonlinear Schrödinger Poisson EP's. The algorithm overcomes the above mentioned difficulties combining the following techniques: a MG simultaneous treatment of the eigenvectors and nonlinearity, and with the global constrains; MG stable subspace continuation techniques for the treatment of nonlinearity; and a MG projection coupled with backrotations for separation of solutions. These techniques keep the solutions in an appropriate region, where the algorithm converges fast, and reduce the large number of self-consistent iterations to only a few or one MG simultaneous iteration. The MG projection makes it possible to efficiently overcome difficulties related to clusters of close and equal eigenvalues. Computational examples for the nonlinear Schrödinger-Poisson EP in two and three dimensions, presenting special computational difficulties that are due to the nonlinearity and to the equal and closely clustered eigenvalues are demonstrated. For these cases, the algorithm requires O(qN) operations for the calculation of q eigenvectors of size N and for the corresponding eigenvalues. One MG simultaneous cycle per fine level was performed. The total computational cost is equivalent to only a few Gauss-Seidel relaxations per eigenvector. An asymptotic convergence rate of 0.15 per MG cycle is attained.
A projected preconditioned conjugate gradient algorithm for computing many extreme eigenpairs of a Hermitian matrix [A projected preconditioned conjugate gradient algorithm for computing a large eigenspace of a Hermitian matrix

DOE PAGES

Vecharynski, Eugene; Yang, Chao; Pask, John E.

2015-02-25

Here, we present an iterative algorithm for computing an invariant subspace associated with the algebraically smallest eigenvalues of a large sparse or structured Hermitian matrix A. We are interested in the case in which the dimension of the invariant subspace is large (e.g., over several hundreds or thousands) even though it may still be small relative to the dimension of A. These problems arise from, for example, density functional theory (DFT) based electronic structure calculations for complex materials. The key feature of our algorithm is that it performs fewer Rayleigh–Ritz calculations compared to existing algorithms such as the locally optimalmore » block preconditioned conjugate gradient or the Davidson algorithm. It is a block algorithm, and hence can take advantage of efficient BLAS3 operations and be implemented with multiple levels of concurrency. We discuss a number of practical issues that must be addressed in order to implement the algorithm efficiently on a high performance computer.« less
A strategy for analysis of (molecular) equilibrium simulations: Configuration space density estimation, clustering, and visualization

NASA Astrophysics Data System (ADS)

Hamprecht, Fred A.; Peter, Christine; Daura, Xavier; Thiel, Walter; van Gunsteren, Wilfred F.

2001-02-01

We propose an approach for summarizing the output of long simulations of complex systems, affording a rapid overview and interpretation. First, multidimensional scaling techniques are used in conjunction with dimension reduction methods to obtain a low-dimensional representation of the configuration space explored by the system. A nonparametric estimate of the density of states in this subspace is then obtained using kernel methods. The free energy surface is calculated from that density, and the configurations produced in the simulation are then clustered according to the topography of that surface, such that all configurations belonging to one local free energy minimum form one class. This topographical cluster analysis is performed using basin spanning trees which we introduce as subgraphs of Delaunay triangulations. Free energy surfaces obtained in dimensions lower than four can be visualized directly using iso-contours and -surfaces. Basin spanning trees also afford a glimpse of higher-dimensional topographies. The procedure is illustrated using molecular dynamics simulations on the reversible folding of peptide analoga. Finally, we emphasize the intimate relation of density estimation techniques to modern enhanced sampling algorithms.
Koopman Invariant Subspaces and Finite Linear Representations of Nonlinear Dynamical Systems for Control.

PubMed

Brunton, Steven L; Brunton, Bingni W; Proctor, Joshua L; Kutz, J Nathan

2016-01-01

In this wIn this work, we explore finite-dimensional linear representations of nonlinear dynamical systems by restricting the Koopman operator to an invariant subspace spanned by specially chosen observable functions. The Koopman operator is an infinite-dimensional linear operator that evolves functions of the state of a dynamical system. Dominant terms in the Koopman expansion are typically computed using dynamic mode decomposition (DMD). DMD uses linear measurements of the state variables, and it has recently been shown that this may be too restrictive for nonlinear systems. Choosing the right nonlinear observable functions to form an invariant subspace where it is possible to obtain linear reduced-order models, especially those that are useful for control, is an open challenge. Here, we investigate the choice of observable functions for Koopman analysis that enable the use of optimal linear control techniques on nonlinear problems. First, to include a cost on the state of the system, as in linear quadratic regulator (LQR) control, it is helpful to include these states in the observable subspace, as in DMD. However, we find that this is only possible when there is a single isolated fixed point, as systems with multiple fixed points or more complicated attractors are not globally topologically conjugate to a finite-dimensional linear system, and cannot be represented by a finite-dimensional linear Koopman subspace that includes the state. We then present a data-driven strategy to identify relevant observable functions for Koopman analysis by leveraging a new algorithm to determine relevant terms in a dynamical system by ℓ1-regularized regression of the data in a nonlinear function space; we also show how this algorithm is related to DMD. Finally, we demonstrate the usefulness of nonlinear observable subspaces in the design of Koopman operator optimal control laws for fully nonlinear systems using techniques from linear optimal control.ork, we explore finite-dimensional linear representations of nonlinear dynamical systems by restricting the Koopman operator to an invariant subspace spanned by specially chosen observable functions. The Koopman operator is an infinite-dimensional linear operator that evolves functions of the state of a dynamical system. Dominant terms in the Koopman expansion are typically computed using dynamic mode decomposition (DMD). DMD uses linear measurements of the state variables, and it has recently been shown that this may be too restrictive for nonlinear systems. Choosing the right nonlinear observable functions to form an invariant subspace where it is possible to obtain linear reduced-order models, especially those that are useful for control, is an open challenge. Here, we investigate the choice of observable functions for Koopman analysis that enable the use of optimal linear control techniques on nonlinear problems. First, to include a cost on the state of the system, as in linear quadratic regulator (LQR) control, it is helpful to include these states in the observable subspace, as in DMD. However, we find that this is only possible when there is a single isolated fixed point, as systems with multiple fixed points or more complicated attractors are not globally topologically conjugate to a finite-dimensional linear system, and cannot be represented by a finite-dimensional linear Koopman subspace that includes the state. We then present a data-driven strategy to identify relevant observable functions for Koopman analysis by leveraging a new algorithm to determine relevant terms in a dynamical system by ℓ1-regularized regression of the data in a nonlinear function space; we also show how this algorithm is related to DMD. Finally, we demonstrate the usefulness of nonlinear observable subspaces in the design of Koopman operator optimal control laws for fully nonlinear systems using techniques from linear optimal control.
Towards automatic music transcription: note extraction based on independent subspace analysis

NASA Astrophysics Data System (ADS)

Wellhausen, Jens; Hoynck, Michael

2005-01-01

Due to the increasing amount of music available electronically the need of automatic search, retrieval and classification systems for music becomes more and more important. In this paper an algorithm for automatic transcription of polyphonic piano music into MIDI data is presented, which is a very interesting basis for database applications, music analysis and music classification. The first part of the algorithm performs a note accurate temporal audio segmentation. In the second part, the resulting segments are examined using Independent Subspace Analysis to extract sounding notes. Finally, the results are used to build a MIDI file as a new representation of the piece of music which is examined.
Towards automatic music transcription: note extraction based on independent subspace analysis

NASA Astrophysics Data System (ADS)

Wellhausen, Jens; Höynck, Michael

2004-12-01

Due to the increasing amount of music available electronically the need of automatic search, retrieval and classification systems for music becomes more and more important. In this paper an algorithm for automatic transcription of polyphonic piano music into MIDI data is presented, which is a very interesting basis for database applications, music analysis and music classification. The first part of the algorithm performs a note accurate temporal audio segmentation. In the second part, the resulting segments are examined using Independent Subspace Analysis to extract sounding notes. Finally, the results are used to build a MIDI file as a new representation of the piece of music which is examined.
Subspace aware recovery of low rank and jointly sparse signals

PubMed Central

Biswas, Sampurna; Dasgupta, Soura; Mudumbai, Raghuraman; Jacob, Mathews

2017-01-01

We consider the recovery of a matrix X, which is simultaneously low rank and joint sparse, from few measurements of its columns using a two-step algorithm. Each column of X is measured using a combination of two measurement matrices; one which is the same for every column, while the the second measurement matrix varies from column to column. The recovery proceeds by first estimating the row subspace vectors from the measurements corresponding to the common matrix. The estimated row subspace vectors are then used to recover X from all the measurements using a convex program of joint sparsity minimization. Our main contribution is to provide sufficient conditions on the measurement matrices that guarantee the recovery of such a matrix using the above two-step algorithm. The results demonstrate quite significant savings in number of measurements when compared to the standard multiple measurement vector (MMV) scheme, which assumes same time invariant measurement pattern for all the time frames. We illustrate the impact of the sampling pattern on reconstruction quality using breath held cardiac cine MRI and cardiac perfusion MRI data, while the utility of the algorithm to accelerate the acquisition is demonstrated on MR parameter mapping. PMID:28630889
pyCTQW: A continuous-time quantum walk simulator on distributed memory computers

NASA Astrophysics Data System (ADS)

Izaac, Josh A.; Wang, Jingbo B.

2015-01-01

In the general field of quantum information and computation, quantum walks are playing an increasingly important role in constructing physical models and quantum algorithms. We have recently developed a distributed memory software package pyCTQW, with an object-oriented Python interface, that allows efficient simulation of large multi-particle CTQW (continuous-time quantum walk)-based systems. In this paper, we present an introduction to the Python and Fortran interfaces of pyCTQW, discuss various numerical methods of calculating the matrix exponential, and demonstrate the performance behavior of pyCTQW on a distributed memory cluster. In particular, the Chebyshev and Krylov-subspace methods for calculating the quantum walk propagation are provided, as well as methods for visualization and data analysis.
Pigments identification of paintings using subspace distance unmixing algorithm

NASA Astrophysics Data System (ADS)

Li, Bin; Lyu, Shuqiang; Zhang, Dafeng; Dong, Qinghao

2018-04-01

In the digital protection of the cultural relics, the identification of the pigment mixtures on the surface of the painting has been the research spot for many years. In this paper, as a hyperspectral unmixing algorithm, sub-space distance unmixing is introduced to solve the problem of recognition of pigments mixture in paintings. Firstly, some mixtures of different pigments are designed to measure their reflectance spectra using spectrometer. Moreover, the factors affecting the unmixing accuracy of pigments' mixtures are discussed. The unmixing results of two cases with and without rice paper and its underlay as endmembers are compared. The experiment results show that the algorithm is able to unmixing the pigments effectively and the unmixing accuracy can be improved after considering the influence of spectra of the rich paper and the underlaying material.

On the Partitioning of Squared Euclidean Distance and Its Applications in Cluster Analysis.

ERIC Educational Resources Information Center

Carter, Randy L.; And Others

1989-01-01

The partitioning of squared Euclidean--E(sup 2)--distance between two vectors in M-dimensional space into the sum of squared lengths of vectors in mutually orthogonal subspaces is discussed. Applications to specific cluster analysis problems are provided (i.e., to design Monte Carlo studies for performance comparisons of several clustering methods…
Current harmonics elimination control method for six-phase PM synchronous motor drives.

PubMed

Yuan, Lei; Chen, Ming-liang; Shen, Jian-qing; Xiao, Fei

2015-11-01

To reduce the undesired 5th and 7th stator harmonic current in the six-phase permanent magnet synchronous motor (PMSM), an improved vector control algorithm was proposed based on vector space decomposition (VSD) transformation method, which can control the fundamental and harmonic subspace separately. To improve the traditional VSD technology, a novel synchronous rotating coordinate transformation matrix was presented in this paper, and only using the traditional PI controller in d-q subspace can meet the non-static difference adjustment, the controller parameter design method is given by employing internal model principle. Moreover, the current PI controller parallel with resonant controller is employed in x-y subspace to realize the specific 5th and 7th harmonic component compensation. In addition, a new six-phase SVPWM algorithm based on VSD transformation theory is also proposed. Simulation and experimental results verify the effectiveness of current decoupling vector controller. Copyright © 2015 ISA. Published by Elsevier Ltd. All rights reserved.
A Dimensionally Reduced Clustering Methodology for Heterogeneous Occupational Medicine Data Mining.

PubMed

Saâdaoui, Foued; Bertrand, Pierre R; Boudet, Gil; Rouffiac, Karine; Dutheil, Frédéric; Chamoux, Alain

2015-10-01

Clustering is a set of techniques of the statistical learning aimed at finding structures of heterogeneous partitions grouping homogenous data called clusters. There are several fields in which clustering was successfully applied, such as medicine, biology, finance, economics, etc. In this paper, we introduce the notion of clustering in multifactorial data analysis problems. A case study is conducted for an occupational medicine problem with the purpose of analyzing patterns in a population of 813 individuals. To reduce the data set dimensionality, we base our approach on the Principal Component Analysis (PCA), which is the statistical tool most commonly used in factorial analysis. However, the problems in nature, especially in medicine, are often based on heterogeneous-type qualitative-quantitative measurements, whereas PCA only processes quantitative ones. Besides, qualitative data are originally unobservable quantitative responses that are usually binary-coded. Hence, we propose a new set of strategies allowing to simultaneously handle quantitative and qualitative data. The principle of this approach is to perform a projection of the qualitative variables on the subspaces spanned by quantitative ones. Subsequently, an optimal model is allocated to the resulting PCA-regressed subspaces.
Detecting coupled collective motions in protein by independent subspace analysis

NASA Astrophysics Data System (ADS)

Sakuraba, Shun; Joti, Yasumasa; Kitao, Akio

2010-11-01

Protein dynamics evolves in a high-dimensional space, comprising aharmonic, strongly correlated motional modes. Such correlation often plays an important role in analyzing protein function. In order to identify significantly correlated collective motions, here we employ independent subspace analysis based on the subspace joint approximate diagonalization of eigenmatrices algorithm for the analysis of molecular dynamics (MD) simulation trajectories. From the 100 ns MD simulation of T4 lysozyme, we extract several independent subspaces in each of which collective modes are significantly correlated, and identify the other modes as independent. This method successfully detects the modes along which long-tailed non-Gaussian probability distributions are obtained. Based on the time cross-correlation analysis, we identified a series of events among domain motions and more localized motions in the protein, indicating the connection between the functionally relevant phenomena which have been independently revealed by experiments.
Subspace projection method for unstructured searches with noisy quantum oracles using a signal-based quantum emulation device

NASA Astrophysics Data System (ADS)

La Cour, Brian R.; Ostrove, Corey I.

2017-01-01

This paper describes a novel approach to solving unstructured search problems using a classical, signal-based emulation of a quantum computer. The classical nature of the representation allows one to perform subspace projections in addition to the usual unitary gate operations. Although bandwidth requirements will limit the scale of problems that can be solved by this method, it can nevertheless provide a significant computational advantage for problems of limited size. In particular, we find that, for the same number of noisy oracle calls, the proposed subspace projection method provides a higher probability of success for finding a solution than does an single application of Grover's algorithm on the same device.
BI-sparsity pursuit for robust subspace recovery

DOE PAGES

Bian, Xiao; Krim, Hamid

2015-09-01

Here, the success of sparse models in computer vision and machine learning in many real-world applications, may be attributed in large part, to the fact that many high dimensional data are distributed in a union of low dimensional subspaces. The underlying structure may, however, be adversely affected by sparse errors, thus inducing additional complexity in recovering it. In this paper, we propose a bi-sparse model as a framework to investigate and analyze this problem, and provide as a result , a novel algorithm to recover the union of subspaces in presence of sparse corruptions. We additionally demonstrate the effectiveness ofmore » our method by experiments on real-world vision data.« less
A comparative intelligibility study of single-microphone noise reduction algorithms.

PubMed

Hu, Yi; Loizou, Philipos C

2007-09-01

The evaluation of intelligibility of noise reduction algorithms is reported. IEEE sentences and consonants were corrupted by four types of noise including babble, car, street and train at two signal-to-noise ratio levels (0 and 5 dB), and then processed by eight speech enhancement methods encompassing four classes of algorithms: spectral subtractive, sub-space, statistical model based and Wiener-type algorithms. The enhanced speech was presented to normal-hearing listeners for identification. With the exception of a single noise condition, no algorithm produced significant improvements in speech intelligibility. Information transmission analysis of the consonant confusion matrices indicated that no algorithm improved significantly the place feature score, significantly, which is critically important for speech recognition. The algorithms which were found in previous studies to perform the best in terms of overall quality, were not the same algorithms that performed the best in terms of speech intelligibility. The subspace algorithm, for instance, was previously found to perform the worst in terms of overall quality, but performed well in the present study in terms of preserving speech intelligibility. Overall, the analysis of consonant confusion matrices suggests that in order for noise reduction algorithms to improve speech intelligibility, they need to improve the place and manner feature scores.
Collaboration space division in collaborative product development based on a genetic algorithm

NASA Astrophysics Data System (ADS)

Qian, Xueming; Ma, Yanqiao; Feng, Huan

2018-02-01

The advance in the global environment, rapidly changing markets, and information technology has created a new stage for design. In such an environment, one strategy for success is the Collaborative Product Development (CPD). Organizing people effectively is the goal of Collaborative Product Development, and it solves the problem with certain foreseeability. The development group activities are influenced not only by the methods and decisions available, but also by correlation among personnel. Grouping the personnel according to their correlation intensity is defined as collaboration space division (CSD). Upon establishment of a correlation matrix (CM) of personnel and an analysis of the collaboration space, the genetic algorithm (GA) and minimum description length (MDL) principle may be used as tools in optimizing collaboration space. The MDL principle is used in setting up an object function, and the GA is used as a methodology. The algorithm encodes spatial information as a chromosome in binary. After repetitious crossover, mutation, selection and multiplication, a robust chromosome is found, which can be decoded into an optimal collaboration space. This new method can calculate the members in sub-spaces and individual groupings within the staff. Furthermore, the intersection of sub-spaces and public persons belonging to all sub-spaces can be determined simultaneously.
Skin subspace color modeling for daytime and nighttime group activity recognition in confined operational spaces

NASA Astrophysics Data System (ADS)

Shirkhodaie, Amir; Poshtyar, Azin; Chan, Alex; Hu, Shuowen

2016-05-01

In many military and homeland security persistent surveillance applications, accurate detection of different skin colors in varying observability and illumination conditions is a valuable capability for video analytics. One of those applications is In-Vehicle Group Activity (IVGA) recognition, in which significant changes in observability and illumination may occur during the course of a specific human group activity of interest. Most of the existing skin color detection algorithms, however, are unable to perform satisfactorily in confined operational spaces with partial observability and occultation, as well as under diverse and changing levels of illumination intensity, reflection, and diffraction. In this paper, we investigate the salient features of ten popular color spaces for skin subspace color modeling. More specifically, we examine the advantages and disadvantages of each of these color spaces, as well as the stability and suitability of their features in differentiating skin colors under various illumination conditions. The salient features of different color subspaces are methodically discussed and graphically presented. Furthermore, we present robust and adaptive algorithms for skin color detection based on this analysis. Through examples, we demonstrate the efficiency and effectiveness of these new color skin detection algorithms and discuss their applicability for skin detection in IVGA recognition applications.
A Robust Sound Source Localization Approach for Microphone Array with Model Errors

NASA Astrophysics Data System (ADS)

Xiao, Hua; Shao, Huai-Zong; Peng, Qi-Cong

In this paper, a robust sound source localization approach is proposed. The approach retains good performance even when model errors exist. Compared with previous work in this field, the contributions of this paper are as follows. First, an improved broad-band and near-field array model is proposed. It takes array gain, phase perturbations into account and is based on the actual positions of the elements. It can be used in arbitrary planar geometry arrays. Second, a subspace model errors estimation algorithm and a Weighted 2-Dimension Multiple Signal Classification (W2D-MUSIC) algorithm are proposed. The subspace model errors estimation algorithm estimates unknown parameters of the array model, i. e., gain, phase perturbations, and positions of the elements, with high accuracy. The performance of this algorithm is improved with the increasing of SNR or number of snapshots. The W2D-MUSIC algorithm based on the improved array model is implemented to locate sound sources. These two algorithms compose the robust sound source approach. The more accurate steering vectors can be provided for further processing such as adaptive beamforming algorithm. Numerical examples confirm effectiveness of this proposed approach.
Koopman Invariant Subspaces and Finite Linear Representations of Nonlinear Dynamical Systems for Control

PubMed Central

Brunton, Steven L.; Brunton, Bingni W.; Proctor, Joshua L.; Kutz, J. Nathan

2016-01-01

In this work, we explore finite-dimensional linear representations of nonlinear dynamical systems by restricting the Koopman operator to an invariant subspace spanned by specially chosen observable functions. The Koopman operator is an infinite-dimensional linear operator that evolves functions of the state of a dynamical system. Dominant terms in the Koopman expansion are typically computed using dynamic mode decomposition (DMD). DMD uses linear measurements of the state variables, and it has recently been shown that this may be too restrictive for nonlinear systems. Choosing the right nonlinear observable functions to form an invariant subspace where it is possible to obtain linear reduced-order models, especially those that are useful for control, is an open challenge. Here, we investigate the choice of observable functions for Koopman analysis that enable the use of optimal linear control techniques on nonlinear problems. First, to include a cost on the state of the system, as in linear quadratic regulator (LQR) control, it is helpful to include these states in the observable subspace, as in DMD. However, we find that this is only possible when there is a single isolated fixed point, as systems with multiple fixed points or more complicated attractors are not globally topologically conjugate to a finite-dimensional linear system, and cannot be represented by a finite-dimensional linear Koopman subspace that includes the state. We then present a data-driven strategy to identify relevant observable functions for Koopman analysis by leveraging a new algorithm to determine relevant terms in a dynamical system by ℓ1-regularized regression of the data in a nonlinear function space; we also show how this algorithm is related to DMD. Finally, we demonstrate the usefulness of nonlinear observable subspaces in the design of Koopman operator optimal control laws for fully nonlinear systems using techniques from linear optimal control. PMID:26919740
Evaluating the utility of mid-infrared spectral subspaces for predicting soil properties.

PubMed

Sila, Andrew M; Shepherd, Keith D; Pokhariyal, Ganesh P

2016-04-15

We propose four methods for finding local subspaces in large spectral libraries. The proposed four methods include (a) cosine angle spectral matching; (b) hit quality index spectral matching; (c) self-organizing maps and (d) archetypal analysis methods. Then evaluate prediction accuracies for global and subspaces calibration models. These methods were tested on a mid-infrared spectral library containing 1907 soil samples collected from 19 different countries under the Africa Soil Information Service project. Calibration models for pH, Mehlich-3 Ca, Mehlich-3 Al, total carbon and clay soil properties were developed for the whole library and for the subspace. Root mean square error of prediction was used to evaluate predictive performance of subspace and global models. The root mean square error of prediction was computed using a one-third-holdout validation set. Effect of pretreating spectra with different methods was tested for 1st and 2nd derivative Savitzky-Golay algorithm, multiplicative scatter correction, standard normal variate and standard normal variate followed by detrending methods. In summary, the results show that global models outperformed the subspace models. We, therefore, conclude that global models are more accurate than the local models except in few cases. For instance, sand and clay root mean square error values from local models from archetypal analysis method were 50% poorer than the global models except for subspace models obtained using multiplicative scatter corrected spectra with which were 12% better. However, the subspace approach provides novel methods for discovering data pattern that may exist in large spectral libraries.
Non-Cooperative Target Recognition by Means of Singular Value Decomposition Applied to Radar High Resolution Range Profiles †

PubMed Central

López-Rodríguez, Patricia; Escot-Bocanegra, David; Fernández-Recio, Raúl; Bravo, Ignacio

2015-01-01

Radar high resolution range profiles are widely used among the target recognition community for the detection and identification of flying targets. In this paper, singular value decomposition is applied to extract the relevant information and to model each aircraft as a subspace. The identification algorithm is based on angle between subspaces and takes place in a transformed domain. In order to have a wide database of radar signatures and evaluate the performance, simulated range profiles are used as the recognition database while the test samples comprise data of actual range profiles collected in a measurement campaign. Thanks to the modeling of aircraft as subspaces only the valuable information of each target is used in the recognition process. Thus, one of the main advantages of using singular value decomposition, is that it helps to overcome the notable dissimilarities found in the shape and signal-to-noise ratio between actual and simulated profiles due to their difference in nature. Despite these differences, the recognition rates obtained with the algorithm are quite promising. PMID:25551484
A Tensor-Based Subspace Approach for Bistatic MIMO Radar in Spatial Colored Noise

PubMed Central

Wang, Xianpeng; Wang, Wei; Li, Xin; Wang, Junxiang

2014-01-01

In this paper, a new tensor-based subspace approach is proposed to estimate the direction of departure (DOD) and the direction of arrival (DOA) for bistatic multiple-input multiple-output (MIMO) radar in the presence of spatial colored noise. Firstly, the received signals can be packed into a third-order measurement tensor by exploiting the inherent structure of the matched filter. Then, the measurement tensor can be divided into two sub-tensors, and a cross-covariance tensor is formulated to eliminate the spatial colored noise. Finally, the signal subspace is constructed by utilizing the higher-order singular value decomposition (HOSVD) of the cross-covariance tensor, and the DOD and DOA can be obtained through the estimation of signal parameters via rotational invariance technique (ESPRIT) algorithm, which are paired automatically. Since the multidimensional inherent structure and the cross-covariance tensor technique are used, the proposed method provides better angle estimation performance than Chen's method, the ESPRIT algorithm and the multi-SVD method. Simulation results confirm the effectiveness and the advantage of the proposed method. PMID:24573313
A tensor-based subspace approach for bistatic MIMO radar in spatial colored noise.

PubMed

Wang, Xianpeng; Wang, Wei; Li, Xin; Wang, Junxiang

2014-02-25

In this paper, a new tensor-based subspace approach is proposed to estimate the direction of departure (DOD) and the direction of arrival (DOA) for bistatic multiple-input multiple-output (MIMO) radar in the presence of spatial colored noise. Firstly, the received signals can be packed into a third-order measurement tensor by exploiting the inherent structure of the matched filter. Then, the measurement tensor can be divided into two sub-tensors, and a cross-covariance tensor is formulated to eliminate the spatial colored noise. Finally, the signal subspace is constructed by utilizing the higher-order singular value decomposition (HOSVD) of the cross-covariance tensor, and the DOD and DOA can be obtained through the estimation of signal parameters via rotational invariance technique (ESPRIT) algorithm, which are paired automatically. Since the multidimensional inherent structure and the cross-covariance tensor technique are used, the proposed method provides better angle estimation performance than Chen's method, the ESPRIT algorithm and the multi-SVD method. Simulation results confirm the effectiveness and the advantage of the proposed method.
Inverse transport calculations in optical imaging with subspace optimization algorithms

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ding, Tian, E-mail: tding@math.utexas.edu; Ren, Kui, E-mail: ren@math.utexas.edu

2014-09-15

Inverse boundary value problems for the radiative transport equation play an important role in optics-based medical imaging techniques such as diffuse optical tomography (DOT) and fluorescence optical tomography (FOT). Despite the rapid progress in the mathematical theory and numerical computation of these inverse problems in recent years, developing robust and efficient reconstruction algorithms remains a challenging task and an active research topic. We propose here a robust reconstruction method that is based on subspace minimization techniques. The method splits the unknown transport solution (or a functional of it) into low-frequency and high-frequency components, and uses singular value decomposition to analyticallymore » recover part of low-frequency information. Minimization is then applied to recover part of the high-frequency components of the unknowns. We present some numerical simulations with synthetic data to demonstrate the performance of the proposed algorithm.« less
Automated computation of autonomous spectral submanifolds for nonlinear modal analysis

NASA Astrophysics Data System (ADS)

Ponsioen, Sten; Pedergnana, Tiemo; Haller, George

2018-04-01

We discuss an automated computational methodology for computing two-dimensional spectral submanifolds (SSMs) in autonomous nonlinear mechanical systems of arbitrary degrees of freedom. In our algorithm, SSMs, the smoothest nonlinear continuations of modal subspaces of the linearized system, are constructed up to arbitrary orders of accuracy, using the parameterization method. An advantage of this approach is that the construction of the SSMs does not break down when the SSM folds over its underlying spectral subspace. A further advantage is an automated a posteriori error estimation feature that enables a systematic increase in the orders of the SSM computation until the required accuracy is reached. We find that the present algorithm provides a major speed-up, relative to numerical continuation methods, in the computation of backbone curves, especially in higher-dimensional problems. We illustrate the accuracy and speed of the automated SSM algorithm on lower- and higher-dimensional mechanical systems.
A multifaceted independent performance analysis of facial subspace recognition algorithms.

PubMed

Bajwa, Usama Ijaz; Taj, Imtiaz Ahmad; Anwar, Muhammad Waqas; Wang, Xuan

2013-01-01

Face recognition has emerged as the fastest growing biometric technology and has expanded a lot in the last few years. Many new algorithms and commercial systems have been proposed and developed. Most of them use Principal Component Analysis (PCA) as a base for their techniques. Different and even conflicting results have been reported by researchers comparing these algorithms. The purpose of this study is to have an independent comparative analysis considering both performance and computational complexity of six appearance based face recognition algorithms namely PCA, 2DPCA, A2DPCA, (2D)(2)PCA, LPP and 2DLPP under equal working conditions. This study was motivated due to the lack of unbiased comprehensive comparative analysis of some recent subspace methods with diverse distance metric combinations. For comparison with other studies, FERET, ORL and YALE databases have been used with evaluation criteria as of FERET evaluations which closely simulate real life scenarios. A comparison of results with previous studies is performed and anomalies are reported. An important contribution of this study is that it presents the suitable performance conditions for each of the algorithms under consideration.
An implementation of the QMR method based on coupled two-term recurrences

NASA Technical Reports Server (NTRS)

Freund, Roland W.; Nachtigal, Noeel M.

1992-01-01

The authors have proposed a new Krylov subspace iteration, the quasi-minimal residual algorithm (QMR), for solving non-Hermitian linear systems. In the original implementation of the QMR method, the Lanczos process with look-ahead is used to generate basis vectors for the underlying Krylov subspaces. In the Lanczos algorithm, these basis vectors are computed by means of three-term recurrences. It has been observed that, in finite precision arithmetic, vector iterations based on three-term recursions are usually less robust than mathematically equivalent coupled two-term vector recurrences. This paper presents a look-ahead algorithm that constructs the Lanczos basis vectors by means of coupled two-term recursions. Implementation details are given, and the look-ahead strategy is described. A new implementation of the QMR method, based on this coupled two-term algorithm, is described. A simplified version of the QMR algorithm without look-ahead is also presented, and the special case of QMR for complex symmetric linear systems is considered. Results of numerical experiments comparing the original and the new implementations of the QMR method are reported.
Performance Analysis of Blind Subspace-Based Signature Estimation Algorithms for DS-CDMA Systems with Unknown Correlated Noise

NASA Astrophysics Data System (ADS)

Zarifi, Keyvan; Gershman, Alex B.

2006-12-01

We analyze the performance of two popular blind subspace-based signature waveform estimation techniques proposed by Wang and Poor and Buzzi and Poor for direct-sequence code division multiple-access (DS-CDMA) systems with unknown correlated noise. Using the first-order perturbation theory, analytical expressions for the mean-square error (MSE) of these algorithms are derived. We also obtain simple high SNR approximations of the MSE expressions which explicitly clarify how the performance of these techniques depends on the environmental parameters and how it is related to that of the conventional techniques that are based on the standard white noise assumption. Numerical examples further verify the consistency of the obtained analytical results with simulation results.

Evolutionary-inspired probabilistic search for enhancing sampling of local minima in the protein energy surface

PubMed Central

2012-01-01

Background Despite computational challenges, elucidating conformations that a protein system assumes under physiologic conditions for the purpose of biological activity is a central problem in computational structural biology. While these conformations are associated with low energies in the energy surface that underlies the protein conformational space, few existing conformational search algorithms focus on explicitly sampling low-energy local minima in the protein energy surface. Methods This work proposes a novel probabilistic search framework, PLOW, that explicitly samples low-energy local minima in the protein energy surface. The framework combines algorithmic ingredients from evolutionary computation and computational structural biology to effectively explore the subspace of local minima. A greedy local search maps a conformation sampled in conformational space to a nearby local minimum. A perturbation move jumps out of a local minimum to obtain a new starting conformation for the greedy local search. The process repeats in an iterative fashion, resulting in a trajectory-based exploration of the subspace of local minima. Results and conclusions The analysis of PLOW's performance shows that, by navigating only the subspace of local minima, PLOW is able to sample conformations near a protein's native structure, either more effectively or as well as state-of-the-art methods that focus on reproducing the native structure for a protein system. Analysis of the actual subspace of local minima shows that PLOW samples this subspace more effectively that a naive sampling approach. Additional theoretical analysis reveals that the perturbation function employed by PLOW is key to its ability to sample a diverse set of low-energy conformations. This analysis also suggests directions for further research and novel applications for the proposed framework. PMID:22759582
Machine-Learning Inspired Seismic Phase Detection for Aftershocks of the 2008 MW7.9 Wenchuan Earthquake

NASA Astrophysics Data System (ADS)

Zhu, L.; Li, Z.; Li, C.; Wang, B.; Chen, Z.; McClellan, J. H.; Peng, Z.

2017-12-01

Spatial-temporal evolution of aftershocks is important for illumination of earthquake physics and for rapid response of devastative earthquakes. To improve aftershock catalogs of the 2008 MW7.9 Wenchuan earthquake in Sichuan, China, Alibaba cloud and China Earthquake Administration jointly launched a seismological contest in May 2017 [Fang et al., 2017]. This abstract describes how we handle this problem in this competition. We first used Short-Term Average/Long-Term Average (STA/LTA) and Kurtosis function to obtain over 55000 candidate phase picks (P or S). Based on Signal to Noise Ratio (SNR), about 40000 phases (P or S) are selected. So far, these 40000 phases have a hit rate of 40% among the manually picks. The causes include that 1) there exist false picks (neither P nor S); 2) some P and S arrivals are mis-labeled. To improve our results, we correlate the 40000 phases over continuous waveforms to obtain the phases missed by during the first pass. This results in 120,000 events. After constructing an affinity matrix based on the cross-correlation for newly detected phases, subspace clustering methods [Vidal 2011] are applied to group those phases into separated subspaces. Initial results show good agreement between empirical and clustered labels of P phases. Half of the empirical S phases are clustered into the P phase cluster. This may be a combined effect of 1) mislabeling isolated P phases to S phases and 2) clustering errors due to a small incomplete sample pool. Phases that were falsely detected in the initial results can be also teased out. To better characterize P and S phases, our next step is to apply subspace clustering methods directly to the waveforms, instead of using the cross-correlation coefficients of detected phases. After that, supervised learning, e.g., a convolutional neural network, can be employed to improve the pick accuracy. Updated results will be presented at the meeting.
Tensor Rank Preserving Discriminant Analysis for Facial Recognition.

PubMed

Tao, Dapeng; Guo, Yanan; Li, Yaotang; Gao, Xinbo

2017-10-12

Facial recognition, one of the basic topics in computer vision and pattern recognition, has received substantial attention in recent years. However, for those traditional facial recognition algorithms, the facial images are reshaped to a long vector, thereby losing part of the original spatial constraints of each pixel. In this paper, a new tensor-based feature extraction algorithm termed tensor rank preserving discriminant analysis (TRPDA) for facial image recognition is proposed; the proposed method involves two stages: in the first stage, the low-dimensional tensor subspace of the original input tensor samples was obtained; in the second stage, discriminative locality alignment was utilized to obtain the ultimate vector feature representation for subsequent facial recognition. On the one hand, the proposed TRPDA algorithm fully utilizes the natural structure of the input samples, and it applies an optimization criterion that can directly handle the tensor spectral analysis problem, thereby decreasing the computation cost compared those traditional tensor-based feature selection algorithms. On the other hand, the proposed TRPDA algorithm extracts feature by finding a tensor subspace that preserves most of the rank order information of the intra-class input samples. Experiments on the three facial databases are performed here to determine the effectiveness of the proposed TRPDA algorithm.
Optimal Wavelengths Selection Using Hierarchical Evolutionary Algorithm for Prediction of Firmness and Soluble Solids Content in Apples

USDA-ARS?s Scientific Manuscript database

Hyperspectral scattering is a promising technique for rapid and noninvasive measurement of multiple quality attributes of apple fruit. A hierarchical evolutionary algorithm (HEA) approach, in combination with subspace decomposition and partial least squares (PLS) regression, was proposed to select o...
NASA Astrophysics Data System (ADS)

2017-10-01

Different possible selections of the weighting matrices W1 and W2 will lead to different methods of subspace identification. In this study, the robust combined algorithm proposed by Van Overschee and De Moor [24] has been employed. In this algorithm, the weighting matrixes are chosen to be W1 = I and W2 =ΠUf⊥.
Inverse regression-based uncertainty quantification algorithms for high-dimensional models: Theory and practice

NASA Astrophysics Data System (ADS)

Li, Weixuan; Lin, Guang; Li, Bing

2016-09-01

Many uncertainty quantification (UQ) approaches suffer from the curse of dimensionality, that is, their computational costs become intractable for problems involving a large number of uncertainty parameters. In these situations, the classic Monte Carlo often remains the preferred method of choice because its convergence rate O (n - 1 / 2), where n is the required number of model simulations, does not depend on the dimension of the problem. However, many high-dimensional UQ problems are intrinsically low-dimensional, because the variation of the quantity of interest (QoI) is often caused by only a few latent parameters varying within a low-dimensional subspace, known as the sufficient dimension reduction (SDR) subspace in the statistics literature. Motivated by this observation, we propose two inverse regression-based UQ algorithms (IRUQ) for high-dimensional problems. Both algorithms use inverse regression to convert the original high-dimensional problem to a low-dimensional one, which is then efficiently solved by building a response surface for the reduced model, for example via the polynomial chaos expansion. The first algorithm, which is for the situations where an exact SDR subspace exists, is proved to converge at rate O (n-1), hence much faster than MC. The second algorithm, which doesn't require an exact SDR, employs the reduced model as a control variate to reduce the error of the MC estimate. The accuracy gain could still be significant, depending on how well the reduced model approximates the original high-dimensional one. IRUQ also provides several additional practical advantages: it is non-intrusive; it does not require computing the high-dimensional gradient of the QoI; and it reports an error bar so the user knows how reliable the result is.
Blind source separation and localization using microphone arrays

NASA Astrophysics Data System (ADS)

Sun, Longji

The blind source separation and localization problem for audio signals is studied using microphone arrays. Pure delay mixtures of source signals typically encountered in outdoor environments are considered. Our proposed approach utilizes the subspace methods, including multiple signal classification (MUSIC) and estimation of signal parameters via rotational invariance techniques (ESPRIT) algorithms, to estimate the directions of arrival (DOAs) of the sources from the collected mixtures. Since audio signals are generally considered broadband, the DOA estimates at frequencies with the large sum of squared amplitude values are combined to obtain the final DOA estimates. Using the estimated DOAs, the corresponding mixing and demixing matrices are computed, and the source signals are recovered using the inverse short time Fourier transform. Subspace methods take advantage of the spatial covariance matrix of the collected mixtures to achieve robustness to noise. While the subspace methods have been studied for localizing radio frequency signals, audio signals have their special properties. For instance, they are nonstationary, naturally broadband and analog. All of these make the separation and localization for the audio signals more challenging. Moreover, our algorithm is essentially equivalent to the beamforming technique, which suppresses the signals in unwanted directions and only recovers the signals in the estimated DOAs. Several crucial issues related to our algorithm and their solutions have been discussed, including source number estimation, spatial aliasing, artifact filtering, different ways of mixture generation, and source coordinate estimation using multiple arrays. Additionally, comprehensive simulations and experiments have been conducted to examine various aspects of the algorithm. Unlike the existing blind source separation and localization methods, which are generally time consuming, our algorithm needs signal mixtures of only a short duration and therefore supports real-time implementation.
Randomized subspace-based robust principal component analysis for hyperspectral anomaly detection

NASA Astrophysics Data System (ADS)

Sun, Weiwei; Yang, Gang; Li, Jialin; Zhang, Dianfa

2018-01-01

A randomized subspace-based robust principal component analysis (RSRPCA) method for anomaly detection in hyperspectral imagery (HSI) is proposed. The RSRPCA combines advantages of randomized column subspace and robust principal component analysis (RPCA). It assumes that the background has low-rank properties, and the anomalies are sparse and do not lie in the column subspace of the background. First, RSRPCA implements random sampling to sketch the original HSI dataset from columns and to construct a randomized column subspace of the background. Structured random projections are also adopted to sketch the HSI dataset from rows. Sketching from columns and rows could greatly reduce the computational requirements of RSRPCA. Second, the RSRPCA adopts the columnwise RPCA (CWRPCA) to eliminate negative effects of sampled anomaly pixels and that purifies the previous randomized column subspace by removing sampled anomaly columns. The CWRPCA decomposes the submatrix of the HSI data into a low-rank matrix (i.e., background component), a noisy matrix (i.e., noise component), and a sparse anomaly matrix (i.e., anomaly component) with only a small proportion of nonzero columns. The algorithm of inexact augmented Lagrange multiplier is utilized to optimize the CWRPCA problem and estimate the sparse matrix. Nonzero columns of the sparse anomaly matrix point to sampled anomaly columns in the submatrix. Third, all the pixels are projected onto the complemental subspace of the purified randomized column subspace of the background and the anomaly pixels in the original HSI data are finally exactly located. Several experiments on three real hyperspectral images are carefully designed to investigate the detection performance of RSRPCA, and the results are compared with four state-of-the-art methods. Experimental results show that the proposed RSRPCA outperforms four comparison methods both in detection performance and in computational time.
Subspace methods for identification of human ankle joint stiffness.

PubMed

Zhao, Y; Westwick, D T; Kearney, R E

2011-11-01

Joint stiffness, the dynamic relationship between the angular position of a joint and the torque acting about it, describes the dynamic, mechanical behavior of a joint during posture and movement. Joint stiffness arises from both intrinsic and reflex mechanisms, but the torques due to these mechanisms cannot be measured separately experimentally, since they appear and change together. Therefore, the direct estimation of the intrinsic and reflex stiffnesses is difficult. In this paper, we present a new, two-step procedure to estimate the intrinsic and reflex components of ankle stiffness. In the first step, a discrete-time, subspace-based method is used to estimate a state-space model for overall stiffness from the measured overall torque and then predict the intrinsic and reflex torques. In the second step, continuous-time models for the intrinsic and reflex stiffnesses are estimated from the predicted intrinsic and reflex torques. Simulations and experimental results demonstrate that the algorithm estimates the intrinsic and reflex stiffnesses accurately. The new subspace-based algorithm has three advantages over previous algorithms: 1) It does not require iteration, and therefore, will always converge to an optimal solution; 2) it provides better estimates for data with high noise or short sample lengths; and 3) it provides much more accurate results for data acquired under the closed-loop conditions, that prevail when subjects interact with compliant loads.
A computationally efficient parallel Levenberg-Marquardt algorithm for highly parameterized inverse model analyses

NASA Astrophysics Data System (ADS)

Lin, Youzuo; O'Malley, Daniel; Vesselinov, Velimir V.

2016-09-01

Inverse modeling seeks model parameters given a set of observations. However, for practical problems because the number of measurements is often large and the model parameters are also numerous, conventional methods for inverse modeling can be computationally expensive. We have developed a new, computationally efficient parallel Levenberg-Marquardt method for solving inverse modeling problems with a highly parameterized model space. Levenberg-Marquardt methods require the solution of a linear system of equations which can be prohibitively expensive to compute for moderate to large-scale problems. Our novel method projects the original linear problem down to a Krylov subspace such that the dimensionality of the problem can be significantly reduced. Furthermore, we store the Krylov subspace computed when using the first damping parameter and recycle the subspace for the subsequent damping parameters. The efficiency of our new inverse modeling algorithm is significantly improved using these computational techniques. We apply this new inverse modeling method to invert for random transmissivity fields in 2-D and a random hydraulic conductivity field in 3-D. Our algorithm is fast enough to solve for the distributed model parameters (transmissivity) in the model domain. The algorithm is coded in Julia and implemented in the MADS computational framework (http://mads.lanl.gov). By comparing with Levenberg-Marquardt methods using standard linear inversion techniques such as QR or SVD methods, our Levenberg-Marquardt method yields a speed-up ratio on the order of ˜101 to ˜102 in a multicore computational environment. Therefore, our new inverse modeling method is a powerful tool for characterizing subsurface heterogeneity for moderate to large-scale problems.
Geometric MCMC for infinite-dimensional inverse problems

NASA Astrophysics Data System (ADS)

Beskos, Alexandros; Girolami, Mark; Lan, Shiwei; Farrell, Patrick E.; Stuart, Andrew M.

2017-04-01

Bayesian inverse problems often involve sampling posterior distributions on infinite-dimensional function spaces. Traditional Markov chain Monte Carlo (MCMC) algorithms are characterized by deteriorating mixing times upon mesh-refinement, when the finite-dimensional approximations become more accurate. Such methods are typically forced to reduce step-sizes as the discretization gets finer, and thus are expensive as a function of dimension. Recently, a new class of MCMC methods with mesh-independent convergence times has emerged. However, few of them take into account the geometry of the posterior informed by the data. At the same time, recently developed geometric MCMC algorithms have been found to be powerful in exploring complicated distributions that deviate significantly from elliptic Gaussian laws, but are in general computationally intractable for models defined in infinite dimensions. In this work, we combine geometric methods on a finite-dimensional subspace with mesh-independent infinite-dimensional approaches. Our objective is to speed up MCMC mixing times, without significantly increasing the computational cost per step (for instance, in comparison with the vanilla preconditioned Crank-Nicolson (pCN) method). This is achieved by using ideas from geometric MCMC to probe the complex structure of an intrinsic finite-dimensional subspace where most data information concentrates, while retaining robust mixing times as the dimension grows by using pCN-like methods in the complementary subspace. The resulting algorithms are demonstrated in the context of three challenging inverse problems arising in subsurface flow, heat conduction and incompressible flow control. The algorithms exhibit up to two orders of magnitude improvement in sampling efficiency when compared with the pCN method.
Hyperspectral material identification on radiance data using single-atmosphere or multiple-atmosphere modeling

NASA Astrophysics Data System (ADS)

Mariano, Adrian V.; Grossmann, John M.

2010-11-01

Reflectance-domain methods convert hyperspectral data from radiance to reflectance using an atmospheric compensation model. Material detection and identification are performed by comparing the compensated data to target reflectance spectra. We introduce two radiance-domain approaches, Single atmosphere Adaptive Cosine Estimator (SACE) and Multiple atmosphere ACE (MACE) in which the target reflectance spectra are instead converted into sensor-reaching radiance using physics-based models. For SACE, known illumination and atmospheric conditions are incorporated in a single atmospheric model. For MACE the conditions are unknown so the algorithm uses many atmospheric models to cover the range of environmental variability, and it approximates the result using a subspace model. This approach is sometimes called the invariant method, and requires the choice of a subspace dimension for the model. We compare these two radiance-domain approaches to a Reflectance-domain ACE (RACE) approach on a HYDICE image featuring concealed materials. All three algorithms use the ACE detector, and all three techniques are able to detect most of the hidden materials in the imagery. For MACE we observe a strong dependence on the choice of the material subspace dimension. Increasing this value can lead to a decline in performance.
Wavelet subspace decomposition of thermal infrared images for defect detection in artworks

NASA Astrophysics Data System (ADS)

Ahmad, M. Z.; Khan, A. A.; Mezghani, S.; Perrin, E.; Mouhoubi, K.; Bodnar, J. L.; Vrabie, V.

2016-07-01

Health of ancient artworks must be routinely monitored for their adequate preservation. Faults in these artworks may develop over time and must be identified as precisely as possible. The classical acoustic testing techniques, being invasive, risk causing permanent damage during periodic inspections. Infrared thermometry offers a promising solution to map faults in artworks. It involves heating the artwork and recording its thermal response using infrared camera. A novel strategy based on pseudo-random binary excitation principle is used in this work to suppress the risks associated with prolonged heating. The objective of this work is to develop an automatic scheme for detecting faults in the captured images. An efficient scheme based on wavelet based subspace decomposition is developed which favors identification of, the otherwise invisible, weaker faults. Two major problems addressed in this work are the selection of the optimal wavelet basis and the subspace level selection. A novel criterion based on regional mutual information is proposed for the latter. The approach is successfully tested on a laboratory based sample as well as real artworks. A new contrast enhancement metric is developed to demonstrate the quantitative efficiency of the algorithm. The algorithm is successfully deployed for both laboratory based and real artworks.
Robust Adaptive Beamforming with Sensor Position Errors Using Weighted Subspace Fitting-Based Covariance Matrix Reconstruction.

PubMed

Chen, Peng; Yang, Yixin; Wang, Yong; Ma, Yuanliang

2018-05-08

When sensor position errors exist, the performance of recently proposed interference-plus-noise covariance matrix (INCM)-based adaptive beamformers may be severely degraded. In this paper, we propose a weighted subspace fitting-based INCM reconstruction algorithm to overcome sensor displacement for linear arrays. By estimating the rough signal directions, we construct a novel possible mismatched steering vector (SV) set. We analyze the proximity of the signal subspace from the sample covariance matrix (SCM) and the space spanned by the possible mismatched SV set. After solving an iterative optimization problem, we reconstruct the INCM using the estimated sensor position errors. Then we estimate the SV of the desired signal by solving an optimization problem with the reconstructed INCM. The main advantage of the proposed algorithm is its robustness against SV mismatches dominated by unknown sensor position errors. Numerical examples show that even if the position errors are up to half of the assumed sensor spacing, the output signal-to-interference-plus-noise ratio is only reduced by 4 dB. Beam patterns plotted using experiment data show that the interference suppression capability of the proposed beamformer outperforms other tested beamformers.
A real negative selection algorithm with evolutionary preference for anomaly detection

NASA Astrophysics Data System (ADS)

Yang, Tao; Chen, Wen; Li, Tao

2017-04-01

Traditional real negative selection algorithms (RNSAs) adopt the estimated coverage (c0) as the algorithm termination threshold, and generate detectors randomly. With increasing dimensions, the data samples could reside in the low-dimensional subspace, so that the traditional detectors cannot effectively distinguish these samples. Furthermore, in high-dimensional feature space, c0 cannot exactly reflect the detectors set coverage rate for the nonself space, and it could lead the algorithm to be terminated unexpectedly when the number of detectors is insufficient. These shortcomings make the traditional RNSAs to perform poorly in high-dimensional feature space. Based upon "evolutionary preference" theory in immunology, this paper presents a real negative selection algorithm with evolutionary preference (RNSAP). RNSAP utilizes the "unknown nonself space", "low-dimensional target subspace" and "known nonself feature" as the evolutionary preference to guide the generation of detectors, thus ensuring the detectors can cover the nonself space more effectively. Besides, RNSAP uses redundancy to replace c0 as the termination threshold, in this way RNSAP can generate adequate detectors under a proper convergence rate. The theoretical analysis and experimental result demonstrate that, compared to the classical RNSA (V-detector), RNSAP can achieve a higher detection rate, but with less detectors and computing cost.
Detecting brain dynamics during resting state: a tensor based evolutionary clustering approach

NASA Astrophysics Data System (ADS)

Al-sharoa, Esraa; Al-khassaweneh, Mahmood; Aviyente, Selin

2017-08-01

Human brain is a complex network with connections across different regions. Understanding the functional connectivity (FC) of the brain is important both during resting state and task; as disruptions in connectivity patterns are indicators of different psychopathological and neurological diseases. In this work, we study the resting state functional connectivity networks (FCNs) of the brain from fMRI BOLD signals. Recent studies have shown that FCNs are dynamic even during resting state and understanding the temporal dynamics of FCNs is important for differentiating between different conditions. Therefore, it is important to develop algorithms to track the dynamic formation and dissociation of FCNs of the brain during resting state. In this paper, we propose a two step tensor based community detection algorithm to identify and track the brain network community structure across time. First, we introduce an information-theoretic function to reduce the dynamic FCN and identify the time points that are similar topologically to combine them into a tensor. These time points will be used to identify the different FC states. Second, a tensor based spectral clustering approach is developed to identify the community structure of the constructed tensors. The proposed algorithm applies Tucker decomposition to the constructed tensors and extract the orthogonal factor matrices along the connectivity mode to determine the common subspace within each FC state. The detected community structure is summarized and described as FC states. The results illustrate the dynamic structure of resting state networks (RSNs), including the default mode network, somatomotor network, subcortical network and visual network.
A Composite Algorithm for Mixed Integer Constrained Nonlinear Optimization.

DTIC Science & Technology

1980-01-01

de Silva [141, and Weisman and Wood [76). A particular direct search algorithm, the simplex method, has been cited for having the potential for...spaced discrete points on a line which makes the direction suitable for an efficient integer search technique based on Fibonacci numbers. Two...defined by a subset of variables. The complex algorithm is particularly well suited for this subspace search for two reasons. First, the complex method
Development of Parallel Architectures for Sensor Array Processing. Volume 1

DTIC Science & Technology

1993-08-01

required for the DOA estimation [ 1-7]. The Multiple Signal Classification ( MUSIC ) [ 1] and the Estimation of Signal Parameters by Rotational...manifold and the estimated subspace. Although MUSIC is a high resolution algorithm, it has several drawbacks including the fact that complete knowledge of...thoroughly, MUSIC algorithm was selected to develop special purpose hardware for real time computation. Summary of the MUSIC algorithm is as follows
Beamspace dual signal space projection (bDSSP): a method for selective detection of deep sources in MEG measurements.

PubMed

Sekihara, Kensuke; Adachi, Yoshiaki; Kubota, Hiroshi K; Cai, Chang; Nagarajan, Srikantan S

2018-06-01

Magnetoencephalography (MEG) has a well-recognized weakness at detecting deeper brain activities. This paper proposes a novel algorithm for selective detection of deep sources by suppressing interference signals from superficial sources in MEG measurements. The proposed algorithm combines the beamspace preprocessing method with the dual signal space projection (DSSP) interference suppression method. A prerequisite of the proposed algorithm is prior knowledge of the location of the deep sources. The proposed algorithm first derives the basis vectors that span a local region just covering the locations of the deep sources. It then estimates the time-domain signal subspace of the superficial sources by using the projector composed of these basis vectors. Signals from the deep sources are extracted by projecting the row space of the data matrix onto the direction orthogonal to the signal subspace of the superficial sources. Compared with the previously proposed beamspace signal space separation (SSS) method, the proposed algorithm is capable of suppressing much stronger interference from superficial sources. This capability is demonstrated in our computer simulation as well as experiments using phantom data. The proposed bDSSP algorithm can be a powerful tool in studies of physiological functions of midbrain and deep brain structures.
Beamspace dual signal space projection (bDSSP): a method for selective detection of deep sources in MEG measurements

NASA Astrophysics Data System (ADS)

Sekihara, Kensuke; Adachi, Yoshiaki; Kubota, Hiroshi K.; Cai, Chang; Nagarajan, Srikantan S.

2018-06-01

Objective. Magnetoencephalography (MEG) has a well-recognized weakness at detecting deeper brain activities. This paper proposes a novel algorithm for selective detection of deep sources by suppressing interference signals from superficial sources in MEG measurements. Approach. The proposed algorithm combines the beamspace preprocessing method with the dual signal space projection (DSSP) interference suppression method. A prerequisite of the proposed algorithm is prior knowledge of the location of the deep sources. The proposed algorithm first derives the basis vectors that span a local region just covering the locations of the deep sources. It then estimates the time-domain signal subspace of the superficial sources by using the projector composed of these basis vectors. Signals from the deep sources are extracted by projecting the row space of the data matrix onto the direction orthogonal to the signal subspace of the superficial sources. Main results. Compared with the previously proposed beamspace signal space separation (SSS) method, the proposed algorithm is capable of suppressing much stronger interference from superficial sources. This capability is demonstrated in our computer simulation as well as experiments using phantom data. Significance. The proposed bDSSP algorithm can be a powerful tool in studies of physiological functions of midbrain and deep brain structures.

Generalizing the self-healing diffusion Monte Carlo approach to finite temperature: a path for the optimization of low-energy many-body bases.

PubMed

Reboredo, Fernando A; Kim, Jeongnim

2014-02-21

A statistical method is derived for the calculation of thermodynamic properties of many-body systems at low temperatures. This method is based on the self-healing diffusion Monte Carlo method for complex functions [F. A. Reboredo, J. Chem. Phys. 136, 204101 (2012)] and some ideas of the correlation function Monte Carlo approach [D. M. Ceperley and B. Bernu, J. Chem. Phys. 89, 6316 (1988)]. In order to allow the evolution in imaginary time to describe the density matrix, we remove the fixed-node restriction using complex antisymmetric guiding wave functions. In the process we obtain a parallel algorithm that optimizes a small subspace of the many-body Hilbert space to provide maximum overlap with the subspace spanned by the lowest-energy eigenstates of a many-body Hamiltonian. We show in a model system that the partition function is progressively maximized within this subspace. We show that the subspace spanned by the small basis systematically converges towards the subspace spanned by the lowest energy eigenstates. Possible applications of this method for calculating the thermodynamic properties of many-body systems near the ground state are discussed. The resulting basis can also be used to accelerate the calculation of the ground or excited states with quantum Monte Carlo.
Generalizing the self-healing diffusion Monte Carlo approach to finite temperature: A path for the optimization of low-energy many-body bases

NASA Astrophysics Data System (ADS)

Reboredo, Fernando A.; Kim, Jeongnim

2014-02-01

A statistical method is derived for the calculation of thermodynamic properties of many-body systems at low temperatures. This method is based on the self-healing diffusion Monte Carlo method for complex functions [F. A. Reboredo, J. Chem. Phys. 136, 204101 (2012)] and some ideas of the correlation function Monte Carlo approach [D. M. Ceperley and B. Bernu, J. Chem. Phys. 89, 6316 (1988)]. In order to allow the evolution in imaginary time to describe the density matrix, we remove the fixed-node restriction using complex antisymmetric guiding wave functions. In the process we obtain a parallel algorithm that optimizes a small subspace of the many-body Hilbert space to provide maximum overlap with the subspace spanned by the lowest-energy eigenstates of a many-body Hamiltonian. We show in a model system that the partition function is progressively maximized within this subspace. We show that the subspace spanned by the small basis systematically converges towards the subspace spanned by the lowest energy eigenstates. Possible applications of this method for calculating the thermodynamic properties of many-body systems near the ground state are discussed. The resulting basis can also be used to accelerate the calculation of the ground or excited states with quantum Monte Carlo.
A computationally efficient parallel Levenberg-Marquardt algorithm for highly parameterized inverse model analyses

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lin, Youzuo; O'Malley, Daniel; Vesselinov, Velimir V.

Inverse modeling seeks model parameters given a set of observations. However, for practical problems because the number of measurements is often large and the model parameters are also numerous, conventional methods for inverse modeling can be computationally expensive. We have developed a new, computationally-efficient parallel Levenberg-Marquardt method for solving inverse modeling problems with a highly parameterized model space. Levenberg-Marquardt methods require the solution of a linear system of equations which can be prohibitively expensive to compute for moderate to large-scale problems. Our novel method projects the original linear problem down to a Krylov subspace, such that the dimensionality of themore » problem can be significantly reduced. Furthermore, we store the Krylov subspace computed when using the first damping parameter and recycle the subspace for the subsequent damping parameters. The efficiency of our new inverse modeling algorithm is significantly improved using these computational techniques. We apply this new inverse modeling method to invert for random transmissivity fields in 2D and a random hydraulic conductivity field in 3D. Our algorithm is fast enough to solve for the distributed model parameters (transmissivity) in the model domain. The algorithm is coded in Julia and implemented in the MADS computational framework (http://mads.lanl.gov). By comparing with Levenberg-Marquardt methods using standard linear inversion techniques such as QR or SVD methods, our Levenberg-Marquardt method yields a speed-up ratio on the order of ~10 1 to ~10 2 in a multi-core computational environment. Furthermore, our new inverse modeling method is a powerful tool for characterizing subsurface heterogeneity for moderate- to large-scale problems.« less
Efficient statistically accurate algorithms for the Fokker-Planck equation in large dimensions

NASA Astrophysics Data System (ADS)

Chen, Nan; Majda, Andrew J.

2018-02-01

Solving the Fokker-Planck equation for high-dimensional complex turbulent dynamical systems is an important and practical issue. However, most traditional methods suffer from the curse of dimensionality and have difficulties in capturing the fat tailed highly intermittent probability density functions (PDFs) of complex systems in turbulence, neuroscience and excitable media. In this article, efficient statistically accurate algorithms are developed for solving both the transient and the equilibrium solutions of Fokker-Planck equations associated with high-dimensional nonlinear turbulent dynamical systems with conditional Gaussian structures. The algorithms involve a hybrid strategy that requires only a small number of ensembles. Here, a conditional Gaussian mixture in a high-dimensional subspace via an extremely efficient parametric method is combined with a judicious non-parametric Gaussian kernel density estimation in the remaining low-dimensional subspace. Particularly, the parametric method provides closed analytical formulae for determining the conditional Gaussian distributions in the high-dimensional subspace and is therefore computationally efficient and accurate. The full non-Gaussian PDF of the system is then given by a Gaussian mixture. Different from traditional particle methods, each conditional Gaussian distribution here covers a significant portion of the high-dimensional PDF. Therefore a small number of ensembles is sufficient to recover the full PDF, which overcomes the curse of dimensionality. Notably, the mixture distribution has significant skill in capturing the transient behavior with fat tails of the high-dimensional non-Gaussian PDFs, and this facilitates the algorithms in accurately describing the intermittency and extreme events in complex turbulent systems. It is shown in a stringent set of test problems that the method only requires an order of O (100) ensembles to successfully recover the highly non-Gaussian transient PDFs in up to 6 dimensions with only small errors.
A computationally efficient parallel Levenberg-Marquardt algorithm for highly parameterized inverse model analyses

DOE PAGES

Lin, Youzuo; O'Malley, Daniel; Vesselinov, Velimir V.

2016-09-01

Inverse modeling seeks model parameters given a set of observations. However, for practical problems because the number of measurements is often large and the model parameters are also numerous, conventional methods for inverse modeling can be computationally expensive. We have developed a new, computationally-efficient parallel Levenberg-Marquardt method for solving inverse modeling problems with a highly parameterized model space. Levenberg-Marquardt methods require the solution of a linear system of equations which can be prohibitively expensive to compute for moderate to large-scale problems. Our novel method projects the original linear problem down to a Krylov subspace, such that the dimensionality of themore » problem can be significantly reduced. Furthermore, we store the Krylov subspace computed when using the first damping parameter and recycle the subspace for the subsequent damping parameters. The efficiency of our new inverse modeling algorithm is significantly improved using these computational techniques. We apply this new inverse modeling method to invert for random transmissivity fields in 2D and a random hydraulic conductivity field in 3D. Our algorithm is fast enough to solve for the distributed model parameters (transmissivity) in the model domain. The algorithm is coded in Julia and implemented in the MADS computational framework (http://mads.lanl.gov). By comparing with Levenberg-Marquardt methods using standard linear inversion techniques such as QR or SVD methods, our Levenberg-Marquardt method yields a speed-up ratio on the order of ~10 1 to ~10 2 in a multi-core computational environment. Furthermore, our new inverse modeling method is a powerful tool for characterizing subsurface heterogeneity for moderate- to large-scale problems.« less
Anomaly detection in hyperspectral imagery: statistics vs. graph-based algorithms

NASA Astrophysics Data System (ADS)

Berkson, Emily E.; Messinger, David W.

2016-05-01

Anomaly detection (AD) algorithms are frequently applied to hyperspectral imagery, but different algorithms produce different outlier results depending on the image scene content and the assumed background model. This work provides the first comparison of anomaly score distributions between common statistics-based anomaly detection algorithms (RX and subspace-RX) and the graph-based Topological Anomaly Detector (TAD). Anomaly scores in statistical AD algorithms should theoretically approximate a chi-squared distribution; however, this is rarely the case with real hyperspectral imagery. The expected distribution of scores found with graph-based methods remains unclear. We also look for general trends in algorithm performance with varied scene content. Three separate scenes were extracted from the hyperspectral MegaScene image taken over downtown Rochester, NY with the VIS-NIR-SWIR ProSpecTIR instrument. In order of most to least cluttered, we study an urban, suburban, and rural scene. The three AD algorithms were applied to each scene, and the distributions of the most anomalous 5% of pixels were compared. We find that subspace-RX performs better than RX, because the data becomes more normal when the highest variance principal components are removed. We also see that compared to statistical detectors, anomalies detected by TAD are easier to separate from the background. Due to their different underlying assumptions, the statistical and graph-based algorithms highlighted different anomalies within the urban scene. These results will lead to a deeper understanding of these algorithms and their applicability across different types of imagery.
Discrete Optimization of Electronic Hyperpolarizabilities in a Chemical Subspace

DTIC Science & Technology

2009-05-01

molecular design. Methods for optimization in discrete spaces have been studied extensively and recently reviewed ( 5). Optimization methods include...integer programming, as in branch-and-bound techniques (including dead-end elimination [ 6]), simulated annealing ( 7), and genetic algorithms ( 8...These algorithms have found renewed interest and application in molecular and materials design (9- 12) . Recently, new approaches have been
A model reduction approach to numerical inversion for a parabolic partial differential equation

NASA Astrophysics Data System (ADS)

Borcea, Liliana; Druskin, Vladimir; Mamonov, Alexander V.; Zaslavsky, Mikhail

2014-12-01

We propose a novel numerical inversion algorithm for the coefficients of parabolic partial differential equations, based on model reduction. The study is motivated by the application of controlled source electromagnetic exploration, where the unknown is the subsurface electrical resistivity and the data are time resolved surface measurements of the magnetic field. The algorithm presented in this paper considers inversion in one and two dimensions. The reduced model is obtained with rational interpolation in the frequency (Laplace) domain and a rational Krylov subspace projection method. It amounts to a nonlinear mapping from the function space of the unknown resistivity to the small dimensional space of the parameters of the reduced model. We use this mapping as a nonlinear preconditioner for the Gauss-Newton iterative solution of the inverse problem. The advantage of the inversion algorithm is twofold. First, the nonlinear preconditioner resolves most of the nonlinearity of the problem. Thus the iterations are less likely to get stuck in local minima and the convergence is fast. Second, the inversion is computationally efficient because it avoids repeated accurate simulations of the time-domain response. We study the stability of the inversion algorithm for various rational Krylov subspaces, and assess its performance with numerical experiments.
Subspace techniques to remove artifacts from EEG: a quantitative analysis.

PubMed

Teixeira, A R; Tome, A M; Lang, E W; Martins da Silva, A

2008-01-01

In this work we discuss and apply projective subspace techniques to both multichannel as well as single channel recordings. The single-channel approach is based on singular spectrum analysis(SSA) and the multichannel approach uses the extended infomax algorithm which is implemented in the opensource toolbox EEGLAB. Both approaches will be evaluated using artificial mixtures of a set of selected EEG signals. The latter were selected visually to contain as the dominant activity one of the characteristic bands of an electroencephalogram (EEG). The evaluation is performed both in the time and frequency domain by using correlation coefficients and coherence function, respectively.
Scalable Robust Principal Component Analysis Using Grassmann Averages.

PubMed

Hauberg, Sren; Feragen, Aasa; Enficiaud, Raffi; Black, Michael J

2016-11-01

In large datasets, manual data verification is impossible, and we must expect the number of outliers to increase with data size. While principal component analysis (PCA) can reduce data size, and scalable solutions exist, it is well-known that outliers can arbitrarily corrupt the results. Unfortunately, state-of-the-art approaches for robust PCA are not scalable. We note that in a zero-mean dataset, each observation spans a one-dimensional subspace, giving a point on the Grassmann manifold. We show that the average subspace corresponds to the leading principal component for Gaussian data. We provide a simple algorithm for computing this Grassmann Average ( GA), and show that the subspace estimate is less sensitive to outliers than PCA for general distributions. Because averages can be efficiently computed, we immediately gain scalability. We exploit robust averaging to formulate the Robust Grassmann Average (RGA) as a form of robust PCA. The resulting Trimmed Grassmann Average ( TGA) is appropriate for computer vision because it is robust to pixel outliers. The algorithm has linear computational complexity and minimal memory requirements. We demonstrate TGA for background modeling, video restoration, and shadow removal. We show scalability by performing robust PCA on the entire Star Wars IV movie; a task beyond any current method. Source code is available online.
Component-based subspace linear discriminant analysis method for face recognition with one training sample

NASA Astrophysics Data System (ADS)

Huang, Jian; Yuen, Pong C.; Chen, Wen-Sheng; Lai, J. H.

2005-05-01

Many face recognition algorithms/systems have been developed in the last decade and excellent performances have also been reported when there is a sufficient number of representative training samples. In many real-life applications such as passport identification, only one well-controlled frontal sample image is available for training. Under this situation, the performance of existing algorithms will degrade dramatically or may not even be implemented. We propose a component-based linear discriminant analysis (LDA) method to solve the one training sample problem. The basic idea of the proposed method is to construct local facial feature component bunches by moving each local feature region in four directions. In this way, we not only generate more samples with lower dimension than the original image, but also consider the face detection localization error while training. After that, we propose a subspace LDA method, which is tailor-made for a small number of training samples, for the local feature projection to maximize the discrimination power. Theoretical analysis and experiment results show that our proposed subspace LDA is efficient and overcomes the limitations in existing LDA methods. Finally, we combine the contributions of each local component bunch with a weighted combination scheme to draw the recognition decision. A FERET database is used for evaluating the proposed method and results are encouraging.
Noise covariance incorporated MEG-MUSIC algorithm: a method for multiple-dipole estimation tolerant of the influence of background brain activity.

PubMed

Sekihara, K; Poeppel, D; Marantz, A; Koizumi, H; Miyashita, Y

1997-09-01

This paper proposes a method of localizing multiple current dipoles from spatio-temporal biomagnetic data. The method is based on the multiple signal classification (MUSIC) algorithm and is tolerant of the influence of background brain activity. In this method, the noise covariance matrix is estimated using a portion of the data that contains noise, but does not contain any signal information. Then, a modified noise subspace projector is formed using the generalized eigenvectors of the noise and measured-data covariance matrices. The MUSIC localizer is calculated using this noise subspace projector and the noise covariance matrix. The results from a computer simulation have verified the effectiveness of the method. The method was then applied to source estimation for auditory-evoked fields elicited by syllable speech sounds. The results strongly suggest the method's effectiveness in removing the influence of background activity.
An Intelligent Architecture Based on Field Programmable Gate Arrays Designed to Detect Moving Objects by Using Principal Component Analysis

PubMed Central

Bravo, Ignacio; Mazo, Manuel; Lázaro, José L.; Gardel, Alfredo; Jiménez, Pedro; Pizarro, Daniel

2010-01-01

This paper presents a complete implementation of the Principal Component Analysis (PCA) algorithm in Field Programmable Gate Array (FPGA) devices applied to high rate background segmentation of images. The classical sequential execution of different parts of the PCA algorithm has been parallelized. This parallelization has led to the specific development and implementation in hardware of the different stages of PCA, such as computation of the correlation matrix, matrix diagonalization using the Jacobi method and subspace projections of images. On the application side, the paper presents a motion detection algorithm, also entirely implemented on the FPGA, and based on the developed PCA core. This consists of dynamically thresholding the differences between the input image and the one obtained by expressing the input image using the PCA linear subspace previously obtained as a background model. The proposal achieves a high ratio of processed images (up to 120 frames per second) and high quality segmentation results, with a completely embedded and reliable hardware architecture based on commercial CMOS sensors and FPGA devices. PMID:22163406
An intelligent architecture based on Field Programmable Gate Arrays designed to detect moving objects by using Principal Component Analysis.

PubMed

Bravo, Ignacio; Mazo, Manuel; Lázaro, José L; Gardel, Alfredo; Jiménez, Pedro; Pizarro, Daniel

2010-01-01

This paper presents a complete implementation of the Principal Component Analysis (PCA) algorithm in Field Programmable Gate Array (FPGA) devices applied to high rate background segmentation of images. The classical sequential execution of different parts of the PCA algorithm has been parallelized. This parallelization has led to the specific development and implementation in hardware of the different stages of PCA, such as computation of the correlation matrix, matrix diagonalization using the Jacobi method and subspace projections of images. On the application side, the paper presents a motion detection algorithm, also entirely implemented on the FPGA, and based on the developed PCA core. This consists of dynamically thresholding the differences between the input image and the one obtained by expressing the input image using the PCA linear subspace previously obtained as a background model. The proposal achieves a high ratio of processed images (up to 120 frames per second) and high quality segmentation results, with a completely embedded and reliable hardware architecture based on commercial CMOS sensors and FPGA devices.
Imaging of downward-looking linear array SAR using three-dimensional spatial smoothing MUSIC algorithm

NASA Astrophysics Data System (ADS)

Zhang, Siqian; Kuang, Gangyao

2014-10-01

In this paper, a novel three-dimensional imaging algorithm of downward-looking linear array SAR is presented. To improve the resolution, multiple signal classification (MUSIC) algorithm has been used. However, since the scattering centers are always correlated in real SAR system, the estimated covariance matrix becomes singular. To address the problem, a three-dimensional spatial smoothing method is proposed in this paper to restore the singular covariance matrix to a full-rank one. The three-dimensional signal matrix can be divided into a set of orthogonal three-dimensional subspaces. The main idea of the method is based on extracting the array correlation matrix as the average of all correlation matrices from the subspaces. In addition, the spectral height of the peaks contains no information with regard to the scattering intensity of the different scattering centers, thus it is difficulty to reconstruct the backscattering information. The least square strategy is used to estimate the amplitude of the scattering center in this paper. The above results of the theoretical analysis are verified by 3-D scene simulations and experiments on real data.
Constrained Low-Rank Learning Using Least Squares-Based Regularization.

PubMed

Li, Ping; Yu, Jun; Wang, Meng; Zhang, Luming; Cai, Deng; Li, Xuelong

2017-12-01

Low-rank learning has attracted much attention recently due to its efficacy in a rich variety of real-world tasks, e.g., subspace segmentation and image categorization. Most low-rank methods are incapable of capturing low-dimensional subspace for supervised learning tasks, e.g., classification and regression. This paper aims to learn both the discriminant low-rank representation (LRR) and the robust projecting subspace in a supervised manner. To achieve this goal, we cast the problem into a constrained rank minimization framework by adopting the least squares regularization. Naturally, the data label structure tends to resemble that of the corresponding low-dimensional representation, which is derived from the robust subspace projection of clean data by low-rank learning. Moreover, the low-dimensional representation of original data can be paired with some informative structure by imposing an appropriate constraint, e.g., Laplacian regularizer. Therefore, we propose a novel constrained LRR method. The objective function is formulated as a constrained nuclear norm minimization problem, which can be solved by the inexact augmented Lagrange multiplier algorithm. Extensive experiments on image classification, human pose estimation, and robust face recovery have confirmed the superiority of our method.
Self-aggregation in scaled principal component space

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ding, Chris H.Q.; He, Xiaofeng; Zha, Hongyuan

2001-10-05

Automatic grouping of voluminous data into meaningful structures is a challenging task frequently encountered in broad areas of science, engineering and information processing. These data clustering tasks are frequently performed in Euclidean space or a subspace chosen from principal component analysis (PCA). Here we describe a space obtained by a nonlinear scaling of PCA in which data objects self-aggregate automatically into clusters. Projection into this space gives sharp distinctions among clusters. Gene expression profiles of cancer tissue subtypes, Web hyperlink structure and Internet newsgroups are analyzed to illustrate interesting properties of the space.
Scalable Methods for Uncertainty Quantification, Data Assimilation and Target Accuracy Assessment for Multi-Physics Advanced Simulation of Light Water Reactors

NASA Astrophysics Data System (ADS)

Khuwaileh, Bassam

High fidelity simulation of nuclear reactors entails large scale applications characterized with high dimensionality and tremendous complexity where various physics models are integrated in the form of coupled models (e.g. neutronic with thermal-hydraulic feedback). Each of the coupled modules represents a high fidelity formulation of the first principles governing the physics of interest. Therefore, new developments in high fidelity multi-physics simulation and the corresponding sensitivity/uncertainty quantification analysis are paramount to the development and competitiveness of reactors achieved through enhanced understanding of the design and safety margins. Accordingly, this dissertation introduces efficient and scalable algorithms for performing efficient Uncertainty Quantification (UQ), Data Assimilation (DA) and Target Accuracy Assessment (TAA) for large scale, multi-physics reactor design and safety problems. This dissertation builds upon previous efforts for adaptive core simulation and reduced order modeling algorithms and extends these efforts towards coupled multi-physics models with feedback. The core idea is to recast the reactor physics analysis in terms of reduced order models. This can be achieved via identifying the important/influential degrees of freedom (DoF) via the subspace analysis, such that the required analysis can be recast by considering the important DoF only. In this dissertation, efficient algorithms for lower dimensional subspace construction have been developed for single physics and multi-physics applications with feedback. Then the reduced subspace is used to solve realistic, large scale forward (UQ) and inverse problems (DA and TAA). Once the elite set of DoF is determined, the uncertainty/sensitivity/target accuracy assessment and data assimilation analysis can be performed accurately and efficiently for large scale, high dimensional multi-physics nuclear engineering applications. Hence, in this work a Karhunen-Loeve (KL) based algorithm previously developed to quantify the uncertainty for single physics models is extended for large scale multi-physics coupled problems with feedback effect. Moreover, a non-linear surrogate based UQ approach is developed, used and compared to performance of the KL approach and brute force Monte Carlo (MC) approach. On the other hand, an efficient Data Assimilation (DA) algorithm is developed to assess information about model's parameters: nuclear data cross-sections and thermal-hydraulics parameters. Two improvements are introduced in order to perform DA on the high dimensional problems. First, a goal-oriented surrogate model can be used to replace the original models in the depletion sequence (MPACT -- COBRA-TF - ORIGEN). Second, approximating the complex and high dimensional solution space with a lower dimensional subspace makes the sampling process necessary for DA possible for high dimensional problems. Moreover, safety analysis and design optimization depend on the accurate prediction of various reactor attributes. Predictions can be enhanced by reducing the uncertainty associated with the attributes of interest. Accordingly, an inverse problem can be defined and solved to assess the contributions from sources of uncertainty; and experimental effort can be subsequently directed to further improve the uncertainty associated with these sources. In this dissertation a subspace-based gradient-free and nonlinear algorithm for inverse uncertainty quantification namely the Target Accuracy Assessment (TAA) has been developed and tested. The ideas proposed in this dissertation were first validated using lattice physics applications simulated using SCALE6.1 package (Pressurized Water Reactor (PWR) and Boiling Water Reactor (BWR) lattice models). Ultimately, the algorithms proposed her were applied to perform UQ and DA for assembly level (CASL progression problem number 6) and core wide problems representing Watts Bar Nuclear 1 (WBN1) for cycle 1 of depletion (CASL Progression Problem Number 9) modeled via simulated using VERA-CS which consists of several multi-physics coupled models. The analysis and algorithms developed in this dissertation were encoded and implemented in a newly developed tool kit algorithms for Reduced Order Modeling based Uncertainty/Sensitivity Estimator (ROMUSE).
Density-matrix-based algorithm for solving eigenvalue problems

NASA Astrophysics Data System (ADS)

Polizzi, Eric

2009-03-01

A fast and stable numerical algorithm for solving the symmetric eigenvalue problem is presented. The technique deviates fundamentally from the traditional Krylov subspace iteration based techniques (Arnoldi and Lanczos algorithms) or other Davidson-Jacobi techniques and takes its inspiration from the contour integration and density-matrix representation in quantum mechanics. It will be shown that this algorithm—named FEAST—exhibits high efficiency, robustness, accuracy, and scalability on parallel architectures. Examples from electronic structure calculations of carbon nanotubes are presented, and numerical performances and capabilities are discussed.
A robust variant of block Jacobi-Davidson for extracting a large number of eigenpairs: Application to grid-based real-space density functional theory

NASA Astrophysics Data System (ADS)

Lee, M.; Leiter, K.; Eisner, C.; Breuer, A.; Wang, X.

2017-09-01

In this work, we investigate a block Jacobi-Davidson (J-D) variant suitable for sparse symmetric eigenproblems where a substantial number of extremal eigenvalues are desired (e.g., ground-state real-space quantum chemistry). Most J-D algorithm variations tend to slow down as the number of desired eigenpairs increases due to frequent orthogonalization against a growing list of solved eigenvectors. In our specification of block J-D, all of the steps of the algorithm are performed in clusters, including the linear solves, which allows us to greatly reduce computational effort with blocked matrix-vector multiplies. In addition, we move orthogonalization against locked eigenvectors and working eigenvectors outside of the inner loop but retain the single Ritz vector projection corresponding to the index of the correction vector. Furthermore, we minimize the computational effort by constraining the working subspace to the current vectors being updated and the latest set of corresponding correction vectors. Finally, we incorporate accuracy thresholds based on the precision required by the Fermi-Dirac distribution. The net result is a significant reduction in the computational effort against most previous block J-D implementations, especially as the number of wanted eigenpairs grows. We compare our approach with another robust implementation of block J-D (JDQMR) and the state-of-the-art Chebyshev filter subspace (CheFSI) method for various real-space density functional theory systems. Versus CheFSI, for first-row elements, our method yields competitive timings for valence-only systems and 4-6× speedups for all-electron systems with up to 10× reduced matrix-vector multiplies. For all-electron calculations on larger elements (e.g., gold) where the wanted spectrum is quite narrow compared to the full spectrum, we observe 60× speedup with 200× fewer matrix-vector multiples vs. CheFSI.

A robust variant of block Jacobi-Davidson for extracting a large number of eigenpairs: Application to grid-based real-space density functional theory.

PubMed

Lee, M; Leiter, K; Eisner, C; Breuer, A; Wang, X

2017-09-21

In this work, we investigate a block Jacobi-Davidson (J-D) variant suitable for sparse symmetric eigenproblems where a substantial number of extremal eigenvalues are desired (e.g., ground-state real-space quantum chemistry). Most J-D algorithm variations tend to slow down as the number of desired eigenpairs increases due to frequent orthogonalization against a growing list of solved eigenvectors. In our specification of block J-D, all of the steps of the algorithm are performed in clusters, including the linear solves, which allows us to greatly reduce computational effort with blocked matrix-vector multiplies. In addition, we move orthogonalization against locked eigenvectors and working eigenvectors outside of the inner loop but retain the single Ritz vector projection corresponding to the index of the correction vector. Furthermore, we minimize the computational effort by constraining the working subspace to the current vectors being updated and the latest set of corresponding correction vectors. Finally, we incorporate accuracy thresholds based on the precision required by the Fermi-Dirac distribution. The net result is a significant reduction in the computational effort against most previous block J-D implementations, especially as the number of wanted eigenpairs grows. We compare our approach with another robust implementation of block J-D (JDQMR) and the state-of-the-art Chebyshev filter subspace (CheFSI) method for various real-space density functional theory systems. Versus CheFSI, for first-row elements, our method yields competitive timings for valence-only systems and 4-6× speedups for all-electron systems with up to 10× reduced matrix-vector multiplies. For all-electron calculations on larger elements (e.g., gold) where the wanted spectrum is quite narrow compared to the full spectrum, we observe 60× speedup with 200× fewer matrix-vector multiples vs. CheFSI.
An adaptive optimal control for smart structures based on the subspace tracking identification technique

NASA Astrophysics Data System (ADS)

Ripamonti, Francesco; Resta, Ferruccio; Borroni, Massimo; Cazzulani, Gabriele

2014-04-01

A new method for the real-time identification of mechanical system modal parameters is used in order to design different adaptive control logics aiming to reduce the vibrations in a carbon fiber plate smart structure. It is instrumented with three piezoelectric actuators, three accelerometers and three strain gauges. The real-time identification is based on a recursive subspace tracking algorithm whose outputs are elaborated by an ARMA model. A statistical approach is finally applied to choose the modal parameter correct values. These are given in input to model-based control logics such as a gain scheduling and an adaptive LQR control.
Scalable posterior approximations for large-scale Bayesian inverse problems via likelihood-informed parameter and state reduction

NASA Astrophysics Data System (ADS)

Cui, Tiangang; Marzouk, Youssef; Willcox, Karen

2016-06-01

Two major bottlenecks to the solution of large-scale Bayesian inverse problems are the scaling of posterior sampling algorithms to high-dimensional parameter spaces and the computational cost of forward model evaluations. Yet incomplete or noisy data, the state variation and parameter dependence of the forward model, and correlations in the prior collectively provide useful structure that can be exploited for dimension reduction in this setting-both in the parameter space of the inverse problem and in the state space of the forward model. To this end, we show how to jointly construct low-dimensional subspaces of the parameter space and the state space in order to accelerate the Bayesian solution of the inverse problem. As a byproduct of state dimension reduction, we also show how to identify low-dimensional subspaces of the data in problems with high-dimensional observations. These subspaces enable approximation of the posterior as a product of two factors: (i) a projection of the posterior onto a low-dimensional parameter subspace, wherein the original likelihood is replaced by an approximation involving a reduced model; and (ii) the marginal prior distribution on the high-dimensional complement of the parameter subspace. We present and compare several strategies for constructing these subspaces using only a limited number of forward and adjoint model simulations. The resulting posterior approximations can rapidly be characterized using standard sampling techniques, e.g., Markov chain Monte Carlo. Two numerical examples demonstrate the accuracy and efficiency of our approach: inversion of an integral equation in atmospheric remote sensing, where the data dimension is very high; and the inference of a heterogeneous transmissivity field in a groundwater system, which involves a partial differential equation forward model with high dimensional state and parameters.
Coexistence of Antiferromagnetic and Spin Cluster Glass Order in the Magnetoelectric Relaxor Multiferroic PbFe0.5Nb0.5O3

NASA Astrophysics Data System (ADS)

Kleemann, W.; Shvartsman, V. V.; Borisov, P.; Kania, A.

2010-12-01

The coexistence of cluster glass with long-range antiferromagnetic order in the relaxor ferroelectric PbFe0.5Nb0.5O3 is elucidated. While the transition at TN=153K on the infinite antiferromagnetic cluster induces 3m symmetry with large EH2 magnetoelectric response, the disconnected subspace of isolated Fe3+ ions and finite clusters accommodates the cluster glass below Tg=10.6K with field-induced m' symmetry and EH-type magnetoelectric response. Critical slowing-down, memory and rejuvenation after aging, occurrence of a de Almeida-Thouless phase line, and stretched exponential relaxation of remanence corroborate the glass nature.
Parallel eigenanalysis of finite element models in a completely connected architecture

NASA Technical Reports Server (NTRS)

Akl, F. A.; Morel, M. R.

1989-01-01

A parallel algorithm is presented for the solution of the generalized eigenproblem in linear elastic finite element analysis, (K)(phi) = (M)(phi)(omega), where (K) and (M) are of order N, and (omega) is order of q. The concurrent solution of the eigenproblem is based on the multifrontal/modified subspace method and is achieved in a completely connected parallel architecture in which each processor is allowed to communicate with all other processors. The algorithm was successfully implemented on a tightly coupled multiple-instruction multiple-data parallel processing machine, Cray X-MP. A finite element model is divided into m domains each of which is assumed to process n elements. Each domain is then assigned to a processor or to a logical processor (task) if the number of domains exceeds the number of physical processors. The macrotasking library routines are used in mapping each domain to a user task. Computational speed-up and efficiency are used to determine the effectiveness of the algorithm. The effect of the number of domains, the number of degrees-of-freedom located along the global fronts and the dimension of the subspace on the performance of the algorithm are investigated. A parallel finite element dynamic analysis program, p-feda, is documented and the performance of its subroutines in parallel environment is analyzed.
Efficient Statistically Accurate Algorithms for the Fokker-Planck Equation in Large Dimensions

NASA Astrophysics Data System (ADS)

Chen, N.; Majda, A.

2017-12-01

Solving the Fokker-Planck equation for high-dimensional complex turbulent dynamical systems is an important and practical issue. However, most traditional methods suffer from the curse of dimensionality and have difficulties in capturing the fat tailed highly intermittent probability density functions (PDFs) of complex systems in turbulence, neuroscience and excitable media. In this article, efficient statistically accurate algorithms are developed for solving both the transient and the equilibrium solutions of Fokker-Planck equations associated with high-dimensional nonlinear turbulent dynamical systems with conditional Gaussian structures. The algorithms involve a hybrid strategy that requires only a small number of ensembles. Here, a conditional Gaussian mixture in a high-dimensional subspace via an extremely efficient parametric method is combined with a judicious non-parametric Gaussian kernel density estimation in the remaining low-dimensional subspace. Particularly, the parametric method, which is based on an effective data assimilation framework, provides closed analytical formulae for determining the conditional Gaussian distributions in the high-dimensional subspace. Therefore, it is computationally efficient and accurate. The full non-Gaussian PDF of the system is then given by a Gaussian mixture. Different from the traditional particle methods, each conditional Gaussian distribution here covers a significant portion of the high-dimensional PDF. Therefore a small number of ensembles is sufficient to recover the full PDF, which overcomes the curse of dimensionality. Notably, the mixture distribution has a significant skill in capturing the transient behavior with fat tails of the high-dimensional non-Gaussian PDFs, and this facilitates the algorithms in accurately describing the intermittency and extreme events in complex turbulent systems. It is shown in a stringent set of test problems that the method only requires an order of O(100) ensembles to successfully recover the highly non-Gaussian transient PDFs in up to 6 dimensions with only small errors.
Reverse time migration by Krylov subspace reduced order modeling

NASA Astrophysics Data System (ADS)

Basir, Hadi Mahdavi; Javaherian, Abdolrahim; Shomali, Zaher Hossein; Firouz-Abadi, Roohollah Dehghani; Gholamy, Shaban Ali

2018-04-01

Imaging is a key step in seismic data processing. To date, a myriad of advanced pre-stack depth migration approaches have been developed; however, reverse time migration (RTM) is still considered as the high-end imaging algorithm. The main limitations associated with the performance cost of reverse time migration are the intensive computation of the forward and backward simulations, time consumption, and memory allocation related to imaging condition. Based on the reduced order modeling, we proposed an algorithm, which can be adapted to all the aforementioned factors. Our proposed method benefit from Krylov subspaces method to compute certain mode shapes of the velocity model computed by as an orthogonal base of reduced order modeling. Reverse time migration by reduced order modeling is helpful concerning the highly parallel computation and strongly reduces the memory requirement of reverse time migration. The synthetic model results showed that suggested method can decrease the computational costs of reverse time migration by several orders of magnitudes, compared with reverse time migration by finite element method.
Similarity preserving low-rank representation for enhanced data representation and effective subspace learning.

PubMed

Zhang, Zhao; Yan, Shuicheng; Zhao, Mingbo

2014-05-01

Latent Low-Rank Representation (LatLRR) delivers robust and promising results for subspace recovery and feature extraction through mining the so-called hidden effects, but the locality of both similar principal and salient features cannot be preserved in the optimizations. To solve this issue for achieving enhanced performance, a boosted version of LatLRR, referred to as Regularized Low-Rank Representation (rLRR), is proposed through explicitly including an appropriate Laplacian regularization that can maximally preserve the similarity among local features. Resembling LatLRR, rLRR decomposes given data matrix from two directions by seeking a pair of low-rank matrices. But the similarities of principal and salient features can be effectively preserved by rLRR. As a result, the correlated features are well grouped and the robustness of representations is also enhanced. Based on the outputted bi-directional low-rank codes by rLRR, an unsupervised subspace learning framework termed Low-rank Similarity Preserving Projections (LSPP) is also derived for feature learning. The supervised extension of LSPP is also discussed for discriminant subspace learning. The validity of rLRR is examined by robust representation and decomposition of real images. Results demonstrated the superiority of our rLRR and LSPP in comparison to other related state-of-the-art algorithms. Copyright © 2014 Elsevier Ltd. All rights reserved.
Low-rank plus sparse decomposition for exoplanet detection in direct-imaging ADI sequences. The LLSG algorithm

NASA Astrophysics Data System (ADS)

Gomez Gonzalez, C. A.; Absil, O.; Absil, P.-A.; Van Droogenbroeck, M.; Mawet, D.; Surdej, J.

2016-05-01

Context. Data processing constitutes a critical component of high-contrast exoplanet imaging. Its role is almost as important as the choice of a coronagraph or a wavefront control system, and it is intertwined with the chosen observing strategy. Among the data processing techniques for angular differential imaging (ADI), the most recent is the family of principal component analysis (PCA) based algorithms. It is a widely used statistical tool developed during the first half of the past century. PCA serves, in this case, as a subspace projection technique for constructing a reference point spread function (PSF) that can be subtracted from the science data for boosting the detectability of potential companions present in the data. Unfortunately, when building this reference PSF from the science data itself, PCA comes with certain limitations such as the sensitivity of the lower dimensional orthogonal subspace to non-Gaussian noise. Aims: Inspired by recent advances in machine learning algorithms such as robust PCA, we aim to propose a localized subspace projection technique that surpasses current PCA-based post-processing algorithms in terms of the detectability of companions at near real-time speed, a quality that will be useful for future direct imaging surveys. Methods: We used randomized low-rank approximation methods recently proposed in the machine learning literature, coupled with entry-wise thresholding to decompose an ADI image sequence locally into low-rank, sparse, and Gaussian noise components (LLSG). This local three-term decomposition separates the starlight and the associated speckle noise from the planetary signal, which mostly remains in the sparse term. We tested the performance of our new algorithm on a long ADI sequence obtained on β Pictoris with VLT/NACO. Results: Compared to a standard PCA approach, LLSG decomposition reaches a higher signal-to-noise ratio and has an overall better performance in the receiver operating characteristic space. This three-term decomposition brings a detectability boost compared to the full-frame standard PCA approach, especially in the small inner working angle region where complex speckle noise prevents PCA from discerning true companions from noise.
Systematic Dimensionality Reduction for Quantum Walks: Optimal Spatial Search and Transport on Non-Regular Graphs

PubMed Central

Novo, Leonardo; Chakraborty, Shantanav; Mohseni, Masoud; Neven, Hartmut; Omar, Yasser

2015-01-01

Continuous time quantum walks provide an important framework for designing new algorithms and modelling quantum transport and state transfer problems. Often, the graph representing the structure of a problem contains certain symmetries that confine the dynamics to a smaller subspace of the full Hilbert space. In this work, we use invariant subspace methods, that can be computed systematically using the Lanczos algorithm, to obtain the reduced set of states that encompass the dynamics of the problem at hand without the specific knowledge of underlying symmetries. First, we apply this method to obtain new instances of graphs where the spatial quantum search algorithm is optimal: complete graphs with broken links and complete bipartite graphs, in particular, the star graph. These examples show that regularity and high-connectivity are not needed to achieve optimal spatial search. We also show that this method considerably simplifies the calculation of quantum transport efficiencies. Furthermore, we observe improved efficiencies by removing a few links from highly symmetric graphs. Finally, we show that this reduction method also allows us to obtain an upper bound for the fidelity of a single qubit transfer on an XY spin network. PMID:26330082
Matrix Product Operator Simulations of Quantum Algorithms

DTIC Science & Technology

2015-02-01

parallel to the Grover subspace parametrically: (Zi|φ〉)‖ = s cos γ|α〉+ s sin γ|β〉, s = √ a(k)2 (N − 1)2 + b(k)2, γ = tan −1 ( b(k)(N − 1) a(k) ) (6.32) Each...of this vector parallel to the Grover subspace in parametric form: (XiZi|φ〉)‖ = s cos(γ)|α〉+ s sin(γ)|β〉, s = 1√ N − 1 , γ = tan −1 ( cot (( k + 1 2 ) θ...quant- ph/0001106, 2000. Bibliography 146 [30] Jérémie Roland and Nicolas J Cerf. Quantum search by local adiabatic evolution. Physical Review A, 65(4
Array magnetics modal analysis for the DIII-D tokamak based on localized time-series modelling

DOE PAGES

Olofsson, K. Erik J.; Hanson, Jeremy M.; Shiraki, Daisuke; ...

2014-07-14

Here, time-series analysis of magnetics data in tokamaks is typically done using block-based fast Fourier transform methods. This work presents the development and deployment of a new set of algorithms for magnetic probe array analysis. The method is based on an estimation technique known as stochastic subspace identification (SSI). Compared with the standard coherence approach or the direct singular value decomposition approach, the new technique exhibits several beneficial properties. For example, the SSI method does not require that frequencies are orthogonal with respect to the timeframe used in the analysis. Frequencies are obtained directly as parameters of localized time-series models.more » The parameters are extracted by solving small-scale eigenvalue problems. Applications include maximum-likelihood regularized eigenmode pattern estimation, detection of neoclassical tearing modes, including locked mode precursors, and automatic clustering of modes, and magnetics-pattern characterization of sawtooth pre- and postcursors, edge harmonic oscillations and fishbones.« less
Partially supervised speaker clustering.

PubMed

Tang, Hao; Chu, Stephen Mingyu; Hasegawa-Johnson, Mark; Huang, Thomas S

2012-05-01

Content-based multimedia indexing, retrieval, and processing as well as multimedia databases demand the structuring of the media content (image, audio, video, text, etc.), one significant goal being to associate the identity of the content to the individual segments of the signals. In this paper, we specifically address the problem of speaker clustering, the task of assigning every speech utterance in an audio stream to its speaker. We offer a complete treatment to the idea of partially supervised speaker clustering, which refers to the use of our prior knowledge of speakers in general to assist the unsupervised speaker clustering process. By means of an independent training data set, we encode the prior knowledge at the various stages of the speaker clustering pipeline via 1) learning a speaker-discriminative acoustic feature transformation, 2) learning a universal speaker prior model, and 3) learning a discriminative speaker subspace, or equivalently, a speaker-discriminative distance metric. We study the directional scattering property of the Gaussian mixture model (GMM) mean supervector representation of utterances in the high-dimensional space, and advocate exploiting this property by using the cosine distance metric instead of the euclidean distance metric for speaker clustering in the GMM mean supervector space. We propose to perform discriminant analysis based on the cosine distance metric, which leads to a novel distance metric learning algorithm—linear spherical discriminant analysis (LSDA). We show that the proposed LSDA formulation can be systematically solved within the elegant graph embedding general dimensionality reduction framework. Our speaker clustering experiments on the GALE database clearly indicate that 1) our speaker clustering methods based on the GMM mean supervector representation and vector-based distance metrics outperform traditional speaker clustering methods based on the “bag of acoustic features” representation and statistical model-based distance metrics, 2) our advocated use of the cosine distance metric yields consistent increases in the speaker clustering performance as compared to the commonly used euclidean distance metric, 3) our partially supervised speaker clustering concept and strategies significantly improve the speaker clustering performance over the baselines, and 4) our proposed LSDA algorithm further leads to state-of-the-art speaker clustering performance.
Generalizing the self-healing diffusion Monte Carlo approach to finite temperature: a path for the optimization of low-energy many-body basis expansions

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kim, Jeongnim; Reboredo, Fernando A.

The self-healing diffusion Monte Carlo method for complex functions [F. A. Reboredo J. Chem. Phys. {\\bf 136}, 204101 (2012)] and some ideas of the correlation function Monte Carlo approach [D. M. Ceperley and B. Bernu, J. Chem. Phys. {\\bf 89}, 6316 (1988)] are blended to obtain a method for the calculation of thermodynamic properties of many-body systems at low temperatures. In order to allow the evolution in imaginary time to describe the density matrix, we remove the fixed-node restriction using complex antisymmetric trial wave functions. A statistical method is derived for the calculation of finite temperature properties of many-body systemsmore » near the ground state. In the process we also obtain a parallel algorithm that optimizes the many-body basis of a small subspace of the many-body Hilbert space. This small subspace is optimized to have maximum overlap with the one expanded by the lower energy eigenstates of a many-body Hamiltonian. We show in a model system that the Helmholtz free energy is minimized within this subspace as the iteration number increases. We show that the subspace expanded by the small basis systematically converges towards the subspace expanded by the lowest energy eigenstates. Possible applications of this method to calculate the thermodynamic properties of many-body systems near the ground state are discussed. The resulting basis can be also used to accelerate the calculation of the ground or excited states with Quantum Monte Carlo.« less
Hyperspectral Super-Resolution of Locally Low Rank Images From Complementary Multisource Data.

PubMed

Veganzones, Miguel A; Simoes, Miguel; Licciardi, Giorgio; Yokoya, Naoto; Bioucas-Dias, Jose M; Chanussot, Jocelyn

2016-01-01

Remote sensing hyperspectral images (HSIs) are quite often low rank, in the sense that the data belong to a low dimensional subspace/manifold. This has been recently exploited for the fusion of low spatial resolution HSI with high spatial resolution multispectral images in order to obtain super-resolution HSI. Most approaches adopt an unmixing or a matrix factorization perspective. The derived methods have led to state-of-the-art results when the spectral information lies in a low-dimensional subspace/manifold. However, if the subspace/manifold dimensionality spanned by the complete data set is large, i.e., larger than the number of multispectral bands, the performance of these methods mainly decreases because the underlying sparse regression problem is severely ill-posed. In this paper, we propose a local approach to cope with this difficulty. Fundamentally, we exploit the fact that real world HSIs are locally low rank, that is, pixels acquired from a given spatial neighborhood span a very low-dimensional subspace/manifold, i.e., lower or equal than the number of multispectral bands. Thus, we propose to partition the image into patches and solve the data fusion problem independently for each patch. This way, in each patch the subspace/manifold dimensionality is low enough, such that the problem is not ill-posed anymore. We propose two alternative approaches to define the hyperspectral super-resolution through local dictionary learning using endmember induction algorithms. We also explore two alternatives to define the local regions, using sliding windows and binary partition trees. The effectiveness of the proposed approaches is illustrated with synthetic and semi real data.
CaSPIAN: A Causal Compressive Sensing Algorithm for Discovering Directed Interactions in Gene Networks

PubMed Central

Emad, Amin; Milenkovic, Olgica

2014-01-01

We introduce a novel algorithm for inference of causal gene interactions, termed CaSPIAN (Causal Subspace Pursuit for Inference and Analysis of Networks), which is based on coupling compressive sensing and Granger causality techniques. The core of the approach is to discover sparse linear dependencies between shifted time series of gene expressions using a sequential list-version of the subspace pursuit reconstruction algorithm and to estimate the direction of gene interactions via Granger-type elimination. The method is conceptually simple and computationally efficient, and it allows for dealing with noisy measurements. Its performance as a stand-alone platform without biological side-information was tested on simulated networks, on the synthetic IRMA network in Saccharomyces cerevisiae, and on data pertaining to the human HeLa cell network and the SOS network in E. coli. The results produced by CaSPIAN are compared to the results of several related algorithms, demonstrating significant improvements in inference accuracy of documented interactions. These findings highlight the importance of Granger causality techniques for reducing the number of false-positives, as well as the influence of noise and sampling period on the accuracy of the estimates. In addition, the performance of the method was tested in conjunction with biological side information of the form of sparse “scaffold networks”, to which new edges were added using available RNA-seq or microarray data. These biological priors aid in increasing the sensitivity and precision of the algorithm in the small sample regime. PMID:24622336
A unified classifier for robust face recognition based on combining multiple subspace algorithms

NASA Astrophysics Data System (ADS)

Ijaz Bajwa, Usama; Ahmad Taj, Imtiaz; Waqas Anwar, Muhammad

2012-10-01

Face recognition being the fastest growing biometric technology has expanded manifold in the last few years. Various new algorithms and commercial systems have been proposed and developed. However, none of the proposed or developed algorithm is a complete solution because it may work very well on one set of images with say illumination changes but may not work properly on another set of image variations like expression variations. This study is motivated by the fact that any single classifier cannot claim to show generally better performance against all facial image variations. To overcome this shortcoming and achieve generality, combining several classifiers using various strategies has been studied extensively also incorporating the question of suitability of any classifier for this task. The study is based on the outcome of a comprehensive comparative analysis conducted on a combination of six subspace extraction algorithms and four distance metrics on three facial databases. The analysis leads to the selection of the most suitable classifiers which performs better on one task or the other. These classifiers are then combined together onto an ensemble classifier by two different strategies of weighted sum and re-ranking. The results of the ensemble classifier show that these strategies can be effectively used to construct a single classifier that can successfully handle varying facial image conditions of illumination, aging and facial expressions.
The use of Lanczos's method to solve the large generalized symmetric definite eigenvalue problem

NASA Technical Reports Server (NTRS)

Jones, Mark T.; Patrick, Merrell L.

1989-01-01

The generalized eigenvalue problem, Kx = Lambda Mx, is of significant practical importance, especially in structural enginering where it arises as the vibration and buckling problem. A new algorithm, LANZ, based on Lanczos's method is developed. LANZ uses a technique called dynamic shifting to improve the efficiency and reliability of the Lanczos algorithm. A new algorithm for solving the tridiagonal matrices that arise when using Lanczos's method is described. A modification of Parlett and Scott's selective orthogonalization algorithm is proposed. Results from an implementation of LANZ on a Convex C-220 show it to be superior to a subspace iteration code.
Faster than classical quantum algorithm for dense formulas of exact satisfiability and occupation problems

NASA Astrophysics Data System (ADS)

Mandrà, Salvatore; Giacomo Guerreschi, Gian; Aspuru-Guzik, Alán

2016-07-01

We present an exact quantum algorithm for solving the Exact Satisfiability problem, which belongs to the important NP-complete complexity class. The algorithm is based on an intuitive approach that can be divided into two parts: the first step consists in the identification and efficient characterization of a restricted subspace that contains all the valid assignments of the Exact Satisfiability; while the second part performs a quantum search in such restricted subspace. The quantum algorithm can be used either to find a valid assignment (or to certify that no solution exists) or to count the total number of valid assignments. The query complexities for the worst-case are respectively bounded by O(\\sqrt{{2}n-{M\\prime }}) and O({2}n-{M\\prime }), where n is the number of variables and {M}\\prime the number of linearly independent clauses. Remarkably, the proposed quantum algorithm results to be faster than any known exact classical algorithm to solve dense formulas of Exact Satisfiability. As a concrete application, we provide the worst-case complexity for the Hamiltonian cycle problem obtained after mapping it to a suitable Occupation problem. Specifically, we show that the time complexity for the proposed quantum algorithm is bounded by O({2}n/4) for 3-regular undirected graphs, where n is the number of nodes. The same worst-case complexity holds for (3,3)-regular bipartite graphs. As a reference, the current best classical algorithm has a (worst-case) running time bounded by O({2}31n/96). Finally, when compared to heuristic techniques for Exact Satisfiability problems, the proposed quantum algorithm is faster than the classical WalkSAT and Adiabatic Quantum Optimization for random instances with a density of constraints close to the satisfiability threshold, the regime in which instances are typically the hardest to solve. The proposed quantum algorithm can be straightforwardly extended to the generalized version of the Exact Satisfiability known as Occupation problem. The general version of the algorithm is presented and analyzed.
A Subspace Pursuit–based Iterative Greedy Hierarchical Solution to the Neuromagnetic Inverse Problem

PubMed Central

Babadi, Behtash; Obregon-Henao, Gabriel; Lamus, Camilo; Hämäläinen, Matti S.; Brown, Emery N.; Purdon, Patrick L.

2013-01-01

Magnetoencephalography (MEG) is an important non-invasive method for studying activity within the human brain. Source localization methods can be used to estimate spatiotemporal activity from MEG measurements with high temporal resolution, but the spatial resolution of these estimates is poor due to the ill-posed nature of the MEG inverse problem. Recent developments in source localization methodology have emphasized temporal as well as spatial constraints to improve source localization accuracy, but these methods can be computationally intense. Solutions emphasizing spatial sparsity hold tremendous promise, since the underlying neurophysiological processes generating MEG signals are often sparse in nature, whether in the form of focal sources, or distributed sources representing large-scale functional networks. Recent developments in the theory of compressed sensing (CS) provide a rigorous framework to estimate signals with sparse structure. In particular, a class of CS algorithms referred to as greedy pursuit algorithms can provide both high recovery accuracy and low computational complexity. Greedy pursuit algorithms are difficult to apply directly to the MEG inverse problem because of the high-dimensional structure of the MEG source space and the high spatial correlation in MEG measurements. In this paper, we develop a novel greedy pursuit algorithm for sparse MEG source localization that overcomes these fundamental problems. This algorithm, which we refer to as the Subspace Pursuit-based Iterative Greedy Hierarchical (SPIGH) inverse solution, exhibits very low computational complexity while achieving very high localization accuracy. We evaluate the performance of the proposed algorithm using comprehensive simulations, as well as the analysis of human MEG data during spontaneous brain activity and somatosensory stimuli. These studies reveal substantial performance gains provided by the SPIGH algorithm in terms of computational complexity, localization accuracy, and robustness. PMID:24055554

Decomposition of Near-Infrared Spectroscopy Signals Using Oblique Subspace Projections: Applications in Brain Hemodynamic Monitoring.

PubMed

Caicedo, Alexander; Varon, Carolina; Hunyadi, Borbala; Papademetriou, Maria; Tachtsidis, Ilias; Van Huffel, Sabine

2016-01-01

Clinical data is comprised by a large number of synchronously collected biomedical signals that are measured at different locations. Deciphering the interrelationships of these signals can yield important information about their dependence providing some useful clinical diagnostic data. For instance, by computing the coupling between Near-Infrared Spectroscopy signals (NIRS) and systemic variables the status of the hemodynamic regulation mechanisms can be assessed. In this paper we introduce an algorithm for the decomposition of NIRS signals into additive components. The algorithm, SIgnal DEcomposition base on Obliques Subspace Projections (SIDE-ObSP), assumes that the measured NIRS signal is a linear combination of the systemic measurements, following the linear regression model y = Ax + ϵ . SIDE-ObSP decomposes the output such that, each component in the decomposition represents the sole linear influence of one corresponding regressor variable. This decomposition scheme aims at providing a better understanding of the relation between NIRS and systemic variables, and to provide a framework for the clinical interpretation of regression algorithms, thereby, facilitating their introduction into clinical practice. SIDE-ObSP combines oblique subspace projections (ObSP) with the structure of a mean average system in order to define adequate signal subspaces. To guarantee smoothness in the estimated regression parameters, as observed in normal physiological processes, we impose a Tikhonov regularization using a matrix differential operator. We evaluate the performance of SIDE-ObSP by using a synthetic dataset, and present two case studies in the field of cerebral hemodynamics monitoring using NIRS. In addition, we compare the performance of this method with other system identification techniques. In the first case study data from 20 neonates during the first 3 days of life was used, here SIDE-ObSP decoupled the influence of changes in arterial oxygen saturation from the NIRS measurements, facilitating the use of NIRS as a surrogate measure for cerebral blood flow (CBF). The second case study used data from a 3-years old infant under Extra Corporeal Membrane Oxygenation (ECMO), here SIDE-ObSP decomposed cerebral/peripheral tissue oxygenation, as a sum of the partial contributions from different systemic variables, facilitating the comparison between the effects of each systemic variable on the cerebral/peripheral hemodynamics.
A Neural-Network Clustering-Based Algorithm for Privacy Preserving Data Mining

NASA Astrophysics Data System (ADS)

Tsiafoulis, S.; Zorkadis, V. C.; Karras, D. A.

The increasing use of fast and efficient data mining algorithms in huge collections of personal data, facilitated through the exponential growth of technology, in particular in the field of electronic data storage media and processing power, has raised serious ethical, philosophical and legal issues related to privacy protection. To cope with these concerns, several privacy preserving methodologies have been proposed, classified in two categories, methodologies that aim at protecting the sensitive data and those that aim at protecting the mining results. In our work, we focus on sensitive data protection and compare existing techniques according to their anonymity degree achieved, the information loss suffered and their performance characteristics. The ℓ-diversity principle is combined with k-anonymity concepts, so that background information can not be exploited to successfully attack the privacy of data subjects data refer to. Based on Kohonen Self Organizing Feature Maps (SOMs), we firstly organize data sets in subspaces according to their information theoretical distance to each other, then create the most relevant classes paying special attention to rare sensitive attribute values, and finally generalize attribute values to the minimum extend required so that both the data disclosure probability and the information loss are possibly kept negligible. Furthermore, we propose information theoretical measures for assessing the anonymity degree achieved and empirical tests to demonstrate it.
Cluster analysis of multiple planetary flow regimes

NASA Technical Reports Server (NTRS)

Mo, Kingtse; Ghil, Michael

1988-01-01

A modified cluster analysis method developed for the classification of quasi-stationary events into a few planetary flow regimes and for the examination of transitions between these regimes is described. The method was applied first to a simple deterministic model and then to a 500-mbar data set for Northern Hemisphere (NH), for which cluster analysis was carried out in the subspace of the first seven empirical orthogonal functions (EOFs). Stationary clusters were found in the low-frequency band of more than 10 days, while transient clusters were found in the band-pass frequency window between 2.5 and 6 days. In the low-frequency band, three pairs of clusters determined EOFs 1, 2, and 3, respectively; they exhibited well-known regional features, such as blocking, the Pacific/North American pattern, and wave trains. Both model and low-pass data exhibited strong bimodality.
Generalizing the self-healing diffusion Monte Carlo approach to finite temperature: A path for the optimization of low-energy many-body bases

DOE Office of Scientific and Technical Information (OSTI.GOV)

Reboredo, Fernando A.; Kim, Jeongnim

A statistical method is derived for the calculation of thermodynamic properties of many-body systems at low temperatures. This method is based on the self-healing diffusion Monte Carlo method for complex functions [F. A. Reboredo, J. Chem. Phys. 136, 204101 (2012)] and some ideas of the correlation function Monte Carlo approach [D. M. Ceperley and B. Bernu, J. Chem. Phys. 89, 6316 (1988)]. In order to allow the evolution in imaginary time to describe the density matrix, we remove the fixed-node restriction using complex antisymmetric guiding wave functions. In the process we obtain a parallel algorithm that optimizes a small subspacemore » of the many-body Hilbert space to provide maximum overlap with the subspace spanned by the lowest-energy eigenstates of a many-body Hamiltonian. We show in a model system that the partition function is progressively maximized within this subspace. We show that the subspace spanned by the small basis systematically converges towards the subspace spanned by the lowest energy eigenstates. Possible applications of this method for calculating the thermodynamic properties of many-body systems near the ground state are discussed. The resulting basis can also be used to accelerate the calculation of the ground or excited states with quantum Monte Carlo.« less
Extraction of process zones and low-dimensional attractive subspaces in stochastic fracture mechanics

PubMed Central

Kerfriden, P.; Schmidt, K.M.; Rabczuk, T.; Bordas, S.P.A.

2013-01-01

We propose to identify process zones in heterogeneous materials by tailored statistical tools. The process zone is redefined as the part of the structure where the random process cannot be correctly approximated in a low-dimensional deterministic space. Such a low-dimensional space is obtained by a spectral analysis performed on pre-computed solution samples. A greedy algorithm is proposed to identify both process zone and low-dimensional representative subspace for the solution in the complementary region. In addition to the novelty of the tools proposed in this paper for the analysis of localised phenomena, we show that the reduced space generated by the method is a valid basis for the construction of a reduced order model. PMID:27069423
SIMULTANEOUS MULTISLICE MAGNETIC RESONANCE FINGERPRINTING WITH LOW-RANK AND SUBSPACE MODELING

PubMed Central

Zhao, Bo; Bilgic, Berkin; Adalsteinsson, Elfar; Griswold, Mark A.; Wald, Lawrence L.; Setsompop, Kawin

2018-01-01

Magnetic resonance fingerprinting (MRF) is a new quantitative imaging paradigm that enables simultaneous acquisition of multiple magnetic resonance tissue parameters (e.g., T1, T2, and spin density). Recently, MRF has been integrated with simultaneous multislice (SMS) acquisitions to enable volumetric imaging with faster scan time. In this paper, we present a new image reconstruction method based on low-rank and subspace modeling for improved SMS-MRF. Here the low-rank model exploits strong spatiotemporal correlation among contrast-weighted images, while the subspace model captures the temporal evolution of magnetization dynamics. With the proposed model, the image reconstruction problem is formulated as a convex optimization problem, for which we develop an algorithm based on variable splitting and the alternating direction method of multipliers. The performance of the proposed method has been evaluated by numerical experiments, and the results demonstrate that the proposed method leads to improved accuracy over the conventional approach. Practically, the proposed method has a potential to allow for a 3x speedup with minimal reconstruction error, resulting in less than 5 sec imaging time per slice. PMID:29060594
Stochastic subspace identification for operational modal analysis of an arch bridge

NASA Astrophysics Data System (ADS)

Loh, Chin-Hsiung; Chen, Ming-Che; Chao, Shu-Hsien

2012-04-01

In this paer the application of output-only system identification technique, known as Stochastic Subspace Identification (SSI) algorithms, for civil infrastructures is carried out. The ability of covariance driven stochastic subspace identification (SSI-COV) was proved through the analysis of the ambient data of an arch bridge under operational condition. A newly developed signal processing technique, Singular Spectrum analysis (SSA), capable to smooth noisy signals, is adopted for pre-processing the recorded data before the SSI. The conjunction of SSA and SSICOV provides a useful criterion for the system order determination. With the aim of estimating accurate modal parameters of the structure in off-line analysis, a stabilization diagram is constructed by plotting the identified poles of the system with increasing the size of data Hankel matrix. Identification task of a real structure, Guandu Bridge, is carried out to identify the system natural frequencies and mode shapes. The uncertainty of the identified model parameters from output-only measurement of the bridge under operation condition, such as temperature and traffic loading conditions, is discussed.
CLAss-Specific Subspace Kernel Representations and Adaptive Margin Slack Minimization for Large Scale Classification.

PubMed

Yu, Yinan; Diamantaras, Konstantinos I; McKelvey, Tomas; Kung, Sun-Yuan

2018-02-01

In kernel-based classification models, given limited computational power and storage capacity, operations over the full kernel matrix becomes prohibitive. In this paper, we propose a new supervised learning framework using kernel models for sequential data processing. The framework is based on two components that both aim at enhancing the classification capability with a subset selection scheme. The first part is a subspace projection technique in the reproducing kernel Hilbert space using a CLAss-specific Subspace Kernel representation for kernel approximation. In the second part, we propose a novel structural risk minimization algorithm called the adaptive margin slack minimization to iteratively improve the classification accuracy by an adaptive data selection. We motivate each part separately, and then integrate them into learning frameworks for large scale data. We propose two such frameworks: the memory efficient sequential processing for sequential data processing and the parallelized sequential processing for distributed computing with sequential data acquisition. We test our methods on several benchmark data sets and compared with the state-of-the-art techniques to verify the validity of the proposed techniques.
Development of Subspace-based Hybrid Monte Carlo-Deterministric Algorithms for Reactor Physics Calculations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Abdel-Khalik, Hany S.; Zhang, Qiong

2014-05-20

The development of hybrid Monte-Carlo-Deterministic (MC-DT) approaches, taking place over the past few decades, have primarily focused on shielding and detection applications where the analysis requires a small number of responses, i.e. at the detector locations(s). This work further develops a recently introduced global variance reduction approach, denoted by the SUBSPACE approach is designed to allow the use of MC simulation, currently limited to benchmarking calculations, for routine engineering calculations. By way of demonstration, the SUBSPACE approach is applied to assembly level calculations used to generate the few-group homogenized cross-sections. These models are typically expensive and need to be executedmore » in the order of 10 3 - 10 5 times to properly characterize the few-group cross-sections for downstream core-wide calculations. Applicability to k-eigenvalue core-wide models is also demonstrated in this work. Given the favorable results obtained in this work, we believe the applicability of the MC method for reactor analysis calculations could be realized in the near future.« less
Simultaneous multislice magnetic resonance fingerprinting with low-rank and subspace modeling.

PubMed

Bo Zhao; Bilgic, Berkin; Adalsteinsson, Elfar; Griswold, Mark A; Wald, Lawrence L; Setsompop, Kawin

2017-07-01

Magnetic resonance fingerprinting (MRF) is a new quantitative imaging paradigm that enables simultaneous acquisition of multiple magnetic resonance tissue parameters (e.g., T 1 , T 2 , and spin density). Recently, MRF has been integrated with simultaneous multislice (SMS) acquisitions to enable volumetric imaging with faster scan time. In this paper, we present a new image reconstruction method based on low-rank and subspace modeling for improved SMS-MRF. Here the low-rank model exploits strong spatiotemporal correlation among contrast-weighted images, while the subspace model captures the temporal evolution of magnetization dynamics. With the proposed model, the image reconstruction problem is formulated as a convex optimization problem, for which we develop an algorithm based on variable splitting and the alternating direction method of multipliers. The performance of the proposed method has been evaluated by numerical experiments, and the results demonstrate that the proposed method leads to improved accuracy over the conventional approach. Practically, the proposed method has a potential to allow for a 3× speedup with minimal reconstruction error, resulting in less than 5 sec imaging time per slice.
Conditional Subspace Clustering of Skill Mastery: Identifying Skills that Separate Students

ERIC Educational Resources Information Center

Nugent, Rebecca; Ayers, Elizabeth; Dean, Nema

2009-01-01

In educational research, a fundamental goal is identifying which skills students have mastered, which skills they have not, and which skills they are in the process of mastering. As the number of examinees, items, and skills increases, the estimation of even simple cognitive diagnosis models becomes difficult. We adopt a faster, simpler approach:…
Implementation details of the coupled QMR algorithm

NASA Technical Reports Server (NTRS)

Freund, Roland W.; Nachtigal, Noel M.

1992-01-01

The original quasi-minimal residual method (QMR) relies on the three-term look-ahead Lanczos process, to generate basis vectors for the underlying Krylov subspaces. However, empirical observations indicate that, in finite precision arithmetic, three-term vector recurrences are less robust than mathematically equivalent coupled two-term recurrences. Therefore, we recently proposed a new implementation of the QMR method based on a coupled two-term look-ahead Lanczos procedure. In this paper, we describe implementation details of this coupled QMR algorithm, and we present results of numerical experiments.
A combined joint diagonalization-MUSIC algorithm for subsurface targets localization

NASA Astrophysics Data System (ADS)

Wang, Yinlin; Sigman, John B.; Barrowes, Benjamin E.; O'Neill, Kevin; Shubitidze, Fridon

2014-06-01

This paper presents a combined joint diagonalization (JD) and multiple signal classification (MUSIC) algorithm for estimating subsurface objects locations from electromagnetic induction (EMI) sensor data, without solving ill-posed inverse-scattering problems. JD is a numerical technique that finds the common eigenvectors that diagonalize a set of multistatic response (MSR) matrices measured by a time-domain EMI sensor. Eigenvalues from targets of interest (TOI) can be then distinguished automatically from noise-related eigenvalues. Filtering is also carried out in JD to improve the signal-to-noise ratio (SNR) of the data. The MUSIC algorithm utilizes the orthogonality between the signal and noise subspaces in the MSR matrix, which can be separated with information provided by JD. An array of theoreticallycalculated Green's functions are then projected onto the noise subspace, and the location of the target is estimated by the minimum of the projection owing to the orthogonality. This combined method is applied to data from the Time-Domain Electromagnetic Multisensor Towed Array Detection System (TEMTADS). Examples of TEMTADS test stand data and field data collected at Spencer Range, Tennessee are analyzed and presented. Results indicate that due to its noniterative mechanism, the method can be executed fast enough to provide real-time estimation of objects' locations in the field.
Inverse regression-based uncertainty quantification algorithms for high-dimensional models: Theory and practice

DOE Office of Scientific and Technical Information (OSTI.GOV)

Li, Weixuan; Lin, Guang; Li, Bing

2016-09-01

A well-known challenge in uncertainty quantification (UQ) is the "curse of dimensionality". However, many high-dimensional UQ problems are essentially low-dimensional, because the randomness of the quantity of interest (QoI) is caused only by uncertain parameters varying within a low-dimensional subspace, known as the sufficient dimension reduction (SDR) subspace. Motivated by this observation, we propose and demonstrate in this paper an inverse regression-based UQ approach (IRUQ) for high-dimensional problems. Specifically, we use an inverse regression procedure to estimate the SDR subspace and then convert the original problem to a low-dimensional one, which can be efficiently solved by building a response surface model such as a polynomial chaos expansion. The novelty and advantages of the proposed approach is seen in its computational efficiency and practicality. Comparing with Monte Carlo, the traditionally preferred approach for high-dimensional UQ, IRUQ with a comparable cost generally gives much more accurate solutions even for high-dimensional problems, and even when the dimension reduction is not exactly sufficient. Theoretically, IRUQ is proved to converge twice as fast as the approach it uses seeking the SDR subspace. For example, while a sliced inverse regression method converges to the SDR subspace at the rate ofmore » $$O(n^{-1/2})$$, the corresponding IRUQ converges at $$O(n^{-1})$$. IRUQ also provides several desired conveniences in practice. It is non-intrusive, requiring only a simulator to generate realizations of the QoI, and there is no need to compute the high-dimensional gradient of the QoI. Finally, error bars can be derived for the estimation results reported by IRUQ.« less
Coupled dimensionality reduction and classification for supervised and semi-supervised multilabel learning

PubMed Central

Gönen, Mehmet

2014-01-01

Coupled training of dimensionality reduction and classification is proposed previously to improve the prediction performance for single-label problems. Following this line of research, in this paper, we first introduce a novel Bayesian method that combines linear dimensionality reduction with linear binary classification for supervised multilabel learning and present a deterministic variational approximation algorithm to learn the proposed probabilistic model. We then extend the proposed method to find intrinsic dimensionality of the projected subspace using automatic relevance determination and to handle semi-supervised learning using a low-density assumption. We perform supervised learning experiments on four benchmark multilabel learning data sets by comparing our method with baseline linear dimensionality reduction algorithms. These experiments show that the proposed approach achieves good performance values in terms of hamming loss, average AUC, macro F1, and micro F1 on held-out test data. The low-dimensional embeddings obtained by our method are also very useful for exploratory data analysis. We also show the effectiveness of our approach in finding intrinsic subspace dimensionality and semi-supervised learning tasks. PMID:24532862
Coupled dimensionality reduction and classification for supervised and semi-supervised multilabel learning.

PubMed

Gönen, Mehmet

2014-03-01

Coupled training of dimensionality reduction and classification is proposed previously to improve the prediction performance for single-label problems. Following this line of research, in this paper, we first introduce a novel Bayesian method that combines linear dimensionality reduction with linear binary classification for supervised multilabel learning and present a deterministic variational approximation algorithm to learn the proposed probabilistic model. We then extend the proposed method to find intrinsic dimensionality of the projected subspace using automatic relevance determination and to handle semi-supervised learning using a low-density assumption. We perform supervised learning experiments on four benchmark multilabel learning data sets by comparing our method with baseline linear dimensionality reduction algorithms. These experiments show that the proposed approach achieves good performance values in terms of hamming loss, average AUC, macro F 1 , and micro F 1 on held-out test data. The low-dimensional embeddings obtained by our method are also very useful for exploratory data analysis. We also show the effectiveness of our approach in finding intrinsic subspace dimensionality and semi-supervised learning tasks.
Removal of EOG Artifacts from EEG Recordings Using Stationary Subspace Analysis

PubMed Central

Zeng, Hong; Song, Aiguo

2014-01-01

An effective approach is proposed in this paper to remove ocular artifacts from the raw EEG recording. The proposed approach first conducts the blind source separation on the raw EEG recording by the stationary subspace analysis (SSA) algorithm. Unlike the classic blind source separation algorithms, SSA is explicitly tailored to the understanding of distribution changes, where both the mean and the covariance matrix are taken into account. In addition, neither independency nor uncorrelation is required among the sources by SSA. Thereby, it can concentrate artifacts in fewer components than the representative blind source separation methods. Next, the components that are determined to be related to the ocular artifacts are projected back to be subtracted from EEG signals, producing the clean EEG data eventually. The experimental results on both the artificially contaminated EEG data and real EEG data have demonstrated the effectiveness of the proposed method, in particular for the cases where limited number of electrodes are used for the recording, as well as when the artifact contaminated signal is highly nonstationary and the underlying sources cannot be assumed to be independent or uncorrelated. PMID:24550696
An Efficient Distributed Compressed Sensing Algorithm for Decentralized Sensor Network.

PubMed

Liu, Jing; Huang, Kaiyu; Zhang, Guoxian

2017-04-20

We consider the joint sparsity Model 1 (JSM-1) in a decentralized scenario, where a number of sensors are connected through a network and there is no fusion center. A novel algorithm, named distributed compact sensing matrix pursuit (DCSMP), is proposed to exploit the computational and communication capabilities of the sensor nodes. In contrast to the conventional distributed compressed sensing algorithms adopting a random sensing matrix, the proposed algorithm focuses on the deterministic sensing matrices built directly on the real acquisition systems. The proposed DCSMP algorithm can be divided into two independent parts, the common and innovation support set estimation processes. The goal of the common support set estimation process is to obtain an estimated common support set by fusing the candidate support set information from an individual node and its neighboring nodes. In the following innovation support set estimation process, the measurement vector is projected into a subspace that is perpendicular to the subspace spanned by the columns indexed by the estimated common support set, to remove the impact of the estimated common support set. We can then search the innovation support set using an orthogonal matching pursuit (OMP) algorithm based on the projected measurement vector and projected sensing matrix. In the proposed DCSMP algorithm, the process of estimating the common component/support set is decoupled with that of estimating the innovation component/support set. Thus, the inaccurately estimated common support set will have no impact on estimating the innovation support set. It is proven that under the condition the estimated common support set contains the true common support set, the proposed algorithm can find the true innovation set correctly. Moreover, since the innovation support set estimation process is independent of the common support set estimation process, there is no requirement for the cardinality of both sets; thus, the proposed DCSMP algorithm is capable of tackling the unknown sparsity problem successfully.
Beating the curse of dimension with accurate statistics for the Fokker-Planck equation in complex turbulent systems.

PubMed

Chen, Nan; Majda, Andrew J

2017-12-05

Solving the Fokker-Planck equation for high-dimensional complex dynamical systems is an important issue. Recently, the authors developed efficient statistically accurate algorithms for solving the Fokker-Planck equations associated with high-dimensional nonlinear turbulent dynamical systems with conditional Gaussian structures, which contain many strong non-Gaussian features such as intermittency and fat-tailed probability density functions (PDFs). The algorithms involve a hybrid strategy with a small number of samples [Formula: see text], where a conditional Gaussian mixture in a high-dimensional subspace via an extremely efficient parametric method is combined with a judicious Gaussian kernel density estimation in the remaining low-dimensional subspace. In this article, two effective strategies are developed and incorporated into these algorithms. The first strategy involves a judicious block decomposition of the conditional covariance matrix such that the evolutions of different blocks have no interactions, which allows an extremely efficient parallel computation due to the small size of each individual block. The second strategy exploits statistical symmetry for a further reduction of [Formula: see text] The resulting algorithms can efficiently solve the Fokker-Planck equation with strongly non-Gaussian PDFs in much higher dimensions even with orders in the millions and thus beat the curse of dimension. The algorithms are applied to a [Formula: see text]-dimensional stochastic coupled FitzHugh-Nagumo model for excitable media. An accurate recovery of both the transient and equilibrium non-Gaussian PDFs requires only [Formula: see text] samples! In addition, the block decomposition facilitates the algorithms to efficiently capture the distinct non-Gaussian features at different locations in a [Formula: see text]-dimensional two-layer inhomogeneous Lorenz 96 model, using only [Formula: see text] samples. Copyright © 2017 the Author(s). Published by PNAS.
ANNIT - An Efficient Inversion Algorithm based on Prediction Principles

NASA Astrophysics Data System (ADS)

Růžek, B.; Kolář, P.

2009-04-01

Solution of inverse problems represents meaningful job in geophysics. The amount of data is continuously increasing, methods of modeling are being improved and the computer facilities are also advancing great technical progress. Therefore the development of new and efficient algorithms and computer codes for both forward and inverse modeling is still up to date. ANNIT is contributing to this stream since it is a tool for efficient solution of a set of non-linear equations. Typical geophysical problems are based on parametric approach. The system is characterized by a vector of parameters p, the response of the system is characterized by a vector of data d. The forward problem is usually represented by unique mapping F(p)=d. The inverse problem is much more complex and the inverse mapping p=G(d) is available in an analytical or closed form only exceptionally and generally it may not exist at all. Technically, both forward and inverse mapping F and G are sets of non-linear equations. ANNIT solves such situation as follows: (i) joint subspaces {pD, pM} of original data and model spaces D, M, resp. are searched for, within which the forward mapping F is sufficiently smooth that the inverse mapping G does exist, (ii) numerical approximation of G in subspaces {pD, pM} is found, (iii) candidate solution is predicted by using this numerical approximation. ANNIT is working in an iterative way in cycles. The subspaces {pD, pM} are searched for by generating suitable populations of individuals (models) covering data and model spaces. The approximation of the inverse mapping is made by using three methods: (a) linear regression, (b) Radial Basis Function Network technique, (c) linear prediction (also known as "Kriging"). The ANNIT algorithm has built in also an archive of already evaluated models. Archive models are re-used in a suitable way and thus the number of forward evaluations is minimized. ANNIT is now implemented both in MATLAB and SCILAB. Numerical tests show good performance of the algorithm. Both versions and documentation are available on Internet and anybody can download them. The goal of this presentation is to offer the algorithm and computer codes for anybody interested in the solution to inverse problems.

An efficient shooting algorithm for Evans function calculations in large systems

NASA Astrophysics Data System (ADS)

Humpherys, Jeffrey; Zumbrun, Kevin

2006-08-01

In Evans function computations of the spectra of asymptotically constant-coefficient linear operators, a basic issue is the efficient and numerically stable computation of subspaces evolving according to the associated eigenvalue ODE. For small systems, a fast, shooting algorithm may be obtained by representing subspaces as single exterior products [J.C. Alexander, R. Sachs, Linear instability of solitary waves of a Boussinesq-type equation: A computer assisted computation, Nonlinear World 2 (4) (1995) 471-507; L.Q. Brin, Numerical testing of the stability of viscous shock waves, Ph.D. Thesis, Indiana University, Bloomington, 1998; L.Q. Brin, Numerical testing of the stability of viscous shock waves, Math. Comp. 70 (235) (2001) 1071-1088; L.Q. Brin, K. Zumbrun, Analytically varying eigenvectors and the stability of viscous shock waves, in: Seventh Workshop on Partial Differential Equations, Part I, 2001, Rio de Janeiro, Mat. Contemp. 22 (2002) 19-32; T.J. Bridges, G. Derks, G. Gottwald, Stability and instability of solitary waves of the fifth-order KdV equation: A numerical framework, Physica D 172 (1-4) (2002) 190-216]. For large systems, however, the dimension of the exterior-product space quickly becomes prohibitive, growing as (n/k), where n is the dimension of the system written as a first-order ODE and k (typically ˜n/2) is the dimension of the subspace. We resolve this difficulty by the introduction of a simple polar coordinate algorithm representing “pure” (monomial) products as scalar multiples of orthonormal bases, for which the angular equation is a numerically optimized version of the continuous orthogonalization method of Drury-Davey [A. Davey, An automatic orthonormalization method for solving stiff boundary value problems, J. Comput. Phys. 51 (2) (1983) 343-356; L.O. Drury, Numerical solution of Orr-Sommerfeld-type equations, J. Comput. Phys. 37 (1) (1980) 133-139] and the radial equation is evaluable by quadrature. Notably, the polar-coordinate method preserves the important property of analyticity with respect to parameters.
Linear Subspace Ranking Hashing for Cross-Modal Retrieval.

PubMed

Li, Kai; Qi, Guo-Jun; Ye, Jun; Hua, Kien A

2017-09-01

Hashing has attracted a great deal of research in recent years due to its effectiveness for the retrieval and indexing of large-scale high-dimensional multimedia data. In this paper, we propose a novel ranking-based hashing framework that maps data from different modalities into a common Hamming space where the cross-modal similarity can be measured using Hamming distance. Unlike existing cross-modal hashing algorithms where the learned hash functions are binary space partitioning functions, such as the sign and threshold function, the proposed hashing scheme takes advantage of a new class of hash functions closely related to rank correlation measures which are known to be scale-invariant, numerically stable, and highly nonlinear. Specifically, we jointly learn two groups of linear subspaces, one for each modality, so that features' ranking orders in different linear subspaces maximally preserve the cross-modal similarities. We show that the ranking-based hash function has a natural probabilistic approximation which transforms the original highly discontinuous optimization problem into one that can be efficiently solved using simple gradient descent algorithms. The proposed hashing framework is also flexible in the sense that the optimization procedures are not tied up to any specific form of loss function, which is typical for existing cross-modal hashing methods, but rather we can flexibly accommodate different loss functions with minimal changes to the learning steps. We demonstrate through extensive experiments on four widely-used real-world multimodal datasets that the proposed cross-modal hashing method can achieve competitive performance against several state-of-the-arts with only moderate training and testing time.
Application of Subspace Detection to the 6 November 2011 M5.6 Prague, Oklahoma Aftershock Sequence

NASA Astrophysics Data System (ADS)

McMahon, N. D.; Benz, H.; Johnson, C. E.; Aster, R. C.; McNamara, D. E.

2015-12-01

Subspace detection is a powerful tool for the identification of small seismic events. Subspace detectors improve upon single-event matched filtering techniques by using multiple orthogonal waveform templates whose linear combinations characterize a range of observed signals from previously identified earthquakes. Subspace detectors running on multiple stations can significantly increasing the number of locatable events, lowering the catalog's magnitude of completeness and thus providing extraordinary detail on the kinematics of the aftershock process. The 6 November 2011 M5.6 earthquake near Prague, Oklahoma is the largest earthquake instrumentally recorded in Oklahoma history and the largest earthquake resultant from deep wastewater injection. A M4.8 foreshock on 5 November 2011 and the M5.6 mainshock triggered tens of thousands of detectable aftershocks along a 20 km splay of the Wilzetta Fault Zone known as the Meeker-Prague fault. In response to this unprecedented earthquake, 21 temporary seismic stations were deployed surrounding the seismic activity. We utilized a catalog of 767 previously located aftershocks to construct subspace detectors for the 21 temporary and 10 closest permanent seismic stations. Subspace detection identified more than 500,000 new arrival-time observations, which associated into more than 20,000 locatable earthquakes. The associated earthquakes were relocated using the Bayesloc multiple-event locator, resulting in ~7,000 earthquakes with hypocentral uncertainties of less than 500 m. The relocated seismicity provides unique insight into the spatio-temporal evolution of the aftershock sequence along the Wilzetta Fault Zone and its associated structures. We find that the crystalline basement and overlying sedimentary Arbuckle formation accommodate the majority of aftershocks. While we observe aftershocks along the entire 20 km length of the Meeker-Prague fault, the vast majority of earthquakes were confined to a 9 km wide by 9 km deep surface striking N54°E and dipping 83° to the northwest near the junction of the splay with the main Wilzetta fault structure. Relocated seismicity shows off-fault stress-related interaction to distances of 10 km or more from the mainshock, including clustered seismicity to the northwest and southeast of the mainshock.
Superresolution Modeling of Calcium Release in the Heart

PubMed Central

Walker, Mark A.; Williams, George S.B.; Kohl, Tobias; Lehnart, Stephan E.; Jafri, M. Saleet; Greenstein, Joseph L.; Lederer, W.J.; Winslow, Raimond L.

2014-01-01

Stable calcium-induced calcium release (CICR) is critical for maintaining normal cellular contraction during cardiac excitation-contraction coupling. The fundamental element of CICR in the heart is the calcium (Ca2+) spark, which arises from a cluster of ryanodine receptors (RyR). Opening of these RyR clusters is triggered to produce a local, regenerative release of Ca2+ from the sarcoplasmic reticulum (SR). The Ca2+ leak out of the SR is an important process for cellular Ca2+ management, and it is critically influenced by spark fidelity, i.e., the probability that a spontaneous RyR opening triggers a Ca2+ spark. Here, we present a detailed, three-dimensional model of a cardiac Ca2+ release unit that incorporates diffusion, intracellular buffering systems, and stochastically gated ion channels. The model exhibits realistic Ca2+ sparks and robust Ca2+ spark termination across a wide range of geometries and conditions. Furthermore, the model captures the details of Ca2+ spark and nonspark-based SR Ca2+ leak, and it produces normal excitation-contraction coupling gain. We show that SR luminal Ca2+-dependent regulation of the RyR is not critical for spark termination, but it can explain the exponential rise in the SR Ca2+ leak-load relationship demonstrated in previous experimental work. Perturbations to subspace dimensions, which have been observed in experimental models of disease, strongly alter Ca2+ spark dynamics. In addition, we find that the structure of RyR clusters also influences Ca2+ release properties due to variations in inter-RyR coupling via local subspace Ca2+ concentration ([Ca2+]ss). These results are illustrated for RyR clusters based on super-resolution stimulated emission depletion microscopy. Finally, we present a believed-novel approach by which the spark fidelity of a RyR cluster can be predicted from structural information of the cluster using the maximum eigenvalue of its adjacency matrix. These results provide critical insights into CICR dynamics in heart, under normal and pathological conditions. PMID:25517166
Subjective comparison and evaluation of speech enhancement algorithms

PubMed Central

Hu, Yi; Loizou, Philipos C.

2007-01-01

Making meaningful comparisons between the performance of the various speech enhancement algorithms proposed over the years, has been elusive due to lack of a common speech database, differences in the types of noise used and differences in the testing methodology. To facilitate such comparisons, we report on the development of a noisy speech corpus suitable for evaluation of speech enhancement algorithms. This corpus is subsequently used for the subjective evaluation of 13 speech enhancement methods encompassing four classes of algorithms: spectral subtractive, subspace, statistical-model based and Wiener-type algorithms. The subjective evaluation was performed by Dynastat, Inc. using the ITU-T P.835 methodology designed to evaluate the speech quality along three dimensions: signal distortion, noise distortion and overall quality. This paper reports the results of the subjective tests. PMID:18046463
Controller reduction by preserving impulse response energy

NASA Technical Reports Server (NTRS)

Craig, Roy R., Jr.; Su, Tzu-Jeng

1989-01-01

A model order reduction algorithm based on a Krylov recurrence formulation is developed to reduce order of controllers. The reduced-order controller is obtained by projecting the full-order LQG controller onto a Krylov subspace in which either the controllability or the observability grammian is equal to the identity matrix. The reduced-order controller preserves the impulse response energy of the full-order controller and has a parameter-matching property. Two numerical examples drawn from other controller reduction literature are used to illustrate the efficacy of the proposed reduction algorithm.
Novel angle estimation for bistatic MIMO radar using an improved MUSIC

NASA Astrophysics Data System (ADS)

Li, Jianfeng; Zhang, Xiaofei; Chen, Han

2014-09-01

In this article, we study the problem of angle estimation for bistatic multiple-input multiple-output (MIMO) radar and propose an improved multiple signal classification (MUSIC) algorithm for joint direction of departure (DOD) and direction of arrival (DOA) estimation. The proposed algorithm obtains initial estimations of angles obtained from the signal subspace and uses the local one-dimensional peak searches to achieve the joint estimations of DOD and DOA. The angle estimation performance of the proposed algorithm is better than that of estimation of signal parameters via rotational invariance techniques (ESPRIT) algorithm, and is almost the same as that of two-dimensional MUSIC. Furthermore, the proposed algorithm can be suitable for irregular array geometry, obtain automatically paired DOD and DOA estimations, and avoid two-dimensional peak searching. The simulation results verify the effectiveness and improvement of the algorithm.
Solving large-scale dynamic systems using band Lanczos method in Rockwell NASTRAN on CRAY X-MP

NASA Technical Reports Server (NTRS)

Gupta, V. K.; Zillmer, S. D.; Allison, R. E.

1986-01-01

The improved cost effectiveness using better models, more accurate and faster algorithms and large scale computing offers more representative dynamic analyses. The band Lanczos eigen-solution method was implemented in Rockwell's version of 1984 COSMIC-released NASTRAN finite element structural analysis computer program to effectively solve for structural vibration modes including those of large complex systems exceeding 10,000 degrees of freedom. The Lanczos vectors were re-orthogonalized locally using the Lanczos Method and globally using the modified Gram-Schmidt method for sweeping rigid-body modes and previously generated modes and Lanczos vectors. The truncated band matrix was solved for vibration frequencies and mode shapes using Givens rotations. Numerical examples are included to demonstrate the cost effectiveness and accuracy of the method as implemented in ROCKWELL NASTRAN. The CRAY version is based on RPK's COSMIC/NASTRAN. The band Lanczos method was more reliable and accurate and converged faster than the single vector Lanczos Method. The band Lanczos method was comparable to the subspace iteration method which was a block version of the inverse power method. However, the subspace matrix tended to be fully populated in the case of subspace iteration and not as sparse as a band matrix.
ERP denoising in multichannel EEG data using contrasts between signal and noise subspaces.

PubMed

Ivannikov, Andriy; Kalyakin, Igor; Hämäläinen, Jarmo; Leppänen, Paavo H T; Ristaniemi, Tapani; Lyytinen, Heikki; Kärkkäinen, Tommi

2009-06-15

In this paper, a new method intended for ERP denoising in multichannel EEG data is discussed. The denoising is done by separating ERP/noise subspaces in multidimensional EEG data by a linear transformation and the following dimension reduction by ignoring noise components during inverse transformation. The separation matrix is found based on the assumption that ERP sources are deterministic for all repetitions of the same type of stimulus within the experiment, while the other noise sources do not obey the determinancy property. A detailed derivation of the technique is given together with the analysis of the results of its application to a real high-density EEG data set. The interpretation of the results and the performance of the proposed method under conditions, when the basic assumptions are violated - e.g. the problem is underdetermined - are also discussed. Moreover, we study how the factors of the number of channels and trials used by the method influence the effectiveness of ERP/noise subspaces separation. In addition, we explore also the impact of different data resampling strategies on the performance of the considered algorithm. The results can help in determining the optimal parameters of the equipment/methods used to elicit and reliably estimate ERPs.
View subspaces for indexing and retrieval of 3D models

NASA Astrophysics Data System (ADS)

Dutagaci, Helin; Godil, Afzal; Sankur, Bülent; Yemez, Yücel

2010-02-01

View-based indexing schemes for 3D object retrieval are gaining popularity since they provide good retrieval results. These schemes are coherent with the theory that humans recognize objects based on their 2D appearances. The viewbased techniques also allow users to search with various queries such as binary images, range images and even 2D sketches. The previous view-based techniques use classical 2D shape descriptors such as Fourier invariants, Zernike moments, Scale Invariant Feature Transform-based local features and 2D Digital Fourier Transform coefficients. These methods describe each object independent of others. In this work, we explore data driven subspace models, such as Principal Component Analysis, Independent Component Analysis and Nonnegative Matrix Factorization to describe the shape information of the views. We treat the depth images obtained from various points of the view sphere as 2D intensity images and train a subspace to extract the inherent structure of the views within a database. We also show the benefit of categorizing shapes according to their eigenvalue spread. Both the shape categorization and data-driven feature set conjectures are tested on the PSB database and compared with the competitor view-based 3D shape retrieval algorithms.
PRIFIRA: General regularization using prior-conditioning for fast radio interferometric imaging†

NASA Astrophysics Data System (ADS)

Naghibzadeh, Shahrzad; van der Veen, Alle-Jan

2018-06-01

Image formation in radio astronomy is a large-scale inverse problem that is inherently ill-posed. We present a general algorithmic framework based on a Bayesian-inspired regularized maximum likelihood formulation of the radio astronomical imaging problem with a focus on diffuse emission recovery from limited noisy correlation data. The algorithm is dubbed PRIor-conditioned Fast Iterative Radio Astronomy (PRIFIRA) and is based on a direct embodiment of the regularization operator into the system by right preconditioning. The resulting system is then solved using an iterative method based on projections onto Krylov subspaces. We motivate the use of a beamformed image (which includes the classical "dirty image") as an efficient prior-conditioner. Iterative reweighting schemes generalize the algorithmic framework and can account for different regularization operators that encourage sparsity of the solution. The performance of the proposed method is evaluated based on simulated one- and two-dimensional array arrangements as well as actual data from the core stations of the Low Frequency Array radio telescope antenna configuration, and compared to state-of-the-art imaging techniques. We show the generality of the proposed method in terms of regularization schemes while maintaining a competitive reconstruction quality with the current reconstruction techniques. Furthermore, we show that exploiting Krylov subspace methods together with the proper noise-based stopping criteria results in a great improvement in imaging efficiency.
Sliding Window Generalized Kernel Affine Projection Algorithm Using Projection Mappings

NASA Astrophysics Data System (ADS)

Slavakis, Konstantinos; Theodoridis, Sergios

2008-12-01

Very recently, a solution to the kernel-based online classification problem has been given by the adaptive projected subgradient method (APSM). The developed algorithm can be considered as a generalization of a kernel affine projection algorithm (APA) and the kernel normalized least mean squares (NLMS). Furthermore, sparsification of the resulting kernel series expansion was achieved by imposing a closed ball (convex set) constraint on the norm of the classifiers. This paper presents another sparsification method for the APSM approach to the online classification task by generating a sequence of linear subspaces in a reproducing kernel Hilbert space (RKHS). To cope with the inherent memory limitations of online systems and to embed tracking capabilities to the design, an upper bound on the dimension of the linear subspaces is imposed. The underlying principle of the design is the notion of projection mappings. Classification is performed by metric projection mappings, sparsification is achieved by orthogonal projections, while the online system's memory requirements and tracking are attained by oblique projections. The resulting sparsification scheme shows strong similarities with the classical sliding window adaptive schemes. The proposed design is validated by the adaptive equalization problem of a nonlinear communication channel, and is compared with classical and recent stochastic gradient descent techniques, as well as with the APSM's solution where sparsification is performed by a closed ball constraint on the norm of the classifiers.
Comparative analysis of different weight matrices in subspace system identification for structural health monitoring

NASA Astrophysics Data System (ADS)

Shokravi, H.; Bakhary, NH

2017-11-01

Subspace System Identification (SSI) is considered as one of the most reliable tools for identification of system parameters. Performance of a SSI scheme is considerably affected by the structure of the associated identification algorithm. Weight matrix is a variable in SSI that is used to reduce the dimensionality of the state-space equation. Generally one of the weight matrices of Principle Component (PC), Unweighted Principle Component (UPC) and Canonical Variate Analysis (CVA) are used in the structure of a SSI algorithm. An increasing number of studies in the field of structural health monitoring are using SSI for damage identification. However, studies that evaluate the performance of the weight matrices particularly in association with accuracy, noise resistance, and time complexity properties are very limited. In this study, the accuracy, noise-robustness, and time-efficiency of the weight matrices are compared using different qualitative and quantitative metrics. Three evaluation metrics of pole analysis, fit values and elapsed time are used in the assessment process. A numerical model of a mass-spring-dashpot and operational data is used in this research paper. It is observed that the principal components obtained using PC algorithms are more robust against noise uncertainty and give more stable results for the pole distribution. Furthermore, higher estimation accuracy is achieved using UPC algorithm. CVA had the worst performance for pole analysis and time efficiency analysis. The superior performance of the UPC algorithm in the elapsed time is attributed to using unit weight matrices. The obtained results demonstrated that the process of reducing dimensionality in CVA and PC has not enhanced the time efficiency but yield an improved modal identification in PC.
Interactive machine learning for health informatics: when do we need the human-in-the-loop?

PubMed

Holzinger, Andreas

2016-06-01

Machine learning (ML) is the fastest growing field in computer science, and health informatics is among the greatest challenges. The goal of ML is to develop algorithms which can learn and improve over time and can be used for predictions. Most ML researchers concentrate on automatic machine learning (aML), where great advances have been made, for example, in speech recognition, recommender systems, or autonomous vehicles. Automatic approaches greatly benefit from big data with many training sets. However, in the health domain, sometimes we are confronted with a small number of data sets or rare events, where aML-approaches suffer of insufficient training samples. Here interactive machine learning (iML) may be of help, having its roots in reinforcement learning, preference learning, and active learning. The term iML is not yet well used, so we define it as "algorithms that can interact with agents and can optimize their learning behavior through these interactions, where the agents can also be human." This "human-in-the-loop" can be beneficial in solving computationally hard problems, e.g., subspace clustering, protein folding, or k-anonymization of health data, where human expertise can help to reduce an exponential search space through heuristic selection of samples. Therefore, what would otherwise be an NP-hard problem, reduces greatly in complexity through the input and the assistance of a human agent involved in the learning phase.
On Convergence of Extended Dynamic Mode Decomposition to the Koopman Operator

NASA Astrophysics Data System (ADS)

Korda, Milan; Mezić, Igor

2018-04-01

Extended dynamic mode decomposition (EDMD) (Williams et al. in J Nonlinear Sci 25(6):1307-1346, 2015) is an algorithm that approximates the action of the Koopman operator on an N-dimensional subspace of the space of observables by sampling at M points in the state space. Assuming that the samples are drawn either independently or ergodically from some measure μ , it was shown in Klus et al. (J Comput Dyn 3(1):51-79, 2016) that, in the limit as M→ ∞, the EDMD operator K_{N,M} converges to K_N, where K_N is the L_2(μ )-orthogonal projection of the action of the Koopman operator on the finite-dimensional subspace of observables. We show that, as N → ∞, the operator K_N converges in the strong operator topology to the Koopman operator. This in particular implies convergence of the predictions of future values of a given observable over any finite time horizon, a fact important for practical applications such as forecasting, estimation and control. In addition, we show that accumulation points of the spectra of K_N correspond to the eigenvalues of the Koopman operator with the associated eigenfunctions converging weakly to an eigenfunction of the Koopman operator, provided that the weak limit of the eigenfunctions is nonzero. As a by-product, we propose an analytic version of the EDMD algorithm which, under some assumptions, allows one to construct K_N directly, without the use of sampling. Finally, under additional assumptions, we analyze convergence of K_{N,N} (i.e., M=N), proving convergence, along a subsequence, to weak eigenfunctions (or eigendistributions) related to the eigenmeasures of the Perron-Frobenius operator. No assumptions on the observables belonging to a finite-dimensional invariant subspace of the Koopman operator are required throughout.
Accelerating the weighted histogram analysis method by direct inversion in the iterative subspace.

PubMed

Zhang, Cheng; Lai, Chun-Liang; Pettitt, B Montgomery

The weighted histogram analysis method (WHAM) for free energy calculations is a valuable tool to produce free energy differences with the minimal errors. Given multiple simulations, WHAM obtains from the distribution overlaps the optimal statistical estimator of the density of states, from which the free energy differences can be computed. The WHAM equations are often solved by an iterative procedure. In this work, we use a well-known linear algebra algorithm which allows for more rapid convergence to the solution. We find that the computational complexity of the iterative solution to WHAM and the closely-related multiple Bennett acceptance ratio (MBAR) method can be improved by using the method of direct inversion in the iterative subspace. We give examples from a lattice model, a simple liquid and an aqueous protein solution.
Using parallel banded linear system solvers in generalized eigenvalue problems

NASA Technical Reports Server (NTRS)

Zhang, Hong; Moss, William F.

1993-01-01

Subspace iteration is a reliable and cost effective method for solving positive definite banded symmetric generalized eigenproblems, especially in the case of large scale problems. This paper discusses an algorithm that makes use of two parallel banded solvers in subspace iteration. A shift is introduced to decompose the banded linear systems into relatively independent subsystems and to accelerate the iterations. With this shift, an eigenproblem is mapped efficiently into the memories of a multiprocessor and a high speed-up is obtained for parallel implementations. An optimal shift is a shift that balances total computation and communication costs. Under certain conditions, we show how to estimate an optimal shift analytically using the decay rate for the inverse of a banded matrix, and how to improve this estimate. Computational results on iPSC/2 and iPSC/860 multiprocessors are presented.
Heuristic approach to image registration

NASA Astrophysics Data System (ADS)

Gertner, Izidor; Maslov, Igor V.

2000-08-01

Image registration, i.e. correct mapping of images obtained from different sensor readings onto common reference frame, is a critical part of multi-sensor ATR/AOR systems based on readings from different types of sensors. In order to fuse two different sensor readings of the same object, the readings have to be put into a common coordinate system. This task can be formulated as optimization problem in a space of all possible affine transformations of an image. In this paper, a combination of heuristic methods is explored to register gray- scale images. The modification of Genetic Algorithm is used as the first step in global search for optimal transformation. It covers the entire search space with (randomly or heuristically) scattered probe points and helps significantly reduce the search space to a subspace of potentially most successful transformations. Due to its discrete character, however, Genetic Algorithm in general can not converge while coming close to the optimum. Its termination point can be specified either as some predefined number of generations or as achievement of a certain acceptable convergence level. To refine the search, potential optimal subspaces are searched using more delicate and efficient for local search Taboo and Simulated Annealing methods.
Decentralized system identification using stochastic subspace identification on wireless smart sensor networks

NASA Astrophysics Data System (ADS)

Sim, Sung-Han; Spencer, Billie F., Jr.; Park, Jongwoong; Jung, Hyungjo

2012-04-01

Wireless Smart Sensor Networks (WSSNs) facilitates a new paradigm to structural identification and monitoring for civil infrastructure. Conventional monitoring systems based on wired sensors and centralized data acquisition and processing have been considered to be challenging and costly due to cabling and expensive equipment and maintenance costs. WSSNs have emerged as a technology that can overcome such difficulties, making deployment of a dense array of sensors on large civil structures both feasible and economical. However, as opposed to wired sensor networks in which centralized data acquisition and processing is common practice, WSSNs require decentralized computing algorithms to reduce data transmission due to the limitation associated with wireless communication. Thus, several system identification methods have been implemented to process sensor data and extract essential information, including Natural Excitation Technique with Eigensystem Realization Algorithm, Frequency Domain Decomposition (FDD), and Random Decrement Technique (RDT); however, Stochastic Subspace Identification (SSI) has not been fully utilized in WSSNs, while SSI has the strong potential to enhance the system identification. This study presents a decentralized system identification using SSI in WSSNs. The approach is implemented on MEMSIC's Imote2 sensor platform and experimentally verified using a 5-story shear building model.
Separation and imaging diffractions by a sparsity-promoting model and subspace trust-region algorithm

NASA Astrophysics Data System (ADS)

Yu, Caixia; Zhao, Jingtao; Wang, Yanfei; Wang, Chengxiang; Geng, Weifeng

2017-03-01

The small-scale geologic inhomogeneities or discontinuities, such as tiny faults, cavities or fractures, generally have spatial scales comparable to or even smaller than the seismic wavelength. Therefore, the seismic responses of these objects are coded in diffractions and an attempt to high-resolution imaging can be made if we can appropriately image them. As the amplitudes of reflections can be several orders of magnitude larger than those of diffractions, one of the key problems of diffraction imaging is to suppress reflections and at the same time to preserve diffractions. A sparsity-promoting method for separating diffractions in the common-offset domain is proposed that uses the Kirchhoff integral formula to enforce the sparsity of diffractions and the linear Radon transform to formulate reflections. A subspace trust-region algorithm that can provide globally convergent solutions is employed for solving this large-scale computation problem. The method not only allows for separation of diffractions in the case of interfering events but also ensures a high fidelity of the separated diffractions. Numerical experiment and field application demonstrate the good performance of the proposed method in imaging the small-scale geological features related to the migration channel and storage spaces of carbonate reservoirs.

Development of Xi'an-CI package – applying the hole–particle symmetry in multi-reference electronic correlation calculations

NASA Astrophysics Data System (ADS)

Suo, Bingbing; Lei, Yibo; Han, Huixian; Wang, Yubin

2018-04-01

This mini-review introduces our works on the Xi'an-CI (configuration interaction) package using graphical unitary group approach (GUGA). Taking advantage of the hole-particle symmetry in GUGA, the Galfand states used to span the CI space are classified into CI subspaces according to the number of holes and particles, and the coupling coefficients used to calculate Hamiltonian matrix elements could be factorised into the segment factors in the hole, active and external spaces. An efficient multi-reference CI with single and double excitations (MRCISD) algorithm is thus developed that reduces the storage requirement and increases the number of correlated electrons significantly. The hole-particle symmetry also gives rise to a doubly contracted MRCISD approach. Moreover, the internally contracted Gelfand states are defined within the CI subspace arising from the hole-particle symmetry, which makes the implementation of internally contracted MRCISD in the framework of GUGA possible. In addition to MRCISD, the development of multi-reference second-order perturbation theory (MRPT2) also benefits from the hole-particle symmetry. A configuration-based MRPT2 algorithm is proposed and extended to the multi-state n-electron valence-state second-order perturbation theory.
Spatially Correlated Sparse MIMO Channel Path Delay Estimation in Scattering Environments Based on Signal Subspace Tracking

PubMed Central

Chargé, Pascal; Bazzi, Oussama; Ding, Yuehua

2018-01-01

A parametric scheme for spatially correlated sparse multiple-input multiple-output (MIMO) channel path delay estimation in scattering environments is presented in this paper. In MIMO outdoor communication scenarios, channel impulse responses (CIRs) of different transmit–receive antenna pairs are often supposed to be sparse due to a few significant scatterers, and share a common sparse pattern, such that path delays are assumed to be equal for every transmit–receive antenna pair. In some existing works, an exact common support condition is exploited, where the path delays are considered equal for every transmit–receive antenna pair, meanwhile ignoring the influence of scattering. A more realistic channel model is proposed in this paper, where due to scatterers in the environment, the received signals are modeled as clusters of multi-rays around a nominal or mean time delay at different antenna elements, resulting in a non-strictly exact common support phenomenon. A method for estimating the channel mean path delays is then derived based on the subspace approach, and the tracking of the effective dimension of the signal subspace that changes due to the wireless environment. The proposed method shows an improved channel mean path delays estimation performance in comparison with the conventional estimation methods. PMID:29734797
Spatially Correlated Sparse MIMO Channel Path Delay Estimation in Scattering Environments Based on Signal Subspace Tracking.

PubMed

Mohydeen, Ali; Chargé, Pascal; Wang, Yide; Bazzi, Oussama; Ding, Yuehua

2018-05-06

A parametric scheme for spatially correlated sparse multiple-input multiple-output (MIMO) channel path delay estimation in scattering environments is presented in this paper. In MIMO outdoor communication scenarios, channel impulse responses (CIRs) of different transmit⁻receive antenna pairs are often supposed to be sparse due to a few significant scatterers, and share a common sparse pattern, such that path delays are assumed to be equal for every transmit⁻receive antenna pair. In some existing works, an exact common support condition is exploited, where the path delays are considered equal for every transmit⁻receive antenna pair, meanwhile ignoring the influence of scattering. A more realistic channel model is proposed in this paper, where due to scatterers in the environment, the received signals are modeled as clusters of multi-rays around a nominal or mean time delay at different antenna elements, resulting in a non-strictly exact common support phenomenon. A method for estimating the channel mean path delays is then derived based on the subspace approach, and the tracking of the effective dimension of the signal subspace that changes due to the wireless environment. The proposed method shows an improved channel mean path delays estimation performance in comparison with the conventional estimation methods.
A hybrid approach to generating search subspaces in dynamically constrained 4-dimensional data assimilation

NASA Astrophysics Data System (ADS)

Yaremchuk, Max; Martin, Paul; Beattie, Christopher

2017-09-01

Development and maintenance of the linearized and adjoint code for advanced circulation models is a challenging issue, requiring a significant proportion of total effort in operational data assimilation (DA). The ensemble-based DA techniques provide a derivative-free alternative, which appears to be competitive with variational methods in many practical applications. This article proposes a hybrid scheme for generating the search subspaces in the adjoint-free 4-dimensional DA method (a4dVar) that does not use a predefined ensemble. The method resembles 4dVar in that the optimal solution is strongly constrained by model dynamics and search directions are supplied iteratively using information from the current and previous model trajectories generated in the process of optimization. In contrast to 4dVar, which produces a single search direction from exact gradient information, a4dVar employs an ensemble of directions to form a subspace in order to proceed. In the earlier versions of a4dVar, search subspaces were built using the leading EOFs of either the model trajectory or the projections of the model-data misfits onto the range of the background error covariance (BEC) matrix at the current iteration. In the present study, we blend both approaches and explore a hybrid scheme of ensemble generation in order to improve the performance and flexibility of the algorithm. In addition, we introduce balance constraints into the BEC structure and periodically augment the search ensemble with BEC eigenvectors to avoid repeating minimization over already explored subspaces. Performance of the proposed hybrid a4dVar (ha4dVar) method is compared with that of standard 4dVar in a realistic regional configuration assimilating real data into the Navy Coastal Ocean Model (NCOM). It is shown that the ha4dVar converges faster than a4dVar and can be potentially competitive with 4dvar both in terms of the required computational time and the forecast skill.
Improved Ant Colony Clustering Algorithm and Its Performance Study

PubMed Central

Gao, Wei

2016-01-01

Clustering analysis is used in many disciplines and applications; it is an important tool that descriptively identifies homogeneous groups of objects based on attribute values. The ant colony clustering algorithm is a swarm-intelligent method used for clustering problems that is inspired by the behavior of ant colonies that cluster their corpses and sort their larvae. A new abstraction ant colony clustering algorithm using a data combination mechanism is proposed to improve the computational efficiency and accuracy of the ant colony clustering algorithm. The abstraction ant colony clustering algorithm is used to cluster benchmark problems, and its performance is compared with the ant colony clustering algorithm and other methods used in existing literature. Based on similar computational difficulties and complexities, the results show that the abstraction ant colony clustering algorithm produces results that are not only more accurate but also more efficiently determined than the ant colony clustering algorithm and the other methods. Thus, the abstraction ant colony clustering algorithm can be used for efficient multivariate data clustering. PMID:26839533
Quantum search algorithms on a regular lattice

NASA Astrophysics Data System (ADS)

Hein, Birgit; Tanner, Gregor

2010-07-01

Quantum algorithms for searching for one or more marked items on a d-dimensional lattice provide an extension of Grover’s search algorithm including a spatial component. We demonstrate that these lattice search algorithms can be viewed in terms of the level dynamics near an avoided crossing of a one-parameter family of quantum random walks. We give approximations for both the level splitting at the avoided crossing and the effectively two-dimensional subspace of the full Hilbert space spanning the level crossing. This makes it possible to give the leading order behavior for the search time and the localization probability in the limit of large lattice size including the leading order coefficients. For d=2 and d=3, these coefficients are calculated explicitly. Closed form expressions are given for higher dimensions.
Using parallel evolutionary development for a biologically-inspired computer vision system for mobile robots.

PubMed

Wright, Cameron H G; Barrett, Steven F; Pack, Daniel J

2005-01-01

We describe a new approach to attacking the problem of robust computer vision for mobile robots. The overall strategy is to mimic the biological evolution of animal vision systems. Our basic imaging sensor is based upon the eye of the common house fly, Musca domestica. The computational algorithms are a mix of traditional image processing, subspace techniques, and multilayer neural networks.
Margin-Wide Earthquake Subspace Scanning Along the Cascadia Subduction Zone Using the Cascadia Initiative Amphibious Dataset

NASA Astrophysics Data System (ADS)

Morton, E.; Bilek, S. L.; Rowe, C. A.

2017-12-01

Understanding the spatial extent and behavior of the interplate contact in the Cascadia Subduction Zone (CSZ) may prove pivotal to preparation for future great earthquakes, such as the M9 event of 1700. Current and historic seismic catalogs are limited in their integrity by their short duration, given the recurrence rate of great earthquakes, and by their rather high magnitude of completeness for the interplate seismic zone, due to its offshore distance from these land-based networks. This issue is addressed via the 2011-2015 Cascadia Initiative (CI) amphibious seismic array deployment, which combined coastal land seismometers with more than 60 ocean-bottom seismometers (OBS) situated directly above the presumed plate interface. We search the CI dataset for small, previously undetected interplate earthquakes to identify seismic patches on the megathrust. Using the automated subspace detection method, we search for previously undetected events. Our subspace comprises eigenvectors derived from CI OBS and on-land waveforms extracted for existing catalog events that appear to have occurred on the plate interface. Previous work focused on analysis of two repeating event clusters off the coast of Oregon spanning all 4 years of deployment. Here we expand earlier results to include detection and location analysis to the entire CSZ margin during the first year of CI deployment, with more than 200 new events detected for the central portion of the margin. Template events used for subspace scanning primarily occurred beneath the land surface along the coast, at the downdip edge of modeled high slip patches for the 1700 event, with most concentrated at the northwestern edge of the Olympic Peninsula.
Updating Hawaii Seismicity Catalogs with Systematic Relocations and Subspace Detectors

NASA Astrophysics Data System (ADS)

Okubo, P.; Benz, H.; Matoza, R. S.; Thelen, W. A.

2015-12-01

We continue the systematic relocation of seismicity recorded in Hawai`i by the United States Geological Survey's (USGS) Hawaiian Volcano Observatory (HVO), with interests in adding to the products derived from the relocated seismicity catalogs published by Matoza et al., (2013, 2014). Another goal of this effort is updating the systematically relocated HVO catalog since 2009, when earthquake cataloging at HVO was migrated to the USGS Advanced National Seismic System Quake Management Software (AQMS) systems. To complement the relocation analyses of the catalogs generated from traditional STA/LTA event-triggered and analyst-reviewed approaches, we are also experimenting with subspace detection of events at Kilauea as a means to augment AQMS procedures for cataloging seismicity to lower magnitudes and during episodes of elevated volcanic activity. Our earlier catalog relocations have demonstrated the ability to define correlated or repeating families of earthquakes and provide more detailed definition of seismogenic structures, as well as the capability for improved automatic identification of diverse volcanic seismic sources. Subspace detectors have been successfully applied to cataloging seismicity in situations of low seismic signal-to-noise and have significantly increased catalog sensitivity to lower magnitude thresholds. We anticipate similar improvements using event subspace detections and cataloging of volcanic seismicity that include improved discrimination among not only evolving earthquake sequences but also diverse volcanic seismic source processes. Matoza et al., 2013, Systematic relocation of seismicity on Hawai`i Island from 1992 to 2009 using waveform cross correlation and cluster analysis, J. Geophys. Res., 118, 2275-2288, doi:10.1002/jgrb.580189 Matoza et al., 2014, High-precision relocation of long-period events beneath the summit region of Kīlauea Volcano, Hawai`i, from 1986 to 2009, Geophys. Res. Lett., 41, 3413-3421, doi:10.1002/2014GL059819
Gene selection for microarray data classification via subspace learning and manifold regularization.

PubMed

Tang, Chang; Cao, Lijuan; Zheng, Xiao; Wang, Minhui

2017-12-19

With the rapid development of DNA microarray technology, large amount of genomic data has been generated. Classification of these microarray data is a challenge task since gene expression data are often with thousands of genes but a small number of samples. In this paper, an effective gene selection method is proposed to select the best subset of genes for microarray data with the irrelevant and redundant genes removed. Compared with original data, the selected gene subset can benefit the classification task. We formulate the gene selection task as a manifold regularized subspace learning problem. In detail, a projection matrix is used to project the original high dimensional microarray data into a lower dimensional subspace, with the constraint that the original genes can be well represented by the selected genes. Meanwhile, the local manifold structure of original data is preserved by a Laplacian graph regularization term on the low-dimensional data space. The projection matrix can serve as an importance indicator of different genes. An iterative update algorithm is developed for solving the problem. Experimental results on six publicly available microarray datasets and one clinical dataset demonstrate that the proposed method performs better when compared with other state-of-the-art methods in terms of microarray data classification. Graphical Abstract The graphical abstract of this work.
Structured Kernel Subspace Learning for Autonomous Robot Navigation.

PubMed

Kim, Eunwoo; Choi, Sungjoon; Oh, Songhwai

2018-02-14

This paper considers two important problems for autonomous robot navigation in a dynamic environment, where the goal is to predict pedestrian motion and control a robot with the prediction for safe navigation. While there are several methods for predicting the motion of a pedestrian and controlling a robot to avoid incoming pedestrians, it is still difficult to safely navigate in a dynamic environment due to challenges, such as the varying quality and complexity of training data with unwanted noises. This paper addresses these challenges simultaneously by proposing a robust kernel subspace learning algorithm based on the recent advances in nuclear-norm and l 1 -norm minimization. We model the motion of a pedestrian and the robot controller using Gaussian processes. The proposed method efficiently approximates a kernel matrix used in Gaussian process regression by learning low-rank structured matrix (with symmetric positive semi-definiteness) to find an orthogonal basis, which eliminates the effects of erroneous and inconsistent data. Based on structured kernel subspace learning, we propose a robust motion model and motion controller for safe navigation in dynamic environments. We evaluate the proposed robust kernel learning in various tasks, including regression, motion prediction, and motion control problems, and demonstrate that the proposed learning-based systems are robust against outliers and outperform existing regression and navigation methods.
A novel complex networks clustering algorithm based on the core influence of nodes.

PubMed

Tong, Chao; Niu, Jianwei; Dai, Bin; Xie, Zhongyu

2014-01-01

In complex networks, cluster structure, identified by the heterogeneity of nodes, has become a common and important topological property. Network clustering methods are thus significant for the study of complex networks. Currently, many typical clustering algorithms have some weakness like inaccuracy and slow convergence. In this paper, we propose a clustering algorithm by calculating the core influence of nodes. The clustering process is a simulation of the process of cluster formation in sociology. The algorithm detects the nodes with core influence through their betweenness centrality, and builds the cluster's core structure by discriminant functions. Next, the algorithm gets the final cluster structure after clustering the rest of the nodes in the network by optimizing method. Experiments on different datasets show that the clustering accuracy of this algorithm is superior to the classical clustering algorithm (Fast-Newman algorithm). It clusters faster and plays a positive role in revealing the real cluster structure of complex networks precisely.
Solute transport with multiple equilibrium-controlled or kinetically controlled chemical reactions

USGS Publications Warehouse

Friedly, John C.; Rubin, Jacob

1992-01-01

A new approach is applied to the problem of modeling solute transport accompanied by many chemical reactions. The approach, based on concepts of the concentration space and its stoichiometric subspaces, uses elements of the subspaces as primary dependent variables. It is shown that the resulting model equations are compact in form, isolate the chemical reaction expressions from flow expressions, and can be used for either equilibrium or kinetically controlled reactions. The implications of the results on numerical algorithms for solving the equations are discussed. The application of the theory is illustrated throughout with examples involving a simple but broadly representative set of reactions previously considered in the literature. Numerical results are presented for four interconnected reactions: a homogeneous complexation reaction, two sorption reactions, and a dissolution/precipitation reaction. Three cases are considered: (1) four kinetically controlled reactions, (2) four equilibrium-controlled reactions, and (3) a system with two kinetically controlled reactions and two equilibrium-controlled reactions.
Krylov subspace methods on supercomputers

NASA Technical Reports Server (NTRS)

Saad, Youcef

1988-01-01

A short survey of recent research on Krylov subspace methods with emphasis on implementation on vector and parallel computers is presented. Conjugate gradient methods have proven very useful on traditional scalar computers, and their popularity is likely to increase as three-dimensional models gain importance. A conservative approach to derive effective iterative techniques for supercomputers has been to find efficient parallel/vector implementations of the standard algorithms. The main source of difficulty in the incomplete factorization preconditionings is in the solution of the triangular systems at each step. A few approaches consisting of implementing efficient forward and backward triangular solutions are described in detail. Polynomial preconditioning as an alternative to standard incomplete factorization techniques is also discussed. Another efficient approach is to reorder the equations so as to improve the structure of the matrix to achieve better parallelism or vectorization. An overview of these and other ideas and their effectiveness or potential for different types of architectures is given.
Optimal mode transformations for linear-optical cluster-state generation

DOE PAGES

Uskov, Dmitry B.; Lougovski, Pavel; Alsing, Paul M.; ...

2015-06-15

In this paper, we analyze the generation of linear-optical cluster states (LOCSs) via sequential addition of one and two qubits. Existing approaches employ the stochastic linear-optical two-qubit controlled-Z (CZ) gate with success rate of 1/9 per operation. The question of optimality of the CZ gate with respect to LOCS generation has remained open. We report that there are alternative schemes to the CZ gate that are exponentially more efficient and show that sequential LOCS growth is indeed globally optimal. We find that the optimal cluster growth operation is a state transformation on a subspace of the full Hilbert space. Finally,more » we show that the maximal success rate of postselected entangling n photonic qubits or m Bell pairs into a cluster is (1/2) n-1 and (1/4) m-1, respectively, with no ancilla photons, and we give an explicit optical description of the optimal mode transformations.« less
A Genetic Algorithm That Exchanges Neighboring Centers for Fuzzy c-Means Clustering

ERIC Educational Resources Information Center

Chahine, Firas Safwan

2012-01-01

Clustering algorithms are widely used in pattern recognition and data mining applications. Due to their computational efficiency, partitional clustering algorithms are better suited for applications with large datasets than hierarchical clustering algorithms. K-means is among the most popular partitional clustering algorithm, but has a major…
Verification Techniques for Parameter Selection and Bayesian Model Calibration Presented for an HIV Model

NASA Astrophysics Data System (ADS)

Wentworth, Mami Tonoe

Uncertainty quantification plays an important role when making predictive estimates of model responses. In this context, uncertainty quantification is defined as quantifying and reducing uncertainties, and the objective is to quantify uncertainties in parameter, model and measurements, and propagate the uncertainties through the model, so that one can make a predictive estimate with quantified uncertainties. Two of the aspects of uncertainty quantification that must be performed prior to propagating uncertainties are model calibration and parameter selection. There are several efficient techniques for these processes; however, the accuracy of these methods are often not verified. This is the motivation for our work, and in this dissertation, we present and illustrate verification frameworks for model calibration and parameter selection in the context of biological and physical models. First, HIV models, developed and improved by [2, 3, 8], describe the viral infection dynamics of an HIV disease. These are also used to make predictive estimates of viral loads and T-cell counts and to construct an optimal control for drug therapy. Estimating input parameters is an essential step prior to uncertainty quantification. However, not all the parameters are identifiable, implying that they cannot be uniquely determined by the observations. These unidentifiable parameters can be partially removed by performing parameter selection, a process in which parameters that have minimal impacts on the model response are determined. We provide verification techniques for Bayesian model calibration and parameter selection for an HIV model. As an example of a physical model, we employ a heat model with experimental measurements presented in [10]. A steady-state heat model represents a prototypical behavior for heat conduction and diffusion process involved in a thermal-hydraulic model, which is a part of nuclear reactor models. We employ this simple heat model to illustrate verification techniques for model calibration. For Bayesian model calibration, we employ adaptive Metropolis algorithms to construct densities for input parameters in the heat model and the HIV model. To quantify the uncertainty in the parameters, we employ two MCMC algorithms: Delayed Rejection Adaptive Metropolis (DRAM) [33] and Differential Evolution Adaptive Metropolis (DREAM) [66, 68]. The densities obtained using these methods are compared to those obtained through the direct numerical evaluation of the Bayes' formula. We also combine uncertainties in input parameters and measurement errors to construct predictive estimates for a model response. A significant emphasis is on the development and illustration of techniques to verify the accuracy of sampling-based Metropolis algorithms. We verify the accuracy of DRAM and DREAM by comparing chains, densities and correlations obtained using DRAM, DREAM and the direct evaluation of Bayes formula. We also perform similar analysis for credible and prediction intervals for responses. Once the parameters are estimated, we employ energy statistics test [63, 64] to compare the densities obtained by different methods for the HIV model. The energy statistics are used to test the equality of distributions. We also consider parameter selection and verification techniques for models having one or more parameters that are noninfluential in the sense that they minimally impact model outputs. We illustrate these techniques for a dynamic HIV model but note that the parameter selection and verification framework is applicable to a wide range of biological and physical models. To accommodate the nonlinear input to output relations, which are typical for such models, we focus on global sensitivity analysis techniques, including those based on partial correlations, Sobol indices based on second-order model representations, and Morris indices, as well as a parameter selection technique based on standard errors. A significant objective is to provide verification strategies to assess the accuracy of those techniques, which we illustrate in the context of the HIV model. Finally, we examine active subspace methods as an alternative to parameter subset selection techniques. The objective of active subspace methods is to determine the subspace of inputs that most strongly affect the model response, and to reduce the dimension of the input space. The major difference between active subspace methods and parameter selection techniques is that parameter selection identifies influential parameters whereas subspace selection identifies a linear combination of parameters that impacts the model responses significantly. We employ active subspace methods discussed in [22] for the HIV model and present a verification that the active subspace successfully reduces the input dimensions.
State-selective optimization of local excited electronic states in extended systems

NASA Astrophysics Data System (ADS)

Kovyrshin, Arseny; Neugebauer, Johannes

2010-11-01

Standard implementations of time-dependent density-functional theory (TDDFT) for the calculation of excitation energies give access to a number of the lowest-lying electronic excitations of a molecule under study. For extended systems, this can become cumbersome if a particular excited state is sought-after because many electronic transitions may be present. This often means that even for systems of moderate size, a multitude of excited states needs to be calculated to cover a certain energy range. Here, we present an algorithm for the selective determination of predefined excited electronic states in an extended system. A guess transition density in terms of orbital transitions has to be provided for the excitation that shall be optimized. The approach employs root-homing techniques together with iterative subspace diagonalization methods to optimize the electronic transition. We illustrate the advantages of this method for solvated molecules, core-excitations of metal complexes, and adsorbates at cluster surfaces. In particular, we study the local π →π∗ excitation of a pyridine molecule adsorbed at a silver cluster. It is shown that the method works very efficiently even for high-lying excited states. We demonstrate that the assumption of a single, well-defined local excitation is, in general, not justified for extended systems, which can lead to root-switching during optimization. In those cases, the method can give important information about the spectral distribution of the orbital transition employed as a guess.
Macromolecule mapping of the brain using ultrashort-TE acquisition and reference-based metabolite removal.

PubMed

Lam, Fan; Li, Yudu; Clifford, Bryan; Liang, Zhi-Pei

2018-05-01

To develop a practical method for mapping macromolecule distribution in the brain using ultrashort-TE MRSI data. An FID-based chemical shift imaging acquisition without metabolite-nulling pulses was used to acquire ultrashort-TE MRSI data that capture the macromolecule signals with high signal-to-noise-ratio (SNR) efficiency. To remove the metabolite signals from the ultrashort-TE data, single voxel spectroscopy data were obtained to determine a set of high-quality metabolite reference spectra. These spectra were then incorporated into a generalized series (GS) model to represent general metabolite spatiospectral distributions. A time-segmented algorithm was developed to back-extrapolate the GS model-based metabolite distribution from truncated FIDs and remove it from the MRSI data. Numerical simulations and in vivo experiments have been performed to evaluate the proposed method. Simulation results demonstrate accurate metabolite signal extrapolation by the proposed method given a high-quality reference. For in vivo experiments, the proposed method is able to produce spatiospectral distributions of macromolecules in the brain with high SNR from data acquired in about 10 minutes. We further demonstrate that the high-dimensional macromolecule spatiospectral distribution resides in a low-dimensional subspace. This finding provides a new opportunity to use subspace models for quantification and accelerated macromolecule mapping. Robustness of the proposed method is also demonstrated using multiple data sets from the same and different subjects. The proposed method is able to obtain macromolecule distributions in the brain from ultrashort-TE acquisitions. It can also be used for acquiring training data to determine a low-dimensional subspace to represent the macromolecule signals for subspace-based MRSI. Magn Reson Med 79:2460-2469, 2018. © 2017 International Society for Magnetic Resonance in Medicine. © 2017 International Society for Magnetic Resonance in Medicine.
Semiannual Report for Contract NAS1-19480 (Institute for Computer Applications in Science and Engineering)

DTIC Science & Technology

1994-06-01

algorithms for large, irreducibly coupled systems iteratively solve concurrent problems within different subspaces of a Hilbert space, or within different...effective on problems amenable to SIMD solution. Together with researchers at AT&T Bell Labs (Boris Lubachevsky, Albert Greenberg ) we have developed...reasonable measurement. In the study of different speedups, various causes of superlinear speedup are also presented. Greenberg , Albert G., Boris D

SOTXTSTREAM: Density-based self-organizing clustering of text streams.

PubMed

Bryant, Avory C; Cios, Krzysztof J

2017-01-01

A streaming data clustering algorithm is presented building upon the density-based self-organizing stream clustering algorithm SOSTREAM. Many density-based clustering algorithms are limited by their inability to identify clusters with heterogeneous density. SOSTREAM addresses this limitation through the use of local (nearest neighbor-based) density determinations. Additionally, many stream clustering algorithms use a two-phase clustering approach. In the first phase, a micro-clustering solution is maintained online, while in the second phase, the micro-clustering solution is clustered offline to produce a macro solution. By performing self-organization techniques on micro-clusters in the online phase, SOSTREAM is able to maintain a macro clustering solution in a single phase. Leveraging concepts from SOSTREAM, a new density-based self-organizing text stream clustering algorithm, SOTXTSTREAM, is presented that addresses several shortcomings of SOSTREAM. Gains in clustering performance of this new algorithm are demonstrated on several real-world text stream datasets.
A Gaussian-based rank approximation for subspace clustering

NASA Astrophysics Data System (ADS)

Xu, Fei; Peng, Chong; Hu, Yunhong; He, Guoping

2018-04-01

Low-rank representation (LRR) has been shown successful in seeking low-rank structures of data relationships in a union of subspaces. Generally, LRR and LRR-based variants need to solve the nuclear norm-based minimization problems. Beyond the success of such methods, it has been widely noted that the nuclear norm may not be a good rank approximation because it simply adds all singular values of a matrix together and thus large singular values may dominant the weight. This results in far from satisfactory rank approximation and may degrade the performance of lowrank models based on the nuclear norm. In this paper, we propose a novel nonconvex rank approximation based on the Gaussian distribution function, which has demanding properties to be a better rank approximation than the nuclear norm. Then a low-rank model is proposed based on the new rank approximation with application to motion segmentation. Experimental results have shown significant improvements and verified the effectiveness of our method.
An Extended Spectral-Spatial Classification Approach for Hyperspectral Data

NASA Astrophysics Data System (ADS)

Akbari, D.

2017-11-01

In this paper an extended classification approach for hyperspectral imagery based on both spectral and spatial information is proposed. The spatial information is obtained by an enhanced marker-based minimum spanning forest (MSF) algorithm. Three different methods of dimension reduction are first used to obtain the subspace of hyperspectral data: (1) unsupervised feature extraction methods including principal component analysis (PCA), independent component analysis (ICA), and minimum noise fraction (MNF); (2) supervised feature extraction including decision boundary feature extraction (DBFE), discriminate analysis feature extraction (DAFE), and nonparametric weighted feature extraction (NWFE); (3) genetic algorithm (GA). The spectral features obtained are then fed into the enhanced marker-based MSF classification algorithm. In the enhanced MSF algorithm, the markers are extracted from the classification maps obtained by both SVM and watershed segmentation algorithm. To evaluate the proposed approach, the Pavia University hyperspectral data is tested. Experimental results show that the proposed approach using GA achieves an approximately 8 % overall accuracy higher than the original MSF-based algorithm.
The global Minmax k-means algorithm.

PubMed

Wang, Xiaoyan; Bai, Yanping

2016-01-01

The global k -means algorithm is an incremental approach to clustering that dynamically adds one cluster center at a time through a deterministic global search procedure from suitable initial positions, and employs k -means to minimize the sum of the intra-cluster variances. However the global k -means algorithm sometimes results singleton clusters and the initial positions sometimes are bad, after a bad initialization, poor local optimal can be easily obtained by k -means algorithm. In this paper, we modified the global k -means algorithm to eliminate the singleton clusters at first, and then we apply MinMax k -means clustering error method to global k -means algorithm to overcome the effect of bad initialization, proposed the global Minmax k -means algorithm. The proposed clustering method is tested on some popular data sets and compared to the k -means algorithm, the global k -means algorithm and the MinMax k -means algorithm. The experiment results show our proposed algorithm outperforms other algorithms mentioned in the paper.
Noise-enhanced clustering and competitive learning algorithms.

PubMed

Osoba, Osonde; Kosko, Bart

2013-01-01

Noise can provably speed up convergence in many centroid-based clustering algorithms. This includes the popular k-means clustering algorithm. The clustering noise benefit follows from the general noise benefit for the expectation-maximization algorithm because many clustering algorithms are special cases of the expectation-maximization algorithm. Simulations show that noise also speeds up convergence in stochastic unsupervised competitive learning, supervised competitive learning, and differential competitive learning. Copyright © 2012 Elsevier Ltd. All rights reserved.
Cluster analysis of multiple planetary flow regimes

NASA Technical Reports Server (NTRS)

Mo, Kingtse; Ghil, Michael

1987-01-01

A modified cluster analysis method was developed to identify spatial patterns of planetary flow regimes, and to study transitions between them. This method was applied first to a simple deterministic model and second to Northern Hemisphere (NH) 500 mb data. The dynamical model is governed by the fully-nonlinear, equivalent-barotropic vorticity equation on the sphere. Clusters of point in the model's phase space are associated with either a few persistent or with many transient events. Two stationary clusters have patterns similar to unstable stationary model solutions, zonal, or blocked. Transient clusters of wave trains serve as way stations between the stationary ones. For the NH data, cluster analysis was performed in the subspace of the first seven empirical orthogonal functions (EOFs). Stationary clusters are found in the low-frequency band of more than 10 days, and transient clusters in the bandpass frequency window between 2.5 and 6 days. In the low-frequency band three pairs of clusters determine, respectively, EOFs 1, 2, and 3. They exhibit well-known regional features, such as blocking, the Pacific/North American (PNA) pattern and wave trains. Both model and low-pass data show strong bimodality. Clusters in the bandpass window show wave-train patterns in the two jet exit regions. They are related, as in the model, to transitions between stationary clusters.
Indoor Subspacing to Implement Indoorgml for Indoor Navigation

NASA Astrophysics Data System (ADS)

Jung, H.; Lee, J.

2015-10-01

According to an increasing demand for indoor navigation, there are great attempts to develop applicable indoor network. Representation for a room as a node is not sufficient to apply complex and large buildings. As OGC established IndoorGML, subspacing to partition the space for constructing logical network is introduced. Concerning subspacing for indoor network, transition space like halls or corridors also have to be considered. This study presents the subspacing process for creating an indoor network in shopping mall. Furthermore, categorization of transition space is performed and subspacing of this space is considered. Hall and squares in mall is especially defined for subspacing. Finally, implementation of subspacing process for indoor network is presented.
Optimizing Cubature for Efficient Integration of Subspace Deformations

PubMed Central

An, Steven S.; Kim, Theodore; James, Doug L.

2009-01-01

We propose an efficient scheme for evaluating nonlinear subspace forces (and Jacobians) associated with subspace deformations. The core problem we address is efficient integration of the subspace force density over the 3D spatial domain. Similar to Gaussian quadrature schemes that efficiently integrate functions that lie in particular polynomial subspaces, we propose cubature schemes (multi-dimensional quadrature) optimized for efficient integration of force densities associated with particular subspace deformations, particular materials, and particular geometric domains. We support generic subspace deformation kinematics, and nonlinear hyperelastic materials. For an r-dimensional deformation subspace with O(r) cubature points, our method is able to evaluate subspace forces at O(r2) cost. We also describe composite cubature rules for runtime error estimation. Results are provided for various subspace deformation models, several hyperelastic materials (St.Venant-Kirchhoff, Mooney-Rivlin, Arruda-Boyce), and multimodal (graphics, haptics, sound) applications. We show dramatically better efficiency than traditional Monte Carlo integration. CR Categories: I.6.8 [Simulation and Modeling]: Types of Simulation—Animation, I.3.5 [Computer Graphics]: Computational Geometry and Object Modeling—Physically based modeling G.1.4 [Mathematics of Computing]: Numerical Analysis—Quadrature and Numerical Differentiation PMID:19956777
Algorithm 971: An Implementation of a Randomized Algorithm for Principal Component Analysis

PubMed Central

LI, HUAMIN; LINDERMAN, GEORGE C.; SZLAM, ARTHUR; STANTON, KELLY P.; KLUGER, YUVAL; TYGERT, MARK

2017-01-01

Recent years have witnessed intense development of randomized methods for low-rank approximation. These methods target principal component analysis and the calculation of truncated singular value decompositions. The present article presents an essentially black-box, foolproof implementation for Mathworks’ MATLAB, a popular software platform for numerical computation. As illustrated via several tests, the randomized algorithms for low-rank approximation outperform or at least match the classical deterministic techniques (such as Lanczos iterations run to convergence) in basically all respects: accuracy, computational efficiency (both speed and memory usage), ease-of-use, parallelizability, and reliability. However, the classical procedures remain the methods of choice for estimating spectral norms and are far superior for calculating the least singular values and corresponding singular vectors (or singular subspaces). PMID:28983138
begin{center} MUSIC Algorithms for Rebar Detection

NASA Astrophysics Data System (ADS)

Leone, G.; Solimene, R.

2012-04-01

In this contribution we consider the problem of detecting and localizing small cross section, with respect to the wavelength, scatterers from their scattered field once a known incident field interrogated the scene where they reside. A pertinent applicative context is rebar detection within concrete pillar. For such a case, scatterers to be detected are represented by rebars themselves or by voids due to their lacking. In both cases, as scatterers have point-like support, a subspace projection method can be conveniently exploited [1]. However, as the field scattered by rebars is stronger than the one due to voids, it is expected that the latter can be difficult to be detected. In order to circumvent this problem, in this contribution we adopt a two-step MUltiple SIgnal Classification (MUSIC) detection algorithm. In particular, the first stage aims at detecting rebars. Once rebar are detected, their positions are exploited to update the Green's function and then a further detection scheme is run to locate voids. However, in this second case, background medium encompasses also the rabars. The analysis is conducted numerically for a simplified two-dimensional scalar scattering geometry. More in detail, as is usual in MUSIC algorithm, a multi-view/multi-static single-frequency configuration is considered [2]. Baratonia, G. Leone, R. Pierri, R. Solimene, "Fault Detection in Grid Scattering by a Time-Reversal MUSIC Approach," Porc. Of ICEAA 2011, Turin, 2011. E. A. Marengo, F. K. Gruber, "Subspace-Based Localization and Inverse Scattering of Multiply Scattering Point Targets," EURASIP Journal on Advances in Signal Processing, 2007, Article ID 17342, 16 pages (2007).
Clustering PPI data by combining FA and SHC method.

PubMed

Lei, Xiujuan; Ying, Chao; Wu, Fang-Xiang; Xu, Jin

2015-01-01

Clustering is one of main methods to identify functional modules from protein-protein interaction (PPI) data. Nevertheless traditional clustering methods may not be effective for clustering PPI data. In this paper, we proposed a novel method for clustering PPI data by combining firefly algorithm (FA) and synchronization-based hierarchical clustering (SHC) algorithm. Firstly, the PPI data are preprocessed via spectral clustering (SC) which transforms the high-dimensional similarity matrix into a low dimension matrix. Then the SHC algorithm is used to perform clustering. In SHC algorithm, hierarchical clustering is achieved by enlarging the neighborhood radius of synchronized objects continuously, while the hierarchical search is very difficult to find the optimal neighborhood radius of synchronization and the efficiency is not high. So we adopt the firefly algorithm to determine the optimal threshold of the neighborhood radius of synchronization automatically. The proposed algorithm is tested on the MIPS PPI dataset. The results show that our proposed algorithm is better than the traditional algorithms in precision, recall and f-measure value.
Clustering PPI data by combining FA and SHC method

PubMed Central

2015-01-01

Clustering is one of main methods to identify functional modules from protein-protein interaction (PPI) data. Nevertheless traditional clustering methods may not be effective for clustering PPI data. In this paper, we proposed a novel method for clustering PPI data by combining firefly algorithm (FA) and synchronization-based hierarchical clustering (SHC) algorithm. Firstly, the PPI data are preprocessed via spectral clustering (SC) which transforms the high-dimensional similarity matrix into a low dimension matrix. Then the SHC algorithm is used to perform clustering. In SHC algorithm, hierarchical clustering is achieved by enlarging the neighborhood radius of synchronized objects continuously, while the hierarchical search is very difficult to find the optimal neighborhood radius of synchronization and the efficiency is not high. So we adopt the firefly algorithm to determine the optimal threshold of the neighborhood radius of synchronization automatically. The proposed algorithm is tested on the MIPS PPI dataset. The results show that our proposed algorithm is better than the traditional algorithms in precision, recall and f-measure value. PMID:25707632
Cascadia Seismicity Related to Seamount Subduction as detected by the Cascadia Initiative Amphibious Data

NASA Astrophysics Data System (ADS)

Morton, E.; Bilek, S. L.; Rowe, C. A.

2016-12-01

Unlike other subduction zones, the Cascadia subduction zone (CSZ) is notable for the absence of detected and located small and moderate magnitude interplate earthquakes, despite the presence of recurring episodic tremor and slip (ETS) downdip and evidence of pre-historic great earthquakes. Thermal and geodetic models indicate that the seismogenic zone exists primarily, if not entirely, offshore; therefore the perceived unusual seismic quiescence may be a consequence of seismic source location in relation to land based seismometers. The Cascadia Initiative (CI) amphibious community seismic experiment includes ocean bottom seismometers (OBS) deployed directly above the presumed locked seismogenic zone. We use the CI dataset to search for small magnitude interplate earthquakes previously undetected using the on-land sensors alone. We implement subspace detection to search for small earthquakes. We build our subspace with template events from existing earthquake catalogs that appear to have occurred on the plate interface, windowing waveforms on CI OBS and land seismometers. Although our efforts will target the entire CSZ margin and full 4-year CI deployment, here we focus on a previously identified cluster off the coast of Oregon, related to a subducting seamount. During the first year of CI deployment, this target area yields 293 unique detections with 86 well-located events. Thirty-two of these events occurred within the seamount cluster, and 13 events were located in another cluster to the northwest of the seamount. Events within the seamount cluster are separated into those whose depths place them on the plate interface, and a shallower set ( 5 km depth). These separate event groups track together temporally, and seem to agree with a model of seamount subduction that creates extensive fracturing around the seamount, rather than stress concentrated at the seamount-plate boundary. During CI year 2, this target area yields >1000 additional event detections.
Cluster synchronization in networks of identical oscillators with α-function pulse coupling.

PubMed

Chen, Bolun; Engelbrecht, Jan R; Mirollo, Renato

2017-02-01

We study a network of N identical leaky integrate-and-fire model neurons coupled by α-function pulses, weighted by a coupling parameter K. Studies of the dynamics of this system have mostly focused on the stability of the fully synchronized and the fully asynchronous splay states, which naturally depends on the sign of K, i.e., excitation vs inhibition. We find that there is also a rich set of attractors consisting of clusters of fully synchronized oscillators, such as fixed (N-1,1) states, which have synchronized clusters of sizes N-1 and 1, as well as splay states of clusters with equal sizes greater than 1. Additionally, we find limit cycles that clarify the stability of previously observed quasiperiodic behavior. Our framework exploits the neutrality of the dynamics for K=0 which allows us to implement a dimensional reduction strategy that simplifies the dynamics to a continuous flow on a codimension 3 subspace with the sign of K determining the flow direction. This reduction framework naturally incorporates a hierarchy of partially synchronized subspaces in which the new attracting states lie. Using high-precision numerical simulations, we describe completely the sequence of bifurcations and the stability of all fixed points and limit cycles for N=2-4. The set of possible attracting states can be used to distinguish different classes of neuron models. For instance from our previous work [Chaos 24, 013114 (2014)CHAOEH1054-150010.1063/1.4858458] we know that of the types of partially synchronized states discussed here, only the (N-1,1) states can be stable in systems of identical coupled sinusoidal (i.e., Kuramoto type) oscillators, such as θ-neuron models. Upon introducing a small variation in individual neuron parameters, the attracting fixed points we discuss here generalize to equivalent fixed points in which neurons need not fire coincidently.
Cluster synchronization in networks of identical oscillators with α -function pulse coupling

NASA Astrophysics Data System (ADS)

Chen, Bolun; Engelbrecht, Jan R.; Mirollo, Renato

2017-02-01

We study a network of N identical leaky integrate-and-fire model neurons coupled by α -function pulses, weighted by a coupling parameter K . Studies of the dynamics of this system have mostly focused on the stability of the fully synchronized and the fully asynchronous splay states, which naturally depends on the sign of K , i.e., excitation vs inhibition. We find that there is also a rich set of attractors consisting of clusters of fully synchronized oscillators, such as fixed (N -1 ,1 ) states, which have synchronized clusters of sizes N -1 and 1, as well as splay states of clusters with equal sizes greater than 1. Additionally, we find limit cycles that clarify the stability of previously observed quasiperiodic behavior. Our framework exploits the neutrality of the dynamics for K =0 which allows us to implement a dimensional reduction strategy that simplifies the dynamics to a continuous flow on a codimension 3 subspace with the sign of K determining the flow direction. This reduction framework naturally incorporates a hierarchy of partially synchronized subspaces in which the new attracting states lie. Using high-precision numerical simulations, we describe completely the sequence of bifurcations and the stability of all fixed points and limit cycles for N =2 -4 . The set of possible attracting states can be used to distinguish different classes of neuron models. For instance from our previous work [Chaos 24, 013114 (2014), 10.1063/1.4858458] we know that of the types of partially synchronized states discussed here, only the (N -1 ,1 ) states can be stable in systems of identical coupled sinusoidal (i.e., Kuramoto type) oscillators, such as θ -neuron models. Upon introducing a small variation in individual neuron parameters, the attracting fixed points we discuss here generalize to equivalent fixed points in which neurons need not fire coincidently.
Eigensolution of finite element problems in a completely connected parallel architecture

NASA Technical Reports Server (NTRS)

Akl, F.; Morel, M.

1989-01-01

A parallel algorithm is presented for the solution of the generalized eigenproblem in linear elastic finite element analysis. The algorithm is based on a completely connected parallel architecture in which each processor is allowed to communicate with all other processors. The algorithm is successfully implemented on a tightly coupled MIMD parallel processor. A finite element model is divided into m domains each of which is assumed to process n elements. Each domain is then assigned to a processor or to a logical processor (task) if the number of domains exceeds the number of physical processors. The effect of the number of domains, the number of degrees-of-freedom located along the global fronts, and the dimension of the subspace on the performance of the algorithm is investigated. For a 64-element rectangular plate, speed-ups of 1.86, 3.13, 3.18, and 3.61 are achieved on two, four, six, and eight processors, respectively.
Pure endmember extraction using robust kernel archetypoid analysis for hyperspectral imagery

NASA Astrophysics Data System (ADS)

Sun, Weiwei; Yang, Gang; Wu, Ke; Li, Weiyue; Zhang, Dianfa

2017-09-01

A robust kernel archetypoid analysis (RKADA) method is proposed to extract pure endmembers from hyperspectral imagery (HSI). The RKADA assumes that each pixel is a sparse linear mixture of all endmembers and each endmember corresponds to a real pixel in the image scene. First, it improves the re8gular archetypal analysis with a new binary sparse constraint, and the adoption of the kernel function constructs the principal convex hull in an infinite Hilbert space and enlarges the divergences between pairwise pixels. Second, the RKADA transfers the pure endmember extraction problem into an optimization problem by minimizing residual errors with the Huber loss function. The Huber loss function reduces the effects from big noises and outliers in the convergence procedure of RKADA and enhances the robustness of the optimization function. Third, the random kernel sinks for fast kernel matrix approximation and the two-stage algorithm for optimizing initial pure endmembers are utilized to improve its computational efficiency in realistic implementations of RKADA, respectively. The optimization equation of RKADA is solved by using the block coordinate descend scheme and the desired pure endmembers are finally obtained. Six state-of-the-art pure endmember extraction methods are employed to make comparisons with the RKADA on both synthetic and real Cuprite HSI datasets, including three geometrical algorithms vertex component analysis (VCA), alternative volume maximization (AVMAX) and orthogonal subspace projection (OSP), and three matrix factorization algorithms the preconditioning for successive projection algorithm (PreSPA), hierarchical clustering based on rank-two nonnegative matrix factorization (H2NMF) and self-dictionary multiple measurement vector (SDMMV). Experimental results show that the RKADA outperforms all the six methods in terms of spectral angle distance (SAD) and root-mean-square-error (RMSE). Moreover, the RKADA has short computational times in offline operations and shows significant improvement in identifying pure endmembers for ground objects with smaller spectrum differences. Therefore, the RKADA could be an alternative for pure endmember extraction from hyperspectral images.
Information Clustering Based on Fuzzy Multisets.

ERIC Educational Resources Information Center

Miyamoto, Sadaaki

2003-01-01

Proposes a fuzzy multiset model for information clustering with application to information retrieval on the World Wide Web. Highlights include search engines; term clustering; document clustering; algorithms for calculating cluster centers; theoretical properties concerning clustering algorithms; and examples to show how the algorithms work.…
An improved clustering algorithm based on reverse learning in intelligent transportation

NASA Astrophysics Data System (ADS)

Qiu, Guoqing; Kou, Qianqian; Niu, Ting

2017-05-01

With the development of artificial intelligence and data mining technology, big data has gradually entered people's field of vision. In the process of dealing with large data, clustering is an important processing method. By introducing the reverse learning method in the clustering process of PAM clustering algorithm, to further improve the limitations of one-time clustering in unsupervised clustering learning, and increase the diversity of clustering clusters, so as to improve the quality of clustering. The algorithm analysis and experimental results show that the algorithm is feasible.
A roadmap of clustering algorithms: finding a match for a biomedical application.

PubMed

Andreopoulos, Bill; An, Aijun; Wang, Xiaogang; Schroeder, Michael

2009-05-01

Clustering is ubiquitously applied in bioinformatics with hierarchical clustering and k-means partitioning being the most popular methods. Numerous improvements of these two clustering methods have been introduced, as well as completely different approaches such as grid-based, density-based and model-based clustering. For improved bioinformatics analysis of data, it is important to match clusterings to the requirements of a biomedical application. In this article, we present a set of desirable clustering features that are used as evaluation criteria for clustering algorithms. We review 40 different clustering algorithms of all approaches and datatypes. We compare algorithms on the basis of desirable clustering features, and outline algorithms' benefits and drawbacks as a basis for matching them to biomedical applications.

Efficient clustering aggregation based on data fragments.

PubMed

Wu, Ou; Hu, Weiming; Maybank, Stephen J; Zhu, Mingliang; Li, Bing

2012-06-01

Clustering aggregation, known as clustering ensembles, has emerged as a powerful technique for combining different clustering results to obtain a single better clustering. Existing clustering aggregation algorithms are applied directly to data points, in what is referred to as the point-based approach. The algorithms are inefficient if the number of data points is large. We define an efficient approach for clustering aggregation based on data fragments. In this fragment-based approach, a data fragment is any subset of the data that is not split by any of the clustering results. To establish the theoretical bases of the proposed approach, we prove that clustering aggregation can be performed directly on data fragments under two widely used goodness measures for clustering aggregation taken from the literature. Three new clustering aggregation algorithms are described. The experimental results obtained using several public data sets show that the new algorithms have lower computational complexity than three well-known existing point-based clustering aggregation algorithms (Agglomerative, Furthest, and LocalSearch); nevertheless, the new algorithms do not sacrifice the accuracy.
A clustering method of Chinese medicine prescriptions based on modified firefly algorithm.

PubMed

Yuan, Feng; Liu, Hong; Chen, Shou-Qiang; Xu, Liang

2016-12-01

This paper is aimed to study the clustering method for Chinese medicine (CM) medical cases. The traditional K-means clustering algorithm had shortcomings such as dependence of results on the selection of initial value, trapping in local optimum when processing prescriptions form CM medical cases. Therefore, a new clustering method based on the collaboration of firefly algorithm and simulated annealing algorithm was proposed. This algorithm dynamically determined the iteration of firefly algorithm and simulates sampling of annealing algorithm by fitness changes, and increased the diversity of swarm through expansion of the scope of the sudden jump, thereby effectively avoiding premature problem. The results from confirmatory experiments for CM medical cases suggested that, comparing with traditional K-means clustering algorithms, this method was greatly improved in the individual diversity and the obtained clustering results, the computing results from this method had a certain reference value for cluster analysis on CM prescriptions.
Visualizing nD Point Clouds as Topological Landscape Profiles to Guide Local Data Analysis

DOE Office of Scientific and Technical Information (OSTI.GOV)

Oesterling, Patrick; Heine, Christian; Weber, Gunther H.

2012-05-04

Analyzing high-dimensional point clouds is a classical challenge in visual analytics. Traditional techniques, such as projections or axis-based techniques, suffer from projection artifacts, occlusion, and visual complexity.We propose to split data analysis into two parts to address these shortcomings. First, a structural overview phase abstracts data by its density distribution. This phase performs topological analysis to support accurate and non-overlapping presentation of the high-dimensional cluster structure as a topological landscape profile. Utilizing a landscape metaphor, it presents clusters and their nesting as hills whose height, width, and shape reflect cluster coherence, size, and stability, respectively. A second local analysis phasemore » utilizes this global structural knowledge to select individual clusters or point sets for further, localized data analysis. Focusing on structural entities significantly reduces visual clutter in established geometric visualizations and permits a clearer, more thorough data analysis. In conclusion, this analysis complements the global topological perspective and enables the user to study subspaces or geometric properties, such as shape.« less
Decentralized System Identification Using Stochastic Subspace Identification for Wireless Sensor Networks

PubMed Central

Cho, Soojin; Park, Jong-Woong; Sim, Sung-Han

2015-01-01

Wireless sensor networks (WSNs) facilitate a new paradigm to structural identification and monitoring for civil infrastructure. Conventional structural monitoring systems based on wired sensors and centralized data acquisition systems are costly for installation as well as maintenance. WSNs have emerged as a technology that can overcome such difficulties, making deployment of a dense array of sensors on large civil structures both feasible and economical. However, as opposed to wired sensor networks in which centralized data acquisition and processing is common practice, WSNs require decentralized computing algorithms to reduce data transmission due to the limitation associated with wireless communication. In this paper, the stochastic subspace identification (SSI) technique is selected for system identification, and SSI-based decentralized system identification (SDSI) is proposed to be implemented in a WSN composed of Imote2 wireless sensors that measure acceleration. The SDSI is tightly scheduled in the hierarchical WSN, and its performance is experimentally verified in a laboratory test using a 5-story shear building model. PMID:25856325
Two-Dimensional Hermite Filters Simplify the Description of High-Order Statistics of Natural Images.

PubMed

Hu, Qin; Victor, Jonathan D

2016-09-01

Natural image statistics play a crucial role in shaping biological visual systems, understanding their function and design principles, and designing effective computer-vision algorithms. High-order statistics are critical for conveying local features, but they are challenging to study - largely because their number and variety is large. Here, via the use of two-dimensional Hermite (TDH) functions, we identify a covert symmetry in high-order statistics of natural images that simplifies this task. This emerges from the structure of TDH functions, which are an orthogonal set of functions that are organized into a hierarchy of ranks. Specifically, we find that the shape (skewness and kurtosis) of the distribution of filter coefficients depends only on the projection of the function onto a 1-dimensional subspace specific to each rank. The characterization of natural image statistics provided by TDH filter coefficients reflects both their phase and amplitude structure, and we suggest an intuitive interpretation for the special subspace within each rank.
ClusterViz: A Cytoscape APP for Cluster Analysis of Biological Network.

PubMed

Wang, Jianxin; Zhong, Jiancheng; Chen, Gang; Li, Min; Wu, Fang-xiang; Pan, Yi

2015-01-01

Cluster analysis of biological networks is one of the most important approaches for identifying functional modules and predicting protein functions. Furthermore, visualization of clustering results is crucial to uncover the structure of biological networks. In this paper, ClusterViz, an APP of Cytoscape 3 for cluster analysis and visualization, has been developed. In order to reduce complexity and enable extendibility for ClusterViz, we designed the architecture of ClusterViz based on the framework of Open Services Gateway Initiative. According to the architecture, the implementation of ClusterViz is partitioned into three modules including interface of ClusterViz, clustering algorithms and visualization and export. ClusterViz fascinates the comparison of the results of different algorithms to do further related analysis. Three commonly used clustering algorithms, FAG-EC, EAGLE and MCODE, are included in the current version. Due to adopting the abstract interface of algorithms in module of the clustering algorithms, more clustering algorithms can be included for the future use. To illustrate usability of ClusterViz, we provided three examples with detailed steps from the important scientific articles, which show that our tool has helped several research teams do their research work on the mechanism of the biological networks.
A weighted information criterion for multiple minor components and its adaptive extraction algorithms.

PubMed

Gao, Yingbin; Kong, Xiangyu; Zhang, Huihui; Hou, Li'an

2017-05-01

Minor component (MC) plays an important role in signal processing and data analysis, so it is a valuable work to develop MC extraction algorithms. Based on the concepts of weighted subspace and optimum theory, a weighted information criterion is proposed for searching the optimum solution of a linear neural network. This information criterion exhibits a unique global minimum attained if and only if the state matrix is composed of the desired MCs of an autocorrelation matrix of an input signal. By using gradient ascent method and recursive least square (RLS) method, two algorithms are developed for multiple MCs extraction. The global convergences of the proposed algorithms are also analyzed by the Lyapunov method. The proposed algorithms can extract the multiple MCs in parallel and has advantage in dealing with high dimension matrices. Since the weighted matrix does not require an accurate value, it facilitates the system design of the proposed algorithms for practical applications. The speed and computation advantages of the proposed algorithms are verified through simulations. Copyright © 2017 Elsevier Ltd. All rights reserved.
Functional grouping of similar genes using eigenanalysis on minimum spanning tree based neighborhood graph.

PubMed

Jothi, R; Mohanty, Sraban Kumar; Ojha, Aparajita

2016-04-01

Gene expression data clustering is an important biological process in DNA microarray analysis. Although there have been many clustering algorithms for gene expression analysis, finding a suitable and effective clustering algorithm is always a challenging problem due to the heterogeneous nature of gene profiles. Minimum Spanning Tree (MST) based clustering algorithms have been successfully employed to detect clusters of varying shapes and sizes. This paper proposes a novel clustering algorithm using Eigenanalysis on Minimum Spanning Tree based neighborhood graph (E-MST). As MST of a set of points reflects the similarity of the points with their neighborhood, the proposed algorithm employs a similarity graph obtained from k(') rounds of MST (k(')-MST neighborhood graph). By studying the spectral properties of the similarity matrix obtained from k(')-MST graph, the proposed algorithm achieves improved clustering results. We demonstrate the efficacy of the proposed algorithm on 12 gene expression datasets. Experimental results show that the proposed algorithm performs better than the standard clustering algorithms. Copyright © 2016 Elsevier Ltd. All rights reserved.
A novel artificial immune algorithm for spatial clustering with obstacle constraint and its applications.

PubMed

Sun, Liping; Luo, Yonglong; Ding, Xintao; Zhang, Ji

2014-01-01

An important component of a spatial clustering algorithm is the distance measure between sample points in object space. In this paper, the traditional Euclidean distance measure is replaced with innovative obstacle distance measure for spatial clustering under obstacle constraints. Firstly, we present a path searching algorithm to approximate the obstacle distance between two points for dealing with obstacles and facilitators. Taking obstacle distance as similarity metric, we subsequently propose the artificial immune clustering with obstacle entity (AICOE) algorithm for clustering spatial point data in the presence of obstacles and facilitators. Finally, the paper presents a comparative analysis of AICOE algorithm and the classical clustering algorithms. Our clustering model based on artificial immune system is also applied to the case of public facility location problem in order to establish the practical applicability of our approach. By using the clone selection principle and updating the cluster centers based on the elite antibodies, the AICOE algorithm is able to achieve the global optimum and better clustering effect.
Fast Constrained Spectral Clustering and Cluster Ensemble with Random Projection

PubMed Central

Liu, Wenfen

2017-01-01

Constrained spectral clustering (CSC) method can greatly improve the clustering accuracy with the incorporation of constraint information into spectral clustering and thus has been paid academic attention widely. In this paper, we propose a fast CSC algorithm via encoding landmark-based graph construction into a new CSC model and applying random sampling to decrease the data size after spectral embedding. Compared with the original model, the new algorithm has the similar results with the increase of its model size asymptotically; compared with the most efficient CSC algorithm known, the new algorithm runs faster and has a wider range of suitable data sets. Meanwhile, a scalable semisupervised cluster ensemble algorithm is also proposed via the combination of our fast CSC algorithm and dimensionality reduction with random projection in the process of spectral ensemble clustering. We demonstrate by presenting theoretical analysis and empirical results that the new cluster ensemble algorithm has advantages in terms of efficiency and effectiveness. Furthermore, the approximate preservation of random projection in clustering accuracy proved in the stage of consensus clustering is also suitable for the weighted k-means clustering and thus gives the theoretical guarantee to this special kind of k-means clustering where each point has its corresponding weight. PMID:29312447
Multilevel algorithms for nonlinear optimization

NASA Technical Reports Server (NTRS)

Alexandrov, Natalia; Dennis, J. E., Jr.

1994-01-01

Multidisciplinary design optimization (MDO) gives rise to nonlinear optimization problems characterized by a large number of constraints that naturally occur in blocks. We propose a class of multilevel optimization methods motivated by the structure and number of constraints and by the expense of the derivative computations for MDO. The algorithms are an extension to the nonlinear programming problem of the successful class of local Brown-Brent algorithms for nonlinear equations. Our extensions allow the user to partition constraints into arbitrary blocks to fit the application, and they separately process each block and the objective function, restricted to certain subspaces. The methods use trust regions as a globalization strategy, and they have been shown to be globally convergent under reasonable assumptions. The multilevel algorithms can be applied to all classes of MDO formulations. Multilevel algorithms for solving nonlinear systems of equations are a special case of the multilevel optimization methods. In this case, they can be viewed as a trust-region globalization of the Brown-Brent class.
Soft learning vector quantization and clustering algorithms based on ordered weighted aggregation operators.

PubMed

Karayiannis, N B

2000-01-01

This paper presents the development and investigates the properties of ordered weighted learning vector quantization (LVQ) and clustering algorithms. These algorithms are developed by using gradient descent to minimize reformulation functions based on aggregation operators. An axiomatic approach provides conditions for selecting aggregation operators that lead to admissible reformulation functions. Minimization of admissible reformulation functions based on ordered weighted aggregation operators produces a family of soft LVQ and clustering algorithms, which includes fuzzy LVQ and clustering algorithms as special cases. The proposed LVQ and clustering algorithms are used to perform segmentation of magnetic resonance (MR) images of the brain. The diagnostic value of the segmented MR images provides the basis for evaluating a variety of ordered weighted LVQ and clustering algorithms.
Subspace Iteration and Immersed Interface Methods: Theory, Algorithm, and Applications

DTIC Science & Technology

2010-08-20

boundary, Navier-Stokes equations Zhilin Li, Kazufumi Ito North Carolina State University Office of Contract and Grants Leazar Hall Lower Level- MC...ORGANIZATION REPORT NUMBER 19a. NAME OF RESPONSIBLE PERSON 19b. TELEPHONE NUMBER Zhilin Li 919-515-3210 3. DATES COVERED (From - To) 15-Aug-2006...Names of Faculty Supported National Academy MemberPERCENT_SUPPORTEDNAME Zhilin Li 0.40 No Kazufumi Ito 0.40 No 0.80FTE Equivalent: 2Total Number
Supervised orthogonal discriminant subspace projects learning for face recognition.

PubMed

Chen, Yu; Xu, Xiao-Hong

2014-02-01

In this paper, a new linear dimension reduction method called supervised orthogonal discriminant subspace projection (SODSP) is proposed, which addresses high-dimensionality of data and the small sample size problem. More specifically, given a set of data points in the ambient space, a novel weight matrix that describes the relationship between the data points is first built. And in order to model the manifold structure, the class information is incorporated into the weight matrix. Based on the novel weight matrix, the local scatter matrix as well as non-local scatter matrix is defined such that the neighborhood structure can be preserved. In order to enhance the recognition ability, we impose an orthogonal constraint into a graph-based maximum margin analysis, seeking to find a projection that maximizes the difference, rather than the ratio between the non-local scatter and the local scatter. In this way, SODSP naturally avoids the singularity problem. Further, we develop an efficient and stable algorithm for implementing SODSP, especially, on high-dimensional data set. Moreover, the theoretical analysis shows that LPP is a special instance of SODSP by imposing some constraints. Experiments on the ORL, Yale, Extended Yale face database B and FERET face database are performed to test and evaluate the proposed algorithm. The results demonstrate the effectiveness of SODSP. Copyright © 2013 Elsevier Ltd. All rights reserved.
Dimension Reduction With Extreme Learning Machine.

PubMed

Kasun, Liyanaarachchi Lekamalage Chamara; Yang, Yan; Huang, Guang-Bin; Zhang, Zhengyou

2016-08-01

Data may often contain noise or irrelevant information, which negatively affect the generalization capability of machine learning algorithms. The objective of dimension reduction algorithms, such as principal component analysis (PCA), non-negative matrix factorization (NMF), random projection (RP), and auto-encoder (AE), is to reduce the noise or irrelevant information of the data. The features of PCA (eigenvectors) and linear AE are not able to represent data as parts (e.g. nose in a face image). On the other hand, NMF and non-linear AE are maimed by slow learning speed and RP only represents a subspace of original data. This paper introduces a dimension reduction framework which to some extend represents data as parts, has fast learning speed, and learns the between-class scatter subspace. To this end, this paper investigates a linear and non-linear dimension reduction framework referred to as extreme learning machine AE (ELM-AE) and sparse ELM-AE (SELM-AE). In contrast to tied weight AE, the hidden neurons in ELM-AE and SELM-AE need not be tuned, and their parameters (e.g, input weights in additive neurons) are initialized using orthogonal and sparse random weights, respectively. Experimental results on USPS handwritten digit recognition data set, CIFAR-10 object recognition, and NORB object recognition data set show the efficacy of linear and non-linear ELM-AE and SELM-AE in terms of discriminative capability, sparsity, training time, and normalized mean square error.
Preserving Symmetry in Preconditioned Krylov Subspace Methods

NASA Technical Reports Server (NTRS)

Chan, Tony F.; Chow, E.; Saad, Y.; Yeung, M. C.

1996-01-01

We consider the problem of solving a linear system Ax = b when A is nearly symmetric and when the system is preconditioned by a symmetric positive definite matrix M. In the symmetric case, one can recover symmetry by using M-inner products in the conjugate gradient (CG) algorithm. This idea can also be used in the nonsymmetric case, and near symmetry can be preserved similarly. Like CG, the new algorithms are mathematically equivalent to split preconditioning, but do not require M to be factored. Better robustness in a specific sense can also be observed. When combined with truncated versions of iterative methods, tests show that this is more effective than the common practice of forfeiting near-symmetry altogether.
Hierarchical Dirichlet process model for gene expression clustering

PubMed Central

2013-01-01

Clustering is an important data processing tool for interpreting microarray data and genomic network inference. In this article, we propose a clustering algorithm based on the hierarchical Dirichlet processes (HDP). The HDP clustering introduces a hierarchical structure in the statistical model which captures the hierarchical features prevalent in biological data such as the gene express data. We develop a Gibbs sampling algorithm based on the Chinese restaurant metaphor for the HDP clustering. We apply the proposed HDP algorithm to both regulatory network segmentation and gene expression clustering. The HDP algorithm is shown to outperform several popular clustering algorithms by revealing the underlying hierarchical structure of the data. For the yeast cell cycle data, we compare the HDP result to the standard result and show that the HDP algorithm provides more information and reduces the unnecessary clustering fragments. PMID:23587447
Canonical PSO Based K-Means Clustering Approach for Real Datasets.

PubMed

Dey, Lopamudra; Chakraborty, Sanjay

2014-01-01

"Clustering" the significance and application of this technique is spread over various fields. Clustering is an unsupervised process in data mining, that is why the proper evaluation of the results and measuring the compactness and separability of the clusters are important issues. The procedure of evaluating the results of a clustering algorithm is known as cluster validity measure. Different types of indexes are used to solve different types of problems and indices selection depends on the kind of available data. This paper first proposes Canonical PSO based K-means clustering algorithm and also analyses some important clustering indices (intercluster, intracluster) and then evaluates the effects of those indices on real-time air pollution database, wholesale customer, wine, and vehicle datasets using typical K-means, Canonical PSO based K-means, simple PSO based K-means, DBSCAN, and Hierarchical clustering algorithms. This paper also describes the nature of the clusters and finally compares the performances of these clustering algorithms according to the validity assessment. It also defines which algorithm will be more desirable among all these algorithms to make proper compact clusters on this particular real life datasets. It actually deals with the behaviour of these clustering algorithms with respect to validation indexes and represents their results of evaluation in terms of mathematical and graphical forms.
Cyclo-stationary linear parameter time-varying subspace realization method applied for identification of horizontal-axis wind turbines

NASA Astrophysics Data System (ADS)

Velazquez, Antonio; Swartz, R. Andrew

2013-04-01

Wind energy is becoming increasingly important worldwide as an alternative renewable energy source. Economical, maintenance and operation are critical issues for large slender dynamic structures, especially for remote offshore wind farms. Health monitoring systems are very promising instruments to assure reliability and good performance of the structure. These sensing and control technologies are typically informed by models based on mechanics or data-driven identification techniques in the time and/or frequency domain. Frequency response functions are popular but are difficult to realize autonomously for structures of higher order and having overlapping frequency content. Instead, time-domain techniques have shown powerful advantages from a practical point of view (e.g. embedded algorithms in wireless-sensor networks), being more suitable to differentiate closely-related modes. Customarily, time-varying effects are often neglected or dismissed to simplify the analysis, but such is not the case for wind loaded structures with spinning multibodies. A more complex scenario is constituted when dealing with both periodic mechanisms responsible for the vibration shaft of the rotor-blade system, and the wind tower substructure interaction. Transformations of the cyclic effects on the vibration data can be applied to isolate inertia quantities different from rotating-generated forces that are typically non-stationary in nature. After applying these transformations, structural identification can be carried out by stationary techniques via data-correlated Eigensystem realizations. In this paper an exploration of a periodic stationary or cyclo-stationary subspace identification technique is presented here by means of a modified Eigensystem Realization Algorithm (ERA) via Stochastic Subspace Identification (SSI) and Linear Parameter Time-Varying (LPTV) techniques. Structural response is assumed under stationary ambient excitation produced by a Gaussian (white) noise assembled in the operative range bandwidth of horizontal-axis wind turbines. ERA-OKID analysis is driven by correlation-function matrices from the stationary ambient response aiming to reduce noise effects. Singular value decomposition (SVD) and eigenvalue analysis are computed in a last stage to get frequencies and mode shapes. Proposed assumptions are carefully weighted to account for the uncertainty of the environment the wind turbines are subjected to. A numerical example is presented based on data acquisition carried out in a BWC XL.1 low power wind turbine device installed in University of California at Davis. Finally, comments and observations are provided on how this subspace realization technique can be extended for modal-parameter identification using exclusively ambient vibration data.
Output-only cyclo-stationary linear-parameter time-varying stochastic subspace identification method for rotating machinery and spinning structures

NASA Astrophysics Data System (ADS)

Velazquez, Antonio; Swartz, R. Andrew

2015-02-01

Economical maintenance and operation are critical issues for rotating machinery and spinning structures containing blade elements, especially large slender dynamic beams (e.g., wind turbines). Structural health monitoring systems represent promising instruments to assure reliability and good performance from the dynamics of the mechanical systems. However, such devices have not been completely perfected for spinning structures. These sensing technologies are typically informed by both mechanistic models coupled with data-driven identification techniques in the time and/or frequency domain. Frequency response functions are popular but are difficult to realize autonomously for structures of higher order, especially when overlapping frequency content is present. Instead, time-domain techniques have shown to possess powerful advantages from a practical point of view (i.e. low-order computational effort suitable for real-time or embedded algorithms) and also are more suitable to differentiate closely-related modes. Customarily, time-varying effects are often neglected or dismissed to simplify this analysis, but such cannot be the case for sinusoidally loaded structures containing spinning multi-bodies. A more complex scenario is constituted when dealing with both periodic mechanisms responsible for the vibration shaft of the rotor-blade system and the interaction of the supporting substructure. Transformations of the cyclic effects on the vibrational data can be applied to isolate inertial quantities that are different from rotation-generated forces that are typically non-stationary in nature. After applying these transformations, structural identification can be carried out by stationary techniques via data-correlated eigensystem realizations. In this paper, an exploration of a periodic stationary or cyclo-stationary subspace identification technique is presented here for spinning multi-blade systems by means of a modified Eigensystem Realization Algorithm (ERA) via stochastic subspace identification (SSI) and linear parameter time-varying (LPTV) techniques. Structural response is assumed to be stationary ambient excitation produced by a Gaussian (white) noise within the operative range bandwidth of the machinery or structure in study. ERA-OKID analysis is driven by correlation-function matrices from the stationary ambient response aiming to reduce noise effects. Singular value decomposition (SVD) and eigenvalue analysis are computed in a last stage to identify frequencies and complex-valued mode shapes. Proposed assumptions are carefully weighted to account for the uncertainty of the environment. A numerical example is carried out based a spinning finite element (SFE) model, and verified using ANSYS® Ver. 12. Finally, comments and observations are provided on how this subspace realization technique can be extended to the problem of modal-parameter identification using only ambient vibration data.

A hybrid monkey search algorithm for clustering analysis.

PubMed

Chen, Xin; Zhou, Yongquan; Luo, Qifang

2014-01-01

Clustering is a popular data analysis and data mining technique. The k-means clustering algorithm is one of the most commonly used methods. However, it highly depends on the initial solution and is easy to fall into local optimum solution. In view of the disadvantages of the k-means method, this paper proposed a hybrid monkey algorithm based on search operator of artificial bee colony algorithm for clustering analysis and experiment on synthetic and real life datasets to show that the algorithm has a good performance than that of the basic monkey algorithm for clustering analysis.
Clustering for Binary Data Sets by Using Genetic Algorithm-Incremental K-means

NASA Astrophysics Data System (ADS)

Saharan, S.; Baragona, R.; Nor, M. E.; Salleh, R. M.; Asrah, N. M.

2018-04-01

This research was initially driven by the lack of clustering algorithms that specifically focus in binary data. To overcome this gap in knowledge, a promising technique for analysing this type of data became the main subject in this research, namely Genetic Algorithms (GA). For the purpose of this research, GA was combined with the Incremental K-means (IKM) algorithm to cluster the binary data streams. In GAIKM, the objective function was based on a few sufficient statistics that may be easily and quickly calculated on binary numbers. The implementation of IKM will give an advantage in terms of fast convergence. The results show that GAIKM is an efficient and effective new clustering algorithm compared to the clustering algorithms and to the IKM itself. In conclusion, the GAIKM outperformed other clustering algorithms such as GCUK, IKM, Scalable K-means (SKM) and K-means clustering and paves the way for future research involving missing data and outliers.
Overview of Krylov subspace methods with applications to control problems

NASA Technical Reports Server (NTRS)

Saad, Youcef

1989-01-01

An overview of projection methods based on Krylov subspaces are given with emphasis on their application to solving matrix equations that arise in control problems. The main idea of Krylov subspace methods is to generate a basis of the Krylov subspace Span and seek an approximate solution the the original problem from this subspace. Thus, the original matrix problem of size N is approximated by one of dimension m typically much smaller than N. Krylov subspace methods have been very successful in solving linear systems and eigenvalue problems and are now just becoming popular for solving nonlinear equations. It is shown how they can be used to solve partial pole placement problems, Sylvester's equation, and Lyapunov's equation.
Efficient evaluation of sampling quality of molecular dynamics simulations by clustering of dihedral torsion angles and Sammon mapping.

PubMed

Frickenhaus, Stephan; Kannan, Srinivasaraghavan; Zacharias, Martin

2009-02-01

A direct conformational clustering and mapping approach for peptide conformations based on backbone dihedral angles has been developed and applied to compare conformational sampling of Met-enkephalin using two molecular dynamics (MD) methods. Efficient clustering in dihedrals has been achieved by evaluating all combinations resulting from independent clustering of each dihedral angle distribution, thus resolving all conformational substates. In contrast, Cartesian clustering was unable to accurately distinguish between all substates. Projection of clusters on dihedral principal component (PCA) subspaces did not result in efficient separation of highly populated clusters. However, representation in a nonlinear metric by Sammon mapping was able to separate well the 48 highest populated clusters in just two dimensions. In addition, this approach also allowed us to visualize the transition frequencies between clusters efficiently. Significantly, higher transition frequencies between more distinct conformational substates were found for a recently developed biasing-potential replica exchange MD simulation method allowing faster sampling of possible substates compared to conventional MD simulations. Although the number of theoretically possible clusters grows exponentially with peptide length, in practice, the number of clusters is only limited by the sampling size (typically much smaller), and therefore the method is well suited also for large systems. The approach could be useful to rapidly and accurately evaluate conformational sampling during MD simulations, to compare different sampling strategies and eventually to detect kinetic bottlenecks in folding pathways.
Canonical PSO Based K-Means Clustering Approach for Real Datasets

PubMed Central

Dey, Lopamudra; Chakraborty, Sanjay

2014-01-01

“Clustering” the significance and application of this technique is spread over various fields. Clustering is an unsupervised process in data mining, that is why the proper evaluation of the results and measuring the compactness and separability of the clusters are important issues. The procedure of evaluating the results of a clustering algorithm is known as cluster validity measure. Different types of indexes are used to solve different types of problems and indices selection depends on the kind of available data. This paper first proposes Canonical PSO based K-means clustering algorithm and also analyses some important clustering indices (intercluster, intracluster) and then evaluates the effects of those indices on real-time air pollution database, wholesale customer, wine, and vehicle datasets using typical K-means, Canonical PSO based K-means, simple PSO based K-means, DBSCAN, and Hierarchical clustering algorithms. This paper also describes the nature of the clusters and finally compares the performances of these clustering algorithms according to the validity assessment. It also defines which algorithm will be more desirable among all these algorithms to make proper compact clusters on this particular real life datasets. It actually deals with the behaviour of these clustering algorithms with respect to validation indexes and represents their results of evaluation in terms of mathematical and graphical forms. PMID:27355083
A method of operation scheduling based on video transcoding for cluster equipment

NASA Astrophysics Data System (ADS)

Zhou, Haojie; Yan, Chun

2018-04-01

Because of the cluster technology in real-time video transcoding device, the application of facing the massive growth in the number of video assignments and resolution and bit rate of diversity, task scheduling algorithm, and analyze the current mainstream of cluster for real-time video transcoding equipment characteristics of the cluster, combination with the characteristics of the cluster equipment task delay scheduling algorithm is proposed. This algorithm enables the cluster to get better performance in the generation of the job queue and the lower part of the job queue when receiving the operation instruction. In the end, a small real-time video transcode cluster is constructed to analyze the calculation ability, running time, resource occupation and other aspects of various algorithms in operation scheduling. The experimental results show that compared with traditional clustering task scheduling algorithm, task delay scheduling algorithm has more flexible and efficient characteristics.
[Cluster analysis in biomedical researches].

PubMed

Akopov, A S; Moskovtsev, A A; Dolenko, S A; Savina, G D

2013-01-01

Cluster analysis is one of the most popular methods for the analysis of multi-parameter data. The cluster analysis reveals the internal structure of the data, group the separate observations on the degree of their similarity. The review provides a definition of the basic concepts of cluster analysis, and discusses the most popular clustering algorithms: k-means, hierarchical algorithms, Kohonen networks algorithms. Examples are the use of these algorithms in biomedical research.
Data depth based clustering analysis

DOE PAGES

Jeong, Myeong -Hun; Cai, Yaping; Sullivan, Clair J.; ...

2016-01-01

Here, this paper proposes a new algorithm for identifying patterns within data, based on data depth. Such a clustering analysis has an enormous potential to discover previously unknown insights from existing data sets. Many clustering algorithms already exist for this purpose. However, most algorithms are not affine invariant. Therefore, they must operate with different parameters after the data sets are rotated, scaled, or translated. Further, most clustering algorithms, based on Euclidean distance, can be sensitive to noises because they have no global perspective. Parameter selection also significantly affects the clustering results of each algorithm. Unlike many existing clustering algorithms, themore » proposed algorithm, called data depth based clustering analysis (DBCA), is able to detect coherent clusters after the data sets are affine transformed without changing a parameter. It is also robust to noises because using data depth can measure centrality and outlyingness of the underlying data. Further, it can generate relatively stable clusters by varying the parameter. The experimental comparison with the leading state-of-the-art alternatives demonstrates that the proposed algorithm outperforms DBSCAN and HDBSCAN in terms of affine invariance, and exceeds or matches the ro-bustness to noises of DBSCAN or HDBSCAN. The robust-ness to parameter selection is also demonstrated through the case study of clustering twitter data.« less
Clustering analysis of moving target signatures

NASA Astrophysics Data System (ADS)

Martone, Anthony; Ranney, Kenneth; Innocenti, Roberto

2010-04-01

Previously, we developed a moving target indication (MTI) processing approach to detect and track slow-moving targets inside buildings, which successfully detected moving targets (MTs) from data collected by a low-frequency, ultra-wideband radar. Our MTI algorithms include change detection, automatic target detection (ATD), clustering, and tracking. The MTI algorithms can be implemented in a real-time or near-real-time system; however, a person-in-the-loop is needed to select input parameters for the clustering algorithm. Specifically, the number of clusters to input into the cluster algorithm is unknown and requires manual selection. A critical need exists to automate all aspects of the MTI processing formulation. In this paper, we investigate two techniques that automatically determine the number of clusters: the adaptive knee-point (KP) algorithm and the recursive pixel finding (RPF) algorithm. The KP algorithm is based on a well-known heuristic approach for determining the number of clusters. The RPF algorithm is analogous to the image processing, pixel labeling procedure. Both algorithms are used to analyze the false alarm and detection rates of three operational scenarios of personnel walking inside wood and cinderblock buildings.
Detection of Repeating Earthquakes within the Cascadia Subduction Zone Using 2013-2014 Cascadia Initiative Amphibious Network Data

NASA Astrophysics Data System (ADS)

Kenefic, L.; Morton, E.; Bilek, S.

2017-12-01

It is well known that subduction zones create the largest earthquakes in the world, like the magnitude 9.5 Chile earthquake in 1960, or the more recent 9.1 magnitude Japan earthquake in 2011, both of which are in the top five largest earthquakes ever recorded. However, off the coast of the Pacific Northwest region of the U.S., the Cascadia subduction zone (CSZ) remains relatively quiet and modern seismic instruments have not recorded earthquakes of this size in the CSZ. The last great earthquake, a magnitude 8.7-9.2, occurred in 1700 and is constrained by written reports of the resultant tsunami in Japan and dating a drowned forest in the U.S. Previous studies have suggested the margin is most likely segmented along-strike. However, variations in frictional conditions in the CSZ fault zone are not well known. Geodetic modeling indicates that the locked seismogenic zone is likely completely offshore, which may be too far from land seismometers to adequately detect related seismicity. Ocean bottom seismometers, as part of the Cascadia Initiative Amphibious Network, were installed directly above the inferred seismogenic zone, which we use to better detect small interplate seismicity. Using the subspace detection method, this study looks to find new seismogenic zone earthquakes. This subspace detection method uses multiple previously known event templates concurrently to scan through continuous seismic data. Template events that make up the subspace are chosen from events in existing catalogs that likely occurred along the plate interface. Corresponding waveforms are windowed on the nearby Cascadia Initiative ocean bottom seismometers and coastal land seismometers for scanning. Detections that are found by the scan are similar to the template waveforms based upon a predefined threshold. Detections are then visually examined to determine if an event is present. The presence of repeating event clusters can indicate persistent seismic patches, likely corresponding to areas of stronger coupling. This work will ultimately improve the understanding of CSZ fault zone heterogeneity. Preliminary results gathered indicate 96 possible new events between August 2, 2013 and July 1, 2014 for four target clusters off the coast of northern Oregon.
Robust and Efficient Spin Purification for Determinantal Configuration Interaction.

PubMed

Fales, B Scott; Hohenstein, Edward G; Levine, Benjamin G

2017-09-12

The limited precision of floating point arithmetic can lead to the qualitative and even catastrophic failure of quantum chemical algorithms, especially when high accuracy solutions are sought. For example, numerical errors accumulated while solving for determinantal configuration interaction wave functions via Davidson diagonalization may lead to spin contamination in the trial subspace. This spin contamination may cause the procedure to converge to roots with undesired ⟨Ŝ 2 ⟩, wasting computer time in the best case and leading to incorrect conclusions in the worst. In hopes of finding a suitable remedy, we investigate five purification schemes for ensuring that the eigenvectors have the desired ⟨Ŝ 2 ⟩. These schemes are based on projection, penalty, and iterative approaches. All of these schemes rely on a direct, graphics processing unit-accelerated algorithm for calculating the S 2 c matrix-vector product. We assess the computational cost and convergence behavior of these methods by application to several benchmark systems and find that the first-order spin penalty method is the optimal choice, though first-order and Löwdin projection approaches also provide fast convergence to the desired spin state. Finally, to demonstrate the utility of these approaches, we computed the lowest several excited states of an open-shell silver cluster (Ag 19 ) using the state-averaged complete active space self-consistent field method, where spin purification was required to ensure spin stability of the CI vector coefficients. Several low-lying states with significant multiply excited character are predicted, suggesting the value of a multireference approach for modeling plasmonic nanomaterials.
Hyperspectral data acquisition and analysis in imaging and real-time active MIR backscattering spectroscopy

NASA Astrophysics Data System (ADS)

Jarvis, Jan; Haertelt, Marko; Hugger, Stefan; Butschek, Lorenz; Fuchs, Frank; Ostendorf, Ralf; Wagner, Joachim; Beyerer, Juergen

2017-04-01

In this work we present data analysis algorithms for detection of hazardous substances in hyperspectral observations acquired using active mid-infrared (MIR) backscattering spectroscopy. We present a novel background extraction algorithm based on the adaptive target generation process proposed by Ren and Chang called the adaptive background generation process (ABGP) that generates a robust and physically meaningful set of background spectra for operation of the well-known adaptive matched subspace detection (AMSD) algorithm. It is shown that the resulting AMSD-ABGP detection algorithm competes well with other widely used detection algorithms. The method is demonstrated in measurement data obtained by two fundamentally different active MIR hyperspectral data acquisition devices. A hyperspectral image sensor applicable in static scenes takes a wavelength sequential approach to hyperspectral data acquisition, whereas a rapid wavelength-scanning single-element detector variant of the same principle uses spatial scanning to generate the hyperspectral observation. It is shown that the measurement timescale of the latter is sufficient for the application of the data analysis algorithms even in dynamic scenarios.
Application of hybrid clustering using parallel k-means algorithm and DIANA algorithm

NASA Astrophysics Data System (ADS)

Umam, Khoirul; Bustamam, Alhadi; Lestari, Dian

2017-03-01

DNA is one of the carrier of genetic information of living organisms. Encoding, sequencing, and clustering DNA sequences has become the key jobs and routine in the world of molecular biology, in particular on bioinformatics application. There are two type of clustering, hierarchical clustering and partitioning clustering. In this paper, we combined two type clustering i.e. K-Means (partitioning clustering) and DIANA (hierarchical clustering), therefore it called Hybrid clustering. Application of hybrid clustering using Parallel K-Means algorithm and DIANA algorithm used to clustering DNA sequences of Human Papillomavirus (HPV). The clustering process is started with Collecting DNA sequences of HPV are obtained from NCBI (National Centre for Biotechnology Information), then performing characteristics extraction of DNA sequences. The characteristics extraction result is store in a matrix form, then normalize this matrix using Min-Max normalization and calculate genetic distance using Euclidian Distance. Furthermore, the hybrid clustering is applied by using implementation of Parallel K-Means algorithm and DIANA algorithm. The aim of using Hybrid Clustering is to obtain better clusters result. For validating the resulted clusters, to get optimum number of clusters, we use Davies-Bouldin Index (DBI). In this study, the result of implementation of Parallel K-Means clustering is data clustered become 5 clusters with minimal IDB value is 0.8741, and Hybrid Clustering clustered data become 13 sub-clusters with minimal IDB values = 0.8216, 0.6845, 0.3331, 0.1994 and 0.3952. The IDB value of hybrid clustering less than IBD value of Parallel K-Means clustering only that perform at 1ts stage. Its means clustering using Hybrid Clustering have the better result to clustered DNA sequence of HPV than perform parallel K-Means Clustering only.
Multiple Kernel Sparse Representation based Orthogonal Discriminative Projection and Its Cost-Sensitive Extension.

PubMed

Zhang, Guoqing; Sun, Huaijiang; Xia, Guiyu; Sun, Quansen

2016-07-07

Sparse representation based classification (SRC) has been developed and shown great potential for real-world application. Based on SRC, Yang et al. [10] devised a SRC steered discriminative projection (SRC-DP) method. However, as a linear algorithm, SRC-DP cannot handle the data with highly nonlinear distribution. Kernel sparse representation-based classifier (KSRC) is a non-linear extension of SRC and can remedy the drawback of SRC. KSRC requires the use of a predetermined kernel function and selection of the kernel function and its parameters is difficult. Recently, multiple kernel learning for SRC (MKL-SRC) [22] has been proposed to learn a kernel from a set of base kernels. However, MKL-SRC only considers the within-class reconstruction residual while ignoring the between-class relationship, when learning the kernel weights. In this paper, we propose a novel multiple kernel sparse representation-based classifier (MKSRC), and then we use it as a criterion to design a multiple kernel sparse representation based orthogonal discriminative projection method (MK-SR-ODP). The proposed algorithm aims at learning a projection matrix and a corresponding kernel from the given base kernels such that in the low dimension subspace the between-class reconstruction residual is maximized and the within-class reconstruction residual is minimized. Furthermore, to achieve a minimum overall loss by performing recognition in the learned low-dimensional subspace, we introduce cost information into the dimensionality reduction method. The solutions for the proposed method can be efficiently found based on trace ratio optimization method [33]. Extensive experimental results demonstrate the superiority of the proposed algorithm when compared with the state-of-the-art methods.
A Computationally Efficient Parallel Levenberg-Marquardt Algorithm for Large-Scale Big-Data Inversion

NASA Astrophysics Data System (ADS)

Lin, Y.; O'Malley, D.; Vesselinov, V. V.

2015-12-01

Inverse modeling seeks model parameters given a set of observed state variables. However, for many practical problems due to the facts that the observed data sets are often large and model parameters are often numerous, conventional methods for solving the inverse modeling can be computationally expensive. We have developed a new, computationally-efficient Levenberg-Marquardt method for solving large-scale inverse modeling. Levenberg-Marquardt methods require the solution of a dense linear system of equations which can be prohibitively expensive to compute for large-scale inverse problems. Our novel method projects the original large-scale linear problem down to a Krylov subspace, such that the dimensionality of the measurements can be significantly reduced. Furthermore, instead of solving the linear system for every Levenberg-Marquardt damping parameter, we store the Krylov subspace computed when solving the first damping parameter and recycle it for all the following damping parameters. The efficiency of our new inverse modeling algorithm is significantly improved by using these computational techniques. We apply this new inverse modeling method to invert for a random transitivity field. Our algorithm is fast enough to solve for the distributed model parameters (transitivity) at each computational node in the model domain. The inversion is also aided by the use regularization techniques. The algorithm is coded in Julia and implemented in the MADS computational framework (http://mads.lanl.gov). Julia is an advanced high-level scientific programing language that allows for efficient memory management and utilization of high-performance computational resources. By comparing with a Levenberg-Marquardt method using standard linear inversion techniques, our Levenberg-Marquardt method yields speed-up ratio of 15 in a multi-core computational environment and a speed-up ratio of 45 in a single-core computational environment. Therefore, our new inverse modeling method is a powerful tool for large-scale applications.
Acceleration of GPU-based Krylov solvers via data transfer reduction

DOE PAGES

Anzt, Hartwig; Tomov, Stanimire; Luszczek, Piotr; ...

2015-04-08

Krylov subspace iterative solvers are often the method of choice when solving large sparse linear systems. At the same time, hardware accelerators such as graphics processing units continue to offer significant floating point performance gains for matrix and vector computations through easy-to-use libraries of computational kernels. However, as these libraries are usually composed of a well optimized but limited set of linear algebra operations, applications that use them often fail to reduce certain data communications, and hence fail to leverage the full potential of the accelerator. In this study, we target the acceleration of Krylov subspace iterative methods for graphicsmore » processing units, and in particular the Biconjugate Gradient Stabilized solver that significant improvement can be achieved by reformulating the method to reduce data-communications through application-specific kernels instead of using the generic BLAS kernels, e.g. as provided by NVIDIA’s cuBLAS library, and by designing a graphics processing unit specific sparse matrix-vector product kernel that is able to more efficiently use the graphics processing unit’s computing power. Furthermore, we derive a model estimating the performance improvement, and use experimental data to validate the expected runtime savings. Finally, considering that the derived implementation achieves significantly higher performance, we assert that similar optimizations addressing algorithm structure, as well as sparse matrix-vector, are crucial for the subsequent development of high-performance graphics processing units accelerated Krylov subspace iterative methods.« less
Clustering algorithm for determining community structure in large networks

NASA Astrophysics Data System (ADS)

Pujol, Josep M.; Béjar, Javier; Delgado, Jordi

2006-07-01

We propose an algorithm to find the community structure in complex networks based on the combination of spectral analysis and modularity optimization. The clustering produced by our algorithm is as accurate as the best algorithms on the literature of modularity optimization; however, the main asset of the algorithm is its efficiency. The best match for our algorithm is Newman’s fast algorithm, which is the reference algorithm for clustering in large networks due to its efficiency. When both algorithms are compared, our algorithm outperforms the fast algorithm both in efficiency and accuracy of the clustering, in terms of modularity. Thus, the results suggest that the proposed algorithm is a good choice to analyze the community structure of medium and large networks in the range of tens and hundreds of thousand vertices.
Clustering algorithm evaluation and the development of a replacement for procedure 1. [for crop inventories

NASA Technical Reports Server (NTRS)

Lennington, R. K.; Johnson, J. K.

1979-01-01

An efficient procedure which clusters data using a completely unsupervised clustering algorithm and then uses labeled pixels to label the resulting clusters or perform a stratified estimate using the clusters as strata is developed. Three clustering algorithms, CLASSY, AMOEBA, and ISOCLS, are compared for efficiency. Three stratified estimation schemes and three labeling schemes are also considered and compared.
Clusternomics: Integrative context-dependent clustering for heterogeneous datasets

PubMed Central

Wernisch, Lorenz

2017-01-01

Integrative clustering is used to identify groups of samples by jointly analysing multiple datasets describing the same set of biological samples, such as gene expression, copy number, methylation etc. Most existing algorithms for integrative clustering assume that there is a shared consistent set of clusters across all datasets, and most of the data samples follow this structure. However in practice, the structure across heterogeneous datasets can be more varied, with clusters being joined in some datasets and separated in others. In this paper, we present a probabilistic clustering method to identify groups across datasets that do not share the same cluster structure. The proposed algorithm, Clusternomics, identifies groups of samples that share their global behaviour across heterogeneous datasets. The algorithm models clusters on the level of individual datasets, while also extracting global structure that arises from the local cluster assignments. Clusters on both the local and the global level are modelled using a hierarchical Dirichlet mixture model to identify structure on both levels. We evaluated the model both on simulated and on real-world datasets. The simulated data exemplifies datasets with varying degrees of common structure. In such a setting Clusternomics outperforms existing algorithms for integrative and consensus clustering. In a real-world application, we used the algorithm for cancer subtyping, identifying subtypes of cancer from heterogeneous datasets. We applied the algorithm to TCGA breast cancer dataset, integrating gene expression, miRNA expression, DNA methylation and proteomics. The algorithm extracted clinically meaningful clusters with significantly different survival probabilities. We also evaluated the algorithm on lung and kidney cancer TCGA datasets with high dimensionality, again showing clinically significant results and scalability of the algorithm. PMID:29036190
Clusternomics: Integrative context-dependent clustering for heterogeneous datasets.

PubMed

Gabasova, Evelina; Reid, John; Wernisch, Lorenz

2017-10-01

Integrative clustering is used to identify groups of samples by jointly analysing multiple datasets describing the same set of biological samples, such as gene expression, copy number, methylation etc. Most existing algorithms for integrative clustering assume that there is a shared consistent set of clusters across all datasets, and most of the data samples follow this structure. However in practice, the structure across heterogeneous datasets can be more varied, with clusters being joined in some datasets and separated in others. In this paper, we present a probabilistic clustering method to identify groups across datasets that do not share the same cluster structure. The proposed algorithm, Clusternomics, identifies groups of samples that share their global behaviour across heterogeneous datasets. The algorithm models clusters on the level of individual datasets, while also extracting global structure that arises from the local cluster assignments. Clusters on both the local and the global level are modelled using a hierarchical Dirichlet mixture model to identify structure on both levels. We evaluated the model both on simulated and on real-world datasets. The simulated data exemplifies datasets with varying degrees of common structure. In such a setting Clusternomics outperforms existing algorithms for integrative and consensus clustering. In a real-world application, we used the algorithm for cancer subtyping, identifying subtypes of cancer from heterogeneous datasets. We applied the algorithm to TCGA breast cancer dataset, integrating gene expression, miRNA expression, DNA methylation and proteomics. The algorithm extracted clinically meaningful clusters with significantly different survival probabilities. We also evaluated the algorithm on lung and kidney cancer TCGA datasets with high dimensionality, again showing clinically significant results and scalability of the algorithm.

CytoCluster: A Cytoscape Plugin for Cluster Analysis and Visualization of Biological Networks.

PubMed

Li, Min; Li, Dongyan; Tang, Yu; Wu, Fangxiang; Wang, Jianxin

2017-08-31

Nowadays, cluster analysis of biological networks has become one of the most important approaches to identifying functional modules as well as predicting protein complexes and network biomarkers. Furthermore, the visualization of clustering results is crucial to display the structure of biological networks. Here we present CytoCluster, a cytoscape plugin integrating six clustering algorithms, HC-PIN (Hierarchical Clustering algorithm in Protein Interaction Networks), OH-PIN (identifying Overlapping and Hierarchical modules in Protein Interaction Networks), IPCA (Identifying Protein Complex Algorithm), ClusterONE (Clustering with Overlapping Neighborhood Expansion), DCU (Detecting Complexes based on Uncertain graph model), IPC-MCE (Identifying Protein Complexes based on Maximal Complex Extension), and BinGO (the Biological networks Gene Ontology) function. Users can select different clustering algorithms according to their requirements. The main function of these six clustering algorithms is to detect protein complexes or functional modules. In addition, BinGO is used to determine which Gene Ontology (GO) categories are statistically overrepresented in a set of genes or a subgraph of a biological network. CytoCluster can be easily expanded, so that more clustering algorithms and functions can be added to this plugin. Since it was created in July 2013, CytoCluster has been downloaded more than 9700 times in the Cytoscape App store and has already been applied to the analysis of different biological networks. CytoCluster is available from http://apps.cytoscape.org/apps/cytocluster.
CytoCluster: A Cytoscape Plugin for Cluster Analysis and Visualization of Biological Networks

PubMed Central

Li, Min; Li, Dongyan; Tang, Yu; Wang, Jianxin

2017-01-01

Nowadays, cluster analysis of biological networks has become one of the most important approaches to identifying functional modules as well as predicting protein complexes and network biomarkers. Furthermore, the visualization of clustering results is crucial to display the structure of biological networks. Here we present CytoCluster, a cytoscape plugin integrating six clustering algorithms, HC-PIN (Hierarchical Clustering algorithm in Protein Interaction Networks), OH-PIN (identifying Overlapping and Hierarchical modules in Protein Interaction Networks), IPCA (Identifying Protein Complex Algorithm), ClusterONE (Clustering with Overlapping Neighborhood Expansion), DCU (Detecting Complexes based on Uncertain graph model), IPC-MCE (Identifying Protein Complexes based on Maximal Complex Extension), and BinGO (the Biological networks Gene Ontology) function. Users can select different clustering algorithms according to their requirements. The main function of these six clustering algorithms is to detect protein complexes or functional modules. In addition, BinGO is used to determine which Gene Ontology (GO) categories are statistically overrepresented in a set of genes or a subgraph of a biological network. CytoCluster can be easily expanded, so that more clustering algorithms and functions can be added to this plugin. Since it was created in July 2013, CytoCluster has been downloaded more than 9700 times in the Cytoscape App store and has already been applied to the analysis of different biological networks. CytoCluster is available from http://apps.cytoscape.org/apps/cytocluster. PMID:28858211
Multi-Parent Clustering Algorithms from Stochastic Grammar Data Models

NASA Technical Reports Server (NTRS)

Mjoisness, Eric; Castano, Rebecca; Gray, Alexander

1999-01-01

We introduce a statistical data model and an associated optimization-based clustering algorithm which allows data vectors to belong to zero, one or several "parent" clusters. For each data vector the algorithm makes a discrete decision among these alternatives. Thus, a recursive version of this algorithm would place data clusters in a Directed Acyclic Graph rather than a tree. We test the algorithm with synthetic data generated according to the statistical data model. We also illustrate the algorithm using real data from large-scale gene expression assays.
Fast detection of the fuzzy communities based on leader-driven algorithm

NASA Astrophysics Data System (ADS)

Fang, Changjian; Mu, Dejun; Deng, Zhenghong; Hu, Jun; Yi, Chen-He

2018-03-01

In this paper, we present the leader-driven algorithm (LDA) for learning community structure in networks. The algorithm allows one to find overlapping clusters in a network, an important aspect of real networks, especially social networks. The algorithm requires no input parameters and learns the number of clusters naturally from the network. It accomplishes this using leadership centrality in a clever manner. It identifies local minima of leadership centrality as followers which belong only to one cluster, and the remaining nodes are leaders which connect clusters. In this way, the number of clusters can be learned using only the network structure. The LDA is also an extremely fast algorithm, having runtime linear in the network size. Thus, this algorithm can be used to efficiently cluster extremely large networks.
Research on retailer data clustering algorithm based on Spark

NASA Astrophysics Data System (ADS)

Huang, Qiuman; Zhou, Feng

2017-03-01

Big data analysis is a hot topic in the IT field now. Spark is a high-reliability and high-performance distributed parallel computing framework for big data sets. K-means algorithm is one of the classical partition methods in clustering algorithm. In this paper, we study the k-means clustering algorithm on Spark. Firstly, the principle of the algorithm is analyzed, and then the clustering analysis is carried out on the supermarket customers through the experiment to find out the different shopping patterns. At the same time, this paper proposes the parallelization of k-means algorithm and the distributed computing framework of Spark, and gives the concrete design scheme and implementation scheme. This paper uses the two-year sales data of a supermarket to validate the proposed clustering algorithm and achieve the goal of subdividing customers, and then analyze the clustering results to help enterprises to take different marketing strategies for different customer groups to improve sales performance.
Variational second order density matrix study of F3-: importance of subspace constraints for size-consistency.

PubMed

van Aggelen, Helen; Verstichel, Brecht; Bultinck, Patrick; Van Neck, Dimitri; Ayers, Paul W; Cooper, David L

2011-02-07

Variational second order density matrix theory under "two-positivity" constraints tends to dissociate molecules into unphysical fractionally charged products with too low energies. We aim to construct a qualitatively correct potential energy surface for F(3)(-) by applying subspace energy constraints on mono- and diatomic subspaces of the molecular basis space. Monoatomic subspace constraints do not guarantee correct dissociation: the constraints are thus geometry dependent. Furthermore, the number of subspace constraints needed for correct dissociation does not grow linearly with the number of atoms. The subspace constraints do impose correct chemical properties in the dissociation limit and size-consistency, but the structure of the resulting second order density matrix method does not exactly correspond to a system of noninteracting units.
Unsupervised Cryo-EM Data Clustering through Adaptively Constrained K-Means Algorithm

PubMed Central

Xu, Yaofang; Wu, Jiayi; Yin, Chang-Cheng; Mao, Youdong

2016-01-01

In single-particle cryo-electron microscopy (cryo-EM), K-means clustering algorithm is widely used in unsupervised 2D classification of projection images of biological macromolecules. 3D ab initio reconstruction requires accurate unsupervised classification in order to separate molecular projections of distinct orientations. Due to background noise in single-particle images and uncertainty of molecular orientations, traditional K-means clustering algorithm may classify images into wrong classes and produce classes with a large variation in membership. Overcoming these limitations requires further development on clustering algorithms for cryo-EM data analysis. We propose a novel unsupervised data clustering method building upon the traditional K-means algorithm. By introducing an adaptive constraint term in the objective function, our algorithm not only avoids a large variation in class sizes but also produces more accurate data clustering. Applications of this approach to both simulated and experimental cryo-EM data demonstrate that our algorithm is a significantly improved alterative to the traditional K-means algorithm in single-particle cryo-EM analysis. PMID:27959895
Unsupervised Cryo-EM Data Clustering through Adaptively Constrained K-Means Algorithm.

PubMed

Xu, Yaofang; Wu, Jiayi; Yin, Chang-Cheng; Mao, Youdong

2016-01-01

In single-particle cryo-electron microscopy (cryo-EM), K-means clustering algorithm is widely used in unsupervised 2D classification of projection images of biological macromolecules. 3D ab initio reconstruction requires accurate unsupervised classification in order to separate molecular projections of distinct orientations. Due to background noise in single-particle images and uncertainty of molecular orientations, traditional K-means clustering algorithm may classify images into wrong classes and produce classes with a large variation in membership. Overcoming these limitations requires further development on clustering algorithms for cryo-EM data analysis. We propose a novel unsupervised data clustering method building upon the traditional K-means algorithm. By introducing an adaptive constraint term in the objective function, our algorithm not only avoids a large variation in class sizes but also produces more accurate data clustering. Applications of this approach to both simulated and experimental cryo-EM data demonstrate that our algorithm is a significantly improved alterative to the traditional K-means algorithm in single-particle cryo-EM analysis.
GDPC: Gravitation-based Density Peaks Clustering algorithm

NASA Astrophysics Data System (ADS)

Jiang, Jianhua; Hao, Dehao; Chen, Yujun; Parmar, Milan; Li, Keqin

2018-07-01

The Density Peaks Clustering algorithm, which we refer to as DPC, is a novel and efficient density-based clustering approach, and it is published in Science in 2014. The DPC has advantages of discovering clusters with varying sizes and varying densities, but has some limitations of detecting the number of clusters and identifying anomalies. We develop an enhanced algorithm with an alternative decision graph based on gravitation theory and nearby distance to identify centroids and anomalies accurately. We apply our method to some UCI and synthetic data sets. We report comparative clustering performances using F-Measure and 2-dimensional vision. We also compare our method to other clustering algorithms, such as K-Means, Affinity Propagation (AP) and DPC. We present F-Measure scores and clustering accuracies of our GDPC algorithm compared to K-Means, AP and DPC on different data sets. We show that the GDPC has the superior performance in its capability of: (1) detecting the number of clusters obviously; (2) aggregating clusters with varying sizes, varying densities efficiently; (3) identifying anomalies accurately.
Mining the National Career Assessment Examination Result Using Clustering Algorithm

NASA Astrophysics Data System (ADS)

Pagudpud, M. V.; Palaoag, T. T.; Padirayon, L. M.

2018-03-01

Education is an essential process today which elicits authorities to discover and establish innovative strategies for educational improvement. This study applied data mining using clustering technique for knowledge extraction from the National Career Assessment Examination (NCAE) result in the Division of Quirino. The NCAE is an examination given to all grade 9 students in the Philippines to assess their aptitudes in the different domains. Clustering the students is helpful in identifying students’ learning considerations. With the use of the RapidMiner tool, clustering algorithms such as Density-Based Spatial Clustering of Applications with Noise (DBSCAN), k-means, k-medoid, expectation maximization clustering, and support vector clustering algorithms were analyzed. The silhouette indexes of the said clustering algorithms were compared, and the result showed that the k-means algorithm with k = 3 and silhouette index equal to 0.196 is the most appropriate clustering algorithm to group the students. Three groups were formed having 477 students in the determined group (cluster 0), 310 proficient students (cluster 1) and 396 developing students (cluster 2). The data mining technique used in this study is essential in extracting useful information from the NCAE result to better understand the abilities of students which in turn is a good basis for adopting teaching strategies.
Automatic Clustering Using Multi-objective Particle Swarm and Simulated Annealing

PubMed Central

Abubaker, Ahmad; Baharum, Adam; Alrefaei, Mahmoud

2015-01-01

This paper puts forward a new automatic clustering algorithm based on Multi-Objective Particle Swarm Optimization and Simulated Annealing, “MOPSOSA”. The proposed algorithm is capable of automatic clustering which is appropriate for partitioning datasets to a suitable number of clusters. MOPSOSA combines the features of the multi-objective based particle swarm optimization (PSO) and the Multi-Objective Simulated Annealing (MOSA). Three cluster validity indices were optimized simultaneously to establish the suitable number of clusters and the appropriate clustering for a dataset. The first cluster validity index is centred on Euclidean distance, the second on the point symmetry distance, and the last cluster validity index is based on short distance. A number of algorithms have been compared with the MOPSOSA algorithm in resolving clustering problems by determining the actual number of clusters and optimal clustering. Computational experiments were carried out to study fourteen artificial and five real life datasets. PMID:26132309
A non-contact method based on multiple signal classification algorithm to reduce the measurement time for accurately heart rate detection

NASA Astrophysics Data System (ADS)

Bechet, P.; Mitran, R.; Munteanu, M.

2013-08-01

Non-contact methods for the assessment of vital signs are of great interest for specialists due to the benefits obtained in both medical and special applications, such as those for surveillance, monitoring, and search and rescue. This paper investigates the possibility of implementing a digital processing algorithm based on the MUSIC (Multiple Signal Classification) parametric spectral estimation in order to reduce the observation time needed to accurately measure the heart rate. It demonstrates that, by proper dimensioning the signal subspace, the MUSIC algorithm can be optimized in order to accurately assess the heart rate during an 8-28 s time interval. The validation of the processing algorithm performance was achieved by minimizing the mean error of the heart rate after performing simultaneous comparative measurements on several subjects. In order to calculate the error the reference value of heart rate was measured using a classic measurement system through direct contact.
Perfect blind restoration of images blurred by multiple filters: theory and efficient algorithms.

PubMed

Harikumar, G; Bresler, Y

1999-01-01

We address the problem of restoring an image from its noisy convolutions with two or more unknown finite impulse response (FIR) filters. We develop theoretical results about the existence and uniqueness of solutions, and show that under some generically true assumptions, both the filters and the image can be determined exactly in the absence of noise, and stably estimated in its presence. We present efficient algorithms to estimate the blur functions and their sizes. These algorithms are of two types, subspace-based and likelihood-based, and are extensions of techniques proposed for the solution of the multichannel blind deconvolution problem in one dimension. We present memory and computation-efficient techniques to handle the very large matrices arising in the two-dimensional (2-D) case. Once the blur functions are determined, they are used in a multichannel deconvolution step to reconstruct the unknown image. The theoretical and practical implications of edge effects, and "weakly exciting" images are examined. Finally, the algorithms are demonstrated on synthetic and real data.
The application of mixed recommendation algorithm with user clustering in the microblog advertisements promotion

NASA Astrophysics Data System (ADS)

Gong, Lina; Xu, Tao; Zhang, Wei; Li, Xuhong; Wang, Xia; Pan, Wenwen

2017-03-01

The traditional microblog recommendation algorithm has the problems of low efficiency and modest effect in the era of big data. In the aim of solving these issues, this paper proposed a mixed recommendation algorithm with user clustering. This paper first introduced the situation of microblog marketing industry. Then, this paper elaborates the user interest modeling process and detailed advertisement recommendation methods. Finally, this paper compared the mixed recommendation algorithm with the traditional classification algorithm and mixed recommendation algorithm without user clustering. The results show that the mixed recommendation algorithm with user clustering has good accuracy and recall rate in the microblog advertisements promotion.
Implementation of the block-Krylov boundary flexibility method of component synthesis

NASA Technical Reports Server (NTRS)

Carney, Kelly S.; Abdallah, Ayman A.; Hucklebridge, Arthur A.

1993-01-01

A method of dynamic substructuring is presented which utilizes a set of static Ritz vectors as a replacement for normal eigenvectors in component mode synthesis. This set of Ritz vectors is generated in a recurrence relationship, which has the form of a block-Krylov subspace. The initial seed to the recurrence algorithm is based on the boundary flexibility vectors of the component. This algorithm is not load-dependent, is applicable to both fixed and free-interface boundary components, and results in a general component model appropriate for any type of dynamic analysis. This methodology was implemented in the MSC/NASTRAN normal modes solution sequence using DMAP. The accuracy is found to be comparable to that of component synthesis based upon normal modes. The block-Krylov recurrence algorithm is a series of static solutions and so requires significantly less computation than solving the normal eigenspace problem.
Flag-based detection of weak gas signatures in long-wave infrared hyperspectral image sequences

NASA Astrophysics Data System (ADS)

Marrinan, Timothy; Beveridge, J. Ross; Draper, Bruce; Kirby, Michael; Peterson, Chris

2016-05-01

We present a flag manifold based method for detecting chemical plumes in long-wave infrared hyperspectral movies. The method encodes temporal and spatial information related to a hyperspectral pixel into a flag, or nested sequence of linear subspaces. The technique used to create the flags pushes information about the background clutter, ambient conditions, and potential chemical agents into the leading elements of the flags. Exploiting this temporal information allows for a detection algorithm that is sensitive to the presence of weak signals. This method is compared to existing techniques qualitatively on real data and quantitatively on synthetic data to show that the flag-based algorithm consistently performs better on data when the SINRdB is low, and beats the ACE and MF algorithms in probability of detection for low probabilities of false alarm even when the SINRdB is high.
Procedure of Partitioning Data Into Number of Data Sets or Data Group - A Review

NASA Astrophysics Data System (ADS)

Kim, Tai-Hoon

The goal of clustering is to decompose a dataset into similar groups based on a objective function. Some already well established clustering algorithms are there for data clustering. Objective of these data clustering algorithms are to divide the data points of the feature space into a number of groups (or classes) so that a predefined set of criteria are satisfied. The article considers the comparative study about the effectiveness and efficiency of traditional data clustering algorithms. For evaluating the performance of the clustering algorithms, Minkowski score is used here for different data sets.
Android Malware Classification Using K-Means Clustering Algorithm

NASA Astrophysics Data System (ADS)

Hamid, Isredza Rahmi A.; Syafiqah Khalid, Nur; Azma Abdullah, Nurul; Rahman, Nurul Hidayah Ab; Chai Wen, Chuah

2017-08-01

Malware was designed to gain access or damage a computer system without user notice. Besides, attacker exploits malware to commit crime or fraud. This paper proposed Android malware classification approach based on K-Means clustering algorithm. We evaluate the proposed model in terms of accuracy using machine learning algorithms. Two datasets were selected to demonstrate the practicing of K-Means clustering algorithms that are Virus Total and Malgenome dataset. We classify the Android malware into three clusters which are ransomware, scareware and goodware. Nine features were considered for each types of dataset such as Lock Detected, Text Detected, Text Score, Encryption Detected, Threat, Porn, Law, Copyright and Moneypak. We used IBM SPSS Statistic software for data classification and WEKA tools to evaluate the built cluster. The proposed K-Means clustering algorithm shows promising result with high accuracy when tested using Random Forest algorithm.
An extended affinity propagation clustering method based on different data density types.

PubMed

Zhao, XiuLi; Xu, WeiXiang

2015-01-01

Affinity propagation (AP) algorithm, as a novel clustering method, does not require the users to specify the initial cluster centers in advance, which regards all data points as potential exemplars (cluster centers) equally and groups the clusters totally by the similar degree among the data points. But in many cases there exist some different intensive areas within the same data set, which means that the data set does not distribute homogeneously. In such situation the AP algorithm cannot group the data points into ideal clusters. In this paper, we proposed an extended AP clustering algorithm to deal with such a problem. There are two steps in our method: firstly the data set is partitioned into several data density types according to the nearest distances of each data point; and then the AP clustering method is, respectively, used to group the data points into clusters in each data density type. Two experiments are carried out to evaluate the performance of our algorithm: one utilizes an artificial data set and the other uses a real seismic data set. The experiment results show that groups are obtained more accurately by our algorithm than OPTICS and AP clustering algorithm itself.
Scalable Parallel Density-based Clustering and Applications

NASA Astrophysics Data System (ADS)

Patwary, Mostofa Ali

2014-04-01

Recently, density-based clustering algorithms (DBSCAN and OPTICS) have gotten significant attention of the scientific community due to their unique capability of discovering arbitrary shaped clusters and eliminating noise data. These algorithms have several applications, which require high performance computing, including finding halos and subhalos (clusters) from massive cosmology data in astrophysics, analyzing satellite images, X-ray crystallography, and anomaly detection. However, parallelization of these algorithms are extremely challenging as they exhibit inherent sequential data access order, unbalanced workload resulting in low parallel efficiency. To break the data access sequentiality and to achieve high parallelism, we develop new parallel algorithms, both for DBSCAN and OPTICS, designed using graph algorithmic techniques. For example, our parallel DBSCAN algorithm exploits the similarities between DBSCAN and computing connected components. Using datasets containing up to a billion floating point numbers, we show that our parallel density-based clustering algorithms significantly outperform the existing algorithms, achieving speedups up to 27.5 on 40 cores on shared memory architecture and speedups up to 5,765 using 8,192 cores on distributed memory architecture. In our experiments, we found that while achieving the scalability, our algorithms produce clustering results with comparable quality to the classical algorithms.

Energy Aware Clustering Algorithms for Wireless Sensor Networks

NASA Astrophysics Data System (ADS)

Rakhshan, Noushin; Rafsanjani, Marjan Kuchaki; Liu, Chenglian

2011-09-01

The sensor nodes deployed in wireless sensor networks (WSNs) are extremely power constrained, so maximizing the lifetime of the entire networks is mainly considered in the design. In wireless sensor networks, hierarchical network structures have the advantage of providing scalable and energy efficient solutions. In this paper, we investigate different clustering algorithms for WSNs and also compare these clustering algorithms based on metrics such as clustering distribution, cluster's load balancing, Cluster Head's (CH) selection strategy, CH's role rotation, node mobility, clusters overlapping, intra-cluster communications, reliability, security and location awareness.
Conformational states and folding pathways of peptides revealed by principal-independent component analyses.

PubMed

Nguyen, Phuong H

2007-05-15

Principal component analysis is a powerful method for projecting multidimensional conformational space of peptides or proteins onto lower dimensional subspaces in which the main conformations are present, making it easier to reveal the structures of molecules from e.g. molecular dynamics simulation trajectories. However, the identification of all conformational states is still difficult if the subspaces consist of more than two dimensions. This is mainly due to the fact that the principal components are not independent with each other, and states in the subspaces cannot be visualized. In this work, we propose a simple and fast scheme that allows one to obtain all conformational states in the subspaces. The basic idea is that instead of directly identifying the states in the subspace spanned by principal components, we first transform this subspace into another subspace formed by components that are independent of one other. These independent components are obtained from the principal components by employing the independent component analysis method. Because of independence between components, all states in this new subspace are defined as all possible combinations of the states obtained from each single independent component. This makes the conformational analysis much simpler. We test the performance of the method by analyzing the conformations of the glycine tripeptide and the alanine hexapeptide. The analyses show that our method is simple and quickly reveal all conformational states in the subspaces. The folding pathways between the identified states of the alanine hexapeptide are analyzed and discussed in some detail. 2007 Wiley-Liss, Inc.
Removal of impulse noise clusters from color images with local order statistics

NASA Astrophysics Data System (ADS)

Ruchay, Alexey; Kober, Vitaly

2017-09-01

This paper proposes a novel algorithm for restoring images corrupted with clusters of impulse noise. The noise clusters often occur when the probability of impulse noise is very high. The proposed noise removal algorithm consists of detection of bulky impulse noise in three color channels with local order statistics followed by removal of the detected clusters by means of vector median filtering. With the help of computer simulation we show that the proposed algorithm is able to effectively remove clustered impulse noise. The performance of the proposed algorithm is compared in terms of image restoration metrics with that of common successful algorithms.
Study of parameters of the nearest neighbour shared algorithm on clustering documents

NASA Astrophysics Data System (ADS)

Mustika Rukmi, Alvida; Budi Utomo, Daryono; Imro’atus Sholikhah, Neni

2018-03-01

Document clustering is one way of automatically managing documents, extracting of document topics and fastly filtering information. Preprocess of clustering documents processed by textmining consists of: keyword extraction using Rapid Automatic Keyphrase Extraction (RAKE) and making the document as concept vector using Latent Semantic Analysis (LSA). Furthermore, the clustering process is done so that the documents with the similarity of the topic are in the same cluster, based on the preprocesing by textmining performed. Shared Nearest Neighbour (SNN) algorithm is a clustering method based on the number of "nearest neighbors" shared. The parameters in the SNN Algorithm consist of: k nearest neighbor documents, ɛ shared nearest neighbor documents and MinT minimum number of similar documents, which can form a cluster. Characteristics The SNN algorithm is based on shared ‘neighbor’ properties. Each cluster is formed by keywords that are shared by the documents. SNN algorithm allows a cluster can be built more than one keyword, if the value of the frequency of appearing keywords in document is also high. Determination of parameter values on SNN algorithm affects document clustering results. The higher parameter value k, will increase the number of neighbor documents from each document, cause similarity of neighboring documents are lower. The accuracy of each cluster is also low. The higher parameter value ε, caused each document catch only neighbor documents that have a high similarity to build a cluster. It also causes more unclassified documents (noise). The higher the MinT parameter value cause the number of clusters will decrease, since the number of similar documents can not form clusters if less than MinT. Parameter in the SNN Algorithm determine performance of clustering result and the amount of noise (unclustered documents ). The Silhouette coeffisient shows almost the same result in many experiments, above 0.9, which means that SNN algorithm works well with different parameter values.
Graph partitions and cluster synchronization in networks of oscillators

PubMed Central

Schaub, Michael T.; O’Clery, Neave; Billeh, Yazan N.; Delvenne, Jean-Charles; Lambiotte, Renaud; Barahona, Mauricio

2017-01-01

Synchronization over networks depends strongly on the structure of the coupling between the oscillators. When the coupling presents certain regularities, the dynamics can be coarse-grained into clusters by means of External Equitable Partitions of the network graph and their associated quotient graphs. We exploit this graph-theoretical concept to study the phenomenon of cluster synchronization, in which different groups of nodes converge to distinct behaviors. We derive conditions and properties of networks in which such clustered behavior emerges, and show that the ensuing dynamics is the result of the localization of the eigenvectors of the associated graph Laplacians linked to the existence of invariant subspaces. The framework is applied to both linear and non-linear models, first for the standard case of networks with positive edges, before being generalized to the case of signed networks with both positive and negative interactions. We illustrate our results with examples of both signed and unsigned graphs for consensus dynamics and for partial synchronization of oscillator networks under the master stability function as well as Kuramoto oscillators. PMID:27781454
A Variational Approach to Video Registration with Subspace Constraints.

PubMed

Garg, Ravi; Roussos, Anastasios; Agapito, Lourdes

2013-01-01

This paper addresses the problem of non-rigid video registration, or the computation of optical flow from a reference frame to each of the subsequent images in a sequence, when the camera views deformable objects. We exploit the high correlation between 2D trajectories of different points on the same non-rigid surface by assuming that the displacement of any point throughout the sequence can be expressed in a compact way as a linear combination of a low-rank motion basis. This subspace constraint effectively acts as a trajectory regularization term leading to temporally consistent optical flow. We formulate it as a robust soft constraint within a variational framework by penalizing flow fields that lie outside the low-rank manifold. The resulting energy functional can be decoupled into the optimization of the brightness constancy and spatial regularization terms, leading to an efficient optimization scheme. Additionally, we propose a novel optimization scheme for the case of vector valued images, based on the dualization of the data term. This allows us to extend our approach to deal with colour images which results in significant improvements on the registration results. Finally, we provide a new benchmark dataset, based on motion capture data of a flag waving in the wind, with dense ground truth optical flow for evaluation of multi-frame optical flow algorithms for non-rigid surfaces. Our experiments show that our proposed approach outperforms state of the art optical flow and dense non-rigid registration algorithms.
Accelerating molecular property calculations with nonorthonormal Krylov space methods

DOE Office of Scientific and Technical Information (OSTI.GOV)

Furche, Filipp; Krull, Brandon T.; Nguyen, Brian D.

Here, we formulate Krylov space methods for large eigenvalue problems and linear equation systems that take advantage of decreasing residual norms to reduce the cost of matrix-vector multiplication. The residuals are used as subspace basis without prior orthonormalization, which leads to generalized eigenvalue problems or linear equation systems on the Krylov space. These nonorthonormal Krylov space (nKs) algorithms are favorable for large matrices with irregular sparsity patterns whose elements are computed on the fly, because fewer operations are necessary as the residual norm decreases as compared to the conventional method, while errors in the desired eigenpairs and solution vectors remainmore » small. We consider real symmetric and symplectic eigenvalue problems as well as linear equation systems and Sylvester equations as they appear in configuration interaction and response theory. The nKs method can be implemented in existing electronic structure codes with minor modifications and yields speed-ups of 1.2-1.8 in typical time-dependent Hartree-Fock and density functional applications without accuracy loss. The algorithm can compute entire linear subspaces simultaneously which benefits electronic spectra and force constant calculations requiring many eigenpairs or solution vectors. The nKs approach is related to difference density methods in electronic ground state calculations, and particularly efficient for integral direct computations of exchange-type contractions. By combination with resolution-of-the-identity methods for Coulomb contractions, three- to fivefold speed-ups of hybrid time-dependent density functional excited state and response calculations are achieved.« less
Accelerating molecular property calculations with nonorthonormal Krylov space methods

DOE PAGES

Furche, Filipp; Krull, Brandon T.; Nguyen, Brian D.; ...

2016-05-03

Here, we formulate Krylov space methods for large eigenvalue problems and linear equation systems that take advantage of decreasing residual norms to reduce the cost of matrix-vector multiplication. The residuals are used as subspace basis without prior orthonormalization, which leads to generalized eigenvalue problems or linear equation systems on the Krylov space. These nonorthonormal Krylov space (nKs) algorithms are favorable for large matrices with irregular sparsity patterns whose elements are computed on the fly, because fewer operations are necessary as the residual norm decreases as compared to the conventional method, while errors in the desired eigenpairs and solution vectors remainmore » small. We consider real symmetric and symplectic eigenvalue problems as well as linear equation systems and Sylvester equations as they appear in configuration interaction and response theory. The nKs method can be implemented in existing electronic structure codes with minor modifications and yields speed-ups of 1.2-1.8 in typical time-dependent Hartree-Fock and density functional applications without accuracy loss. The algorithm can compute entire linear subspaces simultaneously which benefits electronic spectra and force constant calculations requiring many eigenpairs or solution vectors. The nKs approach is related to difference density methods in electronic ground state calculations, and particularly efficient for integral direct computations of exchange-type contractions. By combination with resolution-of-the-identity methods for Coulomb contractions, three- to fivefold speed-ups of hybrid time-dependent density functional excited state and response calculations are achieved.« less
Algorithms of maximum likelihood data clustering with applications

NASA Astrophysics Data System (ADS)

Giada, Lorenzo; Marsili, Matteo

2002-12-01

We address the problem of data clustering by introducing an unsupervised, parameter-free approach based on maximum likelihood principle. Starting from the observation that data sets belonging to the same cluster share a common information, we construct an expression for the likelihood of any possible cluster structure. The likelihood in turn depends only on the Pearson's coefficient of the data. We discuss clustering algorithms that provide a fast and reliable approximation to maximum likelihood configurations. Compared to standard clustering methods, our approach has the advantages that (i) it is parameter free, (ii) the number of clusters need not be fixed in advance and (iii) the interpretation of the results is transparent. In order to test our approach and compare it with standard clustering algorithms, we analyze two very different data sets: time series of financial market returns and gene expression data. We find that different maximization algorithms produce similar cluster structures whereas the outcome of standard algorithms has a much wider variability.
A new clustering algorithm applicable to multispectral and polarimetric SAR images

NASA Technical Reports Server (NTRS)

Wong, Yiu-Fai; Posner, Edward C.

1993-01-01

We describe an application of a scale-space clustering algorithm to the classification of a multispectral and polarimetric SAR image of an agricultural site. After the initial polarimetric and radiometric calibration and noise cancellation, we extracted a 12-dimensional feature vector for each pixel from the scattering matrix. The clustering algorithm was able to partition a set of unlabeled feature vectors from 13 selected sites, each site corresponding to a distinct crop, into 13 clusters without any supervision. The cluster parameters were then used to classify the whole image. The classification map is much less noisy and more accurate than those obtained by hierarchical rules. Starting with every point as a cluster, the algorithm works by melting the system to produce a tree of clusters in the scale space. It can cluster data in any multidimensional space and is insensitive to variability in cluster densities, sizes and ellipsoidal shapes. This algorithm, more powerful than existing ones, may be useful for remote sensing for land use.
Nonadiabatic holonomic quantum computation in decoherence-free subspaces.

PubMed

Xu, G F; Zhang, J; Tong, D M; Sjöqvist, Erik; Kwek, L C

2012-10-26

Quantum computation that combines the coherence stabilization virtues of decoherence-free subspaces and the fault tolerance of geometric holonomic control is of great practical importance. Some schemes of adiabatic holonomic quantum computation in decoherence-free subspaces have been proposed in the past few years. However, nonadiabatic holonomic quantum computation in decoherence-free subspaces, which avoids a long run-time requirement but with all the robust advantages, remains an open problem. Here, we demonstrate how to realize nonadiabatic holonomic quantum computation in decoherence-free subspaces. By using only three neighboring physical qubits undergoing collective dephasing to encode one logical qubit, we realize a universal set of quantum gates.
Blended particle filters for large-dimensional chaotic dynamical systems

PubMed Central

Majda, Andrew J.; Qi, Di; Sapsis, Themistoklis P.

2014-01-01

A major challenge in contemporary data science is the development of statistically accurate particle filters to capture non-Gaussian features in large-dimensional chaotic dynamical systems. Blended particle filters that capture non-Gaussian features in an adaptively evolving low-dimensional subspace through particles interacting with evolving Gaussian statistics on the remaining portion of phase space are introduced here. These blended particle filters are constructed in this paper through a mathematical formalism involving conditional Gaussian mixtures combined with statistically nonlinear forecast models compatible with this structure developed recently with high skill for uncertainty quantification. Stringent test cases for filtering involving the 40-dimensional Lorenz 96 model with a 5-dimensional adaptive subspace for nonlinear blended filtering in various turbulent regimes with at least nine positive Lyapunov exponents are used here. These cases demonstrate the high skill of the blended particle filter algorithms in capturing both highly non-Gaussian dynamical features as well as crucial nonlinear statistics for accurate filtering in extreme filtering regimes with sparse infrequent high-quality observations. The formalism developed here is also useful for multiscale filtering of turbulent systems and a simple application is sketched below. PMID:24825886
Discriminant analysis of resting-state functional connectivity patterns on the Grassmann manifold

NASA Astrophysics Data System (ADS)

Fan, Yong; Liu, Yong; Jiang, Tianzi; Liu, Zhening; Hao, Yihui; Liu, Haihong

2010-03-01

The functional networks, extracted from fMRI images using independent component analysis, have been demonstrated informative for distinguishing brain states of cognitive functions and neurological diseases. In this paper, we propose a novel algorithm for discriminant analysis of functional networks encoded by spatial independent components. The functional networks of each individual are used as bases for a linear subspace, referred to as a functional connectivity pattern, which facilitates a comprehensive characterization of temporal signals of fMRI data. The functional connectivity patterns of different individuals are analyzed on the Grassmann manifold by adopting a principal angle based subspace distance. In conjunction with a support vector machine classifier, a forward component selection technique is proposed to select independent components for constructing the most discriminative functional connectivity pattern. The discriminant analysis method has been applied to an fMRI based schizophrenia study with 31 schizophrenia patients and 31 healthy individuals. The experimental results demonstrate that the proposed method not only achieves a promising classification performance for distinguishing schizophrenia patients from healthy controls, but also identifies discriminative functional networks that are informative for schizophrenia diagnosis.
Optimal design of focused experiments and surveys

NASA Astrophysics Data System (ADS)

Curtis, Andrew

1999-10-01

Experiments and surveys are often performed to obtain data that constrain some previously underconstrained model. Often, constraints are most desired in a particular subspace of model space. Experiment design optimization requires that the quality of any particular design can be both quantified and then maximized. This study shows how the quality can be defined such that it depends on the amount of information that is focused in the particular subspace of interest. In addition, algorithms are presented which allow one particular focused quality measure (from the class of focused measures) to be evaluated efficiently. A subclass of focused quality measures is also related to the standard variance and resolution measures from linearized inverse theory. The theory presented here requires that the relationship between model parameters and data can be linearized around a reference model without significant loss of information. Physical and financial constraints define the space of possible experiment designs. Cross-well tomographic examples are presented, plus a strategy for survey design to maximize information about linear combinations of parameters such as bulk modulus, κ =λ+ 2μ/3.
DOA Estimation for Underwater Wideband Weak Targets Based on Coherent Signal Subspace and Compressed Sensing.

PubMed

Li, Jun; Lin, Qiu-Hua; Kang, Chun-Yu; Wang, Kai; Yang, Xiu-Ting

2018-03-18

Direction of arrival (DOA) estimation is the basis for underwater target localization and tracking using towed line array sonar devices. A method of DOA estimation for underwater wideband weak targets based on coherent signal subspace (CSS) processing and compressed sensing (CS) theory is proposed. Under the CSS processing framework, wideband frequency focusing is accompanied by a two-sided correlation transformation, allowing the DOA of underwater wideband targets to be estimated based on the spatial sparsity of the targets and the compressed sensing reconstruction algorithm. Through analysis and processing of simulation data and marine trial data, it is shown that this method can accomplish the DOA estimation of underwater wideband weak targets. Results also show that this method can considerably improve the spatial spectrum of weak target signals, enhancing the ability to detect them. It can solve the problems of low directional resolution and unreliable weak-target detection in traditional beamforming technology. Compared with the conventional minimum variance distortionless response beamformers (MVDR), this method has many advantages, such as higher directional resolution, wider detection range, fewer required snapshots and more accurate detection for weak targets.
Krylov-Subspace Recycling via the POD-Augmented Conjugate-Gradient Method

DOE Office of Scientific and Technical Information (OSTI.GOV)

Carlberg, Kevin; Forstall, Virginia; Tuminaro, Ray

This paper presents a new Krylov-subspace-recycling method for efficiently solving sequences of linear systems of equations characterized by varying right-hand sides and symmetric-positive-definite matrices. As opposed to typical truncation strategies used in recycling such as deflation, we propose a truncation method inspired by goal-oriented proper orthogonal decomposition (POD) from model reduction. This idea is based on the observation that model reduction aims to compute a low-dimensional subspace that contains an accurate solution; as such, we expect the proposed method to generate a low-dimensional subspace that is well suited for computing solutions that can satisfy inexact tolerances. In particular, we proposemore » specific goal-oriented POD `ingredients' that align the optimality properties of POD with the objective of Krylov-subspace recycling. To compute solutions in the resulting 'augmented' POD subspace, we propose a hybrid direct/iterative three-stage method that leverages 1) the optimal ordering of POD basis vectors, and 2) well-conditioned reduced matrices. Numerical experiments performed on solid-mechanics problems highlight the benefits of the proposed method over existing approaches for Krylov-subspace recycling.« less
Krylov-Subspace Recycling via the POD-Augmented Conjugate-Gradient Method

DOE PAGES

Carlberg, Kevin; Forstall, Virginia; Tuminaro, Ray

2016-01-01

This paper presents a new Krylov-subspace-recycling method for efficiently solving sequences of linear systems of equations characterized by varying right-hand sides and symmetric-positive-definite matrices. As opposed to typical truncation strategies used in recycling such as deflation, we propose a truncation method inspired by goal-oriented proper orthogonal decomposition (POD) from model reduction. This idea is based on the observation that model reduction aims to compute a low-dimensional subspace that contains an accurate solution; as such, we expect the proposed method to generate a low-dimensional subspace that is well suited for computing solutions that can satisfy inexact tolerances. In particular, we proposemore » specific goal-oriented POD `ingredients' that align the optimality properties of POD with the objective of Krylov-subspace recycling. To compute solutions in the resulting 'augmented' POD subspace, we propose a hybrid direct/iterative three-stage method that leverages 1) the optimal ordering of POD basis vectors, and 2) well-conditioned reduced matrices. Numerical experiments performed on solid-mechanics problems highlight the benefits of the proposed method over existing approaches for Krylov-subspace recycling.« less
Spatial cluster detection using dynamic programming.

PubMed

Sverchkov, Yuriy; Jiang, Xia; Cooper, Gregory F

2012-03-25

The task of spatial cluster detection involves finding spatial regions where some property deviates from the norm or the expected value. In a probabilistic setting this task can be expressed as finding a region where some event is significantly more likely than usual. Spatial cluster detection is of interest in fields such as biosurveillance, mining of astronomical data, military surveillance, and analysis of fMRI images. In almost all such applications we are interested both in the question of whether a cluster exists in the data, and if it exists, we are interested in finding the most accurate characterization of the cluster. We present a general dynamic programming algorithm for grid-based spatial cluster detection. The algorithm can be used for both Bayesian maximum a-posteriori (MAP) estimation of the most likely spatial distribution of clusters and Bayesian model averaging over a large space of spatial cluster distributions to compute the posterior probability of an unusual spatial clustering. The algorithm is explained and evaluated in the context of a biosurveillance application, specifically the detection and identification of Influenza outbreaks based on emergency department visits. A relatively simple underlying model is constructed for the purpose of evaluating the algorithm, and the algorithm is evaluated using the model and semi-synthetic test data. When compared to baseline methods, tests indicate that the new algorithm can improve MAP estimates under certain conditions: the greedy algorithm we compared our method to was found to be more sensitive to smaller outbreaks, while as the size of the outbreaks increases, in terms of area affected and proportion of individuals affected, our method overtakes the greedy algorithm in spatial precision and recall. The new algorithm performs on-par with baseline methods in the task of Bayesian model averaging. We conclude that the dynamic programming algorithm performs on-par with other available methods for spatial cluster detection and point to its low computational cost and extendability as advantages in favor of further research and use of the algorithm.
Spatial cluster detection using dynamic programming

PubMed Central

2012-01-01

Background The task of spatial cluster detection involves finding spatial regions where some property deviates from the norm or the expected value. In a probabilistic setting this task can be expressed as finding a region where some event is significantly more likely than usual. Spatial cluster detection is of interest in fields such as biosurveillance, mining of astronomical data, military surveillance, and analysis of fMRI images. In almost all such applications we are interested both in the question of whether a cluster exists in the data, and if it exists, we are interested in finding the most accurate characterization of the cluster. Methods We present a general dynamic programming algorithm for grid-based spatial cluster detection. The algorithm can be used for both Bayesian maximum a-posteriori (MAP) estimation of the most likely spatial distribution of clusters and Bayesian model averaging over a large space of spatial cluster distributions to compute the posterior probability of an unusual spatial clustering. The algorithm is explained and evaluated in the context of a biosurveillance application, specifically the detection and identification of Influenza outbreaks based on emergency department visits. A relatively simple underlying model is constructed for the purpose of evaluating the algorithm, and the algorithm is evaluated using the model and semi-synthetic test data. Results When compared to baseline methods, tests indicate that the new algorithm can improve MAP estimates under certain conditions: the greedy algorithm we compared our method to was found to be more sensitive to smaller outbreaks, while as the size of the outbreaks increases, in terms of area affected and proportion of individuals affected, our method overtakes the greedy algorithm in spatial precision and recall. The new algorithm performs on-par with baseline methods in the task of Bayesian model averaging. Conclusions We conclude that the dynamic programming algorithm performs on-par with other available methods for spatial cluster detection and point to its low computational cost and extendability as advantages in favor of further research and use of the algorithm. PMID:22443103
Generalized fuzzy C-means clustering algorithm with improved fuzzy partitions.

PubMed

Zhu, Lin; Chung, Fu-Lai; Wang, Shitong

2009-06-01

The fuzziness index m has important influence on the clustering result of fuzzy clustering algorithms, and it should not be forced to fix at the usual value m = 2. In view of its distinctive features in applications and its limitation in having m = 2 only, a recent advance of fuzzy clustering called fuzzy c-means clustering with improved fuzzy partitions (IFP-FCM) is extended in this paper, and a generalized algorithm called GIFP-FCM for more effective clustering is proposed. By introducing a novel membership constraint function, a new objective function is constructed, and furthermore, GIFP-FCM clustering is derived. Meanwhile, from the viewpoints of L(p) norm distance measure and competitive learning, the robustness and convergence of the proposed algorithm are analyzed. Furthermore, the classical fuzzy c-means algorithm (FCM) and IFP-FCM can be taken as two special cases of the proposed algorithm. Several experimental results including its application to noisy image texture segmentation are presented to demonstrate its average advantage over FCM and IFP-FCM in both clustering and robustness capabilities.

Incremental fuzzy C medoids clustering of time series data using dynamic time warping distance

PubMed Central

Chen, Jingli; Wu, Shuai; Liu, Zhizhong; Chao, Hao

2018-01-01

Clustering time series data is of great significance since it could extract meaningful statistics and other characteristics. Especially in biomedical engineering, outstanding clustering algorithms for time series may help improve the health level of people. Considering data scale and time shifts of time series, in this paper, we introduce two incremental fuzzy clustering algorithms based on a Dynamic Time Warping (DTW) distance. For recruiting Single-Pass and Online patterns, our algorithms could handle large-scale time series data by splitting it into a set of chunks which are processed sequentially. Besides, our algorithms select DTW to measure distance of pair-wise time series and encourage higher clustering accuracy because DTW could determine an optimal match between any two time series by stretching or compressing segments of temporal data. Our new algorithms are compared to some existing prominent incremental fuzzy clustering algorithms on 12 benchmark time series datasets. The experimental results show that the proposed approaches could yield high quality clusters and were better than all the competitors in terms of clustering accuracy. PMID:29795600
Incremental fuzzy C medoids clustering of time series data using dynamic time warping distance.

PubMed

Liu, Yongli; Chen, Jingli; Wu, Shuai; Liu, Zhizhong; Chao, Hao

2018-01-01

Clustering time series data is of great significance since it could extract meaningful statistics and other characteristics. Especially in biomedical engineering, outstanding clustering algorithms for time series may help improve the health level of people. Considering data scale and time shifts of time series, in this paper, we introduce two incremental fuzzy clustering algorithms based on a Dynamic Time Warping (DTW) distance. For recruiting Single-Pass and Online patterns, our algorithms could handle large-scale time series data by splitting it into a set of chunks which are processed sequentially. Besides, our algorithms select DTW to measure distance of pair-wise time series and encourage higher clustering accuracy because DTW could determine an optimal match between any two time series by stretching or compressing segments of temporal data. Our new algorithms are compared to some existing prominent incremental fuzzy clustering algorithms on 12 benchmark time series datasets. The experimental results show that the proposed approaches could yield high quality clusters and were better than all the competitors in terms of clustering accuracy.
Multi-Optimisation Consensus Clustering

NASA Astrophysics Data System (ADS)

Li, Jian; Swift, Stephen; Liu, Xiaohui

Ensemble Clustering has been developed to provide an alternative way of obtaining more stable and accurate clustering results. It aims to avoid the biases of individual clustering algorithms. However, it is still a challenge to develop an efficient and robust method for Ensemble Clustering. Based on an existing ensemble clustering method, Consensus Clustering (CC), this paper introduces an advanced Consensus Clustering algorithm called Multi-Optimisation Consensus Clustering (MOCC), which utilises an optimised Agreement Separation criterion and a Multi-Optimisation framework to improve the performance of CC. Fifteen different data sets are used for evaluating the performance of MOCC. The results reveal that MOCC can generate more accurate clustering results than the original CC algorithm.
Swarm Intelligence in Text Document Clustering

DOE Office of Scientific and Technical Information (OSTI.GOV)

Cui, Xiaohui; Potok, Thomas E

2008-01-01

Social animals or insects in nature often exhibit a form of emergent collective behavior. The research field that attempts to design algorithms or distributed problem-solving devices inspired by the collective behavior of social insect colonies is called Swarm Intelligence. Compared to the traditional algorithms, the swarm algorithms are usually flexible, robust, decentralized and self-organized. These characters make the swarm algorithms suitable for solving complex problems, such as document collection clustering. The major challenge of today's information society is being overwhelmed with information on any topic they are searching for. Fast and high-quality document clustering algorithms play an important role inmore » helping users to effectively navigate, summarize, and organize the overwhelmed information. In this chapter, we introduce three nature inspired swarm intelligence clustering approaches for document clustering analysis. These clustering algorithms use stochastic and heuristic principles discovered from observing bird flocks, fish schools and ant food forage.« less
Effective learning strategies for real-time image-guided adaptive control of multiple-source hyperthermia applicators.

PubMed

Cheng, Kung-Shan; Dewhirst, Mark W; Stauffer, Paul R; Das, Shiva

2010-03-01

This paper investigates overall theoretical requirements for reducing the times required for the iterative learning of a real-time image-guided adaptive control routine for multiple-source heat applicators, as used in hyperthermia and thermal ablative therapy for cancer. Methods for partial reconstruction of the physical system with and without model reduction to find solutions within a clinically practical timeframe were analyzed. A mathematical analysis based on the Fredholm alternative theorem (FAT) was used to compactly analyze the existence and uniqueness of the optimal heating vector under two fundamental situations: (1) noiseless partial reconstruction and (2) noisy partial reconstruction. These results were coupled with a method for further acceleration of the solution using virtual source (VS) model reduction. The matrix approximation theorem (MAT) was used to choose the optimal vectors spanning the reduced-order subspace to reduce the time for system reconstruction and to determine the associated approximation error. Numerical simulations of the adaptive control of hyperthermia using VS were also performed to test the predictions derived from the theoretical analysis. A thigh sarcoma patient model surrounded by a ten-antenna phased-array applicator was retained for this purpose. The impacts of the convective cooling from blood flow and the presence of sudden increase of perfusion in muscle and tumor were also simulated. By FAT, partial system reconstruction directly conducted in the full space of the physical variables such as phases and magnitudes of the heat sources cannot guarantee reconstructing the optimal system to determine the global optimal setting of the heat sources. A remedy for this limitation is to conduct the partial reconstruction within a reduced-order subspace spanned by the first few maximum eigenvectors of the true system matrix. By MAT, this VS subspace is the optimal one when the goal is to maximize the average tumor temperature. When more than 6 sources present, the steps required for a nonlinear learning scheme is theoretically fewer than that of a linear one, however, finite number of iterative corrections is necessary for a single learning step of a nonlinear algorithm. Thus, the actual computational workload for a nonlinear algorithm is not necessarily less than that required by a linear algorithm. Based on the analysis presented herein, obtaining a unique global optimal heating vector for a multiple-source applicator within the constraints of real-time clinical hyperthermia treatments and thermal ablative therapies appears attainable using partial reconstruction with minimum norm least-squares method with supplemental equations. One way to supplement equations is the inclusion of a method of model reduction.
Novel density-based and hierarchical density-based clustering algorithms for uncertain data.

PubMed

Zhang, Xianchao; Liu, Han; Zhang, Xiaotong

2017-09-01

Uncertain data has posed a great challenge to traditional clustering algorithms. Recently, several algorithms have been proposed for clustering uncertain data, and among them density-based techniques seem promising for handling data uncertainty. However, some issues like losing uncertain information, high time complexity and nonadaptive threshold have not been addressed well in the previous density-based algorithm FDBSCAN and hierarchical density-based algorithm FOPTICS. In this paper, we firstly propose a novel density-based algorithm PDBSCAN, which improves the previous FDBSCAN from the following aspects: (1) it employs a more accurate method to compute the probability that the distance between two uncertain objects is less than or equal to a boundary value, instead of the sampling-based method in FDBSCAN; (2) it introduces new definitions of probability neighborhood, support degree, core object probability, direct reachability probability, thus reducing the complexity and solving the issue of nonadaptive threshold (for core object judgement) in FDBSCAN. Then, we modify the algorithm PDBSCAN to an improved version (PDBSCANi), by using a better cluster assignment strategy to ensure that every object will be assigned to the most appropriate cluster, thus solving the issue of nonadaptive threshold (for direct density reachability judgement) in FDBSCAN. Furthermore, as PDBSCAN and PDBSCANi have difficulties for clustering uncertain data with non-uniform cluster density, we propose a novel hierarchical density-based algorithm POPTICS by extending the definitions of PDBSCAN, adding new definitions of fuzzy core distance and fuzzy reachability distance, and employing a new clustering framework. POPTICS can reveal the cluster structures of the datasets with different local densities in different regions better than PDBSCAN and PDBSCANi, and it addresses the issues in FOPTICS. Experimental results demonstrate the superiority of our proposed algorithms over the existing algorithms in accuracy and efficiency. Copyright © 2017 Elsevier Ltd. All rights reserved.
Hypercyclic subspaces for Frechet space operators

NASA Astrophysics Data System (ADS)

Petersson, Henrik

2006-07-01

A continuous linear operator is hypercyclic if there is an such that the orbit {Tnx} is dense, and such a vector x is said to be hypercyclic for T. Recent progress show that it is possible to characterize Banach space operators that have a hypercyclic subspace, i.e., an infinite dimensional closed subspace of, except for zero, hypercyclic vectors. The following is known to hold: A Banach space operator T has a hypercyclic subspace if there is a sequence (ni) and an infinite dimensional closed subspace such that T is hereditarily hypercyclic for (ni) and Tni->0 pointwise on E. In this note we extend this result to the setting of Frechet spaces that admit a continuous norm, and study some applications for important function spaces. As an application we also prove that any infinite dimensional separable Frechet space with a continuous norm admits an operator with a hypercyclic subspace.
Analysis of basic clustering algorithms for numerical estimation of statistical averages in biomolecules.

PubMed

Anandakrishnan, Ramu; Onufriev, Alexey

2008-03-01

In statistical mechanics, the equilibrium properties of a physical system of particles can be calculated as the statistical average over accessible microstates of the system. In general, these calculations are computationally intractable since they involve summations over an exponentially large number of microstates. Clustering algorithms are one of the methods used to numerically approximate these sums. The most basic clustering algorithms first sub-divide the system into a set of smaller subsets (clusters). Then, interactions between particles within each cluster are treated exactly, while all interactions between different clusters are ignored. These smaller clusters have far fewer microstates, making the summation over these microstates, tractable. These algorithms have been previously used for biomolecular computations, but remain relatively unexplored in this context. Presented here, is a theoretical analysis of the error and computational complexity for the two most basic clustering algorithms that were previously applied in the context of biomolecular electrostatics. We derive a tight, computationally inexpensive, error bound for the equilibrium state of a particle computed via these clustering algorithms. For some practical applications, it is the root mean square error, which can be significantly lower than the error bound, that may be more important. We how that there is a strong empirical relationship between error bound and root mean square error, suggesting that the error bound could be used as a computationally inexpensive metric for predicting the accuracy of clustering algorithms for practical applications. An example of error analysis for such an application-computation of average charge of ionizable amino-acids in proteins-is given, demonstrating that the clustering algorithm can be accurate enough for practical purposes.
The cascaded moving k-means and fuzzy c-means clustering algorithms for unsupervised segmentation of malaria images

NASA Astrophysics Data System (ADS)

Abdul-Nasir, Aimi Salihah; Mashor, Mohd Yusoff; Halim, Nurul Hazwani Abd; Mohamed, Zeehaida

2015-05-01

Malaria is a life-threatening parasitic infectious disease that corresponds for nearly one million deaths each year. Due to the requirement of prompt and accurate diagnosis of malaria, the current study has proposed an unsupervised pixel segmentation based on clustering algorithm in order to obtain the fully segmented red blood cells (RBCs) infected with malaria parasites based on the thin blood smear images of P. vivax species. In order to obtain the segmented infected cell, the malaria images are first enhanced by using modified global contrast stretching technique. Then, an unsupervised segmentation technique based on clustering algorithm has been applied on the intensity component of malaria image in order to segment the infected cell from its blood cells background. In this study, cascaded moving k-means (MKM) and fuzzy c-means (FCM) clustering algorithms has been proposed for malaria slide image segmentation. After that, median filter algorithm has been applied to smooth the image as well as to remove any unwanted regions such as small background pixels from the image. Finally, seeded region growing area extraction algorithm has been applied in order to remove large unwanted regions that are still appeared on the image due to their size in which cannot be cleaned by using median filter. The effectiveness of the proposed cascaded MKM and FCM clustering algorithms has been analyzed qualitatively and quantitatively by comparing the proposed cascaded clustering algorithm with MKM and FCM clustering algorithms. Overall, the results indicate that segmentation using the proposed cascaded clustering algorithm has produced the best segmentation performances by achieving acceptable sensitivity as well as high specificity and accuracy values compared to the segmentation results provided by MKM and FCM algorithms.
Towards enhancement of performance of K-means clustering using nature-inspired optimization algorithms.

PubMed

Fong, Simon; Deb, Suash; Yang, Xin-She; Zhuang, Yan

2014-01-01

Traditional K-means clustering algorithms have the drawback of getting stuck at local optima that depend on the random values of initial centroids. Optimization algorithms have their advantages in guiding iterative computation to search for global optima while avoiding local optima. The algorithms help speed up the clustering process by converging into a global optimum early with multiple search agents in action. Inspired by nature, some contemporary optimization algorithms which include Ant, Bat, Cuckoo, Firefly, and Wolf search algorithms mimic the swarming behavior allowing them to cooperatively steer towards an optimal objective within a reasonable time. It is known that these so-called nature-inspired optimization algorithms have their own characteristics as well as pros and cons in different applications. When these algorithms are combined with K-means clustering mechanism for the sake of enhancing its clustering quality by avoiding local optima and finding global optima, the new hybrids are anticipated to produce unprecedented performance. In this paper, we report the results of our evaluation experiments on the integration of nature-inspired optimization methods into K-means algorithms. In addition to the standard evaluation metrics in evaluating clustering quality, the extended K-means algorithms that are empowered by nature-inspired optimization methods are applied on image segmentation as a case study of application scenario.
Towards Enhancement of Performance of K-Means Clustering Using Nature-Inspired Optimization Algorithms

PubMed Central

Deb, Suash; Yang, Xin-She

2014-01-01

Traditional K-means clustering algorithms have the drawback of getting stuck at local optima that depend on the random values of initial centroids. Optimization algorithms have their advantages in guiding iterative computation to search for global optima while avoiding local optima. The algorithms help speed up the clustering process by converging into a global optimum early with multiple search agents in action. Inspired by nature, some contemporary optimization algorithms which include Ant, Bat, Cuckoo, Firefly, and Wolf search algorithms mimic the swarming behavior allowing them to cooperatively steer towards an optimal objective within a reasonable time. It is known that these so-called nature-inspired optimization algorithms have their own characteristics as well as pros and cons in different applications. When these algorithms are combined with K-means clustering mechanism for the sake of enhancing its clustering quality by avoiding local optima and finding global optima, the new hybrids are anticipated to produce unprecedented performance. In this paper, we report the results of our evaluation experiments on the integration of nature-inspired optimization methods into K-means algorithms. In addition to the standard evaluation metrics in evaluating clustering quality, the extended K-means algorithms that are empowered by nature-inspired optimization methods are applied on image segmentation as a case study of application scenario. PMID:25202730
Interactive K-Means Clustering Method Based on User Behavior for Different Analysis Target in Medicine.

PubMed

Lei, Yang; Yu, Dai; Bin, Zhang; Yang, Yang

2017-01-01

Clustering algorithm as a basis of data analysis is widely used in analysis systems. However, as for the high dimensions of the data, the clustering algorithm may overlook the business relation between these dimensions especially in the medical fields. As a result, usually the clustering result may not meet the business goals of the users. Then, in the clustering process, if it can combine the knowledge of the users, that is, the doctor's knowledge or the analysis intent, the clustering result can be more satisfied. In this paper, we propose an interactive K -means clustering method to improve the user's satisfactions towards the result. The core of this method is to get the user's feedback of the clustering result, to optimize the clustering result. Then, a particle swarm optimization algorithm is used in the method to optimize the parameters, especially the weight settings in the clustering algorithm to make it reflect the user's business preference as possible. After that, based on the parameter optimization and adjustment, the clustering result can be closer to the user's requirement. Finally, we take an example in the breast cancer, to testify our method. The experiments show the better performance of our algorithm.
Characterizing L1-norm best-fit subspaces

NASA Astrophysics Data System (ADS)

Brooks, J. Paul; Dulá, José H.

2017-05-01

Fitting affine objects to data is the basis of many tools and methodologies in statistics, machine learning, and signal processing. The L1 norm is often employed to produce subspaces exhibiting a robustness to outliers and faulty observations. The L1-norm best-fit subspace problem is directly formulated as a nonlinear, nonconvex, and nondifferentiable optimization problem. The case when the subspace is a hyperplane can be solved to global optimality efficiently by solving a series of linear programs. The problem of finding the best-fit line has recently been shown to be NP-hard. We present necessary conditions for optimality for the best-fit subspace problem, and use them to characterize properties of optimal solutions.
Ckmeans.1d.dp: Optimal k-means Clustering in One Dimension by Dynamic Programming.

PubMed

Wang, Haizhou; Song, Mingzhou

2011-12-01

The heuristic k -means algorithm, widely used for cluster analysis, does not guarantee optimality. We developed a dynamic programming algorithm for optimal one-dimensional clustering. The algorithm is implemented as an R package called Ckmeans.1d.dp . We demonstrate its advantage in optimality and runtime over the standard iterative k -means algorithm.
Inference from clustering with application to gene-expression microarrays.

PubMed

Dougherty, Edward R; Barrera, Junior; Brun, Marcel; Kim, Seungchan; Cesar, Roberto M; Chen, Yidong; Bittner, Michael; Trent, Jeffrey M

2002-01-01

There are many algorithms to cluster sample data points based on nearness or a similarity measure. Often the implication is that points in different clusters come from different underlying classes, whereas those in the same cluster come from the same class. Stochastically, the underlying classes represent different random processes. The inference is that clusters represent a partition of the sample points according to which process they belong. This paper discusses a model-based clustering toolbox that evaluates cluster accuracy. Each random process is modeled as its mean plus independent noise, sample points are generated, the points are clustered, and the clustering error is the number of points clustered incorrectly according to the generating random processes. Various clustering algorithms are evaluated based on process variance and the key issue of the rate at which algorithmic performance improves with increasing numbers of experimental replications. The model means can be selected by hand to test the separability of expected types of biological expression patterns. Alternatively, the model can be seeded by real data to test the expected precision of that output or the extent of improvement in precision that replication could provide. In the latter case, a clustering algorithm is used to form clusters, and the model is seeded with the means and variances of these clusters. Other algorithms are then tested relative to the seeding algorithm. Results are averaged over various seeds. Output includes error tables and graphs, confusion matrices, principal-component plots, and validation measures. Five algorithms are studied in detail: K-means, fuzzy C-means, self-organizing maps, hierarchical Euclidean-distance-based and correlation-based clustering. The toolbox is applied to gene-expression clustering based on cDNA microarrays using real data. Expression profile graphics are generated and error analysis is displayed within the context of these profile graphics. A large amount of generated output is available over the web.
Implementation of spectral clustering on microarray data of carcinoma using k-means algorithm

NASA Astrophysics Data System (ADS)

Frisca, Bustamam, Alhadi; Siswantining, Titin

2017-03-01

Clustering is one of data analysis methods that aims to classify data which have similar characteristics in the same group. Spectral clustering is one of the most popular modern clustering algorithms. As an effective clustering technique, spectral clustering method emerged from the concepts of spectral graph theory. Spectral clustering method needs partitioning algorithm. There are some partitioning methods including PAM, SOM, Fuzzy c-means, and k-means. Based on the research that has been done by Capital and Choudhury in 2013, when using Euclidian distance k-means algorithm provide better accuracy than PAM algorithm. So in this paper we use k-means as our partition algorithm. The major advantage of spectral clustering is in reducing data dimension, especially in this case to reduce the dimension of large microarray dataset. Microarray data is a small-sized chip made of a glass plate containing thousands and even tens of thousands kinds of genes in the DNA fragments derived from doubling cDNA. Application of microarray data is widely used to detect cancer, for the example is carcinoma, in which cancer cells express the abnormalities in his genes. The purpose of this research is to classify the data that have high similarity in the same group and the data that have low similarity in the others. In this research, Carcinoma microarray data using 7457 genes. The result of partitioning using k-means algorithm is two clusters.
Using Grey Wolf Algorithm to Solve the Capacitated Vehicle Routing Problem

NASA Astrophysics Data System (ADS)

Korayem, L.; Khorsid, M.; Kassem, S. S.

2015-05-01

The capacitated vehicle routing problem (CVRP) is a class of the vehicle routing problems (VRPs). In CVRP a set of identical vehicles having fixed capacities are required to fulfill customers' demands for a single commodity. The main objective is to minimize the total cost or distance traveled by the vehicles while satisfying a number of constraints, such as: the capacity constraint of each vehicle, logical flow constraints, etc. One of the methods employed in solving the CVRP is the cluster-first route-second method. It is a technique based on grouping of customers into a number of clusters, where each cluster is served by one vehicle. Once clusters are formed, a route determining the best sequence to visit customers is established within each cluster. The recently bio-inspired grey wolf optimizer (GWO), introduced in 2014, has proven to be efficient in solving unconstrained, as well as, constrained optimization problems. In the current research, our main contributions are: combining GWO with the traditional K-means clustering algorithm to generate the ‘K-GWO’ algorithm, deriving a capacitated version of the K-GWO algorithm by incorporating a capacity constraint into the aforementioned algorithm, and finally, developing 2 new clustering heuristics. The resulting algorithm is used in the clustering phase of the cluster-first route-second method to solve the CVR problem. The algorithm is tested on a number of benchmark problems with encouraging results.
Robust uncertainty evaluation for system identification on distributed wireless platforms

NASA Astrophysics Data System (ADS)

Crinière, Antoine; Döhler, Michael; Le Cam, Vincent; Mevel, Laurent

2016-04-01

Health monitoring of civil structures by system identification procedures from automatic control is now accepted as a valid approach. These methods provide frequencies and modeshapes from the structure over time. For a continuous monitoring the excitation of a structure is usually ambient, thus unknown and assumed to be noise. Hence, all estimates from the vibration measurements are realizations of random variables with inherent uncertainty due to (unknown) process and measurement noise and finite data length. The underlying algorithms are usually running under Matlab under the assumption of large memory pool and considerable computational power. Even under these premises, computational and memory usage are heavy and not realistic for being embedded in on-site sensor platforms such as the PEGASE platform. Moreover, the current push for distributed wireless systems calls for algorithmic adaptation for lowering data exchanges and maximizing local processing. Finally, the recent breakthrough in system identification allows us to process both frequency information and its related uncertainty together from one and only one data sequence, at the expense of computational and memory explosion that require even more careful attention than before. The current approach will focus on presenting a system identification procedure called multi-setup subspace identification that allows to process both frequencies and their related variances from a set of interconnected wireless systems with all computation running locally within the limited memory pool of each system before being merged on a host supervisor. Careful attention will be given to data exchanges and I/O satisfying OGC standards, as well as minimizing memory footprints and maximizing computational efficiency. Those systems are built in a way of autonomous operations on field and could be later included in a wide distributed architecture such as the Cloud2SM project. The usefulness of these strategies is illustrated on data from a progressive damage action on a prestressed concrete bridge. References [1] E. Carden and P. Fanning. Vibration based condition monitoring: a review. Structural Health Monitoring, 3(4):355-377, 2004. [2] M. Döhler and L. Mevel. Efficient multi-order uncertainty computation for stochastic subspace identification. Mechanical Systems and Signal Processing, 38(2):346-366, 2013. [3] M.Döhler, L. Mevel. Modular subspace-based system identification from multi-setup measurements. IEEE Transactions on Automatic Control, 57(11):2951-2956, 2012. [4] M. Döhler, X.-B. Lam, and L. Mevel. Uncertainty quantification for modal parameters from stochastic subspace identification on multi-setup measurements. MechanicalSystems and Signal Processing, 36(2):562-581, 2013. [5] A Crinière, J Dumoulin, L Mevel, G Andrade-Barosso, M Simonin. The Cloud2SM Project.European Geosciences Union General Assembly (EGU2015), Apr 2015, Vienne, Austria. 2015.
Subspace Dimensionality: A Tool for Automated QC in Seismic Array Processing

NASA Astrophysics Data System (ADS)

Rowe, C. A.; Stead, R. J.; Begnaud, M. L.

2013-12-01

Because of the great resolving power of seismic arrays, the application of automated processing to array data is critically important in treaty verification work. A significant problem in array analysis is the inclusion of bad sensor channels in the beamforming process. We are testing an approach to automated, on-the-fly quality control (QC) to aid in the identification of poorly performing sensor channels prior to beam-forming in routine event detection or location processing. The idea stems from methods used for large computer servers, when monitoring traffic at enormous numbers of nodes is impractical on a node-by node basis, so the dimensionality of the node traffic is instead monitoried for anomalies that could represent malware, cyber-attacks or other problems. The technique relies upon the use of subspace dimensionality or principal components of the overall system traffic. The subspace technique is not new to seismology, but its most common application has been limited to comparing waveforms to an a priori collection of templates for detecting highly similar events in a swarm or seismic cluster. In the established template application, a detector functions in a manner analogous to waveform cross-correlation, applying a statistical test to assess the similarity of the incoming data stream to known templates for events of interest. In our approach, we seek not to detect matching signals, but instead, we examine the signal subspace dimensionality in much the same way that the method addresses node traffic anomalies in large computer systems. Signal anomalies recorded on seismic arrays affect the dimensional structure of the array-wide time-series. We have shown previously that this observation is useful in identifying real seismic events, either by looking at the raw signal or derivatives thereof (entropy, kurtosis), but here we explore the effects of malfunctioning channels on the dimension of the data and its derivatives, and how to leverage this effect for identifying bad array elements through a jackknifing process to isolate the anomalous channels, so that an automated analysis system might discard them prior to FK analysis and beamforming on events of interest.
Expert-guided evolutionary algorithm for layout design of complex space stations

NASA Astrophysics Data System (ADS)

Qian, Zhiqin; Bi, Zhuming; Cao, Qun; Ju, Weiguo; Teng, Hongfei; Zheng, Yang; Zheng, Siyu

2017-08-01

The layout of a space station should be designed in such a way that different equipment and instruments are placed for the station as a whole to achieve the best overall performance. The station layout design is a typical nondeterministic polynomial problem. In particular, how to manage the design complexity to achieve an acceptable solution within a reasonable timeframe poses a great challenge. In this article, a new evolutionary algorithm has been proposed to meet such a challenge. It is called as the expert-guided evolutionary algorithm with a tree-like structure decomposition (EGEA-TSD). Two innovations in EGEA-TSD are (i) to deal with the design complexity, the entire design space is divided into subspaces with a tree-like structure; it reduces the computation and facilitates experts' involvement in the solving process. (ii) A human-intervention interface is developed to allow experts' involvement in avoiding local optimums and accelerating convergence. To validate the proposed algorithm, the layout design of one-space station is formulated as a multi-disciplinary design problem, the developed algorithm is programmed and executed, and the result is compared with those from other two algorithms; it has illustrated the superior performance of the proposed EGEA-TSD.

Chemodynamical Clustering Applied to APOGEE Data: Rediscovering Globular Clusters

NASA Astrophysics Data System (ADS)

Chen, Boquan; D’Onghia, Elena; Pardy, Stephen A.; Pasquali, Anna; Bertelli Motta, Clio; Hanlon, Bret; Grebel, Eva K.

2018-06-01

We have developed a novel technique based on a clustering algorithm that searches for kinematically and chemically clustered stars in the APOGEE DR12 Cannon data. As compared to classical chemical tagging, the kinematic information included in our methodology allows us to identify stars that are members of known globular clusters with greater confidence. We apply our algorithm to the entire APOGEE catalog of 150,615 stars whose chemical abundances are derived by the Cannon. Our methodology found anticorrelations between the elements Al and Mg, Na and O, and C and N previously identified in the optical spectra in globular clusters, even though we omit these elements in our algorithm. Our algorithm identifies globular clusters without a priori knowledge of their locations in the sky. Thus, not only does this technique promise to discover new globular clusters, but it also allows us to identify candidate streams of kinematically and chemically clustered stars in the Milky Way.
Spatial and kinematic structure of Monoceros star-forming region

NASA Astrophysics Data System (ADS)

Costado, M. T.; Alfaro, E. J.

2018-05-01

The principal aim of this work is to study the velocity field in the Monoceros star-forming region using the radial velocity data available in the literature, as well as astrometric data from the Gaia first release. This region is a large star-forming complex formed by two associations named Monoceros OB1 and OB2. We have collected radial velocity data for more than 400 stars in the area of 8 × 12 deg2 and distance for more than 200 objects. We apply a clustering analysis in the subspace of the phase space formed by angular coordinates and radial velocity or distance data using the Spectrum of Kinematic Grouping methodology. We found four and three spatial groupings in radial velocity and distance variables, respectively, corresponding to the Local arm, the central clusters forming the associations and the Perseus arm, respectively.
Restoration of multichannel microwave radiometric images

NASA Technical Reports Server (NTRS)

Chin, R. T.; Yeh, C. L.; Olson, W. S.

1983-01-01

A constrained iterative image restoration method is applied to multichannel diffraction-limited imagery. This method is based on the Gerchberg-Papoulis algorithm utilizing incomplete information and partial constraints. The procedure is described using the orthogonal projection operators which project onto two prescribed subspaces iteratively. Some of its properties and limitations are also presented. The selection of appropriate constraints was emphasized in a practical application. Multichannel microwave images, each having different spatial resolution, were restored to a common highest resolution to demonstrate the effectiveness of the method. Both noise-free and noisy images were used in this investigation.
Object recognition of real targets using modelled SAR images

NASA Astrophysics Data System (ADS)

Zherdev, D. A.

2017-12-01

In this work the problem of recognition is studied using SAR images. The algorithm of recognition is based on the computation of conjugation indices with vectors of class. The support subspaces for each class are constructed by exception of the most and the less correlated vectors in a class. In the study we examine the ability of a significant feature vector size reduce that leads to recognition time decrease. The images of targets form the feature vectors that are transformed using pre-trained convolutional neural network (CNN).
Clustering performance comparison using K-means and expectation maximization algorithms.

PubMed

Jung, Yong Gyu; Kang, Min Soo; Heo, Jun

2014-11-14

Clustering is an important means of data mining based on separating data categories by similar features. Unlike the classification algorithm, clustering belongs to the unsupervised type of algorithms. Two representatives of the clustering algorithms are the K -means and the expectation maximization (EM) algorithm. Linear regression analysis was extended to the category-type dependent variable, while logistic regression was achieved using a linear combination of independent variables. To predict the possibility of occurrence of an event, a statistical approach is used. However, the classification of all data by means of logistic regression analysis cannot guarantee the accuracy of the results. In this paper, the logistic regression analysis is applied to EM clusters and the K -means clustering method for quality assessment of red wine, and a method is proposed for ensuring the accuracy of the classification results.
Analysis of temporal gene expression profiles: clustering by simulated annealing and determining the optimal number of clusters.

PubMed

Lukashin, A V; Fuchs, R

2001-05-01

Cluster analysis of genome-wide expression data from DNA microarray hybridization studies has proved to be a useful tool for identifying biologically relevant groupings of genes and samples. In the present paper, we focus on several important issues related to clustering algorithms that have not yet been fully studied. We describe a simple and robust algorithm for the clustering of temporal gene expression profiles that is based on the simulated annealing procedure. In general, this algorithm guarantees to eventually find the globally optimal distribution of genes over clusters. We introduce an iterative scheme that serves to evaluate quantitatively the optimal number of clusters for each specific data set. The scheme is based on standard approaches used in regular statistical tests. The basic idea is to organize the search of the optimal number of clusters simultaneously with the optimization of the distribution of genes over clusters. The efficiency of the proposed algorithm has been evaluated by means of a reverse engineering experiment, that is, a situation in which the correct distribution of genes over clusters is known a priori. The employment of this statistically rigorous test has shown that our algorithm places greater than 90% genes into correct clusters. Finally, the algorithm has been tested on real gene expression data (expression changes during yeast cell cycle) for which the fundamental patterns of gene expression and the assignment of genes to clusters are well understood from numerous previous studies.
Basic firefly algorithm for document clustering

NASA Astrophysics Data System (ADS)

Mohammed, Athraa Jasim; Yusof, Yuhanis; Husni, Husniza

2015-12-01

The Document clustering plays significant role in Information Retrieval (IR) where it organizes documents prior to the retrieval process. To date, various clustering algorithms have been proposed and this includes the K-means and Particle Swarm Optimization. Even though these algorithms have been widely applied in many disciplines due to its simplicity, such an approach tends to be trapped in a local minimum during its search for an optimal solution. To address the shortcoming, this paper proposes a Basic Firefly (Basic FA) algorithm to cluster text documents. The algorithm employs the Average Distance to Document Centroid (ADDC) as the objective function of the search. Experiments utilizing the proposed algorithm were conducted on the 20Newsgroups benchmark dataset. Results demonstrate that the Basic FA generates a more robust and compact clusters than the ones produced by K-means and Particle Swarm Optimization (PSO).
Comparison and Evaluation of Clustering Algorithms for Tandem Mass Spectra.

PubMed

Rieder, Vera; Schork, Karin U; Kerschke, Laura; Blank-Landeshammer, Bernhard; Sickmann, Albert; Rahnenführer, Jörg

2017-11-03

In proteomics, liquid chromatography-tandem mass spectrometry (LC-MS/MS) is established for identifying peptides and proteins. Duplicated spectra, that is, multiple spectra of the same peptide, occur both in single MS/MS runs and in large spectral libraries. Clustering tandem mass spectra is used to find consensus spectra, with manifold applications. First, it speeds up database searches, as performed for instance by Mascot. Second, it helps to identify novel peptides across species. Third, it is used for quality control to detect wrongly annotated spectra. We compare different clustering algorithms based on the cosine distance between spectra. CAST, MS-Cluster, and PRIDE Cluster are popular algorithms to cluster tandem mass spectra. We add well-known algorithms for large data sets, hierarchical clustering, DBSCAN, and connected components of a graph, as well as the new method N-Cluster. All algorithms are evaluated on real data with varied parameter settings. Cluster results are compared with each other and with peptide annotations based on validation measures such as purity. Quality control, regarding the detection of wrongly (un)annotated spectra, is discussed for exemplary resulting clusters. N-Cluster proves to be highly competitive. All clustering results benefit from the so-called DISMS2 filter that integrates additional information, for example, on precursor mass.
Weighted graph cuts without eigenvectors a multilevel approach.

PubMed

Dhillon, Inderjit S; Guan, Yuqiang; Kulis, Brian

2007-11-01

A variety of clustering algorithms have recently been proposed to handle data that is not linearly separable; spectral clustering and kernel k-means are two of the main methods. In this paper, we discuss an equivalence between the objective functions used in these seemingly different methods--in particular, a general weighted kernel k-means objective is mathematically equivalent to a weighted graph clustering objective. We exploit this equivalence to develop a fast, high-quality multilevel algorithm that directly optimizes various weighted graph clustering objectives, such as the popular ratio cut, normalized cut, and ratio association criteria. This eliminates the need for any eigenvector computation for graph clustering problems, which can be prohibitive for very large graphs. Previous multilevel graph partitioning methods, such as Metis, have suffered from the restriction of equal-sized clusters; our multilevel algorithm removes this restriction by using kernel k-means to optimize weighted graph cuts. Experimental results show that our multilevel algorithm outperforms a state-of-the-art spectral clustering algorithm in terms of speed, memory usage, and quality. We demonstrate that our algorithm is applicable to large-scale clustering tasks such as image segmentation, social network analysis and gene network analysis.
Model-based clustering for RNA-seq data.

PubMed

Si, Yaqing; Liu, Peng; Li, Pinghua; Brutnell, Thomas P

2014-01-15

RNA-seq technology has been widely adopted as an attractive alternative to microarray-based methods to study global gene expression. However, robust statistical tools to analyze these complex datasets are still lacking. By grouping genes with similar expression profiles across treatments, cluster analysis provides insight into gene functions and networks, and hence is an important technique for RNA-seq data analysis. In this manuscript, we derive clustering algorithms based on appropriate probability models for RNA-seq data. An expectation-maximization algorithm and another two stochastic versions of expectation-maximization algorithms are described. In addition, a strategy for initialization based on likelihood is proposed to improve the clustering algorithms. Moreover, we present a model-based hybrid-hierarchical clustering method to generate a tree structure that allows visualization of relationships among clusters as well as flexibility of choosing the number of clusters. Results from both simulation studies and analysis of a maize RNA-seq dataset show that our proposed methods provide better clustering results than alternative methods such as the K-means algorithm and hierarchical clustering methods that are not based on probability models. An R package, MBCluster.Seq, has been developed to implement our proposed algorithms. This R package provides fast computation and is publicly available at http://www.r-project.org
Two generalizations of Kohonen clustering

NASA Technical Reports Server (NTRS)

Bezdek, James C.; Pal, Nikhil R.; Tsao, Eric C. K.

1993-01-01

The relationship between the sequential hard c-means (SHCM), learning vector quantization (LVQ), and fuzzy c-means (FCM) clustering algorithms is discussed. LVQ and SHCM suffer from several major problems. For example, they depend heavily on initialization. If the initial values of the cluster centers are outside the convex hull of the input data, such algorithms, even if they terminate, may not produce meaningful results in terms of prototypes for cluster representation. This is due in part to the fact that they update only the winning prototype for every input vector. The impact and interaction of these two families with Kohonen's self-organizing feature mapping (SOFM), which is not a clustering method, but which often leads ideas to clustering algorithms is discussed. Then two generalizations of LVQ that are explicitly designed as clustering algorithms are presented; these algorithms are referred to as generalized LVQ = GLVQ; and fuzzy LVQ = FLVQ. Learning rules are derived to optimize an objective function whose goal is to produce 'good clusters'. GLVQ/FLVQ (may) update every node in the clustering net for each input vector. Neither GLVQ nor FLVQ depends upon a choice for the update neighborhood or learning rate distribution - these are taken care of automatically. Segmentation of a gray tone image is used as a typical application of these algorithms to illustrate the performance of GLVQ/FLVQ.
Differential evolution algorithm-based kernel parameter selection for Fukunaga-Koontz Transform subspaces construction

NASA Astrophysics Data System (ADS)

Binol, Hamidullah; Bal, Abdullah; Cukur, Huseyin

2015-10-01

The performance of the kernel based techniques depends on the selection of kernel parameters. That's why; suitable parameter selection is an important problem for many kernel based techniques. This article presents a novel technique to learn the kernel parameters in kernel Fukunaga-Koontz Transform based (KFKT) classifier. The proposed approach determines the appropriate values of kernel parameters through optimizing an objective function constructed based on discrimination ability of KFKT. For this purpose we have utilized differential evolution algorithm (DEA). The new technique overcomes some disadvantages such as high time consumption existing in the traditional cross-validation method, and it can be utilized in any type of data. The experiments for target detection applications on the hyperspectral images verify the effectiveness of the proposed method.
An improved initialization center k-means clustering algorithm based on distance and density

NASA Astrophysics Data System (ADS)

Duan, Yanling; Liu, Qun; Xia, Shuyin

2018-04-01

Aiming at the problem of the random initial clustering center of k means algorithm that the clustering results are influenced by outlier data sample and are unstable in multiple clustering, a method of central point initialization method based on larger distance and higher density is proposed. The reciprocal of the weighted average of distance is used to represent the sample density, and the data sample with the larger distance and the higher density are selected as the initial clustering centers to optimize the clustering results. Then, a clustering evaluation method based on distance and density is designed to verify the feasibility of the algorithm and the practicality, the experimental results on UCI data sets show that the algorithm has a certain stability and practicality.
Diametrical clustering for identifying anti-correlated gene clusters.

PubMed

Dhillon, Inderjit S; Marcotte, Edward M; Roshan, Usman

2003-09-01

Clustering genes based upon their expression patterns allows us to predict gene function. Most existing clustering algorithms cluster genes together when their expression patterns show high positive correlation. However, it has been observed that genes whose expression patterns are strongly anti-correlated can also be functionally similar. Biologically, this is not unintuitive-genes responding to the same stimuli, regardless of the nature of the response, are more likely to operate in the same pathways. We present a new diametrical clustering algorithm that explicitly identifies anti-correlated clusters of genes. Our algorithm proceeds by iteratively (i). re-partitioning the genes and (ii). computing the dominant singular vector of each gene cluster; each singular vector serving as the prototype of a 'diametric' cluster. We empirically show the effectiveness of the algorithm in identifying diametrical or anti-correlated clusters. Testing the algorithm on yeast cell cycle data, fibroblast gene expression data, and DNA microarray data from yeast mutants reveals that opposed cellular pathways can be discovered with this method. We present systems whose mRNA expression patterns, and likely their functions, oppose the yeast ribosome and proteosome, along with evidence for the inverse transcriptional regulation of a number of cellular systems.
A novel harmony search-K means hybrid algorithm for clustering gene expression data

PubMed Central

Nazeer, KA Abdul; Sebastian, MP; Kumar, SD Madhu

2013-01-01

Recent progress in bioinformatics research has led to the accumulation of huge quantities of biological data at various data sources. The DNA microarray technology makes it possible to simultaneously analyze large number of genes across different samples. Clustering of microarray data can reveal the hidden gene expression patterns from large quantities of expression data that in turn offers tremendous possibilities in functional genomics, comparative genomics, disease diagnosis and drug development. The k- ¬means clustering algorithm is widely used for many practical applications. But the original k-¬means algorithm has several drawbacks. It is computationally expensive and generates locally optimal solutions based on the random choice of the initial centroids. Several methods have been proposed in the literature for improving the performance of the k-¬means algorithm. A meta-heuristic optimization algorithm named harmony search helps find out near-global optimal solutions by searching the entire solution space. Low clustering accuracy of the existing algorithms limits their use in many crucial applications of life sciences. In this paper we propose a novel Harmony Search-K means Hybrid (HSKH) algorithm for clustering the gene expression data. Experimental results show that the proposed algorithm produces clusters with better accuracy in comparison with the existing algorithms. PMID:23390351
A novel harmony search-K means hybrid algorithm for clustering gene expression data.

PubMed

Nazeer, Ka Abdul; Sebastian, Mp; Kumar, Sd Madhu

2013-01-01

Recent progress in bioinformatics research has led to the accumulation of huge quantities of biological data at various data sources. The DNA microarray technology makes it possible to simultaneously analyze large number of genes across different samples. Clustering of microarray data can reveal the hidden gene expression patterns from large quantities of expression data that in turn offers tremendous possibilities in functional genomics, comparative genomics, disease diagnosis and drug development. The k- ¬means clustering algorithm is widely used for many practical applications. But the original k-¬means algorithm has several drawbacks. It is computationally expensive and generates locally optimal solutions based on the random choice of the initial centroids. Several methods have been proposed in the literature for improving the performance of the k-¬means algorithm. A meta-heuristic optimization algorithm named harmony search helps find out near-global optimal solutions by searching the entire solution space. Low clustering accuracy of the existing algorithms limits their use in many crucial applications of life sciences. In this paper we propose a novel Harmony Search-K means Hybrid (HSKH) algorithm for clustering the gene expression data. Experimental results show that the proposed algorithm produces clusters with better accuracy in comparison with the existing algorithms.
m-BIRCH: an online clustering approach for computer vision applications

NASA Astrophysics Data System (ADS)

Madan, Siddharth K.; Dana, Kristin J.

2015-03-01

We adapt a classic online clustering algorithm called Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH), to incrementally cluster large datasets of features commonly used in multimedia and computer vision. We call the adapted version modified-BIRCH (m-BIRCH). The algorithm uses only a fraction of the dataset memory to perform clustering, and updates the clustering decisions when new data comes in. Modifications made in m-BIRCH enable data driven parameter selection and effectively handle varying density regions in the feature space. Data driven parameter selection automatically controls the level of coarseness of the data summarization. Effective handling of varying density regions is necessary to well represent the different density regions in data summarization. We use m-BIRCH to cluster 840K color SIFT descriptors, and 60K outlier corrupted grayscale patches. We use the algorithm to cluster datasets consisting of challenging non-convex clustering patterns. Our implementation of the algorithm provides an useful clustering tool and is made publicly available.
A Self-Adaptive Fuzzy c-Means Algorithm for Determining the Optimal Number of Clusters

PubMed Central

Wang, Zhihao; Yi, Jing

2016-01-01

For the shortcoming of fuzzy c-means algorithm (FCM) needing to know the number of clusters in advance, this paper proposed a new self-adaptive method to determine the optimal number of clusters. Firstly, a density-based algorithm was put forward. The algorithm, according to the characteristics of the dataset, automatically determined the possible maximum number of clusters instead of using the empirical rule n and obtained the optimal initial cluster centroids, improving the limitation of FCM that randomly selected cluster centroids lead the convergence result to the local minimum. Secondly, this paper, by introducing a penalty function, proposed a new fuzzy clustering validity index based on fuzzy compactness and separation, which ensured that when the number of clusters verged on that of objects in the dataset, the value of clustering validity index did not monotonically decrease and was close to zero, so that the optimal number of clusters lost robustness and decision function. Then, based on these studies, a self-adaptive FCM algorithm was put forward to estimate the optimal number of clusters by the iterative trial-and-error process. At last, experiments were done on the UCI, KDD Cup 1999, and synthetic datasets, which showed that the method not only effectively determined the optimal number of clusters, but also reduced the iteration of FCM with the stable clustering result. PMID:28042291
Multiscale mutation clustering algorithm identifies pan-cancer mutational clusters associated with pathway-level changes in gene expression

PubMed Central

Poole, William; Leinonen, Kalle; Shmulevich, Ilya

2017-01-01

Cancer researchers have long recognized that somatic mutations are not uniformly distributed within genes. However, most approaches for identifying cancer mutations focus on either the entire-gene or single amino-acid level. We have bridged these two methodologies with a multiscale mutation clustering algorithm that identifies variable length mutation clusters in cancer genes. We ran our algorithm on 539 genes using the combined mutation data in 23 cancer types from The Cancer Genome Atlas (TCGA) and identified 1295 mutation clusters. The resulting mutation clusters cover a wide range of scales and often overlap with many kinds of protein features including structured domains, phosphorylation sites, and known single nucleotide variants. We statistically associated these multiscale clusters with gene expression and drug response data to illuminate the functional and clinical consequences of mutations in our clusters. Interestingly, we find multiple clusters within individual genes that have differential functional associations: these include PTEN, FUBP1, and CDH1. This methodology has potential implications in identifying protein regions for drug targets, understanding the biological underpinnings of cancer, and personalizing cancer treatments. Toward this end, we have made the mutation clusters and the clustering algorithm available to the public. Clusters and pathway associations can be interactively browsed at m2c.systemsbiology.net. The multiscale mutation clustering algorithm is available at https://github.com/IlyaLab/M2C. PMID:28170390
Multiscale mutation clustering algorithm identifies pan-cancer mutational clusters associated with pathway-level changes in gene expression.

PubMed

Poole, William; Leinonen, Kalle; Shmulevich, Ilya; Knijnenburg, Theo A; Bernard, Brady

2017-02-01

Cancer researchers have long recognized that somatic mutations are not uniformly distributed within genes. However, most approaches for identifying cancer mutations focus on either the entire-gene or single amino-acid level. We have bridged these two methodologies with a multiscale mutation clustering algorithm that identifies variable length mutation clusters in cancer genes. We ran our algorithm on 539 genes using the combined mutation data in 23 cancer types from The Cancer Genome Atlas (TCGA) and identified 1295 mutation clusters. The resulting mutation clusters cover a wide range of scales and often overlap with many kinds of protein features including structured domains, phosphorylation sites, and known single nucleotide variants. We statistically associated these multiscale clusters with gene expression and drug response data to illuminate the functional and clinical consequences of mutations in our clusters. Interestingly, we find multiple clusters within individual genes that have differential functional associations: these include PTEN, FUBP1, and CDH1. This methodology has potential implications in identifying protein regions for drug targets, understanding the biological underpinnings of cancer, and personalizing cancer treatments. Toward this end, we have made the mutation clusters and the clustering algorithm available to the public. Clusters and pathway associations can be interactively browsed at m2c.systemsbiology.net. The multiscale mutation clustering algorithm is available at https://github.com/IlyaLab/M2C.

The Mucciardi-Gose Clustering Algorithm and Its Applications in Automatic Pattern Recognition.

DTIC Science & Technology

A procedure known as the Mucciardi- Gose clustering algorithm, CLUSTR, for determining the geometrical or statistical relationships among groups of N...discussion of clustering algorithms is given; the particular advantages of the Mucciardi- Gose procedure are described. The mathematical basis for, and the
Security clustering algorithm based on reputation in hierarchical peer-to-peer network

NASA Astrophysics Data System (ADS)

Chen, Mei; Luo, Xin; Wu, Guowen; Tan, Yang; Kita, Kenji

2013-03-01

For the security problems of the hierarchical P2P network (HPN), the paper presents a security clustering algorithm based on reputation (CABR). In the algorithm, we take the reputation mechanism for ensuring the security of transaction and use cluster for managing the reputation mechanism. In order to improve security, reduce cost of network brought by management of reputation and enhance stability of cluster, we select reputation, the historical average online time, and the network bandwidth as the basic factors of the comprehensive performance of node. Simulation results showed that the proposed algorithm improved the security, reduced the network overhead, and enhanced stability of cluster.
Robust continuous clustering

PubMed Central

Shah, Sohil Atul

2017-01-01

Clustering is a fundamental procedure in the analysis of scientific data. It is used ubiquitously across the sciences. Despite decades of research, existing clustering algorithms have limited effectiveness in high dimensions and often require tuning parameters for different domains and datasets. We present a clustering algorithm that achieves high accuracy across multiple domains and scales efficiently to high dimensions and large datasets. The presented algorithm optimizes a smooth continuous objective, which is based on robust statistics and allows heavily mixed clusters to be untangled. The continuous nature of the objective also allows clustering to be integrated as a module in end-to-end feature learning pipelines. We demonstrate this by extending the algorithm to perform joint clustering and dimensionality reduction by efficiently optimizing a continuous global objective. The presented approach is evaluated on large datasets of faces, hand-written digits, objects, newswire articles, sensor readings from the Space Shuttle, and protein expression levels. Our method achieves high accuracy across all datasets, outperforming the best prior algorithm by a factor of 3 in average rank. PMID:28851838
Adaptive low-rank subspace learning with online optimization for robust visual tracking.

PubMed

Liu, Risheng; Wang, Di; Han, Yuzhuo; Fan, Xin; Luo, Zhongxuan

2017-04-01

In recent years, sparse and low-rank models have been widely used to formulate appearance subspace for visual tracking. However, most existing methods only consider the sparsity or low-rankness of the coefficients, which is not sufficient enough for appearance subspace learning on complex video sequences. Moreover, as both the low-rank and the column sparse measures are tightly related to all the samples in the sequences, it is challenging to incrementally solve optimization problems with both nuclear norm and column sparse norm on sequentially obtained video data. To address above limitations, this paper develops a novel low-rank subspace learning with adaptive penalization (LSAP) framework for subspace based robust visual tracking. Different from previous work, which often simply decomposes observations as low-rank features and sparse errors, LSAP simultaneously learns the subspace basis, low-rank coefficients and column sparse errors to formulate appearance subspace. Within LSAP framework, we introduce a Hadamard production based regularization to incorporate rich generative/discriminative structure constraints to adaptively penalize the coefficients for subspace learning. It is shown that such adaptive penalization can significantly improve the robustness of LSAP on severely corrupted dataset. To utilize LSAP for online visual tracking, we also develop an efficient incremental optimization scheme for nuclear norm and column sparse norm minimizations. Experiments on 50 challenging video sequences demonstrate that our tracker outperforms other state-of-the-art methods. Copyright © 2017 Elsevier Ltd. All rights reserved.
Determining open cluster membership. A Bayesian framework for quantitative member classification

NASA Astrophysics Data System (ADS)

Stott, Jonathan J.

2018-01-01

Aims: My goal is to develop a quantitative algorithm for assessing open cluster membership probabilities. The algorithm is designed to work with single-epoch observations. In its simplest form, only one set of program images and one set of reference images are required. Methods: The algorithm is based on a two-stage joint astrometric and photometric assessment of cluster membership probabilities. The probabilities were computed within a Bayesian framework using any available prior information. Where possible, the algorithm emphasizes simplicity over mathematical sophistication. Results: The algorithm was implemented and tested against three observational fields using published survey data. M 67 and NGC 654 were selected as cluster examples while a third, cluster-free, field was used for the final test data set. The algorithm shows good quantitative agreement with the existing surveys and has a false-positive rate significantly lower than the astrometric or photometric methods used individually.
Random Walk Quantum Clustering Algorithm Based on Space

NASA Astrophysics Data System (ADS)

Xiao, Shufen; Dong, Yumin; Ma, Hongyang

2018-01-01

In the random quantum walk, which is a quantum simulation of the classical walk, data points interacted when selecting the appropriate walk strategy by taking advantage of quantum-entanglement features; thus, the results obtained when the quantum walk is used are different from those when the classical walk is adopted. A new quantum walk clustering algorithm based on space is proposed by applying the quantum walk to clustering analysis. In this algorithm, data points are viewed as walking participants, and similar data points are clustered using the walk function in the pay-off matrix according to a certain rule. The walk process is simplified by implementing a space-combining rule. The proposed algorithm is validated by a simulation test and is proved superior to existing clustering algorithms, namely, Kmeans, PCA + Kmeans, and LDA-Km. The effects of some of the parameters in the proposed algorithm on its performance are also analyzed and discussed. Specific suggestions are provided.
A highly efficient multi-core algorithm for clustering extremely large datasets

PubMed Central

2010-01-01

Background In recent years, the demand for computational power in computational biology has increased due to rapidly growing data sets from microarray and other high-throughput technologies. This demand is likely to increase. Standard algorithms for analyzing data, such as cluster algorithms, need to be parallelized for fast processing. Unfortunately, most approaches for parallelizing algorithms largely rely on network communication protocols connecting and requiring multiple computers. One answer to this problem is to utilize the intrinsic capabilities in current multi-core hardware to distribute the tasks among the different cores of one computer. Results We introduce a multi-core parallelization of the k-means and k-modes cluster algorithms based on the design principles of transactional memory for clustering gene expression microarray type data and categorial SNP data. Our new shared memory parallel algorithms show to be highly efficient. We demonstrate their computational power and show their utility in cluster stability and sensitivity analysis employing repeated runs with slightly changed parameters. Computation speed of our Java based algorithm was increased by a factor of 10 for large data sets while preserving computational accuracy compared to single-core implementations and a recently published network based parallelization. Conclusions Most desktop computers and even notebooks provide at least dual-core processors. Our multi-core algorithms show that using modern algorithmic concepts, parallelization makes it possible to perform even such laborious tasks as cluster sensitivity and cluster number estimation on the laboratory computer. PMID:20370922
Data Clustering

NASA Astrophysics Data System (ADS)

Wagstaff, Kiri L.

2012-03-01

On obtaining a new data set, the researcher is immediately faced with the challenge of obtaining a high-level understanding from the observations. What does a typical item look like? What are the dominant trends? How many distinct groups are included in the data set, and how is each one characterized? Which observable values are common, and which rarely occur? Which items stand out as anomalies or outliers from the rest of the data? This challenge is exacerbated by the steady growth in data set size [11] as new instruments push into new frontiers of parameter space, via improvements in temporal, spatial, and spectral resolution, or by the desire to "fuse" observations from different modalities and instruments into a larger-picture understanding of the same underlying phenomenon. Data clustering algorithms provide a variety of solutions for this task. They can generate summaries, locate outliers, compress data, identify dense or sparse regions of feature space, and build data models. It is useful to note up front that "clusters" in this context refer to groups of items within some descriptive feature space, not (necessarily) to "galaxy clusters" which are dense regions in physical space. The goal of this chapter is to survey a variety of data clustering methods, with an eye toward their applicability to astronomical data analysis. In addition to improving the individual researcher’s understanding of a given data set, clustering has led directly to scientific advances, such as the discovery of new subclasses of stars [14] and gamma-ray bursts (GRBs) [38]. All clustering algorithms seek to identify groups within a data set that reflect some observed, quantifiable structure. Clustering is traditionally an unsupervised approach to data analysis, in the sense that it operates without any direct guidance about which items should be assigned to which clusters. There has been a recent trend in the clustering literature toward supporting semisupervised or constrained clustering, in which some partial information about item assignments or other components of the resulting output are already known and must be accommodated by the solution. Some algorithms seek a partition of the data set into distinct clusters, while others build a hierarchy of nested clusters that can capture taxonomic relationships. Some produce a single optimal solution, while others construct a probabilistic model of cluster membership. More formally, clustering algorithms operate on a data set X composed of items represented by one or more features (dimensions). These could include physical location, such as right ascension and declination, as well as other properties such as brightness, color, temporal change, size, texture, and so on. Let D be the number of dimensions used to represent each item, xi ∈ RD. The clustering goal is to produce an organization P of the items in X that optimizes an objective function f : P -> R, which quantifies the quality of solution P. Often f is defined so as to maximize similarity within a cluster and minimize similarity between clusters. To that end, many algorithms make use of a measure d : X x X -> R of the distance between two items. A partitioning algorithm produces a set of clusters P = {c1, . . . , ck} such that the clusters are nonoverlapping (c_i intersected with c_j = empty set, i != j) subsets of the data set (Union_i c_i=X). Hierarchical algorithms produce a series of partitions P = {p1, . . . , pn }. For a complete hierarchy, the number of partitions n’= n, the number of items in the data set; the top partition is a single cluster containing all items, and the bottom partition contains n clusters, each containing a single item. For model-based clustering, each cluster c_j is represented by a model m_j , such as the cluster center or a Gaussian distribution. The wide array of available clustering algorithms may seem bewildering, and covering all of them is beyond the scope of this chapter. Choosing among them for a particular application involves considerations of the kind of data being analyzed, algorithm runtime efficiency, and how much prior knowledge is available about the problem domain, which can dictate the nature of clusters sought. Fundamentally, the clustering method and its representations of clusters carries with it a definition of what a cluster is, and it is important that this be aligned with the analysis goals for the problem at hand. In this chapter, I emphasize this point by identifying for each algorithm the cluster representation as a model, m_j , even for algorithms that are not typically thought of as creating a “model.” This chapter surveys a basic collection of clustering methods useful to any practitioner who is interested in applying clustering to a new data set. The algorithms include k-means (Section 25.2), EM (Section 25.3), agglomerative (Section 25.4), and spectral (Section 25.5) clustering, with side mentions of variants such as kernel k-means and divisive clustering. The chapter also discusses each algorithm’s strengths and limitations and provides pointers to additional in-depth reading for each subject. Section 25.6 discusses methods for incorporating domain knowledge into the clustering process. This chapter concludes with a brief survey of interesting applications of clustering methods to astronomy data (Section 25.7). The chapter begins with k-means because it is both generally accessible and so widely used that understanding it can be considered a necessary prerequisite for further work in the field. EM can be viewed as a more sophisticated version of k-means that uses a generative model for each cluster and probabilistic item assignments. Agglomerative clustering is the most basic form of hierarchical clustering and provides a basis for further exploration of algorithms in that vein. Spectral clustering permits a departure from feature-vector-based clustering and can operate on data sets instead represented as affinity, or similarity matrices—cases in which only pairwise information is known. The list of algorithms covered in this chapter is representative of those most commonly in use, but it is by no means comprehensive. There is an extensive collection of existing books on clustering that provide additional background and depth. Three early books that remain useful today are Anderberg’s Cluster Analysis for Applications [3], Hartigan’s Clustering Algorithms [25], and Gordon’s Classification [22]. The latter covers basics on similarity measures, partitioning and hierarchical algorithms, fuzzy clustering, overlapping clustering, conceptual clustering, validations methods, and visualization or data reduction techniques such as principal components analysis (PCA),multidimensional scaling, and self-organizing maps. More recently, Jain et al. provided a useful and informative survey [27] of a variety of different clustering algorithms, including those mentioned here as well as fuzzy, graph-theoretic, and evolutionary clustering. Everitt’s Cluster Analysis [19] provides a modern overview of algorithms, similarity measures, and evaluation methods.
An accelerated subspace iteration for eigenvector derivatives

NASA Technical Reports Server (NTRS)

Ting, Tienko

1991-01-01

An accelerated subspace iteration method for calculating eigenvector derivatives has been developed. Factors affecting the effectiveness and the reliability of the subspace iteration are identified, and effective strategies concerning these factors are presented. The method has been implemented, and the results of a demonstration problem are presented.
A Higher-Order Generalized Singular Value Decomposition for Comparison of Global mRNA Expression from Multiple Organisms

PubMed Central

Ponnapalli, Sri Priya; Saunders, Michael A.; Van Loan, Charles F.; Alter, Orly

2011-01-01

The number of high-dimensional datasets recording multiple aspects of a single phenomenon is increasing in many areas of science, accompanied by a need for mathematical frameworks that can compare multiple large-scale matrices with different row dimensions. The only such framework to date, the generalized singular value decomposition (GSVD), is limited to two matrices. We mathematically define a higher-order GSVD (HO GSVD) for N≥2 matrices , each with full column rank. Each matrix is exactly factored as Di = UiΣiVT, where V, identical in all factorizations, is obtained from the eigensystem SV = VΛ of the arithmetic mean S of all pairwise quotients of the matrices , i≠j. We prove that this decomposition extends to higher orders almost all of the mathematical properties of the GSVD. The matrix S is nondefective with V and Λ real. Its eigenvalues satisfy λk≥1. Equality holds if and only if the corresponding eigenvector vk is a right basis vector of equal significance in all matrices Di and Dj, that is σi,k/σj,k = 1 for all i and j, and the corresponding left basis vector ui,k is orthogonal to all other vectors in Ui for all i. The eigenvalues λk = 1, therefore, define the “common HO GSVD subspace.” We illustrate the HO GSVD with a comparison of genome-scale cell-cycle mRNA expression from S. pombe, S. cerevisiae and human. Unlike existing algorithms, a mapping among the genes of these disparate organisms is not required. We find that the approximately common HO GSVD subspace represents the cell-cycle mRNA expression oscillations, which are similar among the datasets. Simultaneous reconstruction in the common subspace, therefore, removes the experimental artifacts, which are dissimilar, from the datasets. In the simultaneous sequence-independent classification of the genes of the three organisms in this common subspace, genes of highly conserved sequences but significantly different cell-cycle peak times are correctly classified. PMID:22216090
Contributions to "k"-Means Clustering and Regression via Classification Algorithms

ERIC Educational Resources Information Center

Salman, Raied

2012-01-01

The dissertation deals with clustering algorithms and transforming regression problems into classification problems. The main contributions of the dissertation are twofold; first, to improve (speed up) the clustering algorithms and second, to develop a strict learning environment for solving regression problems as classification tasks by using…
Multi scales based sparse matrix spectral clustering image segmentation

NASA Astrophysics Data System (ADS)

Liu, Zhongmin; Chen, Zhicai; Li, Zhanming; Hu, Wenjin

2018-04-01

In image segmentation, spectral clustering algorithms have to adopt the appropriate scaling parameter to calculate the similarity matrix between the pixels, which may have a great impact on the clustering result. Moreover, when the number of data instance is large, computational complexity and memory use of the algorithm will greatly increase. To solve these two problems, we proposed a new spectral clustering image segmentation algorithm based on multi scales and sparse matrix. We devised a new feature extraction method at first, then extracted the features of image on different scales, at last, using the feature information to construct sparse similarity matrix which can improve the operation efficiency. Compared with traditional spectral clustering algorithm, image segmentation experimental results show our algorithm have better degree of accuracy and robustness.
An AK-LDMeans algorithm based on image clustering

NASA Astrophysics Data System (ADS)

Chen, Huimin; Li, Xingwei; Zhang, Yongbin; Chen, Nan

2018-03-01

Clustering is an effective analytical technique for handling unmarked data for value mining. Its ultimate goal is to mark unclassified data quickly and correctly. We use the roadmap for the current image processing as the experimental background. In this paper, we propose an AK-LDMeans algorithm to automatically lock the K value by designing the Kcost fold line, and then use the long-distance high-density method to select the clustering centers to further replace the traditional initial clustering center selection method, which further improves the efficiency and accuracy of the traditional K-Means Algorithm. And the experimental results are compared with the current clustering algorithm and the results are obtained. The algorithm can provide effective reference value in the fields of image processing, machine vision and data mining.
Hierarchical trie packet classification algorithm based on expectation-maximization clustering.

PubMed

Bi, Xia-An; Zhao, Junxia

2017-01-01

With the development of computer network bandwidth, packet classification algorithms which are able to deal with large-scale rule sets are in urgent need. Among the existing algorithms, researches on packet classification algorithms based on hierarchical trie have become an important packet classification research branch because of their widely practical use. Although hierarchical trie is beneficial to save large storage space, it has several shortcomings such as the existence of backtracking and empty nodes. This paper proposes a new packet classification algorithm, Hierarchical Trie Algorithm Based on Expectation-Maximization Clustering (HTEMC). Firstly, this paper uses the formalization method to deal with the packet classification problem by means of mapping the rules and data packets into a two-dimensional space. Secondly, this paper uses expectation-maximization algorithm to cluster the rules based on their aggregate characteristics, and thereby diversified clusters are formed. Thirdly, this paper proposes a hierarchical trie based on the results of expectation-maximization clustering. Finally, this paper respectively conducts simulation experiments and real-environment experiments to compare the performances of our algorithm with other typical algorithms, and analyzes the results of the experiments. The hierarchical trie structure in our algorithm not only adopts trie path compression to eliminate backtracking, but also solves the problem of low efficiency of trie updates, which greatly improves the performance of the algorithm.
Energy Aware Cluster-Based Routing in Flying Ad-Hoc Networks.

PubMed

Aadil, Farhan; Raza, Ali; Khan, Muhammad Fahad; Maqsood, Muazzam; Mehmood, Irfan; Rho, Seungmin

2018-05-03

Flying ad-hoc networks (FANETs) are a very vibrant research area nowadays. They have many military and civil applications. Limited battery energy and the high mobility of micro unmanned aerial vehicles (UAVs) represent their two main problems, i.e., short flight time and inefficient routing. In this paper, we try to address both of these problems by means of efficient clustering. First, we adjust the transmission power of the UAVs by anticipating their operational requirements. Optimal transmission range will have minimum packet loss ratio (PLR) and better link quality, which ultimately save the energy consumed during communication. Second, we use a variant of the K-Means Density clustering algorithm for selection of cluster heads. Optimal cluster heads enhance the cluster lifetime and reduce the routing overhead. The proposed model outperforms the state of the art artificial intelligence techniques such as Ant Colony Optimization-based clustering algorithm and Grey Wolf Optimization-based clustering algorithm. The performance of the proposed algorithm is evaluated in term of number of clusters, cluster building time, cluster lifetime and energy consumption.
A fuzzy clustering algorithm to detect planar and quadric shapes

NASA Technical Reports Server (NTRS)

Krishnapuram, Raghu; Frigui, Hichem; Nasraoui, Olfa

1992-01-01

In this paper, we introduce a new fuzzy clustering algorithm to detect an unknown number of planar and quadric shapes in noisy data. The proposed algorithm is computationally and implementationally simple, and it overcomes many of the drawbacks of the existing algorithms that have been proposed for similar tasks. Since the clustering is performed in the original image space, and since no features need to be computed, this approach is particularly suited for sparse data. The algorithm may also be used in pattern recognition applications.
A Fast Density-Based Clustering Algorithm for Real-Time Internet of Things Stream

PubMed Central

Ying Wah, Teh

2014-01-01

Data streams are continuously generated over time from Internet of Things (IoT) devices. The faster all of this data is analyzed, its hidden trends and patterns discovered, and new strategies created, the faster action can be taken, creating greater value for organizations. Density-based method is a prominent class in clustering data streams. It has the ability to detect arbitrary shape clusters, to handle outlier, and it does not need the number of clusters in advance. Therefore, density-based clustering algorithm is a proper choice for clustering IoT streams. Recently, several density-based algorithms have been proposed for clustering data streams. However, density-based clustering in limited time is still a challenging issue. In this paper, we propose a density-based clustering algorithm for IoT streams. The method has fast processing time to be applicable in real-time application of IoT devices. Experimental results show that the proposed approach obtains high quality results with low computation time on real and synthetic datasets. PMID:25110753
A fast density-based clustering algorithm for real-time Internet of Things stream.

PubMed

Amini, Amineh; Saboohi, Hadi; Wah, Teh Ying; Herawan, Tutut

2014-01-01

Data streams are continuously generated over time from Internet of Things (IoT) devices. The faster all of this data is analyzed, its hidden trends and patterns discovered, and new strategies created, the faster action can be taken, creating greater value for organizations. Density-based method is a prominent class in clustering data streams. It has the ability to detect arbitrary shape clusters, to handle outlier, and it does not need the number of clusters in advance. Therefore, density-based clustering algorithm is a proper choice for clustering IoT streams. Recently, several density-based algorithms have been proposed for clustering data streams. However, density-based clustering in limited time is still a challenging issue. In this paper, we propose a density-based clustering algorithm for IoT streams. The method has fast processing time to be applicable in real-time application of IoT devices. Experimental results show that the proposed approach obtains high quality results with low computation time on real and synthetic datasets.
System identification using Nuclear Norm & Tabu Search optimization

NASA Astrophysics Data System (ADS)

Ahmed, Asif A.; Schoen, Marco P.; Bosworth, Ken W.

2018-01-01

In recent years, subspace System Identification (SI) algorithms have seen increased research, stemming from advanced minimization methods being applied to the Nuclear Norm (NN) approach in system identification. These minimization algorithms are based on hard computing methodologies. To the authors’ knowledge, as of now, there has been no work reported that utilizes soft computing algorithms to address the minimization problem within the nuclear norm SI framework. A linear, time-invariant, discrete time system is used in this work as the basic model for characterizing a dynamical system to be identified. The main objective is to extract a mathematical model from collected experimental input-output data. Hankel matrices are constructed from experimental data, and the extended observability matrix is employed to define an estimated output of the system. This estimated output and the actual - measured - output are utilized to construct a minimization problem. An embedded rank measure assures minimum state realization outcomes. Current NN-SI algorithms employ hard computing algorithms for minimization. In this work, we propose a simple Tabu Search (TS) algorithm for minimization. TS algorithm based SI is compared with the iterative Alternating Direction Method of Multipliers (ADMM) line search optimization based NN-SI. For comparison, several different benchmark system identification problems are solved by both approaches. Results show improved performance of the proposed SI-TS algorithm compared to the NN-SI ADMM algorithm.
Discrete-State Stochastic Models of Calcium-Regulated Calcium Influx and Subspace Dynamics Are Not Well-Approximated by ODEs That Neglect Concentration Fluctuations

PubMed Central

Weinberg, Seth H.; Smith, Gregory D.

2012-01-01

Cardiac myocyte calcium signaling is often modeled using deterministic ordinary differential equations (ODEs) and mass-action kinetics. However, spatially restricted “domains” associated with calcium influx are small enough (e.g., 10−17 liters) that local signaling may involve 1–100 calcium ions. Is it appropriate to model the dynamics of subspace calcium using deterministic ODEs or, alternatively, do we require stochastic descriptions that account for the fundamentally discrete nature of these local calcium signals? To address this question, we constructed a minimal Markov model of a calcium-regulated calcium channel and associated subspace. We compared the expected value of fluctuating subspace calcium concentration (a result that accounts for the small subspace volume) with the corresponding deterministic model (an approximation that assumes large system size). When subspace calcium did not regulate calcium influx, the deterministic and stochastic descriptions agreed. However, when calcium binding altered channel activity in the model, the continuous deterministic description often deviated significantly from the discrete stochastic model, unless the subspace volume is unrealistically large and/or the kinetics of the calcium binding are sufficiently fast. This principle was also demonstrated using a physiologically realistic model of calmodulin regulation of L-type calcium channels introduced by Yue and coworkers. PMID:23509597

Hyperspectral image compressing using wavelet-based method

NASA Astrophysics Data System (ADS)

Yu, Hui; Zhang, Zhi-jie; Lei, Bo; Wang, Chen-sheng

2017-10-01

Hyperspectral imaging sensors can acquire images in hundreds of continuous narrow spectral bands. Therefore each object presented in the image can be identified from their spectral response. However, such kind of imaging brings a huge amount of data, which requires transmission, processing, and storage resources for both airborne and space borne imaging. Due to the high volume of hyperspectral image data, the exploration of compression strategies has received a lot of attention in recent years. Compression of hyperspectral data cubes is an effective solution for these problems. Lossless compression of the hyperspectral data usually results in low compression ratio, which may not meet the available resources; on the other hand, lossy compression may give the desired ratio, but with a significant degradation effect on object identification performance of the hyperspectral data. Moreover, most hyperspectral data compression techniques exploits the similarities in spectral dimensions; which requires bands reordering or regrouping, to make use of the spectral redundancy. In this paper, we explored the spectral cross correlation between different bands, and proposed an adaptive band selection method to obtain the spectral bands which contain most of the information of the acquired hyperspectral data cube. The proposed method mainly consist three steps: First, the algorithm decomposes the original hyperspectral imagery into a series of subspaces based on the hyper correlation matrix of the hyperspectral images between different bands. And then the Wavelet-based algorithm is applied to the each subspaces. At last the PCA method is applied to the wavelet coefficients to produce the chosen number of components. The performance of the proposed method was tested by using ISODATA classification method.
Localized Ambient Solidity Separation Algorithm Based Computer User Segmentation.

PubMed

Sun, Xiao; Zhang, Tongda; Chai, Yueting; Liu, Yi

2015-01-01

Most of popular clustering methods typically have some strong assumptions of the dataset. For example, the k-means implicitly assumes that all clusters come from spherical Gaussian distributions which have different means but the same covariance. However, when dealing with datasets that have diverse distribution shapes or high dimensionality, these assumptions might not be valid anymore. In order to overcome this weakness, we proposed a new clustering algorithm named localized ambient solidity separation (LASS) algorithm, using a new isolation criterion called centroid distance. Compared with other density based isolation criteria, our proposed centroid distance isolation criterion addresses the problem caused by high dimensionality and varying density. The experiment on a designed two-dimensional benchmark dataset shows that our proposed LASS algorithm not only inherits the advantage of the original dissimilarity increments clustering method to separate naturally isolated clusters but also can identify the clusters which are adjacent, overlapping, and under background noise. Finally, we compared our LASS algorithm with the dissimilarity increments clustering method on a massive computer user dataset with over two million records that contains demographic and behaviors information. The results show that LASS algorithm works extremely well on this computer user dataset and can gain more knowledge from it.
Localized Ambient Solidity Separation Algorithm Based Computer User Segmentation

PubMed Central

Sun, Xiao; Zhang, Tongda; Chai, Yueting; Liu, Yi

2015-01-01

Most of popular clustering methods typically have some strong assumptions of the dataset. For example, the k-means implicitly assumes that all clusters come from spherical Gaussian distributions which have different means but the same covariance. However, when dealing with datasets that have diverse distribution shapes or high dimensionality, these assumptions might not be valid anymore. In order to overcome this weakness, we proposed a new clustering algorithm named localized ambient solidity separation (LASS) algorithm, using a new isolation criterion called centroid distance. Compared with other density based isolation criteria, our proposed centroid distance isolation criterion addresses the problem caused by high dimensionality and varying density. The experiment on a designed two-dimensional benchmark dataset shows that our proposed LASS algorithm not only inherits the advantage of the original dissimilarity increments clustering method to separate naturally isolated clusters but also can identify the clusters which are adjacent, overlapping, and under background noise. Finally, we compared our LASS algorithm with the dissimilarity increments clustering method on a massive computer user dataset with over two million records that contains demographic and behaviors information. The results show that LASS algorithm works extremely well on this computer user dataset and can gain more knowledge from it. PMID:26221133
A clustering algorithm for determining community structure in complex networks

NASA Astrophysics Data System (ADS)

Jin, Hong; Yu, Wei; Li, ShiJun

2018-02-01

Clustering algorithms are attractive for the task of community detection in complex networks. DENCLUE is a representative density based clustering algorithm which has a firm mathematical basis and good clustering properties allowing for arbitrarily shaped clusters in high dimensional datasets. However, this method cannot be directly applied to community discovering due to its inability to deal with network data. Moreover, it requires a careful selection of the density parameter and the noise threshold. To solve these issues, a new community detection method is proposed in this paper. First, we use a spectral analysis technique to map the network data into a low dimensional Euclidean Space which can preserve node structural characteristics. Then, DENCLUE is applied to detect the communities in the network. A mathematical method named Sheather-Jones plug-in is chosen to select the density parameter which can describe the intrinsic clustering structure accurately. Moreover, every node on the network is meaningful so there were no noise nodes as a result the noise threshold can be ignored. We test our algorithm on both benchmark and real-life networks, and the results demonstrate the effectiveness of our algorithm over other popularity density based clustering algorithms adopted to community detection.
Collaborative filtering recommendation model based on fuzzy clustering algorithm

NASA Astrophysics Data System (ADS)

Yang, Ye; Zhang, Yunhua

2018-05-01

As one of the most widely used algorithms in recommender systems, collaborative filtering algorithm faces two serious problems, which are the sparsity of data and poor recommendation effect in big data environment. In traditional clustering analysis, the object is strictly divided into several classes and the boundary of this division is very clear. However, for most objects in real life, there is no strict definition of their forms and attributes of their class. Concerning the problems above, this paper proposes to improve the traditional collaborative filtering model through the hybrid optimization of implicit semantic algorithm and fuzzy clustering algorithm, meanwhile, cooperating with collaborative filtering algorithm. In this paper, the fuzzy clustering algorithm is introduced to fuzzy clustering the information of project attribute, which makes the project belong to different project categories with different membership degrees, and increases the density of data, effectively reduces the sparsity of data, and solves the problem of low accuracy which is resulted from the inaccuracy of similarity calculation. Finally, this paper carries out empirical analysis on the MovieLens dataset, and compares it with the traditional user-based collaborative filtering algorithm. The proposed algorithm has greatly improved the recommendation accuracy.
A parameter estimation algorithm for LFM/BPSK hybrid modulated signal intercepted by Nyquist folding receiver

NASA Astrophysics Data System (ADS)

Qiu, Zhaoyang; Wang, Pei; Zhu, Jun; Tang, Bin

2016-12-01

Nyquist folding receiver (NYFR) is a novel ultra-wideband receiver architecture which can realize wideband receiving with a small amount of equipment. Linear frequency modulated/binary phase shift keying (LFM/BPSK) hybrid modulated signal is a novel kind of low probability interception signal with wide bandwidth. The NYFR is an effective architecture to intercept the LFM/BPSK signal and the LFM/BPSK signal intercepted by the NYFR will add the local oscillator modulation. A parameter estimation algorithm for the NYFR output signal is proposed. According to the NYFR prior information, the chirp singular value ratio spectrum is proposed to estimate the chirp rate. Then, based on the output self-characteristic, matching component function is designed to estimate Nyquist zone (NZ) index. Finally, matching code and subspace method are employed to estimate the phase change points and code length. Compared with the existing methods, the proposed algorithm has a better performance. It also has no need to construct a multi-channel structure, which means the computational complexity for the NZ index estimation is small. The simulation results demonstrate the efficacy of the proposed algorithm.
ROMUSE 2.0 User Manual

DOE Office of Scientific and Technical Information (OSTI.GOV)

Khuwaileh, Bassam; Turinsky, Paul; Williams, Brian J.

2016-10-04

ROMUSE (Reduced Order Modeling Based Uncertainty/Sensitivity Estimator) is an effort within the Consortium for Advanced Simulation of Light water reactors (CASL) to provide an analysis tool to be used in conjunction with reactor core simulators, especially the Virtual Environment for Reactor Applications (VERA). ROMUSE is written in C++ and is currently capable of performing various types of parameters perturbations, uncertainty quantification, surrogate models construction and subspace analysis. Version 2.0 has the capability to interface with DAKOTA which gives ROMUSE access to the various algorithms implemented within DAKOTA. ROMUSE is mainly designed to interface with VERA and the Comprehensive Modeling andmore » Simulation Suite for Nuclear Safety Analysis and Design (SCALE) [1,2,3], however, ROMUSE can interface with any general model (e.g. python and matlab) with Input/Output (I/O) format that follows the Hierarchical Data Format 5 (HDF5). In this brief user manual, the use of ROMUSE will be overviewed and example problems will be presented and briefly discussed. The algorithms provided here range from algorithms inspired by those discussed in Ref.[4] to nuclear-specific algorithms discussed in Ref. [3].« less
A new clustering strategy

NASA Astrophysics Data System (ADS)

Feng, Jian-xin; Tang, Jia-fu; Wang, Guang-xing

2007-04-01

On the basis of the analysis of clustering algorithm that had been proposed for MANET, a novel clustering strategy was proposed in this paper. With the trust defined by statistical hypothesis in probability theory and the cluster head selected by node trust and node mobility, this strategy can realize the function of the malicious nodes detection which was neglected by other clustering algorithms and overcome the deficiency of being incapable of implementing the relative mobility metric of corresponding nodes in the MOBIC algorithm caused by the fact that the receiving power of two consecutive HELLO packet cannot be measured. It's an effective solution to cluster MANET securely.
Parallel Clustering Algorithm for Large-Scale Biological Data Sets

PubMed Central

Wang, Minchao; Zhang, Wu; Ding, Wang; Dai, Dongbo; Zhang, Huiran; Xie, Hao; Chen, Luonan; Guo, Yike; Xie, Jiang

2014-01-01

Backgrounds Recent explosion of biological data brings a great challenge for the traditional clustering algorithms. With increasing scale of data sets, much larger memory and longer runtime are required for the cluster identification problems. The affinity propagation algorithm outperforms many other classical clustering algorithms and is widely applied into the biological researches. However, the time and space complexity become a great bottleneck when handling the large-scale data sets. Moreover, the similarity matrix, whose constructing procedure takes long runtime, is required before running the affinity propagation algorithm, since the algorithm clusters data sets based on the similarities between data pairs. Methods Two types of parallel architectures are proposed in this paper to accelerate the similarity matrix constructing procedure and the affinity propagation algorithm. The memory-shared architecture is used to construct the similarity matrix, and the distributed system is taken for the affinity propagation algorithm, because of its large memory size and great computing capacity. An appropriate way of data partition and reduction is designed in our method, in order to minimize the global communication cost among processes. Result A speedup of 100 is gained with 128 cores. The runtime is reduced from serval hours to a few seconds, which indicates that parallel algorithm is capable of handling large-scale data sets effectively. The parallel affinity propagation also achieves a good performance when clustering large-scale gene data (microarray) and detecting families in large protein superfamilies. PMID:24705246
Measuring Constraint-Set Utility for Partitional Clustering Algorithms

NASA Technical Reports Server (NTRS)

Davidson, Ian; Wagstaff, Kiri L.; Basu, Sugato

2006-01-01

Clustering with constraints is an active area of machine learning and data mining research. Previous empirical work has convincingly shown that adding constraints to clustering improves the performance of a variety of algorithms. However, in most of these experiments, results are averaged over different randomly chosen constraint sets from a given set of labels, thereby masking interesting properties of individual sets. We demonstrate that constraint sets vary significantly in how useful they are for constrained clustering; some constraint sets can actually decrease algorithm performance. We create two quantitative measures, informativeness and coherence, that can be used to identify useful constraint sets. We show that these measures can also help explain differences in performance for four particular constrained clustering algorithms.
An Improved Clustering Algorithm of Tunnel Monitoring Data for Cloud Computing

PubMed Central

Zhong, Luo; Tang, KunHao; Li, Lin; Yang, Guang; Ye, JingJing

2014-01-01

With the rapid development of urban construction, the number of urban tunnels is increasing and the data they produce become more and more complex. It results in the fact that the traditional clustering algorithm cannot handle the mass data of the tunnel. To solve this problem, an improved parallel clustering algorithm based on k-means has been proposed. It is a clustering algorithm using the MapReduce within cloud computing that deals with data. It not only has the advantage of being used to deal with mass data but also is more efficient. Moreover, it is able to compute the average dissimilarity degree of each cluster in order to clean the abnormal data. PMID:24982971
Efficient Record Linkage Algorithms Using Complete Linkage Clustering.

PubMed

Mamun, Abdullah-Al; Aseltine, Robert; Rajasekaran, Sanguthevar

2016-01-01

Data from different agencies share data of the same individuals. Linking these datasets to identify all the records belonging to the same individuals is a crucial and challenging problem, especially given the large volumes of data. A large number of available algorithms for record linkage are prone to either time inefficiency or low-accuracy in finding matches and non-matches among the records. In this paper we propose efficient as well as reliable sequential and parallel algorithms for the record linkage problem employing hierarchical clustering methods. We employ complete linkage hierarchical clustering algorithms to address this problem. In addition to hierarchical clustering, we also use two other techniques: elimination of duplicate records and blocking. Our algorithms use sorting as a sub-routine to identify identical copies of records. We have tested our algorithms on datasets with millions of synthetic records. Experimental results show that our algorithms achieve nearly 100% accuracy. Parallel implementations achieve almost linear speedups. Time complexities of these algorithms do not exceed those of previous best-known algorithms. Our proposed algorithms outperform previous best-known algorithms in terms of accuracy consuming reasonable run times.
Efficient Record Linkage Algorithms Using Complete Linkage Clustering

PubMed Central

Mamun, Abdullah-Al; Aseltine, Robert; Rajasekaran, Sanguthevar

2016-01-01

Data from different agencies share data of the same individuals. Linking these datasets to identify all the records belonging to the same individuals is a crucial and challenging problem, especially given the large volumes of data. A large number of available algorithms for record linkage are prone to either time inefficiency or low-accuracy in finding matches and non-matches among the records. In this paper we propose efficient as well as reliable sequential and parallel algorithms for the record linkage problem employing hierarchical clustering methods. We employ complete linkage hierarchical clustering algorithms to address this problem. In addition to hierarchical clustering, we also use two other techniques: elimination of duplicate records and blocking. Our algorithms use sorting as a sub-routine to identify identical copies of records. We have tested our algorithms on datasets with millions of synthetic records. Experimental results show that our algorithms achieve nearly 100% accuracy. Parallel implementations achieve almost linear speedups. Time complexities of these algorithms do not exceed those of previous best-known algorithms. Our proposed algorithms outperform previous best-known algorithms in terms of accuracy consuming reasonable run times. PMID:27124604
A comparative study of DIGNET, average, complete, single hierarchical and k-means clustering algorithms in 2D face image recognition

NASA Astrophysics Data System (ADS)

Thanos, Konstantinos-Georgios; Thomopoulos, Stelios C. A.

2014-06-01

The study in this paper belongs to a more general research of discovering facial sub-clusters in different ethnicity face databases. These new sub-clusters along with other metadata (such as race, sex, etc.) lead to a vector for each face in the database where each vector component represents the likelihood of participation of a given face to each cluster. This vector is then used as a feature vector in a human identification and tracking system based on face and other biometrics. The first stage in this system involves a clustering method which evaluates and compares the clustering results of five different clustering algorithms (average, complete, single hierarchical algorithm, k-means and DIGNET), and selects the best strategy for each data collection. In this paper we present the comparative performance of clustering results of DIGNET and four clustering algorithms (average, complete, single hierarchical and k-means) on fabricated 2D and 3D samples, and on actual face images from various databases, using four different standard metrics. These metrics are the silhouette figure, the mean silhouette coefficient, the Hubert test Γ coefficient, and the classification accuracy for each clustering result. The results showed that, in general, DIGNET gives more trustworthy results than the other algorithms when the metrics values are above a specific acceptance threshold. However when the evaluation results metrics have values lower than the acceptance threshold but not too low (too low corresponds to ambiguous results or false results), then it is necessary for the clustering results to be verified by the other algorithms.
An Adaptive Clustering Approach Based on Minimum Travel Route Planning for Wireless Sensor Networks with a Mobile Sink.

PubMed

Tang, Jiqiang; Yang, Wu; Zhu, Lingyun; Wang, Dong; Feng, Xin

2017-04-26

In recent years, Wireless Sensor Networks with a Mobile Sink (WSN-MS) have been an active research topic due to the widespread use of mobile devices. However, how to get the balance between data delivery latency and energy consumption becomes a key issue of WSN-MS. In this paper, we study the clustering approach by jointly considering the Route planning for mobile sink and Clustering Problem (RCP) for static sensor nodes. We solve the RCP problem by using the minimum travel route clustering approach, which applies the minimum travel route of the mobile sink to guide the clustering process. We formulate the RCP problem as an Integer Non-Linear Programming (INLP) problem to shorten the travel route of the mobile sink under three constraints: the communication hops constraint, the travel route constraint and the loop avoidance constraint. We then propose an Imprecise Induction Algorithm (IIA) based on the property that the solution with a small hop count is more feasible than that with a large hop count. The IIA algorithm includes three processes: initializing travel route planning with a Traveling Salesman Problem (TSP) algorithm, transforming the cluster head to a cluster member and transforming the cluster member to a cluster head. Extensive experimental results show that the IIA algorithm could automatically adjust cluster heads according to the maximum hops parameter and plan a shorter travel route for the mobile sink. Compared with the Shortest Path Tree-based Data-Gathering Algorithm (SPT-DGA), the IIA algorithm has the characteristics of shorter route length, smaller cluster head count and faster convergence rate.
Time reversal acoustics for small targets using decomposition of the time reversal operator

NASA Astrophysics Data System (ADS)

Simko, Peter C.

The method of time reversal acoustics has been the focus of considerable interest over the last twenty years. Time reversal imaging methods have made consistent progress as effective methods for signal processing since the initial demonstration that physical time reversal methods can be used to form convergent wave fields on a localized target, even under conditions of severe multipathing. Computational time reversal methods rely on the properties of the so-called 'time reversal operator' in order to extract information about the target medium. Applications for which time reversal imaging have previously been explored include medical imaging, non-destructive evaluation, and mine detection. Emphasis in this paper will fall on two topics within the general field of computational time reversal imaging. First, we will examine previous work on developing a time reversal imaging algorithm based on the MUltiple SIgnal Classification (MUSIC) algorithm. MUSIC, though computationally very intensive, has demonstrated early promise in simulations using array-based methods applicable to true volumetric (three-dimensional) imaging. We will provide a simple algorithm through which the rank of the time reversal operator subspaces can be properly quantified so that the rank of the associated null subspace can be accurately estimated near the central pulse wavelength in broadband imaging. Second, we will focus on the scattering from small acoustically rigid two dimensional cylindrical targets of elliptical cross section. Analysis of the time reversal operator eigenmodes has been well-studied for symmetric response matrices associated with symmetric systems of scattering targets. We will expand these previous results to include more general scattering systems leading to asymmetric response matrices, for which the analytical complexity increases but the physical interpretation of the time reversal operator remains unchanged. For asymmetric responses, the qualitative properties of the time reversal operator eigenmodes remain consistent with those obtained from the more tightly constrained systems.
Conjunctive patches subspace learning with side information for collaborative image retrieval.

PubMed

Zhang, Lining; Wang, Lipo; Lin, Weisi

2012-08-01

Content-Based Image Retrieval (CBIR) has attracted substantial attention during the past few years for its potential practical applications to image management. A variety of Relevance Feedback (RF) schemes have been designed to bridge the semantic gap between the low-level visual features and the high-level semantic concepts for an image retrieval task. Various Collaborative Image Retrieval (CIR) schemes aim to utilize the user historical feedback log data with similar and dissimilar pairwise constraints to improve the performance of a CBIR system. However, existing subspace learning approaches with explicit label information cannot be applied for a CIR task, although the subspace learning techniques play a key role in various computer vision tasks, e.g., face recognition and image classification. In this paper, we propose a novel subspace learning framework, i.e., Conjunctive Patches Subspace Learning (CPSL) with side information, for learning an effective semantic subspace by exploiting the user historical feedback log data for a CIR task. The CPSL can effectively integrate the discriminative information of labeled log images, the geometrical information of labeled log images and the weakly similar information of unlabeled images together to learn a reliable subspace. We formally formulate this problem into a constrained optimization problem and then present a new subspace learning technique to exploit the user historical feedback log data. Extensive experiments on both synthetic data sets and a real-world image database demonstrate the effectiveness of the proposed scheme in improving the performance of a CBIR system by exploiting the user historical feedback log data.
Efficient implementation of parallel three-dimensional FFT on clusters of PCs

NASA Astrophysics Data System (ADS)

Takahashi, Daisuke

2003-05-01

In this paper, we propose a high-performance parallel three-dimensional fast Fourier transform (FFT) algorithm on clusters of PCs. The three-dimensional FFT algorithm can be altered into a block three-dimensional FFT algorithm to reduce the number of cache misses. We show that the block three-dimensional FFT algorithm improves performance by utilizing the cache memory effectively. We use the block three-dimensional FFT algorithm to implement the parallel three-dimensional FFT algorithm. We succeeded in obtaining performance of over 1.3 GFLOPS on an 8-node dual Pentium III 1 GHz PC SMP cluster.
Approximate kernel competitive learning.

PubMed

Wu, Jian-Sheng; Zheng, Wei-Shi; Lai, Jian-Huang

2015-03-01

Kernel competitive learning has been successfully used to achieve robust clustering. However, kernel competitive learning (KCL) is not scalable for large scale data processing, because (1) it has to calculate and store the full kernel matrix that is too large to be calculated and kept in the memory and (2) it cannot be computed in parallel. In this paper we develop a framework of approximate kernel competitive learning for processing large scale dataset. The proposed framework consists of two parts. First, it derives an approximate kernel competitive learning (AKCL), which learns kernel competitive learning in a subspace via sampling. We provide solid theoretical analysis on why the proposed approximation modelling would work for kernel competitive learning, and furthermore, we show that the computational complexity of AKCL is largely reduced. Second, we propose a pseudo-parallelled approximate kernel competitive learning (PAKCL) based on a set-based kernel competitive learning strategy, which overcomes the obstacle of using parallel programming in kernel competitive learning and significantly accelerates the approximate kernel competitive learning for large scale clustering. The empirical evaluation on publicly available datasets shows that the proposed AKCL and PAKCL can perform comparably as KCL, with a large reduction on computational cost. Also, the proposed methods achieve more effective clustering performance in terms of clustering precision against related approximate clustering approaches. Copyright © 2014 Elsevier Ltd. All rights reserved.
Comparing, optimizing, and benchmarking quantum-control algorithms in a unifying programming framework

DOE Office of Scientific and Technical Information (OSTI.GOV)

Machnes, S.; Institute for Theoretical Physics, University of Ulm, D-89069 Ulm; Sander, U.

2011-08-15

For paving the way to novel applications in quantum simulation, computation, and technology, increasingly large quantum systems have to be steered with high precision. It is a typical task amenable to numerical optimal control to turn the time course of pulses, i.e., piecewise constant control amplitudes, iteratively into an optimized shape. Here, we present a comparative study of optimal-control algorithms for a wide range of finite-dimensional applications. We focus on the most commonly used algorithms: GRAPE methods which update all controls concurrently, and Krotov-type methods which do so sequentially. Guidelines for their use are given and open research questions aremore » pointed out. Moreover, we introduce a unifying algorithmic framework, DYNAMO (dynamic optimization platform), designed to provide the quantum-technology community with a convenient matlab-based tool set for optimal control. In addition, it gives researchers in optimal-control techniques a framework for benchmarking and comparing newly proposed algorithms with the state of the art. It allows a mix-and-match approach with various types of gradients, update and step-size methods as well as subspace choices. Open-source code including examples is made available at http://qlib.info.« less

Progressive Visual Analytics: User-Driven Visual Exploration of In-Progress Analytics.

PubMed

Stolper, Charles D; Perer, Adam; Gotz, David

2014-12-01

As datasets grow and analytic algorithms become more complex, the typical workflow of analysts launching an analytic, waiting for it to complete, inspecting the results, and then re-Iaunching the computation with adjusted parameters is not realistic for many real-world tasks. This paper presents an alternative workflow, progressive visual analytics, which enables an analyst to inspect partial results of an algorithm as they become available and interact with the algorithm to prioritize subspaces of interest. Progressive visual analytics depends on adapting analytical algorithms to produce meaningful partial results and enable analyst intervention without sacrificing computational speed. The paradigm also depends on adapting information visualization techniques to incorporate the constantly refining results without overwhelming analysts and provide interactions to support an analyst directing the analytic. The contributions of this paper include: a description of the progressive visual analytics paradigm; design goals for both the algorithms and visualizations in progressive visual analytics systems; an example progressive visual analytics system (Progressive Insights) for analyzing common patterns in a collection of event sequences; and an evaluation of Progressive Insights and the progressive visual analytics paradigm by clinical researchers analyzing electronic medical records.
Polynomial Similarity Transformation Theory: A smooth interpolation between coupled cluster doubles and projected BCS applied to the reduced BCS Hamiltonian

DOE Office of Scientific and Technical Information (OSTI.GOV)

Degroote, M.; Henderson, T. M.; Zhao, J.

We present a similarity transformation theory based on a polynomial form of a particle-hole pair excitation operator. In the weakly correlated limit, this polynomial becomes an exponential, leading to coupled cluster doubles. In the opposite strongly correlated limit, the polynomial becomes an extended Bessel expansion and yields the projected BCS wavefunction. In between, we interpolate using a single parameter. The e ective Hamiltonian is non-hermitian and this Polynomial Similarity Transformation Theory follows the philosophy of traditional coupled cluster, left projecting the transformed Hamiltonian onto subspaces of the Hilbert space in which the wave function variance is forced to be zero.more » Similarly, the interpolation parameter is obtained through minimizing the next residual in the projective hierarchy. We rationalize and demonstrate how and why coupled cluster doubles is ill suited to the strongly correlated limit whereas the Bessel expansion remains well behaved. The model provides accurate wave functions with energy errors that in its best variant are smaller than 1% across all interaction stengths. The numerical cost is polynomial in system size and the theory can be straightforwardly applied to any realistic Hamiltonian.« less
Performance analysis of unsupervised optimal fuzzy clustering algorithm for MRI brain tumor segmentation.

PubMed

Blessy, S A Praylin Selva; Sulochana, C Helen

2015-01-01

Segmentation of brain tumor from Magnetic Resonance Imaging (MRI) becomes very complicated due to the structural complexities of human brain and the presence of intensity inhomogeneities. To propose a method that effectively segments brain tumor from MR images and to evaluate the performance of unsupervised optimal fuzzy clustering (UOFC) algorithm for segmentation of brain tumor from MR images. Segmentation is done by preprocessing the MR image to standardize intensity inhomogeneities followed by feature extraction, feature fusion and clustering. Different validation measures are used to evaluate the performance of the proposed method using different clustering algorithms. The proposed method using UOFC algorithm produces high sensitivity (96%) and low specificity (4%) compared to other clustering methods. Validation results clearly show that the proposed method with UOFC algorithm effectively segments brain tumor from MR images.
Adaptive density trajectory cluster based on time and space distance

NASA Astrophysics Data System (ADS)

Liu, Fagui; Zhang, Zhijie

2017-10-01

There are some hotspot problems remaining in trajectory cluster for discovering mobile behavior regularity, such as the computation of distance between sub trajectories, the setting of parameter values in cluster algorithm and the uncertainty/boundary problem of data set. As a result, based on the time and space, this paper tries to define the calculation method of distance between sub trajectories. The significance of distance calculation for sub trajectories is to clearly reveal the differences in moving trajectories and to promote the accuracy of cluster algorithm. Besides, a novel adaptive density trajectory cluster algorithm is proposed, in which cluster radius is computed through using the density of data distribution. In addition, cluster centers and number are selected by a certain strategy automatically, and uncertainty/boundary problem of data set is solved by designed weighted rough c-means. Experimental results demonstrate that the proposed algorithm can perform the fuzzy trajectory cluster effectively on the basis of the time and space distance, and obtain the optimal cluster centers and rich cluster results information adaptably for excavating the features of mobile behavior in mobile and sociology network.
An incremental DPMM-based method for trajectory clustering, modeling, and retrieval.

PubMed

Hu, Weiming; Li, Xi; Tian, Guodong; Maybank, Stephen; Zhang, Zhongfei

2013-05-01

Trajectory analysis is the basis for many applications, such as indexing of motion events in videos, activity recognition, and surveillance. In this paper, the Dirichlet process mixture model (DPMM) is applied to trajectory clustering, modeling, and retrieval. We propose an incremental version of a DPMM-based clustering algorithm and apply it to cluster trajectories. An appropriate number of trajectory clusters is determined automatically. When trajectories belonging to new clusters arrive, the new clusters can be identified online and added to the model without any retraining using the previous data. A time-sensitive Dirichlet process mixture model (tDPMM) is applied to each trajectory cluster for learning the trajectory pattern which represents the time-series characteristics of the trajectories in the cluster. Then, a parameterized index is constructed for each cluster. A novel likelihood estimation algorithm for the tDPMM is proposed, and a trajectory-based video retrieval model is developed. The tDPMM-based probabilistic matching method and the DPMM-based model growing method are combined to make the retrieval model scalable and adaptable. Experimental comparisons with state-of-the-art algorithms demonstrate the effectiveness of our algorithm.
The effect of different distance measures in detecting outliers using clustering-based algorithm for circular regression model

NASA Astrophysics Data System (ADS)

Di, Nur Faraidah Muhammad; Satari, Siti Zanariah

2017-05-01

Outlier detection in linear data sets has been done vigorously but only a small amount of work has been done for outlier detection in circular data. In this study, we proposed multiple outliers detection in circular regression models based on the clustering algorithm. Clustering technique basically utilizes distance measure to define distance between various data points. Here, we introduce the similarity distance based on Euclidean distance for circular model and obtain a cluster tree using the single linkage clustering algorithm. Then, a stopping rule for the cluster tree based on the mean direction and circular standard deviation of the tree height is proposed. We classify the cluster group that exceeds the stopping rule as potential outlier. Our aim is to demonstrate the effectiveness of proposed algorithms with the similarity distances in detecting the outliers. It is found that the proposed methods are performed well and applicable for circular regression model.
A Fast Implementation of the ISOCLUS Algorithm

NASA Technical Reports Server (NTRS)

Memarsadeghi, Nargess; Mount, David M.; Netanyahu, Nathan S.; LeMoigne, Jacqueline

2003-01-01

Unsupervised clustering is a fundamental tool in numerous image processing and remote sensing applications. For example, unsupervised clustering is often used to obtain vegetation maps of an area of interest. This approach is useful when reliable training data are either scarce or expensive, and when relatively little a priori information about the data is available. Unsupervised clustering methods play a significant role in the pursuit of unsupervised classification. One of the most popular and widely used clustering schemes for remote sensing applications is the ISOCLUS algorithm, which is based on the ISODATA method. The algorithm is given a set of n data points (or samples) in d-dimensional space, an integer k indicating the initial number of clusters, and a number of additional parameters. The general goal is to compute a set of cluster centers in d-space. Although there is no specific optimization criterion, the algorithm is similar in spirit to the well known k-means clustering method in which the objective is to minimize the average squared distance of each point to its nearest center, called the average distortion. One significant feature of ISOCLUS over k-means is that clusters may be merged or split, and so the final number of clusters may be different from the number k supplied as part of the input. This algorithm will be described in later in this paper. The ISOCLUS algorithm can run very slowly, particularly on large data sets. Given its wide use in remote sensing, its efficient computation is an important goal. We have developed a fast implementation of the ISOCLUS algorithm. Our improvement is based on a recent acceleration to the k-means algorithm, the filtering algorithm, by Kanungo et al.. They showed that, by storing the data in a kd-tree, it was possible to significantly reduce the running time of k-means. We have adapted this method for the ISOCLUS algorithm. For technical reasons, which are explained later, it is necessary to make a minor modification to the ISOCLUS specification. We provide empirical evidence, on both synthetic and Landsat image data sets, that our algorithm's performance is essentially the same as that of ISOCLUS, but with significantly lower running times. We show that our algorithm runs from 3 to 30 times faster than a straightforward implementation of ISOCLUS. Our adaptation of the filtering algorithm involves the efficient computation of a number of cluster statistics that are needed for ISOCLUS, but not for k-means.
Subspace in Linear Algebra: Investigating Students' Concept Images and Interactions with the Formal Definition

ERIC Educational Resources Information Center

Wawro, Megan; Sweeney, George F.; Rabin, Jeffrey M.

2011-01-01

This paper reports on a study investigating students' ways of conceptualizing key ideas in linear algebra, with the particular results presented here focusing on student interactions with the notion of subspace. In interviews conducted with eight undergraduates, we found students' initial descriptions of subspace often varied substantially from…
Normal and abnormal tissue identification system and method for medical images such as digital mammograms

NASA Technical Reports Server (NTRS)

Heine, John J. (Inventor); Clarke, Laurence P. (Inventor); Deans, Stanley R. (Inventor); Stauduhar, Richard Paul (Inventor); Cullers, David Kent (Inventor)

2001-01-01

A system and method for analyzing a medical image to determine whether an abnormality is present, for example, in digital mammograms, includes the application of a wavelet expansion to a raw image to obtain subspace images of varying resolution. At least one subspace image is selected that has a resolution commensurate with a desired predetermined detection resolution range. A functional form of a probability distribution function is determined for each selected subspace image, and an optimal statistical normal image region test is determined for each selected subspace image. A threshold level for the probability distribution function is established from the optimal statistical normal image region test for each selected subspace image. A region size comprising at least one sector is defined, and an output image is created that includes a combination of all regions for each selected subspace image. Each region has a first value when the region intensity level is above the threshold and a second value when the region intensity level is below the threshold. This permits the localization of a potential abnormality within the image.
Active Subspaces for Wind Plant Surrogate Modeling

DOE Office of Scientific and Technical Information (OSTI.GOV)

King, Ryan N; Quick, Julian; Dykes, Katherine L

Understanding the uncertainty in wind plant performance is crucial to their cost-effective design and operation. However, conventional approaches to uncertainty quantification (UQ), such as Monte Carlo techniques or surrogate modeling, are often computationally intractable for utility-scale wind plants because of poor congergence rates or the curse of dimensionality. In this paper we demonstrate that wind plant power uncertainty can be well represented with a low-dimensional active subspace, thereby achieving a significant reduction in the dimension of the surrogate modeling problem. We apply the active sub-spaces technique to UQ of plant power output with respect to uncertainty in turbine axial inductionmore » factors, and find a single active subspace direction dominates the sensitivity in power output. When this single active subspace direction is used to construct a quadratic surrogate model, the number of model unknowns can be reduced by up to 3 orders of magnitude without compromising performance on unseen test data. We conclude that the dimension reduction achieved with active subspaces makes surrogate-based UQ approaches tractable for utility-scale wind plants.« less
Interpretation of the MEG-MUSIC scan in biomagnetic source localization

DOE Office of Scientific and Technical Information (OSTI.GOV)

Mosher, J.C.; Lewis, P.S.; Leahy, R.M.

1993-09-01

MEG-Music is a new approach to MEG source localization. MEG-Music is based on a spatio-temporal source model in which the observed biomagnetic fields are generated by a small number of current dipole sources with fixed positions/orientations and varying strengths. From the spatial covariance matrix of the observed fields, a signal subspace can be identified. The rank of this subspace is equal to the number of elemental sources present. This signal sub-space is used in a projection metric that scans the three dimensional head volume. Given a perfect signal subspace estimate and a perfect forward model, the metric will peak atmore » unity at each dipole location. In practice, the signal subspace estimate is contaminated by noise, which in turn yields MUSIC peaks which are less than unity. Previously we examined the lower bounds on localization error, independent of the choice of localization procedure. In this paper, we analyzed the effects of noise and temporal coherence on the signal subspace estimate and the resulting effects on the MEG-MUSIC peaks.« less
Degree-of-Freedom Strengthened Cascade Array for DOD-DOA Estimation in MIMO Array Systems.

PubMed

Yao, Bobin; Dong, Zhi; Zhang, Weile; Wang, Wei; Wu, Qisheng

2018-05-14

In spatial spectrum estimation, difference co-array can provide extra degrees-of-freedom (DOFs) for promoting parameter identifiability and parameter estimation accuracy. For the sake of acquiring as more DOFs as possible with a given number of physical sensors, we herein design a novel sensor array geometry named cascade array. This structure is generated by systematically connecting a uniform linear array (ULA) and a non-uniform linear array, and can provide more DOFs than some exist array structures but less than the upper-bound indicated by minimum redundant array (MRA). We further apply this cascade array into multiple input multiple output (MIMO) array systems, and propose a novel joint direction of departure (DOD) and direction of arrival (DOA) estimation algorithm, which is based on a reduced-dimensional weighted subspace fitting technique. The algorithm is angle auto-paired and computationally efficient. Theoretical analysis and numerical simulations prove the advantages and effectiveness of the proposed array structure and the related algorithm.
Binary tree eigen solver in finite element analysis

NASA Technical Reports Server (NTRS)

Akl, F. A.; Janetzke, D. C.; Kiraly, L. J.

1993-01-01

This paper presents a transputer-based binary tree eigensolver for the solution of the generalized eigenproblem in linear elastic finite element analysis. The algorithm is based on the method of recursive doubling, which parallel implementation of a number of associative operations on an arbitrary set having N elements is of the order of o(log2N), compared to (N-1) steps if implemented sequentially. The hardware used in the implementation of the binary tree consists of 32 transputers. The algorithm is written in OCCAM which is a high-level language developed with the transputers to address parallel programming constructs and to provide the communications between processors. The algorithm can be replicated to match the size of the binary tree transputer network. Parallel and sequential finite element analysis programs have been developed to solve for the set of the least-order eigenpairs using the modified subspace method. The speed-up obtained for a typical analysis problem indicates close agreement with the theoretical prediction given by the method of recursive doubling.
Quantum supercharger library: hyper-parallelism of the Hartree-Fock method.

PubMed

Fernandes, Kyle D; Renison, C Alicia; Naidoo, Kevin J

2015-07-05

We present here a set of algorithms that completely rewrites the Hartree-Fock (HF) computations common to many legacy electronic structure packages (such as GAMESS-US, GAMESS-UK, and NWChem) into a massively parallel compute scheme that takes advantage of hardware accelerators such as Graphical Processing Units (GPUs). The HF compute algorithm is core to a library of routines that we name the Quantum Supercharger Library (QSL). We briefly evaluate the QSL's performance and report that it accelerates a HF 6-31G Self-Consistent Field (SCF) computation by up to 20 times for medium sized molecules (such as a buckyball) when compared with mature Central Processing Unit algorithms available in the legacy codes in regular use by researchers. It achieves this acceleration by massive parallelization of the one- and two-electron integrals and optimization of the SCF and Direct Inversion in the Iterative Subspace routines through the use of GPU linear algebra libraries. © 2015 Wiley Periodicals, Inc. © 2015 Wiley Periodicals, Inc.
Measure Projection Analysis: A Probabilistic Approach to EEG Source Comparison and Multi-Subject Inference

PubMed Central

Bigdely-Shamlo, Nima; Mullen, Tim; Kreutz-Delgado, Kenneth; Makeig, Scott

2013-01-01

A crucial question for the analysis of multi-subject and/or multi-session electroencephalographic (EEG) data is how to combine information across multiple recordings from different subjects and/or sessions, each associated with its own set of source processes and scalp projections. Here we introduce a novel statistical method for characterizing the spatial consistency of EEG dynamics across a set of data records. Measure Projection Analysis (MPA) first finds voxels in a common template brain space at which a given dynamic measure is consistent across nearby source locations, then computes local-mean EEG measure values for this voxel subspace using a statistical model of source localization error and between-subject anatomical variation. Finally, clustering the mean measure voxel values in this locally consistent brain subspace finds brain spatial domains exhibiting distinguishable measure features and provides 3-D maps plus statistical significance estimates for each EEG measure of interest. Applied to sufficient high-quality data, the scalp projections of many maximally independent component (IC) processes contributing to recorded high-density EEG data closely match the projection of a single equivalent dipole located in or near brain cortex. We demonstrate the application of MPA to a multi-subject EEG study decomposed using independent component analysis (ICA), compare the results to k-means IC clustering in EEGLAB (sccn.ucsd.edu/eeglab), and use surrogate data to test MPA robustness. A Measure Projection Toolbox (MPT) plug-in for EEGLAB is available for download (sccn.ucsd.edu/wiki/MPT). Together, MPA and ICA allow use of EEG as a 3-D cortical imaging modality with near-cm scale spatial resolution. PMID:23370059
A Rapid Convergent Low Complexity Interference Alignment Algorithm for Wireless Sensor Networks.

PubMed

Jiang, Lihui; Wu, Zhilu; Ren, Guanghui; Wang, Gangyi; Zhao, Nan

2015-07-29

Interference alignment (IA) is a novel technique that can effectively eliminate the interference and approach the sum capacity of wireless sensor networks (WSNs) when the signal-to-noise ratio (SNR) is high, by casting the desired signal and interference into different signal subspaces. The traditional alternating minimization interference leakage (AMIL) algorithm for IA shows good performance in high SNR regimes, however, the complexity of the AMIL algorithm increases dramatically as the number of users and antennas increases, posing limits to its applications in the practical systems. In this paper, a novel IA algorithm, called directional quartic optimal (DQO) algorithm, is proposed to minimize the interference leakage with rapid convergence and low complexity. The properties of the AMIL algorithm are investigated, and it is discovered that the difference between the two consecutive iteration results of the AMIL algorithm will approximately point to the convergence solution when the precoding and decoding matrices obtained from the intermediate iterations are sufficiently close to their convergence values. Based on this important property, the proposed DQO algorithm employs the line search procedure so that it can converge to the destination directly. In addition, the optimal step size can be determined analytically by optimizing a quartic function. Numerical results show that the proposed DQO algorithm can suppress the interference leakage more rapidly than the traditional AMIL algorithm, and can achieve the same level of sum rate as that of AMIL algorithm with far less iterations and execution time.
Model-based Clustering of High-Dimensional Data in Astrophysics

NASA Astrophysics Data System (ADS)

Bouveyron, C.

2016-05-01

The nature of data in Astrophysics has changed, as in other scientific fields, in the past decades due to the increase of the measurement capabilities. As a consequence, data are nowadays frequently of high dimensionality and available in mass or stream. Model-based techniques for clustering are popular tools which are renowned for their probabilistic foundations and their flexibility. However, classical model-based techniques show a disappointing behavior in high-dimensional spaces which is mainly due to their dramatical over-parametrization. The recent developments in model-based classification overcome these drawbacks and allow to efficiently classify high-dimensional data, even in the "small n / large p" situation. This work presents a comprehensive review of these recent approaches, including regularization-based techniques, parsimonious modeling, subspace classification methods and classification methods based on variable selection. The use of these model-based methods is also illustrated on real-world classification problems in Astrophysics using R packages.
Local Subspace Classifier with Transform-Invariance for Image Classification

NASA Astrophysics Data System (ADS)

Hotta, Seiji

A family of linear subspace classifiers called local subspace classifier (LSC) outperforms the k-nearest neighbor rule (kNN) and conventional subspace classifiers in handwritten digit classification. However, LSC suffers very high sensitivity to image transformations because it uses projection and the Euclidean distances for classification. In this paper, I present a combination of a local subspace classifier (LSC) and a tangent distance (TD) for improving accuracy of handwritten digit recognition. In this classification rule, we can deal with transform-invariance easily because we are able to use tangent vectors for approximation of transformations. However, we cannot use tangent vectors in other type of images such as color images. Hence, kernel LSC (KLSC) is proposed for incorporating transform-invariance into LSC via kernel mapping. The performance of the proposed methods is verified with the experiments on handwritten digit and color image classification.
Reducing Earth Topography Resolution for SMAP Mission Ground Tracks Using K-Means Clustering

NASA Technical Reports Server (NTRS)

Rizvi, Farheen

2013-01-01

The K-means clustering algorithm is used to reduce Earth topography resolution for the SMAP mission ground tracks. As SMAP propagates in orbit, knowledge of the radar antenna footprints on Earth is required for the antenna misalignment calibration. Each antenna footprint contains a latitude and longitude location pair on the Earth surface. There are 400 pairs in one data set for the calibration model. It is computationally expensive to calculate corresponding Earth elevation for these data pairs. Thus, the antenna footprint resolution is reduced. Similar topographical data pairs are grouped together with the K-means clustering algorithm. The resolution is reduced to the mean of each topographical cluster called the cluster centroid. The corresponding Earth elevation for each cluster centroid is assigned to the entire group. Results show that 400 data points are reduced to 60 while still maintaining algorithm performance and computational efficiency. In this work, sensitivity analysis is also performed to show a trade-off between algorithm performance versus computational efficiency as the number of cluster centroids and algorithm iterations are increased.
Accurate Grid-based Clustering Algorithm with Diagonal Grid Searching and Merging

NASA Astrophysics Data System (ADS)

Liu, Feng; Ye, Chengcheng; Zhu, Erzhou

2017-09-01

Due to the advent of big data, data mining technology has attracted more and more attentions. As an important data analysis method, grid clustering algorithm is fast but with relatively lower accuracy. This paper presents an improved clustering algorithm combined with grid and density parameters. The algorithm first divides the data space into the valid meshes and invalid meshes through grid parameters. Secondly, from the starting point located at the first point of the diagonal of the grids, the algorithm takes the direction of “horizontal right, vertical down” to merge the valid meshes. Furthermore, by the boundary grid processing, the invalid grids are searched and merged when the adjacent left, above, and diagonal-direction grids are all the valid ones. By doing this, the accuracy of clustering is improved. The experimental results have shown that the proposed algorithm is accuracy and relatively faster when compared with some popularly used algorithms.

The efficiency of average linkage hierarchical clustering algorithm associated multi-scale bootstrap resampling in identifying homogeneous precipitation catchments

NASA Astrophysics Data System (ADS)

Chuan, Zun Liang; Ismail, Noriszura; Shinyie, Wendy Ling; Lit Ken, Tan; Fam, Soo-Fen; Senawi, Azlyna; Yusoff, Wan Nur Syahidah Wan

2018-04-01

Due to the limited of historical precipitation records, agglomerative hierarchical clustering algorithms widely used to extrapolate information from gauged to ungauged precipitation catchments in yielding a more reliable projection of extreme hydro-meteorological events such as extreme precipitation events. However, identifying the optimum number of homogeneous precipitation catchments accurately based on the dendrogram resulted using agglomerative hierarchical algorithms are very subjective. The main objective of this study is to propose an efficient regionalized algorithm to identify the homogeneous precipitation catchments for non-stationary precipitation time series. The homogeneous precipitation catchments are identified using average linkage hierarchical clustering algorithm associated multi-scale bootstrap resampling, while uncentered correlation coefficient as the similarity measure. The regionalized homogeneous precipitation is consolidated using K-sample Anderson Darling non-parametric test. The analysis result shows the proposed regionalized algorithm performed more better compared to the proposed agglomerative hierarchical clustering algorithm in previous studies.
Robust MST-Based Clustering Algorithm.

PubMed

Liu, Qidong; Zhang, Ruisheng; Zhao, Zhili; Wang, Zhenghai; Jiao, Mengyao; Wang, Guangjing

2018-06-01

Minimax similarity stresses the connectedness of points via mediating elements rather than favoring high mutual similarity. The grouping principle yields superior clustering results when mining arbitrarily-shaped clusters in data. However, it is not robust against noises and outliers in the data. There are two main problems with the grouping principle: first, a single object that is far away from all other objects defines a separate cluster, and second, two connected clusters would be regarded as two parts of one cluster. In order to solve such problems, we propose robust minimum spanning tree (MST)-based clustering algorithm in this letter. First, we separate the connected objects by applying a density-based coarsening phase, resulting in a low-rank matrix in which the element denotes the supernode by combining a set of nodes. Then a greedy method is presented to partition those supernodes through working on the low-rank matrix. Instead of removing the longest edges from MST, our algorithm groups the data set based on the minimax similarity. Finally, the assignment of all data points can be achieved through their corresponding supernodes. Experimental results on many synthetic and real-world data sets show that our algorithm consistently outperforms compared clustering algorithms.
A Clustering Algorithm for Ecological Stream Segment Identification from Spatially Extensive Digital Databases

NASA Astrophysics Data System (ADS)

Brenden, T. O.; Clark, R. D.; Wiley, M. J.; Seelbach, P. W.; Wang, L.

2005-05-01

Remote sensing and geographic information systems have made it possible to attribute variables for streams at increasingly detailed resolutions (e.g., individual river reaches). Nevertheless, management decisions still must be made at large scales because land and stream managers typically lack sufficient resources to manage on an individual reach basis. Managers thus require a method for identifying stream management units that are ecologically similar and that can be expected to respond similarly to management decisions. We have developed a spatially-constrained clustering algorithm that can merge neighboring river reaches with similar ecological characteristics into larger management units. The clustering algorithm is based on the Cluster Affinity Search Technique (CAST), which was developed for clustering gene expression data. Inputs to the clustering algorithm are the neighbor relationships of the reaches that comprise the digital river network, the ecological attributes of the reaches, and an affinity value, which identifies the minimum similarity for merging river reaches. In this presentation, we describe the clustering algorithm in greater detail and contrast its use with other methods (expert opinion, classification approach, regular clustering) for identifying management units using several Michigan watersheds as a backdrop.
On the Accuracy and Parallelism of GPGPU-Powered Incremental Clustering Algorithms.

PubMed

Chen, Chunlei; He, Li; Zhang, Huixiang; Zheng, Hao; Wang, Lei

2017-01-01

Incremental clustering algorithms play a vital role in various applications such as massive data analysis and real-time data processing. Typical application scenarios of incremental clustering raise high demand on computing power of the hardware platform. Parallel computing is a common solution to meet this demand. Moreover, General Purpose Graphic Processing Unit (GPGPU) is a promising parallel computing device. Nevertheless, the incremental clustering algorithm is facing a dilemma between clustering accuracy and parallelism when they are powered by GPGPU. We formally analyzed the cause of this dilemma. First, we formalized concepts relevant to incremental clustering like evolving granularity. Second, we formally proved two theorems. The first theorem proves the relation between clustering accuracy and evolving granularity. Additionally, this theorem analyzes the upper and lower bounds of different-to-same mis-affiliation. Fewer occurrences of such mis-affiliation mean higher accuracy. The second theorem reveals the relation between parallelism and evolving granularity. Smaller work-depth means superior parallelism. Through the proofs, we conclude that accuracy of an incremental clustering algorithm is negatively related to evolving granularity while parallelism is positively related to the granularity. Thus the contradictory relations cause the dilemma. Finally, we validated the relations through a demo algorithm. Experiment results verified theoretical conclusions.
The trust-region self-consistent field method in Kohn-Sham density-functional theory.

PubMed

Thøgersen, Lea; Olsen, Jeppe; Köhn, Andreas; Jørgensen, Poul; Sałek, Paweł; Helgaker, Trygve

2005-08-15

The trust-region self-consistent field (TRSCF) method is extended to the optimization of the Kohn-Sham energy. In the TRSCF method, both the Roothaan-Hall step and the density-subspace minimization step are replaced by trust-region optimizations of local approximations to the Kohn-Sham energy, leading to a controlled, monotonic convergence towards the optimized energy. Previously the TRSCF method has been developed for optimization of the Hartree-Fock energy, which is a simple quadratic function in the density matrix. However, since the Kohn-Sham energy is a nonquadratic function of the density matrix, the local energy functions must be generalized for use with the Kohn-Sham model. Such a generalization, which contains the Hartree-Fock model as a special case, is presented here. For comparison, a rederivation of the popular direct inversion in the iterative subspace (DIIS) algorithm is performed, demonstrating that the DIIS method may be viewed as a quasi-Newton method, explaining its fast local convergence. In the global region the convergence behavior of DIIS is less predictable. The related energy DIIS technique is also discussed and shown to be inappropriate for the optimization of the Kohn-Sham energy.
Recognition of multiple imbalanced cancer types based on DNA microarray data using ensemble classifiers.

PubMed

Yu, Hualong; Hong, Shufang; Yang, Xibei; Ni, Jun; Dan, Yuanyuan; Qin, Bin

2013-01-01

DNA microarray technology can measure the activities of tens of thousands of genes simultaneously, which provides an efficient way to diagnose cancer at the molecular level. Although this strategy has attracted significant research attention, most studies neglect an important problem, namely, that most DNA microarray datasets are skewed, which causes traditional learning algorithms to produce inaccurate results. Some studies have considered this problem, yet they merely focus on binary-class problem. In this paper, we dealt with multiclass imbalanced classification problem, as encountered in cancer DNA microarray, by using ensemble learning. We utilized one-against-all coding strategy to transform multiclass to multiple binary classes, each of them carrying out feature subspace, which is an evolving version of random subspace that generates multiple diverse training subsets. Next, we introduced one of two different correction technologies, namely, decision threshold adjustment or random undersampling, into each training subset to alleviate the damage of class imbalance. Specifically, support vector machine was used as base classifier, and a novel voting rule called counter voting was presented for making a final decision. Experimental results on eight skewed multiclass cancer microarray datasets indicate that unlike many traditional classification approaches, our methods are insensitive to class imbalance.
DOA Estimation for Underwater Wideband Weak Targets Based on Coherent Signal Subspace and Compressed Sensing

PubMed Central

2018-01-01

Direction of arrival (DOA) estimation is the basis for underwater target localization and tracking using towed line array sonar devices. A method of DOA estimation for underwater wideband weak targets based on coherent signal subspace (CSS) processing and compressed sensing (CS) theory is proposed. Under the CSS processing framework, wideband frequency focusing is accompanied by a two-sided correlation transformation, allowing the DOA of underwater wideband targets to be estimated based on the spatial sparsity of the targets and the compressed sensing reconstruction algorithm. Through analysis and processing of simulation data and marine trial data, it is shown that this method can accomplish the DOA estimation of underwater wideband weak targets. Results also show that this method can considerably improve the spatial spectrum of weak target signals, enhancing the ability to detect them. It can solve the problems of low directional resolution and unreliable weak-target detection in traditional beamforming technology. Compared with the conventional minimum variance distortionless response beamformers (MVDR), this method has many advantages, such as higher directional resolution, wider detection range, fewer required snapshots and more accurate detection for weak targets. PMID:29562642
Discriminant projective non-negative matrix factorization.

PubMed

Guan, Naiyang; Zhang, Xiang; Luo, Zhigang; Tao, Dacheng; Yang, Xuejun

2013-01-01

Projective non-negative matrix factorization (PNMF) projects high-dimensional non-negative examples X onto a lower-dimensional subspace spanned by a non-negative basis W and considers W(T) X as their coefficients, i.e., X≈WW(T) X. Since PNMF learns the natural parts-based representation Wof X, it has been widely used in many fields such as pattern recognition and computer vision. However, PNMF does not perform well in classification tasks because it completely ignores the label information of the dataset. This paper proposes a Discriminant PNMF method (DPNMF) to overcome this deficiency. In particular, DPNMF exploits Fisher's criterion to PNMF for utilizing the label information. Similar to PNMF, DPNMF learns a single non-negative basis matrix and needs less computational burden than NMF. In contrast to PNMF, DPNMF maximizes the distance between centers of any two classes of examples meanwhile minimizes the distance between any two examples of the same class in the lower-dimensional subspace and thus has more discriminant power. We develop a multiplicative update rule to solve DPNMF and prove its convergence. Experimental results on four popular face image datasets confirm its effectiveness comparing with the representative NMF and PNMF algorithms.
Sparse electrocardiogram signals recovery based on solving a row echelon-like form of system.

PubMed

Cai, Pingmei; Wang, Guinan; Yu, Shiwei; Zhang, Hongjuan; Ding, Shuxue; Wu, Zikai

2016-02-01

The study of biology and medicine in a noise environment is an evolving direction in biological data analysis. Among these studies, analysis of electrocardiogram (ECG) signals in a noise environment is a challenging direction in personalized medicine. Due to its periodic characteristic, ECG signal can be roughly regarded as sparse biomedical signals. This study proposes a two-stage recovery algorithm for sparse biomedical signals in time domain. In the first stage, the concentration subspaces are found in advance. Then by exploiting these subspaces, the mixing matrix is estimated accurately. In the second stage, based on the number of active sources at each time point, the time points are divided into different layers. Next, by constructing some transformation matrices, these time points form a row echelon-like system. After that, the sources at each layer can be solved out explicitly by corresponding matrix operations. It is noting that all these operations are conducted under a weak sparse condition that the number of active sources is less than the number of observations. Experimental results show that the proposed method has a better performance for sparse ECG signal recovery problem.
An algorithm for separation of mixed sparse and Gaussian sources

PubMed Central

Akkalkotkar, Ameya

2017-01-01

Independent component analysis (ICA) is a ubiquitous method for decomposing complex signal mixtures into a small set of statistically independent source signals. However, in cases in which the signal mixture consists of both nongaussian and Gaussian sources, the Gaussian sources will not be recoverable by ICA and will pollute estimates of the nongaussian sources. Therefore, it is desirable to have methods for mixed ICA/PCA which can separate mixtures of Gaussian and nongaussian sources. For mixtures of purely Gaussian sources, principal component analysis (PCA) can provide a basis for the Gaussian subspace. We introduce a new method for mixed ICA/PCA which we call Mixed ICA/PCA via Reproducibility Stability (MIPReSt). Our method uses a repeated estimations technique to rank sources by reproducibility, combined with decomposition of multiple subsamplings of the original data matrix. These multiple decompositions allow us to assess component stability as the size of the data matrix changes, which can be used to determinine the dimension of the nongaussian subspace in a mixture. We demonstrate the utility of MIPReSt for signal mixtures consisting of simulated sources and real-word (speech) sources, as well as mixture of unknown composition. PMID:28414814
An algorithm for separation of mixed sparse and Gaussian sources.

PubMed

Akkalkotkar, Ameya; Brown, Kevin Scott

2017-01-01

Independent component analysis (ICA) is a ubiquitous method for decomposing complex signal mixtures into a small set of statistically independent source signals. However, in cases in which the signal mixture consists of both nongaussian and Gaussian sources, the Gaussian sources will not be recoverable by ICA and will pollute estimates of the nongaussian sources. Therefore, it is desirable to have methods for mixed ICA/PCA which can separate mixtures of Gaussian and nongaussian sources. For mixtures of purely Gaussian sources, principal component analysis (PCA) can provide a basis for the Gaussian subspace. We introduce a new method for mixed ICA/PCA which we call Mixed ICA/PCA via Reproducibility Stability (MIPReSt). Our method uses a repeated estimations technique to rank sources by reproducibility, combined with decomposition of multiple subsamplings of the original data matrix. These multiple decompositions allow us to assess component stability as the size of the data matrix changes, which can be used to determinine the dimension of the nongaussian subspace in a mixture. We demonstrate the utility of MIPReSt for signal mixtures consisting of simulated sources and real-word (speech) sources, as well as mixture of unknown composition.
Discriminant Projective Non-Negative Matrix Factorization

PubMed Central

Guan, Naiyang; Zhang, Xiang; Luo, Zhigang; Tao, Dacheng; Yang, Xuejun

2013-01-01

Projective non-negative matrix factorization (PNMF) projects high-dimensional non-negative examples X onto a lower-dimensional subspace spanned by a non-negative basis W and considers WT X as their coefficients, i.e., X≈WWT X. Since PNMF learns the natural parts-based representation Wof X, it has been widely used in many fields such as pattern recognition and computer vision. However, PNMF does not perform well in classification tasks because it completely ignores the label information of the dataset. This paper proposes a Discriminant PNMF method (DPNMF) to overcome this deficiency. In particular, DPNMF exploits Fisher's criterion to PNMF for utilizing the label information. Similar to PNMF, DPNMF learns a single non-negative basis matrix and needs less computational burden than NMF. In contrast to PNMF, DPNMF maximizes the distance between centers of any two classes of examples meanwhile minimizes the distance between any two examples of the same class in the lower-dimensional subspace and thus has more discriminant power. We develop a multiplicative update rule to solve DPNMF and prove its convergence. Experimental results on four popular face image datasets confirm its effectiveness comparing with the representative NMF and PNMF algorithms. PMID:24376680
Hierarchical trie packet classification algorithm based on expectation-maximization clustering

PubMed Central

Bi, Xia-an; Zhao, Junxia

2017-01-01

With the development of computer network bandwidth, packet classification algorithms which are able to deal with large-scale rule sets are in urgent need. Among the existing algorithms, researches on packet classification algorithms based on hierarchical trie have become an important packet classification research branch because of their widely practical use. Although hierarchical trie is beneficial to save large storage space, it has several shortcomings such as the existence of backtracking and empty nodes. This paper proposes a new packet classification algorithm, Hierarchical Trie Algorithm Based on Expectation-Maximization Clustering (HTEMC). Firstly, this paper uses the formalization method to deal with the packet classification problem by means of mapping the rules and data packets into a two-dimensional space. Secondly, this paper uses expectation-maximization algorithm to cluster the rules based on their aggregate characteristics, and thereby diversified clusters are formed. Thirdly, this paper proposes a hierarchical trie based on the results of expectation-maximization clustering. Finally, this paper respectively conducts simulation experiments and real-environment experiments to compare the performances of our algorithm with other typical algorithms, and analyzes the results of the experiments. The hierarchical trie structure in our algorithm not only adopts trie path compression to eliminate backtracking, but also solves the problem of low efficiency of trie updates, which greatly improves the performance of the algorithm. PMID:28704476
A fast parallel clustering algorithm for molecular simulation trajectories.

PubMed

Zhao, Yutong; Sheong, Fu Kit; Sun, Jian; Sander, Pedro; Huang, Xuhui

2013-01-15

We implemented a GPU-powered parallel k-centers algorithm to perform clustering on the conformations of molecular dynamics (MD) simulations. The algorithm is up to two orders of magnitude faster than the CPU implementation. We tested our algorithm on four protein MD simulation datasets ranging from the small Alanine Dipeptide to a 370-residue Maltose Binding Protein (MBP). It is capable of grouping 250,000 conformations of the MBP into 4000 clusters within 40 seconds. To achieve this, we effectively parallelized the code on the GPU and utilize the triangle inequality of metric spaces. Furthermore, the algorithm's running time is linear with respect to the number of cluster centers. In addition, we found the triangle inequality to be less effective in higher dimensions and provide a mathematical rationale. Finally, using Alanine Dipeptide as an example, we show a strong correlation between cluster populations resulting from the k-centers algorithm and the underlying density. © 2012 Wiley Periodicals, Inc. Copyright © 2012 Wiley Periodicals, Inc.
A Case-Based Reasoning Method with Rank Aggregation

NASA Astrophysics Data System (ADS)

Sun, Jinhua; Du, Jiao; Hu, Jian

2018-03-01

In order to improve the accuracy of case-based reasoning (CBR), this paper addresses a new CBR framework with the basic principle of rank aggregation. First, the ranking methods are put forward in each attribute subspace of case. The ordering relation between cases on each attribute is got between cases. Then, a sorting matrix is got. Second, the similar case retrieval process from ranking matrix is transformed into a rank aggregation optimal problem, which uses the Kemeny optimal. On the basis, a rank aggregation case-based reasoning algorithm, named RA-CBR, is designed. The experiment result on UCI data sets shows that case retrieval accuracy of RA-CBR algorithm is higher than euclidean distance CBR and mahalanobis distance CBR testing.So we can get the conclusion that RA-CBR method can increase the performance and efficiency of CBR.
Geometric mean for subspace selection.

PubMed

Tao, Dacheng; Li, Xuelong; Wu, Xindong; Maybank, Stephen J

2009-02-01

Subspace selection approaches are powerful tools in pattern classification and data visualization. One of the most important subspace approaches is the linear dimensionality reduction step in the Fisher's linear discriminant analysis (FLDA), which has been successfully employed in many fields such as biometrics, bioinformatics, and multimedia information management. However, the linear dimensionality reduction step in FLDA has a critical drawback: for a classification task with c classes, if the dimension of the projected subspace is strictly lower than c - 1, the projection to a subspace tends to merge those classes, which are close together in the original feature space. If separate classes are sampled from Gaussian distributions, all with identical covariance matrices, then the linear dimensionality reduction step in FLDA maximizes the mean value of the Kullback-Leibler (KL) divergences between different classes. Based on this viewpoint, the geometric mean for subspace selection is studied in this paper. Three criteria are analyzed: 1) maximization of the geometric mean of the KL divergences, 2) maximization of the geometric mean of the normalized KL divergences, and 3) the combination of 1 and 2. Preliminary experimental results based on synthetic data, UCI Machine Learning Repository, and handwriting digits show that the third criterion is a potential discriminative subspace selection method, which significantly reduces the class separation problem in comparing with the linear dimensionality reduction step in FLDA and its several representative extensions.
Long-term surface EMG monitoring using K-means clustering and compressive sensing

NASA Astrophysics Data System (ADS)

Balouchestani, Mohammadreza; Krishnan, Sridhar

2015-05-01

In this work, we present an advanced K-means clustering algorithm based on Compressed Sensing theory (CS) in combination with the K-Singular Value Decomposition (K-SVD) method for Clustering of long-term recording of surface Electromyography (sEMG) signals. The long-term monitoring of sEMG signals aims at recording of the electrical activity produced by muscles which are very useful procedure for treatment and diagnostic purposes as well as for detection of various pathologies. The proposed algorithm is examined for three scenarios of sEMG signals including healthy person (sEMG-Healthy), a patient with myopathy (sEMG-Myopathy), and a patient with neuropathy (sEMG-Neuropathr), respectively. The proposed algorithm can easily scan large sEMG datasets of long-term sEMG recording. We test the proposed algorithm with Principal Component Analysis (PCA) and Linear Correlation Coefficient (LCC) dimensionality reduction methods. Then, the output of the proposed algorithm is fed to K-Nearest Neighbours (K-NN) and Probabilistic Neural Network (PNN) classifiers in order to calclute the clustering performance. The proposed algorithm achieves a classification accuracy of 99.22%. This ability allows reducing 17% of Average Classification Error (ACE), 9% of Training Error (TE), and 18% of Root Mean Square Error (RMSE). The proposed algorithm also reduces 14% clustering energy consumption compared to the existing K-Means clustering algorithm.
A multi-populations multi-strategies differential evolution algorithm for structural optimization of metal nanoclusters

NASA Astrophysics Data System (ADS)

Fan, Tian-E.; Shao, Gui-Fang; Ji, Qing-Shuang; Zheng, Ji-Wen; Liu, Tun-dong; Wen, Yu-Hua

2016-11-01

Theoretically, the determination of the structure of a cluster is to search the global minimum on its potential energy surface. The global minimization problem is often nondeterministic-polynomial-time (NP) hard and the number of local minima grows exponentially with the cluster size. In this article, a multi-populations multi-strategies differential evolution algorithm has been proposed to search the globally stable structure of Fe and Cr nanoclusters. The algorithm combines a multi-populations differential evolution with an elite pool scheme to keep the diversity of the solutions and avoid prematurely trapping into local optima. Moreover, multi-strategies such as growing method in initialization and three differential strategies in mutation are introduced to improve the convergence speed and lower the computational cost. The accuracy and effectiveness of our algorithm have been verified by comparing the results of Fe clusters with Cambridge Cluster Database. Meanwhile, the performance of our algorithm has been analyzed by comparing the convergence rate and energy evaluations with the classical DE algorithm. The multi-populations, multi-strategies mutation and growing method in initialization in our algorithm have been considered respectively. Furthermore, the structural growth pattern of Cr clusters has been predicted by this algorithm. The results show that the lowest-energy structure of Cr clusters contains many icosahedra, and the number of the icosahedral rings rises with increasing size.
Subspace algorithms for identifying separable-in-denominator 2D systems with deterministic-stochastic inputs

NASA Astrophysics Data System (ADS)

Ramos, José A.; Mercère, Guillaume

2016-12-01

In this paper, we present an algorithm for identifying two-dimensional (2D) causal, recursive and separable-in-denominator (CRSD) state-space models in the Roesser form with deterministic-stochastic inputs. The algorithm implements the N4SID, PO-MOESP and CCA methods, which are well known in the literature on 1D system identification, but here we do so for the 2D CRSD Roesser model. The algorithm solves the 2D system identification problem by maintaining the constraint structure imposed by the problem (i.e. Toeplitz and Hankel) and computes the horizontal and vertical system orders, system parameter matrices and covariance matrices of a 2D CRSD Roesser model. From a computational point of view, the algorithm has been presented in a unified framework, where the user can select which of the three methods to use. Furthermore, the identification task is divided into three main parts: (1) computing the deterministic horizontal model parameters, (2) computing the deterministic vertical model parameters and (3) computing the stochastic components. Specific attention has been paid to the computation of a stabilised Kalman gain matrix and a positive real solution when required. The efficiency and robustness of the unified algorithm have been demonstrated via a thorough simulation example.
A Type-2 Block-Component-Decomposition Based 2D AOA Estimation Algorithm for an Electromagnetic Vector Sensor Array

PubMed Central

Gao, Yu-Fei; Gui, Guan; Xie, Wei; Zou, Yan-Bin; Yang, Yue; Wan, Qun

2017-01-01

This paper investigates a two-dimensional angle of arrival (2D AOA) estimation algorithm for the electromagnetic vector sensor (EMVS) array based on Type-2 block component decomposition (BCD) tensor modeling. Such a tensor decomposition method can take full advantage of the multidimensional structural information of electromagnetic signals to accomplish blind estimation for array parameters with higher resolution. However, existing tensor decomposition methods encounter many restrictions in applications of the EMVS array, such as the strict requirement for uniqueness conditions of decomposition, the inability to handle partially-polarized signals, etc. To solve these problems, this paper investigates tensor modeling for partially-polarized signals of an L-shaped EMVS array. The 2D AOA estimation algorithm based on rank-(L1,L2,·) BCD is developed, and the uniqueness condition of decomposition is analyzed. By means of the estimated steering matrix, the proposed algorithm can automatically achieve angle pair-matching. Numerical experiments demonstrate that the present algorithm has the advantages of both accuracy and robustness of parameter estimation. Even under the conditions of lower SNR, small angular separation and limited snapshots, the proposed algorithm still possesses better performance than subspace methods and the canonical polyadic decomposition (CPD) method. PMID:28448431

A Type-2 Block-Component-Decomposition Based 2D AOA Estimation Algorithm for an Electromagnetic Vector Sensor Array.

PubMed

Gao, Yu-Fei; Gui, Guan; Xie, Wei; Zou, Yan-Bin; Yang, Yue; Wan, Qun

2017-04-27

This paper investigates a two-dimensional angle of arrival (2D AOA) estimation algorithm for the electromagnetic vector sensor (EMVS) array based on Type-2 block component decomposition (BCD) tensor modeling. Such a tensor decomposition method can take full advantage of the multidimensional structural information of electromagnetic signals to accomplish blind estimation for array parameters with higher resolution. However, existing tensor decomposition methods encounter many restrictions in applications of the EMVS array, such as the strict requirement for uniqueness conditions of decomposition, the inability to handle partially-polarized signals, etc. To solve these problems, this paper investigates tensor modeling for partially-polarized signals of an L-shaped EMVS array. The 2D AOA estimation algorithm based on rank- ( L 1 , L 2 , · ) BCD is developed, and the uniqueness condition of decomposition is analyzed. By means of the estimated steering matrix, the proposed algorithm can automatically achieve angle pair-matching. Numerical experiments demonstrate that the present algorithm has the advantages of both accuracy and robustness of parameter estimation. Even under the conditions of lower SNR, small angular separation and limited snapshots, the proposed algorithm still possesses better performance than subspace methods and the canonical polyadic decomposition (CPD) method.
Response to "Comparison and Evaluation of Clustering Algorithms for Tandem Mass Spectra".

PubMed

Griss, Johannes; Perez-Riverol, Yasset; The, Matthew; Käll, Lukas; Vizcaíno, Juan Antonio

2018-05-04

In the recent benchmarking article entitled "Comparison and Evaluation of Clustering Algorithms for Tandem Mass Spectra", Rieder et al. compared several different approaches to cluster MS/MS spectra. While we certainly recognize the value of the manuscript, here, we report some shortcomings detected in the original analyses. For most analyses, the authors clustered only single MS/MS runs. In one of the reported analyses, three MS/MS runs were processed together, which already led to computational performance issues in many of the tested approaches. This fact highlights the difficulties of using many of the tested algorithms on the nowadays produced average proteomics data sets. Second, the authors only processed identified spectra when merging MS runs. Thereby, all unidentified spectra that are of lower quality were already removed from the data set and could not influence the clustering results. Next, we found that the authors did not analyze the effect of chimeric spectra on the clustering results. In our analysis, we found that 3% of the spectra in the used data sets were chimeric, and this had marked effects on the behavior of the different clustering algorithms tested. Finally, the authors' choice to evaluate the MS-Cluster and spectra-cluster algorithms using a precursor tolerance of 5 Da for high-resolution Orbitrap data only was, in our opinion, not adequate to assess the performance of MS/MS clustering approaches.
A Novel Artificial Bee Colony Based Clustering Algorithm for Categorical Data

PubMed Central

2015-01-01

Data with categorical attributes are ubiquitous in the real world. However, existing partitional clustering algorithms for categorical data are prone to fall into local optima. To address this issue, in this paper we propose a novel clustering algorithm, ABC-K-Modes (Artificial Bee Colony clustering based on K-Modes), based on the traditional k-modes clustering algorithm and the artificial bee colony approach. In our approach, we first introduce a one-step k-modes procedure, and then integrate this procedure with the artificial bee colony approach to deal with categorical data. In the search process performed by scout bees, we adopt the multi-source search inspired by the idea of batch processing to accelerate the convergence of ABC-K-Modes. The performance of ABC-K-Modes is evaluated by a series of experiments in comparison with that of the other popular algorithms for categorical data. PMID:25993469
A dynamic scheduling algorithm for singe-arm two-cluster tools with flexible processing times

NASA Astrophysics Data System (ADS)

Li, Xin; Fung, Richard Y. K.

2018-02-01

This article presents a dynamic algorithm for job scheduling in two-cluster tools producing multi-type wafers with flexible processing times. Flexible processing times mean that the actual times for processing wafers should be within given time intervals. The objective of the work is to minimize the completion time of the newly inserted wafer. To deal with this issue, a two-cluster tool is decomposed into three reduced single-cluster tools (RCTs) in a series based on a decomposition approach proposed in this article. For each single-cluster tool, a dynamic scheduling algorithm based on temporal constraints is developed to schedule the newly inserted wafer. Three experiments have been carried out to test the dynamic scheduling algorithm proposed, comparing with the results the 'earliest starting time' heuristic (EST) adopted in previous literature. The results show that the dynamic algorithm proposed in this article is effective and practical.
A novel artificial bee colony based clustering algorithm for categorical data.

PubMed

Ji, Jinchao; Pang, Wei; Zheng, Yanlin; Wang, Zhe; Ma, Zhiqiang

2015-01-01

Data with categorical attributes are ubiquitous in the real world. However, existing partitional clustering algorithms for categorical data are prone to fall into local optima. To address this issue, in this paper we propose a novel clustering algorithm, ABC-K-Modes (Artificial Bee Colony clustering based on K-Modes), based on the traditional k-modes clustering algorithm and the artificial bee colony approach. In our approach, we first introduce a one-step k-modes procedure, and then integrate this procedure with the artificial bee colony approach to deal with categorical data. In the search process performed by scout bees, we adopt the multi-source search inspired by the idea of batch processing to accelerate the convergence of ABC-K-Modes. The performance of ABC-K-Modes is evaluated by a series of experiments in comparison with that of the other popular algorithms for categorical data.
A Novel Automatic Detection System for ECG Arrhythmias Using Maximum Margin Clustering with Immune Evolutionary Algorithm

PubMed Central

Zhu, Bohui; Ding, Yongsheng; Hao, Kuangrong

2013-01-01

This paper presents a novel maximum margin clustering method with immune evolution (IEMMC) for automatic diagnosis of electrocardiogram (ECG) arrhythmias. This diagnostic system consists of signal processing, feature extraction, and the IEMMC algorithm for clustering of ECG arrhythmias. First, raw ECG signal is processed by an adaptive ECG filter based on wavelet transforms, and waveform of the ECG signal is detected; then, features are extracted from ECG signal to cluster different types of arrhythmias by the IEMMC algorithm. Three types of performance evaluation indicators are used to assess the effect of the IEMMC method for ECG arrhythmias, such as sensitivity, specificity, and accuracy. Compared with K-means and iterSVR algorithms, the IEMMC algorithm reflects better performance not only in clustering result but also in terms of global search ability and convergence ability, which proves its effectiveness for the detection of ECG arrhythmias. PMID:23690875
The Ordered Clustered Travelling Salesman Problem: A Hybrid Genetic Algorithm

PubMed Central

Ahmed, Zakir Hussain

2014-01-01

The ordered clustered travelling salesman problem is a variation of the usual travelling salesman problem in which a set of vertices (except the starting vertex) of the network is divided into some prespecified clusters. The objective is to find the least cost Hamiltonian tour in which vertices of any cluster are visited contiguously and the clusters are visited in the prespecified order. The problem is NP-hard, and it arises in practical transportation and sequencing problems. This paper develops a hybrid genetic algorithm using sequential constructive crossover, 2-opt search, and a local search for obtaining heuristic solution to the problem. The efficiency of the algorithm has been examined against two existing algorithms for some asymmetric and symmetric TSPLIB instances of various sizes. The computational results show that the proposed algorithm is very effective in terms of solution quality and computational time. Finally, we present solution to some more symmetric TSPLIB instances. PMID:24701148
A genetic graph-based approach for partitional clustering.

PubMed

Menéndez, Héctor D; Barrero, David F; Camacho, David

2014-05-01

Clustering is one of the most versatile tools for data analysis. In the recent years, clustering that seeks the continuity of data (in opposition to classical centroid-based approaches) has attracted an increasing research interest. It is a challenging problem with a remarkable practical interest. The most popular continuity clustering method is the spectral clustering (SC) algorithm, which is based on graph cut: It initially generates a similarity graph using a distance measure and then studies its graph spectrum to find the best cut. This approach is sensitive to the parameters of the metric, and a correct parameter choice is critical to the quality of the cluster. This work proposes a new algorithm, inspired by SC, that reduces the parameter dependency while maintaining the quality of the solution. The new algorithm, named genetic graph-based clustering (GGC), takes an evolutionary approach introducing a genetic algorithm (GA) to cluster the similarity graph. The experimental validation shows that GGC increases robustness of SC and has competitive performance in comparison with classical clustering methods, at least, in the synthetic and real dataset used in the experiments.
Exploratory Item Classification Via Spectral Graph Clustering

PubMed Central

Chen, Yunxiao; Li, Xiaoou; Liu, Jingchen; Xu, Gongjun; Ying, Zhiliang

2017-01-01

Large-scale assessments are supported by a large item pool. An important task in test development is to assign items into scales that measure different characteristics of individuals, and a popular approach is cluster analysis of items. Classical methods in cluster analysis, such as the hierarchical clustering, K-means method, and latent-class analysis, often induce a high computational overhead and have difficulty handling missing data, especially in the presence of high-dimensional responses. In this article, the authors propose a spectral clustering algorithm for exploratory item cluster analysis. The method is computationally efficient, effective for data with missing or incomplete responses, easy to implement, and often outperforms traditional clustering algorithms in the context of high dimensionality. The spectral clustering algorithm is based on graph theory, a branch of mathematics that studies the properties of graphs. The algorithm first constructs a graph of items, characterizing the similarity structure among items. It then extracts item clusters based on the graphical structure, grouping similar items together. The proposed method is evaluated through simulations and an application to the revised Eysenck Personality Questionnaire. PMID:29033476
Symmetric nonnegative matrix factorization: algorithms and applications to probabilistic clustering.

PubMed

He, Zhaoshui; Xie, Shengli; Zdunek, Rafal; Zhou, Guoxu; Cichocki, Andrzej

2011-12-01

Nonnegative matrix factorization (NMF) is an unsupervised learning method useful in various applications including image processing and semantic analysis of documents. This paper focuses on symmetric NMF (SNMF), which is a special case of NMF decomposition. Three parallel multiplicative update algorithms using level 3 basic linear algebra subprograms directly are developed for this problem. First, by minimizing the Euclidean distance, a multiplicative update algorithm is proposed, and its convergence under mild conditions is proved. Based on it, we further propose another two fast parallel methods: α-SNMF and β -SNMF algorithms. All of them are easy to implement. These algorithms are applied to probabilistic clustering. We demonstrate their effectiveness for facial image clustering, document categorization, and pattern clustering in gene expression.
Machine-learned cluster identification in high-dimensional data.

PubMed

Ultsch, Alfred; Lötsch, Jörn

2017-02-01

High-dimensional biomedical data are frequently clustered to identify subgroup structures pointing at distinct disease subtypes. It is crucial that the used cluster algorithm works correctly. However, by imposing a predefined shape on the clusters, classical algorithms occasionally suggest a cluster structure in homogenously distributed data or assign data points to incorrect clusters. We analyzed whether this can be avoided by using emergent self-organizing feature maps (ESOM). Data sets with different degrees of complexity were submitted to ESOM analysis with large numbers of neurons, using an interactive R-based bioinformatics tool. On top of the trained ESOM the distance structure in the high dimensional feature space was visualized in the form of a so-called U-matrix. Clustering results were compared with those provided by classical common cluster algorithms including single linkage, Ward and k-means. Ward clustering imposed cluster structures on cluster-less "golf ball", "cuboid" and "S-shaped" data sets that contained no structure at all (random data). Ward clustering also imposed structures on permuted real world data sets. By contrast, the ESOM/U-matrix approach correctly found that these data contain no cluster structure. However, ESOM/U-matrix was correct in identifying clusters in biomedical data truly containing subgroups. It was always correct in cluster structure identification in further canonical artificial data. Using intentionally simple data sets, it is shown that popular clustering algorithms typically used for biomedical data sets may fail to cluster data correctly, suggesting that they are also likely to perform erroneously on high dimensional biomedical data. The present analyses emphasized that generally established classical hierarchical clustering algorithms carry a considerable tendency to produce erroneous results. By contrast, unsupervised machine-learned analysis of cluster structures, applied using the ESOM/U-matrix method, is a viable, unbiased method to identify true clusters in the high-dimensional space of complex data. Copyright Â© 2017 The Authors. Published by Elsevier Inc. All rights reserved.
LLNL Location and Detection Research

DOE Office of Scientific and Technical Information (OSTI.GOV)

Myers, S C; Harris, D B; Anderson, M L

2003-07-16

We present two LLNL research projects in the topical areas of location and detection. The first project assesses epicenter accuracy using a multiple-event location algorithm, and the second project employs waveform subspace Correlation to detect and identify events at Fennoscandian mines. Accurately located seismic events are the bases of location calibration. A well-characterized set of calibration events enables new Earth model development, empirical calibration, and validation of models. In a recent study, Bondar et al. (2003) develop network coverage criteria for assessing the accuracy of event locations that are determined using single-event, linearized inversion methods. These criteria are conservative andmore » are meant for application to large bulletins where emphasis is on catalog completeness and any given event location may be improved through detailed analysis or application of advanced algorithms. Relative event location techniques are touted as advancements that may improve absolute location accuracy by (1) ensuring an internally consistent dataset, (2) constraining a subset of events to known locations, and (3) taking advantage of station and event correlation structure. Here we present the preliminary phase of this work in which we use Nevada Test Site (NTS) nuclear explosions, with known locations, to test the effect of travel-time model accuracy on relative location accuracy. Like previous studies, we find that the reference velocity-model and relative-location accuracy are highly correlated. We also find that metrics based on travel-time residual of relocated events are not a reliable for assessing either velocity-model or relative-location accuracy. In the topical area of detection, we develop specialized correlation (subspace) detectors for the principal mines surrounding the ARCES station located in the European Arctic. Our objective is to provide efficient screens for explosions occurring in the mines of the Kola Peninsula (Kovdor, Zapolyarny, Olenogorsk, Khibiny) and the major iron mines of northern Sweden (Malmberget, Kiruna). In excess of 90% of the events detected by the ARCES station are mining explosions, and a significant fraction are from these northern mining groups. The primary challenge in developing waveform correlation detectors is the degree of variation in the source time histories of the shots, which can result in poor correlation among events even in close proximity. Our approach to solving this problem is to use lagged subspace correlation detectors, which offer some prospect of compensating for variation and uncertainty in source time functions.« less
Automated segmentation of white matter fiber bundles using diffusion tensor imaging data and a new density based clustering algorithm.

PubMed

Kamali, Tahereh; Stashuk, Daniel

2016-10-01

Robust and accurate segmentation of brain white matter (WM) fiber bundles assists in diagnosing and assessing progression or remission of neuropsychiatric diseases such as schizophrenia, autism and depression. Supervised segmentation methods are infeasible in most applications since generating gold standards is too costly. Hence, there is a growing interest in designing unsupervised methods. However, most conventional unsupervised methods require the number of clusters be known in advance which is not possible in most applications. The purpose of this study is to design an unsupervised segmentation algorithm for brain white matter fiber bundles which can automatically segment fiber bundles using intrinsic diffusion tensor imaging data information without considering any prior information or assumption about data distributions. Here, a new density based clustering algorithm called neighborhood distance entropy consistency (NDEC), is proposed which discovers natural clusters within data by simultaneously utilizing both local and global density information. The performance of NDEC is compared with other state of the art clustering algorithms including chameleon, spectral clustering, DBSCAN and k-means using Johns Hopkins University publicly available diffusion tensor imaging data. The performance of NDEC and other employed clustering algorithms were evaluated using dice ratio as an external evaluation criteria and density based clustering validation (DBCV) index as an internal evaluation metric. Across all employed clustering algorithms, NDEC obtained the highest average dice ratio (0.94) and DBCV value (0.71). NDEC can find clusters with arbitrary shapes and densities and consequently can be used for WM fiber bundle segmentation where there is no distinct boundary between various bundles. NDEC may also be used as an effective tool in other pattern recognition and medical diagnostic systems in which discovering natural clusters within data is a necessity. Copyright © 2016 Elsevier B.V. All rights reserved.
An Adaptive Clustering Approach Based on Minimum Travel Route Planning for Wireless Sensor Networks with a Mobile Sink

PubMed Central

Tang, Jiqiang; Yang, Wu; Zhu, Lingyun; Wang, Dong; Feng, Xin

2017-01-01

In recent years, Wireless Sensor Networks with a Mobile Sink (WSN-MS) have been an active research topic due to the widespread use of mobile devices. However, how to get the balance between data delivery latency and energy consumption becomes a key issue of WSN-MS. In this paper, we study the clustering approach by jointly considering the Route planning for mobile sink and Clustering Problem (RCP) for static sensor nodes. We solve the RCP problem by using the minimum travel route clustering approach, which applies the minimum travel route of the mobile sink to guide the clustering process. We formulate the RCP problem as an Integer Non-Linear Programming (INLP) problem to shorten the travel route of the mobile sink under three constraints: the communication hops constraint, the travel route constraint and the loop avoidance constraint. We then propose an Imprecise Induction Algorithm (IIA) based on the property that the solution with a small hop count is more feasible than that with a large hop count. The IIA algorithm includes three processes: initializing travel route planning with a Traveling Salesman Problem (TSP) algorithm, transforming the cluster head to a cluster member and transforming the cluster member to a cluster head. Extensive experimental results show that the IIA algorithm could automatically adjust cluster heads according to the maximum hops parameter and plan a shorter travel route for the mobile sink. Compared with the Shortest Path Tree-based Data-Gathering Algorithm (SPT-DGA), the IIA algorithm has the characteristics of shorter route length, smaller cluster head count and faster convergence rate. PMID:28445434
An enhanced deterministic K-Means clustering algorithm for cancer subtype prediction from gene expression data.

PubMed

Nidheesh, N; Abdul Nazeer, K A; Ameer, P M

2017-12-01

Clustering algorithms with steps involving randomness usually give different results on different executions for the same dataset. This non-deterministic nature of algorithms such as the K-Means clustering algorithm limits their applicability in areas such as cancer subtype prediction using gene expression data. It is hard to sensibly compare the results of such algorithms with those of other algorithms. The non-deterministic nature of K-Means is due to its random selection of data points as initial centroids. We propose an improved, density based version of K-Means, which involves a novel and systematic method for selecting initial centroids. The key idea of the algorithm is to select data points which belong to dense regions and which are adequately separated in feature space as the initial centroids. We compared the proposed algorithm to a set of eleven widely used single clustering algorithms and a prominent ensemble clustering algorithm which is being used for cancer data classification, based on the performances on a set of datasets comprising ten cancer gene expression datasets. The proposed algorithm has shown better overall performance than the others. There is a pressing need in the Biomedical domain for simple, easy-to-use and more accurate Machine Learning tools for cancer subtype prediction. The proposed algorithm is simple, easy-to-use and gives stable results. Moreover, it provides comparatively better predictions of cancer subtypes from gene expression data. Copyright © 2017 Elsevier Ltd. All rights reserved.
Regularization by Functions of Bounded Variation and Applications to Image Enhancement

DOE Office of Scientific and Technical Information (OSTI.GOV)

Casas, E.; Kunisch, K.; Pola, C.

1999-09-15

Optimization problems regularized by bounded variation seminorms are analyzed. The optimality system is obtained and finite-dimensional approximations of bounded variation function spaces as well as of the optimization problems are studied. It is demonstrated that the choice of the vector norm in the definition of the bounded variation seminorm is of special importance for approximating subspaces consisting of piecewise constant functions. Algorithms based on a primal-dual framework that exploit the structure of these nondifferentiable optimization problems are proposed. Numerical examples are given for denoising of blocky images with very high noise.
Determining the Number of Clusters in a Data Set Without Graphical Interpretation

NASA Technical Reports Server (NTRS)

Aguirre, Nathan S.; Davies, Misty D.

2011-01-01

Cluster analysis is a data mining technique that is meant ot simplify the process of classifying data points. The basic clustering process requires an input of data points and the number of clusters wanted. The clustering algorithm will then pick starting C points for the clusters, which can be either random spatial points or random data points. It then assigns each data point to the nearest C point where "nearest usually means Euclidean distance, but some algorithms use another criterion. The next step is determining whether the clustering arrangement this found is within a certain tolerance. If it falls within this tolerance, the process ends. Otherwise the C points are adjusted based on how many data points are in each cluster, and the steps repeat until the algorithm converges,
Average correlation clustering algorithm (ACCA) for grouping of co-regulated genes with similar pattern of variation in their expression values.

PubMed

Bhattacharya, Anindya; De, Rajat K

2010-08-01

Distance based clustering algorithms can group genes that show similar expression values under multiple experimental conditions. They are unable to identify a group of genes that have similar pattern of variation in their expression values. Previously we developed an algorithm called divisive correlation clustering algorithm (DCCA) to tackle this situation, which is based on the concept of correlation clustering. But this algorithm may also fail for certain cases. In order to overcome these situations, we propose a new clustering algorithm, called average correlation clustering algorithm (ACCA), which is able to produce better clustering solution than that produced by some others. ACCA is able to find groups of genes having more common transcription factors and similar pattern of variation in their expression values. Moreover, ACCA is more efficient than DCCA with respect to the time of execution. Like DCCA, we use the concept of correlation clustering concept introduced by Bansal et al. ACCA uses the correlation matrix in such a way that all genes in a cluster have the highest average correlation values with the genes in that cluster. We have applied ACCA and some well-known conventional methods including DCCA to two artificial and nine gene expression datasets, and compared the performance of the algorithms. The clustering results of ACCA are found to be more significantly relevant to the biological annotations than those of the other methods. Analysis of the results show the superiority of ACCA over some others in determining a group of genes having more common transcription factors and with similar pattern of variation in their expression profiles. Availability of the software: The software has been developed using C and Visual Basic languages, and can be executed on the Microsoft Windows platforms. The software may be downloaded as a zip file from http://www.isical.ac.in/~rajat. Then it needs to be installed. Two word files (included in the zip file) need to be consulted before installation and execution of the software. Copyright 2010 Elsevier Inc. All rights reserved.
A Model Comparison for Characterizing Protein Motions from Structure

NASA Astrophysics Data System (ADS)

David, Charles; Jacobs, Donald

2011-10-01

A comparative study is made using three computational models that characterize native state dynamics starting from known protein structures taken from four distinct SCOP classifications. A geometrical simulation is performed, and the results are compared to the elastic network model and molecular dynamics. The essential dynamics is quantified by a direct analysis of a mode subspace constructed from ANM and a principal component analysis on both the FRODA and MD trajectories using root mean square inner product and principal angles. Relative subspace sizes and overlaps are visualized using the projection of displacement vectors on the model modes. Additionally, a mode subspace is constructed from PCA on an exemplar set of X-ray crystal structures in order to determine similarly with respect to the generated ensembles. Quantitative analysis reveals there is significant overlap across the three model subspaces and the model independent subspace. These results indicate that structure is the key determinant for native state dynamics.
Zeno subspace in quantum-walk dynamics

NASA Astrophysics Data System (ADS)

Chandrashekar, C. M.

2010-11-01

We investigate discrete-time quantum-walk evolution under the influence of periodic measurements in position subspace. The undisturbed survival probability of the particle at the position subspace P(0,t) is compared with the survival probability after frequent (n) measurements at interval τ=t/n, P(0,τ)n. We show that P(0,τ)n>P(0,t) leads to the quantum Zeno effect in position subspace when a parameter θ in the quantum coin operations and frequency of measurements is greater than the critical value, θ>θc and n>nc. This Zeno effect in the subspace preserves the dynamics in coin Hilbert space of the walk dynamics and has the potential to play a significant role in quantum tasks such as preserving the quantum state of the particle at any particular position, and to understand the Zeno dynamics in a multidimensional system that is highly transient in nature.

Reducing the time requirement of k-means algorithm.

PubMed

Osamor, Victor Chukwudi; Adebiyi, Ezekiel Femi; Oyelade, Jelilli Olarenwaju; Doumbia, Seydou

2012-01-01

Traditional k-means and most k-means variants are still computationally expensive for large datasets, such as microarray data, which have large datasets with large dimension size d. In k-means clustering, we are given a set of n data points in d-dimensional space R(d) and an integer k. The problem is to determine a set of k points in R(d), called centers, so as to minimize the mean squared distance from each data point to its nearest center. In this work, we develop a novel k-means algorithm, which is simple but more efficient than the traditional k-means and the recent enhanced k-means. Our new algorithm is based on the recently established relationship between principal component analysis and the k-means clustering. We provided the correctness proof for this algorithm. Results obtained from testing the algorithm on three biological data and six non-biological data (three of these data are real, while the other three are simulated) also indicate that our algorithm is empirically faster than other known k-means algorithms. We assessed the quality of our algorithm clusters against the clusters of a known structure using the Hubert-Arabie Adjusted Rand index (ARI(HA)). We found that when k is close to d, the quality is good (ARI(HA)>0.8) and when k is not close to d, the quality of our new k-means algorithm is excellent (ARI(HA)>0.9). In this paper, emphases are on the reduction of the time requirement of the k-means algorithm and its application to microarray data due to the desire to create a tool for clustering and malaria research. However, the new clustering algorithm can be used for other clustering needs as long as an appropriate measure of distance between the centroids and the members is used. This has been demonstrated in this work on six non-biological data.
Reducing the Time Requirement of k-Means Algorithm

PubMed Central

Osamor, Victor Chukwudi; Adebiyi, Ezekiel Femi; Oyelade, Jelilli Olarenwaju; Doumbia, Seydou

2012-01-01

Traditional k-means and most k-means variants are still computationally expensive for large datasets, such as microarray data, which have large datasets with large dimension size d. In k-means clustering, we are given a set of n data points in d-dimensional space Rd and an integer k. The problem is to determine a set of k points in Rd, called centers, so as to minimize the mean squared distance from each data point to its nearest center. In this work, we develop a novel k-means algorithm, which is simple but more efficient than the traditional k-means and the recent enhanced k-means. Our new algorithm is based on the recently established relationship between principal component analysis and the k-means clustering. We provided the correctness proof for this algorithm. Results obtained from testing the algorithm on three biological data and six non-biological data (three of these data are real, while the other three are simulated) also indicate that our algorithm is empirically faster than other known k-means algorithms. We assessed the quality of our algorithm clusters against the clusters of a known structure using the Hubert-Arabie Adjusted Rand index (ARIHA). We found that when k is close to d, the quality is good (ARIHA>0.8) and when k is not close to d, the quality of our new k-means algorithm is excellent (ARIHA>0.9). In this paper, emphases are on the reduction of the time requirement of the k-means algorithm and its application to microarray data due to the desire to create a tool for clustering and malaria research. However, the new clustering algorithm can be used for other clustering needs as long as an appropriate measure of distance between the centroids and the members is used. This has been demonstrated in this work on six non-biological data. PMID:23239974
Cluster analysis based on dimensional information with applications to feature selection and classification

NASA Technical Reports Server (NTRS)

Eigen, D. J.; Fromm, F. R.; Northouse, R. A.

1974-01-01

A new clustering algorithm is presented that is based on dimensional information. The algorithm includes an inherent feature selection criterion, which is discussed. Further, a heuristic method for choosing the proper number of intervals for a frequency distribution histogram, a feature necessary for the algorithm, is presented. The algorithm, although usable as a stand-alone clustering technique, is then utilized as a global approximator. Local clustering techniques and configuration of a global-local scheme are discussed, and finally the complete global-local and feature selector configuration is shown in application to a real-time adaptive classification scheme for the analysis of remote sensed multispectral scanner data.
Block clustering based on difference of convex functions (DC) programming and DC algorithms.

PubMed

Le, Hoai Minh; Le Thi, Hoai An; Dinh, Tao Pham; Huynh, Van Ngai

2013-10-01

We investigate difference of convex functions (DC) programming and the DC algorithm (DCA) to solve the block clustering problem in the continuous framework, which traditionally requires solving a hard combinatorial optimization problem. DC reformulation techniques and exact penalty in DC programming are developed to build an appropriate equivalent DC program of the block clustering problem. They lead to an elegant and explicit DCA scheme for the resulting DC program. Computational experiments show the robustness and efficiency of the proposed algorithm and its superiority over standard algorithms such as two-mode K-means, two-mode fuzzy clustering, and block classification EM.
Online clustering algorithms for radar emitter classification.

PubMed

Liu, Jun; Lee, Jim P Y; Senior; Li, Lingjie; Luo, Zhi-Quan; Wong, K Max

2005-08-01

Radar emitter classification is a special application of data clustering for classifying unknown radar emitters from received radar pulse samples. The main challenges of this task are the high dimensionality of radar pulse samples, small sample group size, and closely located radar pulse clusters. In this paper, two new online clustering algorithms are developed for radar emitter classification: One is model-based using the Minimum Description Length (MDL) criterion and the other is based on competitive learning. Computational complexity is analyzed for each algorithm and then compared. Simulation results show the superior performance of the model-based algorithm over competitive learning in terms of better classification accuracy, flexibility, and stability.
CAMPAIGN: an open-source library of GPU-accelerated data clustering algorithms.

PubMed

Kohlhoff, Kai J; Sosnick, Marc H; Hsu, William T; Pande, Vijay S; Altman, Russ B

2011-08-15

Data clustering techniques are an essential component of a good data analysis toolbox. Many current bioinformatics applications are inherently compute-intense and work with very large datasets. Sequential algorithms are inadequate for providing the necessary performance. For this reason, we have created Clustering Algorithms for Massively Parallel Architectures, Including GPU Nodes (CAMPAIGN), a central resource for data clustering algorithms and tools that are implemented specifically for execution on massively parallel processing architectures. CAMPAIGN is a library of data clustering algorithms and tools, written in 'C for CUDA' for Nvidia GPUs. The library provides up to two orders of magnitude speed-up over respective CPU-based clustering algorithms and is intended as an open-source resource. New modules from the community will be accepted into the library and the layout of it is such that it can easily be extended to promising future platforms such as OpenCL. Releases of the CAMPAIGN library are freely available for download under the LGPL from https://simtk.org/home/campaign. Source code can also be obtained through anonymous subversion access as described on https://simtk.org/scm/?group_id=453. kjk33@cantab.net.
Research on the precise positioning of customers in large data environment

NASA Astrophysics Data System (ADS)

Zhou, Xu; He, Lili

2018-04-01

Customer positioning has always been a problem that enterprises focus on. In this paper, FCM clustering algorithm is used to cluster customer groups. However, due to the traditional FCM clustering algorithm, which is susceptible to the influence of the initial clustering center and easy to fall into the local optimal problem, the short board of FCM is solved by the gray optimization algorithm (GWO) to achieve efficient and accurate handling of a large number of retailer data.
An Enhanced K-Means Algorithm for Water Quality Analysis of The Haihe River in China.

PubMed

Zou, Hui; Zou, Zhihong; Wang, Xiaojing

2015-11-12

The increase and the complexity of data caused by the uncertain environment is today's reality. In order to identify water quality effectively and reliably, this paper presents a modified fast clustering algorithm for water quality analysis. The algorithm has adopted a varying weights K-means cluster algorithm to analyze water monitoring data. The varying weights scheme was the best weighting indicator selected by a modified indicator weight self-adjustment algorithm based on K-means, which is named MIWAS-K-means. The new clustering algorithm avoids the margin of the iteration not being calculated in some cases. With the fast clustering analysis, we can identify the quality of water samples. The algorithm is applied in water quality analysis of the Haihe River (China) data obtained by the monitoring network over a period of eight years (2006-2013) with four indicators at seven different sites (2078 samples). Both the theoretical and simulated results demonstrate that the algorithm is efficient and reliable for water quality analysis of the Haihe River. In addition, the algorithm can be applied to more complex data matrices with high dimensionality.
Cooperative network clustering and task allocation for heterogeneous small satellite network

NASA Astrophysics Data System (ADS)

Qin, Jing

The research of small satellite has emerged as a hot topic in recent years because of its economical prospects and convenience in launching and design. Due to the size and energy constraints of small satellites, forming a small satellite network(SSN) in which all the satellites cooperate with each other to finish tasks is an efficient and effective way to utilize them. In this dissertation, I designed and evaluated a weight based dominating set clustering algorithm, which efficiently organizes the satellites into stable clusters. The traditional clustering algorithms of large monolithic satellite networks, such as formation flying and satellite swarm, are often limited on automatic formation of clusters. Therefore, a novel Distributed Weight based Dominating Set(DWDS) clustering algorithm is designed to address the clustering problems in the stochastically deployed SSNs. Considering the unique features of small satellites, this algorithm is able to form the clusters efficiently and stably. In this algorithm, satellites are separated into different groups according to their spatial characteristics. A minimum dominating set is chosen as the candidate cluster head set based on their weights, which is a weighted combination of residual energy and connection degree. Then the cluster heads admit new neighbors that accept their invitations into the cluster, until the maximum cluster size is reached. Evaluated by the simulation results, in a SSN with 200 to 800 nodes, the algorithm is able to efficiently cluster more than 90% of nodes in 3 seconds. The Deadline Based Resource Balancing (DBRB) task allocation algorithm is designed for efficient task allocations in heterogeneous LEO small satellite networks. In the task allocation process, the dispatcher needs to consider the deadlines of the tasks as well as the residue energy of different resources for best energy utilization. We assume the tasks adopt a Map-Reduce framework, in which a task can consist of multiple subtasks. The DBRB algorithm is deployed on the head node of a cluster. It gathers the status from each cluster member and calculates their Node Importance Factors (NIFs) from the carried resources, residue power and compute capacity. The algorithm calculates the number of concurrent subtasks based on the deadlines, and allocates the subtasks to the nodes according to their NIF values. The simulation results show that when cluster members carry multiple resources, resource are more balanced and rare resources serve longer in DBRB than in the Earliest Deadline First algorithm. We also show that the algorithm performs well in service isolation by serving multiple tasks with different deadlines. Moreover, the average task response time with various cluster size settings is well controlled within deadlines as well. Except non-realtime tasks, small satellites may execute realtime tasks as well. The location-dependent tasks, such as image capturing, data transmission and remote sensing tasks are realtime tasks that are required to be started / finished on specific time. The resource energy balancing algorithm for realtime and non-realtime mixed workload is developed to efficiently schedule the tasks for best system performance. It calculates the residue energy for each resource type and tries to preserve resources and node availability when distributing tasks. Non-realtime tasks can be preempted by realtime tasks to provide better QoS to realtime tasks. I compared the performance of proposed algorithm with a random-priority scheduling algorithm, with only realtime tasks, non-realtime tasks and mixed tasks. It shows the resource energy reservation algorithm outperforms the latter one with both balanced and imbalanced workloads. Although the resource energy balancing task allocation algorithm for mixed workload provides preemption mechanism for realtime tasks, realtime tasks can still fail due to resource exhaustion. For LEO small satellite flies around the earth on stable orbits, the location-dependent realtime tasks can be considered as periodical tasks. Therefore, it is possible to reserve energy for these realtime tasks. The resource energy reservation algorithm preserves energy for the realtime tasks when the execution routine of periodical realtime tasks is known. In order to reserve energy for tasks starting very early in each period that the node does not have enough energy charged, an energy wrapping mechanism is also designed to calculate the residue energy from the previous period. The simulation results show that without energy reservation, realtime task failure rate can reach more than 60% when the workload is highly imbalanced. In contrast, the resource energy reservation produces zero RT task failures and leads to equal or better aggregate system throughput than the non-reservation algorithm. The proposed algorithm also preserves more energy because it avoids task preemption. (Abstract shortened by ProQuest.).
Analysis of Network Clustering Algorithms and Cluster Quality Metrics at Scale

PubMed Central

Kobourov, Stephen; Gallant, Mike; Börner, Katy

2016-01-01

Overview Notions of community quality underlie the clustering of networks. While studies surrounding network clustering are increasingly common, a precise understanding of the realtionship between different cluster quality metrics is unknown. In this paper, we examine the relationship between stand-alone cluster quality metrics and information recovery metrics through a rigorous analysis of four widely-used network clustering algorithms—Louvain, Infomap, label propagation, and smart local moving. We consider the stand-alone quality metrics of modularity, conductance, and coverage, and we consider the information recovery metrics of adjusted Rand score, normalized mutual information, and a variant of normalized mutual information used in previous work. Our study includes both synthetic graphs and empirical data sets of sizes varying from 1,000 to 1,000,000 nodes. Cluster Quality Metrics We find significant differences among the results of the different cluster quality metrics. For example, clustering algorithms can return a value of 0.4 out of 1 on modularity but score 0 out of 1 on information recovery. We find conductance, though imperfect, to be the stand-alone quality metric that best indicates performance on the information recovery metrics. Additionally, our study shows that the variant of normalized mutual information used in previous work cannot be assumed to differ only slightly from traditional normalized mutual information. Network Clustering Algorithms Smart local moving is the overall best performing algorithm in our study, but discrepancies between cluster evaluation metrics prevent us from declaring it an absolutely superior algorithm. Interestingly, Louvain performed better than Infomap in nearly all the tests in our study, contradicting the results of previous work in which Infomap was superior to Louvain. We find that although label propagation performs poorly when clusters are less clearly defined, it scales efficiently and accurately to large graphs with well-defined clusters. PMID:27391786
Discovering shared segments on the migration route of the bar-headed goose by time-based plane-sweeping trajectory clustering

USGS Publications Warehouse

Luo, Ze; Baoping, Yan; Takekawa, John Y.; Prosser, Diann J.

2012-01-01

We propose a new method to help ornithologists and ecologists discover shared segments on the migratory pathway of the bar-headed geese by time-based plane-sweeping trajectory clustering. We present a density-based time parameterized line segment clustering algorithm, which extends traditional comparable clustering algorithms from temporal and spatial dimensions. We present a time-based plane-sweeping trajectory clustering algorithm to reveal the dynamic evolution of spatial-temporal object clusters and discover common motion patterns of bar-headed geese in the process of migration. Experiments are performed on GPS-based satellite telemetry data from bar-headed geese and results demonstrate our algorithms can correctly discover shared segments of the bar-headed geese migratory pathway. We also present findings on the migratory behavior of bar-headed geese determined from this new analytical approach.
Computational gene expression profiling under salt stress reveals patterns of co-expression

PubMed Central

Sanchita; Sharma, Ashok

2016-01-01

Plants respond differently to environmental conditions. Among various abiotic stresses, salt stress is a condition where excess salt in soil causes inhibition of plant growth. To understand the response of plants to the stress conditions, identification of the responsible genes is required. Clustering is a data mining technique used to group the genes with similar expression. The genes of a cluster show similar expression and function. We applied clustering algorithms on gene expression data of Solanum tuberosum showing differential expression in Capsicum annuum under salt stress. The clusters, which were common in multiple algorithms were taken further for analysis. Principal component analysis (PCA) further validated the findings of other cluster algorithms by visualizing their clusters in three-dimensional space. Functional annotation results revealed that most of the genes were involved in stress related responses. Our findings suggest that these algorithms may be helpful in the prediction of the function of co-expressed genes. PMID:26981411
Active Subspaces of Airfoil Shape Parameterizations

NASA Astrophysics Data System (ADS)

Grey, Zachary J.; Constantine, Paul G.

2018-05-01

Design and optimization benefit from understanding the dependence of a quantity of interest (e.g., a design objective or constraint function) on the design variables. A low-dimensional active subspace, when present, identifies important directions in the space of design variables; perturbing a design along the active subspace associated with a particular quantity of interest changes that quantity more, on average, than perturbing the design orthogonally to the active subspace. This low-dimensional structure provides insights that characterize the dependence of quantities of interest on design variables. Airfoil design in a transonic flow field with a parameterized geometry is a popular test problem for design methodologies. We examine two particular airfoil shape parameterizations, PARSEC and CST, and study the active subspaces present in two common design quantities of interest, transonic lift and drag coefficients, under each shape parameterization. We mathematically relate the two parameterizations with a common polynomial series. The active subspaces enable low-dimensional approximations of lift and drag that relate to physical airfoil properties. In particular, we obtain and interpret a two-dimensional approximation of both transonic lift and drag, and we show how these approximation inform a multi-objective design problem.
Adiabatic evolution of decoherence-free subspaces and its shortcuts

NASA Astrophysics Data System (ADS)

Wu, S. L.; Huang, X. L.; Li, H.; Yi, X. X.

2017-10-01

The adiabatic theorem and shortcuts to adiabaticity for time-dependent open quantum systems are explored in this paper. Starting from the definition of dynamical stable decoherence-free subspace, we show that, under a compact adiabatic condition, the quantum state remains in the time-dependent decoherence-free subspace with an extremely high purity, even though the dynamics of the open quantum system may not be adiabatic. The adiabatic condition mentioned here in the adiabatic theorem for open systems is very similar to that for closed quantum systems, except that the operators required to change slowly are the Lindblad operators. We also show that the adiabatic evolution of decoherence-free subspaces depends on the existence of instantaneous decoherence-free subspaces, which requires that the Hamiltonian of open quantum systems be engineered according to the incoherent control protocol. In addition, shortcuts to adiabaticity for adiabatic decoherence-free subspaces are also presented based on the transitionless quantum driving method. Finally, we provide an example that consists of a two-level system coupled to a broadband squeezed vacuum field to show our theory. Our approach employs Markovian master equations and the theory can apply to finite-dimensional quantum open systems.
A scalable and practical one-pass clustering algorithm for recommender system

NASA Astrophysics Data System (ADS)

Khalid, Asra; Ghazanfar, Mustansar Ali; Azam, Awais; Alahmari, Saad Ali

2015-12-01

KMeans clustering-based recommendation algorithms have been proposed claiming to increase the scalability of recommender systems. One potential drawback of these algorithms is that they perform training offline and hence cannot accommodate the incremental updates with the arrival of new data, making them unsuitable for the dynamic environments. From this line of research, a new clustering algorithm called One-Pass is proposed, which is a simple, fast, and accurate. We show empirically that the proposed algorithm outperforms K-Means in terms of recommendation and training time while maintaining a good level of accuracy.
A Differential Evolution-Based Routing Algorithm for Environmental Monitoring Wireless Sensor Networks

PubMed Central

Li, Xiaofang; Xu, Lizhong; Wang, Huibin; Song, Jie; Yang, Simon X.

2010-01-01

The traditional Low Energy Adaptive Cluster Hierarchy (LEACH) routing protocol is a clustering-based protocol. The uneven selection of cluster heads results in premature death of cluster heads and premature blind nodes inside the clusters, thus reducing the overall lifetime of the network. With a full consideration of information on energy and distance distribution of neighboring nodes inside the clusters, this paper proposes a new routing algorithm based on differential evolution (DE) to improve the LEACH routing protocol. To meet the requirements of monitoring applications in outdoor environments such as the meteorological, hydrological and wetland ecological environments, the proposed algorithm uses the simple and fast search features of DE to optimize the multi-objective selection of cluster heads and prevent blind nodes for improved energy efficiency and system stability. Simulation results show that the proposed new LEACH routing algorithm has better performance, effectively extends the working lifetime of the system, and improves the quality of the wireless sensor networks. PMID:22219670
A Novel Energy-Aware Distributed Clustering Algorithm for Heterogeneous Wireless Sensor Networks in the Mobile Environment

PubMed Central

Gao, Ying; Wkram, Chris Hadri; Duan, Jiajie; Chou, Jarong

2015-01-01

In order to prolong the network lifetime, energy-efficient protocols adapted to the features of wireless sensor networks should be used. This paper explores in depth the nature of heterogeneous wireless sensor networks, and finally proposes an algorithm to address the problem of finding an effective pathway for heterogeneous clustering energy. The proposed algorithm implements cluster head selection according to the degree of energy attenuation during the network’s running and the degree of candidate nodes’ effective coverage on the whole network, so as to obtain an even energy consumption over the whole network for the situation with high degree of coverage. Simulation results show that the proposed clustering protocol has better adaptability to heterogeneous environments than existing clustering algorithms in prolonging the network lifetime. PMID:26690440
A new collaborative recommendation approach based on users clustering using artificial bee colony algorithm.

PubMed

Ju, Chunhua; Xu, Chonghuan

2013-01-01

Although there are many good collaborative recommendation methods, it is still a challenge to increase the accuracy and diversity of these methods to fulfill users' preferences. In this paper, we propose a novel collaborative filtering recommendation approach based on K-means clustering algorithm. In the process of clustering, we use artificial bee colony (ABC) algorithm to overcome the local optimal problem caused by K-means. After that we adopt the modified cosine similarity to compute the similarity between users in the same clusters. Finally, we generate recommendation results for the corresponding target users. Detailed numerical analysis on a benchmark dataset MovieLens and a real-world dataset indicates that our new collaborative filtering approach based on users clustering algorithm outperforms many other recommendation methods.
A New Collaborative Recommendation Approach Based on Users Clustering Using Artificial Bee Colony Algorithm

PubMed Central

Ju, Chunhua

2013-01-01

Although there are many good collaborative recommendation methods, it is still a challenge to increase the accuracy and diversity of these methods to fulfill users' preferences. In this paper, we propose a novel collaborative filtering recommendation approach based on K-means clustering algorithm. In the process of clustering, we use artificial bee colony (ABC) algorithm to overcome the local optimal problem caused by K-means. After that we adopt the modified cosine similarity to compute the similarity between users in the same clusters. Finally, we generate recommendation results for the corresponding target users. Detailed numerical analysis on a benchmark dataset MovieLens and a real-world dataset indicates that our new collaborative filtering approach based on users clustering algorithm outperforms many other recommendation methods. PMID:24381525
Convalescing Cluster Configuration Using a Superlative Framework

PubMed Central

Sabitha, R.; Karthik, S.

2015-01-01

Competent data mining methods are vital to discover knowledge from databases which are built as a result of enormous growth of data. Various techniques of data mining are applied to obtain knowledge from these databases. Data clustering is one such descriptive data mining technique which guides in partitioning data objects into disjoint segments. K-means algorithm is a versatile algorithm among the various approaches used in data clustering. The algorithm and its diverse adaptation methods suffer certain problems in their performance. To overcome these issues a superlative algorithm has been proposed in this paper to perform data clustering. The specific feature of the proposed algorithm is discretizing the dataset, thereby improving the accuracy of clustering, and also adopting the binary search initialization method to generate cluster centroids. The generated centroids are fed as input to K-means approach which iteratively segments the data objects into respective clusters. The clustered results are measured for accuracy and validity. Experiments conducted by testing the approach on datasets from the UC Irvine Machine Learning Repository evidently show that the accuracy and validity measure is higher than the other two approaches, namely, simple K-means and Binary Search method. Thus, the proposed approach proves that discretization process will improve the efficacy of descriptive data mining tasks. PMID:26543895

Some links on this page may take you to non-federal websites. Their policies may differ from this site.