Solution of matrix equations using sparse techniques
NASA Technical Reports Server (NTRS)
Baddourah, Majdi
1994-01-01
The solution of large systems of matrix equations is key to the solution of a large number of scientific and engineering problems. This talk describes the sparse matrix solver developed at Langley which can routinely solve in excess of 263,000 equations in 40 seconds on one Cray C-90 processor. It appears that for large scale structural analysis applications, sparse matrix methods have a significant performance advantage over other methods.
Yang, C L; Wei, H Y; Adler, A; Soleimani, M
2013-06-01
Electrical impedance tomography (EIT) is a fast and cost-effective technique to provide a tomographic conductivity image of a subject from boundary current-voltage data. This paper proposes a time and memory efficient method for solving a large scale 3D EIT inverse problem using a parallel conjugate gradient (CG) algorithm. A 3D EIT system with a large number of measurement data can produce a large Jacobian matrix; this can cause difficulties in computer storage and in the inversion process. One of the challenges in 3D EIT is to decrease the reconstruction time and memory usage while retaining the image quality. Firstly, a sparse matrix reduction technique is proposed using thresholding to set very small values of the Jacobian matrix to zero. By converting the Jacobian matrix into a sparse format, the zero elements are eliminated, which reduces the memory requirement. Secondly, a block-wise CG method for parallel reconstruction has been developed. The proposed method has been tested using simulated data as well as experimental test samples. A sparse Jacobian with a block-wise CG enables the large scale EIT problem to be solved efficiently. Image quality measures are presented to quantify the effect of sparse matrix reduction on the reconstruction results.
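The thresholding step described in this abstract can be sketched as follows. This is a minimal illustration, assuming a cutoff relative to the largest entry; the abstract does not say how the threshold is chosen, and the names `sparsify`, `rel_tol` and `sparse_matvec` are hypothetical, not taken from the paper.

```python
def sparsify(jacobian, rel_tol=1e-3):
    """Drop entries whose magnitude is below rel_tol * max|J| (assumed rule)."""
    peak = max(abs(v) for row in jacobian for v in row)
    cutoff = rel_tol * peak
    # Keep only entries at or above the cutoff, keyed by (row, col).
    return {
        (i, j): v
        for i, row in enumerate(jacobian)
        for j, v in enumerate(row)
        if abs(v) >= cutoff
    }


def sparse_matvec(entries, x, n_rows):
    """Multiply the sparsified matrix by a dense vector x."""
    y = [0.0] * n_rows
    for (i, j), v in entries.items():
        y[i] += v * x[j]
    return y


J = [[2.0, 1e-6, 0.0],
     [1e-7, 1.5, 0.5]]
J_sparse = sparsify(J, rel_tol=1e-3)
```

Only the three entries above the cutoff are stored, so both memory use and the cost of each matrix-vector product inside CG scale with the number of retained entries rather than with the full matrix size.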
NASA Astrophysics Data System (ADS)
Galiatsatos, P. G.; Tennyson, J.
2012-11-01
The most time consuming step within the framework of the UK R-matrix molecular codes is the diagonalization of the inner region Hamiltonian matrix (IRHM). Here we present the method that we follow to speed up this step. We use shared memory machines (SMM), distributed memory machines (DMM), the OpenMP directive based parallel language, the MPI function based parallel language, the sparse matrix diagonalizers ARPACK and PARPACK, a variation for real symmetric matrices of the official coordinate sparse matrix format and finally a parallel sparse matrix-vector product (PSMV). The efficient application of these techniques relies on two important facts: the sparsity of the matrix is large enough (more than 98%) and, in order to obtain converged results, we need only a small part of the matrix spectrum.
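The abstract does not spell out its symmetric variation of the coordinate format. One plausible reading, shown here purely as an assumption, stores only entries with row index not exceeding column index and applies each off-diagonal entry twice in the matrix-vector product. The function name `sym_coo_matvec` is hypothetical.

```python
def sym_coo_matvec(n, rows, cols, vals, x):
    """y = A @ x where only entries with row <= col of symmetric A are stored."""
    y = [0.0] * n
    for i, j, a in zip(rows, cols, vals):
        y[i] += a * x[j]
        if i != j:
            # Mirror the stored entry to account for A[j][i] == A[i][j].
            y[j] += a * x[i]
    return y


# Upper triangle of [[4, 1, 0], [1, 3, 2], [0, 2, 5]]
rows = [0, 0, 1, 1, 2]
cols = [0, 1, 1, 2, 2]
vals = [4.0, 1.0, 3.0, 2.0, 5.0]
y = sym_coo_matvec(3, rows, cols, vals, [1.0, 2.0, 3.0])
```

Storing one triangle roughly halves the memory for the nonzeros, at the price of scattered writes to `y[j]` that a parallel version would have to coordinate.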
Massively parallel sparse matrix function calculations with NTPoly
NASA Astrophysics Data System (ADS)
Dawson, William; Nakajima, Takahito
2018-04-01
We present NTPoly, a massively parallel library for computing the functions of sparse, symmetric matrices. The theory of matrix functions is a well developed framework with a wide range of applications including differential equations, graph theory, and electronic structure calculations. One particularly important application area is diagonalization free methods in quantum chemistry. When the input and output of the matrix function are sparse, methods based on polynomial expansions can be used to compute matrix functions in linear time. We present a library based on these methods that can compute a variety of matrix functions. Distributed memory parallelization is based on a communication avoiding sparse matrix multiplication algorithm. OpenMP task parallelization is utilized to implement hybrid parallelization. We describe NTPoly's interface and show how it can be integrated with programs written in many different programming languages. We demonstrate the merits of NTPoly by performing large scale calculations on the K computer.
Solving large sparse eigenvalue problems on supercomputers
NASA Technical Reports Server (NTRS)
Philippe, Bernard; Saad, Youcef
1988-01-01
An important problem in scientific computing consists in finding a few eigenvalues and corresponding eigenvectors of a very large and sparse matrix. The most popular methods to solve these problems are based on projection techniques on appropriate subspaces. The main attraction of these methods is that they only require the use of the matrix in the form of matrix by vector multiplications. The implementations on supercomputers of two such methods for symmetric matrices, namely Lanczos' method and Davidson's method are compared. Since one of the most important operations in these two methods is the multiplication of vectors by the sparse matrix, methods of performing this operation efficiently are discussed. The advantages and the disadvantages of each method are compared and implementation aspects are discussed. Numerical experiments on a one processor CRAY 2 and CRAY X-MP are reported. Possible parallel implementations are also discussed.
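To illustrate why projection methods need the matrix only through matrix-by-vector multiplications, here is a minimal sketch of the plain Lanczos recurrence for a symmetric matrix. It is not the implementation compared in the report: it omits reorthogonalization and the final eigenvalue extraction from the tridiagonal matrix, and the names `lanczos` and `matvec` are illustrative.

```python
import math


def lanczos(matvec, v0, m):
    """Run m Lanczos steps; return diagonal (alpha) and off-diagonal (beta) of T."""
    norm = math.sqrt(sum(c * c for c in v0))
    v = [c / norm for c in v0]
    v_prev = [0.0] * len(v0)
    alpha, beta = [], []
    b = 0.0
    for _ in range(m):
        w = matvec(v)                      # the only access to the matrix
        a = sum(wi * vi for wi, vi in zip(w, v))
        w = [wi - a * vi - b * pi for wi, vi, pi in zip(w, v, v_prev)]
        alpha.append(a)
        b = math.sqrt(sum(c * c for c in w))
        if b < 1e-12:                      # invariant subspace found
            break
        beta.append(b)
        v_prev, v = v, [c / b for c in w]
    # T has one fewer off-diagonal entry than diagonal entries.
    return alpha, beta[: len(alpha) - 1]


# Diagonal test matrix, applied only through its action on a vector.
A_diag = [1.0, 2.0, 3.0]
alpha, beta = lanczos(lambda u: [d * c for d, c in zip(A_diag, u)], [1.0, 1.0, 1.0], 3)
```

The eigenvalues of the small tridiagonal matrix defined by `alpha` and `beta` approximate the extreme eigenvalues of the large matrix; with as many steps as the dimension, its trace equals that of the original matrix up to rounding.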
Sparse matrix methods based on orthogonality and conjugacy
NASA Technical Reports Server (NTRS)
Lawson, C. L.
1973-01-01
A matrix having a high percentage of zero elements is called sparse. In the solution of systems of linear equations or linear least squares problems involving large sparse matrices, significant saving of computer cost can be achieved by taking advantage of the sparsity. The conjugate gradient algorithm and a set of related algorithms are described.
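A minimal sketch of the conjugate gradient algorithm for a symmetric positive definite system is given below. It is the textbook form, not a transcription of the report; the row-list storage layout and the names `conjugate_gradient` and `A_matvec` are illustrative choices.

```python
def conjugate_gradient(matvec, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A given only x -> A x."""
    x = [0.0] * len(b)
    r = list(b)                 # residual b - A x with x = 0
    p = list(r)                 # first search direction
    rs_old = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        Ap = matvec(p)
        step = rs_old / sum(pi * api for pi, api in zip(p, Ap))
        x = [xi + step * pi for xi, pi in zip(x, p)]
        r = [ri - step * api for ri, api in zip(r, Ap)]
        rs_new = sum(ri * ri for ri in r)
        # New direction is A-conjugate to the previous ones.
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


# Sparse SPD matrix stored as {row: [(col, value), ...]} (illustrative layout).
A = {0: [(0, 4.0), (1, 1.0)], 1: [(0, 1.0), (1, 3.0)]}


def A_matvec(v):
    return [sum(a * v[j] for j, a in A[i]) for i in range(len(v))]


x = conjugate_gradient(A_matvec, [1.0, 2.0])
```

Because the matrix appears only inside `A_matvec`, the cost per iteration is proportional to the number of stored nonzeros, which is where the saving from sparsity comes from.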
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chow, Edmond
Solving sparse problems is at the core of many DOE computational science applications. We focus on the challenge of developing sparse algorithms that can fully exploit the parallelism in extreme-scale computing systems, in particular systems with massive numbers of cores per node. Our approach is to express a sparse matrix factorization as a large number of bilinear constraint equations, and then solve these equations via an asynchronous iterative method. The unknowns in these equations are the matrix entries of the factorization that is desired.
Hine, N D M; Haynes, P D; Mostofi, A A; Payne, M C
2010-09-21
We present calculations of formation energies of defects in an ionic solid (Al2O3) extrapolated to the dilute limit, corresponding to a simulation cell of infinite size. The large-scale calculations required for this extrapolation are enabled by developments in the approach to parallel sparse matrix algebra operations, which are central to linear-scaling density-functional theory calculations. The computational cost of manipulating sparse matrices, whose sizes are determined by the large number of basis functions present, is greatly improved with this new approach. We present details of the sparse algebra scheme implemented in the ONETEP code using hierarchical sparsity patterns, and demonstrate its use in calculations on a wide range of systems, involving thousands of atoms on hundreds to thousands of parallel processes.
Large-region acoustic source mapping using a movable array and sparse covariance fitting.
Zhao, Shengkui; Tuna, Cagdas; Nguyen, Thi Ngoc Tho; Jones, Douglas L
2017-01-01
Large-region acoustic source mapping is important for city-scale noise monitoring. Approaches using a single-position measurement scheme to scan large regions using small arrays cannot provide clean acoustic source maps, while deploying large arrays spanning the entire region of interest is prohibitively expensive. A multiple-position measurement scheme is applied to scan large regions at multiple spatial positions using a movable array of small size. Based on the multiple-position measurement scheme, a sparse-constrained multiple-position vectorized covariance matrix fitting approach is presented. In the proposed approach, the overall sample covariance matrix of the incoherent virtual array is first estimated using the multiple-position array data and then vectorized using the Khatri-Rao (KR) product. A linear model is then constructed for fitting the vectorized covariance matrix and a sparse-constrained reconstruction algorithm is proposed for recovering source powers from the model. The user parameter settings are discussed. The proposed approach is tested on a 30 m × 40 m region and a 60 m × 40 m region using simulated and measured data. Much cleaner acoustic source maps and lower sound pressure level errors are obtained compared to the beamforming approaches and the previous sparse approach [Zhao, Tuna, Nguyen, and Jones, Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP) (2016)].
Multi scales based sparse matrix spectral clustering image segmentation
NASA Astrophysics Data System (ADS)
Liu, Zhongmin; Chen, Zhicai; Li, Zhanming; Hu, Wenjin
2018-04-01
In image segmentation, spectral clustering algorithms have to adopt an appropriate scaling parameter to calculate the similarity matrix between the pixels, which may have a great impact on the clustering result. Moreover, when the number of data instances is large, the computational complexity and memory use of the algorithm greatly increase. To solve these two problems, we propose a new spectral clustering image segmentation algorithm based on multiple scales and a sparse matrix. We first devise a new feature extraction method, then extract the features of the image at different scales, and finally use the feature information to construct a sparse similarity matrix, which improves the operation efficiency. Compared with the traditional spectral clustering algorithm, image segmentation experiments show that our algorithm has better accuracy and robustness.
Sparse subspace clustering for data with missing entries and high-rank matrix completion.
Fan, Jicong; Chow, Tommy W S
2017-09-01
Many methods have recently been proposed for subspace clustering, but they are often unable to handle incomplete data because of missing entries. Using matrix completion methods to recover missing entries is a common way to solve the problem. Conventional matrix completion methods require that the matrix should be of low-rank intrinsically, but most matrices are of high-rank or even full-rank in practice, especially when the number of subspaces is large. In this paper, a new method called Sparse Representation with Missing Entries and Matrix Completion is proposed to solve the problems of incomplete-data subspace clustering and high-rank matrix completion. The proposed algorithm alternately computes the matrix of sparse representation coefficients and recovers the missing entries of a data matrix. The proposed algorithm recovers missing entries through minimizing the representation coefficients, representation errors, and matrix rank. Thorough experimental study and comparative analysis based on synthetic data and natural images were conducted. The presented results demonstrate that the proposed algorithm is more effective in subspace clustering and matrix completion compared with other existing methods.
Exploring Deep Learning and Sparse Matrix Format Selection
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhao, Y.; Liao, C.; Shen, X.
We proposed to explore the use of Deep Neural Networks (DNN) for addressing the longstanding barriers. The recent rapid progress of DNN technology has created a large impact in many fields, significantly improving prediction accuracy over traditional machine learning techniques in image classification, speech recognition, machine translation, and so on. To some degree, these tasks resemble the decision making in many HPC tasks, including the aforementioned format selection for SpMV and linear solver selection. For instance, sparse matrix format selection is akin to image classification, such as telling whether an image contains a dog or a cat; in both problems, the right decisions are primarily determined by the spatial patterns of the elements in an input. For image classification, the patterns are of pixels, and for sparse matrix format selection, they are of non-zero elements. DNN could be naturally applied if we regard a sparse matrix as an image and the format selection or solver selection as classification problems.
Matched field localization based on CS-MUSIC algorithm
NASA Astrophysics Data System (ADS)
Guo, Shuangle; Tang, Ruichun; Peng, Linhui; Ji, Xiaopeng
2016-04-01
The problem caused by shortness or excessiveness of snapshots and by coherent sources in underwater acoustic positioning is considered. A matched field localization algorithm based on CS-MUSIC (Compressive Sensing Multiple Signal Classification) is proposed based on the sparse mathematical model of the underwater positioning. The signal matrix is calculated through the SVD (Singular Value Decomposition) of the observation matrix. The observation matrix in the sparse mathematical model is replaced by the signal matrix, and a new concise sparse mathematical model is obtained, which means not only the scale of the localization problem but also the noise level is reduced; then the new sparse mathematical model is solved by the CS-MUSIC algorithm which is a combination of CS (Compressive Sensing) method and MUSIC (Multiple Signal Classification) method. The algorithm proposed in this paper can overcome effectively the difficulties caused by correlated sources and shortness of snapshots, and it can also reduce the time complexity and noise level of the localization problem by using the SVD of the observation matrix when the number of snapshots is large, which will be proved in this paper.
Solving large tomographic linear systems: size reduction and error estimation
NASA Astrophysics Data System (ADS)
Voronin, Sergey; Mikesell, Dylan; Slezak, Inna; Nolet, Guust
2014-10-01
We present a new approach to reduce a sparse, linear system of equations associated with tomographic inverse problems. We begin by making a modification to the commonly used compressed sparse-row format, whereby our format is tailored to the sparse structure of finite-frequency (volume) sensitivity kernels in seismic tomography. Next, we cluster the sparse matrix rows to divide a large matrix into smaller subsets representing ray paths that are geographically close. Singular value decomposition of each subset allows us to project the data onto a subspace associated with the largest eigenvalues of the subset. After projection we reject those data that have a signal-to-noise ratio (SNR) below a chosen threshold. Clustering in this way assures that the sparse nature of the system is minimally affected by the projection. Moreover, our approach allows for a precise estimation of the noise affecting the data while also giving us the ability to identify outliers. We illustrate the method by reducing large matrices computed for global tomographic systems with cross-correlation body wave delays, as well as with surface wave phase velocity anomalies. For a massive matrix computed for 3.7 million Rayleigh wave phase velocity measurements, imposing a threshold of 1 for the SNR, we condensed the matrix size from 1103 to 63 Gbyte. For a global data set of multiple-frequency P wave delays from 60 well-distributed deep earthquakes we obtain a reduction to 5.9 per cent. This type of reduction allows one to avoid loss of information due to underparametrizing models. Alternatively, if data have to be rejected to fit the system into computer memory, it assures that the most important data are preserved.
Uniform Recovery Bounds for Structured Random Matrices in Corrupted Compressed Sensing
NASA Astrophysics Data System (ADS)
Zhang, Peng; Gan, Lu; Ling, Cong; Sun, Sumei
2018-04-01
We study the problem of recovering an $s$-sparse signal $\mathbf{x}^{\star}\in\mathbb{C}^n$ from corrupted measurements $\mathbf{y} = \mathbf{A}\mathbf{x}^{\star}+\mathbf{z}^{\star}+\mathbf{w}$, where $\mathbf{z}^{\star}\in\mathbb{C}^m$ is a $k$-sparse corruption vector whose nonzero entries may be arbitrarily large and $\mathbf{w}\in\mathbb{C}^m$ is a dense noise with bounded energy. The aim is to exactly and stably recover the sparse signal with tractable optimization programs. In this paper, we prove the uniform recovery guarantee of this problem for two classes of structured sensing matrices. The first class can be expressed as the product of a unit-norm tight frame (UTF), a random diagonal matrix and a bounded columnwise orthonormal matrix (e.g., partial random circulant matrix). When the UTF is bounded (i.e. $\mu(\mathbf{U})\sim1/\sqrt{m}$), we prove that with high probability, one can recover an $s$-sparse signal exactly and stably by $l_1$ minimization programs even if the measurements are corrupted by a sparse vector, provided $m = \mathcal{O}(s \log^2 s \log^2 n)$ and the sparsity level $k$ of the corruption is a constant fraction of the total number of measurements. The second class considers randomly sub-sampled orthogonal matrix (e.g., random Fourier matrix). We prove the uniform recovery guarantee provided that the corruption is sparse on certain sparsifying domain. Numerous simulation results are also presented to verify and complement the theoretical results.
Algorithms for solving large sparse systems of simultaneous linear equations on vector processors
NASA Technical Reports Server (NTRS)
David, R. E.
1984-01-01
Very efficient algorithms for solving large sparse systems of simultaneous linear equations have been developed for serial processing computers. These involve a reordering of matrix rows and columns in order to obtain a near triangular pattern of nonzero elements. Then an LU factorization is developed to represent the matrix inverse in terms of a sequence of elementary Gaussian eliminations, or pivots. In this paper it is shown how these algorithms are adapted for efficient implementation on vector processors. Results obtained on the CYBER 200 Model 205 are presented for a series of large test problems which show the comparative advantages of the triangularization and vector processing algorithms.
NASA Astrophysics Data System (ADS)
Stoykov, S.; Atanassov, E.; Margenov, S.
2016-10-01
Many scientific applications involve sparse or dense matrix operations, such as solving linear systems, matrix-matrix products, eigensolvers, etc. In structural nonlinear dynamics, the computation of periodic responses and the determination of the stability of the solution are of primary interest. The shooting method is widely used for obtaining periodic responses of nonlinear systems. The method involves simultaneous operations with sparse and dense matrices. One of the computationally expensive operations in the method is the multiplication of sparse by dense matrices. In the current work, a new algorithm for sparse matrix by dense matrix products is presented. The algorithm takes into account the structure of the sparse matrix, which is obtained by space discretization of the nonlinear Mindlin's plate equation of motion by the finite element method. The algorithm is developed to use the vector engine of Intel Xeon Phi coprocessors. It is compared with the standard sparse matrix by dense matrix algorithm and the one developed by Intel MKL, and it is shown that by considering the properties of the sparse matrix better algorithms can be developed.
NASA Astrophysics Data System (ADS)
Ghale, Purnima; Johnson, Harley T.
2018-06-01
We present an efficient sparse matrix-vector (SpMV) based method to compute the density matrix P from a given Hamiltonian in electronic structure computations. Our method is a hybrid approach based on Chebyshev-Jackson approximation theory and matrix purification methods like the second order spectral projection purification (SP2). Recent methods to compute the density matrix scale as O(N) in the number of floating point operations but are accompanied by large memory and communication overhead, and they are based on iterative use of the sparse matrix-matrix multiplication kernel (SpGEMM), which is known to be computationally irregular. In addition to irregularity in the sparse Hamiltonian H, the nonzero structure of intermediate estimates of P depends on products of H and evolves over the course of computation. On the other hand, an expansion of the density matrix P in terms of Chebyshev polynomials is straightforward and SpMV based; however, the resulting density matrix may not satisfy the required constraints exactly. In this paper, we analyze the strengths and weaknesses of the Chebyshev-Jackson polynomials and the second order spectral projection purification (SP2) method, and propose to combine them so that the accurate density matrix can be computed using the SpMV computational kernel only, and without having to store the density matrix P. Our method accomplishes these objectives by using the Chebyshev polynomial estimate as the initial guess for SP2, which is followed by using sparse matrix-vector multiplications (SpMVs) to replicate the behavior of the SP2 algorithm for purification. We demonstrate the method on a tight-binding model system of an oxide material containing more than 3 million atoms. In addition, we also present the predicted behavior of our method when applied to near-metallic Hamiltonians with a wide energy spectrum.
Strategies for vectorizing the sparse matrix vector product on the CRAY XMP, CRAY 2, and CYBER 205
NASA Technical Reports Server (NTRS)
Bauschlicher, Charles W., Jr.; Partridge, Harry
1987-01-01
Large, randomly sparse matrix vector products are important in a number of applications in computational chemistry, such as matrix diagonalization and the solution of simultaneous equations. Vectorization of this process is considered for the CRAY XMP, CRAY 2, and CYBER 205, using a matrix of dimension of 20,000 with from 1 percent to 6 percent nonzeros. Efficient scatter/gather capabilities add coding flexibility and yield significant improvements in performance. For the CYBER 205, it is shown that minor changes in the IO can reduce the CPU time by a factor of 50. Similar changes in the CRAY codes make a far smaller improvement.
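The role of gather operations in a sparse matrix vector product can be sketched with the compressed sparse row (CSR) layout. This is an assumption made for illustration: the abstract does not name the storage format used on the CRAY or CYBER machines, and `csr_matvec` is a hypothetical name.

```python
def csr_matvec(indptr, indices, data, x):
    """y = A @ x for A in compressed sparse row (CSR) form."""
    y = []
    for row in range(len(indptr) - 1):
        start, end = indptr[row], indptr[row + 1]
        # Gather: fetch the x entries addressed by this row's column indices.
        gathered = [x[j] for j in indices[start:end]]
        y.append(sum(a * xj for a, xj in zip(data[start:end], gathered)))
    return y


# A = [[10, 0, 2], [0, 0, 0], [3, 0, 4]]
indptr = [0, 2, 2, 4]
indices = [0, 2, 0, 2]
data = [10.0, 2.0, 3.0, 4.0]
y = csr_matvec(indptr, indices, data, [1.0, 1.0, 2.0])
```

The indexed read `x[j]` is the irregular memory access that hardware gather support accelerates; once the operands are gathered into a contiguous vector, the multiply and sum vectorize in the usual way.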
User's Manual for PCSMS (Parallel Complex Sparse Matrix Solver). Version 1.
NASA Technical Reports Server (NTRS)
Reddy, C. J.
2000-01-01
PCSMS (Parallel Complex Sparse Matrix Solver) is a computer code written to make use of the existing real sparse direct solvers to solve complex, sparse matrix linear equations. PCSMS converts complex matrices into real matrices and uses real, sparse direct matrix solvers to factor and solve the real matrices. The solution vector is reconverted to complex numbers. Though this utility is written for Silicon Graphics (SGI) real sparse matrix solution routines, it is general in nature and can be easily modified to work with any real sparse matrix solver. The User's Manual is written to make the user acquainted with the installation and operation of the code. Driver routines are given to aid the users to integrate PCSMS routines in their own codes.
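The complex-to-real conversion can be illustrated with the standard block embedding, in which A = Ar + i Ai and x = xr + i xi give the real system [[Ar, -Ai], [Ai, Ar]] [xr; xi] = [br; bi]. Whether PCSMS uses exactly this embedding is an assumption; the abstract does not say. The dense Gaussian elimination below is only a stand-in for a real sparse direct solver, and all function names are hypothetical.

```python
def solve_real(M, rhs):
    """Dense Gaussian elimination with partial pivoting (stand-in real solver)."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(M, rhs)]
    for k in range(n):
        piv = max(range(k, n), key=lambda i: abs(aug[i][k]))
        aug[k], aug[piv] = aug[piv], aug[k]
        for i in range(k + 1, n):
            f = aug[i][k] / aug[k][k]
            for j in range(k, n + 1):
                aug[i][j] -= f * aug[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def solve_complex_via_real(A, b):
    """Solve A x = b (complex) through the real system [[Ar, -Ai], [Ai, Ar]]."""
    n = len(b)
    top = [[z.real for z in row] + [-z.imag for z in row] for row in A]
    bottom = [[z.imag for z in row] + [z.real for z in row] for row in A]
    rhs = [z.real for z in b] + [z.imag for z in b]
    y = solve_real(top + bottom, rhs)
    # Reassemble complex unknowns from the stacked real and imaginary parts.
    return [complex(y[i], y[n + i]) for i in range(n)]


A = [[2 + 1j, 1j], [1 + 0j, 3 - 1j]]
x_true = [1 + 2j, -1 + 1j]
b = [sum(a * x for a, x in zip(row, x_true)) for row in A]
x = solve_complex_via_real(A, b)
```

The real system has twice the dimension and four times the nonzeros of the complex one, which is the price paid for reusing an existing real solver unchanged.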
Noniterative MAP reconstruction using sparse matrix representations.
Cao, Guangzhi; Bouman, Charles A; Webb, Kevin J
2009-09-01
We present a method for noniterative maximum a posteriori (MAP) tomographic reconstruction which is based on the use of sparse matrix representations. Our approach is to precompute and store the inverse matrix required for MAP reconstruction. This approach has generally not been used in the past because the inverse matrix is typically large and fully populated (i.e., not sparse). In order to overcome this problem, we introduce two new ideas. The first idea is a novel theory for the lossy source coding of matrix transformations which we refer to as matrix source coding. This theory is based on a distortion metric that reflects the distortions produced in the final matrix-vector product, rather than the distortions in the coded matrix itself. The resulting algorithms are shown to require orthonormal transformations of both the measurement data and the matrix rows and columns before quantization and coding. The second idea is a method for efficiently storing and computing the required orthonormal transformations, which we call a sparse-matrix transform (SMT). The SMT is a generalization of the classical FFT in that it uses butterflies to compute an orthonormal transform; but unlike an FFT, the SMT uses the butterflies in an irregular pattern, and is numerically designed to best approximate the desired transforms. We demonstrate the potential of the noniterative MAP reconstruction with examples from optical tomography. The method requires offline computation to encode the inverse transform. However, once these offline computations are completed, the noniterative MAP algorithm is shown to reduce both storage and computation by well over two orders of magnitude, compared to linear iterative reconstruction methods.
Approximate method of variational Bayesian matrix factorization/completion with sparse prior
NASA Astrophysics Data System (ADS)
Kawasumi, Ryota; Takeda, Koujin
2018-05-01
We derive the analytical expression of a matrix factorization/completion solution by the variational Bayes method, under the assumption that the observed matrix is originally the product of low-rank, dense and sparse matrices with additive noise. We assume the prior of a sparse matrix is a Laplace distribution by taking matrix sparsity into consideration. Then we use several approximations for the derivation of a matrix factorization/completion solution. By our solution, we also numerically evaluate the performance of a sparse matrix reconstruction in matrix factorization, and completion of a missing matrix element in matrix completion.
NASA Astrophysics Data System (ADS)
Kaporin, I. E.
2012-02-01
In order to precondition a sparse symmetric positive definite matrix, its approximate inverse is examined, which is represented as the product of two sparse mutually adjoint triangular matrices. In this way, the solution of the corresponding system of linear algebraic equations (SLAE) by applying the preconditioned conjugate gradient method (CGM) is reduced to performing only elementary vector operations and calculating sparse matrix-vector products. A method for constructing the above preconditioner is described and analyzed. The triangular factor has a fixed sparsity pattern and is optimal in the sense that the preconditioned matrix has a minimum K-condition number. The use of polynomial preconditioning based on Chebyshev polynomials makes it possible to considerably reduce the amount of scalar product operations (at the cost of an insignificant increase in the total number of arithmetic operations). The possibility of an efficient massively parallel implementation of the resulting method for solving SLAEs is discussed. For a sequential version of this method, the results obtained by solving 56 test problems from the Florida sparse matrix collection (which are large-scale and ill-conditioned) are presented. These results show that the method is highly reliable and has low computational costs.
Efficient ICCG on a shared memory multiprocessor
NASA Technical Reports Server (NTRS)
Hammond, Steven W.; Schreiber, Robert
1989-01-01
Different approaches are discussed for exploiting parallelism in the ICCG (Incomplete Cholesky Conjugate Gradient) method for solving large sparse symmetric positive definite systems of equations on a shared memory parallel computer. Techniques for efficiently solving triangular systems and computing sparse matrix-vector products are explored. Three methods for scheduling the tasks in solving triangular systems are implemented on the Sequent Balance 21000. Sample problems that are representative of a large class of problems solved using iterative methods are used. We show that a static analysis to determine data dependences in the triangular solve can greatly improve its parallel efficiency. We also show that ignoring symmetry and storing the whole matrix can reduce solution time substantially.
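One common form of static dependence analysis for a sparse triangular solve is level scheduling, sketched below. The abstract does not say which of its three scheduling methods corresponds to this, so treat it as an illustration of the idea rather than the authors' scheme; the function names and the CSR input layout are assumptions.

```python
def triangular_levels(indptr, indices):
    """Level of each row of a sparse lower-triangular matrix in CSR form.

    Row i depends on every row j < i with a stored entry L[i][j]; rows that
    share a level have no mutual dependences and can be solved concurrently.
    """
    n = len(indptr) - 1
    level = [0] * n
    for i in range(n):
        deps = [j for j in indices[indptr[i]:indptr[i + 1]] if j < i]
        level[i] = 1 + max((level[j] for j in deps), default=-1)
    return level


def group_by_level(level):
    """Return rows grouped by level, in the order the levels must run."""
    groups = [[] for _ in range(max(level) + 1)]
    for row, lv in enumerate(level):
        groups[lv].append(row)
    return groups


# Lower-triangular pattern: row 2 uses row 0, row 3 uses rows 1 and 2.
indptr = [0, 1, 2, 4, 7]
indices = [0, 1, 0, 2, 1, 2, 3]
schedule = group_by_level(triangular_levels(indptr, indices))
```

Since the sparsity pattern is fixed across ICCG iterations, the schedule is computed once and reused for every triangular solve.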
Eigensolver for a Sparse, Large Hermitian Matrix
NASA Technical Reports Server (NTRS)
Tisdale, E. Robert; Oyafuso, Fabiano; Klimeck, Gerhard; Brown, R. Chris
2003-01-01
A parallel-processing computer program finds a few eigenvalues in a sparse Hermitian matrix that contains as many as 100 million diagonal elements. This program finds the eigenvalues faster, using less memory, than do other, comparable eigensolver programs. This program implements a Lanczos algorithm in the American National Standards Institute/ International Organization for Standardization (ANSI/ISO) C computing language, using the Message Passing Interface (MPI) standard to complement an eigensolver in PARPACK. [PARPACK (Parallel Arnoldi Package) is an extension, to parallel-processing computer architectures, of ARPACK (Arnoldi Package), which is a collection of Fortran 77 subroutines that solve large-scale eigenvalue problems.] The eigensolver runs on Beowulf clusters of computers at the Jet Propulsion Laboratory (JPL).
Brief announcement: Hypergraph partitioning for parallel sparse matrix-matrix multiplication
Ballard, Grey; Druinsky, Alex; Knight, Nicholas; ...
2015-01-01
The performance of parallel algorithms for sparse matrix-matrix multiplication is typically determined by the amount of interprocessor communication performed, which in turn depends on the nonzero structure of the input matrices. In this paper, we characterize the communication cost of a sparse matrix-matrix multiplication algorithm in terms of the size of a cut of an associated hypergraph that encodes the computation for a given input nonzero structure. Obtaining an optimal algorithm corresponds to solving a hypergraph partitioning problem. Furthermore, our hypergraph model generalizes several existing models for sparse matrix-vector multiplication, and we can leverage hypergraph partitioners developed for that computation to improve application-specific algorithms for multiplying sparse matrices.
Mohr, Stephan; Dawson, William; Wagner, Michael; Caliste, Damien; Nakajima, Takahito; Genovese, Luigi
2017-10-10
We present CheSS, the "Chebyshev Sparse Solvers" library, which has been designed to solve typical problems arising in large-scale electronic structure calculations using localized basis sets. The library is based on a flexible and efficient expansion in terms of Chebyshev polynomials and presently features the calculation of the density matrix, the calculation of matrix powers for arbitrary powers, and the extraction of eigenvalues in a selected interval. CheSS is able to exploit the sparsity of the matrices and scales linearly with respect to the number of nonzero entries, making it well-suited for large-scale calculations. The approach is particularly adapted for setups leading to small spectral widths of the involved matrices and outperforms alternative methods in this regime. By coupling CheSS to the DFT code BigDFT, we show that such a favorable setup is indeed possible in practice. In addition, the approach based on Chebyshev polynomials can be massively parallelized, and CheSS exhibits excellent scaling up to thousands of cores even for relatively small matrix sizes.
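An expansion in Chebyshev polynomials can be applied to a vector using only matrix-vector products, through the three-term recurrence T_{k+1}(H) = 2 H T_k(H) - T_{k-1}(H). The sketch below assumes the spectrum of H already lies in [-1, 1] and that the expansion coefficients are given; the rescaling of H and the fitting of coefficients to a target function are omitted. It is not the CheSS API, and `chebyshev_apply` is a hypothetical name.

```python
def chebyshev_apply(matvec, coeffs, v):
    """Return sum_k coeffs[k] * T_k(H) v, with the spectrum of H assumed in [-1, 1]."""
    t_prev = list(v)                       # T_0(H) v = v
    result = [coeffs[0] * c for c in t_prev]
    if len(coeffs) == 1:
        return result
    t_curr = matvec(v)                     # T_1(H) v = H v
    result = [r + coeffs[1] * c for r, c in zip(result, t_curr)]
    for ck in coeffs[2:]:
        # Three-term recurrence: T_{k+1} = 2 H T_k - T_{k-1}.
        h_t = matvec(t_curr)
        t_next = [2.0 * a - b for a, b in zip(h_t, t_prev)]
        result = [r + ck * c for r, c in zip(result, t_next)]
        t_prev, t_curr = t_curr, t_next
    return result


# Diagonal test matrix with eigenvalues already inside [-1, 1].
h_diag = [-0.5, 0.0, 0.8]
coeffs = [0.0, 0.0, 0.0, 1.0]              # selects T_3 only
out = chebyshev_apply(
    lambda u: [d * c for d, c in zip(h_diag, u)], coeffs, [1.0, 1.0, 1.0]
)
```

For a diagonal matrix each output component equals T_3 of the corresponding eigenvalue, that is 4h^3 - 3h, which gives a simple check of the recurrence. A smaller spectral width needs fewer terms for a given accuracy, consistent with the regime in which the abstract reports the approach to perform best.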
A fast time-difference inverse solver for 3D EIT with application to lung imaging.
Javaherian, Ashkan; Soleimani, Manuchehr; Moeller, Knut
2016-08-01
A class of sparse optimization techniques that require solely matrix-vector products, rather than explicit access to the forward matrix and its transpose, has received much attention in the past decade for dealing with large-scale inverse problems. This study tailors the application of the so-called Gradient Projection for Sparse Reconstruction (GPSR) to large-scale time-difference three-dimensional electrical impedance tomography (3D EIT). 3D EIT typically suffers from the need for a large number of voxels to cover the whole domain, so its application to real-time imaging, for example monitoring of lung function, remains scarce, since the large number of degrees of freedom of the problem greatly increases storage space and reconstruction time. This study shows the great potential of the GPSR for large-size time-difference 3D EIT. Further studies are needed to improve its accuracy for imaging small-size anomalies.
GPU-accelerated element-free reverse-time migration with Gauss points partition
NASA Astrophysics Data System (ADS)
Zhou, Zhen; Jia, Xiaofeng; Qiang, Xiaodong
2018-06-01
An element-free method (EFM) has been demonstrated successfully in elasticity, heat conduction and fatigue crack growth problems. We present the theory of EFM and its numerical applications in seismic modelling and reverse time migration (RTM). Compared with the finite difference method and the finite element method, the EFM has unique advantages: (1) independence of grids in computation and (2) lower expense and more flexibility (because only the information of the nodes and the boundary of the concerned area is required). However, in EFM, due to improper computation and storage of some large sparse matrices, such as the mass matrix and the stiffness matrix, the method is difficult to apply to seismic modelling and RTM for a large velocity model. To solve the problem of storage and computation efficiency, we propose a concept of Gauss points partition and utilise the graphics processing unit to improve the computational efficiency. We employ the compressed sparse row format to compress the intermediate large sparse matrices and attempt to simplify the operations by solving the linear equations with CULA solver. To improve the computation efficiency further, we introduce the concept of the lumped mass matrix. Numerical experiments indicate that the proposed method is accurate and more efficient than the regular EFM.
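For readers unfamiliar with the compressed sparse row (CSR) format mentioned above, a minimal pure-Python sketch follows. It is not the authors' GPU implementation; the helper names `dense_to_csr` and `csr_matvec` are hypothetical and the example only illustrates the three-array layout (values, column indices, row pointers).

```python
def dense_to_csr(rows):
    """Convert a dense list-of-lists matrix to CSR arrays.

    Returns (values, col_idx, row_ptr), where row i occupies the slice
    row_ptr[i]:row_ptr[i + 1] of values and col_idx.
    """
    values, col_idx, row_ptr = [], [], [0]
    for row in rows:
        for j, x in enumerate(row):
            if x != 0:
                values.append(x)
                col_idx.append(j)
        row_ptr.append(len(values))      # end of this row's nonzeros
    return values, col_idx, row_ptr


def csr_matvec(values, col_idx, row_ptr, x):
    """Compute y = A x touching only the stored nonzeros."""
    y = []
    for i in range(len(row_ptr) - 1):
        start, end = row_ptr[i], row_ptr[i + 1]
        y.append(sum(values[k] * x[col_idx[k]] for k in range(start, end)))
    return y
```

Storage is proportional to the number of nonzeros plus the number of rows, which is what makes the format attractive for large mass and stiffness matrices.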
Fast and Adaptive Sparse Precision Matrix Estimation in High Dimensions
Liu, Weidong; Luo, Xi
2014-01-01
This paper proposes a new method for estimating sparse precision matrices in the high dimensional setting. It has been popular to study fast computation and adaptive procedures for this problem. We propose a novel approach, called Sparse Column-wise Inverse Operator, to address these two issues. We analyze an adaptive procedure based on cross validation, and establish its convergence rate under the Frobenius norm. The convergence rates under other matrix norms are also established. This method also enjoys the advantage of fast computation for large-scale problems, via a coordinate descent algorithm. Numerical merits are illustrated using both simulated and real datasets. In particular, it performs favorably on an HIV brain tissue dataset and an ADHD resting-state fMRI dataset. PMID:25750463
NASA Astrophysics Data System (ADS)
Bouchet, L.; Amestoy, P.; Buttari, A.; Rouet, F.-H.; Chauvin, M.
2013-02-01
Nowadays, analyzing and reducing the ever larger astronomical datasets is becoming a crucial challenge, especially for long cumulated observation times. The INTEGRAL/SPI X/γ-ray spectrometer is an instrument for which it is essential to process many exposures at the same time in order to increase the low signal-to-noise ratio of the weakest sources. In this context, the conventional methods for data reduction are inefficient and sometimes not feasible at all. Processing several years of data simultaneously requires computing not only the solution of a large system of equations, but also the associated uncertainties. We aim at reducing the computation time and the memory usage. Since the SPI transfer function is sparse, we have used some popular methods for the solution of large sparse linear systems; we briefly review these methods. We use the Multifrontal Massively Parallel Solver (MUMPS) to compute the solution of the system of equations. We also need to compute the variance of the solution, which amounts to computing selected entries of the inverse of the sparse matrix corresponding to our linear system. This can be achieved through one of the latest features of the MUMPS software that has been partly motivated by this work. In this paper we provide a brief presentation of this feature and evaluate its effectiveness on astrophysical problems requiring the processing of large datasets simultaneously, such as the study of the entire emission of the Galaxy. We used these algorithms to solve the large sparse systems arising from SPI data processing and to obtain both their solutions and the associated variances. In conclusion, thanks to these newly developed tools, processing large datasets arising from SPI is now feasible with both a reasonable execution time and a low memory usage.
A study of the parallel algorithm for large-scale DC simulation of nonlinear systems
NASA Astrophysics Data System (ADS)
Cortés Udave, Diego Ernesto; Ogrodzki, Jan; Gutiérrez de Anda, Miguel Angel
Newton-Raphson DC analysis of large-scale nonlinear circuits may be an extremely time-consuming process even if sparse matrix techniques and bypassing of nonlinear model calculations are used. A slight decrease in the time required for this task may be obtained on multi-core, multithreaded computers if the calculation of the mathematical models for the nonlinear elements and the stamp management of the sparse matrix entries are handled by concurrent processes. The numerical complexity can be further reduced via circuit decomposition and parallel solution of blocks, taking the BBD matrix structure as a starting point. This block-parallel approach may yield a considerable gain, though it depends strongly on the system topology and, of course, on the processor type. This contribution presents an easily parallelizable decomposition-based algorithm for DC simulation and provides a detailed study of its effectiveness.
Amesos2 and Belos: Direct and Iterative Solvers for Large Sparse Linear Systems
Bavier, Eric; Hoemmen, Mark; Rajamanickam, Sivasankaran; ...
2012-01-01
Solvers for large sparse linear systems come in two categories: direct and iterative. Amesos2, a package in the Trilinos software project, provides direct methods, and Belos, another Trilinos package, provides iterative methods. Amesos2 offers a common interface to many different sparse matrix factorization codes, and can handle any implementation of sparse matrices and vectors, via an easy-to-extend C++ traits interface. It can also factor matrices whose entries have arbitrary “Scalar” type, enabling extended-precision and mixed-precision algorithms. Belos includes many different iterative methods for solving large sparse linear systems and least-squares problems. Unlike competing iterative solver libraries, Belos completely decouples the algorithms from the implementations of the underlying linear algebra objects. This lets Belos exploit the latest hardware without changes to the code. Belos favors algorithms that solve higher-level problems, such as multiple simultaneous linear systems and sequences of related linear systems, faster than standard algorithms. The package also supports extended-precision and mixed-precision algorithms. Together, Amesos2 and Belos form a complete suite of sparse linear solvers.
Large Covariance Estimation by Thresholding Principal Orthogonal Complements
Fan, Jianqing; Liao, Yuan; Mincheva, Martina
2012-01-01
This paper deals with the estimation of a high-dimensional covariance with a conditional sparsity structure and fast-diverging eigenvalues. By assuming sparse error covariance matrix in an approximate factor model, we allow for the presence of some cross-sectional correlation even after taking out common but unobservable factors. We introduce the Principal Orthogonal complEment Thresholding (POET) method to explore such an approximate factor structure with sparsity. The POET estimator includes the sample covariance matrix, the factor-based covariance matrix (Fan, Fan, and Lv, 2008), the thresholding estimator (Bickel and Levina, 2008) and the adaptive thresholding estimator (Cai and Liu, 2011) as specific examples. We provide mathematical insights when the factor analysis is approximately the same as the principal component analysis for high-dimensional data. The rates of convergence of the sparse residual covariance matrix and the conditional sparse covariance matrix are studied under various norms. It is shown that the impact of estimating the unknown factors vanishes as the dimensionality increases. The uniform rates of convergence for the unobserved factors and their factor loadings are derived. The asymptotic results are also verified by extensive simulation studies. Finally, a real data application on portfolio allocation is presented. PMID:24348088
Large Covariance Estimation by Thresholding Principal Orthogonal Complements.
Fan, Jianqing; Liao, Yuan; Mincheva, Martina
2013-09-01
This paper deals with the estimation of a high-dimensional covariance with a conditional sparsity structure and fast-diverging eigenvalues. By assuming sparse error covariance matrix in an approximate factor model, we allow for the presence of some cross-sectional correlation even after taking out common but unobservable factors. We introduce the Principal Orthogonal complEment Thresholding (POET) method to explore such an approximate factor structure with sparsity. The POET estimator includes the sample covariance matrix, the factor-based covariance matrix (Fan, Fan, and Lv, 2008), the thresholding estimator (Bickel and Levina, 2008) and the adaptive thresholding estimator (Cai and Liu, 2011) as specific examples. We provide mathematical insights when the factor analysis is approximately the same as the principal component analysis for high-dimensional data. The rates of convergence of the sparse residual covariance matrix and the conditional sparse covariance matrix are studied under various norms. It is shown that the impact of estimating the unknown factors vanishes as the dimensionality increases. The uniform rates of convergence for the unobserved factors and their factor loadings are derived. The asymptotic results are also verified by extensive simulation studies. Finally, a real data application on portfolio allocation is presented.
Multi-threaded Sparse Matrix Sparse Matrix Multiplication for Many-Core and GPU Architectures.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Deveci, Mehmet; Trott, Christian Robert; Rajamanickam, Sivasankaran
Sparse matrix-matrix multiplication is a key kernel that has applications in several domains such as scientific computing and graph analysis. Several algorithms have been studied in the past for this foundational kernel. In this paper, we develop parallel algorithms for sparse matrix-matrix multiplication with a focus on performance portability across different high performance computing architectures. The performance of these algorithms depends on the data structures used in them. We compare different types of accumulators in these algorithms and demonstrate the performance difference between these data structures. Furthermore, we develop a meta-algorithm, kkSpGEMM, to choose the right algorithm and data structure based on the characteristics of the problem. We show performance comparisons on three architectures and demonstrate the need for the community to develop two-phase sparse matrix-matrix multiplication implementations for efficient reuse of the data structures involved.
Sparse Matrices in MATLAB: Design and Implementation
NASA Technical Reports Server (NTRS)
Gilbert, John R.; Moler, Cleve; Schreiber, Robert
1992-01-01
The matrix computation language and environment MATLAB is extended to include sparse matrix storage and operations. The only change to the outward appearance of the MATLAB language is a pair of commands to create full or sparse matrices. Nearly all the operations of MATLAB now apply equally to full or sparse matrices, without any explicit action by the user. The sparse data structure represents a matrix in space proportional to the number of nonzero entries, and most of the operations compute sparse results in time proportional to the number of arithmetic operations on nonzeros.
Tensor-GMRES method for large sparse systems of nonlinear equations
NASA Technical Reports Server (NTRS)
Feng, Dan; Pulliam, Thomas H.
1994-01-01
This paper introduces a tensor-Krylov method, the tensor-GMRES method, for large sparse systems of nonlinear equations. This method is a coupling of tensor model formation and solution techniques for nonlinear equations with Krylov subspace projection techniques for unsymmetric systems of linear equations. Traditional tensor methods for nonlinear equations are based on a quadratic model of the nonlinear function, a standard linear model augmented by a simple second order term. These methods are shown to be significantly more efficient than standard methods both on nonsingular problems and on problems where the Jacobian matrix at the solution is singular. A major disadvantage of the traditional tensor methods is that the solution of the tensor model requires the factorization of the Jacobian matrix, which may not be suitable for problems where the Jacobian matrix is large and has a 'bad' sparsity structure for an efficient factorization. We overcome this difficulty by forming and solving the tensor model using an extension of a Newton-GMRES scheme. Like traditional tensor methods, we show that the new tensor method has significant computational advantages over the analogous Newton counterpart. Consistent with Krylov subspace based methods, the new tensor method does not depend on the factorization of the Jacobian matrix. As a matter of fact, the Jacobian matrix is never needed explicitly.
Oryspayev, Dossay; Aktulga, Hasan Metin; Sosonkina, Masha; ...
2015-07-14
Sparse matrix vector multiply (SpMVM) is an important kernel that frequently arises in high performance computing applications. Due to its low arithmetic intensity, several approaches have been proposed in the literature to improve its scalability and efficiency in large scale computations. In this paper, our target systems are high end multi-core architectures and we use a hybrid Message Passing Interface (MPI) + OpenMP programming model for parallelism. We analyze the performance of a recently proposed implementation of the distributed symmetric SpMVM, originally developed for large sparse symmetric matrices arising in ab initio nuclear structure calculations. We also study important features of this implementation and compare with previously reported implementations that do not exploit underlying symmetry. Our SpMVM implementations leverage the hybrid paradigm to efficiently overlap expensive communications with computations. Our main comparison criterion is the "CPU core hours" metric, which is the main measure of resource usage on supercomputers. We analyze the effects of a topology-aware mapping heuristic using a simplified network load model. Furthermore, we have tested the different SpMVM implementations on two large clusters with 3D Torus and Dragonfly topology. Our results show that the distributed SpMVM implementation that exploits matrix symmetry and hides communication yields the best value for the "CPU core hours" metric and significantly reduces data movement overheads.
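To illustrate what exploiting symmetry means for the kernel itself, here is a serial sketch that stores only the upper triangle and uses each off-diagonal entry twice. It is a toy illustration under assumed names (`symmetric_spmv`, a list of `(i, j, value)` triples); the distributed implementation, communication overlap, and topology-aware mapping described in the abstract are not modeled.

```python
def symmetric_spmv(n, upper_entries, x):
    """Compute y = A x for a symmetric n x n matrix A.

    upper_entries holds (i, j, a_ij) triples with i <= j only, so roughly
    half of the off-diagonal data is stored and read from memory.
    """
    y = [0.0] * n
    for i, j, a in upper_entries:
        y[i] += a * x[j]          # contribution of a_ij
        if i != j:
            y[j] += a * x[i]      # mirrored contribution of a_ji = a_ij
    return y
```

The mirrored update writes to `y[j]` as well as `y[i]`, which is the reason a parallel version needs care to avoid write conflicts between threads or processes.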
LSRN: A PARALLEL ITERATIVE SOLVER FOR STRONGLY OVER- OR UNDERDETERMINED SYSTEMS
Meng, Xiangrui; Saunders, Michael A.; Mahoney, Michael W.
2014-01-01
We describe a parallel iterative least squares solver named LSRN that is based on random normal projection. LSRN computes the min-length solution to $\min_{x \in \mathbb{R}^n} \|Ax - b\|_2$, where $A \in \mathbb{R}^{m \times n}$ with $m \gg n$ or $m \ll n$, and where $A$ may be rank-deficient. Tikhonov regularization may also be included. Since $A$ is involved only in matrix-matrix and matrix-vector multiplications, it can be a dense or sparse matrix or a linear operator, and LSRN automatically speeds up when $A$ is sparse or a fast linear operator. The preconditioning phase consists of a random normal projection, which is embarrassingly parallel, and a singular value decomposition of size $\lceil \gamma \min(m, n) \rceil \times \min(m, n)$, where $\gamma$ is moderately larger than 1, e.g., $\gamma = 2$. We prove that the preconditioned system is well-conditioned, with a strong concentration result on the extreme singular values, and hence that the number of iterations is fully predictable when we apply LSQR or the Chebyshev semi-iterative method. As we demonstrate, the Chebyshev method is particularly efficient for solving large problems on clusters with high communication cost. Numerical results show that on a shared-memory machine, LSRN is very competitive with LAPACK’s DGELSD and a fast randomized least squares solver called Blendenpik on large dense problems, and it outperforms the least squares solver from SuiteSparseQR on sparse problems without sparsity patterns that can be exploited to reduce fill-in. Further experiments show that LSRN scales well on an Amazon Elastic Compute Cloud cluster. PMID:25419094
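The preconditioning phase described above can be written out for the strongly overdetermined case ($m \gg n$). This is a sketch reconstructed from the abstract's description; the symbols $G$, $s$, $\tilde{A}$, and $N$ are notation assumed here and are not taken from the abstract itself.

```latex
\begin{align*}
  s &= \lceil \gamma n \rceil, \qquad
  G \in \mathbb{R}^{s \times m} \text{ with i.i.d. } \mathcal{N}(0,1) \text{ entries}, \\
  \tilde{A} &= G A \in \mathbb{R}^{s \times n}, \qquad
  \tilde{A} = \tilde{U} \tilde{\Sigma} \tilde{V}^{T} \quad \text{(compact SVD)}, \\
  N &= \tilde{V} \tilde{\Sigma}^{-1}, \\
  \hat{y} &= \arg\min_{y} \| A N y - b \|_2
     \quad \text{(min-length, by LSQR or Chebyshev semi-iteration)}, \\
  \hat{x} &= N \hat{y}.
\end{align*}
```

The SVD is taken of the small $s \times n$ matrix $\tilde{A}$, which matches the size $\lceil \gamma \min(m, n) \rceil \times \min(m, n)$ quoted above, and $A$ itself is touched only through products.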
NASA Astrophysics Data System (ADS)
Zheng, Maoteng; Zhang, Yongjun; Zhou, Shunping; Zhu, Junfeng; Xiong, Xiaodong
2016-07-01
In recent years, new platforms and sensors in photogrammetry, remote sensing and computer vision have become available, such as Unmanned Aerial Vehicles (UAVs), oblique camera systems, common digital cameras and even mobile phone cameras. Images collected by all these kinds of sensors can be used as remote sensing data sources. These sensors can obtain large-scale remote sensing data consisting of a great number of images. Bundle block adjustment of large-scale data with conventional algorithms is very time- and memory-consuming because of the very large normal matrix arising from such data. In this paper, an efficient Block-based Sparse Matrix Compression (BSMC) method combined with the Preconditioned Conjugate Gradient (PCG) algorithm is chosen to develop a stable and efficient bundle block adjustment system for large-scale remote sensing data. The main contribution of this work is the BSMC-based PCG algorithm, which is more efficient in time and memory than the traditional algorithm without compromising accuracy. Eight real datasets are used to test the proposed method. Preliminary results show that the BSMC method can efficiently decrease the time and memory requirements of large-scale data.
A performance study of sparse Cholesky factorization on INTEL iPSC/860
NASA Technical Reports Server (NTRS)
Zubair, M.; Ghose, M.
1992-01-01
The problem of Cholesky factorization of a sparse matrix has been very well investigated on sequential machines. A number of efficient codes exist for factorizing large unstructured sparse matrices. However, there is a lack of such efficient codes on parallel machines in general, and distributed machines in particular. Some of the issues that are critical to the implementation of sparse Cholesky factorization on a distributed memory parallel machine are ordering, partitioning and mapping, load balancing, and ordering of various tasks within a processor. Here, we focus on the effect of various partitioning schemes on the performance of sparse Cholesky factorization on the Intel iPSC/860. Also, a new partitioning heuristic for structured as well as unstructured sparse matrices is proposed, and its performance is compared with other schemes.
An Efficient Scheme for Updating Sparse Cholesky Factors
NASA Technical Reports Server (NTRS)
Raghavan, Padma
2002-01-01
Raghavan had earlier developed the software package DSCPACK, which can be used for solving sparse linear systems where the coefficient matrix is symmetric and positive definite (this project was not funded by NASA but by agencies such as NSF). DSCPACK-S is the serial code and DSCPACK-P is a parallel implementation suitable for multiprocessors or networks of workstations with message passing using MPI. The main algorithm used is the Cholesky factorization of a sparse symmetric positive definite matrix, $A = LL^T$. The code can also compute the factorization $A = LDL^T$. The complexity of the software arises from several factors relating to the sparsity of the matrix $A$. A sparse $N \times N$ matrix $A$ typically has fewer than $cN$ nonzeroes, where $c$ is a small constant. If the matrix were dense, it would have $O(N^2)$ nonzeroes. The most complicated part of such sparse Cholesky factorization relates to fill-in, i.e., zeroes in the original matrix that become nonzeroes in the factor $L$. An efficient implementation depends to a large extent on complex data structures and on techniques from graph theory to reduce, identify, and manage fill. DSCPACK is based on an efficient multifrontal implementation with fill-managing algorithms and implementation arising from earlier research by Raghavan and others. Sparse Cholesky factorization is typically a four-step process: (1) ordering to compute a fill-reducing numbering, (2) symbolic factorization to determine the nonzero structure of $L$, (3) numeric factorization to compute $L$, and (4) triangular solution to solve $Ly = b$ and $L^T x = y$. The first two steps are symbolic and are performed using the graph of the matrix. The numeric factorization step is of dominant cost and there are several schemes for improving performance by exploiting the nested and dense structure of groups of columns in the factor.
The latter are aimed at better utilization of the cache-memory hierarchy on modern processors to prevent cache misses and provide execution rates (operations/second) that are close to the peak rates for dense matrix computations. Currently, DSCPACK is being used in an application at NASA directed by J. Newman and M. James. We propose the implementation of efficient schemes for updating the $LL^T$ or $LDL^T$ factors computed in DSCPACK-S to meet the computational requirements of their project. A brief description is provided in the next section.
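The effect of the ordering step on fill-in can be illustrated with a small symbolic elimination on the graph of the matrix. This is a toy sketch, not DSCPACK code: the function name `count_fill` and the edge-list input are assumptions made for this example, and no numeric factorization is performed.

```python
def count_fill(n, edges, order):
    """Count fill-in entries created when vertices are eliminated in `order`.

    The graph has one vertex per row/column and one edge per off-diagonal
    nonzero of a symmetric matrix. Eliminating a vertex connects all of its
    not-yet-eliminated neighbours; every new edge is one fill-in entry in L.
    """
    adj = {v: set() for v in range(n)}
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    fill = 0
    eliminated = set()
    for v in order:
        nbrs = [u for u in adj[v] if u not in eliminated]
        for a in range(len(nbrs)):
            for b in range(a + 1, len(nbrs)):
                p, q = nbrs[a], nbrs[b]
                if q not in adj[p]:       # a structural zero becomes nonzero
                    adj[p].add(q)
                    adj[q].add(p)
                    fill += 1
        eliminated.add(v)
    return fill


# "Arrow" matrix: vertex 0 is coupled to every other vertex.
arrow_edges = [(0, 1), (0, 2), (0, 3), (0, 4)]
fill_hub_first = count_fill(5, arrow_edges, [0, 1, 2, 3, 4])
fill_hub_last = count_fill(5, arrow_edges, [1, 2, 3, 4, 0])
```

Eliminating the hub first makes the remaining block completely dense, while eliminating it last creates no fill at all, which is why a fill-reducing ordering is computed before the symbolic and numeric phases.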
Kim, Hyunsoo; Park, Haesun
2007-06-15
Many practical pattern recognition problems require non-negativity constraints. For example, pixels in digital images and chemical concentrations in bioinformatics are non-negative. Sparse non-negative matrix factorizations (NMFs) are useful when the degree of sparseness in the non-negative basis matrix or the non-negative coefficient matrix in an NMF needs to be controlled in approximating high-dimensional data in a lower dimensional space. In this article, we introduce a novel formulation of sparse NMF and show how the new formulation leads to a convergent sparse NMF algorithm via alternating non-negativity-constrained least squares. We apply our sparse NMF algorithm to cancer-class discovery and gene expression data analysis and offer biological analysis of the results obtained. Our experimental results illustrate that the proposed sparse NMF algorithm often achieves better clustering performance with shorter computing time compared to other existing NMF algorithms. The software is available as supplementary material.
SPARSKIT: A basic tool kit for sparse matrix computations
NASA Technical Reports Server (NTRS)
Saad, Youcef
1990-01-01
Presented here are the main features of a tool package for manipulating and working with sparse matrices. One of the goals of the package is to provide basic tools to facilitate the exchange of software and data between researchers in sparse matrix computations. The starting point is the Harwell/Boeing collection of matrices for which the authors provide a number of tools. Among other things, the package provides programs for converting data structures, printing simple statistics on a matrix, plotting a matrix profile, and performing linear algebra operations with sparse matrices.
Cao, Buwen; Deng, Shuguang; Qin, Hua; Ding, Pingjian; Chen, Shaopeng; Li, Guanghui
2018-06-15
High-throughput technology has generated large-scale protein interaction data, which is crucial to our understanding of biological organisms. Many complex identification algorithms have been developed to determine protein complexes. However, these methods are only suitable for dense protein interaction networks, because their capabilities decrease rapidly when applied to sparse protein-protein interaction (PPI) networks. In this study, based on penalized matrix decomposition (PMD), a novel method of penalized matrix decomposition for the identification of protein complexes (i.e., PMDpc) was developed to detect protein complexes in the human protein interaction network. This method mainly consists of three steps. First, the adjacency matrix of the protein interaction network is normalized. Second, the normalized matrix is decomposed into three factor matrices. The PMDpc method can detect protein complexes in sparse PPI networks by imposing appropriate constraints on factor matrices. Finally, the results of our method are compared with those of other methods in the human PPI network. Experimental results show that our method can not only outperform classical algorithms, such as CFinder, ClusterONE, RRW, HC-PIN, and PCE-FR, but can also achieve an ideal overall performance in terms of a composite score consisting of F-measure, accuracy (ACC), and the maximum matching ratio (MMR).
1982-10-27
are buried within a much larger, special purpose package. We regret such omissions, but to have reached the practitioners in each of the diverse...sparse matrix (form PAQ) 4. Method of solution: Distribution count sort 5. Programming language: FORTRAN 6. Precision: Single and double precision 7
Sparsistency and Rates of Convergence in Large Covariance Matrix Estimation.
Lam, Clifford; Fan, Jianqing
2009-01-01
This paper studies the sparsistency and rates of convergence for estimating sparse covariance and precision matrices based on penalized likelihood with nonconvex penalty functions. Here, sparsistency refers to the property that all parameters that are zero are actually estimated as zero with probability tending to one. Depending on the application, sparsity may occur a priori in the covariance matrix, its inverse or its Cholesky decomposition. We study these three sparsity exploration problems under a unified framework with a general penalty function. We show that the rates of convergence for these problems under the Frobenius norm are of order $(s_n \log p_n / n)^{1/2}$, where $s_n$ is the number of nonzero elements, $p_n$ is the size of the covariance matrix and $n$ is the sample size. This explicitly shows that the contribution of high dimensionality is merely a logarithmic factor. The conditions on the rate at which the tuning parameter $\lambda_n$ goes to 0 have been made explicit and compared under different penalties. As a result, for the $L_1$-penalty, to guarantee the sparsistency and optimal rate of convergence, the number of nonzero elements should be small: $s_n' = O(p_n)$ at most, among $O(p_n^2)$ parameters, for estimating a sparse covariance or correlation matrix, a sparse precision or inverse correlation matrix, or a sparse Cholesky factor, where $s_n'$ is the number of nonzero elements among the off-diagonal entries. On the other hand, with the SCAD or hard-thresholding penalty functions, there is no such restriction.
Ordering Unstructured Meshes for Sparse Matrix Computations on Leading Parallel Systems
NASA Technical Reports Server (NTRS)
Oliker, Leonid; Li, Xiaoye; Heber, Gerd; Biswas, Rupak
2000-01-01
The ability of computers to solve hitherto intractable problems and simulate complex processes using mathematical models makes them an indispensable part of modern science and engineering. Computer simulations of large-scale realistic applications usually require solving a set of non-linear partial differential equations (PDEs) over a finite region. For example, one thrust area in the DOE Grand Challenge projects is to design future accelerators such as the Spallation Neutron Source (SNS). Our colleagues at SLAC need to model complex RFQ cavities with large aspect ratios. Unstructured grids are currently used to resolve the small features in a large computational domain; dynamic mesh adaptation will be added in the future for additional efficiency. The PDEs for electromagnetics are discretized by the finite element method (FEM), which leads to a generalized eigenvalue problem $Kx = \lambda Mx$, where $K$ and $M$ are the stiffness and mass matrices, and are very sparse. In a typical cavity model, the number of degrees of freedom is about one million. For such large eigenproblems, direct solution techniques quickly reach the memory limits. Instead, the most widely used methods are Krylov subspace methods, such as Lanczos or Jacobi-Davidson. In all the Krylov-based algorithms, sparse matrix-vector multiplication (SPMV) must be performed repeatedly. Therefore, the efficiency of SPMV usually determines the eigensolver speed. SPMV is also one of the most heavily used kernels in large-scale numerical simulations.
Multi-threaded Sparse Matrix-Matrix Multiplication for Many-Core and GPU Architectures.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Deveci, Mehmet; Rajamanickam, Sivasankaran; Trott, Christian Robert
Sparse matrix-matrix multiplication is a key kernel that has applications in several domains such as scientific computing and graph analysis. Several algorithms have been studied in the past for this foundational kernel. In this paper, we develop parallel algorithms for sparse matrix-matrix multiplication with a focus on performance portability across different high performance computing architectures. The performance of these algorithms depends on the data structures used in them. We compare different types of accumulators in these algorithms and demonstrate the performance difference between these data structures. Furthermore, we develop a meta-algorithm, kkSpGEMM, to choose the right algorithm and data structure based on the characteristics of the problem. We show performance comparisons on three architectures and demonstrate the need for the community to develop two-phase sparse matrix-matrix multiplication implementations for efficient reuse of the data structures involved.
Benzi, Michele; Evans, Thomas M.; Hamilton, Steven P.; ...
2017-03-05
Here, we consider hybrid deterministic-stochastic iterative algorithms for the solution of large, sparse linear systems. Starting from a convergent splitting of the coefficient matrix, we analyze various types of Monte Carlo acceleration schemes applied to the original preconditioned Richardson (stationary) iteration. We expect that these methods will have considerable potential for resiliency to faults when implemented on massively parallel machines. We also establish sufficient conditions for the convergence of the hybrid schemes, and we investigate different types of preconditioners including sparse approximate inverses. Numerical experiments on linear systems arising from the discretization of partial differential equations are presented.
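The deterministic building block mentioned above, a preconditioned Richardson (stationary) iteration derived from a convergent splitting, can be sketched as follows. The sketch assumes the simplest splitting, $M = \mathrm{diag}(A)$ (Jacobi), and a hypothetical function name `jacobi_richardson`; the Monte Carlo acceleration studied in the paper is not shown.

```python
def jacobi_richardson(A, b, iterations=200, tol=1e-12):
    """Preconditioned Richardson iteration x <- x + M^{-1} (b - A x).

    A is a list of {col: value} dicts, one per row; M = diag(A) is the
    assumed preconditioner. The iteration converges when the spectral
    radius of I - M^{-1} A is below one, e.g. for strictly diagonally
    dominant A.
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(iterations):
        # Residual r = b - A x, using only stored nonzeros.
        r = [b[i] - sum(a * x[j] for j, a in A[i].items()) for i in range(n)]
        if max(abs(ri) for ri in r) < tol:
            break
        x = [x[i] + r[i] / A[i][i] for i in range(n)]
    return x
```

Other splittings, or a sparse approximate inverse in place of the diagonal, change only the line that applies $M^{-1}$ to the residual.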
Low-rank matrix decomposition and spatio-temporal sparse recovery for STAP radar
Sen, Satyabrata
2015-08-04
We develop space-time adaptive processing (STAP) methods by leveraging the advantages of sparse signal processing techniques in order to detect a slowly-moving target. We observe that the inherent sparse characteristics of a STAP problem can be formulated as the low-rankness of clutter covariance matrix when compared to the total adaptive degrees-of-freedom, and also as the sparse interference spectrum on the spatio-temporal domain. By exploiting these sparse properties, we propose two approaches for estimating the interference covariance matrix. In the first approach, we consider a constrained matrix rank minimization problem (RMP) to decompose the sample covariance matrix into a low-rank positive semidefinite and a diagonal matrix. The solution of RMP is obtained by applying the trace minimization technique and the singular value decomposition with matrix shrinkage operator. Our second approach deals with the atomic norm minimization problem to recover the clutter response-vector that has a sparse support on the spatio-temporal plane. We use convex relaxation based standard sparse-recovery techniques to find the solutions. With extensive numerical examples, we demonstrate the performances of proposed STAP approaches with respect to both the ideal and practical scenarios, involving Doppler-ambiguous clutter ridges, spatial and temporal decorrelation effects. As a result, the low-rank matrix decomposition based solution requires secondary measurements as many as twice the clutter rank to attain a near-ideal STAP performance; whereas the spatio-temporal sparsity based approach needs a considerably small number of secondary data.
The application of nonlinear programming and collocation to optimal aeroassisted orbital transfers
NASA Astrophysics Data System (ADS)
Shi, Y. Y.; Nelson, R. L.; Young, D. H.; Gill, P. E.; Murray, W.; Saunders, M. A.
1992-01-01
Sequential quadratic programming (SQP) and collocation of the differential equations of motion were applied to optimal aeroassisted orbital transfers. The Optimal Trajectory by Implicit Simulation (OTIS) computer program codes with updated nonlinear programming code (NZSOL) were used as a testbed for the SQP nonlinear programming (NLP) algorithms. The state-of-the-art sparse SQP method is considered to be effective for solving large problems with a sparse matrix. Sparse optimizers are characterized in terms of memory requirements and computational efficiency. For the OTIS problems, less than 10 percent of the Jacobian matrix elements are nonzero. The SQP method encompasses two phases: finding an initial feasible point by minimizing the sum of infeasibilities and minimizing the quadratic objective function within the feasible region. The orbital transfer problem under consideration involves the transfer from a high energy orbit to a low energy orbit.
A Fast Gradient Method for Nonnegative Sparse Regression With Self-Dictionary
NASA Astrophysics Data System (ADS)
Gillis, Nicolas; Luce, Robert
2018-01-01
A nonnegative matrix factorization (NMF) can be computed efficiently under the separability assumption, which asserts that all the columns of the given input data matrix belong to the cone generated by a (small) subset of them. The provably most robust methods to identify these conic basis columns are based on nonnegative sparse regression and self dictionaries, and require the solution of large-scale convex optimization problems. In this paper we study a particular nonnegative sparse regression model with self dictionary. As opposed to previously proposed models, this model yields a smooth optimization problem where the sparsity is enforced through linear constraints. We show that the Euclidean projection on the polyhedron defined by these constraints can be computed efficiently, and propose a fast gradient method to solve our model. We compare our algorithm with several state-of-the-art methods on synthetic data sets and real-world hyperspectral images.
Sparse Gaussian elimination with controlled fill-in on a shared memory multiprocessor
NASA Technical Reports Server (NTRS)
Alaghband, Gita; Jordan, Harry F.
1989-01-01
It is shown that in sparse matrices arising from electronic circuits, it is possible to do computations on many diagonal elements simultaneously. A technique for obtaining an ordered compatible set directly from the ordered incompatible table is given. The ordering is based on the Markowitz number of the pivot candidates. This technique generates a set of compatible pivots with the property of generating few fills. A novel heuristic algorithm is presented that combines the idea of an order-compatible set with a limited binary tree search to generate several sets of compatible pivots in linear time. An elimination set for reducing the matrix is generated and selected on the basis of a minimum Markowitz sum number. The parallel pivoting technique presented is a stepwise algorithm and can be applied to any submatrix of the original matrix. Thus, it is not a preordering of the sparse matrix and is applied dynamically as the decomposition proceeds. Parameters are suggested to obtain a balance between parallelism and fill-ins. Results of applying the proposed algorithms on several large application matrices using the HEP multiprocessor (Kowalik, 1985) are presented and analyzed.
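The Markowitz number used above to rank pivot candidates is the product (r - 1)(c - 1), where r and c are the nonzero counts of the candidate's row and column; it bounds the fill a pivot can create. The sketch below computes it for every nonzero of a small dense array. The function name and the example matrix are illustrative assumptions, and the compatibility test and tree search of the abstract are not reproduced.

```python
import numpy as np

def markowitz_counts(A):
    """Return {(i, j): (r_i - 1) * (c_j - 1)} over the nonzero entries of A."""
    pattern = A != 0
    row_nnz = pattern.sum(axis=1)          # nonzeros per row
    col_nnz = pattern.sum(axis=0)          # nonzeros per column
    return {(int(i), int(j)): int((row_nnz[i] - 1) * (col_nnz[j] - 1))
            for i, j in zip(*np.nonzero(pattern))}

A = np.array([[2.0, 0.0, 1.0],
              [0.0, 3.0, 0.0],
              [1.0, 0.0, 4.0]])
counts = markowitz_counts(A)
best = min(counts, key=counts.get)         # pivot candidate with the smallest count
```

In this example the entry at (1, 1) has count zero, so eliminating it first creates no fill.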
Structural performance analysis and redesign
NASA Technical Reports Server (NTRS)
Whetstone, W. D.
1978-01-01
Program performs stress, buckling, and vibrational analysis of large, linear, finite-element systems in excess of 50,000 degrees of freedom. Cost, execution time, and storage requirements are kept reasonable through use of sparse matrix solution techniques, and other computational and data management procedures designed for problems of very large size.
Sparse Matrix for ECG Identification with Two-Lead Features.
Tseng, Kuo-Kun; Luo, Jiao; Hegarty, Robert; Wang, Wenmin; Haiting, Dong
2015-01-01
Electrocardiograph (ECG) human identification has the potential to improve biometric security. However, improvements in ECG identification and feature extraction are required. Previous work has focused on single-lead ECG signals. Our work proposes a new algorithm for human identification by mapping two-lead ECG signals onto a two-dimensional matrix and then employing a sparse matrix method to process the matrix. This is the first application of sparse matrix techniques to ECG identification. Moreover, the results of our experiments demonstrate the benefits of our approach over existing methods.
An overview of NSPCG: A nonsymmetric preconditioned conjugate gradient package
NASA Astrophysics Data System (ADS)
Oppe, Thomas C.; Joubert, Wayne D.; Kincaid, David R.
1989-05-01
The most recent research-oriented software package developed as part of the ITPACK Project is called "NSPCG" since it contains many nonsymmetric preconditioned conjugate gradient procedures. It is designed to solve large sparse systems of linear algebraic equations by a variety of different iterative methods. One of the main purposes for the development of the package is to provide a common modular structure for research on iterative methods for nonsymmetric matrices. Another purpose for the development of the package is to investigate the suitability of several iterative methods for vector computers. Since the vectorizability of an iterative method depends greatly on the matrix structure, NSPCG allows great flexibility in the operator representation. The coefficient matrix can be passed in one of several different matrix data storage schemes. These sparse data formats allow matrices with a wide range of structures from highly structured ones such as those with all nonzeros along a relatively small number of diagonals to completely unstructured sparse matrices. Alternatively, the package allows the user to call the accelerators directly with user-supplied routines for performing certain matrix operations. In this case, one can use the data format from an application program and not be required to copy the matrix into one of the package formats. This is particularly advantageous when memory space is limited. Some of the basic preconditioners that are available are point methods such as Jacobi, Incomplete LU Decomposition and Symmetric Successive Overrelaxation as well as block and multicolor preconditioners. The user can select from a large collection of accelerators such as Conjugate Gradient (CG), Chebyshev (SI, for semi-iterative), Generalized Minimal Residual (GMRES), Biconjugate Gradient Squared (BCGS) and many others. The package is modular so that almost any accelerator can be used with almost any preconditioner.
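For matrices whose nonzeros lie along a small number of diagonals, a diagonal storage scheme keeps one array per diagonal. The sketch below shows a matrix-vector product in such a scheme. The layout (entry `diags[k][i]` holds A[i, i + offsets[k]], with unused positions padded) is an assumption made for illustration and is not taken from the NSPCG documentation.

```python
import numpy as np

def dia_matvec(offsets, diags, x):
    """y = A @ x where diags[k][i] stores A[i, i + offsets[k]] (padding is ignored)."""
    n = len(x)
    y = np.zeros(n)
    for off, d in zip(offsets, diags):
        lo = max(0, -off)                  # first row whose column i + off is valid
        hi = min(n, n - off)               # one past the last such row
        y[lo:hi] += d[lo:hi] * x[lo + off:hi + off]
    return y

n = 4
offsets = [-1, 0, 1]                       # sub-, main, and super-diagonal
diags = [np.full(n, -1.0), np.full(n, 2.0), np.full(n, -1.0)]
x = np.arange(1.0, n + 1)
y = dia_matvec(offsets, diags, x)
```

Each diagonal contributes one contiguous vector operation, which is the property that makes this kind of format attractive on vector computers.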
Exact recovery of sparse multiple measurement vectors by [Formula: see text]-minimization.
Wang, Changlong; Peng, Jigen
2018-01-01
The joint sparse recovery problem is a generalization of the single measurement vector problem widely studied in compressed sensing. It aims to recover a set of jointly sparse vectors, i.e., those that have nonzero entries concentrated at a common location. Meanwhile [Formula: see text]-minimization subject to matrixes is widely used in a large number of algorithms designed for this problem, i.e., [Formula: see text]-minimization [Formula: see text] Therefore the main contribution in this paper is two theoretical results about this technique. The first one is proving that in every multiple system of linear equations there exists a constant [Formula: see text] such that the original unique sparse solution also can be recovered from a minimization in [Formula: see text] quasi-norm subject to matrixes whenever [Formula: see text]. The other one is showing an analytic expression of such [Formula: see text]. Finally, we display the results of one example to confirm the validity of our conclusions, and we use some numerical experiments to show that we increase the efficiency of these algorithms designed for [Formula: see text]-minimization by using our results.
Saravanan, Chandra; Shao, Yihan; Baer, Roi; Ross, Philip N; Head-Gordon, Martin
2003-04-15
A sparse matrix multiplication scheme with multiatom blocks is reported, a tool that can be very useful for developing linear-scaling methods with atom-centered basis functions. Compared to conventional element-by-element sparse matrix multiplication schemes, efficiency is gained by the use of the highly optimized basic linear algebra subroutines (BLAS). However, some sparsity is lost in the multiatom blocking scheme because these matrix blocks will in general contain negligible elements. As a result, an optimal block size that minimizes the CPU time by balancing these two effects is recovered. In calculations on linear alkanes, polyglycines, estane polymers, and water clusters the optimal block size is found to be between 40 and 100 basis functions, where about 55-75% of the machine peak performance was achieved on an IBM RS6000 workstation. In these calculations, the blocked sparse matrix multiplications can be 10 times faster than a standard element-by-element sparse matrix package. Copyright 2003 Wiley Periodicals, Inc. J Comput Chem 24: 618-622, 2003
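The trade-off described above, dense block products in exchange for some lost sparsity, can be sketched with a block-sparse product in which only the stored blocks are multiplied. The dictionary-of-blocks storage, the fixed block size, and the helper names are assumptions for illustration; the multiatom blocking of the paper is not reproduced.

```python
import numpy as np

def block_sparse_matmul(A_blocks, B_blocks):
    """Multiply block-sparse matrices stored as {(block_row, block_col): ndarray}."""
    C_blocks = {}
    for (i, k), a in A_blocks.items():
        for (k2, j), b in B_blocks.items():
            if k == k2:
                prod = a @ b               # dense block product (NumPy typically calls BLAS)
                if (i, j) in C_blocks:
                    C_blocks[(i, j)] += prod
                else:
                    C_blocks[(i, j)] = prod
    return C_blocks

def to_dense(blocks, n_blocks, bs):
    """Assemble a dense matrix from the stored blocks (for checking only)."""
    M = np.zeros((n_blocks * bs, n_blocks * bs))
    for (i, j), blk in blocks.items():
        M[i * bs:(i + 1) * bs, j * bs:(j + 1) * bs] = blk
    return M

bs = 2
A_blocks = {(0, 0): np.eye(bs), (1, 1): 2.0 * np.eye(bs)}
B_blocks = {(0, 1): np.ones((bs, bs)), (1, 1): np.ones((bs, bs))}
C_blocks = block_sparse_matmul(A_blocks, B_blocks)
```

Larger blocks make each dense product more efficient but store more negligible elements, which is the balance the abstract refers to.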
NASA Technical Reports Server (NTRS)
Nguyen, Duc T.; Mohammed, Ahmed Ali; Kadiam, Subhash
2010-01-01
Solving large (and sparse) systems of simultaneous linear equations has been (and continues to be) a major challenge for many real-world engineering/science applications [1-2]. For many practical/large-scale problems, the sparse, Symmetrical and Positive Definite (SPD) system of linear equations can be conveniently represented in matrix notation as [A] {x} = {b}, where the square coefficient matrix [A] and the Right-Hand-Side (RHS) vector {b} are known. The unknown solution vector {x} can be efficiently computed by the following step-by-step procedure [1-2]: Reordering phase, Matrix Factorization phase, Forward solution phase, and Backward solution phase. In this research work, a Game-Based Learning (GBL) approach has been developed to help engineering students understand crucial details of the matrix reordering and factorization phases. A "chess-like" game has been developed and can be played by either a single player or two players. Through this "chess-like" open-ended game, the players/learners will not only understand the key concepts involved in reordering algorithms (based on existing algorithms), but also have the opportunity to "discover new algorithms" which are better than existing algorithms. Implementing the proposed "chess-like" game for the matrix reordering and factorization phases can be enhanced by FLASH [3] computer environments, where computer simulation with animated human voice, sound effects, visual/graphical/colorful displays of matrix tables, score (or monetary) awards for the best game players, etc. can all be exploited. Preliminary demonstrations of the developed GBL approach can be viewed by anyone who has access to the internet web-site [4].
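The four phases listed above can be sketched on a small dense example. The code below is an illustration under stated assumptions: the permutation is supplied by hand rather than computed by a reordering algorithm, dense NumPy arrays stand in for sparse storage, and the function name `solve_spd` is hypothetical.

```python
import numpy as np

def solve_spd(A, b, perm):
    """Solve A x = b by reordering, Cholesky factorization, and two triangular solves."""
    P = np.asarray(perm)
    Ap = A[np.ix_(P, P)]                   # reordering phase: symmetric permutation
    L = np.linalg.cholesky(Ap)             # factorization phase: Ap = L @ L.T
    n = len(b)
    bp = b[P]
    y = np.zeros(n)
    for i in range(n):                     # forward solution phase: L y = bp
        y[i] = (bp[i] - L[i, :i] @ y[:i]) / L[i, i]
    z = np.zeros(n)
    for i in reversed(range(n)):           # backward solution phase: L.T z = y
        z[i] = (y[i] - L[i + 1:, i] @ z[i + 1:]) / L[i, i]
    x = np.empty(n)
    x[P] = z                               # undo the permutation
    return x, L

# "Arrow" matrix: the first row and column are dense, the rest is diagonal.
A = np.array([[4.0, 1.0, 1.0, 1.0],
              [1.0, 3.0, 0.0, 0.0],
              [1.0, 0.0, 3.0, 0.0],
              [1.0, 0.0, 0.0, 3.0]])
b = np.ones(4)
x_nat, L_nat = solve_spd(A, b, [0, 1, 2, 3])   # natural ordering
x_reo, L_reo = solve_spd(A, b, [1, 2, 3, 0])   # dense row/column moved last
fill_nat = np.count_nonzero(np.abs(L_nat) > 1e-12)
fill_reo = np.count_nonzero(np.abs(L_reo) > 1e-12)
```

Both orderings give the same solution, but the reordered factor has fewer nonzeros, which is the effect a reordering phase aims for.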
AZTEC. Parallel Iterative method Software for Solving Linear Systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hutchinson, S.; Shadid, J.; Tuminaro, R.
1995-07-01
AZTEC is an iterative library that greatly simplifies the parallelization process when solving the linear system of equations Ax=b, where A is a user-supplied n x n sparse matrix, b is a user-supplied vector of length n, and x is a vector of length n to be computed. AZTEC is intended as a software tool for users who want to avoid cumbersome parallel programming details but who have large sparse linear systems that require an efficiently utilized parallel processing system. A collection of data transformation tools is provided that allows for easy creation of distributed sparse unstructured matrices for parallel solutions.
Disentangling giant component and finite cluster contributions in sparse random matrix spectra.
Kühn, Reimer
2016-04-01
We describe a method for disentangling giant component and finite cluster contributions to sparse random matrix spectra, using sparse symmetric random matrices defined on Erdős-Rényi graphs as an example and test bed. Our methods apply to sparse matrices defined in terms of arbitrary graphs in the configuration model class, as long as they have finite mean degree.
Sparse distributed memory and related models
NASA Technical Reports Server (NTRS)
Kanerva, Pentti
1992-01-01
Described here is sparse distributed memory (SDM) as a neural-net associative memory. It is characterized by two weight matrices and by a large internal dimension - the number of hidden units is much larger than the number of input or output units. The first matrix, A, is fixed and possibly random, and the second matrix, C, is modifiable. The SDM is compared and contrasted to (1) computer memory, (2) correlation-matrix memory, (3) feed-forward artificial neural network, (4) cortex of the cerebellum, (5) Marr and Albus models of the cerebellum, and (6) Albus' cerebellar model arithmetic computer (CMAC). Several variations of the basic SDM design are discussed: the selected-coordinate and hyperplane designs of Jaeckel, the pseudorandom associative neural memory of Hassoun, and SDM with real-valued input variables by Prager and Fallside. SDM research conducted mainly at the Research Institute for Advanced Computer Science (RIACS) in 1986-1991 is highlighted.
Biclustering sparse binary genomic data.
van Uitert, Miranda; Meuleman, Wouter; Wessels, Lodewyk
2008-12-01
Genomic datasets often consist of large, binary, sparse data matrices. In such a dataset, one is often interested in finding contiguous blocks that (mostly) contain ones. This is a biclustering problem, and while many algorithms have been proposed to deal with gene expression data, only two algorithms have been proposed that specifically deal with binary matrices. None of the gene expression biclustering algorithms can handle the large number of zeros in sparse binary matrices. The two proposed binary algorithms failed to produce meaningful results. In this article, we present a new algorithm that is able to extract biclusters from sparse, binary datasets. A powerful feature is that biclusters with different numbers of rows and columns can be detected, varying from many rows to few columns and few rows to many columns. It allows the user to guide the search towards biclusters of specific dimensions. When applying our algorithm to an input matrix derived from TRANSFAC, we find transcription factors with distinctly dissimilar binding motifs, but a clear set of common targets that are significantly enriched for GO categories.
NASA Technical Reports Server (NTRS)
Cannone, Jaime J.; Barnes, Cindy L.; Achari, Aniruddha; Kundrot, Craig E.; Whitaker, Ann F. (Technical Monitor)
2001-01-01
The Sparse Matrix approach for obtaining lead crystallization conditions has proven to be very fruitful for the crystallization of proteins and nucleic acids. Here we report a Sparse Matrix developed specifically for the crystallization of protein-DNA complexes. This method is rapid and economical, typically requiring 2.5 mg of complex to test 48 conditions. The method was originally developed to crystallize basic fibroblast growth factor (bFGF) complexed with DNA sequences identified through in vitro selection, or SELEX, methods. Two DNA aptamers that bind with approximately nanomolar affinity and inhibit the angiogenic properties of bFGF were selected for co-crystallization. The Sparse Matrix produced lead crystallization conditions for both bFGF-DNA complexes.
High-SNR spectrum measurement based on Hadamard encoding and sparse reconstruction
NASA Astrophysics Data System (ADS)
Wang, Zhaoxin; Yue, Jiang; Han, Jing; Li, Long; Jin, Yong; Gao, Yuan; Li, Baoming
2017-12-01
The denoising capabilities of the H-matrix and cyclic S-matrix based on the sparse reconstruction, employed in the Pixel of Focal Plane Coded Visible Spectrometer for spectrum measurement are investigated, where the spectrum is sparse in a known basis. In the measurement process, the digital micromirror device plays an important role, which implements the Hadamard coding. In contrast with Hadamard transform spectrometry, based on the shift invariability, this spectrometer may have the advantage of a high efficiency. Simulations and experiments show that the nonlinear solution with a sparse reconstruction has a better signal-to-noise ratio than the linear solution and the H-matrix outperforms the cyclic S-matrix whether the reconstruction method is nonlinear or linear.
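The linear solution referred to above can be illustrated with a Hadamard-encoded measurement and its direct inversion. The sketch below uses the Sylvester construction and a noise-free toy spectrum; both are assumptions for the example, and neither the cyclic S-matrix nor the sparse (nonlinear) reconstruction is shown.

```python
import numpy as np

def hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix; n must be a power of two."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

n = 8
H = hadamard(n)
x = np.zeros(n)
x[2] = 5.0                                 # toy spectrum with a single line
y = H @ x                                  # encoded measurements (no noise added here)
x_hat = H.T @ y / n                        # linear decoding, using H @ H.T = n * I
```

Because the rows of H are mutually orthogonal, the linear decoding step needs only a transpose and a scaling.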
Computing row and column counts for sparse QR and LU factorization
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gilbert, John R.; Li, Xiaoye S.; Ng, Esmond G.
2001-01-01
We present algorithms to determine the number of nonzeros in each row and column of the factors of a sparse matrix, for both the QR factorization and the LU factorization with partial pivoting. The algorithms use only the nonzero structure of the input matrix, and run in time nearly linear in the number of nonzeros in that matrix. They may be used to set up data structures or schedule parallel operations in advance of the numerical factorization. The row and column counts we compute are upper bounds on the actual counts. If the input matrix is strong Hall and there is no coincidental numerical cancellation, the counts are exact for QR factorization and are the tightest bounds possible for LU factorization. These algorithms are based on our earlier work on computing row and column counts for sparse Cholesky factorization, plus an efficient method to compute the column elimination tree of a sparse matrix without explicitly forming the product of the matrix and its transpose.
Sparse nonnegative matrix factorization with ℓ0-constraints
Peharz, Robert; Pernkopf, Franz
2012-01-01
Although nonnegative matrix factorization (NMF) favors a sparse and part-based representation of nonnegative data, there is no guarantee for this behavior. Several authors proposed NMF methods which enforce sparseness by constraining or penalizing the ℓ1-norm of the factor matrices. On the other hand, little work has been done using a more natural sparseness measure, the ℓ0-pseudo-norm. In this paper, we propose a framework for approximate NMF which constrains the ℓ0-norm of the basis matrix, or the coefficient matrix, respectively. For this purpose, techniques for unconstrained NMF can be easily incorporated, such as multiplicative update rules, or the alternating nonnegative least-squares scheme. In experiments we demonstrate the benefits of our methods, which compare to, or outperform existing approaches. PMID:22505792
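An ℓ0 constraint on a factor matrix limits the number of nonzero entries, for example per column. One simple way to enforce such a constraint is to keep only the k largest entries of each column, as sketched below. This is a generic projection step written for illustration; the function name is hypothetical and the abstract does not state that the proposed framework uses exactly this step.

```python
import numpy as np

def project_l0_columns(W, k):
    """Zero all but the k largest entries in each column of a nonnegative matrix."""
    W = np.asarray(W, dtype=float)
    out = np.zeros_like(W)
    idx = np.argsort(W, axis=0)[-k:, :]    # row indices of the k largest entries per column
    cols = np.arange(W.shape[1])
    out[idx, cols] = W[idx, cols]          # copy only the selected entries
    return out

W = np.array([[0.5, 0.1],
              [0.2, 0.9],
              [0.7, 0.3]])
W_sparse = project_l0_columns(W, k=1)
```

After the projection every column has at most k nonzeros, so its ℓ0 pseudo-norm is bounded by k.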
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rouet, François-Henry; Li, Xiaoye S.; Ghysels, Pieter; ...
2016-06-30
In this paper, we present a distributed-memory library for computations with dense structured matrices. A matrix is considered structured if its off-diagonal blocks can be approximated by a rank-deficient matrix with low numerical rank. Here, we use Hierarchically Semi-Separable (HSS) representations. Such matrices appear in many applications, for example, finite-element methods, boundary element methods, and so on. Exploiting this structure allows for fast solution of linear systems and/or fast computation of matrix-vector products, which are the two main building blocks of matrix computations. The compression algorithm that we use, which computes the HSS form of an input dense matrix, relies on randomized sampling with a novel adaptive sampling mechanism. We discuss the parallelization of this algorithm and also present the parallelization of structured matrix-vector product, structured factorization, and solution routines. The efficiency of the approach is demonstrated on large problems from different academic and industrial applications, on up to 8,000 cores. Finally, this work is part of a more global effort, the STRUctured Matrices PACKage (STRUMPACK) software package for computations with sparse and dense structured matrices. Hence, although useful in their own right, the routines also represent a step in the direction of a distributed-memory sparse solver.
A Shifted Block Lanczos Algorithm 1: The Block Recurrence
NASA Technical Reports Server (NTRS)
Grimes, Roger G.; Lewis, John G.; Simon, Horst D.
1990-01-01
In this paper we describe a block Lanczos algorithm that is used as the key building block of a software package for the extraction of eigenvalues and eigenvectors of large sparse symmetric generalized eigenproblems. The software package comprises: a version of the block Lanczos algorithm specialized for spectrally transformed eigenproblems; an adaptive strategy for choosing shifts, and efficient codes for factoring large sparse symmetric indefinite matrices. This paper describes the algorithmic details of our block Lanczos recurrence. This uses a novel combination of block generalizations of several features that have only been investigated independently in the past. In particular new forms of partial reorthogonalization, selective reorthogonalization and local reorthogonalization are used, as is a new algorithm for obtaining the M-orthogonal factorization of a matrix. The heuristic shifting strategy, the integration with sparse linear equation solvers and numerical experience with the code are described in a companion paper.
Multitasking the Davidson algorithm for the large, sparse eigenvalue problem
DOE Office of Scientific and Technical Information (OSTI.GOV)
Umar, V.M.; Fischer, C.F.
1989-01-01
The authors report how the Davidson algorithm, developed for handling the eigenvalue problem for large and sparse matrices arising in quantum chemistry, was modified for use in atomic structure calculations. To date these calculations have used traditional eigenvalue methods, which limit the range of feasible calculations because of their excessive memory requirements and unsatisfactory performance attributed to time-consuming and costly processing of zero-valued elements. The replacement of a traditional matrix eigenvalue method by the Davidson algorithm reduced these limitations. Significant speedup was found, which varied with the size of the underlying problem and its sparsity. Furthermore, the range of matrix sizes that can be manipulated efficiently was expanded by more than one order of magnitude. On the CRAY X-MP the code was vectorized and the importance of gather/scatter analyzed. A parallelized version of the algorithm obtained an additional 35% reduction in execution time. Speedup due to vectorization and concurrency was also measured on the Alliant FX/8.
Three dimensional iterative beam propagation method for optical waveguide devices
NASA Astrophysics Data System (ADS)
Ma, Changbao; Van Keuren, Edward
2006-10-01
The finite difference beam propagation method (FD-BPM) is an effective model for simulating a wide range of optical waveguide structures. The classical FD-BPMs are based on the Crank-Nicolson scheme, and in tridiagonal form can be solved using the Thomas method. We present a different type of algorithm for 3-D structures. In this algorithm, the wave equation is formulated into a large sparse matrix equation which can be solved using iterative methods. The simulation window shifting scheme and threshold technique introduced in our earlier work are utilized to overcome the convergence problem of iterative methods for large sparse matrix equations and wide-angle simulations. This method enables us to develop higher-order 3-D wide-angle (WA-) BPMs based on Pade approximant operators and the multistep method, which are commonly used in WA-BPMs for 2-D structures. Simulations using the new methods will be compared to the analytical results to assure their effectiveness and applicability.
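The Thomas method named above is Gaussian elimination specialized to tridiagonal systems: one forward sweep followed by one back substitution. The sketch below is a generic version without pivoting, so it assumes a well-conditioned (for example diagonally dominant) system; complex arithmetic is used because beam propagation fields are complex in general. The array layout is an illustrative convention, not taken from the paper.

```python
import numpy as np

def thomas(a, b, c, d):
    """Solve a tridiagonal system.

    a: sub-diagonal (a[0] unused), b: diagonal,
    c: super-diagonal (c[-1] unused), d: right-hand side.
    """
    n = len(d)
    cp = np.zeros(n, dtype=complex)
    dp = np.zeros(n, dtype=complex)
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):                  # forward elimination sweep
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    x = np.zeros(n, dtype=complex)
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):         # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

n = 5
a = np.full(n, -1.0)
b = np.full(n, 2.0)
c = np.full(n, -1.0)
d = np.ones(n)
x = thomas(a, b, c, d)
```

The cost is linear in the number of unknowns, which is why the tridiagonal form is attractive for 2-D structures.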
NASA Astrophysics Data System (ADS)
Ma, Sangback
In this paper we compare various parallel preconditioners such as Point-SSOR (Symmetric Successive OverRelaxation), ILU(0) (Incomplete LU) in the Wavefront ordering, ILU(0) in the Multi-color ordering, Multi-Color Block SOR (Successive OverRelaxation), SPAI (SParse Approximate Inverse) and pARMS (Parallel Algebraic Recursive Multilevel Solver) for solving large sparse linear systems arising from two-dimensional PDEs (partial differential equations) on structured grids. Point-SSOR is well-known, and ILU(0) is one of the most popular preconditioners, but it is inherently serial. ILU(0) in the Wavefront ordering maximizes the parallelism in the natural order, but the lengths of the wavefronts are often nonuniform. ILU(0) in the Multi-color ordering is a simple way of achieving a parallelism of the order N, where N is the order of the matrix, but its convergence rate often deteriorates as compared to that of the natural ordering. We have chosen the Multi-Color Block SOR preconditioner combined with a direct sparse matrix solver, since for the Laplacian matrix the SOR method is known to have a nondeteriorating rate of convergence when used with the Multi-Color ordering. By using the block version we expect to minimize the interprocessor communications. SPAI computes the sparse approximate inverse directly by a least squares method. Finally, ARMS is a preconditioner recursively exploiting the concept of independent sets and pARMS is the parallel version of ARMS. Experiments were conducted for the Finite Difference and Finite Element discretizations of five two-dimensional PDEs with large mesh sizes of up to a million on an IBM p595 machine with distributed memory. Our matrices are real positive, i.e., the real parts of their eigenvalues are positive. We have used GMRES(m) as our outer iterative method, so that the convergence of GMRES(m) for our test matrices is mathematically guaranteed. Interprocessor communications were done using MPI (Message Passing Interface) primitives.
The results show that in general ILU(0) in the Multi-Color ordering and ILU(0) in the Wavefront ordering outperform the other methods, but for symmetric and nearly symmetric 5-point matrices Multi-Color Block SOR gives the best performance, except for a few cases with a small number of processors.
NASA Astrophysics Data System (ADS)
Takasaki, Koichi
This paper presents a program for the multidisciplinary optimization and identification problem of the nonlinear model of large aerospace vehicle structures. The program constructs the global matrix of the dynamic system in the time direction by the p-version finite element method (pFEM), and the basic matrix for each pFEM node in the time direction is described by a sparse matrix similarly to the static finite element problem. The algorithm used by the program does not require the Hessian matrix of the objective function and so has low memory requirements. It also has a relatively low computational cost, and is suited to parallel computation. The program was integrated as a solver module of the multidisciplinary analysis system CUMuLOUS (Computational Utility for Multidisciplinary Large scale Optimization of Undense System) which is under development by the Aerospace Research and Development Directorate (ARD) of the Japan Aerospace Exploration Agency (JAXA).
Parallel pivoting combined with parallel reduction
NASA Technical Reports Server (NTRS)
Alaghband, Gita
1987-01-01
Parallel algorithms for triangularization of large, sparse, and unsymmetric matrices are presented. The method combines the parallel reduction with a new parallel pivoting technique, control over generations of fill-ins and a check for numerical stability, all done in parallel with the work being distributed over the active processes. The parallel technique uses the compatibility relation between pivots to identify parallel pivot candidates and uses the Markowitz number of pivots to minimize fill-in. This technique is not a preordering of the sparse matrix and is applied dynamically as the decomposition proceeds.
An implementation of the look-ahead Lanczos algorithm for non-Hermitian matrices
NASA Technical Reports Server (NTRS)
Freund, Roland W.; Gutknecht, Martin H.; Nachtigal, Noel M.
1991-01-01
The nonsymmetric Lanczos method can be used to compute eigenvalues of large sparse non-Hermitian matrices or to solve large sparse non-Hermitian linear systems. However, the original Lanczos algorithm is susceptible to possible breakdowns and potential instabilities. An implementation is presented of a look-ahead version of the Lanczos algorithm that, except for the very special situation of an incurable breakdown, overcomes these problems by skipping over those steps in which a breakdown or near-breakdown would occur in the standard process. The proposed algorithm can handle look-ahead steps of any length and requires the same number of matrix-vector products and inner products as the standard Lanczos process without look-ahead.
Systematic sparse matrix error control for linear scaling electronic structure calculations.
Rubensson, Emanuel H; Sałek, Paweł
2005-11-30
Efficient truncation criteria used in multiatom blocked sparse matrix operations for ab initio calculations are proposed. As system size increases, so does the need to stay on top of errors and still achieve high performance. A variant of a blocked sparse matrix algebra to achieve strict error control with good performance is proposed. The presented idea is that the condition to drop a certain submatrix should depend not only on the magnitude of that particular submatrix, but also on which other submatrices are dropped. The decision to remove a certain submatrix is based on the contribution the removal would make to the error in the chosen norm. We study the effect of an accumulated truncation error in iterative algorithms like trace correcting density matrix purification. One way to reduce the initial exponential growth of this error is presented. The presented error control for a sparse blocked matrix toolbox allows for achieving optimal performance by performing only the operations needed to maintain the requested level of accuracy. Copyright 2005 Wiley Periodicals, Inc.
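The idea that dropping one submatrix should depend on what else has been dropped can be sketched with a cumulative criterion. The example below assumes the Frobenius norm as the chosen norm, for which the total error of a set of dropped blocks is the square root of the sum of their squared norms; the function name and the data are hypothetical and the paper's actual criteria may differ.

```python
import numpy as np

def truncate_blocks(blocks, tol):
    """Drop smallest-norm blocks while the Frobenius norm of all dropped blocks stays <= tol."""
    norms = sorted((np.linalg.norm(blk), key) for key, blk in blocks.items())
    kept = dict(blocks)
    dropped_sq = 0.0
    for nrm, key in norms:                 # visit blocks from smallest to largest norm
        if dropped_sq + nrm ** 2 > tol ** 2:
            break                          # removing this block would exceed the budget
        dropped_sq += nrm ** 2
        del kept[key]
    return kept, np.sqrt(dropped_sq)

blocks = {
    (0, 0): 0.005 * np.ones((2, 2)),       # Frobenius norm 0.01
    (0, 1): 0.01 * np.ones((2, 2)),        # Frobenius norm 0.02
    (1, 1): 0.5 * np.ones((2, 2)),         # Frobenius norm 1.0
}
kept, err = truncate_blocks(blocks, tol=0.02)
```

A per-block threshold of 0.02 would remove both small blocks and produce a total error of about 0.0224, which is above the tolerance; the cumulative rule removes only the smallest block and keeps the error within the requested bound.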
Method and apparatus for optimized processing of sparse matrices
Taylor, Valerie E.
1993-01-01
A computer architecture for processing a sparse matrix is disclosed. The apparatus stores a value-row vector corresponding to nonzero values of a sparse matrix. Each of the nonzero values is located at a defined row and column position in the matrix. The value-row vector includes a first vector including nonzero values and delimiting characters indicating a transition from one column to another. The value-row vector also includes a second vector which defines row position values in the matrix corresponding to the nonzero values in the first vector and column position values in the matrix corresponding to the column position of the nonzero values in the first vector. The architecture also includes a circuit for detecting a special character within the value-row vector. Matrix-vector multiplication is executed on the value-row vector. This multiplication is performed by multiplying an index value of the first vector value by a column value from a second matrix to form a matrix-vector product which is added to a previous matrix-vector product.
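A column-ordered value stream with delimiters, as described above, can be mimicked in software. The sketch below is a loose illustration only: the choice of `None` as the delimiter, the alignment of the row vector with the nonzero values, and the function name are all assumptions, and the code does not model the disclosed circuit.

```python
DELIM = None  # hypothetical marker meaning "move to the next column"

def matvec_delimited(values, rows, x, n_rows):
    """y = A @ x for a matrix stored column by column with delimiters between columns."""
    y = [0.0] * n_rows
    col = 0
    row_iter = iter(rows)                  # one row index per nonzero value
    for v in values:
        if v is DELIM:
            col += 1                       # transition from one column to the next
            continue
        y[next(row_iter)] += v * x[col]    # accumulate the partial product
    return y

# A = [[1, 0],
#      [2, 3]] stored column by column.
values = [1.0, 2.0, DELIM, 3.0]
rows = [0, 1, 1]
x = [10.0, 100.0]
y = matvec_delimited(values, rows, x, n_rows=2)
```

Each nonzero is scaled by the vector entry of its column and added to the running sum for its row, mirroring the accumulate step in the description.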
Efficient Implementation of an Optimal Interpolator for Large Spatial Data Sets
NASA Technical Reports Server (NTRS)
Memarsadeghi, Nargess; Mount, David M.
2007-01-01
Scattered data interpolation is a problem of interest in numerous areas such as electronic imaging, smooth surface modeling, and computational geometry. Our motivation arises from applications in geology and mining, which often involve large scattered data sets and a demand for high accuracy. The method of choice is ordinary kriging, because it is a best unbiased estimator. Unfortunately, this interpolant is computationally very expensive to compute exactly. For n scattered data points, computing the value of a single interpolant involves solving a dense linear system of size roughly n x n. This is infeasible for large n. In practice, kriging is solved approximately by local approaches that are based on considering only a relatively small number of points that lie close to the query point. There are many problems with this local approach, however. The first is that determining the proper neighborhood size is tricky, and is usually solved by ad hoc methods such as selecting a fixed number of nearest neighbors or all the points lying within a fixed radius. Such fixed neighborhood sizes may not work well for all query points, depending on local density of the point distribution. Local methods also suffer from the problem that the resulting interpolant is not continuous. Meyer showed that while kriging produces smooth continuous surfaces, it has zero order continuity along its borders. Thus, at interface boundaries where the neighborhood changes, the interpolant behaves discontinuously. Therefore, it is important to consider and solve the global system for each interpolant. However, solving such large dense systems for each query point is impractical. Recently a more principled approach to approximating kriging has been proposed based on a technique called covariance tapering. The problems arise from the fact that the covariance functions that are used in kriging have global support.
Our implementations combine, utilize, and enhance a number of different approaches that have been introduced in literature for solving large linear systems for interpolation of scattered data points. For very large systems, exact methods such as Gaussian elimination are impractical since they require $O(n^3)$ time and $O(n^2)$ storage. As Billings et al. suggested, we use an iterative approach. In particular, we use the SYMMLQ method for solving the large but sparse ordinary kriging systems that result from tapering. The main technical issue that needs to be overcome in our algorithmic solution is that the points' covariance matrix for kriging should be symmetric positive definite. The goal of tapering is to obtain a sparse approximate representation of the covariance matrix while maintaining its positive definiteness. Furrer et al. used tapering to obtain a sparse linear system of the form Ax = b, where A is the tapered symmetric positive definite covariance matrix. Thus, Cholesky factorization could be used to solve their linear systems. They implemented an efficient sparse Cholesky decomposition method. They also showed that if these tapers are used for a limited class of covariance models, the solution of the system converges to the solution of the original system. Matrix A in the ordinary kriging system, while symmetric, is not positive definite. Thus, their approach is not applicable to the ordinary kriging system. Therefore, we use tapering only to obtain a sparse linear system. Then, we use SYMMLQ to solve the ordinary kriging system. We show that solving large kriging systems becomes practical via tapering and iterative methods, and results in lower estimation errors compared to traditional local approaches, and significant memory savings compared to the original global system. We also developed a more efficient variant of the sparse SYMMLQ method for large ordinary kriging systems.
This approach adaptively finds the correct local neighborhood for each query point in the interpolation process.
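To make the tapering step in this abstract concrete, here is a minimal sketch of how a tapered ordinary kriging matrix might be assembled. The exponential covariance model, the spherical taper, the parameter values, and the function names are all assumptions chosen for illustration; the abstract does not say which covariance or taper the authors used. The solver itself (SYMMLQ) is not shown.

```python
import math

def spherical_taper(h, theta):
    """Compactly supported taper: exactly zero for separations h >= theta."""
    if h >= theta:
        return 0.0
    x = h / theta
    return (1.0 - x) ** 2 * (1.0 + 0.5 * x)

def tapered_kriging_matrix(points, cov_range, theta):
    """Assemble the ordinary kriging matrix [[C, 1], [1^T, 0]] as a dict of nonzeros.

    C is an (assumed) exponential covariance multiplied entrywise by the taper,
    so pairs of points farther apart than theta contribute no stored entry.
    """
    n = len(points)
    entries = {}
    # Naive O(n^2) pair loop for clarity; a spatial index would avoid it in practice.
    for i in range(n):
        for j in range(n):
            h = math.dist(points[i], points[j])
            c = math.exp(-h / cov_range) * spherical_taper(h, theta)
            if c != 0.0:
                entries[(i, j)] = c
    # Unbiasedness constraint: a border of ones with a zero corner.
    for i in range(n):
        entries[(i, n)] = 1.0
        entries[(n, i)] = 1.0
    return entries

# Hypothetical data: two well-separated clusters of points.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0), (11.0, 10.0)]
A = tapered_kriging_matrix(points, cov_range=5.0, theta=3.0)
print(len(A), "stored entries out of", (len(points) + 1) ** 2)
```

The zero in the bottom-right corner is what makes the bordered matrix symmetric but indefinite, which is why the abstract turns to SYMMLQ rather than sparse Cholesky.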
Comparing direct and iterative equation solvers in a large structural analysis software system
NASA Technical Reports Server (NTRS)
Poole, E. L.
1991-01-01
Two direct Choleski equation solvers and two iterative preconditioned conjugate gradient (PCG) equation solvers used in a large structural analysis software system are described. The two direct solvers are implementations of the Choleski method for variable-band matrix storage and sparse matrix storage. The two iterative PCG solvers include the Jacobi conjugate gradient method and an incomplete Choleski conjugate gradient method. The performance of the direct and iterative solvers is compared by solving several representative structural analysis problems. Some key factors affecting the performance of the iterative solvers relative to the direct solvers are identified.
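As an illustration of the simpler of the two iterative solvers named in this abstract, the following is a minimal sketch of conjugate gradients with a Jacobi (diagonal) preconditioner. It uses a small dense matrix for readability; the solvers in the report operate on sparse storage, and the function name `jacobi_pcg` and the test system are hypothetical.

```python
def jacobi_pcg(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A (list of rows)
    using conjugate gradients preconditioned by diag(A)."""
    n = len(b)

    def matvec(v):
        return [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    inv_diag = [1.0 / A[i][i] for i in range(n)]
    x = [0.0] * n
    r = list(b)                                  # residual b - A x for x = 0
    z = [d * ri for d, ri in zip(inv_diag, r)]   # preconditioned residual
    p = list(z)
    rz = dot(r, z)
    b_norm = dot(b, b) ** 0.5
    for it in range(max_iter):
        if dot(r, r) ** 0.5 <= tol * b_norm:
            return x, it
        Ap = matvec(p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter

# Hypothetical 3 x 3 symmetric positive definite system.
A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
x, iterations = jacobi_pcg(A, b)
print(x, iterations)
```

An incomplete Choleski preconditioner would replace the `inv_diag` scaling with a pair of sparse triangular solves; the rest of the iteration is unchanged.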
Tang, Xin; Feng, Guo-can; Li, Xiao-xin; Cai, Jia-xin
2015-01-01
Face recognition is challenging, especially when the images from different persons are similar to each other due to variations in illumination, expression, and occlusion. If we have sufficient training images of each person which can span the facial variations of that person under testing conditions, sparse representation based classification (SRC) achieves very promising results. However, in many applications, face recognition often encounters the small sample size problem arising from the small number of available training images for each person. In this paper, we present a novel face recognition framework by utilizing low-rank and sparse error matrix decomposition, and sparse coding techniques (LRSE+SC). Firstly, the low-rank matrix recovery technique is applied to decompose the face images per class into a low-rank matrix and a sparse error matrix. The low-rank matrix of each individual is a class-specific dictionary and it captures the discriminative feature of this individual. The sparse error matrix represents the intra-class variations, such as illumination and expression changes. Secondly, we combine the low-rank part (representative basis) of each person into a supervised dictionary and integrate the sparse error matrices of all individuals into a within-individual variant dictionary which can be applied to represent the possible variations between the testing and training images. Then these two dictionaries are used to code the query image. The within-individual variant dictionary can be shared by all the subjects and contributes only to explaining the lighting conditions, expressions, and occlusions of the query image rather than to discrimination. Finally, a reconstruction-based scheme is adopted for face recognition. Since the within-individual dictionary is introduced, LRSE+SC can handle the problem of corrupted training data and the situation in which not all subjects have enough samples for training.
Experimental results show that our method achieves the state-of-the-art results on AR, FERET, FRGC and LFW databases. PMID:26571112
High-dimensional statistical inference: From vector to matrix
NASA Astrophysics Data System (ADS)
Zhang, Anru
Statistical inference for sparse signals or low-rank matrices in high-dimensional settings is of significant interest in a range of contemporary applications. It has attracted significant recent attention in many fields including statistics, applied mathematics and electrical engineering. In this thesis, we consider several problems including sparse signal recovery (compressed sensing under restricted isometry) and low-rank matrix recovery (matrix recovery via rank-one projections and structured matrix completion). The first part of the thesis discusses compressed sensing and affine rank minimization in both noiseless and noisy cases and establishes sharp restricted isometry conditions for sparse signal and low-rank matrix recovery. The analysis relies on a key technical tool which represents points in a polytope by convex combinations of sparse vectors. The technique is elementary yet leads to sharp results. It is shown that, in compressed sensing, $\delta_k^A < 1/3$, $\delta_k^A + \theta_{k,k}^A < 1$, or $\delta_{tk}^A < \sqrt{(t-1)/t}$ for any given constant $t \ge 4/3$ guarantee the exact recovery of all $k$-sparse signals in the noiseless case through the constrained $\ell_1$ minimization, and similarly in affine rank minimization $\delta_r^M < 1/3$, $\delta_r^M + \theta_{r,r}^M < 1$, or $\delta_{tr}^M < \sqrt{(t-1)/t}$ ensure the exact reconstruction of all matrices with rank at most $r$ in the noiseless case via the constrained nuclear norm minimization. Moreover, for any $\epsilon > 0$, $\delta_k^A < 1/3 + \epsilon$, $\delta_k^A + \theta_{k,k}^A < 1 + \epsilon$, or $\delta_{tk}^A < \sqrt{(t-1)/t} + \epsilon$ are not sufficient to guarantee the exact recovery of all $k$-sparse signals for large $k$. A similar result also holds for matrix recovery. In addition, the conditions $\delta_k^A < 1/3$, $\delta_k^A + \theta_{k,k}^A < 1$, $\delta_{tk}^A < \sqrt{(t-1)/t}$ and $\delta_r^M < 1/3$, $\delta_r^M + \theta_{r,r}^M < 1$, $\delta_{tr}^M < \sqrt{(t-1)/t}$ are also shown to be sufficient respectively for stable recovery of approximately sparse signals and low-rank matrices in the noisy case.
For the second part of the thesis, we introduce a rank-one projection model for low-rank matrix recovery and propose a constrained nuclear norm minimization method for stable recovery of low-rank matrices in the noisy case. The procedure is adaptive to the rank and robust against small perturbations. Both upper and lower bounds for the estimation accuracy under the Frobenius norm loss are obtained. The proposed estimator is shown to be rate-optimal under certain conditions. The estimator is easy to implement via convex programming and performs well numerically. The techniques and main results developed in the chapter also have implications for other related statistical problems. An application to estimation of spiked covariance matrices from one-dimensional random projections is considered. The results demonstrate that it is still possible to accurately estimate the covariance matrix of a high-dimensional distribution based only on one-dimensional projections. For the third part of the thesis, we consider another setting of low-rank matrix completion. Current literature on matrix completion focuses primarily on independent sampling models under which the individual observed entries are sampled independently. Motivated by applications in genomic data integration, we propose a new framework of structured matrix completion (SMC) to treat structured missingness by design. Specifically, our proposed method aims at efficient matrix recovery when a subset of the rows and columns of an approximately low-rank matrix are observed. We provide theoretical justification for the proposed SMC method and derive a lower bound for the estimation errors, which together establish the optimal rate of recovery over certain classes of approximately low-rank matrices. Simulation studies show that the method performs well in finite samples under a variety of configurations.
The method is applied to integrate several ovarian cancer genomic studies with different extents of genomic measurements, which enables us to construct more accurate prediction rules for ovarian cancer survival.
Cao, Huojun; Amendt, Brad A
2016-11-01
Developmental dental anomalies are common forms of congenital defects. The molecular mechanisms of dental anomalies are poorly understood. Systematic approaches such as clustering genes based on similar expression patterns could identify novel genes involved in dental anomalies and provide a framework for understanding molecular regulatory mechanisms of these genes during tooth development (odontogenesis). A Python package (pySAPC) implementing a sparse affinity propagation clustering algorithm for large datasets was developed. Whole-genome pair-wise similarity was calculated from expression patterns in 45 microarrays covering several stages of odontogenesis. pySAPC identified 743 gene clusters based on expression pattern similarity during mouse tooth development. Three clusters are significantly enriched for genes associated with dental anomalies (with FDR <0.1). The three clusters of genes have distinct expression patterns during odontogenesis. Clustering genes based on similar expression profiles recovered several known regulatory relationships for genes involved in odontogenesis, as well as many novel genes that may be involved with the same genetic pathways as genes that have already been shown to contribute to dental defects. By using a sparse similarity matrix, pySAPC uses much less memory and CPU time than the original affinity propagation program, which uses a full similarity matrix. This Python package will be useful for many applications where datasets are too large to use a full similarity matrix. This article is part of a Special Issue entitled "System Genetics" Guest Editor: Dr. Yudong Cai and Dr. Tao Huang. Copyright © 2016. Published by Elsevier B.V.
Lun, Aaron T L; Pagès, Hervé; Smith, Mike L
2018-05-01
Biological experiments involving genomics or other high-throughput assays typically yield a data matrix that can be explored and analyzed using the R programming language with packages from the Bioconductor project. Improvements in the throughput of these assays have resulted in an explosion of data even from routine experiments, which poses a challenge to the existing computational infrastructure for statistical data analysis. For example, single-cell RNA sequencing (scRNA-seq) experiments frequently generate large matrices containing expression values for each gene in each cell, requiring sparse or file-backed representations for memory-efficient manipulation in R. These alternative representations are not easily compatible with high-performance C++ code used for computationally intensive tasks in existing R/Bioconductor packages. Here, we describe a C++ interface named beachmat, which enables agnostic data access from various matrix representations. This allows package developers to write efficient C++ code that is interoperable with dense, sparse and file-backed matrices, amongst others. We evaluated the performance of beachmat for accessing data from each matrix representation using both simulated and real scRNA-seq data, and defined a clear memory/speed trade-off to motivate the choice of an appropriate representation. We also demonstrate how beachmat can be incorporated into the code of other packages to drive analyses of a very large scRNA-seq data set. PMID:29723188
NASA Astrophysics Data System (ADS)
Fiandrotti, Attilio; Fosson, Sophie M.; Ravazzi, Chiara; Magli, Enrico
2018-04-01
Compressive sensing promises to enable bandwidth-efficient on-board compression of astronomical data by shifting the encoding complexity from the source to the receiver. The signal is recovered off-line, exploiting the parallel computation capabilities of GPUs to speed up the reconstruction process. However, inherent GPU hardware constraints limit the size of the recoverable signal and the speedup practically achievable. In this work, we design parallel algorithms that exploit the properties of circulant matrices for efficient GPU-accelerated sparse signal recovery. Our approach reduces the memory requirements, allowing us to recover very large signals with limited memory. In addition, it achieves a tenfold signal recovery speedup thanks to ad hoc parallelization of matrix-vector multiplications and matrix inversions. Finally, we demonstrate our algorithms in a typical application of circulant matrices: deblurring a sparse astronomical image in the compressed domain.
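The memory saving mentioned in this abstract rests on a standard property of circulant matrices: the whole matrix is determined by its first column, and a matrix-vector product is a circular convolution that can be computed with FFTs. The sketch below shows that identity on the CPU in plain Python. It is not the authors' GPU algorithm, and the helper names and test vectors are hypothetical.

```python
import cmath

def fft(a, invert=False):
    """Recursive radix-2 FFT (unnormalized); len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1.0 if invert else -1.0
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out

def circulant_matvec(first_col, x):
    """C @ x for the circulant C with the given first column, storing only O(n) numbers:
    C x = IFFT(FFT(first_col) * FFT(x))."""
    n = len(x)
    spectrum = [fc * fx for fc, fx in zip(fft(first_col), fft(x))]
    return [v.real / n for v in fft(spectrum, invert=True)]

def circulant_matvec_dense(first_col, x):
    """Reference O(n^2) product using C[i][j] = first_col[(i - j) % n]."""
    n = len(x)
    return [sum(first_col[(i - j) % n] * x[j] for j in range(n)) for i in range(n)]

c = [1.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 3.0]   # first column of C
x = [1.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0]   # sparse input signal
fast = circulant_matvec(c, x)
slow = circulant_matvec_dense(c, x)
print([round(v, 6) for v in fast])
```

The same diagonalization by the Fourier basis also turns inversion of a circulant matrix into an elementwise division of spectra, which is one reason such matrices suit parallel hardware.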
Decentralized state estimation for a large-scale spatially interconnected system.
Liu, Huabo; Yu, Haisheng
2018-03-01
A decentralized state estimator is derived for the spatially interconnected systems composed of many subsystems with arbitrary connection relations. An optimization problem on the basis of linear matrix inequality (LMI) is constructed for the computations of improved subsystem parameter matrices. Several computationally effective approaches are derived which efficiently utilize the block-diagonal characteristic of system parameter matrices and the sparseness of subsystem connection matrix. Moreover, this decentralized state estimator is proved to converge to a stable system and obtain a bounded covariance matrix of estimation errors under certain conditions. Numerical simulations show that the obtained decentralized state estimator is attractive in the synthesis of a large-scale networked system. Copyright © 2018 ISA. Published by Elsevier Ltd. All rights reserved.
A High Performance Block Eigensolver for Nuclear Configuration Interaction Calculations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Aktulga, Hasan Metin; Afibuzzaman, Md.; Williams, Samuel; ...
2017-06-01
As on-node parallelism increases and the performance gap between the processor and the memory system widens, achieving high performance in large-scale scientific applications requires an architecture-aware design of algorithms and solvers. We focus on the eigenvalue problem arising in nuclear Configuration Interaction (CI) calculations, where a few extreme eigenpairs of a sparse symmetric matrix are needed. Here, we consider a block iterative eigensolver whose main computational kernels are the multiplication of a sparse matrix with multiple vectors (SpMM), and tall-skinny matrix operations. We then present techniques to significantly improve the SpMM and the transpose operation SpMM^T by using the compressed sparse blocks (CSB) format. We achieve 3-4× speedup on the requisite operations over good implementations with the commonly used compressed sparse row (CSR) format. We develop a performance model that allows us to correctly estimate the performance of our SpMM kernel implementations, and we identify cache bandwidth as a potential performance bottleneck beyond DRAM. We also analyze and optimize the performance of LOBPCG kernels (inner product and linear combinations on multiple vectors) and show up to 15× speedup over using high performance BLAS libraries for these operations. The resulting high performance LOBPCG solver achieves 1.4× to 1.8× speedup over the existing Lanczos solver on a series of CI computations on high-end multicore architectures (Intel Xeons). We also analyze the performance of our techniques on an Intel Xeon Phi Knights Corner (KNC) processor.
FPGA implementation of sparse matrix algorithm for information retrieval
NASA Astrophysics Data System (ADS)
Bojanic, Slobodan; Jevtic, Ruzica; Nieto-Taladriz, Octavio
2005-06-01
Text information retrieval requires a tremendous amount of processing time because of the size of the data and the complexity of information retrieval algorithms. In this paper a solution to this problem is proposed via hardware-supported information retrieval algorithms. Reconfigurable computing can accommodate frequent hardware modifications through its tailorable hardware and exploits parallelism for a given application through reconfigurable and flexible hardware units. The degree of parallelism can be tuned to the data. In this work we implement the standard BLAS (Basic Linear Algebra Subprograms) sparse matrix algorithm named Compressed Sparse Row (CSR), which is shown to be more efficient in terms of storage space and query-processing time than other sparse matrix algorithms for information retrieval applications. Although the inverted index algorithm has been treated as the de facto standard for information retrieval for years, an alternative approach that stores the index of a text collection in a sparse matrix structure is gaining attention. This approach performs query processing using sparse matrix-vector multiplication and, thanks to parallelization, achieves a substantial efficiency gain over the sequential inverted index. Parallel implementations of the information retrieval kernel are presented in this work, targeting a Virtex II Field Programmable Gate Array (FPGA) board from Xilinx. A recent development in scientific applications is the use of FPGAs to achieve high performance results. Computational results are compared to implementations on other platforms. The design achieves a high level of parallelism for the overall function while retaining highly optimised hardware within each processing unit.
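Query processing as sparse matrix-vector multiplication, as described in this abstract, can be sketched in a few lines of software. The example below is a plain Python illustration of the CSR layout and the product, not the FPGA design; the toy document-term matrix, the query, and the function names are hypothetical.

```python
def to_csr(dense):
    """Convert a dense row-major matrix to CSR arrays (values, col_idx, row_ptr)."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))  # where the next row starts
    return values, col_idx, row_ptr

def csr_matvec(values, col_idx, row_ptr, x):
    """Compute y = A x. Each row's dot product is independent of the others,
    which is the parallelism a hardware implementation can exploit."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y

# Hypothetical toy index: rows are documents, columns are terms, entries are term counts.
doc_term = [
    [2, 0, 0, 1],
    [0, 0, 3, 0],
    [0, 1, 0, 4],
]
values, col_idx, row_ptr = to_csr(doc_term)
query = [1, 0, 0, 1]  # the query contains terms 0 and 3
scores = csr_matvec(values, col_idx, row_ptr, query)
print(scores)  # one relevance score per document
```

Only the five nonzero counts and their column indices are stored, plus one row pointer per document, which is where the storage advantage over a dense index comes from.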
1-norm support vector novelty detection and its sparseness.
Zhang, Li; Zhou, WeiDa
2013-12-01
This paper proposes a 1-norm support vector novelty detection (SVND) method and discusses its sparseness. 1-norm SVND is formulated as a linear programming problem and uses two techniques for inducing sparseness, namely 1-norm regularization and the hinge loss function. We also find two upper bounds on the sparseness of 1-norm SVND, namely the exact support vector (ESV) bound and the kernel Gram matrix rank bound. The ESV bound indicates that 1-norm SVND has a sparser representation model than SVND. The kernel Gram matrix rank bound can loosely estimate the sparseness of 1-norm SVND. Experimental results show that 1-norm SVND is feasible and effective. Copyright © 2013 Elsevier Ltd. All rights reserved.
A Stabilized Sparse-Matrix U-D Square-Root Implementation of a Large-State Extended Kalman Filter
NASA Technical Reports Server (NTRS)
Boggs, D.; Ghil, M.; Keppenne, C.
1995-01-01
The full nonlinear Kalman filter sequential algorithm is, in theory, well-suited to the four-dimensional data assimilation problem in large-scale atmospheric and oceanic problems. However, it was later discovered that this algorithm can be very sensitive to computer roundoff, and that results may cease to be meaningful as time advances. Implementations of a modified Kalman filter are given.
Stability and stabilisation of a class of networked dynamic systems
NASA Astrophysics Data System (ADS)
Liu, H. B.; Wang, D. Q.
2018-04-01
We investigate the stability and stabilisation of a linear time invariant networked heterogeneous system with arbitrarily connected subsystems. A new linear matrix inequality based sufficient and necessary condition for the stability is derived, based on which the stabilisation is provided. The obtained conditions efficiently utilise the block-diagonal characteristic of system parameter matrices and the sparseness of subsystem connection matrix. Moreover, a sufficient condition only dependent on each individual subsystem is also presented for the stabilisation of the networked systems with a large scale. Numerical simulations show that these conditions are computationally valid in the analysis and synthesis of a large-scale networked system.
Improved Estimation and Interpretation of Correlations in Neural Circuits
Yatsenko, Dimitri; Josić, Krešimir; Ecker, Alexander S.; Froudarakis, Emmanouil; Cotton, R. James; Tolias, Andreas S.
2015-01-01
Ambitious projects aim to record the activity of ever larger and denser neuronal populations in vivo. Correlations in neural activity measured in such recordings can reveal important aspects of neural circuit organization. However, estimating and interpreting large correlation matrices is statistically challenging. Estimation can be improved by regularization, i.e. by imposing a structure on the estimate. The amount of improvement depends on how closely the assumed structure represents dependencies in the data. Therefore, the selection of the most efficient correlation matrix estimator for a given neural circuit must be determined empirically. Importantly, the identity and structure of the most efficient estimator informs about the types of dominant dependencies governing the system. We sought statistically efficient estimators of neural correlation matrices in recordings from large, dense groups of cortical neurons. Using fast 3D random-access laser scanning microscopy of calcium signals, we recorded the activity of nearly every neuron in volumes 200 μm wide and 100 μm deep (150–350 cells) in mouse visual cortex. We hypothesized that in these densely sampled recordings, the correlation matrix should be best modeled as the combination of a sparse graph of pairwise partial correlations representing local interactions and a low-rank component representing common fluctuations and external inputs. Indeed, in cross-validation tests, the covariance matrix estimator with this structure consistently outperformed other regularized estimators. The sparse component of the estimate defined a graph of interactions. These interactions reflected the physical distances and orientation tuning properties of cells: The density of positive ‘excitatory’ interactions decreased rapidly with geometric distances and with differences in orientation preference whereas negative ‘inhibitory’ interactions were less selective. 
Because of its superior performance, this ‘sparse+latent’ estimator likely provides a more physiologically relevant representation of the functional connectivity in densely sampled recordings than the sample correlation matrix. PMID:25826696
Turbo-SMT: Parallel Coupled Sparse Matrix-Tensor Factorizations and Applications
Papalexakis, Evangelos E.; Faloutsos, Christos; Mitchell, Tom M.; Talukdar, Partha Pratim; Sidiropoulos, Nicholas D.; Murphy, Brian
2016-01-01
How can we correlate the neural activity in the human brain as it responds to typed words with properties of these terms (like ‘edible’, ‘fits in hand’)? In short, we want to find latent variables that jointly explain both the brain activity and the behavioral responses. This is one of many settings of the Coupled Matrix-Tensor Factorization (CMTF) problem. Can we enhance any CMTF solver, so that it can operate on potentially very large datasets that may not fit in main memory? We introduce Turbo-SMT, a meta-method capable of doing exactly that: it boosts the performance of any CMTF algorithm and parallelizes it, producing sparse and interpretable solutions (up to 65 fold). Additionally, we improve upon ALS, the work-horse algorithm for CMTF, with respect to efficiency and robustness to missing values. We apply Turbo-SMT to BrainQ, a dataset consisting of a (nouns, brain voxels, human subjects) tensor and a (nouns, properties) matrix, with coupling along the nouns dimension. Turbo-SMT is able to find meaningful latent variables, as well as to predict brain activity with competitive accuracy. Finally, we demonstrate the generality of Turbo-SMT by applying it to a Facebook dataset (users, ‘friends’, wall-postings); there, Turbo-SMT spots spammer-like anomalies. PMID:27672406
Fabric defect detection based on visual saliency using deep feature and low-rank recovery
NASA Astrophysics Data System (ADS)
Liu, Zhoufeng; Wang, Baorui; Li, Chunlei; Li, Bicao; Dong, Yan
2018-04-01
Fabric defect detection plays an important role in improving the quality of fabric products. In this paper, a novel fabric defect detection method based on visual saliency using deep features and low-rank recovery is proposed. First, unsupervised training on the large MNIST dataset is used to obtain the initial network parameters. Supervised fine-tuning of a Convolutional Neural Network (CNN) on a fabric image library is then carried out, yielding a more accurate deep neural network model. Second, the fabric images are uniformly divided into image blocks of the same size, and their multi-layer deep features are extracted using the trained deep network. Thereafter, all the extracted features are concatenated into a feature matrix. Third, low-rank matrix recovery is adopted to divide the feature matrix into a low-rank matrix, which indicates the background, and a sparse matrix, which indicates the salient defects. Finally, an iterative optimal threshold segmentation algorithm is used to segment the saliency maps generated from the sparse matrix and locate the fabric defect area. Experimental results demonstrate that the features extracted by the CNN are more suitable for characterizing fabric texture than traditional hand-crafted features such as LBP and HOG, and that the proposed method can accurately detect the defect regions of various fabric defects, even in images with complex texture.
Communication Optimal Parallel Multiplication of Sparse Random Matrices
2013-02-21
Definition 2.1), and (2) the algorithm is sparsity-independent, where the computation is statically partitioned to processors independent of the sparsity...structure of the input matrices (see Definition 2.5). The second assumption applies to nearly all existing algorithms for general sparse matrix-matrix...where A and B are n × n ER(d) matrices: Definition 2.1 An ER(d) matrix is an adjacency matrix of an Erdős-Rényi graph with parameters n and d/n. That
Convergence Speed of a Dynamical System for Sparse Recovery
NASA Astrophysics Data System (ADS)
Balavoine, Aurele; Rozell, Christopher J.; Romberg, Justin
2013-09-01
This paper studies the convergence rate of a continuous-time dynamical system for L1-minimization, known as the Locally Competitive Algorithm (LCA). Solving L1-minimization problems efficiently and rapidly is of great interest to the signal processing community, as these programs have been shown to recover sparse solutions to underdetermined systems of linear equations and come with strong performance guarantees. The LCA under study differs from the typical L1 solver in that it operates in continuous time: instead of being specified by discrete iterations, it evolves according to a system of nonlinear ordinary differential equations. The LCA is constructed from simple components, giving it the potential to be implemented as a large-scale analog circuit. The goal of this paper is to give guarantees on the convergence time of the LCA system. To do so, we analyze how the LCA evolves as it is recovering a sparse signal from underdetermined measurements. We show that under appropriate conditions on the measurement matrix and the problem parameters, the path the LCA follows can be described as a sequence of linear differential equations, each with a small number of active variables. This allows us to relate the convergence time of the system to the restricted isometry constant of the matrix. Interesting parallels to sparse-recovery digital solvers emerge from this study. Our analysis covers both the noisy and noiseless settings and is supported by simulation results.
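The continuous-time dynamics described above can be illustrated with a small discrete simulation. The sketch below assumes the commonly used LCA form, tau * du/dt = -u + Phi^T y - (Phi^T Phi - I) a with a = soft_threshold(u, lambda), and integrates it with forward Euler steps. The function names, step size, and iteration count are illustrative choices, not taken from the paper.

```python
def soft_threshold(u, lam):
    """Soft-thresholding activation: shrinks u toward zero by lam."""
    if u > lam:
        return u - lam
    if u < -lam:
        return u + lam
    return 0.0


def lca(phi, y, lam, tau=1.0, dt=0.01, steps=2000):
    """Forward-Euler integration of the assumed LCA dynamics
    tau * du/dt = -u + Phi^T y - (Phi^T Phi - I) a, with a = soft_threshold(u)."""
    m, n = len(phi), len(phi[0])
    # Driving input b = Phi^T y.
    b = [sum(phi[i][j] * y[i] for i in range(m)) for j in range(n)]
    # Gram matrix G = Phi^T Phi.
    gram = [[sum(phi[i][j] * phi[i][k] for i in range(m)) for k in range(n)]
            for j in range(n)]
    u = [0.0] * n
    for _ in range(steps):
        # Active coefficients for this step, computed from the current state.
        a = [soft_threshold(uj, lam) for uj in u]
        for j in range(n):
            # (G - I) a at index j: competition from the other active nodes.
            inhibition = sum(gram[j][k] * a[k] for k in range(n)) - a[j]
            u[j] += (dt / tau) * (-u[j] + b[j] - inhibition)
    return [soft_threshold(uj, lam) for uj in u]
```

With an identity measurement matrix the nodes do not interact, so the output should approach the soft-thresholded measurements. This gives a quick sanity check of the integration loop.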
NASA Technical Reports Server (NTRS)
Szyld, D. B.
1984-01-01
A brief description of the Model of the World Economy implemented at the Institute for Economic Analysis is presented, together with our experience in converting the software to vector code. For each time period, the model is reduced to a linear system of over 2000 variables. The matrix of coefficients has a bordered block diagonal structure, and we show how some of the matrix operations can be carried out on all diagonal blocks at once.
Thakur, Anil S.; Robin, Gautier; Guncar, Gregor; Saunders, Neil F. W.; Newman, Janet; Martin, Jennifer L.; Kobe, Bostjan
2007-01-01
Background Crystallization is a major bottleneck in the process of macromolecular structure determination by X-ray crystallography. Successful crystallization requires the formation of nuclei and their subsequent growth to crystals of suitable size. Crystal growth generally occurs spontaneously in a supersaturated solution as a result of homogenous nucleation. However, in a typical sparse matrix screening experiment, precipitant and protein concentration are not sampled extensively, and supersaturation conditions suitable for nucleation are often missed. Methodology/Principal Findings We tested the effect of nine potential heterogenous nucleating agents on crystallization of ten test proteins in a sparse matrix screen. Several nucleating agents induced crystal formation under conditions where no crystallization occurred in the absence of the nucleating agent. Four nucleating agents: dried seaweed; horse hair; cellulose and hydroxyapatite, had a considerable overall positive effect on crystallization success. This effect was further enhanced when these nucleating agents were used in combination with each other. Conclusions/Significance Our results suggest that the addition of heterogeneous nucleating agents increases the chances of crystal formation when using sparse matrix screens. PMID:17971854
A sparse matrix algorithm on the Boolean vector machine
NASA Technical Reports Server (NTRS)
Wagner, Robert A.; Patrick, Merrell L.
1988-01-01
VLSI technology is being used to implement a prototype Boolean Vector Machine (BVM), which is a large network of very small processors with equally small memories that operate in SIMD mode; these use bit-serial arithmetic, and communicate via a cube-connected cycles network. The BVM's bit-serial arithmetic and the small memories of individual processors are noted to compromise the system's effectiveness in large numerical problem applications. Attention is presently given to the implementation of a basic matrix-vector iteration algorithm for sparse matrices on the BVM, in order to generate over 1 billion useful floating-point operations/sec for this iteration algorithm. The algorithm is expressed in a novel language designated 'BVM'.
Acceleration of GPU-based Krylov solvers via data transfer reduction
Anzt, Hartwig; Tomov, Stanimire; Luszczek, Piotr; ...
2015-04-08
Krylov subspace iterative solvers are often the method of choice when solving large sparse linear systems. At the same time, hardware accelerators such as graphics processing units continue to offer significant floating point performance gains for matrix and vector computations through easy-to-use libraries of computational kernels. However, as these libraries are usually composed of a well optimized but limited set of linear algebra operations, applications that use them often fail to reduce certain data communications, and hence fail to leverage the full potential of the accelerator. In this study, we target the acceleration of Krylov subspace iterative methods for graphics processing units, and in particular the Biconjugate Gradient Stabilized solver. We show that significant improvement can be achieved by reformulating the method to reduce data communications through application-specific kernels instead of using the generic BLAS kernels, e.g. as provided by NVIDIA’s cuBLAS library, and by designing a graphics processing unit specific sparse matrix-vector product kernel that is able to more efficiently use the graphics processing unit’s computing power. Furthermore, we derive a model estimating the performance improvement, and use experimental data to validate the expected runtime savings. Finally, considering that the derived implementation achieves significantly higher performance, we assert that similar optimizations addressing algorithm structure, as well as sparse matrix-vector, are crucial for the subsequent development of high-performance graphics processing unit accelerated Krylov subspace iterative methods.
A general parallel sparse-blocked matrix multiply for linear scaling SCF theory
NASA Astrophysics Data System (ADS)
Challacombe, Matt
2000-06-01
A general approach to the parallel sparse-blocked matrix-matrix multiply is developed in the context of linear scaling self-consistent-field (SCF) theory. The data-parallel message passing method uses non-blocking communication to overlap computation and communication. The space filling curve heuristic is used to achieve data locality for sparse matrix elements that decay with “separation”. Load balance is achieved by solving the bin packing problem for blocks with variable size. With this new method as the kernel, parallel performance of the simplified density matrix minimization (SDMM) for solution of the SCF equations is investigated for RHF/6-31G** water clusters and RHF/3-21G estane globules. Sustained rates above 5.7 GFLOPS for the SDMM have been achieved for (H2O)200 with 95 Origin 2000 processors. Scalability is found to be limited by load imbalance, which increases with decreasing granularity, due primarily to the inhomogeneous distribution of variable block sizes.
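The abstract does not say which bin packing heuristic is used. As an illustration only, the sketch below uses a simple greedy rule: take blocks in order of decreasing cost and always give the next block to the currently least-loaded processor. The function name and the cost values are hypothetical.

```python
import heapq


def assign_blocks(block_costs, num_procs):
    """Greedy load balancing (an assumed heuristic, not the paper's method):
    largest block first, always placed on the least-loaded processor."""
    heap = [(0, p) for p in range(num_procs)]  # entries are (current load, processor id)
    heapq.heapify(heap)
    assignment = {p: [] for p in range(num_procs)}
    # Block indices sorted by decreasing cost.
    order = sorted(range(len(block_costs)), key=lambda b: block_costs[b], reverse=True)
    for b in order:
        load, p = heapq.heappop(heap)
        assignment[p].append(b)
        heapq.heappush(heap, (load + block_costs[b], p))
    loads = {p: sum(block_costs[b] for b in blocks) for p, blocks in assignment.items()}
    return assignment, loads
```

When block costs vary widely, a few very large blocks can dominate the load on one processor no matter how the rest are placed. This is consistent with the abstract's remark that imbalance grows as granularity decreases.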
NASA Astrophysics Data System (ADS)
Bustamam, A.; Ulul, E. D.; Hura, H. F. A.; Siswantining, T.
2017-07-01
Hierarchical clustering is an effective method for creating a phylogenetic tree based on the distance matrix between DNA (deoxyribonucleic acid) sequences. One of the well-known methods for calculating the distance matrix is the k-mer method. Generally, k-mer is more efficient than some other distance matrix calculation techniques. The k-mer method starts by creating the k-mer sparse matrix, followed by creating the k-mer singular value vectors. The last step is computing the distances among the vectors. In this paper, we analyze the sequences of MERS-CoV (Middle East Respiratory Syndrome - Coronavirus) DNA by implementing hierarchical clustering using the k-mer sparse matrix in order to perform the phylogenetic analysis. Our results show that the ancestor of our MERS-CoV sequences comes from Egypt. Moreover, we found that a MERS-CoV infection that occurs in one country may not necessarily come from the same country of origin. This suggests that the process of MERS-CoV mutation might not be influenced only by geographical factors.
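The first and last steps of the k-mer method can be sketched as follows. This is a simplified illustration: it builds sparse k-mer count vectors and compares them directly with a Euclidean distance, and it omits the singular value step mentioned in the abstract. The function names and the toy sequences in the usage note are hypothetical.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq, k):
    """Sparse k-mer count vector: only k-mers that actually occur are stored."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def euclidean_distance(c1, c2):
    """Euclidean distance between two sparse count vectors.
    Counter returns 0 for k-mers missing from one of the sequences."""
    keys = set(c1) | set(c2)
    return sqrt(sum((c1[key] - c2[key]) ** 2 for key in keys))
```

For example, `euclidean_distance(kmer_counts("ACGT", 2), kmer_counts("ACGA", 2))` compares two short toy sequences that share two of their three 2-mers. Applying the function to every pair of sequences fills the distance matrix that hierarchical clustering takes as input.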
Sparse matrix-vector multiplication on network-on-chip
NASA Astrophysics Data System (ADS)
Sun, C.-C.; Götze, J.; Jheng, H.-Y.; Ruan, S.-J.
2010-12-01
In this paper, we present an idea for performing matrix-vector multiplication using a Network-on-Chip (NoC) architecture. In traditional IC design, on-chip communications have been designed with dedicated point-to-point interconnections. Therefore, regular local data transfer is the major concept of many parallel implementations. However, when dealing with the parallel implementation of sparse matrix-vector multiplication (SMVM), which is the main step of all iterative algorithms for solving systems of linear equations, the required data transfers depend on the sparsity structure of the matrix and can be extremely irregular. Using the NoC architecture makes it possible to deal with an arbitrary structure of the data transfers, i.e. with the irregular structure of the sparse matrices. So far, we have implemented the proposed SMVM-NoC architecture with sizes 4×4 and 5×5 in IEEE 754 single-precision floating point using an FPGA.
NASA Astrophysics Data System (ADS)
Sides, Scott; Jamroz, Ben; Crockett, Robert; Pletzer, Alexander
2012-02-01
Self-consistent field theory (SCFT) for dense polymer melts has been highly successful in describing complex morphologies in block copolymers. Field-theoretic simulations such as these are able to access large length and time scales that are difficult or impossible for particle-based simulations such as molecular dynamics. The modified diffusion equations that arise as a consequence of the coarse-graining procedure in the SCF theory can be efficiently solved with a pseudo-spectral (PS) method that uses fast Fourier transforms on uniform Cartesian grids. However, PS methods can be difficult to apply in many block copolymer SCFT simulations (e.g. confinement, interface adsorption) in which small spatial regions might require finer resolution than most of the simulation grid. Progress on using new solver algorithms to address these problems will be presented. The Tech-X Chompst project aims at marrying the best of adaptive mesh refinement with linear matrix solver algorithms. The Tech-X code PolySwift++ is an SCFT simulation platform that leverages ongoing development in coupling Chombo, a package for solving PDEs via block-structured AMR calculations and embedded boundaries, with PETSc, a toolkit that includes a large assortment of sparse linear solvers.
Fast iterative image reconstruction using sparse matrix factorization with GPU acceleration
NASA Astrophysics Data System (ADS)
Zhou, Jian; Qi, Jinyi
2011-03-01
Statistically based iterative approaches for image reconstruction have gained much attention in medical imaging. An accurate system matrix that defines the mapping from the image space to the data space is the key to high-resolution image reconstruction. However, an accurate system matrix is often associated with high computational cost and huge storage requirements. Here we present a method to address this problem by using sparse matrix factorization and parallel computing on a graphics processing unit (GPU). We factor the accurate system matrix into three sparse matrices: a sinogram blurring matrix, a geometric projection matrix, and an image blurring matrix. The sinogram blurring matrix models the detector response. The geometric projection matrix is based on a simple line integral model. The image blurring matrix compensates for the line-of-response (LOR) degradation due to the simplified geometric projection matrix. The geometric projection matrix is precomputed, while the sinogram and image blurring matrices are estimated by minimizing the difference between the factored system matrix and the original system matrix. The resulting factored system matrix has far fewer nonzero elements than the original system matrix and thus substantially reduces the storage and computation cost. The smaller size also allows an efficient implementation of the forward and back projectors on GPUs, which have a limited amount of memory. Our simulation studies show that the proposed method can dramatically reduce the computation cost of high-resolution iterative image reconstruction. The proposed technique is applicable to image reconstruction for different imaging modalities, including x-ray CT, PET, and SPECT.
Progress on a generalized coordinates tensor product finite element 3DPNS algorithm for subsonic
NASA Technical Reports Server (NTRS)
Baker, A. J.; Orzechowski, J. A.
1983-01-01
A generalized coordinates form of the penalty finite element algorithm for the 3-dimensional parabolic Navier-Stokes equations for turbulent subsonic flows was derived. This algorithm formulation requires only three distinct hypermatrices and is applicable using any boundary fitted coordinate transformation procedure. The tensor matrix product approximation to the Jacobian of the Newton linear algebra matrix statement was also derived. The Newton algorithm was restructured to replace large sparse matrix solution procedures with grid sweeping using alpha-block tridiagonal matrices, where alpha equals the number of dependent variables. Numerical experiments were conducted and the resultant data give guidance on potentially preferred tensor product constructions for the penalty finite element 3DPNS algorithm.
Representation-Independent Iteration of Sparse Data Arrays
NASA Technical Reports Server (NTRS)
James, Mark
2007-01-01
An approach is defined for efficiently iterating over massively large arrays containing sparse data in a way that is independent of how the contents of the sparse arrays are laid out in memory. What is unique and important here is the decoupling of the iteration over the sparse set of array elements from how they are internally represented in memory. This makes the approach backward compatible with existing schemes for representing sparse arrays as well as with new ones. A functional interface is defined for implementing sparse arrays in any modern programming language, with a particular focus on the Chapel programming language. Examples are provided that show the translation of a loop that computes a matrix-vector product into this representation for both the distributed and non-distributed cases. This work is directly applicable to NASA and its High Productivity Computing Systems (HPCS) program that JPL and our current program are engaged in. The goal of this program is to create powerful, scalable, and economically viable high-powered computer systems suitable for use in national security and industry by 2010. This is important to NASA for its computationally intensive requirements for analyzing and understanding the volumes of science data from our returned missions.
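The decoupling described above can be illustrated with a small sketch. It is written in Python rather than Chapel, and the class and method names are hypothetical rather than the interface defined in the report. Two storage layouts expose the same `nonzeros()` iterator, and the matrix-vector product is written once against that iterator.

```python
class CooMatrix:
    """Coordinate layout: parallel lists of row index, column index, value."""

    def __init__(self, rows, cols, vals):
        self.rows, self.cols, self.vals = rows, cols, vals

    def nonzeros(self):
        return zip(self.rows, self.cols, self.vals)


class CsrMatrix:
    """Compressed sparse row layout: row pointers, column indices, values."""

    def __init__(self, indptr, indices, data):
        self.indptr, self.indices, self.data = indptr, indices, data

    def nonzeros(self):
        for i in range(len(self.indptr) - 1):
            for p in range(self.indptr[i], self.indptr[i + 1]):
                yield i, self.indices[p], self.data[p]


def matvec(matrix, x, num_rows):
    """y = A x, written only against the nonzeros() interface."""
    y = [0.0] * num_rows
    for i, j, v in matrix.nonzeros():
        y[i] += v * x[j]
    return y
```

Because `matvec` never touches the underlying arrays, a new layout only has to supply its own `nonzeros()` generator to reuse the same loop.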
Multiprocessor sparse L/U decomposition with controlled fill-in
NASA Technical Reports Server (NTRS)
Alaghband, G.; Jordan, H. F.
1985-01-01
Generation of the maximal compatibles of pivot elements for a class of small sparse matrices is studied. The algorithm involves a binary tree search and has a complexity exponential in the order of the matrix. Different strategies for selection of a set of compatible pivots based on the Markowitz criterion are investigated. The competing issues of parallelism and fill-in generation are studied and results are provided. A technique for obtaining an ordered compatible set directly from the ordered incompatible table is given. This technique generates a set of compatible pivots with the property of generating few fills. A new heuristic algorithm is then proposed that combines the idea of an ordered compatible set with a limited binary tree search to generate several sets of compatible pivots in linear time. Finally, an elimination set to reduce the matrix is selected. Parameters are suggested to obtain a balance between parallelism and fill-ins. Results of applying the proposed algorithms on several large application matrices are presented and analyzed.
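The Markowitz criterion mentioned above ranks a candidate pivot at position (i, j) by the product (r_i - 1)(c_j - 1), where r_i and c_j are the nonzero counts of row i and column j. This product is an upper bound on the fill-in that eliminating with that pivot can create. The sketch below computes these counts for a sparsity pattern. It covers only the Markowitz cost, not the compatibility search or the tree search described in the abstract, and the function names are illustrative.

```python
def markowitz_counts(pattern):
    """pattern: set of (row, col) positions of nonzeros.
    Returns {(i, j): (r_i - 1) * (c_j - 1)} for every nonzero position."""
    row_nnz, col_nnz = {}, {}
    for i, j in pattern:
        row_nnz[i] = row_nnz.get(i, 0) + 1
        col_nnz[j] = col_nnz.get(j, 0) + 1
    return {(i, j): (row_nnz[i] - 1) * (col_nnz[j] - 1) for i, j in pattern}


def best_pivot(pattern):
    """Position with the smallest Markowitz count; ties go to the smallest (row, col)."""
    counts = markowitz_counts(pattern)
    return min(sorted(counts), key=counts.get)
```

For an arrow-shaped pattern (full first row and column plus the diagonal), the corner entry has the largest count and the trailing diagonal entries have the smallest. This matches the intuition that pivoting on the dense corner first would fill the whole matrix.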
Signal Sampling for Efficient Sparse Representation of Resting State FMRI Data
Ge, Bao; Makkie, Milad; Wang, Jin; Zhao, Shijie; Jiang, Xi; Li, Xiang; Lv, Jinglei; Zhang, Shu; Zhang, Wei; Han, Junwei; Guo, Lei; Liu, Tianming
2015-01-01
As the size of brain imaging data such as fMRI grows explosively, it provides us with unprecedented and abundant information about the brain. How to reduce the size of fMRI data without losing much information is becoming a more and more pressing issue. Recent studies have tried to deal with it using dictionary learning and sparse representation methods; however, their computational complexity is still high, which hampers the wider application of sparse representation methods to large-scale fMRI datasets. To effectively address this problem, this work proposes to represent resting state fMRI (rs-fMRI) signals of a whole brain via a statistical sampling based sparse representation. First, we sampled the whole brain’s signals via different sampling methods; then the sampled signals were aggregated into an input data matrix to learn a dictionary; finally, this dictionary was used to sparsely represent the whole brain’s signals and identify the resting state networks. Comparative experiments demonstrate that the proposed signal sampling framework can achieve a ten-fold speed-up in reconstructing concurrent brain networks without losing much information. The experiments on the 1000 Functional Connectomes Project further demonstrate its effectiveness and superiority. PMID:26646924
NASA Astrophysics Data System (ADS)
Chang, Yong; Zi, Yanyang; Zhao, Jiyuan; Yang, Zhe; He, Wangpeng; Sun, Hailiang
2017-03-01
In guided wave pipeline inspection, echoes reflected from closely spaced reflectors generally overlap, meaning useful information is lost. To solve the overlapping problem, sparse deconvolution methods have been developed in the past decade. However, conventional sparse deconvolution methods have limitations in handling guided wave signals, because the input signal is directly used as the prototype of the convolution matrix, without considering the waveform change caused by the dispersion properties of the guided wave. In this paper, an adaptive sparse deconvolution (ASD) method is proposed to overcome these limitations. First, the Gaussian echo model is employed to adaptively estimate the column prototype of the convolution matrix instead of directly using the input signal as the prototype. Then, the convolution matrix is constructed upon the estimated results. Third, the split augmented Lagrangian shrinkage (SALSA) algorithm is introduced to solve the deconvolution problem with high computational efficiency. To verify the effectiveness of the proposed method, guided wave signals obtained from pipeline inspection are investigated numerically and experimentally. Compared to conventional sparse deconvolution methods, e.g. the l1-norm deconvolution method, the proposed method shows better performance in handling the echo overlap problem in the guided wave signal.
NASA Astrophysics Data System (ADS)
Lin, Chuang; Wang, Binghui; Jiang, Ning; Farina, Dario
2018-04-01
Objective. This paper proposes a novel simultaneous and proportional multiple degree of freedom (DOF) myoelectric control method for active prostheses. Approach. The approach is based on non-negative matrix factorization (NMF) of surface EMG signals with the inclusion of sparseness constraints. By applying a sparseness constraint to the control signal matrix, it is possible to extract the basis information from arbitrary movements (quasi-unsupervised approach) for multiple DOFs concurrently. Main Results. In online testing based on target hitting, able-bodied subjects reached a greater throughput (TP) when using sparse NMF (SNMF) than with classic NMF or with linear regression (LR). Accordingly, the completion time (CT) was shorter for SNMF than NMF or LR. The same observations were made in two patients with unilateral limb deficiencies. Significance. The addition of sparseness constraints to NMF allows for a quasi-unsupervised approach to myoelectric control with superior results with respect to previous methods for the simultaneous and proportional control of multi-DOF. The proposed factorization algorithm allows robust simultaneous and proportional control, is superior to previous supervised algorithms, and, because of minimal supervision, paves the way to online adaptation in myoelectric control.
Sparse PCA with Oracle Property
Gu, Quanquan; Wang, Zhaoran; Liu, Han
2014-01-01
In this paper, we study the estimation of the k-dimensional sparse principal subspace of covariance matrix Σ in the high-dimensional setting. We aim to recover the oracle principal subspace solution, i.e., the principal subspace estimator obtained assuming the true support is known a priori. To this end, we propose a family of estimators based on the semidefinite relaxation of sparse PCA with novel regularizations. In particular, under a weak assumption on the magnitude of the population projection matrix, one estimator within this family exactly recovers the true support with high probability, has exact rank-k, and attains a √(s/n) statistical rate of convergence with s being the subspace sparsity level and n the sample size. Compared to existing support recovery results for sparse PCA, our approach does not hinge on the spiked covariance model or the limited correlation condition. As a complement to the first estimator that enjoys the oracle property, we prove that another estimator within the family achieves a sharper statistical rate of convergence than the standard semidefinite relaxation of sparse PCA, even when the previous assumption on the magnitude of the projection matrix is violated. We validate the theoretical results by numerical experiments on synthetic datasets. PMID:25684971
NASA Technical Reports Server (NTRS)
Buchholz, Peter; Ciardo, Gianfranco; Donatelli, Susanna; Kemper, Peter
1997-01-01
We present a systematic discussion of algorithms to multiply a vector by a matrix expressed as the Kronecker product of sparse matrices, extending previous work in a unified notational framework. Then, we use our results to define new algorithms for the solution of large structured Markov models. In addition to a comprehensive overview of existing approaches, we give new results with respect to: (1) managing certain types of state-dependent behavior without incurring extra cost; (2) supporting both Jacobi-style and Gauss-Seidel-style methods by appropriate multiplication algorithms; (3) speeding up algorithms that consider probability vectors of size equal to the "actual" state space instead of the "potential" state space.
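The basic idea behind multiplying a vector by a Kronecker product without forming that product can be sketched for two factors. The row vector x is treated as an n_a by n_b array, A is applied along the first index, and then B along the second. This is a minimal illustration with sparse factors stored as dictionaries. The names are hypothetical, and it does not reproduce the specific algorithms or the state-space optimizations of the paper.

```python
def kron_vec_mult(x, a, b, n_a, n_b):
    """Compute the row-vector product x * (A kron B) without forming A kron B.
    a, b: dicts {(row, col): value} for square sparse matrices of order n_a, n_b."""
    # Step 1: apply A along the first index.
    # t[j1][i2] = sum over i1 of x[i1 * n_b + i2] * A[i1][j1]
    t = [[0.0] * n_b for _ in range(n_a)]
    for (i1, j1), v in a.items():
        for i2 in range(n_b):
            t[j1][i2] += x[i1 * n_b + i2] * v
    # Step 2: apply B along the second index.
    # y[j1 * n_b + j2] = sum over i2 of t[j1][i2] * B[i2][j2]
    y = [0.0] * (n_a * n_b)
    for (i2, j2), v in b.items():
        for j1 in range(n_a):
            y[j1 * n_b + j2] += t[j1][i2] * v
    return y


def kron_dense(a, b, n_a, n_b):
    """Explicit Kronecker product as a dense list of lists, used only as a reference check."""
    n = n_a * n_b
    k = [[0.0] * n for _ in range(n)]
    for (i1, j1), va in a.items():
        for (i2, j2), vb in b.items():
            k[i1 * n_b + i2][j1 * n_b + j2] = va * vb
    return k
```

Only the small factors and one intermediate array of the same size as x are stored. This matters for structured Markov models, where the full product matrix may be far too large to hold in memory.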
Effects of partitioning and scheduling sparse matrix factorization on communication and load balance
NASA Technical Reports Server (NTRS)
Venugopal, Sesh; Naik, Vijay K.
1991-01-01
A block based, automatic partitioning and scheduling methodology is presented for sparse matrix factorization on distributed memory systems. Using experimental results, this technique is analyzed for communication and load imbalance overhead. To study the performance effects, these overheads were compared with those obtained from a straightforward 'wrap mapped' column assignment scheme. All experimental results were obtained using test sparse matrices from the Harwell-Boeing data set. The results show that there is a communication and load balance tradeoff. The block based method results in lower communication cost whereas the wrap mapped scheme gives better load balance.
Enforced Sparse Non-Negative Matrix Factorization
2016-01-23
documents to find interesting pieces of information. With limited resources, analysts often employ automated text-mining tools that highlight common...represented as an undirected bipartite graph. It has become a common method for generating topic models of text data because it is known to produce good results...model and the convergence rate of the underlying algorithm. I. Introduction A common analyst challenge is searching through large quantities of text
Archer, A.W.; Maples, C.G.
1989-01-01
Numerous departures from ideal relationships are revealed by Monte Carlo simulations of widely accepted binomial coefficients. For example, simulations incorporating varying levels of matrix sparseness (presence of zeros indicating lack of data) and computation of expected values reveal that not only are all common coefficients influenced by zero data, but also that some coefficients do not discriminate between sparse or dense matrices (few zero data). Such coefficients computationally merge mutually shared and mutually absent information and do not exploit all the information incorporated within the standard 2 × 2 contingency table; therefore, the commonly used formulae for such coefficients are more complicated than the actual range of values produced. Other coefficients do differentiate between mutual presences and absences; however, a number of these coefficients do not demonstrate a linear relationship to matrix sparseness. Finally, simulations using nonrandom matrices with known degrees of row-by-row similarities signify that several coefficients either do not display a reasonable range of values or are nonlinear with respect to known relationships within the data. Analyses with nonrandom matrices yield clues as to the utility of certain coefficients for specific applications. For example, coefficients such as Jaccard, Dice, and Baroni-Urbani and Buser are useful if correction of sparseness is desired, whereas the Russell-Rao coefficient is useful when sparseness correction is not desired. © 1989 International Association for Mathematical Geology.
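The contrast drawn above between coefficients that ignore mutual absences and those that count them can be checked on a toy example. The sketch uses the standard definitions from the 2 × 2 contingency table (a = both present, b and c = present in only one, d = both absent): Jaccard = a/(a+b+c), Dice = 2a/(2a+b+c), Russell-Rao = a/(a+b+c+d). The function names and the sample vectors in the usage note are illustrative.

```python
def contingency(x, y):
    """2 x 2 table for two binary vectors: a = both present,
    b = only in x, c = only in y, d = both absent."""
    a = sum(1 for p, q in zip(x, y) if p and q)
    b = sum(1 for p, q in zip(x, y) if p and not q)
    c = sum(1 for p, q in zip(x, y) if not p and q)
    d = sum(1 for p, q in zip(x, y) if not p and not q)
    return a, b, c, d


def jaccard(a, b, c, d):
    return a / (a + b + c)


def dice(a, b, c, d):
    return 2 * a / (2 * a + b + c)


def russell_rao(a, b, c, d):
    return a / (a + b + c + d)
```

Taking x = [1, 1, 0, 1] and y = [1, 0, 1, 1] and then appending four shared zeros to each vector leaves Jaccard and Dice unchanged, while Russell-Rao drops from 0.5 to 0.25. Only the latter responds to added sparseness.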
Sparse representation of whole-brain fMRI signals for identification of functional networks.
Lv, Jinglei; Jiang, Xi; Li, Xiang; Zhu, Dajiang; Chen, Hanbo; Zhang, Tuo; Zhang, Shu; Hu, Xintao; Han, Junwei; Huang, Heng; Zhang, Jing; Guo, Lei; Liu, Tianming
2015-02-01
There have been several recent studies that used sparse representation for fMRI signal analysis and activation detection based on the assumption that each voxel's fMRI signal is linearly composed of sparse components. Previous studies have employed sparse coding to model functional networks in various modalities and scales. These prior contributions inspired the exploration of whether/how sparse representation can be used to identify functional networks in a voxel-wise way and on the whole brain scale. This paper presents a novel, alternative methodology of identifying multiple functional networks via sparse representation of whole-brain task-based fMRI signals. Our basic idea is that all fMRI signals within the whole brain of one subject are aggregated into a big data matrix, which is then factorized into an over-complete dictionary basis matrix and a reference weight matrix via an effective online dictionary learning algorithm. Our extensive experimental results have shown that this novel methodology can uncover multiple functional networks that can be well characterized and interpreted in spatial, temporal and frequency domains based on current brain science knowledge. Importantly, these well-characterized functional network components are quite reproducible in different brains. In general, our methods offer a novel, effective and unified solution to multiple fMRI data analysis tasks including activation detection, de-activation detection, and functional network identification. Copyright © 2014 Elsevier B.V. All rights reserved.
Salient Object Detection via Structured Matrix Decomposition.
Peng, Houwen; Li, Bing; Ling, Haibin; Hu, Weiming; Xiong, Weihua; Maybank, Stephen J
2016-05-04
Low-rank recovery models have shown potential for salient object detection, where a matrix is decomposed into a low-rank matrix representing image background and a sparse matrix identifying salient objects. Two deficiencies, however, still exist. First, previous work typically assumes the elements in the sparse matrix are mutually independent, ignoring the spatial and pattern relations of image regions. Second, when the low-rank and sparse matrices are relatively coherent, e.g., when there are similarities between the salient objects and background or when the background is complicated, it is difficult for previous models to disentangle them. To address these problems, we propose a novel structured matrix decomposition model with two structural regularizations: (1) a tree-structured sparsity-inducing regularization that captures the image structure and enforces patches from the same object to have similar saliency values, and (2) a Laplacian regularization that enlarges the gaps between salient objects and the background in feature space. Furthermore, high-level priors are integrated to guide the matrix decomposition and boost the detection. We evaluate our model for salient object detection on five challenging datasets including single object, multiple objects and complex scene images, and show competitive results as compared with 24 state-of-the-art methods in terms of seven performance metrics.
Using a multifrontal sparse solver in a high performance, finite element code
NASA Technical Reports Server (NTRS)
King, Scott D.; Lucas, Robert; Raefsky, Arthur
1990-01-01
We consider the performance of the finite element method on a vector supercomputer. The computationally intensive parts of the finite element method are typically the individual element forms and the solution of the global stiffness matrix, both of which are vectorized in high performance codes. To further increase throughput, new algorithms are needed. We compare a multifrontal sparse solver to a traditional skyline solver in a finite element code on a vector supercomputer. The multifrontal solver uses the Multiple-Minimum Degree reordering heuristic to reduce the number of operations required to factor a sparse matrix and full matrix computational kernels (e.g., BLAS3) to enhance vector performance. The net result is an order-of-magnitude reduction in run time for a finite element application on one processor of a Cray X-MP.
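To make the idea of a fill-reducing reordering concrete, here is a minimal sketch of the basic greedy minimum-degree heuristic on an adjacency-set graph. It is not the Multiple-Minimum Degree variant named in the abstract (which eliminates several independent nodes per pass); the function name and the star-shaped test graph are assumptions chosen for illustration.

```python
def minimum_degree_order(adjacency):
    """Greedy minimum-degree elimination ordering.

    adjacency maps each node to the set of its neighbours in the
    graph of a symmetric sparse matrix. Returns (order, fill), where
    fill counts the edges created during symbolic elimination.
    """
    adj = {v: set(nbrs) for v, nbrs in adjacency.items()}
    order, fill = [], 0
    while adj:
        # Pick the node of smallest current degree (ties by label).
        v = min(adj, key=lambda u: (len(adj[u]), u))
        nbrs = sorted(adj.pop(v))
        for u in nbrs:
            adj[u].discard(v)
        # Eliminating v makes its neighbours pairwise adjacent.
        for i, a in enumerate(nbrs):
            for b in nbrs[i + 1:]:
                if b not in adj[a]:
                    adj[a].add(b)
                    adj[b].add(a)
                    fill += 1
        order.append(v)
    return order, fill


# Star graph: node 0 is coupled to every other node.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
order, fill = minimum_degree_order(star)
```

On this star graph the heuristic eliminates the leaves before the hub and creates no fill, whereas eliminating node 0 first would connect all four leaves to each other.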
Algorithms and Application of Sparse Matrix Assembly and Equation Solvers for Aeroacoustics
NASA Technical Reports Server (NTRS)
Watson, W. R.; Nguyen, D. T.; Reddy, C. J.; Vatsa, V. N.; Tang, W. H.
2001-01-01
An algorithm for symmetric sparse equation solutions on an unstructured grid is described. Efficient, sequential sparse algorithms for degree-of-freedom reordering, supernodes, symbolic/numerical factorization, and forward backward solution phases are reviewed. Three sparse algorithms for the generation and assembly of symmetric systems of matrix equations are presented. The accuracy and numerical performance of the sequential version of the sparse algorithms are evaluated over the frequency range of interest in a three-dimensional aeroacoustics application. Results show that the solver solutions are accurate using a discretization of 12 points per wavelength. Results also show that the first assembly algorithm is impractical for high-frequency noise calculations. The second and third assembly algorithms have nearly equal performance at low values of source frequencies, but at higher values of source frequencies the third algorithm saves CPU time and RAM. The CPU time and the RAM required by the second and third assembly algorithms are two orders of magnitude smaller than that required by the sparse equation solver. A sequential version of these sparse algorithms can, therefore, be conveniently incorporated into a substructuring for domain decomposition formulation to achieve parallel computation, where different substructures are handled by different parallel processors.
Visual Tracking Based on Extreme Learning Machine and Sparse Representation
Wang, Baoxian; Tang, Linbo; Yang, Jinglin; Zhao, Baojun; Wang, Shuigen
2015-01-01
The existing sparse representation-based visual trackers mostly suffer from being time-consuming and from poor robustness. To address these issues, a novel tracking method is presented via combining sparse representation and an emerging learning technique, namely extreme learning machine (ELM). Specifically, visual tracking can be divided into two consecutive processes. Firstly, ELM is utilized to find the optimal separating hyperplane between the target observations and background ones. Thus, the trained ELM classification function is able to remove most of the candidate samples related to background contents efficiently, thereby reducing the total computational cost of the following sparse representation. Secondly, to further combine ELM and sparse representation, the resultant confidence values (i.e., probabilities to be a target) of samples on the ELM classification function are used to construct a new manifold learning constraint term of the sparse representation framework, which tends to achieve more robust results. Moreover, the accelerated proximal gradient method is used for deriving the optimal solution (in matrix form) of the constrained sparse tracking model. Additionally, the matrix form solution allows the candidate samples to be calculated in parallel, thereby leading to a higher efficiency. Experiments demonstrate the effectiveness of the proposed tracker. PMID:26506359
NASA Astrophysics Data System (ADS)
Qin, Xulei; Cong, Zhibin; Halig, Luma V.; Fei, Baowei
2013-03-01
An automatic framework is proposed to segment the right ventricle on ultrasound images. This method can automatically segment both epicardial and endocardial boundaries from a continuous echocardiography series by combining sparse matrix transform (SMT), a training model, and a localized region based level set. First, the sparse matrix transform extracts main motion regions of myocardium as eigenimages by analyzing statistical information of these images. Second, a training model of the right ventricle is registered to the extracted eigenimages in order to automatically detect the main location of the right ventricle and the corresponding transform relationship between the training model and the SMT-extracted results in the series. Third, the training model is then adjusted as an adapted initialization for the segmentation of each image in the series. Finally, based on the adapted initializations, a localized region based level set algorithm is applied to segment both epicardial and endocardial boundaries of the right ventricle from the whole series. Experimental results from real subject data validated the performance of the proposed framework in segmenting the right ventricle from echocardiography. The mean Dice scores for the epicardial and endocardial boundaries are 89.1% ± 2.3% and 83.6% ± 7.3%, respectively. The automatic segmentation method based on sparse matrix transform and level set can provide a useful tool for quantitative cardiac imaging.
Wang, Ya-Xuan; Gao, Ying-Lian; Liu, Jin-Xing; Kong, Xiang-Zhen; Li, Hai-Jun
2017-09-01
Identifying differentially expressed genes from the thousands of genes is a challenging task. Robust principal component analysis (RPCA) is an efficient method in the identification of differentially expressed genes. RPCA method uses nuclear norm to approximate the rank function. However, theoretical studies showed that the nuclear norm minimizes all singular values, so it may not be the best solution to approximate the rank function. The truncated nuclear norm is defined as the sum of some smaller singular values, which may achieve a better approximation of the rank function than nuclear norm. In this paper, a novel method is proposed by replacing nuclear norm of RPCA with the truncated nuclear norm, which is named robust principal component analysis regularized by truncated nuclear norm (TRPCA). The method decomposes the observation matrix of genomic data into a low-rank matrix and a sparse matrix. Because the significant genes can be considered as sparse signals, the differentially expressed genes are viewed as the sparse perturbation signals. Thus, the differentially expressed genes can be identified according to the sparse matrix. The experimental results on The Cancer Genome Atlas data illustrate that the TRPCA method outperforms other state-of-the-art methods in the identification of differentially expressed genes.
Sparse Regression as a Sparse Eigenvalue Problem
NASA Technical Reports Server (NTRS)
Moghaddam, Baback; Gruber, Amit; Weiss, Yair; Avidan, Shai
2008-01-01
We extend the l0-norm "subspectral" algorithms for sparse-LDA [5] and sparse-PCA [6] to general quadratic costs such as MSE in linear (kernel) regression. The resulting "Sparse Least Squares" (SLS) problem is also NP-hard, by way of its equivalence to a rank-1 sparse eigenvalue problem (e.g., binary sparse-LDA [7]). Specifically, for a general quadratic cost we use a highly-efficient technique for direct eigenvalue computation using partitioned matrix inverses which leads to dramatic ×10^3 speed-ups over standard eigenvalue decomposition. This increased efficiency mitigates the O(n^4) scaling behaviour that up to now has limited the previous algorithms' utility for high-dimensional learning problems. Moreover, the new computation prioritizes the role of the less-myopic backward elimination stage which becomes more efficient than forward selection. Similarly, branch-and-bound search for Exact Sparse Least Squares (ESLS) also benefits from partitioned matrix inverse techniques. Our Greedy Sparse Least Squares (GSLS) generalizes Natarajan's algorithm [9] also known as Order-Recursive Matching Pursuit (ORMP). Specifically, the forward half of GSLS is exactly equivalent to ORMP but more efficient. By including the backward pass, which only doubles the computation, we can achieve lower MSE than ORMP. Experimental comparisons to the state-of-the-art LARS algorithm [3] show forward-GSLS is faster, more accurate and more flexible in terms of choice of regularization
HYPOTHESIS TESTING FOR HIGH-DIMENSIONAL SPARSE BINARY REGRESSION
Mukherjee, Rajarshi; Pillai, Natesh S.; Lin, Xihong
2015-01-01
In this paper, we study the detection boundary for minimax hypothesis testing in the context of high-dimensional, sparse binary regression models. Motivated by genetic sequencing association studies for rare variant effects, we investigate the complexity of the hypothesis testing problem when the design matrix is sparse. We observe a new phenomenon in the behavior of detection boundary which does not occur in the case of Gaussian linear regression. We derive the detection boundary as a function of two components: a design matrix sparsity index and signal strength, each of which is a function of the sparsity of the alternative. For any alternative, if the design matrix sparsity index is too high, any test is asymptotically powerless irrespective of the magnitude of signal strength. For binary design matrices with the sparsity index that is not too high, our results are parallel to those in the Gaussian case. In this context, we derive detection boundaries for both dense and sparse regimes. For the dense regime, we show that the generalized likelihood ratio is rate optimal; for the sparse regime, we propose an extended Higher Criticism Test and show it is rate optimal and sharp. We illustrate the finite sample properties of the theoretical results using simulation studies. PMID:26246645
Zhang, Shu; Li, Xiang; Lv, Jinglei; Jiang, Xi; Guo, Lei; Liu, Tianming
2016-03-01
A relatively underexplored question in fMRI is whether there are intrinsic differences in terms of signal composition patterns that can effectively characterize and differentiate task-based or resting state fMRI (tfMRI or rsfMRI) signals. In this paper, we propose a novel two-stage sparse representation framework to examine the fundamental difference between tfMRI and rsfMRI signals. Specifically, in the first stage, the whole-brain tfMRI or rsfMRI signals of each subject were composed into a big data matrix, which was then factorized into a subject-specific dictionary matrix and a weight coefficient matrix for sparse representation. In the second stage, all of the dictionary matrices from both tfMRI/rsfMRI data across multiple subjects were composed into another big data-matrix, which was further sparsely represented by a cross-subjects common dictionary and a weight matrix. This framework has been applied on the recently publicly released Human Connectome Project (HCP) fMRI data and experimental results revealed that there are distinctive and descriptive atoms in the cross-subjects common dictionary that can effectively characterize and differentiate tfMRI and rsfMRI signals, achieving 100% classification accuracy. Moreover, our methods and results can be meaningfully interpreted, e.g., the well-known default mode network (DMN) activities can be recovered from the very noisy and heterogeneous aggregated big-data of tfMRI and rsfMRI signals across all subjects in HCP Q1 release.
Parallel Preconditioning for CFD Problems on the CM-5
NASA Technical Reports Server (NTRS)
Simon, Horst D.; Kremenetsky, Mark D.; Richardson, John; Lasinski, T. A. (Technical Monitor)
1994-01-01
To date, preconditioning methods on massively parallel systems have faced a major difficulty. The most successful preconditioning methods in terms of accelerating the convergence of the iterative solver, such as incomplete LU factorizations, are notoriously difficult to implement on parallel machines for two reasons: (1) the actual computation of the preconditioner is not very floating-point intensive, but requires a large amount of unstructured communication, and (2) the application of the preconditioning matrix in the iteration phase (i.e. triangular solves) is difficult to parallelize because of the recursive nature of the computation. Here we present a new approach to preconditioning for very large, sparse, unsymmetric, linear systems, which avoids both difficulties. We explicitly compute an approximate inverse to our original matrix. This new preconditioning matrix can be applied most efficiently for iterative methods on massively parallel machines, since the preconditioning phase involves only a matrix-vector multiplication, with possibly a dense matrix. Furthermore the actual computation of the preconditioning matrix has natural parallelism. For a problem of size n, the preconditioning matrix can be computed by solving n independent small least squares problems. The algorithm and its implementation on the Connection Machine CM-5 are discussed in detail and supported by extensive timings obtained from real problem data.
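The "n independent small least squares problems" in the abstract above can be illustrated with the simplest possible case. The sketch assumes the approximate inverse M is restricted to a diagonal sparsity pattern, so column j of M minimizes ||A m_j - e_j||_2 over a single unknown, giving m_jj = a_jj / ||A[:, j]||^2. The diagonal pattern, the dense list-of-lists storage, and the function names are simplifying assumptions; the paper's actual pattern and CM-5 implementation are not described here.

```python
def diagonal_approximate_inverse(A):
    """Least-squares approximate inverse restricted to a diagonal pattern.

    For each column j, minimise ||A * (mu * e_j) - e_j||_2 over the
    scalar mu. The n problems are independent, so they could be
    solved in parallel. A is a square matrix as a list of rows.
    """
    n = len(A)
    diag = []
    for j in range(n):
        column = [A[i][j] for i in range(n)]
        norm_sq = sum(v * v for v in column)
        if norm_sq == 0.0:
            raise ValueError("column %d is entirely zero" % j)
        diag.append(A[j][j] / norm_sq)
    return diag


def apply_preconditioner(diag, r):
    # Applying M is a (here diagonal) matrix-vector product,
    # with no triangular solves involved.
    return [d * v for d, v in zip(diag, r)]


A = [[4.0, 1.0],
     [2.0, 3.0]]
M_diag = diagonal_approximate_inverse(A)
z = apply_preconditioner(M_diag, [1.0, 1.0])
```

A richer sparsity pattern for M would turn each scalar problem into a small least squares problem over several unknowns, while keeping the column-by-column independence.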
Computing sparse derivatives and consecutive zeros problem
NASA Astrophysics Data System (ADS)
Chandra, B. V. Ravi; Hossain, Shahadat
2013-02-01
We describe a substitution based sparse Jacobian matrix determination method using algorithmic differentiation. Utilizing the a priori known sparsity pattern, a compression scheme is determined using graph coloring. The "compressed pattern" of the Jacobian matrix is then reordered into a form suitable for computation by substitution. We show that the column reordering of the compressed pattern matrix (so as to align the zero entries into consecutive locations in each row) can be viewed as a variant of traveling salesman problem. Preliminary computational results show that on the test problems the performance of nearest-neighbor type heuristic algorithms is highly encouraging.
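The compression step described above can be sketched with a greedy coloring of the column intersection graph: two columns may share a group only if no row has a nonzero in both. This is the simpler direct-determination variant, not the substitution-based scheme or the consecutive-zeros reordering studied in the paper, and the function name and the tridiagonal test pattern are assumptions.

```python
def color_columns(row_patterns, n_cols):
    """Greedy grouping of structurally orthogonal Jacobian columns.

    row_patterns[i] is the set of column indices holding nonzeros in
    row i. Returns a list giving the group (color) of each column.
    """
    # Two columns conflict when some row has nonzeros in both.
    conflicts = [set() for _ in range(n_cols)]
    for cols in row_patterns:
        for c in cols:
            conflicts[c].update(k for k in cols if k != c)

    color = [None] * n_cols
    for c in range(n_cols):
        used = {color[k] for k in conflicts[c] if color[k] is not None}
        group = 0
        while group in used:
            group += 1
        color[c] = group
    return color


# Sparsity pattern of a 5 x 5 tridiagonal Jacobian.
pattern = [{0, 1}, {0, 1, 2}, {1, 2, 3}, {2, 3, 4}, {3, 4}]
groups = color_columns(pattern, 5)
n_groups = max(groups) + 1
```

For this tridiagonal pattern three groups suffice, so the five columns could be recovered from three compressed directional derivatives instead of five.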
Partitioning Rectangular and Structurally Nonsymmetric Sparse Matrices for Parallel Processing
DOE Office of Scientific and Technical Information (OSTI.GOV)
B. Hendrickson; T.G. Kolda
1998-09-01
A common operation in scientific computing is the multiplication of a sparse, rectangular or structurally nonsymmetric matrix and a vector. In many applications the matrix-transpose-vector product is also required. This paper addresses the efficient parallelization of these operations. We show that the problem can be expressed in terms of partitioning bipartite graphs. We then introduce several algorithms for this partitioning problem and compare their performance on a set of test matrices.
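For readers unfamiliar with the two kernels named above, the following serial sketch computes y = A x and y = A^T x for a rectangular matrix stored in compressed sparse row (CSR) form. The CSR layout and the function names are assumptions made for illustration; the paper's bipartite-graph partitioning and parallel distribution are not shown.

```python
def csr_matvec(n_rows, indptr, indices, data, x):
    """y = A x for a CSR matrix: each row gathers from x."""
    y = [0.0] * n_rows
    for i in range(n_rows):
        for k in range(indptr[i], indptr[i + 1]):
            y[i] += data[k] * x[indices[k]]
    return y


def csr_rmatvec(n_cols, indptr, indices, data, x):
    """y = A^T x using the same CSR arrays: each row scatters into y."""
    y = [0.0] * n_cols
    for i in range(len(indptr) - 1):
        for k in range(indptr[i], indptr[i + 1]):
            y[indices[k]] += data[k] * x[i]
    return y


# A = [[1, 0, 2],
#      [0, 3, 0]]   (2 x 3, structurally nonsymmetric)
indptr = [0, 2, 3]
indices = [0, 2, 1]
data = [1.0, 2.0, 3.0]

Ax = csr_matvec(2, indptr, indices, data, [1.0, 1.0, 1.0])
ATx = csr_rmatvec(3, indptr, indices, data, [1.0, 2.0])
```

The product with A reads x by column index and writes y by row, while the product with A^T does the opposite, which suggests why a data distribution that suits one kernel need not suit the other.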
Sparse matrix methods research using the CSM testbed software system
NASA Technical Reports Server (NTRS)
Chu, Eleanor; George, J. Alan
1989-01-01
Research is described on sparse matrix techniques for the Computational Structural Mechanics (CSM) Testbed. The primary objective was to compare the performance of state-of-the-art techniques for solving sparse systems with those that are currently available in the CSM Testbed. Thus, one of the first tasks was to become familiar with the structure of the testbed, and to install some or all of the SPARSPAK package in the testbed. A suite of subroutines to extract from the data base the relevant structural and numerical information about the matrix equations was written, and all the demonstration problems distributed with the testbed were successfully solved. These codes were documented, and performance studies comparing the SPARSPAK technology to the methods currently in the testbed were completed. In addition, some preliminary studies were done comparing some recently developed out-of-core techniques with the performance of the testbed processor INV.
A tight and explicit representation of Q in sparse QR factorization
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ng, E.G.; Peyton, B.W.
1992-05-01
In QR factorization of a sparse m × n matrix A (m ≥ n) the orthogonal factor Q is often stored implicitly as a lower trapezoidal matrix H known as the Householder matrix. This paper presents a simple characterization of the row structure of Q, which could be used as the basis for a sparse data structure that can store Q explicitly. The new characterization is a simple extension of a well known row-oriented characterization of the structure of H. Hare, Johnson, Olesky, and van den Driessche have recently provided a complete sparsity analysis of the QR factorization. Let U be the matrix consisting of the first n columns of Q. Using results from, we show that the data structures for H and U resulting from our characterizations are tight when A is a strong Hall matrix. We also show that H and the lower trapezoidal part of U have the same sparsity characterization when A is strong Hall. We then show that this characterization can be extended to any weak Hall matrix that has been permuted into block upper triangular form. Finally, we show that permuting to block triangular form never increases the fill incurred during the factorization.
Xuan, Junyu; Lu, Jie; Zhang, Guangquan; Xu, Richard Yi Da; Luo, Xiangfeng
2018-05-01
Sparse nonnegative matrix factorization (SNMF) aims to factorize a data matrix into two optimized nonnegative sparse factor matrices, which could benefit many tasks, such as document-word co-clustering. However, the traditional SNMF typically assumes the number of latent factors (i.e., dimensionality of the factor matrices) to be fixed. This assumption makes it inflexible in practice. In this paper, we propose a doubly sparse nonparametric NMF framework to mitigate this issue by using dependent Indian buffet processes (dIBP). We apply a correlation function for the generation of two stick weights associated with each column pair of factor matrices while still maintaining their respective marginal distribution specified by IBP. As a consequence, the generation of two factor matrices will be columnwise correlated. Under this framework, two classes of correlation function are proposed: 1) using bivariate Beta distribution and 2) using Copula function. Compared with the single IBP-based NMF, this paper jointly makes two factor matrices nonparametric and sparse, which could be applied to broader scenarios, such as co-clustering. This paper is seen to be much more flexible than Gaussian process-based and hierarchical Beta process-based dIBPs in terms of allowing the two corresponding binary matrix columns to have greater variations in their nonzero entries. Our experiments on synthetic data show the merits of this paper compared with the state-of-the-art models in respect of factorization efficiency, sparsity, and flexibility. Experiments on real-world data sets demonstrate the efficiency of this paper in document-word co-clustering tasks.
Social Collaborative Filtering by Trust.
Yang, Bo; Lei, Yu; Liu, Jiming; Li, Wenjie
2017-08-01
Recommender systems are used to accurately and actively provide users with potentially interesting information or services. Collaborative filtering is a widely adopted approach to recommendation, but sparse data and cold-start users are often barriers to providing high quality recommendations. To address such issues, we propose a novel method that works to improve the performance of collaborative filtering recommendations by integrating sparse rating data given by users and a sparse social trust network among these same users. This is a model-based method that adopts a matrix factorization technique that maps users into low-dimensional latent feature spaces in terms of their trust relationship, and aims to more accurately reflect the users' reciprocal influence on the formation of their own opinions and to learn better preferential patterns of users for high-quality recommendations. We use four large-scale datasets to show that the proposed method performs much better, especially for cold-start users, than state-of-the-art recommendation algorithms for social collaborative filtering based on trust.
Experiments with conjugate gradient algorithms for homotopy curve tracking
NASA Technical Reports Server (NTRS)
Irani, Kashmira M.; Ribbens, Calvin J.; Watson, Layne T.; Kamat, Manohar P.; Walker, Homer F.
1991-01-01
There are algorithms for finding zeros or fixed points of nonlinear systems of equations that are globally convergent for almost all starting points, i.e., with probability one. The essence of all such algorithms is the construction of an appropriate homotopy map and then tracking some smooth curve in the zero set of this homotopy map. HOMPACK is a mathematical software package implementing globally convergent homotopy algorithms with three different techniques for tracking a homotopy zero curve, and has separate routines for dense and sparse Jacobian matrices. The HOMPACK algorithms for sparse Jacobian matrices use a preconditioned conjugate gradient algorithm for the computation of the kernel of the homotopy Jacobian matrix, a required linear algebra step for homotopy curve tracking. Here, variants of the conjugate gradient algorithm are implemented in the context of homotopy curve tracking and compared with Craig's preconditioned conjugate gradient method used in HOMPACK. The test problems used include actual large scale, sparse structural mechanics problems.
Vecharynski, Eugene; Yang, Chao; Pask, John E.
2015-02-25
Here, we present an iterative algorithm for computing an invariant subspace associated with the algebraically smallest eigenvalues of a large sparse or structured Hermitian matrix A. We are interested in the case in which the dimension of the invariant subspace is large (e.g., over several hundreds or thousands) even though it may still be small relative to the dimension of A. These problems arise from, for example, density functional theory (DFT) based electronic structure calculations for complex materials. The key feature of our algorithm is that it performs fewer Rayleigh–Ritz calculations compared to existing algorithms such as the locally optimal block preconditioned conjugate gradient or the Davidson algorithm. It is a block algorithm, and hence can take advantage of efficient BLAS3 operations and be implemented with multiple levels of concurrency. We discuss a number of practical issues that must be addressed in order to implement the algorithm efficiently on a high performance computer.
NASA Astrophysics Data System (ADS)
Elkurdi, Yousef; Fernández, David; Souleimanov, Evgueni; Giannacopoulos, Dennis; Gross, Warren J.
2008-04-01
The Finite Element Method (FEM) is a computationally intensive scientific and engineering analysis tool that has diverse applications ranging from structural engineering to electromagnetic simulation. The trends in floating-point performance are moving in favor of Field-Programmable Gate Arrays (FPGAs), hence interest has grown in the scientific community in exploiting this technology. We present an architecture and implementation of an FPGA-based sparse matrix-vector multiplier (SMVM) for use in the iterative solution of large, sparse systems of equations arising from FEM applications. FEM matrices display specific sparsity patterns that can be exploited to improve the efficiency of hardware designs. Our architecture exploits FEM matrix sparsity structure to achieve a balance between performance and hardware resource requirements by relying on external SDRAM for data storage while utilizing the FPGA's computational resources in a stream-through systolic approach. The architecture is based on a pipelined linear array of processing elements (PEs) coupled with a hardware-oriented matrix striping algorithm and a partitioning scheme which enables it to process arbitrarily big matrices without changing the number of PEs in the architecture. Therefore, this architecture is only limited by the amount of external RAM available to the FPGA. The implemented SMVM-pipeline prototype contains 8 PEs and is clocked at 110 MHz, obtaining a peak performance of 1.76 GFLOPS. For 8 GB/s of memory bandwidth typical of recent FPGA systems, this architecture can achieve 1.5 GFLOPS sustained performance. Using multiple instances of the pipeline, linear scaling of the peak and sustained performance can be achieved.
Our stream-through architecture provides the added advantage of enabling an iterative implementation of the SMVM computation required by iterative solution techniques such as the conjugate gradient method, avoiding initialization time due to data loading and setup inside the FPGA internal memory.
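The peak figure quoted in this abstract is consistent with a simple back-of-the-envelope calculation, sketched below. The assumption that each PE completes one multiply and one add (2 floating-point operations) per clock cycle is ours, inferred only because it reproduces the quoted number; the abstract does not state it. The function name is hypothetical.

```python
def peak_gflops(num_pes, clock_hz, flops_per_pe_per_cycle=2):
    """Peak rate of a linear PE array in GFLOPS.

    flops_per_pe_per_cycle=2 assumes one multiply-accumulate per PE
    per cycle (an assumption, not stated in the abstract).
    """
    return num_pes * clock_hz * flops_per_pe_per_cycle / 1e9


# Prototype quoted in the abstract: 8 PEs clocked at 110 MHz.
prototype_peak = peak_gflops(8, 110e6)

# The abstract also says peak performance scales linearly with the
# number of pipeline instances; two instances are shown as an example.
two_pipelines_peak = 2 * prototype_peak
```

Under that assumption, the quoted 1.5 GFLOPS sustained rate would correspond to roughly 85% of the prototype's peak.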
NASA Astrophysics Data System (ADS)
Liu, Yang; Li, Feng; Xin, Lei; Fu, Jie; Huang, Puming
2017-10-01
A large data volume is one of the most obvious features of satellite-based remote sensing systems, and it is also a burden for data processing and transmission. The theory of compressive sensing (CS) has been around for almost a decade, and extensive experiments show that CS performs well in data compression and recovery, so we apply CS theory to the acquisition of remote sensing images. In CS, the construction of a classical sensing matrix for all sparse signals has to satisfy the Restricted Isometry Property (RIP) strictly, which limits the practical application of CS to image compression. For remote sensing images, however, we know some inherent characteristics such as non-negativity and smoothness. Therefore, the goal of this paper is to present a novel measurement matrix that breaks RIP. The new sensing matrix consists of two parts: the standard Nyquist sampling matrix for thumbnails and the conventional CS sampling matrix. Since most sun-synchronous satellites circle the earth in 90 minutes and the revisit cycle is also short, many previously captured remote sensing images of the same place are available in advance. This drives us to reconstruct remote sensing images through a deep learning approach with the measurements from the new framework. Therefore, we propose a novel deep convolutional neural network (CNN) architecture which takes undersampled measurements as input and outputs an intermediate reconstruction image. It is well known that training the network takes a long time; fortunately, the training step needs to be done only once, which makes the approach attractive for a host of sparse recovery problems.
Addressing the computational cost of large EIT solutions.
Boyle, Alistair; Borsic, Andrea; Adler, Andy
2012-05-01
Electrical impedance tomography (EIT) is a soft field tomography modality based on the application of electric current to a body and measurement of voltages through electrodes at the boundary. The interior conductivity is reconstructed on a discrete representation of the domain using a finite-element method (FEM) mesh and a parametrization of that domain. The reconstruction requires a sequence of numerically intensive calculations. There is strong interest in reducing the cost of these calculations. An improvement in the compute time for current problems would encourage further exploration of computationally challenging problems such as the incorporation of time series data, wide-spread adoption of three-dimensional simulations and correlation of other modalities such as CT and ultrasound. Multicore processors offer an opportunity to reduce EIT computation times but may require some restructuring of the underlying algorithms to maximize the use of available resources. This work profiles two EIT software packages (EIDORS and NDRM) to experimentally determine where the computational costs arise in EIT as problems scale. Sparse matrix solvers, a key component for the FEM forward problem and sensitivity estimates in the inverse problem, are shown to take a considerable portion of the total compute time in these packages. A sparse matrix solver performance measurement tool, Meagre-Crowd, is developed to interface with a variety of solvers and compare their performance over a range of two- and three-dimensional problems of increasing node density. Results show that distributed sparse matrix solvers that operate on multiple cores are advantageous up to a limit that increases as the node density increases. We recommend a selection procedure to find a solver and hardware arrangement matched to the problem and provide guidance and tools to perform that selection.
MGMRES: A generalization of GMRES for solving large sparse nonsymmetric linear systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Young, D.M.; Chen, J.Y.
1994-12-31
The authors are concerned with the solution of the linear system (1): $Au = b$, where $A$ is a real square nonsingular matrix which is large, sparse and non-symmetric. They consider the use of Krylov subspace methods. They first choose an initial approximation $u^{(0)}$ to the solution $\bar{u} = A^{-1}b$ of (1). They also choose an auxiliary matrix $Z$ which is nonsingular. For $n = 1, 2, \ldots$ they determine $u^{(n)}$ such that $u^{(n)} - u^{(0)} \in K_n(r^{(0)}, A)$, where $K_n(r^{(0)}, A)$ is the (Krylov) subspace spanned by the Krylov vectors $r^{(0)}, Ar^{(0)}, \ldots, A^{n-1}r^{(0)}$ and where $r^{(0)} = b - Au^{(0)}$. If $ZA$ is SPD they also require that $(u^{(n)} - \bar{u}, ZA(u^{(n)} - \bar{u}))$ be minimized. If, on the other hand, $ZA$ is not SPD, then they require that the Galerkin condition, $(Zr^{(n)}, v) = 0$, be satisfied for all $v \in K_n(r^{(0)}, A)$, where $r^{(n)} = b - Au^{(n)}$. In this paper the authors consider a generalization of GMRES. This generalized method, which they refer to as "MGMRES", is very similar to GMRES except that they let $Z = A^{T}Y$, where $Y$ is a nonsingular matrix which is symmetric but not necessarily SPD.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Schiffmann, Florian; VandeVondele, Joost, E-mail: Joost.VandeVondele@mat.ethz.ch
2015-06-28
We present an improved preconditioning scheme for electronic structure calculations based on the orbital transformation method. First, a preconditioner is developed which includes information from the full Kohn-Sham matrix but avoids computationally demanding diagonalisation steps in its construction. This reduces the computational cost of its construction, eliminating a bottleneck in large scale simulations, while maintaining rapid convergence. In addition, a modified form of Hotelling’s iterative inversion is introduced to replace the exact inversion of the preconditioner matrix. This method is highly effective during molecular dynamics (MD), as the solution obtained in earlier MD steps is a suitable initial guess. Filtering small elements during sparse matrix multiplication leads to linear scaling inversion, while retaining robustness, already for relatively small systems. For system sizes ranging from a few hundred to a few thousand atoms, which are typical for many practical applications, the improvements to the algorithm lead to a 2-5 fold speedup per MD step.
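As background for the inversion step mentioned above, the sketch below shows the classical Hotelling (Newton-Schulz) iteration X <- X (2I - A X), with an optional threshold that zeroes small entries after each step. This is the textbook form on small dense lists, not the modified form introduced in the paper; the function names, the `filter_eps` parameter, and the test matrix are assumptions. The iteration converges when the initial guess satisfies ||I - A X0|| < 1, which is why a solution from a previous MD step makes a good starting point.

```python
def matmul(P, Q):
    """Dense product of two matrices stored as lists of rows."""
    inner, cols = len(Q), len(Q[0])
    return [[sum(row[k] * Q[k][j] for k in range(inner))
             for j in range(cols)] for row in P]


def hotelling_inverse(A, X0, steps=20, filter_eps=0.0):
    """Classical Hotelling iteration X <- X (2I - A X).

    X0 must satisfy ||I - A X0|| < 1 for convergence. Entries with
    magnitude below filter_eps are dropped after each step to keep
    X sparse (filter_eps=0.0 disables filtering).
    """
    n = len(A)
    X = [row[:] for row in X0]
    for _ in range(steps):
        AX = matmul(A, X)
        R = [[(2.0 if i == j else 0.0) - AX[i][j] for j in range(n)]
             for i in range(n)]
        X = matmul(X, R)
        if filter_eps > 0.0:
            X = [[v if abs(v) >= filter_eps else 0.0 for v in row]
                 for row in X]
    return X


A = [[4.0, 1.0],
     [1.0, 3.0]]
X0 = [[0.25, 0.0],
      [0.0, 1.0 / 3.0]]   # inverse of the diagonal as initial guess
X = hotelling_inverse(A, X0)
```

Each step costs two matrix products, so with sparse storage and filtering the work per step depends on the number of retained nonzeros rather than on the square of the matrix dimension.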
Collaborative sparse priors for multi-view ATR
NASA Astrophysics Data System (ADS)
Li, Xuelu; Monga, Vishal
2018-04-01
Recent work has seen a surge of sparse representation based classification (SRC) methods applied to automatic target recognition problems. While traditional SRC approaches used the l0 or l1 norm to quantify sparsity, spike and slab priors have established themselves as the gold standard for providing general tunable sparse structures on vectors. In this work, we employ collaborative spike and slab priors that can be applied to matrices to encourage sparsity for the problem of multi-view ATR. That is, target images captured from multiple views are expanded in terms of a training dictionary multiplied with a coefficient matrix. Ideally, for a test image set comprising multiple views of a target, coefficients corresponding to its identifying class are expected to be active, while others should be zero, i.e. the coefficient matrix is naturally sparse. We develop a new approach to solve the optimization problem that estimates the sparse coefficient matrix jointly with the sparsity inducing parameters in the collaborative prior. ATR problems are investigated on the mid-wave infrared (MWIR) database made available by the US Army Night Vision and Electronic Sensors Directorate, which has a rich collection of views. Experimental results show that the proposed joint prior and coefficient estimation method (JPCEM) can: 1.) enable improved accuracy when multiple views vs. a single one are invoked, and 2.) outperform state of the art alternatives particularly when training imagery is limited.
NASA Astrophysics Data System (ADS)
Yang, Yongchao; Nagarajaiah, Satish
2016-06-01
Randomly missing data in structural vibration response time histories often occur in structural dynamics and health monitoring. For example, structural vibration responses are often corrupted by outliers or erroneous measurements due to sensor malfunction; in wireless sensing platforms, data loss during wireless communication is a common issue. Besides, to alleviate the wireless data sampling or communication burden, certain amounts of data are often discarded during sampling or before transmission. In these and other applications, recovery of the randomly missing structural vibration responses from the available, incomplete data is essential for system identification and structural health monitoring; it is an ill-posed inverse problem, however. This paper explicitly harnesses the data structure itself (that of the structural vibration responses) to address this inverse problem. What is relevant is an empirical, but often practically true, observation: typically there are only a few modes active in the structural vibration responses; hence a sparse representation (in the frequency domain) of the single-channel data vector, or a low-rank structure (by singular value decomposition) of the multi-channel data matrix. Exploiting such prior knowledge of data structure (intra-channel sparse or inter-channel low-rank), the new theories of ℓ1-minimization sparse recovery and nuclear-norm-minimization low-rank matrix completion enable recovery of the randomly missing or corrupted structural vibration response data. The performance of these two alternatives, in terms of recovery accuracy and computational time under different data missing rates, is investigated on a few structural vibration response data sets: the seismic responses of the super high-rise Canton Tower and the structural health monitoring accelerations of a real large-scale cable-stayed bridge. Encouraging results are obtained and the applicability and limitations of the presented methods are discussed.
NASA Technical Reports Server (NTRS)
Ehrhart, E. J.; Gillette, E. L.; Barcellos-Hoff, M. H.; Chaterjee, A. (Principal Investigator)
1996-01-01
High-LET radiation has unique physical and biological properties compared to sparsely ionizing radiation. Recent studies demonstrate that sparsely ionizing radiation rapidly alters the pattern of extracellular matrix expression in several tissues, but little is known about the effect of heavy-ion radiation. This study investigates densely ionizing radiation-induced changes in extracellular matrix localization in the mammary glands of adult female BALB/c mice after whole-body irradiation with 0.8 Gy 600 MeV iron particles. The basement membrane and interstitial extracellular matrix proteins of the mammary gland stroma were mapped with respect to time postirradiation using immunofluorescence. Collagen III was induced in the adipose stroma within 1 day, continued to increase through day 9 and was resolved by day 14. Immunoreactive tenascin was induced in the epithelium by day 1, was evident at the epithelial-stromal interface by days 5-9 and persisted as a condensed layer beneath the basement membrane through day 14. These findings parallel similar changes induced by gamma irradiation but demonstrate different onset and chronicity. In contrast, the integrity of the epithelial basement membrane, which was unaffected by sparsely ionizing radiation, was disrupted by iron-particle irradiation. Laminin immunoreactivity was mildly irregular at 1 h postirradiation and showed discontinuities and thickening from days 1 to 9. Continuity was restored by day 14. Thus high-LET radiation, like sparsely ionizing radiation, induces rapid remodeling of the stromal extracellular matrix but also appears to alter the integrity of the epithelial basement membrane, which is an important regulator of epithelial cell proliferation and differentiation.
NASA Astrophysics Data System (ADS)
Schrodt, Franziska; Shan, Hanhuai; Fazayeli, Farideh; Karpatne, Anuj; Kattge, Jens; Banerjee, Arindam; Reichstein, Markus; Reich, Peter
2013-04-01
With the advent of remotely sensed data and coordinated efforts to create global databases, the ecological community has progressively become more data-intensive. However, in contrast to other disciplines, statistical ways of handling these large data sets, especially the gaps which are inherent to them, are lacking. Widely used theoretical approaches, for example model averaging based on Akaike's information criterion (AIC), are sensitive to missing values. Yet, the most common way of handling sparse matrices - the deletion of cases with missing data (complete case analysis) - is known to severely reduce statistical power as well as inducing biased parameter estimates. In order to address these issues, we present novel approaches to gap filling in large ecological data sets using matrix factorization techniques. Factorization based matrix completion was developed in a recommender system context and has since been widely used to impute missing data in fields outside the ecological community. Here, we evaluate the effectiveness of probabilistic matrix factorization techniques for imputing missing data in ecological matrices using two imputation techniques. Hierarchical Probabilistic Matrix Factorization (HPMF) effectively incorporates hierarchical phylogenetic information (phylogenetic group, family, genus, species and individual plant) into the trait imputation. Advanced Hierarchical Probabilistic Matrix Factorization (aHPMF) on the other hand includes climate and soil information into the matrix factorization by regressing the environmental variables against residuals of the HPMF. One unique opportunity opened up by aHPMF is out-of-sample prediction, where traits can be predicted for specific species at locations different to those sampled in the past. This has potentially far-reaching consequences for the study of global-scale plant functional trait patterns. 
We test the accuracy and effectiveness of HPMF and aHPMF in filling sparse matrices, using the TRY database of plant functional traits (http://www.try-db.org). TRY is one of the largest global compilations of plant trait databases (750 traits of 1 million plants), encompassing data on morphological, anatomical, biochemical, phenological and physiological features of plants. However, despite unprecedented coverage, the TRY database is still very sparse, severely limiting joint trait analyses. Plant traits are the key to understanding how plants as primary producers adjust to changes in environmental conditions and in turn influence them. Forming the basis for Dynamic Global Vegetation Models (DGVMs), plant traits are also fundamental in global change studies for predicting future ecosystem changes. It is thus imperative that missing data is imputed in as accurate and precise a way as possible. In this study, we show the advantages and disadvantages of applying probabilistic matrix factorization techniques in incorporating hierarchical and environmental information for the prediction of missing plant traits as compared to conventional imputation techniques such as the complete case and mean approaches. We will discuss the implications of using gap-filled data for global-scale studies of plant functional trait - environment relationships as opposed to the above-mentioned conventional techniques, using examples of out-of-sample predictions of foliar nitrogen across several species' ranges and biomes.
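The core idea of factorization-based gap filling can be sketched with a deliberately simplified, deterministic stand-in: a rank-1 model fitted by alternating least squares on the observed entries only. This is not HPMF or aHPMF (no probabilistic priors, no hierarchy, no environmental covariates); the function name `rank1_complete` and the toy matrix are assumptions made purely for illustration.

```python
def rank1_complete(M, sweeps=200):
    """Fit M[i][j] ~ u[i] * v[j] on observed entries; None marks a gap."""
    rows, cols = len(M), len(M[0])
    u = [1.0] * rows
    v = [1.0] * cols
    for _ in range(sweeps):
        # Update each row factor by least squares over that row's observed cells.
        for i in range(rows):
            obs = [j for j in range(cols) if M[i][j] is not None]
            denom = sum(v[j] ** 2 for j in obs)
            if denom > 0.0:
                u[i] = sum(M[i][j] * v[j] for j in obs) / denom
        # Update each column factor the same way.
        for j in range(cols):
            obs = [i for i in range(rows) if M[i][j] is not None]
            denom = sum(u[i] ** 2 for i in obs)
            if denom > 0.0:
                v[j] = sum(M[i][j] * u[i] for i in obs) / denom
    # The completed matrix is the product of the factors, gaps included.
    return [[u[i] * v[j] for j in range(cols)] for i in range(rows)]


# Hypothetical toy "trait matrix": rows are plants, columns are traits.
traits = [[1.0, 2.0, 3.0],
          [2.0, 4.0, None]]
filled = rank1_complete(traits)
```

Because the second row is twice the first wherever both are observed, the fitted factors predict the missing entry from that shared structure instead of discarding the incomplete row, which is the contrast with complete case analysis drawn above.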
The fastclime Package for Linear Programming and Large-Scale Precision Matrix Estimation in R.
Pang, Haotian; Liu, Han; Vanderbei, Robert
2014-02-01
We develop an R package fastclime for solving a family of regularized linear programming (LP) problems. Our package efficiently implements the parametric simplex algorithm, which provides a scalable and sophisticated tool for solving large-scale linear programs. As an illustrative example, one use of our LP solver is to implement an important sparse precision matrix estimation method called CLIME (Constrained L1 Minimization Estimator). Compared with existing packages for this problem such as clime and flare, our package has three advantages: (1) it efficiently calculates the full piecewise-linear regularization path; (2) it provides an accurate dual certificate as stopping criterion; (3) it is completely coded in C and is highly portable. This package is designed to be useful to statisticians and machine learning researchers for solving a wide range of problems.
Discriminative Transfer Subspace Learning via Low-Rank and Sparse Representation.
Xu, Yong; Fang, Xiaozhao; Wu, Jian; Li, Xuelong; Zhang, David
2016-02-01
In this paper, we address the problem of unsupervised domain transfer learning in which no labels are available in the target domain. We use a transformation matrix to transfer both the source and target data to a common subspace, where each target sample can be represented by a combination of source samples such that the samples from different domains can be well interlaced. In this way, the discrepancy of the source and target domains is reduced. By imposing joint low-rank and sparse constraints on the reconstruction coefficient matrix, the global and local structures of data can be preserved. To enlarge the margins between different classes as much as possible and provide more freedom to diminish the discrepancy, a flexible linear classifier (projection) is obtained by learning a non-negative label relaxation matrix that allows the strict binary label matrix to relax into a slack variable matrix. Our method can avoid a potentially negative transfer by using a sparse matrix to model the noise and, thus, is more robust to different types of noise. We formulate our problem as a constrained low-rankness and sparsity minimization problem and solve it by the inexact augmented Lagrange multiplier method. Extensive experiments on various visual domain adaptation tasks show the superiority of the proposed method over the state-of-the-art methods. The MATLAB code of our method will be publicly available at http://www.yongxu.org/lunwen.html.
Improved analysis of SP and CoSaMP under total perturbations
NASA Astrophysics Data System (ADS)
Li, Haifeng
2016-12-01
Practically, in the underdetermined model y = Ax, where x is a K-sparse vector (i.e., it has no more than K nonzero entries), both y and A could be totally perturbed. A more relaxed condition means that fewer measurements are needed to ensure sparse recovery from a theoretical standpoint. In this paper, based on the restricted isometry property (RIP), two relaxed sufficient conditions are presented for subspace pursuit (SP) and compressed sampling matching pursuit (CoSaMP) under total perturbations to guarantee that the sparse vector x is recovered. Taking a random matrix as the measurement matrix, we also discuss the advantage of our condition. Numerical experiments validate that SP and CoSaMP can provide oracle-order recovery performance.
Multi-energy CT based on a prior rank, intensity and sparsity model (PRISM).
Gao, Hao; Yu, Hengyong; Osher, Stanley; Wang, Ge
2011-11-01
We propose a compressive sensing approach for multi-energy computed tomography (CT), namely the prior rank, intensity and sparsity model (PRISM). To further compress the multi-energy image for allowing the reconstruction with fewer CT data and less radiation dose, the PRISM models a multi-energy image as the superposition of a low-rank matrix and a sparse matrix (with row dimension in space and column dimension in energy), where the low-rank matrix corresponds to the stationary background over energy that has a low matrix rank, and the sparse matrix represents the rest of distinct spectral features that are often sparse. Distinct from previous methods, the PRISM utilizes the generalized rank, e.g., the matrix rank of tight-frame transform of a multi-energy image, which offers a way to characterize the multi-level and multi-filtered image coherence across the energy spectrum. Besides, the energy-dependent intensity information can be incorporated into the PRISM in terms of the spectral curves for base materials, with which the restoration of the multi-energy image becomes the reconstruction of the energy-independent material composition matrix. In other words, the PRISM utilizes prior knowledge on the generalized rank and sparsity of a multi-energy image, and intensity/spectral characteristics of base materials. Furthermore, we develop an accurate and fast split Bregman method for the PRISM and demonstrate the superior performance of the PRISM relative to several competing methods in simulations.
A comparison of SuperLU solvers on the Intel MIC architecture
NASA Astrophysics Data System (ADS)
Tuncel, Mehmet; Duran, Ahmet; Celebi, M. Serdar; Akaydin, Bora; Topkaya, Figen O.
2016-10-01
In many science and engineering applications, problems may result in solving a sparse linear system AX=B. For example, SuperLU_MCDT, a linear solver, was used for the large penta-diagonal matrices for 2D problems and hepta-diagonal matrices for 3D problems, coming from the incompressible blood flow simulation (see [1]). It is important to test the status and potential improvements of state-of-the-art solvers on new technologies. In this work, sequential, multithreaded and distributed versions of SuperLU solvers (see [2]) are examined on the Intel Xeon Phi coprocessors using the offload programming model at the EURORA cluster of CINECA in Italy. We consider a portfolio of test matrices containing patterned matrices from UFMM ([3]) and randomly located matrices. This architecture can benefit from high parallelism and large vectors. We find that the sequential SuperLU gained up to a 45% performance improvement from the offload programming, depending on the sparse matrix type and the size of transferred and processed data.
Fast Solution in Sparse LDA for Binary Classification
NASA Technical Reports Server (NTRS)
Moghaddam, Baback
2010-01-01
An algorithm that performs sparse linear discriminant analysis (Sparse-LDA) finds near-optimal solutions in far less time than the prior art when specialized to binary classification (of 2 classes). Sparse-LDA is a type of feature- or variable-selection problem with numerous applications in statistics, machine learning, computer vision, computational finance, operations research, and bio-informatics. Because of their combinatorial nature, feature- or variable-selection problems are NP-hard or computationally intractable in cases involving more than 30 variables or features. Therefore, one typically seeks approximate solutions by means of greedy search algorithms. The prior Sparse-LDA algorithm was a greedy algorithm that considered the best variable or feature to add to or delete from its subsets in order to maximally discriminate between multiple classes of data. The present algorithm is designed for the special but prevalent case of 2-class or binary classification (e.g. 1 vs. 0, functioning vs. malfunctioning, or change versus no change). The present algorithm provides near-optimal solutions on large real-world datasets having hundreds or even thousands of variables or features (e.g. selecting the fewest wavelength bands in a hyperspectral sensor to do terrain classification) and does so in typical computation times of minutes as compared to days or weeks as taken by the prior art. Sparse LDA requires solving generalized eigenvalue problems for a large number of variable subsets (represented by the submatrices of the input within-class and between-class covariance matrices). In the general (full-rank) case, the amount of computation scales at least cubically with the number of variables and thus the size of the problems that can be solved is limited accordingly. However, in binary classification, the principal eigenvalues can be found using a special analytic formula, without resorting to costly iterative techniques.
The present algorithm exploits this analytic form along with the inherent sequential nature of greedy search itself. Together this enables the use of highly-efficient partitioned-matrix-inverse techniques that result in large speedups of computation in both the forward-selection and backward-elimination stages of greedy algorithms in general.
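The kind of saving that a partitioned-matrix inverse offers in forward selection can be sketched as follows: when one variable is added, the inverse of the enlarged symmetric matrix is obtained from the previous inverse through a scalar Schur complement instead of a fresh cubic-cost inversion. The abstract does not give the algorithm's actual update formulas, so this is the standard bordering identity, and the name `bordered_inverse` is an assumption for this example.

```python
def bordered_inverse(A_inv, b, c):
    """Inverse of the symmetric block matrix [[A, b], [b^T, c]] given A_inv.

    A_inv is the (symmetric) inverse of the current n x n matrix, b is the new
    off-diagonal column and c the new diagonal entry. Cost is O(n^2).
    """
    n = len(A_inv)
    u = [sum(A_inv[i][k] * b[k] for k in range(n)) for i in range(n)]  # A^{-1} b
    s = c - sum(b[i] * u[i] for i in range(n))  # scalar Schur complement
    if s == 0.0:
        raise ValueError("bordered matrix is singular")
    top = [[A_inv[i][j] + u[i] * u[j] / s for j in range(n)] + [-u[i] / s]
           for i in range(n)]
    bottom = [-u_j / s for u_j in u] + [1.0 / s]
    return top + [bottom]


# Hypothetical example: grow the 1 x 1 matrix [[2]] into [[2, 1], [1, 3]].
grown_inv = bordered_inverse([[0.5]], [1.0], 3.0)
```

Backward elimination uses the same identity in reverse, recovering the inverse of the smaller matrix from the blocks of the larger one.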
NASA Astrophysics Data System (ADS)
Imamura, Seigo; Ono, Kenji; Yokokawa, Mitsuo
2016-07-01
Ensemble computing, which is an instance of capacity computing, is an effective computing scenario for exascale parallel supercomputers. In ensemble computing, there are multiple linear systems associated with a common coefficient matrix. We improve the performance of iterative solvers for multiple vectors by solving them at the same time, that is, by solving for the product of the matrices. We implemented several iterative methods and compared their performance. The maximum performance on Sparc VIIIfx was 7.6 times higher than that of a naïve implementation. Finally, to deal with the different convergence processes of linear systems, we introduced a control method to eliminate the calculation of already converged vectors.
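The control idea in the last sentence, dropping right-hand sides that have already converged, can be sketched with a plain Jacobi iteration. Jacobi is chosen here only for brevity and is an assumption, since the abstract does not name the iterative methods used; the function name `jacobi_multi_rhs` and the example data are likewise hypothetical.

```python
def jacobi_multi_rhs(A, B, tol=1e-10, max_iter=500):
    """Solve A x_k = b_k for every right-hand side b_k in the list B."""
    n = len(A)
    X = [[0.0] * n for _ in B]
    active = set(range(len(B)))   # systems that still need work
    sweeps = [0] * len(B)         # Jacobi sweeps actually spent on each system
    for _ in range(max_iter):
        for k in sorted(active):
            x, b = X[k], B[k]
            residual = [b[i] - sum(A[i][j] * x[j] for j in range(n))
                        for i in range(n)]
            if max(abs(r) for r in residual) <= tol:
                active.discard(k)  # converged: skip this vector from now on
                continue
            X[k] = [x[i] + residual[i] / A[i][i] for i in range(n)]
            sweeps[k] += 1
        if not active:
            break
    return X, sweeps


# Hypothetical example: one non-trivial system and one that is solved at once.
A = [[4.0, 1.0], [1.0, 3.0]]
B = [[1.0, 2.0], [0.0, 0.0]]
X, sweeps = jacobi_multi_rhs(A, B)
```

In a tuned implementation the gain from batching comes from streaming the shared coefficient matrix through memory once per sweep for all active vectors; the pure-Python loop above shows only the bookkeeping.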
Target detection in GPR data using joint low-rank and sparsity constraints
NASA Astrophysics Data System (ADS)
Bouzerdoum, Abdesselam; Tivive, Fok Hing Chi; Abeynayake, Canicious
2016-05-01
In ground penetrating radars, background clutter, which comprises the signals backscattered from the rough, uneven ground surface and the background noise, impairs the visualization of buried objects and subsurface inspections. In this paper, a clutter mitigation method is proposed for target detection. The removal of background clutter is formulated as a constrained optimization problem to obtain a low-rank matrix and a sparse matrix. The low-rank matrix captures the ground surface reflections and the background noise, whereas the sparse matrix contains the target reflections. An optimization method based on split-Bregman algorithm is developed to estimate these two matrices from the input GPR data. Evaluated on real radar data, the proposed method achieves promising results in removing the background clutter and enhancing the target signature.
An approach to solving large reliability models
NASA Technical Reports Server (NTRS)
Boyd, Mark A.; Veeraraghavan, Malathi; Dugan, Joanne Bechta; Trivedi, Kishor S.
1988-01-01
This paper describes a unified approach to the problem of solving large realistic reliability models. The methodology integrates behavioral decomposition, state truncation, and efficient sparse matrix-based numerical methods. The use of fault trees, together with ancillary information regarding dependencies to automatically generate the underlying Markov model state space is proposed. The effectiveness of this approach is illustrated by modeling a state-of-the-art flight control system and a multiprocessor system. Nonexponential distributions for times to failure of components are assumed in the latter example. The modeling tool used for most of this analysis is HARP (the Hybrid Automated Reliability Predictor).
Luo, Xin; You, Zhuhong; Zhou, Mengchu; Li, Shuai; Leung, Hareton; Xia, Yunni; Zhu, Qingsheng
2015-01-09
The comprehensive mapping of protein-protein interactions (PPIs) is highly desired for one to gain deep insights into both fundamental cell biology processes and the pathology of diseases. Finely-set small-scale experiments are not only very expensive but also inefficient to identify numerous interactomes despite their high accuracy. High-throughput screening techniques enable efficient identification of PPIs; yet the desire to further extract useful knowledge from these data leads to the problem of binary interactome mapping. Network topology-based approaches prove to be highly efficient in addressing this problem; however, their performance deteriorates significantly on sparse putative PPI networks. Motivated by the success of collaborative filtering (CF)-based approaches to the problem of personalized-recommendation on large, sparse rating matrices, this work aims at implementing a highly efficient CF-based approach to binary interactome mapping. To achieve this, we first propose a CF framework for it. Under this framework, we model the given data into an interactome weight matrix, where the feature-vectors of involved proteins are extracted. With them, we design the rescaled cosine coefficient to model the inter-neighborhood similarity among involved proteins, for taking the mapping process. Experimental results on three large, sparse datasets demonstrate that the proposed approach outperforms several sophisticated topology-based approaches significantly.
Luo, Xin; You, Zhuhong; Zhou, Mengchu; Li, Shuai; Leung, Hareton; Xia, Yunni; Zhu, Qingsheng
2015-01-01
The comprehensive mapping of protein-protein interactions (PPIs) is highly desired for one to gain deep insights into both fundamental cell biology processes and the pathology of diseases. Finely-set small-scale experiments are not only very expensive but also inefficient to identify numerous interactomes despite their high accuracy. High-throughput screening techniques enable efficient identification of PPIs; yet the desire to further extract useful knowledge from these data leads to the problem of binary interactome mapping. Network topology-based approaches prove to be highly efficient in addressing this problem; however, their performance deteriorates significantly on sparse putative PPI networks. Motivated by the success of collaborative filtering (CF)-based approaches to the problem of personalized-recommendation on large, sparse rating matrices, this work aims at implementing a highly efficient CF-based approach to binary interactome mapping. To achieve this, we first propose a CF framework for it. Under this framework, we model the given data into an interactome weight matrix, where the feature-vectors of involved proteins are extracted. With them, we design the rescaled cosine coefficient to model the inter-neighborhood similarity among involved proteins, for taking the mapping process. Experimental results on three large, sparse datasets demonstrate that the proposed approach outperforms several sophisticated topology-based approaches significantly. PMID:25572661
Weighted low-rank sparse model via nuclear norm minimization for bearing fault detection
NASA Astrophysics Data System (ADS)
Du, Zhaohui; Chen, Xuefeng; Zhang, Han; Yang, Boyuan; Zhai, Zhi; Yan, Ruqiang
2017-07-01
It is a fundamental task in the machine fault diagnosis community to detect impulsive signatures generated by the localized faults of bearings. The main goal of this paper is to exploit the low-rank physical structure of periodic impulsive features and further establish a weighted low-rank sparse model for bearing fault detection. The proposed model mainly consists of three basic components: an adaptive partition window, a nuclear norm regularization and a weighted sequence. Firstly, due to the periodic repetition mechanism of the impulsive feature, an adaptive partition window can be designed to transform the impulsive feature into a data matrix. The key role of the partition window is to accumulate all local feature information and align it. Then, all columns of the data matrix share similar waveforms and a core physical phenomenon arises, i.e., the singular values of the data matrix exhibit a sparse distribution pattern. Therefore, a nuclear norm regularization is enforced to capture that sparse prior. However, the nuclear norm regularization treats all singular values equally and thus ignores one basic fact: larger singular values carry more information about the impulsive features and should be preserved as much as possible. Therefore, a weighted sequence with adaptively tuned weights inversely proportional to singular value amplitude is adopted to guarantee the distribution consistency of large singular values. On the other hand, the proposed model is difficult to solve due to its non-convexity, and thus a new algorithm is developed to search for a satisfactory stationary solution by alternately applying a proximal operator and least-squares fitting. Moreover, the sensitivity analysis and selection principles of algorithmic parameters are comprehensively investigated through a set of numerical experiments, which show that the proposed method is robust and has only a few adjustable parameters.
Lastly, the proposed model is applied to the wind turbine (WT) bearing fault detection and its effectiveness is sufficiently verified. Compared with the current popular bearing fault diagnosis techniques, wavelet analysis and spectral kurtosis, our model achieves a higher diagnostic accuracy.
Sparse image reconstruction for molecular imaging.
Ting, Michael; Raich, Raviv; Hero, Alfred O
2009-06-01
The application that motivates this paper is molecular imaging at the atomic level. When discretized at subatomic distances, the volume is inherently sparse. Noiseless measurements from an imaging technology can be modeled by convolution of the image with the system point spread function (psf). Such is the case with magnetic resonance force microscopy (MRFM), an emerging technology where imaging of an individual tobacco mosaic virus was recently demonstrated with nanometer resolution. We also consider additive white Gaussian noise (AWGN) in the measurements. Many prior works of sparse estimators have focused on the case when H has low coherence; however, the system matrix H in our application is the convolution matrix for the system psf. A typical convolution matrix has high coherence. This paper, therefore, does not assume a low coherence H. A discrete-continuous form of the Laplacian and atom at zero (LAZE) p.d.f. used by Johnstone and Silverman is formulated, and two sparse estimators derived by maximizing the joint p.d.f. of the observation and image conditioned on the hyperparameters. A thresholding rule that generalizes the hard and soft thresholding rule appears in the course of the derivation. This so-called hybrid thresholding rule, when used in the iterative thresholding framework, gives rise to the hybrid estimator, a generalization of the lasso. Estimates of the hyperparameters for the lasso and hybrid estimator are obtained via Stein's unbiased risk estimate (SURE). A numerical study with a Gaussian psf and two sparse images shows that the hybrid estimator outperforms the lasso.
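For reference, the two classical rules that the hybrid thresholding rule is said to generalize can be written in a few lines. The hybrid rule itself is not specified in the abstract and is therefore not reproduced; the function names below are assumptions for this sketch.

```python
def hard_threshold(x, t):
    """Keep x unchanged if its magnitude exceeds t, otherwise set it to zero."""
    return x if abs(x) > t else 0.0


def soft_threshold(x, t):
    """Shrink x toward zero by t; values with magnitude at most t become zero."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


# Apply both rules elementwise to a small hypothetical coefficient vector.
coefficients = [3.0, -3.0, 0.5, -0.25]
hard = [hard_threshold(c, 1.0) for c in coefficients]
soft = [soft_threshold(c, 1.0) for c in coefficients]
```

Soft thresholding is the proximal step behind iterative thresholding solvers for the lasso, which is why a rule that generalizes it yields an estimator that generalizes the lasso.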
Total variation-based method for radar coincidence imaging with model mismatch for extended target
NASA Astrophysics Data System (ADS)
Cao, Kaicheng; Zhou, Xiaoli; Cheng, Yongqiang; Fan, Bo; Qin, Yuliang
2017-11-01
Originating from traditional optical coincidence imaging, radar coincidence imaging (RCI) is a staring/forward-looking imaging technique. In RCI, the reference matrix must be computed precisely to reconstruct the image as desired; unfortunately, such precision is almost impossible due to the existence of model mismatch in practical applications. Although some conventional sparse recovery algorithms have been proposed to solve the model-mismatch problem, they are inapplicable to nonsparse targets. We therefore derive the signal model of RCI with model mismatch by replacing the sparsity constraint term with total variation (TV) regularization in the sparse total least squares optimization problem; in this manner, we obtain the objective function of RCI with model mismatch for an extended target. A more robust and efficient algorithm called TV-TLS is proposed, in which the objective function is divided into two parts and the perturbation matrix and scattering coefficients are updated alternately. Moreover, due to the ability of TV regularization to recover a sparse signal or an image with sparse gradient, the TV-TLS method is also applicable to sparse recovery. Results of numerical experiments demonstrate that, for uniform extended targets, sparse targets, and real extended targets, the algorithm can achieve the desired imaging performance both in suppressing noise and in adapting to model mismatch.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hutchinson, S.A.; Shadid, J.N.; Tuminaro, R.S.
1995-10-01
Aztec is an iterative library that greatly simplifies the parallelization process when solving the linear systems of equations Ax = b where A is a user supplied n x n sparse matrix, b is a user supplied vector of length n and x is a vector of length n to be computed. Aztec is intended as a software tool for users who want to avoid cumbersome parallel programming details but who have large sparse linear systems which require an efficiently utilized parallel processing system. A collection of data transformation tools are provided that allow for easy creation of distributed sparse unstructured matrices for parallel solution. Once the distributed matrix is created, computation can be performed on any of the parallel machines running Aztec: nCUBE 2, IBM SP2 and Intel Paragon, MPI platforms as well as standard serial and vector platforms. Aztec includes a number of Krylov iterative methods such as conjugate gradient (CG), generalized minimum residual (GMRES) and stabilized biconjugate gradient (BICGSTAB) to solve systems of equations. These Krylov methods are used in conjunction with various preconditioners such as polynomial or domain decomposition methods using LU or incomplete LU factorizations within subdomains. Although the matrix A can be general, the package has been designed for matrices arising from the approximation of partial differential equations (PDEs). In particular, the Aztec package is oriented toward systems arising from PDE applications.
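To show what a preconditioned Krylov solve on a sparse matrix involves, here is a serial sketch of conjugate gradient with a simple diagonal (Jacobi) preconditioner on a matrix stored in compressed sparse row (CSR) form. It does not use or imitate the Aztec API; the function names, the CSR argument layout, and the choice of preconditioner are all assumptions for this example, and the method as written requires a symmetric positive definite matrix.

```python
def csr_matvec(data, indices, indptr, x):
    """y = A x for a CSR matrix given by (data, indices, indptr)."""
    return [sum(data[p] * x[indices[p]] for p in range(indptr[i], indptr[i + 1]))
            for i in range(len(indptr) - 1)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pcg(data, indices, indptr, b, tol=1e-10, max_iter=200):
    """Jacobi-preconditioned conjugate gradient for an SPD CSR matrix."""
    n = len(b)
    # Jacobi preconditioner: reciprocal of each diagonal entry.
    inv_diag = [0.0] * n
    for i in range(n):
        for p in range(indptr[i], indptr[i + 1]):
            if indices[p] == i:
                inv_diag[i] = 1.0 / data[p]
    x = [0.0] * n
    r = list(b)                                   # residual for x = 0
    z = [inv_diag[i] * r[i] for i in range(n)]    # preconditioned residual
    direction = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 <= tol:
            break
        Ad = csr_matvec(data, indices, indptr, direction)
        alpha = rz / dot(direction, Ad)
        x = [x[i] + alpha * direction[i] for i in range(n)]
        r = [r[i] - alpha * Ad[i] for i in range(n)]
        z = [inv_diag[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        direction = [z[i] + (rz_new / rz) * direction[i] for i in range(n)]
        rz = rz_new
    return x


# Hypothetical example: the 3 x 3 tridiagonal matrix with stencil (-1, 2, -1),
# a common discretization of a 1D second-derivative operator.
data = [2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0]
indices = [0, 1, 0, 1, 2, 1, 2]
indptr = [0, 2, 5, 7]
solution = pcg(data, indices, indptr, [1.0, 0.0, 1.0])
```

A parallel library distributes the rows of the CSR structure across processes and replaces the dot products and matrix-vector product with their distributed counterparts; the iteration itself is unchanged.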
Randomized subspace-based robust principal component analysis for hyperspectral anomaly detection
NASA Astrophysics Data System (ADS)
Sun, Weiwei; Yang, Gang; Li, Jialin; Zhang, Dianfa
2018-01-01
A randomized subspace-based robust principal component analysis (RSRPCA) method for anomaly detection in hyperspectral imagery (HSI) is proposed. The RSRPCA combines advantages of randomized column subspace and robust principal component analysis (RPCA). It assumes that the background has low-rank properties, and the anomalies are sparse and do not lie in the column subspace of the background. First, RSRPCA implements random sampling to sketch the original HSI dataset from columns and to construct a randomized column subspace of the background. Structured random projections are also adopted to sketch the HSI dataset from rows. Sketching from columns and rows could greatly reduce the computational requirements of RSRPCA. Second, the RSRPCA adopts the columnwise RPCA (CWRPCA) to eliminate negative effects of sampled anomaly pixels and that purifies the previous randomized column subspace by removing sampled anomaly columns. The CWRPCA decomposes the submatrix of the HSI data into a low-rank matrix (i.e., background component), a noisy matrix (i.e., noise component), and a sparse anomaly matrix (i.e., anomaly component) with only a small proportion of nonzero columns. The algorithm of inexact augmented Lagrange multiplier is utilized to optimize the CWRPCA problem and estimate the sparse matrix. Nonzero columns of the sparse anomaly matrix point to sampled anomaly columns in the submatrix. Third, all the pixels are projected onto the complemental subspace of the purified randomized column subspace of the background and the anomaly pixels in the original HSI data are finally exactly located. Several experiments on three real hyperspectral images are carefully designed to investigate the detection performance of RSRPCA, and the results are compared with four state-of-the-art methods. Experimental results show that the proposed RSRPCA outperforms four comparison methods both in detection performance and in computational time.
Highly parallel sparse Cholesky factorization
NASA Technical Reports Server (NTRS)
Gilbert, John R.; Schreiber, Robert
1990-01-01
Several fine grained parallel algorithms were developed and compared to compute the Cholesky factorization of a sparse matrix. The experimental implementations are on the Connection Machine, a distributed memory SIMD machine whose programming model conceptually supplies one processor per data element. In contrast to special purpose algorithms in which the matrix structure conforms to the connection structure of the machine, the focus is on matrices with arbitrary sparsity structure. The most promising algorithm is one whose inner loop performs several dense factorizations simultaneously on a 2-D grid of processors. Virtually any massively parallel dense factorization algorithm can be used as the key subroutine. The sparse code attains execution rates comparable to those of the dense subroutine. Although at present architectural limitations prevent the dense factorization from realizing its potential efficiency, it is concluded that a regular data parallel architecture can be used efficiently to solve arbitrarily structured sparse problems. A performance model is also presented and it is used to analyze the algorithms.
Low-Rank Correction Methods for Algebraic Domain Decomposition Preconditioners
Li, Ruipeng; Saad, Yousef
2017-08-01
This study presents a parallel preconditioning method for distributed sparse linear systems, based on an approximate inverse of the original matrix, that adopts a general framework of distributed sparse matrices and exploits domain decomposition (DD) and low-rank corrections. The DD approach decouples the matrix and, once inverted, a low-rank approximation is applied by exploiting the Sherman--Morrison--Woodbury formula, which yields two variants of the preconditioning methods. The low-rank expansion is computed by the Lanczos procedure with reorthogonalizations. Numerical experiments indicate that, when combined with Krylov subspace accelerators, this preconditioner can be efficient and robust for solving symmetric sparse linear systems. Comparisons with pARMS, a DD-based parallel incomplete LU (ILU) preconditioning method, are presented for solving Poisson's equation and linear elasticity problems.
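For reference, the Sherman--Morrison--Woodbury identity named in the abstract is stated here in its general form. The symbols are generic assumptions (A is an invertible n x n matrix, C an invertible k x k matrix, U is n x k and V is k x n); the abstract does not say how the decoupled DD matrix and the low-rank term are assigned to them.

```latex
(A + U C V)^{-1} = A^{-1} - A^{-1} U \left( C^{-1} + V A^{-1} U \right)^{-1} V A^{-1}
```

When k is much smaller than n and A is cheap to invert (for example, block diagonal), the correction only requires solving a k x k system.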
Low-Rank Correction Methods for Algebraic Domain Decomposition Preconditioners
DOE Office of Scientific and Technical Information (OSTI.GOV)
Li, Ruipeng; Saad, Yousef
This study presents a parallel preconditioning method for distributed sparse linear systems, based on an approximate inverse of the original matrix, that adopts a general framework of distributed sparse matrices and exploits domain decomposition (DD) and low-rank corrections. The DD approach decouples the matrix and, once inverted, a low-rank approximation is applied by exploiting the Sherman--Morrison--Woodbury formula, which yields two variants of the preconditioning methods. The low-rank expansion is computed by the Lanczos procedure with reorthogonalizations. Numerical experiments indicate that, when combined with Krylov subspace accelerators, this preconditioner can be efficient and robust for solving symmetric sparse linear systems. Comparisons with pARMS, a DD-based parallel incomplete LU (ILU) preconditioning method, are presented for solving Poisson's equation and linear elasticity problems.
Xu, Jason; Minin, Vladimir N
2015-07-01
Branching processes are a class of continuous-time Markov chains (CTMCs) with ubiquitous applications. A general difficulty in statistical inference under partially observed CTMC models arises in computing transition probabilities when the discrete state space is large or uncountable. Classical methods such as matrix exponentiation are infeasible for large or countably infinite state spaces, and sampling-based alternatives are computationally intensive, requiring integration over all possible hidden events. Recent work has successfully applied generating function techniques to computing transition probabilities for linear multi-type branching processes. While these techniques often require significantly fewer computations than matrix exponentiation, they also become prohibitive in applications with large populations. We propose a compressed sensing framework that significantly accelerates the generating function method, decreasing computational cost up to a logarithmic factor by only assuming the probability mass of transitions is sparse. We demonstrate accurate and efficient transition probability computations in branching process models for blood cell formation and evolution of self-replicating transposable elements in bacterial genomes.
Xu, Jason; Minin, Vladimir N.
2016-01-01
Branching processes are a class of continuous-time Markov chains (CTMCs) with ubiquitous applications. A general difficulty in statistical inference under partially observed CTMC models arises in computing transition probabilities when the discrete state space is large or uncountable. Classical methods such as matrix exponentiation are infeasible for large or countably infinite state spaces, and sampling-based alternatives are computationally intensive, requiring integration over all possible hidden events. Recent work has successfully applied generating function techniques to computing transition probabilities for linear multi-type branching processes. While these techniques often require significantly fewer computations than matrix exponentiation, they also become prohibitive in applications with large populations. We propose a compressed sensing framework that significantly accelerates the generating function method, decreasing computational cost up to a logarithmic factor by only assuming the probability mass of transitions is sparse. We demonstrate accurate and efficient transition probability computations in branching process models for blood cell formation and evolution of self-replicating transposable elements in bacterial genomes. PMID:26949377
Solving large-scale dynamic systems using band Lanczos method in Rockwell NASTRAN on CRAY X-MP
NASA Technical Reports Server (NTRS)
Gupta, V. K.; Zillmer, S. D.; Allison, R. E.
1986-01-01
The improved cost effectiveness using better models, more accurate and faster algorithms and large scale computing offers more representative dynamic analyses. The band Lanczos eigen-solution method was implemented in Rockwell's version of 1984 COSMIC-released NASTRAN finite element structural analysis computer program to effectively solve for structural vibration modes including those of large complex systems exceeding 10,000 degrees of freedom. The Lanczos vectors were re-orthogonalized locally using the Lanczos Method and globally using the modified Gram-Schmidt method for sweeping rigid-body modes and previously generated modes and Lanczos vectors. The truncated band matrix was solved for vibration frequencies and mode shapes using Givens rotations. Numerical examples are included to demonstrate the cost effectiveness and accuracy of the method as implemented in ROCKWELL NASTRAN. The CRAY version is based on RPK's COSMIC/NASTRAN. The band Lanczos method was more reliable and accurate and converged faster than the single vector Lanczos Method. The band Lanczos method was comparable to the subspace iteration method which was a block version of the inverse power method. However, the subspace matrix tended to be fully populated in the case of subspace iteration and not as sparse as a band matrix.
HIGH DIMENSIONAL COVARIANCE MATRIX ESTIMATION IN APPROXIMATE FACTOR MODELS.
Fan, Jianqing; Liao, Yuan; Mincheva, Martina
2011-01-01
The variance covariance matrix plays a central role in the inferential theories of high dimensional factor models in finance and economics. Popular regularization methods of directly exploiting sparsity are not directly applicable to many financial problems. Classical methods of estimating the covariance matrices are based on the strict factor models, assuming independent idiosyncratic components. This assumption, however, is restrictive in practical applications. By assuming sparse error covariance matrix, we allow the presence of the cross-sectional correlation even after taking out common factors, and it enables us to combine the merits of both methods. We estimate the sparse covariance using the adaptive thresholding technique as in Cai and Liu (2011), taking into account the fact that direct observations of the idiosyncratic components are unavailable. The impact of high dimensionality on the covariance matrix estimation based on the factor structure is then studied.
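As a rough illustration of entry-adaptive thresholding of a covariance matrix, the sketch below soft-thresholds each off-diagonal sample covariance with its own threshold, scaled by an estimate of that entry's variability. It is a simplified sketch and not the estimator of the paper: it works on directly observed data rather than on estimated idiosyncratic components, and the function names, the constant `delta` and the toy data are assumptions.

```python
import math


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; values within the threshold become 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def adaptive_threshold_covariance(samples, delta=2.0):
    """Entry-wise thresholded sample covariance (diagonal left untouched)."""
    n, p = len(samples), len(samples[0])
    means = [sum(row[j] for row in samples) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in samples]
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(p)]
           for i in range(p)]
    result = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            if i == j:
                result[i][j] = cov[i][j]
                continue
            # theta estimates the variance of the product of centered variables i and j
            theta = sum((c[i] * c[j] - cov[i][j]) ** 2 for c in centered) / n
            lam = delta * math.sqrt(theta * math.log(p) / n)
            result[i][j] = soft_threshold(cov[i][j], lam)
    return result


# Toy data: variables 0 and 1 are identical, variable 2 is only weakly related to them.
samples = [
    (1, 1, 1), (-1, -1, 1), (1, 1, -1),
    (-1, -1, -1), (1, 1, 1), (-1, -1, -1),
]
sigma = adaptive_threshold_covariance(samples)
```

In this toy case the strong (0, 1) covariance survives unchanged while the weak covariances with variable 2 are set to zero, which is the sparsity pattern the thresholding is meant to expose.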
Effect of missing data on multitask prediction methods.
de la Vega de León, Antonio; Chen, Beining; Gillet, Valerie J
2018-05-22
There has been a growing interest in multitask prediction in chemoinformatics, helped by the increasing use of deep neural networks in this field. This technique is applied to multitarget data sets, where compounds have been tested against different targets, with the aim of developing models to predict a profile of biological activities for a given compound. However, multitarget data sets tend to be sparse; i.e., not all compound-target combinations have experimental values. There has been little research on the effect of missing data on the performance of multitask methods. We have used two complete data sets to simulate sparseness by removing data from the training set. Different models to remove the data were compared. These sparse sets were used to train two different multitask methods, deep neural networks and Macau, which is a Bayesian probabilistic matrix factorization technique. Results from both methods were remarkably similar and showed that the performance decrease because of missing data is at first small before accelerating after large amounts of data are removed. This work provides a first approximation to assess how much data is required to produce good performance in multitask prediction exercises.
Modal analysis of circular Bragg fibers with arbitrary index profiles
NASA Astrophysics Data System (ADS)
Horikis, Theodoros P.; Kath, William L.
2006-12-01
A finite-difference approach based upon the immersed interface method is used to analyze the mode structure of Bragg fibers with arbitrary index profiles. The method allows general propagation constants and eigenmodes to be calculated to a high degree of accuracy, while computation times are kept to a minimum by exploiting sparse matrix algebra. The method is well suited to handle complicated structures comprised of a large number of thin layers with high-index contrast and simultaneously determines multiple eigenmodes without modification.
Sparse electrocardiogram signals recovery based on solving a row echelon-like form of system.
Cai, Pingmei; Wang, Guinan; Yu, Shiwei; Zhang, Hongjuan; Ding, Shuxue; Wu, Zikai
2016-02-01
The study of biology and medicine in a noisy environment is an evolving direction in biological data analysis. Among these studies, analysis of electrocardiogram (ECG) signals in a noisy environment is a challenging direction in personalized medicine. Due to its periodic characteristic, an ECG signal can be roughly regarded as a sparse biomedical signal. This study proposes a two-stage recovery algorithm for sparse biomedical signals in the time domain. In the first stage, the concentration subspaces are found in advance. Then, by exploiting these subspaces, the mixing matrix is estimated accurately. In the second stage, based on the number of active sources at each time point, the time points are divided into different layers. Next, by constructing some transformation matrices, these time points form a row echelon-like system. After that, the sources at each layer can be solved for explicitly by the corresponding matrix operations. It is worth noting that all these operations are conducted under a weak sparsity condition, namely that the number of active sources is less than the number of observations. Experimental results show that the proposed method performs better on the sparse ECG signal recovery problem.
Feasibility of Very Large Sparse Aperture Deployable Antennas
2014-03-27
THESIS. Jason C. Heller, Captain...States. AFIT-ENY-14-M-24. Presented to the Faculty...UNLIMITED. Jason C. Heller, B.S., Aerospace
NASA Astrophysics Data System (ADS)
Xue, Zhaohui; Du, Peijun; Li, Jun; Su, Hongjun
2017-02-01
The generally limited availability of training data relative to the usually high data dimension poses a great challenge to accurate classification of hyperspectral imagery, especially for identifying crops characterized by highly correlated spectra. However, traditional parametric classification models are problematic due to the need for non-singular class-specific covariance matrices. In this research, a novel sparse graph regularization (SGR) method is presented, aiming at robust crop mapping using hyperspectral imagery with very few in situ data. The core of SGR lies in propagating labels from known data to unknown, which is triggered by: (1) the fraction matrix generated for the large unknown data by using an effective sparse representation algorithm with respect to the few training data serving as the dictionary; (2) the prediction function estimated for the few training data by formulating a regularization model based on a sparse graph. Then, the labels of the large unknown data can be obtained by maximizing the posterior probability distribution based on the two ingredients. SGR is more discriminative, data-adaptive, robust to noise, and efficient, which is unique with regard to previously proposed approaches, and it has high potential for discriminating crops, especially when facing insufficient training data and a high-dimensional spectral space. The study area is located at Zhangye basin in the middle reaches of Heihe watershed, Gansu, China, where eight crop types were mapped with Compact Airborne Spectrographic Imager (CASI) and Shortwave Infrared Airborne Spectrographic Imager (SASI) hyperspectral data. Experimental results demonstrate that the proposed method significantly outperforms other traditional and state-of-the-art methods.
Factorization in large-scale many-body calculations
Johnson, Calvin W.; Ormand, W. Erich; Krastev, Plamen G.
2013-08-07
One approach for solving interacting many-fermion systems is the configuration-interaction method, also sometimes called the interacting shell model, where one finds eigenvalues of the Hamiltonian in a many-body basis of Slater determinants (antisymmetrized products of single-particle wavefunctions). The resulting Hamiltonian matrix is typically very sparse, but for large systems the nonzero matrix elements can nonetheless require terabytes or more of storage. An alternate algorithm, applicable to a broad class of systems with symmetry, in our case rotational invariance, is to exactly factorize both the basis and the interaction using additive/multiplicative quantum numbers; such an algorithm recreates the many-body matrix elements on the fly and can reduce the storage requirements by an order of magnitude or more. Here, we discuss factorization in general and introduce a novel, generalized factorization method, essentially a ‘double-factorization’ which speeds up basis generation and set-up of required arrays. Although we emphasize techniques, we also place factorization in the context of a specific (unpublished) configuration-interaction code, BIGSTICK, which runs both on serial and parallel machines, and discuss the savings in memory due to factorization.
A physiologically motivated sparse, compact, and smooth (SCS) approach to EEG source localization.
Cao, Cheng; Akalin Acar, Zeynep; Kreutz-Delgado, Kenneth; Makeig, Scott
2012-01-01
Here, we introduce a novel approach to the EEG inverse problem based on the assumption that principal cortical sources of multi-channel EEG recordings may be assumed to be spatially sparse, compact, and smooth (SCS). To enforce these characteristics of solutions to the EEG inverse problem, we propose a correlation-variance model which factors a cortical source space covariance matrix into the multiplication of a pre-given correlation coefficient matrix and the square root of the diagonal variance matrix learned from the data under a Bayesian learning framework. We tested the SCS method using simulated EEG data with various SNR and applied it to a real ECOG data set. We compare the results of SCS to those of an established SBL algorithm.
Non-convex Statistical Optimization for Sparse Tensor Graphical Model
Sun, Wei; Wang, Zhaoran; Liu, Han; Cheng, Guang
2016-01-01
We consider the estimation of sparse graphical models that characterize the dependency structure of high-dimensional tensor-valued data. To facilitate the estimation of the precision matrix corresponding to each way of the tensor, we assume the data follow a tensor normal distribution whose covariance has a Kronecker product structure. The penalized maximum likelihood estimation of this model involves minimizing a non-convex objective function. In spite of the non-convexity of this estimation problem, we prove that an alternating minimization algorithm, which iteratively estimates each sparse precision matrix while fixing the others, attains an estimator with the optimal statistical rate of convergence as well as consistent graph recovery. Notably, such an estimator achieves estimation consistency with only one tensor sample, which is unobserved in previous work. Our theoretical results are backed by thorough numerical studies. PMID:28316459
NASA Astrophysics Data System (ADS)
Hu, Guiqiang; Xiao, Di; Wang, Yong; Xiang, Tao; Zhou, Qing
2017-11-01
Recently, a new kind of image encryption approach using compressive sensing (CS) and double random phase encoding has received much attention due to the advantages such as compressibility and robustness. However, this approach is found to be vulnerable to chosen plaintext attack (CPA) if the CS measurement matrix is re-used. Therefore, designing an efficient measurement matrix updating mechanism that ensures resistance to CPA is of practical significance. In this paper, we provide a novel solution to update the CS measurement matrix by altering the secret sparse basis with the help of counter mode operation. Particularly, the secret sparse basis is implemented by a reality-preserving fractional cosine transform matrix. Compared with the conventional CS-based cryptosystem that totally generates all the random entries of measurement matrix, our scheme owns efficiency superiority while guaranteeing resistance to CPA. Experimental and analysis results show that the proposed scheme has a good security performance and has robustness against noise and occlusion.
Exploiting Multiple Levels of Parallelism in Sparse Matrix-Matrix Multiplication
Azad, Ariful; Ballard, Grey; Buluc, Aydin; ...
2016-11-08
Sparse matrix-matrix multiplication (or SpGEMM) is a key primitive for many high-performance graph algorithms as well as for some linear solvers, such as algebraic multigrid. The scaling of existing parallel implementations of SpGEMM is heavily bound by communication. Even though 3D (or 2.5D) algorithms have been proposed and theoretically analyzed in the flat MPI model on Erdös-Rényi matrices, those algorithms had not been implemented in practice and their complexities had not been analyzed for the general case. In this work, we present the first implementation of the 3D SpGEMM formulation that exploits multiple (intranode and internode) levels of parallelism, achieving significant speedups over the state-of-the-art publicly available codes at all levels of concurrencies. We extensively evaluate our implementation and identify bottlenecks that should be subject to further research.
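For readers unfamiliar with the primitive itself, the following is a minimal serial sketch of row-wise SpGEMM with a sparse accumulator (the classical Gustavson formulation). It does not reproduce the 3D distributed algorithm of the paper; the row-dictionary storage and the name `spgemm` are assumptions made for brevity.

```python
def spgemm(a_rows, b_rows):
    """Compute C = A * B where each matrix is a list of {column: value} dicts, one per row."""
    c_rows = []
    for a_row in a_rows:
        acc = {}  # sparse accumulator for one output row
        for k, a_val in a_row.items():
            # Row k of B is scaled by A[i, k] and merged into the accumulator.
            for j, b_val in b_rows[k].items():
                acc[j] = acc.get(j, 0.0) + a_val * b_val
        c_rows.append(acc)
    return c_rows


A = [{0: 1.0, 2: 2.0}, {1: 3.0}, {}]
B = [{1: 4.0}, {0: 5.0, 2: 6.0}, {1: 7.0}]
C = spgemm(A, B)
```

The work is proportional to the number of nonzero products rather than to the dense dimensions, and the irregular access to rows of B is one reason communication dominates once those rows live on other processes.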
HIGH DIMENSIONAL COVARIANCE MATRIX ESTIMATION IN APPROXIMATE FACTOR MODELS
Fan, Jianqing; Liao, Yuan; Mincheva, Martina
2012-01-01
The variance covariance matrix plays a central role in the inferential theories of high dimensional factor models in finance and economics. Popular regularization methods of directly exploiting sparsity are not directly applicable to many financial problems. Classical methods of estimating the covariance matrices are based on the strict factor models, assuming independent idiosyncratic components. This assumption, however, is restrictive in practical applications. By assuming sparse error covariance matrix, we allow the presence of the cross-sectional correlation even after taking out common factors, and it enables us to combine the merits of both methods. We estimate the sparse covariance using the adaptive thresholding technique as in Cai and Liu (2011), taking into account the fact that direct observations of the idiosyncratic components are unavailable. The impact of high dimensionality on the covariance matrix estimation based on the factor structure is then studied. PMID:22661790
Xi, Jianing; Wang, Minghui; Li, Ao
2018-06-05
Discovery of mutated driver genes is one of the primary objectives in studying tumorigenesis. To discover driver genes that are mutated at relatively low frequency from somatic mutation data, many existing methods incorporate an interaction network as prior information. However, these existing network-based methods do not exploit prior information from mRNA expression patterns, which has also been shown to be highly informative about cancer progression. To incorporate prior information from both the interaction network and mRNA expression, we propose a robust and sparse co-regularized nonnegative matrix factorization to discover driver genes from mutation data. Furthermore, our framework also applies Frobenius norm regularization to overcome overfitting. A sparsity-inducing penalty is employed to obtain sparse scores in gene representations, of which the top-scored genes are selected as driver candidates. Evaluation experiments on known benchmarking genes indicate that the performance of our method benefits from the two types of prior information. Our method also outperforms the existing network-based methods and detects some driver genes that are not predicted by the competing methods. In summary, our proposed method can improve the performance of driver gene discovery by effectively incorporating prior information from the interaction network and mRNA expression patterns into a robust and sparse co-regularized matrix factorization framework.
Dynamic Textures Modeling via Joint Video Dictionary Learning.
Wei, Xian; Li, Yuanxiang; Shen, Hao; Chen, Fang; Kleinsteuber, Martin; Wang, Zhongfeng
2017-04-06
Video representation is an important and challenging task in the computer vision community. In this paper, we consider the problem of modeling and classifying video sequences of dynamic scenes which could be modeled in a dynamic textures (DT) framework. At first, we assume that image frames of a moving scene can be modeled as a Markov random process. We propose a sparse coding framework, named joint video dictionary learning (JVDL), to model a video adaptively. By treating the sparse coefficients of image frames over a learned dictionary as the underlying "states", we learn an efficient and robust linear transition matrix between two adjacent frames of sparse events in time series. Hence, a dynamic scene sequence is represented by an appropriate transition matrix associated with a dictionary. In order to ensure the stability of JVDL, we impose several constraints on such transition matrix and dictionary. The developed framework is able to capture the dynamics of a moving scene by exploring both sparse properties and the temporal correlations of consecutive video frames. Moreover, such learned JVDL parameters can be used for various DT applications, such as DT synthesis and recognition. Experimental results demonstrate the strong competitiveness of the proposed JVDL approach in comparison with state-of-the-art video representation methods. Especially, it performs significantly better in dealing with DT synthesis and recognition on heavily corrupted data.
Visual recognition and inference using dynamic overcomplete sparse learning.
Murray, Joseph F; Kreutz-Delgado, Kenneth
2007-09-01
We present a hierarchical architecture and learning algorithm for visual recognition and other visual inference tasks such as imagination, reconstruction of occluded images, and expectation-driven segmentation. Using properties of biological vision for guidance, we posit a stochastic generative world model and from it develop a simplified world model (SWM) based on a tractable variational approximation that is designed to enforce sparse coding. Recent developments in computational methods for learning overcomplete representations (Lewicki & Sejnowski, 2000; Teh, Welling, Osindero, & Hinton, 2003) suggest that overcompleteness can be useful for visual tasks, and we use an overcomplete dictionary learning algorithm (Kreutz-Delgado, et al., 2003) as a preprocessing stage to produce accurate, sparse codings of images. Inference is performed by constructing a dynamic multilayer network with feedforward, feedback, and lateral connections, which is trained to approximate the SWM. Learning is done with a variant of the back-propagation-through-time algorithm, which encourages convergence to desired states within a fixed number of iterations. Vision tasks require large networks, and to make learning efficient, we take advantage of the sparsity of each layer to update only a small subset of elements in a large weight matrix at each iteration. Experiments on a set of rotated objects demonstrate various types of visual inference and show that increasing the degree of overcompleteness improves recognition performance in difficult scenes with occluded objects in clutter.
A Sparse Matrix Approach for Simultaneous Quantification of Nystagmus and Saccade
NASA Technical Reports Server (NTRS)
Kukreja, Sunil L.; Stone, Lee; Boyle, Richard D.
2012-01-01
The vestibulo-ocular reflex (VOR) consists of two intermingled non-linear subsystems; namely, nystagmus and saccade. Typically, nystagmus is analysed using a single sufficiently long signal or a concatenation of them. Saccade information is not analysed and discarded due to insufficient data length to provide consistent and minimum variance estimates. This paper presents a novel sparse matrix approach to system identification of the VOR. It allows for the simultaneous estimation of both nystagmus and saccade signals. We show via simulation of the VOR that our technique provides consistent and unbiased estimates in the presence of output additive noise.
An efficient implementation of a high-order filter for a cubed-sphere spectral element model
NASA Astrophysics Data System (ADS)
Kang, Hyun-Gyu; Cheong, Hyeong-Bin
2017-03-01
A parallel-scalable, isotropic, scale-selective spatial filter was developed for the cubed-sphere spectral element model on the sphere. The filter equation is a high-order elliptic (Helmholtz) equation based on the spherical Laplacian operator, which is transformed into cubed-sphere local coordinates. The Laplacian operator is discretized on the computational domain, i.e., on each cell, by the spectral element method with Gauss-Lobatto Lagrange interpolating polynomials (GLLIPs) as the orthogonal basis functions. On the global domain, the discrete filter equation yielded a linear system represented by a highly sparse matrix. The density of this matrix increases quadratically (linearly) with the order of GLLIP (order of the filter), and the linear system is solved in only O(Ng) operations, where Ng is the total number of grid points. The solution, obtained by a row reduction method, demonstrated the typical accuracy and convergence rate of the cubed-sphere spectral element method. To achieve computational efficiency on parallel computers, the linear system was treated by an inverse matrix method (a sparse matrix-vector multiplication). The density of the inverse matrix was lowered to only a few times that of the original sparse matrix without degrading the accuracy of the solution. For better computational efficiency, a local-domain high-order filter was introduced: the filter equation is applied to multiple cells, and then only the central cell is used to reconstruct the filtered field. The parallel efficiency of applying the inverse matrix method to the global- and local-domain filter was evaluated by the scalability on a distributed-memory parallel computer. The scale-selective performance of the filter was demonstrated on Earth topography. The usefulness of the filter as a hyper-viscosity for the vorticity equation was also demonstrated.
On-Chip Neural Data Compression Based On Compressed Sensing With Sparse Sensing Matrices.
Zhao, Wenfeng; Sun, Biao; Wu, Tong; Yang, Zhi
2018-02-01
On-chip neural data compression is an enabling technique for wireless neural interfaces that suffer from insufficient bandwidth and power budgets to transmit the raw data. The data compression algorithm and its implementation should be power and area efficient and functionally reliable over different datasets. Compressed sensing is an emerging technique that has been applied to compress various neurophysiological data. However, the state-of-the-art compressed sensing (CS) encoders leverage random but dense binary measurement matrices, which incur substantial implementation costs on both power and area that could offset the benefits from the reduced wireless data rate. In this paper, we propose two CS encoder designs based on sparse measurement matrices that could lead to efficient hardware implementation. Specifically, two different approaches for the construction of sparse measurement matrices, i.e., the deterministic quasi-cyclic array code (QCAC) matrix and -sparse random binary matrix [-SRBM] are exploited. We demonstrate that the proposed CS encoders lead to comparable recovery performance. And efficient VLSI architecture designs are proposed for QCAC-CS and -SRBM encoders with reduced area and total power consumption.
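The hardware argument for sparse binary measurement matrices can be seen in a small sketch: when each column holds only a few ones, computing y = Phi x needs no multiplications and only a few additions per input sample. The construction below (d randomly placed ones per column) is a generic assumption and is neither the QCAC nor the SRBM construction of the paper; all names and sizes are hypothetical.

```python
import random


def sparse_binary_columns(m, n, d, seed=0):
    """For each of n columns, pick the d row indices (out of m) that hold a one."""
    rng = random.Random(seed)
    return [sorted(rng.sample(range(m), d)) for _ in range(n)]


def encode(columns, x, m):
    """Compute y = Phi x using additions only: each sample is added to d accumulators."""
    y = [0] * m
    for value, rows in zip(x, columns):
        for r in rows:
            y[r] += value
    return y


m, n, d = 8, 32, 2
columns = sparse_binary_columns(m, n, d)
x = [0] * n
x[3], x[17], x[20] = 5, -2, 7   # a sparse test signal
y = encode(columns, x, m)
```

Storing only the row indices per column keeps memory at n * d entries, compared with m * n bits for a dense binary matrix.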
Smoothed low rank and sparse matrix recovery by iteratively reweighted least squares minimization.
Lu, Canyi; Lin, Zhouchen; Yan, Shuicheng
2015-02-01
This paper presents a general framework for solving the low-rank and/or sparse matrix minimization problems, which may involve multiple nonsmooth terms. The iteratively reweighted least squares (IRLS) method is a fast solver, which smooths the objective function and minimizes it by alternately updating the variables and their weights. However, the traditional IRLS can only solve a sparse-only or low-rank-only minimization problem with squared loss or an affine constraint. This paper generalizes IRLS to solve joint/mixed low-rank and sparse minimization problems, which are essential formulations for many tasks. As a concrete example, we solve the Schatten-p norm and l2,q-norm regularized low-rank representation problem by IRLS, and theoretically prove that the derived solution is a stationary point (globally optimal if p,q ≥ 1). Our convergence proof of IRLS is more general than the previous one, which depends on the special properties of the Schatten-p norm and l2,q-norm. Extensive experiments on both synthetic and real data sets demonstrate that our IRLS is much more efficient.
Task-driven dictionary learning.
Mairal, Julien; Bach, Francis; Ponce, Jean
2012-04-01
Modeling data with linear combinations of a few elements from a learned dictionary has been the focus of much recent research in machine learning, neuroscience, and signal processing. For signals such as natural images that admit such sparse representations, it is now well established that these models are well suited to restoration tasks. In this context, learning the dictionary amounts to solving a large-scale matrix factorization problem, which can be done efficiently with classical optimization tools. The same approach has also been used for learning features from data for other purposes, e.g., image classification, but tuning the dictionary in a supervised way for these tasks has proven to be more difficult. In this paper, we present a general formulation for supervised dictionary learning adapted to a wide variety of tasks, and present an efficient algorithm for solving the corresponding optimization problem. Experiments on handwritten digit classification, digital art identification, nonlinear inverse image problems, and compressed sensing demonstrate that our approach is effective in large-scale settings, and is well suited to supervised and semi-supervised classification, as well as regression tasks for data that admit sparse representations.
Comparing implementations of penalized weighted least-squares sinogram restoration.
Forthmann, Peter; Koehler, Thomas; Defrise, Michel; La Riviere, Patrick
2010-11-01
A CT scanner measures the energy that is deposited in each channel of a detector array by x rays that have been partially absorbed on their way through the object. The measurement process is complex and quantitative measurements are always and inevitably associated with errors, so CT data must be preprocessed prior to reconstruction. In recent years, the authors have formulated CT sinogram preprocessing as a statistical restoration problem in which the goal is to obtain the best estimate of the line integrals needed for reconstruction from the set of noisy, degraded measurements. The authors have explored both penalized Poisson likelihood (PL) and penalized weighted least-squares (PWLS) objective functions. At low doses, the authors found that the PL approach outperforms PWLS in terms of resolution-noise tradeoffs, but at standard doses they perform similarly. The PWLS objective function, being quadratic, is more amenable to computational acceleration than the PL objective. In this work, the authors develop and compare two different methods for implementing PWLS sinogram restoration with the hope of improving computational performance relative to PL in the standard-dose regime. Sinogram restoration is still significant in the standard-dose regime since it can still outperform standard approaches and it allows for correction of effects that are not usually modeled in standard CT preprocessing. The authors have explored and compared two implementation strategies for PWLS sinogram restoration: (1) A direct matrix-inversion strategy based on the closed-form solution to the PWLS optimization problem and (2) an iterative approach based on the conjugate-gradient algorithm. Obtaining optimal performance from each strategy required modifying the naive off-the-shelf implementations of the algorithms to exploit the particular symmetry and sparseness of the sinogram-restoration problem. 
For the closed-form approach, the authors subdivided the large matrix inversion into smaller coupled problems and exploited sparseness to minimize matrix operations. For the conjugate-gradient approach, the authors exploited sparseness and preconditioned the problem to speed up convergence. All methods produced qualitatively and quantitatively similar images as measured by resolution-variance tradeoffs and difference images. Despite the acceleration strategies, the direct matrix-inversion approach was found to be uncompetitive with iterative approaches, with a computational burden higher by an order of magnitude or more. The iterative conjugate-gradient approach, however, does appear promising, with computation times half that of the authors' previous penalized-likelihood implementation. Iterative conjugate-gradient based PWLS sinogram restoration with careful matrix optimizations has computational advantages over direct matrix PWLS inversion and over penalized-likelihood sinogram restoration and can be considered a good alternative in standard-dose regimes.
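The iterative strategy can be sketched on a toy problem. Assuming a 1D signal, diagonal weights `w`, and a first-difference roughness penalty with strength `beta` (all illustrative choices, not the authors' sinogram model), minimizing the PWLS objective leads to the linear system (W + beta * D^T D) x = W y, which conjugate gradients can solve without ever forming the matrix. The sketch omits the preconditioning mentioned in the abstract.

```python
def pwls_matvec(w, beta, x):
    """Apply A = diag(w) + beta * D^T D (D = first differences) without storing A."""
    n = len(x)
    out = [w[i] * x[i] for i in range(n)]
    for i in range(n - 1):
        diff = x[i + 1] - x[i]
        out[i] -= beta * diff
        out[i + 1] += beta * diff
    return out

def conjugate_gradient(matvec, b, tol=1e-10, max_iter=1000):
    """Plain (unpreconditioned) CG for a symmetric positive definite operator."""
    x = [0.0] * len(b)
    r = list(b)                      # residual b - A x for x = 0
    p = list(r)
    rs = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        if rs ** 0.5 < tol:
            break
        ap = matvec(p)
        alpha = rs / sum(pi * api for pi, api in zip(p, ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = sum(ri * ri for ri in r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x

weights = [1.0, 2.0, 0.5, 1.5, 1.0]   # hypothetical statistical weights
noisy = [1.0, 1.2, 0.7, 1.1, 0.9]     # hypothetical noisy measurements
beta = 0.3
rhs = [wi * yi for wi, yi in zip(weights, noisy)]
restored = conjugate_gradient(lambda v: pwls_matvec(weights, beta, v), rhs)
```

Because only matrix-vector products are needed, the sparsity of the penalty is exploited automatically: each product costs a number of operations proportional to the signal length.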
Shen, Hong-Bin
2011-01-01
Modern science of networks has brought significant advances to our understanding of complex systems biology. As a representative model of systems biology, Protein Interaction Networks (PINs) are characterized by a remarkable modular structure, reflecting functional associations between their components. Many methods were proposed to capture cohesive modules so that there is a higher density of edges within modules than across them. Recent studies reveal that cohesively interacting modules of proteins are not a universal organizing principle in PINs, which has opened up new avenues for revisiting functional modules in PINs. In this paper, functional clusters in PINs are found to be able to form unorthodox structures defined as bi-sparse modules. In contrast to the traditional cohesive module, the nodes in the bi-sparse module are sparsely connected internally and densely connected with other bi-sparse or cohesive modules. We present a novel protocol called the BinTree Seeking (BTS) for mining both bi-sparse and cohesive modules in PINs based on Edge Density of Module (EDM) and matrix theory. BTS detects modules by depicting links and nodes rather than nodes alone, and its derivation procedure is performed entirely on the adjacency matrix of the network. The number of modules in a PIN can be automatically determined in the proposed BTS approach. BTS is tested on three real PINs and the results demonstrate that functional modules in PINs are not dominantly cohesive but can be sparse. BTS software and the supporting information are available at: www.csbio.sjtu.edu.cn/bioinf/BTS/. PMID:22140454
A Spectral Algorithm for Envelope Reduction of Sparse Matrices
NASA Technical Reports Server (NTRS)
Barnard, Stephen T.; Pothen, Alex; Simon, Horst D.
1993-01-01
The problem of reordering a sparse symmetric matrix to reduce its envelope size is considered. A new spectral algorithm for computing an envelope-reducing reordering is obtained by associating a Laplacian matrix with the given matrix and then sorting the components of a specified eigenvector of the Laplacian. This Laplacian eigenvector solves a continuous relaxation of a discrete problem related to envelope minimization called the minimum 2-sum problem. The permutation vector computed by the spectral algorithm is a closest permutation vector to the specified Laplacian eigenvector. Numerical results show that the new reordering algorithm usually computes smaller envelope sizes than those obtained from the current standard algorithms such as Gibbs-Poole-Stockmeyer (GPS) or SPARSPAK reverse Cuthill-McKee (RCM), in some cases reducing the envelope by more than a factor of two.
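The core idea, sorting the vertices by the components of a Laplacian eigenvector, can be sketched in a few lines. The version below assumes a connected graph and approximates the eigenvector for the smallest nonzero eigenvalue (the Fiedler vector) with a shifted power iteration; the function names, the test graph, and the choice of eigensolver are illustrative assumptions, not the authors' implementation.

```python
def laplacian_matvec(adj, x):
    """Apply the graph Laplacian L = D - A given adjacency lists."""
    return [len(adj[i]) * x[i] - sum(x[j] for j in adj[i]) for i in range(len(adj))]

def fiedler_vector(adj, iters=1000):
    """Power iteration on (shift*I - L), kept orthogonal to the constant vector."""
    n = len(adj)
    shift = 2.0 * max(len(nb) for nb in adj)   # Gershgorin bound on the largest eigenvalue of L
    x = [float(i) for i in range(n)]
    for _ in range(iters):
        mean = sum(x) / n
        x = [xi - mean for xi in x]            # remove the constant (null-space) component
        lx = laplacian_matvec(adj, x)
        x = [shift * xi - li for xi, li in zip(x, lx)]
        norm = sum(xi * xi for xi in x) ** 0.5
        x = [xi / norm for xi in x]
    return x

def envelope_size(adj, order):
    """Sum over rows of the distance from the diagonal to the first nonzero in that row."""
    pos = {v: k for k, v in enumerate(order)}
    return sum(pos[v] - min([pos[v]] + [pos[u] for u in adj[v]]) for v in order)

# A path graph whose vertices are numbered in a scrambled order (3-0-4-1-5-2).
adjacency = [[3, 4], [4, 5], [5], [0], [0, 1], [1, 2]]
fiedler = fiedler_vector(adjacency)
order = sorted(range(len(adjacency)), key=lambda v: fiedler[v])
```

Comparing `envelope_size(adjacency, list(range(6)))` with `envelope_size(adjacency, order)` shows the effect of the reordering on this small example. For large matrices a Lanczos-type eigensolver would be the practical choice instead of power iteration.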
Removing flicker based on sparse color correspondences in old film restoration
NASA Astrophysics Data System (ADS)
Huang, Xi; Ding, Youdong; Yu, Bing; Xia, Tianran
2018-04-01
Archived film is an indispensable part of the long history of human civilization, and repairing damaged film with digital methods is now a mainstream trend. In this paper, we propose a technique based on sparse color correspondences to remove fading flicker from old films. Our approach, which combines multiple frames to establish a simple correction model, includes three key steps. First, we recover sparse color correspondences in the input frames to build a matrix with many missing entries. Second, we present a low-rank matrix factorization approach to estimate the unknown parameters of this model. Finally, we adopt a two-step strategy that divides the estimated parameters into reference frame parameters for color recovery correction and other frame parameters for color consistency correction to remove flicker. Because it combines multiple frames, our method takes the continuity of the input sequence into account, and the experimental results show that the method can remove fading flicker efficiently.
Sparse Covariance Matrix Estimation by DCA-Based Algorithms.
Phan, Duy Nhat; Le Thi, Hoai An; Dinh, Tao Pham
2017-11-01
This letter proposes a novel approach using the ℓ0-norm regularization for the sparse covariance matrix estimation (SCME) problem. The objective function of the SCME problem is composed of a nonconvex part and the ℓ0 term, which is discontinuous and difficult to tackle. Appropriate DC (difference of convex functions) approximations of the ℓ0-norm are used that result in approximate SCME problems that are still nonconvex. DC programming and DCA (DC algorithm), powerful tools in the nonconvex programming framework, are investigated. Two DC formulations are proposed and corresponding DCA schemes developed. Two applications of the SCME problem that are considered are classification via sparse quadratic discriminant analysis and portfolio optimization. A careful empirical experiment is performed through simulated and real data sets to study the performance of the proposed algorithms. Numerical results showed their efficiency and their superiority compared with seven state-of-the-art methods.
NASA Astrophysics Data System (ADS)
Qin, Xulei; Cong, Zhibin; Fei, Baowei
2013-11-01
An automatic segmentation framework is proposed to segment the right ventricle (RV) in echocardiographic images. The method can automatically segment both epicardial and endocardial boundaries from a continuous echocardiography series by combining sparse matrix transform, a training model, and a localized region-based level set. First, the sparse matrix transform extracts main motion regions of the myocardium as eigen-images by analyzing the statistical information of the images. Second, an RV training model is registered to the eigen-images in order to locate the position of the RV. Third, the training model is adjusted and then serves as an optimized initialization for the segmentation of each image. Finally, based on the initializations, a localized, region-based level set algorithm is applied to segment both epicardial and endocardial boundaries in each echocardiograph. Three evaluation methods were used to validate the performance of the segmentation framework. The Dice coefficient measures the overall agreement between the manual and automatic segmentation. The absolute distance and the Hausdorff distance between the boundaries from manual and automatic segmentation were used to measure the accuracy of the segmentation. Ultrasound images of human subjects were used for validation. For the epicardial and endocardial boundaries, the Dice coefficients were 90.8 ± 1.7% and 87.3 ± 1.9%, the absolute distances were 2.0 ± 0.42 mm and 1.79 ± 0.45 mm, and the Hausdorff distances were 6.86 ± 1.71 mm and 7.02 ± 1.17 mm, respectively. The automatic segmentation method based on a sparse matrix transform and level set can provide a useful tool for quantitative cardiac imaging.
Masuda, Y; Misztal, I; Legarra, A; Tsuruta, S; Lourenco, D A L; Fragomeni, B O; Aguilar, I
2017-01-01
This paper evaluates an efficient implementation to multiply the inverse of a numerator relationship matrix for genotyped animals () by a vector (). The computation is required for solving mixed model equations in single-step genomic BLUP (ssGBLUP) with the preconditioned conjugate gradient (PCG). The inverse can be decomposed into sparse matrices that are blocks of the sparse inverse of a numerator relationship matrix () including genotyped animals and their ancestors. The elements of were rapidly calculated with the Henderson's rule and stored as sparse matrices in memory. Implementation of was by a series of sparse matrix-vector multiplications. Diagonal elements of , which were required as preconditioners in PCG, were approximated with a Monte Carlo method using 1,000 samples. The efficient implementation of was compared with explicit inversion of with 3 data sets including about 15,000, 81,000, and 570,000 genotyped animals selected from populations with 213,000, 8.2 million, and 10.7 million pedigree animals, respectively. The explicit inversion required 1.8 GB, 49 GB, and 2,415 GB (estimated) of memory, respectively, and 42 s, 56 min, and 13.5 d (estimated), respectively, for the computations. The efficient implementation required <1 MB, 2.9 GB, and 2.3 GB of memory, respectively, and <1 sec, 3 min, and 5 min, respectively, for setting up. Only <1 sec was required for the multiplication in each PCG iteration for any data sets. When the equations in ssGBLUP are solved with the PCG algorithm, is no longer a limiting factor in the computations.
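When a matrix is available only through matrix-vector products, its diagonal can be estimated by random probing. The sketch below uses random ±1 probe vectors and averages z_i * (A z)_i over the samples; this particular estimator, the function names, and the small test matrix are assumptions for illustration, since the abstract does not state which Monte Carlo scheme was used.

```python
import random

def estimate_diagonal(matvec, n, samples=1000, seed=0):
    """Estimate diag(A) from products A @ z with random +/-1 probe vectors z."""
    rng = random.Random(seed)
    acc = [0.0] * n
    for _ in range(samples):
        z = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        az = matvec(z)
        for i in range(n):
            acc[i] += z[i] * az[i]   # expectation of z_i * (A z)_i is A[i][i]
    return [a / samples for a in acc]

# Hypothetical sparse symmetric matrix stored as {row: {col: value}}.
rows = {0: {0: 4.0, 1: 0.5}, 1: {0: 0.5, 1: 3.0, 2: -0.5}, 2: {1: -0.5, 2: 5.0}}

def sparse_matvec(v):
    return [sum(a * v[j] for j, a in rows[i].items()) for i in range(len(v))]

diag_estimate = estimate_diagonal(sparse_matvec, 3)
```

The error of each estimated entry shrinks with the square root of the number of samples and grows with the size of the off-diagonal entries in that row, which is usually accurate enough for a diagonal preconditioner.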
Compressive sensing using optimized sensing matrix for face verification
NASA Astrophysics Data System (ADS)
Oey, Endra; Jeffry; Wongso, Kelvin; Tommy
2017-12-01
Biometrics is one solution to the problems that arise when passwords are used to control data access, for example forgetting a password or having to recall many different passwords. With biometrics, physical characteristics of a person can be captured and used in the identification process. In this research, facial biometrics is used in the verification process to determine whether the user has the authority to access the data. Facial biometrics is chosen for its low implementation cost and because it generates quite accurate results for user identification. The face verification system adopted in this research uses the Compressive Sensing (CS) technique, which aims to reduce the dimension size as well as encrypt the data in the form of a facial test image, where the image is represented as sparse signals. Encrypted data can be reconstructed using a Sparse Coding algorithm. Two types of Sparse Coding, namely Orthogonal Matching Pursuit (OMP) and Iteratively Reweighted Least Squares -ℓp (IRLS-ℓp), are compared in the face verification system. The reconstructed sparse signals are then compared, using the Euclidean norm, with the sparse signal of the user previously saved in the system to determine the validity of the facial test image. With a non-optimized sensing matrix, the system accuracy obtained in this research is 99% for IRLS with a face verification response time of 4.917 seconds and 96.33% for OMP with a response time of 0.4046 seconds; with an optimized sensing matrix, it is 99% for IRLS with a response time of 13.4791 seconds and 98.33% for OMP with a response time of 3.1571 seconds.
Efficient Storage Scheme of Covariance Matrix during Inverse Modeling
NASA Astrophysics Data System (ADS)
Mao, D.; Yeh, T. J.
2013-12-01
During stochastic inverse modeling, the covariance matrix of geostatistics-based methods carries the information about the geologic structure. Its update during iterations reflects the decrease of uncertainty with the incorporation of observed data. For large-scale problems, its storage and update cost too much memory and computational resources. In this study, we propose a new efficient scheme for storage and update. Compressed Sparse Column (CSC) format is used to store the covariance matrix, and users can choose how much data to store based on correlation scales, since the data beyond several correlation scales are usually not very informative for inverse modeling. After every iteration, only the diagonal terms of the covariance matrix are updated. The off-diagonal terms are calculated and updated based on shortened correlation scales with a pre-assigned exponential model. The correlation scales are shortened by a coefficient, e.g., 0.95, every iteration to reflect the decrease of uncertainty. There is no universal coefficient for all problems, and users are encouraged to try several values. This new scheme is tested with 1D examples first. The estimated results and uncertainty are compared with the traditional full storage method. In the end, a large-scale numerical model is used to validate this new scheme.
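The storage idea can be sketched as follows: build the three CSC arrays (values, row indices, column pointers) for an exponential covariance model and keep only entries within a chosen number of correlation scales. The function names, the 1D coordinates, and the cutoff of three correlation scales are illustrative assumptions, not the authors' code.

```python
import math

def truncated_covariance_csc(coords, variance, corr_scale, cutoff_scales):
    """Build CSC arrays for an exponential covariance, dropping pairs farther apart
    than cutoff_scales * corr_scale."""
    data, indices, indptr = [], [], [0]
    max_dist = cutoff_scales * corr_scale
    for xj in coords:                       # one column at a time
        for i, xi in enumerate(coords):     # row indices in increasing order
            dist = abs(xi - xj)
            if dist <= max_dist:
                data.append(variance * math.exp(-dist / corr_scale))
                indices.append(i)
        indptr.append(len(data))            # where the next column starts
    return data, indices, indptr

def csc_get(data, indices, indptr, i, j):
    """Look up entry (i, j); entries that were not stored are treated as zero."""
    for k in range(indptr[j], indptr[j + 1]):
        if indices[k] == i:
            return data[k]
    return 0.0

coords = [float(k) for k in range(10)]      # hypothetical 1D grid
data, indices, indptr = truncated_covariance_csc(coords, 1.0, 1.0, 3.0)
```

On this 10-point grid the truncated matrix stores 58 of the 100 entries; the saving grows quickly with grid size because the number of stored entries per column stays fixed while the full column length does not.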
Computational efficiency improvements for image colorization
NASA Astrophysics Data System (ADS)
Yu, Chao; Sharma, Gaurav; Aly, Hussein
2013-03-01
We propose an efficient algorithm for colorization of greyscale images. As in prior work, colorization is posed as an optimization problem: a user specifies the color for a few scribbles drawn on the greyscale image and the color image is obtained by propagating color information from the scribbles to surrounding regions, while maximizing the local smoothness of colors. In this formulation, colorization is obtained by solving a large sparse linear system, which normally requires substantial computation and memory resources. Our algorithm improves the computational performance through three innovations over prior colorization implementations. First, the linear system is solved iteratively without explicitly constructing the sparse matrix, which significantly reduces the required memory. Second, we formulate each iteration in terms of integral images obtained by dynamic programming, reducing repetitive computation. Third, we use a coarse-to-fine framework, where a lower resolution subsampled image is first colorized and this low resolution color image is upsampled to initialize the colorization process for the fine level. The improvements we develop provide significant speedup and memory savings compared to the conventional approach of solving the linear system directly using off-the-shelf sparse solvers, and allow us to colorize images with typical sizes encountered in realistic applications on typical commodity computing platforms.
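The first innovation, iterating on the linear system without building the sparse matrix, can be illustrated on a 1D toy signal. In the sketch below each unscribbled pixel is repeatedly replaced by a weighted average of its neighbours, with weights that favour neighbours of similar grey level. The weighting function, `sigma`, the sweep count, and the 1D setting are assumptions for illustration; the integral-image and coarse-to-fine steps of the paper are not shown.

```python
import math

def colorize_1d(grey, scribbles, sigma=0.1, sweeps=500):
    """Gauss-Seidel sweeps on the smoothness system; matrix rows are computed on the fly."""
    n = len(grey)
    chroma = [scribbles.get(i, 0.0) for i in range(n)]
    for _ in range(sweeps):
        for i in range(n):
            if i in scribbles:
                continue                      # user-specified colors stay fixed
            nbrs = [j for j in (i - 1, i + 1) if 0 <= j < n]
            wts = [math.exp(-((grey[i] - grey[j]) ** 2) / (2 * sigma ** 2)) for j in nbrs]
            chroma[i] = sum(w * chroma[j] for w, j in zip(wts, nbrs)) / sum(wts)
    return chroma

grey = [0.2, 0.2, 0.25, 0.8, 0.8, 0.85]      # two regions separated by an intensity edge
scribbles = {0: 1.0, 5: -1.0}                # one hypothetical color scribble per region
chroma = colorize_1d(grey, scribbles)
```

Only the current color estimate and the grey image are held in memory; the weights that would form a matrix row are recomputed whenever that pixel is visited.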
Direction of Arrival Estimation for MIMO Radar via Unitary Nuclear Norm Minimization
Wang, Xianpeng; Huang, Mengxing; Wu, Xiaoqin; Bi, Guoan
2017-01-01
In this paper, we consider the direction of arrival (DOA) estimation issue of noncircular (NC) source in multiple-input multiple-output (MIMO) radar and propose a novel unitary nuclear norm minimization (UNNM) algorithm. In the proposed method, the noncircular properties of signals are used to double the virtual array aperture, and the real-valued data are obtained by utilizing unitary transformation. Then a real-valued block sparse model is established based on a novel over-complete dictionary, and a UNNM algorithm is formulated for recovering the block-sparse matrix. In addition, the real-valued NC-MUSIC spectrum is used to design a weight matrix for reweighting the nuclear norm minimization to achieve the enhanced sparsity of solutions. Finally, the DOA is estimated by searching the non-zero blocks of the recovered matrix. Because of using the noncircular properties of signals to extend the virtual array aperture and an additional real structure to suppress the noise, the proposed method provides better performance compared with the conventional sparse recovery based algorithms. Furthermore, the proposed method can handle the case of underdetermined DOA estimation. Simulation results show the effectiveness and advantages of the proposed method. PMID:28441770
Convex Banding of the Covariance Matrix
Bien, Jacob; Bunea, Florentina; Xiao, Luo
2016-01-01
We introduce a new sparse estimator of the covariance matrix for high-dimensional models in which the variables have a known ordering. Our estimator, which is the solution to a convex optimization problem, is equivalently expressed as an estimator which tapers the sample covariance matrix by a Toeplitz, sparsely-banded, data-adaptive matrix. As a result of this adaptivity, the convex banding estimator enjoys theoretical optimality properties not attained by previous banding or tapered estimators. In particular, our convex banding estimator is minimax rate adaptive in Frobenius and operator norms, up to log factors, over commonly-studied classes of covariance matrices, and over more general classes. Furthermore, it correctly recovers the bandwidth when the true covariance is exactly banded. Our convex formulation admits a simple and efficient algorithm. Empirical studies demonstrate its practical effectiveness and illustrate that our exactly-banded estimator works well even when the true covariance matrix is only close to a banded matrix, confirming our theoretical results. Our method compares favorably with all existing methods, in terms of accuracy and speed. We illustrate the practical merits of the convex banding estimator by showing that it can be used to improve the performance of discriminant analysis for classifying sound recordings. PMID:28042189
Multi-GPU implementation of a VMAT treatment plan optimization algorithm.
Tian, Zhen; Peng, Fei; Folkerts, Michael; Tan, Jun; Jia, Xun; Jiang, Steve B
2015-06-01
Volumetric modulated arc therapy (VMAT) optimization is a computationally challenging problem due to its large data size, high degrees of freedom, and many hardware constraints. High-performance graphics processing units (GPUs) have been used to speed up the computations. However, GPU's relatively small memory size cannot handle cases with a large dose-deposition coefficient (DDC) matrix in cases of, e.g., those with a large target size, multiple targets, multiple arcs, and/or small beamlet size. The main purpose of this paper is to report an implementation of a column-generation-based VMAT algorithm, previously developed in the authors' group, on a multi-GPU platform to solve the memory limitation problem. While the column-generation-based VMAT algorithm has been previously developed, the GPU implementation details have not been reported. Hence, another purpose is to present detailed techniques employed for GPU implementation. The authors also would like to utilize this particular problem as an example problem to study the feasibility of using a multi-GPU platform to solve large-scale problems in medical physics. The column-generation approach generates VMAT apertures sequentially by solving a pricing problem (PP) and a master problem (MP) iteratively. In the authors' method, the sparse DDC matrix is first stored on a CPU in coordinate list format (COO). On the GPU side, this matrix is split into four submatrices according to beam angles, which are stored on four GPUs in compressed sparse row format. Computation of beamlet price, the first step in PP, is accomplished using multi-GPUs. A fast inter-GPU data transfer scheme is accomplished using peer-to-peer access. The remaining steps of PP and MP problems are implemented on CPU or a single GPU due to their modest problem scale and computational loads. Barzilai and Borwein algorithm with a subspace step scheme is adopted here to solve the MP problem. 
A head and neck (H&N) cancer case is then used to validate the authors' method. The authors also compare their multi-GPU implementation with three different single GPU implementation strategies, i.e., truncating DDC matrix (S1), repeatedly transferring DDC matrix between CPU and GPU (S2), and porting computations involving DDC matrix to CPU (S3), in terms of both plan quality and computational efficiency. Two more H&N patient cases and three prostate cases are used to demonstrate the advantages of the authors' method. The authors' multi-GPU implementation can finish the optimization process within ∼ 1 min for the H&N patient case. S1 leads to an inferior plan quality although its total time was 10 s shorter than the multi-GPU implementation due to the reduced matrix size. S2 and S3 yield the same plan quality as the multi-GPU implementation but take ∼4 and ∼6 min, respectively. High computational efficiency was consistently achieved for the other five patient cases tested, with VMAT plans of clinically acceptable quality obtained within 23-46 s. Conversely, to obtain clinically comparable or acceptable plans for all six of these VMAT cases that the authors have tested in this paper, the optimization time needed in a commercial TPS system on CPU was found to be in an order of several minutes. The results demonstrate that the multi-GPU implementation of the authors' column-generation-based VMAT optimization can handle the large-scale VMAT optimization problem efficiently without sacrificing plan quality. The authors' study may serve as an example to shed some light on other large-scale medical physics problems that require multi-GPU techniques.
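The two storage formats named in the abstract differ in how they index nonzeros: COO keeps a (row, column, value) triple per entry, while CSR groups entries by row behind a row-pointer array, which suits row-wise matrix-vector products. The following CPU-only sketch converts COO to CSR and multiplies by a vector; the function names and the tiny example matrix are illustrative, and no GPU code or multi-GPU splitting is shown.

```python
def coo_to_csr(n_rows, rows, cols, vals):
    """Convert coordinate-list triples to CSR arrays (data, indices, indptr)."""
    counts = [0] * n_rows
    for r in rows:
        counts[r] += 1
    indptr = [0] * (n_rows + 1)
    for r in range(n_rows):
        indptr[r + 1] = indptr[r] + counts[r]   # prefix sum of per-row counts
    next_slot = indptr[:-1]                     # slicing copies: next free slot per row
    indices = [0] * len(vals)
    data = [0.0] * len(vals)
    for r, c, v in zip(rows, cols, vals):
        k = next_slot[r]
        indices[k] = c
        data[k] = v
        next_slot[r] += 1
    return data, indices, indptr

def csr_matvec(data, indices, indptr, x):
    """Row-wise product y = A @ x; each row reads a contiguous slice of data."""
    return [
        sum(data[k] * x[indices[k]] for k in range(indptr[i], indptr[i + 1]))
        for i in range(len(indptr) - 1)
    ]

# Hypothetical 3 x 3 matrix with four nonzeros given in arbitrary order.
coo_rows, coo_cols, coo_vals = [0, 2, 1, 0], [1, 0, 2, 2], [2.0, 3.0, 4.0, 5.0]
data, indices, indptr = coo_to_csr(3, coo_rows, coo_cols, coo_vals)
product = csr_matvec(data, indices, indptr, [1.0, 2.0, 3.0])
```

COO is convenient for assembling or splitting a matrix because entries can arrive in any order, whereas CSR stores one pointer per row instead of one row index per nonzero.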
Sparse Matrix Motivated Reconstruction of Far-Field Radiation Patterns
2015-03-01
method for base-station antenna radiation patterns. IEEE Antennas Propagation Magazine. 2001;43(2):132. 4. Vasiliadis TG, Dimitriou D, Sergiadis JD...algorithm based on sparse representations of radiation patterns using the inverse Discrete Fourier Transform (DFT) and the inverse Discrete Cosine...patterns using a Model-Based Parameter Estimation (MBPE) technique that reduces the computational time required to model radiation patterns. Another
Task Parallel Incomplete Cholesky Factorization using 2D Partitioned-Block Layout
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kim, Kyungjoo; Rajamanickam, Sivasankaran; Stelle, George Widgery
We introduce a task-parallel algorithm for sparse incomplete Cholesky factorization that utilizes a 2D sparse partitioned-block layout of a matrix. Our factorization algorithm follows the idea of algorithms-by-blocks by using the block layout. The algorithm-by-blocks approach induces a task graph for the factorization. These tasks are inter-related to each other through their data dependences in the factorization algorithm. To process the tasks on various manycore architectures in a portable manner, we also present a portable tasking API that incorporates different tasking backends and device-specific features using an open-source framework for manycore platforms, i.e., Kokkos. A performance evaluation is presented on both Intel Sandybridge and Xeon Phi platforms for matrices from the University of Florida sparse matrix collection to illustrate merits of the proposed task-based factorization. Experimental results demonstrate that our task-parallel implementation delivers about 26.6x speedup (geometric mean) over single-threaded incomplete Cholesky-by-blocks and 19.2x speedup over serial Cholesky performance which does not carry tasking overhead using 56 threads on the Intel Xeon Phi processor for sparse matrices arising from various application problems.
An efficient sparse matrix multiplication scheme for the CYBER 205 computer
NASA Technical Reports Server (NTRS)
Lambiotte, Jules J., Jr.
1988-01-01
This paper describes the development of an efficient algorithm for computing the product of a matrix and vector on a CYBER 205 vector computer. The desire to provide software which allows the user to choose between the often conflicting goals of minimizing central processing unit (CPU) time or storage requirements has led to a diagonal-based algorithm in which one of four types of storage is selected for each diagonal. The candidate storage types employed were chosen to be efficient on the CYBER 205 for diagonals which have nonzero structure which is dense, moderately sparse, very sparse and short, or very sparse and long; however, for many densities, no diagonal type is most efficient with respect to both resource requirements, and a trade-off must be made. For each diagonal, an initialization subroutine estimates the CPU time and storage required for each storage type based on results from previously performed numerical experimentation. These requirements are adjusted by weights provided by the user which reflect the relative importance the user places on the two resources. The adjusted resource requirements are then compared to select the most efficient storage and computational scheme.
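A diagonal-based product with a per-diagonal choice of storage can be sketched as below. Only two storage types are shown (a dense array per diagonal and an index-value list for very sparse diagonals), and the selection uses a fixed density threshold; the paper's four storage types and its user-weighted CPU-time and storage estimates are not reproduced. All names and the threshold are illustrative assumptions.

```python
def pack_diagonal(d, density_threshold=0.5):
    """Choose a storage type for one diagonal; d[i] holds A[i][i+k], zero-padded to length n."""
    nonzeros = [(i, v) for i, v in enumerate(d) if v != 0.0]
    if len(nonzeros) >= density_threshold * len(d):
        return ("dense", list(d))
    return ("sparse", nonzeros)

def dia_matvec(packed, n, x):
    """Compute y = A @ x one diagonal at a time; packed maps offset k to a packed diagonal."""
    y = [0.0] * n
    for k, (kind, payload) in packed.items():
        if kind == "dense":
            for i in range(max(0, -k), min(n, n - k)):   # rows where column i + k exists
                y[i] += payload[i] * x[i + k]
        else:
            for i, v in payload:
                y[i] += v * x[i + k]
    return y

n = 4
raw_diagonals = {
    0: [2.0, 2.0, 2.0, 2.0],
    1: [-1.0, -1.0, -1.0, 0.0],
    -1: [0.0, -1.0, -1.0, -1.0],
    3: [7.0, 0.0, 0.0, 0.0],        # a single far-off-diagonal entry A[0][3]
}
packed = {k: pack_diagonal(d) for k, d in raw_diagonals.items()}
result = dia_matvec(packed, n, [1.0, 2.0, 3.0, 4.0])
```

Dense diagonals give long, regular inner loops that suit a vector machine, while the index-value form avoids storing and multiplying zeros on nearly empty diagonals; that tension is the trade-off the abstract describes.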
Wen, Zaidao; Hou, Zaidao; Jiao, Licheng
2017-11-01
The discriminative dictionary learning (DDL) framework has been widely used in image classification; it aims to learn class-specific feature vectors as well as a representative dictionary from a set of labeled training samples. However, interclass similarities and intraclass variances among input samples and learned features generally weaken the representability of the dictionary and the discrimination of the feature vectors, degrading classification performance. Therefore, how to explicitly represent them becomes an important issue. In this paper, we present a novel DDL framework with a two-level low-rank and group sparse decomposition model. In the first level, we learn a class-shared and several class-specific dictionaries, where a low-rank and a group sparse regularization are, respectively, imposed on the corresponding feature matrices. In the second level, the class-specific feature matrix is further decomposed into a low-rank and a sparse matrix so that intraclass variances can be separated to concentrate the corresponding feature vectors. Extensive experimental results demonstrate the effectiveness of our model. Compared with other state-of-the-art methods on several popular image databases, our model achieves competitive or better classification accuracy.
Nonlocal low-rank and sparse matrix decomposition for spectral CT reconstruction
NASA Astrophysics Data System (ADS)
Niu, Shanzhou; Yu, Gaohang; Ma, Jianhua; Wang, Jing
2018-02-01
Spectral computed tomography (CT) has been a promising technique in research and clinics because of its ability to produce improved energy resolution images with narrow energy bins. However, the narrow energy bin image is often affected by serious quantum noise because of the limited number of photons used in the corresponding energy bin. To address this problem, we present an iterative reconstruction method for spectral CT using nonlocal low-rank and sparse matrix decomposition (NLSMD), which exploits the self-similarity of patches that are collected in multi-energy images. Specifically, each set of patches can be decomposed into a low-rank component and a sparse component, and the low-rank component represents the stationary background over different energy bins, while the sparse component represents the rest of the different spectral features in individual energy bins. Subsequently, an effective alternating optimization algorithm was developed to minimize the associated objective function. To validate and evaluate the NLSMD method, qualitative and quantitative studies were conducted by using simulated and real spectral CT data. Experimental results show that the NLSMD method improves spectral CT images in terms of noise reduction, artifact suppression and resolution preservation.
NASA Astrophysics Data System (ADS)
Huang, Tsung-Ming; Lin, Wen-Wei; Tian, Heng; Chen, Guan-Hua
2018-03-01
The full spectrum of a large sparse ⊤-palindromic quadratic eigenvalue problem (⊤-PQEP) is considered, arguably for the first time, in this article. Such a problem is posed by the calculation of surface Green's functions (SGFs) of mesoscopic transistors with a very large non-periodic cross-section. For this problem, general purpose eigensolvers are not efficient, nor is it advisable to resort to methods such as decimation to obtain the Wiener-Hopf factorization. After reviewing some rigorous understanding of SGF calculation from the perspective of ⊤-PQEP and nonlinear matrix equation, we present our new approach to this problem. In a nutshell, the unit disk where the spectrum of interest lies is broken down adaptively into pieces small enough that they each can be locally tackled by the generalized ⊤-skew-Hamiltonian implicitly restarted shift-and-invert Arnoldi (G⊤SHIRA) algorithm with suitable shifts and other parameters, and the eigenvalues missed by this divide-and-conquer strategy can be recovered thanks to the accurate estimation provided by our newly developed scheme. Notably, a novel non-equivalence deflation is proposed to avoid, as much as possible, duplication of nearby known eigenvalues when a new shift of G⊤SHIRA is determined. We demonstrate our new approach by calculating the SGF of a realistic nanowire whose unit cell is described by a matrix of size 4000 × 4000 at the density functional tight binding level, corresponding to an 8 × 8 nm² cross-section. We believe that quantum transport simulation of realistic nano-devices in the mesoscopic regime will greatly benefit from this work.
Matrix decomposition graphics processing unit solver for Poisson image editing
NASA Astrophysics Data System (ADS)
Lei, Zhao; Wei, Li
2012-10-01
In recent years, gradient-domain methods have been widely discussed in the image processing field, including seamless cloning and image stitching. These algorithms are commonly carried out by solving a large sparse linear system: the Poisson equation. However, solving the Poisson equation is a computationally and memory-intensive task, which makes it unsuitable for real-time image editing. A new matrix decomposition graphics processing unit (GPU) solver (MDGS) is proposed to address this problem. A matrix decomposition method is used to distribute the work among GPU threads, so that MDGS can take full advantage of the computing power of current GPUs. Additionally, MDGS is a hybrid solver (it combines direct and iterative techniques) with a two-level architecture. These features enable MDGS to generate solutions identical to those of the common Poisson methods and to achieve a high convergence rate in most cases. The approach is advantageous in terms of parallelizability, real-time image processing capability, low memory use, and broad applicability.
A modified dual-level algorithm for large-scale three-dimensional Laplace and Helmholtz equation
NASA Astrophysics Data System (ADS)
Li, Junpu; Chen, Wen; Fu, Zhuojia
2018-01-01
A modified dual-level algorithm is proposed in this article. With the help of the dual-level structure, the fully populated interpolation matrix on the fine level is transformed into a locally supported sparse matrix, which addresses the severe ill-conditioning and excessive storage requirements caused by the fully populated interpolation matrix. The kernel-independent fast multipole method is adopted to expedite the solution of the linear equations on the coarse level. Numerical experiments with up to 2 million fine-level nodes have been carried out successfully. It is noted that the proposed algorithm needs only 2-3 coarse-level nodes per wavelength in each direction to obtain a reasonable solution, which is almost down to the minimum requirement allowed by Shannon's sampling theorem. In the real human head model example, it is observed that the proposed algorithm can accurately simulate computationally very challenging exterior high-frequency harmonic acoustic wave propagation up to 20,000 Hz.
Scalability improvements to NRLMOL for DFT calculations of large molecules
NASA Astrophysics Data System (ADS)
Diaz, Carlos Manuel
Advances in high performance computing (HPC) have provided a way to treat large, computationally demanding tasks using thousands of processors. With the development of more powerful HPC architectures, the need to create efficient and scalable code has grown more important. Electronic structure calculations are valuable in understanding experimental observations and are routinely used for new materials predictions. For the electronic structure calculations, the memory and computation time are proportional to the number of atoms. Memory requirements for these calculations scale as N², where N is the number of atoms. While the recent advances in HPC offer platforms with large numbers of cores, the limited amount of memory available on a given node and poor scalability of the electronic structure code hinder their efficient usage of these platforms. This thesis will present some developments to overcome these bottlenecks in order to study large systems. These developments, which are implemented in the NRLMOL electronic structure code, involve the use of sparse matrix storage formats and the use of linear algebra using sparse and distributed matrices. These developments along with other related development now allow ground state density functional calculations using up to 25,000 basis functions and the excited state calculations using up to 17,000 basis functions while utilizing all cores on a node. An example on a light-harvesting triad molecule is described. Finally, future plans to further improve the scalability will be presented.
Newmark-Beta-FDTD method for super-resolution analysis of time reversal waves
NASA Astrophysics Data System (ADS)
Shi, Sheng-Bing; Shao, Wei; Ma, Jing; Jin, Congjun; Wang, Xiao-Hua
2017-09-01
In this work, a new unconditionally stable finite-difference time-domain (FDTD) method with the split-field perfectly matched layer (PML) is proposed for the analysis of time reversal (TR) waves. The proposed method is very suitable for multiscale problems involving microstructures. The spatial and temporal derivatives in this method are discretized by the central difference technique and Newmark-Beta algorithm, respectively, and the derivation results in the calculation of a banded-sparse matrix equation. Since the coefficient matrix remains unchanged during the whole simulation process, the lower-upper (LU) decomposition of the matrix needs to be performed only once at the beginning of the calculation. Moreover, the reverse Cuthill-McKee (RCM) technique, an effective preprocessing technique in bandwidth compression of sparse matrices, is used to improve computational efficiency. The super-resolution focusing of TR wave propagation in two- and three-dimensional spaces is included to validate the accuracy and efficiency of the proposed method.
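The combination of RCM reordering and a single LU factorization reused at every time step can be sketched as follows, assuming SciPy. The test matrix (a scrambled 2D Laplacian plus identity) is a hypothetical stand-in, not the Newmark-Beta-FDTD coefficient matrix, and `bandwidth` and `solve` are illustrative helper names.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.csgraph import reverse_cuthill_mckee
from scipy.sparse.linalg import splu

def bandwidth(M):
    """Largest |row - col| over the stored entries of a sparse matrix."""
    coo = M.tocoo()
    return int(np.max(np.abs(coo.row - coo.col)))

# Hypothetical stand-in matrix: 2D 5-point Laplacian plus identity,
# with the unknowns deliberately scrambled to destroy the band structure.
n = 8
T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n))
A = (sp.kron(sp.identity(n), T) + sp.kron(T, sp.identity(n))
     + sp.identity(n * n)).tocsr()
p = np.random.default_rng(0).permutation(n * n)
A = A[p][:, p].tocsr()

# RCM reordering: A_rcm[i, j] = A[perm[i], perm[j]]
perm = reverse_cuthill_mckee(A, symmetric_mode=True)
A_rcm = A[perm][:, perm].tocsc()

# Factor once; "NATURAL" keeps the RCM ordering instead of re-permuting columns.
lu = splu(A_rcm, permc_spec="NATURAL")

def solve(b):
    """Solve A x = b by reusing the stored LU factors of the reordered matrix."""
    x_rcm = lu.solve(b[perm])
    x = np.empty_like(x_rcm)
    x[perm] = x_rcm          # undo the reordering
    return x

# Time-stepping loop: only triangular solves are repeated.
b = np.arange(float(n * n))
for _ in range(3):
    b = solve(b)
```

Because the coefficient matrix is fixed, the cost per step reduces to two triangular solves, and a smaller bandwidth after RCM generally limits fill-in in the factors.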
Condition number estimation of preconditioned matrices.
Kushida, Noriyuki
2015-01-01
The present paper introduces a condition number estimation method for preconditioned matrices. The newly developed method provides reasonable results, while the conventional method which is based on the Lanczos connection gives meaningless results. The Lanczos connection based method provides the condition numbers of coefficient matrices of systems of linear equations with information obtained through the preconditioned conjugate gradient method. Estimating the condition number of preconditioned matrices is sometimes important when describing the effectiveness of new preconditioners or selecting adequate preconditioners. Operating a preconditioner on a coefficient matrix is the simplest method of estimation. However, this is not possible for large-scale computing, especially if computation is performed on distributed memory parallel computers. This is because the preconditioned matrices become dense, even if the original matrices are sparse. Although the Lanczos connection method can be used to calculate the condition number of preconditioned matrices, it is not considered to be applicable to large-scale problems because of its weakness with respect to numerical errors. Therefore, we have developed a robust and parallelizable method based on Hager's method. The feasibility studies are carried out for the diagonal scaling preconditioner and the SSOR preconditioner with a diagonal matrix, a tridiagonal matrix and Pei's matrix. As a result, the Lanczos connection method contains around 10% error in the results even with a simple problem. On the other hand, the new method contains negligible errors. In addition, the newly developed method returns reasonable solutions when the Lanczos connection method fails with Pei's matrix, and matrices generated with the finite element method.
Blind compressed sensing image reconstruction based on alternating direction method
NASA Astrophysics Data System (ADS)
Liu, Qinan; Guo, Shuxu
2018-04-01
In order to solve the problem of how to reconstruct the original image under the condition of unknown sparse basis, this paper proposes an image reconstruction method based on blind compressed sensing model. In this model, the image signal is regarded as the product of a sparse coefficient matrix and a dictionary matrix. Based on the existing blind compressed sensing theory, the optimal solution is solved by the alternative minimization method. The proposed method solves the problem that the sparse basis in compressed sensing is difficult to represent, which restrains the noise and improves the quality of reconstructed image. This method ensures that the blind compressed sensing theory has a unique solution and can recover the reconstructed original image signal from a complex environment with a stronger self-adaptability. The experimental results show that the image reconstruction algorithm based on blind compressed sensing proposed in this paper can recover high quality image signals under the condition of under-sampling.
Blockwise conjugate gradient methods for image reconstruction in volumetric CT.
Qiu, W; Titley-Peloquin, D; Soleimani, M
2012-11-01
Cone beam computed tomography (CBCT) enables volumetric image reconstruction from 2D projection data and plays an important role in image guided radiation therapy (IGRT). Filtered back projection is still the most frequently used algorithm in applications. The algorithm discretizes the scanning process (forward projection) into a system of linear equations, which must then be solved to recover images from measured projection data. The conjugate gradients (CG) algorithm and its variants can be used to solve (possibly regularized) linear systems of equations Ax = b and linear least squares problems min_x ‖b − Ax‖_2, especially when the matrix A is very large and sparse. Their applications can be found in a general CT context, but in tomography problems (e.g. CBCT reconstruction) they have not widely been used. Hence, CBCT reconstruction using the CG-type algorithm LSQR was implemented and studied in this paper. In CBCT reconstruction, the main computational challenge is that the matrix A usually is very large, and storing it in full requires an amount of memory well beyond the reach of commodity computers. Because of these memory capacity constraints, only a small fraction of the weighting matrix A is typically used, leading to a poor reconstruction. In this paper, to overcome this difficulty, the matrix A is partitioned and stored blockwise, and blockwise matrix-vector multiplications are implemented within LSQR. This implementation allows us to use the full weighting matrix A for CBCT reconstruction without further enhancing computer standards. Tikhonov regularization can also be implemented in this fashion, and can produce significant improvement in the reconstructed images. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.
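The idea of running LSQR on a matrix that is only ever touched one block at a time can be sketched with SciPy's `LinearOperator`. This is an illustration under assumptions, not the authors' implementation: the blocks here are small random sparse matrices held in memory, whereas a real CBCT system matrix would be loaded block by block, and `blockwise_operator` is a hypothetical helper name.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import LinearOperator, lsqr

def blockwise_operator(row_blocks, n_cols):
    """Expose a list of sparse row blocks as one operator without assembling A."""
    sizes = [B.shape[0] for B in row_blocks]
    offsets = np.concatenate(([0], np.cumsum(sizes)))

    def matvec(x):
        # A @ x: each block produces its own slice of the result
        x = np.ravel(x)
        return np.concatenate([B @ x for B in row_blocks])

    def rmatvec(y):
        # A.T @ y: accumulate the contribution of each block
        y = np.ravel(y)
        out = np.zeros(n_cols)
        for B, lo, hi in zip(row_blocks, offsets[:-1], offsets[1:]):
            out += B.T @ y[lo:hi]
        return out

    shape = (int(offsets[-1]), n_cols)
    return LinearOperator(shape, matvec=matvec, rmatvec=rmatvec, dtype=float)

# Hypothetical data: three 40 x 20 sparse blocks stacked into a 120 x 20 system
blocks = [sp.random(40, 20, density=0.3, random_state=s, format="csr")
          for s in range(3)]
A_op = blockwise_operator(blocks, 20)
x_true = np.linspace(-1.0, 1.0, 20)
b = A_op.matvec(x_true)

# damp > 0 would add Tikhonov regularization: min ||Ax - b||^2 + damp^2 ||x||^2
x_hat = lsqr(A_op, b, damp=0.0, atol=1e-12, btol=1e-12, iter_lim=200)[0]
```

LSQR needs only products with A and its transpose, so the full matrix never has to reside in memory at once; only one block and a few vectors do.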
Comparing implementations of penalized weighted least-squares sinogram restoration
Forthmann, Peter; Koehler, Thomas; Defrise, Michel; La Riviere, Patrick
2010-01-01
Purpose: A CT scanner measures the energy that is deposited in each channel of a detector array by x rays that have been partially absorbed on their way through the object. The measurement process is complex and quantitative measurements are always and inevitably associated with errors, so CT data must be preprocessed prior to reconstruction. In recent years, the authors have formulated CT sinogram preprocessing as a statistical restoration problem in which the goal is to obtain the best estimate of the line integrals needed for reconstruction from the set of noisy, degraded measurements. The authors have explored both penalized Poisson likelihood (PL) and penalized weighted least-squares (PWLS) objective functions. At low doses, the authors found that the PL approach outperforms PWLS in terms of resolution-noise tradeoffs, but at standard doses they perform similarly. The PWLS objective function, being quadratic, is more amenable to computational acceleration than the PL objective. In this work, the authors develop and compare two different methods for implementing PWLS sinogram restoration with the hope of improving computational performance relative to PL in the standard-dose regime. Sinogram restoration is still significant in the standard-dose regime since it can still outperform standard approaches and it allows for correction of effects that are not usually modeled in standard CT preprocessing. Methods: The authors have explored and compared two implementation strategies for PWLS sinogram restoration: (1) A direct matrix-inversion strategy based on the closed-form solution to the PWLS optimization problem and (2) an iterative approach based on the conjugate-gradient algorithm. Obtaining optimal performance from each strategy required modifying the naive off-the-shelf implementations of the algorithms to exploit the particular symmetry and sparseness of the sinogram-restoration problem. 
For the closed-form approach, the authors subdivided the large matrix inversion into smaller coupled problems and exploited sparseness to minimize matrix operations. For the conjugate-gradient approach, the authors exploited sparseness and preconditioned the problem to speed up convergence. Results: All methods produced qualitatively and quantitatively similar images as measured by resolution-variance tradeoffs and difference images. Despite the acceleration strategies, the direct matrix-inversion approach was found to be uncompetitive with iterative approaches, with a computational burden higher by an order of magnitude or more. The iterative conjugate-gradient approach, however, does appear promising, with computation times half that of the authors’ previous penalized-likelihood implementation. Conclusions: Iterative conjugate-gradient based PWLS sinogram restoration with careful matrix optimizations has computational advantages over direct matrix PWLS inversion and over penalized-likelihood sinogram restoration and can be considered a good alternative in standard-dose regimes. PMID:21158306
SparRec: An effective matrix completion framework of missing data imputation for GWAS
NASA Astrophysics Data System (ADS)
Jiang, Bo; Ma, Shiqian; Causey, Jason; Qiao, Linbo; Hardin, Matthew Price; Bitts, Ian; Johnson, Daniel; Zhang, Shuzhong; Huang, Xiuzhen
2016-10-01
Genome-wide association studies present computational challenges for missing data imputation, while the advances of genotype technologies are generating datasets of large sample sizes with sample sets genotyped on multiple SNP chips. We present a new framework SparRec (Sparse Recovery) for imputation, with the following properties: (1) The optimization models of SparRec, based on low-rank and low number of co-clusters of matrices, are different from current statistics methods. While our low-rank matrix completion (LRMC) model is similar to Mendel-Impute, our matrix co-clustering factorization (MCCF) model is completely new. (2) SparRec, as other matrix completion methods, is flexible to be applied to missing data imputation for large meta-analysis with different cohorts genotyped on different sets of SNPs, even when there is no reference panel. This kind of meta-analysis is very challenging for current statistics based methods. (3) SparRec has consistent performance and achieves high recovery accuracy even when the missing data rate is as high as 90%. Compared with Mendel-Impute, our low-rank based method achieves similar accuracy and efficiency, while the co-clustering based method has advantages in running time. The testing results show that SparRec has significant advantages and competitive performance over other state-of-the-art existing statistics methods including Beagle and fastPhase.
Communication requirements of sparse Cholesky factorization with nested dissection ordering
NASA Technical Reports Server (NTRS)
Naik, Vijay K.; Patrick, Merrell L.
1989-01-01
Load distribution schemes for minimizing the communication requirements of the Cholesky factorization of dense and sparse, symmetric, positive definite matrices on multiprocessor systems are presented. The total data traffic in factoring an n x n sparse symmetric positive definite matrix representing an n-vertex regular two-dimensional grid graph using n^α processors, α ≤ 1, is shown to be O(n^(1+α/2)). It is O(n) when n^α processors, α ≥ 1, are used. Under the conditions of uniform load distribution, these results are shown to be asymptotically optimal.
Performance issues for iterative solvers in device simulation
NASA Technical Reports Server (NTRS)
Fan, Qing; Forsyth, P. A.; Mcmacken, J. R. F.; Tang, Wei-Pai
1994-01-01
Due to memory limitations, iterative methods have become the method of choice for large scale semiconductor device simulation. However, it is well known that these methods still suffer from reliability problems. The linear systems which appear in numerical simulation of semiconductor devices are notoriously ill-conditioned. In order to produce robust algorithms for practical problems, careful attention must be given to many implementation issues. This paper concentrates on strategies for developing robust preconditioners. In addition, effective data structures and convergence check issues are also discussed. These algorithms are compared with a standard direct sparse matrix solver on a variety of problems.
Color normalization of histology slides using graph regularized sparse NMF
NASA Astrophysics Data System (ADS)
Sha, Lingdao; Schonfeld, Dan; Sethi, Amit
2017-03-01
Computer based automatic medical image processing and quantification are becoming popular in digital pathology. However, preparation of histology slides can vary widely due to differences in staining equipment, procedures and reagents, which can reduce the accuracy of algorithms that analyze their color and texture information. To reduce the unwanted color variations, various supervised and unsupervised color normalization methods have been proposed. Compared with supervised color normalization methods, unsupervised color normalization methods have the advantages of time and cost efficiency and universal applicability. Most of the unsupervised color normalization methods for histology are based on stain separation. Based on the fact that stain concentration cannot be negative and different parts of the tissue absorb different stains, nonnegative matrix factorization (NMF), and in particular its sparse version (SNMF), are good candidates for stain separation. However, most of the existing unsupervised color normalization methods such as PCA, ICA, NMF and SNMF fail to consider important information about the sparse manifolds that image pixels occupy, which could potentially result in loss of texture information during color normalization. Manifold learning methods like Graph Laplacian have proven to be very effective in interpreting high-dimensional data. In this paper, we propose a novel unsupervised stain separation method called graph regularized sparse nonnegative matrix factorization (GSNMF). By considering the sparse prior of stain concentration together with manifold information from high-dimensional image data, our method shows better performance in stain color deconvolution than existing unsupervised color deconvolution methods, especially in keeping connected texture information. To utilize the texture information, we construct a nearest neighbor graph between pixels within a spatial area of an image based on their distances using a heat kernel in lαβ space. 
The representation of a pixel in the stain density space is constrained to follow the feature distance of the pixel to pixels in the neighborhood graph. Utilizing color matrix transfer method with the stain concentrations found using our GSNMF method, the color normalization performance was also better than existing methods.
Mniszewski, S M; Cawkwell, M J; Wall, M E; Mohd-Yusof, J; Bock, N; Germann, T C; Niklasson, A M N
2015-10-13
We present an algorithm for the calculation of the density matrix that for insulators scales linearly with system size and parallelizes efficiently on multicore, shared memory platforms with small and controllable numerical errors. The algorithm is based on an implementation of the second-order spectral projection (SP2) algorithm [ Niklasson, A. M. N. Phys. Rev. B 2002 , 66 , 155115 ] in sparse matrix algebra with the ELLPACK-R data format. We illustrate the performance of the algorithm within self-consistent tight binding theory by total energy calculations of gas phase poly(ethylene) molecules and periodic liquid water systems containing up to 15,000 atoms on up to 16 CPU cores. We consider algorithm-specific performance aspects, such as local vs nonlocal memory access and the degree of matrix sparsity. Comparisons to sparse matrix algebra implementations using off-the-shelf libraries on multicore CPUs, graphics processing units (GPUs), and the Intel many integrated core (MIC) architecture are also presented. The accuracy and stability of the algorithm are illustrated with long duration Born-Oppenheimer molecular dynamics simulations of 1000 water molecules and a 303 atom Trp cage protein solvated by 2682 water molecules.
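A minimal sketch of a trace-correcting SP2 iteration, assuming dense NumPy arrays in place of the ELLPACK-R sparse format and omitting the numerical thresholding that keeps matrices sparse. The function name `sp2_density_matrix`, the Gershgorin spectral bounds, and the small synthetic Hamiltonian are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def sp2_density_matrix(H, n_occ, tol=1e-10, max_iter=100):
    """Approximate the projector onto the n_occ lowest eigenstates of symmetric H."""
    n = H.shape[0]
    # Gershgorin circles give cheap bounds on the spectrum of H
    diag = np.diag(H)
    radii = np.sum(np.abs(H), axis=1) - np.abs(diag)
    e_min = np.min(diag - radii)
    e_max = np.max(diag + radii)
    # Map the spectrum into [0, 1] with occupied (low) states near 1
    X = (e_max * np.eye(n) - H) / (e_max - e_min)
    for _ in range(max_iter):
        X2 = X @ X
        if np.linalg.norm(X2 - X) < tol:   # idempotent: X is a projector
            break
        if np.trace(X) > n_occ:
            X = X2                # lowers the trace
        else:
            X = 2.0 * X - X2      # raises the trace
    return X

# Synthetic gapped "Hamiltonian": three low and three high levels
rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.standard_normal((6, 6)))
levels = np.array([-2.0, -1.8, -1.5, 1.0, 1.3, 2.0])
H = (Q * levels) @ Q.T
P = sp2_density_matrix(H, n_occ=3)
P_ref = Q[:, :3] @ Q[:, :3].T    # exact projector from the eigenvectors
```

Each step costs one matrix-matrix product, which is why the efficiency of sparse matrix multiplication dominates the performance discussed in the abstract; the iteration relies on a gap between occupied and unoccupied levels, consistent with the restriction to insulators.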
DOE Office of Scientific and Technical Information (OSTI.GOV)
2014-01-17
This library is an implementation of the Sparse Approximate Matrix Multiplication (SpAMM) algorithm introduced. It provides a matrix data type, and an approximate matrix product, which exhibits linear scaling computational complexity for matrices with decay. The product error and the performance of the multiply can be tuned by choosing an appropriate tolerance. The library can be compiled for serial execution or parallel execution on shared memory systems with an OpenMP capable compiler
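The recursive idea behind an approximate product with a tunable tolerance can be sketched as below. This is a simplified illustration in NumPy, not the library's API: `spamm`, `tau`, and `leaf` are hypothetical names, and the sketch assumes square matrices whose size is a power of two.

```python
import numpy as np

def spamm(A, B, tau, leaf=2):
    """Approximate A @ B, skipping block products with ||A_blk|| * ||B_blk|| < tau."""
    n = A.shape[0]
    if np.linalg.norm(A) * np.linalg.norm(B) < tau:
        return np.zeros((n, n))      # contribution is below the tolerance
    if n <= leaf:
        return A @ B                 # small block: multiply exactly
    h = n // 2
    C = np.zeros((n, n))
    for i in (0, 1):
        for j in (0, 1):
            for k in (0, 1):
                C[i*h:(i+1)*h, j*h:(j+1)*h] += spamm(
                    A[i*h:(i+1)*h, k*h:(k+1)*h],
                    B[k*h:(k+1)*h, j*h:(j+1)*h],
                    tau, leaf)
    return C

# Matrix with exponential decay away from the diagonal
idx = np.arange(16)
A = np.exp(-2.0 * np.abs(idx[:, None] - idx[None, :]))
C_exact = A @ A
C_approx = spamm(A, A, tau=1e-6)
C_zero_tol = spamm(A, A, tau=0.0)    # tau = 0 never skips a block
```

Since the Frobenius norm is submultiplicative, each skipped block product has norm below `tau`, so the total error is bounded by `tau` times the number of skipped products; for matrices with decay, most far-off-diagonal products are skipped, which is the source of the reduced cost.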
Bit error rate tester using fast parallel generation of linear recurring sequences
Pierson, Lyndon G.; Witzke, Edward L.; Maestas, Joseph H.
2003-05-06
A fast method for generating linear recurring sequences by parallel linear recurring sequence generators (LRSGs) with a feedback circuit optimized to balance minimum propagation delay against maximal sequence period. Parallel generation of linear recurring sequences requires decimating the sequence (creating small contiguous sections of the sequence in each LRSG). A companion matrix form is selected depending on whether the LFSR is right-shifting or left-shifting. The companion matrix is completed by selecting a primitive irreducible polynomial with 1's most closely grouped in a corner of the companion matrix. A decimation matrix is created by raising the companion matrix to the (n*k)th power, where k is the number of parallel LRSGs and n is the number of bits to be generated at a time by each LRSG. Companion matrices with 1's closely grouped in a corner will yield sparse decimation matrices. A feedback circuit comprised of XOR logic gates implements the decimation matrix in hardware. Sparse decimation matrices can be implemented with minimum number of XOR gates, and therefore a minimum propagation delay through the feedback circuit. The LRSG of the invention is particularly well suited to use as a bit error rate tester on high speed communication lines because it permits the receiver to synchronize to the transmitted pattern within 2n bits.
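The step of raising a companion matrix to the (n*k)th power over GF(2) can be illustrated with a small sketch. The companion-matrix layout chosen here, the example polynomial x^4 + x + 1, and the names `companion_gf2`, `matpow_gf2`, and `n_bits` are assumptions for illustration; the patent's own conventions and polynomial selection are not reproduced.

```python
import numpy as np

def companion_gf2(coeffs):
    """Companion matrix over GF(2) of x^m + c[m-1] x^(m-1) + ... + c[0]."""
    m = len(coeffs)
    C = np.zeros((m, m), dtype=np.int64)
    C[1:, :-1] = np.eye(m - 1, dtype=np.int64)   # ones on the subdiagonal
    C[:, -1] = coeffs                            # last column holds c[0..m-1]
    return C

def matpow_gf2(C, e):
    """Raise C to the e-th power modulo 2 by repeated squaring."""
    R = np.eye(C.shape[0], dtype=np.int64)
    B = C.copy()
    while e > 0:
        if e & 1:
            R = (R @ B) % 2
        B = (B @ B) % 2
        e >>= 1
    return R

C = companion_gf2([1, 1, 0, 0])      # x^4 + x + 1, primitive over GF(2)
n_bits, k = 3, 2                     # bits per generator, number of generators
D = matpow_gf2(C, n_bits * k)        # decimation matrix C^(n*k)

state = np.array([1, 0, 0, 0])
stepped = state.copy()
for _ in range(n_bits * k):          # advance one step at a time
    stepped = (C @ stepped) % 2
jumped = (D @ state) % 2             # advance n*k steps in one product
xor_inputs = int(D.sum())            # number of ones in D, a rough gate-count proxy
```

Each row of `D` lists the state bits that are XORed to form one new state bit, so fewer ones in `D` correspond to fewer XOR inputs, which is the sparsity argument made in the abstract.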
Tensor Sparse Coding for Positive Definite Matrices.
Sivalingam, Ravishankar; Boley, Daniel; Morellas, Vassilios; Papanikolopoulos, Nikos
2013-08-02
In recent years, there has been extensive research on sparse representation of vector-valued signals. In the matrix case, the data points are merely vectorized and treated as vectors thereafter (for e.g., image patches). However, this approach cannot be used for all matrices, as it may destroy the inherent structure of the data. Symmetric positive definite (SPD) matrices constitute one such class of signals, where their implicit structure of positive eigenvalues is lost upon vectorization. This paper proposes a novel sparse coding technique for positive definite matrices, which respects the structure of the Riemannian manifold and preserves the positivity of their eigenvalues, without resorting to vectorization. Synthetic and real-world computer vision experiments with region covariance descriptors demonstrate the need for and the applicability of the new sparse coding model. This work serves to bridge the gap between the sparse modeling paradigm and the space of positive definite matrices.
Tensor sparse coding for positive definite matrices.
Sivalingam, Ravishankar; Boley, Daniel; Morellas, Vassilios; Papanikolopoulos, Nikolaos
2014-03-01
In recent years, there has been extensive research on sparse representation of vector-valued signals. In the matrix case, the data points are merely vectorized and treated as vectors thereafter (for example, image patches). However, this approach cannot be used for all matrices, as it may destroy the inherent structure of the data. Symmetric positive definite (SPD) matrices constitute one such class of signals, where their implicit structure of positive eigenvalues is lost upon vectorization. This paper proposes a novel sparse coding technique for positive definite matrices, which respects the structure of the Riemannian manifold and preserves the positivity of their eigenvalues, without resorting to vectorization. Synthetic and real-world computer vision experiments with region covariance descriptors demonstrate the need for and the applicability of the new sparse coding model. This work serves to bridge the gap between the sparse modeling paradigm and the space of positive definite matrices.
NASA Technical Reports Server (NTRS)
Whetstone, W. D.
1976-01-01
The functions and operating rules of the SPAR system, which is a group of computer programs used primarily to perform stress, buckling, and vibrational analyses of linear finite element systems, were given. The following subject areas were discussed: basic information, structure definition, format system matrix processors, utility programs, static solutions, stresses, sparse matrix eigensolver, dynamic response, graphics, and substructure processors.
Matrix Recipes for Hard Thresholding Methods
2012-11-07
have been proposed to approximate the solution. In [11], Donoho et al . demonstrate that, in the sparse approximation problem, under basic incoherence...inducing convex surrogate ‖ · ‖1 with provable guarantees for unique signal recovery. In the ARM problem, Fazel et al . [12] identified the nuclear norm...sparse recovery for all. Technical report, EPFL, 2011 . [25] N. Halko , P. G. Martinsson, and J. A. Tropp. Finding structure with randomness: Probabilistic
Saa, Pedro A.; Nielsen, Lars K.
2016-01-01
Motivation: Computation of steady-state flux solutions in large metabolic models is routinely performed using flux balance analysis based on a simple LP (Linear Programming) formulation. A minimal requirement for thermodynamic feasibility of the flux solution is the absence of internal loops, which are enforced using ‘loopless constraints’. The resulting loopless flux problem is a substantially harder MILP (Mixed Integer Linear Programming) problem, which is computationally expensive for large metabolic models. Results: We developed a pre-processing algorithm that significantly reduces the size of the original loopless problem into an easier and equivalent MILP problem. The pre-processing step employs a fast matrix sparsification algorithm—Fast- sparse null-space pursuit (SNP)—inspired by recent results on SNP. By finding a reduced feasible ‘loop-law’ matrix subject to known directionalities, Fast-SNP considerably improves the computational efficiency in several metabolic models running different loopless optimization problems. Furthermore, analysis of the topology encoded in the reduced loop matrix enabled identification of key directional constraints for the potential permanent elimination of infeasible loops in the underlying model. Overall, Fast-SNP is an effective and simple algorithm for efficient formulation of loop-law constraints, making loopless flux optimization feasible and numerically tractable at large scale. Availability and Implementation: Source code for MATLAB including examples is freely available for download at http://www.aibn.uq.edu.au/cssb-resources under Software. Optimization uses Gurobi, CPLEX or GLPK (the latter is included with the algorithm). Contact: lars.nielsen@uq.edu.au Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27559155
An efficient classification method based on principal component and sparse representation.
Zhai, Lin; Fu, Shujun; Zhang, Caiming; Liu, Yunxian; Wang, Lu; Liu, Guohua; Yang, Mingqiang
2016-01-01
As an important application in optical imaging, palmprint recognition is interfered by many unfavorable factors. An effective fusion of blockwise bi-directional two-dimensional principal component analysis and grouping sparse classification is presented. The dimension reduction and normalizing are implemented by the blockwise bi-directional two-dimensional principal component analysis for palmprint images to extract feature matrixes, which are assembled into an overcomplete dictionary in sparse classification. A subspace orthogonal matching pursuit algorithm is designed to solve the grouping sparse representation. Finally, the classification result is gained by comparing the residual between testing and reconstructed images. Experiments are carried out on a palmprint database, and the results show that this method has better robustness against position and illumination changes of palmprint images, and can get higher rate of palmprint recognition.
Lim, Jun-Seok; Pang, Hee-Suk
2016-01-01
In this paper an [Formula: see text]-regularized recursive total least squares (RTLS) algorithm is considered for the sparse system identification. Although recursive least squares (RLS) has been successfully applied in sparse system identification, the estimation performance in RLS based algorithms becomes worse, when both input and output are contaminated by noise (the error-in-variables problem). We proposed an algorithm to handle the error-in-variables problem. The proposed [Formula: see text]-RTLS algorithm is an RLS like iteration using the [Formula: see text] regularization. The proposed algorithm not only gives excellent performance but also reduces the required complexity through the effective inversion matrix handling. Simulations demonstrate the superiority of the proposed [Formula: see text]-regularized RTLS for the sparse system identification setting.
Galgoczy, Roland; Pastor, Isabel; Colom, Adai; Giménez, Alicia; Mas, Francesc; Alcaraz, Jordi
2014-08-01
The design of 3D culture studies remains challenging due to the limited understanding of extracellular matrix (ECM)-dependent hindered diffusion and the lack of simple diffusivity assays. To address these limitations, we set up a cost-effective diffusivity assay based on a Transwell plate and the spectrophotometer of a Microplate Reader, which are readily accessible to cell biology groups. The spectrophotometer-based assay was used to assess the apparent diffusivity D of FITC-dextrans with molecular weight (4-70kDa) spanning the physiological range of signaling factors in a panel of acellular ECM gels including Matrigel, fibrin and type I collagen. Despite their technical differences, D data exhibited ∼15% relative difference with respect to FRAP measurements. Our results revealed that diffusion hindrance of small particles is controlled by the enhanced viscosity of the ECM gel in conformance with the Stokes-Einstein equation rather than by geometrical factors. Moreover, we provided a strong rationale that the enhanced ECM viscosity is largely contributed to by unassembled ECM macromolecules. We also reported that gels with the lowest D exhibited diffusion hindrance closest to the large physiologic hindrance of brain tissue, which has a typical pore size much smaller than ECM gels. Conversely, sparse gels (≤1mg/ml), which are extensively used in 3D cultures, failed to reproduce the hindered diffusion of tissues, thereby supporting that dense (but not sparse) ECM gels are suitable tissue surrogates in terms of macromolecular transport. Finally, the consequences of reduced diffusivity in terms of optimizing the design of 3D culture experiments were addressed in detail. Copyright © 2014 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Wang, Jinting; Lu, Liqiao; Zhu, Fei
2018-01-01
The finite element (FE) method is a powerful tool and has been applied by investigators to real-time hybrid simulations (RTHSs). This study focuses on the computational efficiency, including the computational time and accuracy, of numerical integrations in solving the FE numerical substructure in RTHSs. First, sparse matrix storage schemes are adopted to decrease the computational time of the FE numerical substructure. In this way, the task execution time (TET) decreases, so the scale of the numerical substructure model can increase. Subsequently, several commonly used explicit numerical integration algorithms, including the central difference method (CDM), the Newmark explicit method, the Chang method and the Gui-λ method, are comprehensively compared to evaluate their computational time in solving the FE numerical substructure. CDM is better than the other explicit integration algorithms when the damping matrix is diagonal, while the Gui-λ (λ = 4) method is advantageous when the damping matrix is non-diagonal. Finally, the effect of time delay on the computational accuracy of RTHSs is investigated by simulating structure-foundation systems. Simulation results show that the influence of time delay on the displacement response becomes more pronounced as the mass ratio increases, and that delay compensation methods may reduce the relative error of the displacement peak value to less than 5% even with a large time step and a large time delay.
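The abstract above does not say which sparse storage scheme was adopted. As a minimal sketch, assuming compressed sparse row (CSR) storage and a hypothetical 3-DOF stiffness-like matrix `K`, the following shows why a matrix-vector product, the dominant cost in explicit time stepping, only touches stored non-zeros:

```python
def dense_to_csr(dense):
    """Convert a dense matrix (list of rows) into CSR arrays."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0.0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))  # end of this row in `values`
    return values, col_idx, row_ptr


def csr_matvec(values, col_idx, row_ptr, x):
    """Compute y = A x, visiting only the stored non-zero entries."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]
    return y


# Hypothetical tridiagonal matrix, for illustration only
K = [[2.0, -1.0, 0.0],
     [-1.0, 2.0, -1.0],
     [0.0, -1.0, 2.0]]
vals, cols, ptr = dense_to_csr(K)
y = csr_matvec(vals, cols, ptr, [1.0, 1.0, 1.0])
```

The work per product scales with the number of non-zeros rather than with the square of the number of degrees of freedom, which is the kind of saving the abstract attributes to sparse storage.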
Okimoto, Gordon; Zeinalzadeh, Ashkan; Wenska, Tom; Loomis, Michael; Nation, James B; Fabre, Tiphaine; Tiirikainen, Maarit; Hernandez, Brenda; Chan, Owen; Wong, Linda; Kwee, Sandi
2016-01-01
Technological advances enable the cost-effective acquisition of Multi-Modal Data Sets (MMDS) composed of measurements for multiple, high-dimensional data types obtained from a common set of bio-samples. The joint analysis of the data matrices associated with the different data types of an MMDS should provide a more focused view of the biology underlying complex diseases such as cancer that would not be apparent from the analysis of a single data type alone. As multi-modal data rapidly accumulate in research laboratories and public databases such as The Cancer Genome Atlas (TCGA), the translation of such data into clinically actionable knowledge has been slowed by the lack of computational tools capable of analyzing MMDSs. Here, we describe the Joint Analysis of Many Matrices by ITeration (JAMMIT) algorithm that jointly analyzes the data matrices of an MMDS using sparse matrix approximations of rank-1. The JAMMIT algorithm jointly approximates an arbitrary number of data matrices by rank-1 outer-products composed of "sparse" left-singular vectors (eigen-arrays) that are unique to each matrix and a right-singular vector (eigen-signal) that is common to all the matrices. The non-zero coefficients of the eigen-arrays identify small subsets of variables for each data type (i.e., signatures) that in aggregate, or individually, best explain a dominant eigen-signal defined on the columns of the data matrices. The approximation is specified by a single "sparsity" parameter that is selected based on the false discovery rate estimated by permutation testing. Multiple signals of interest in a given MMDS are sequentially detected and modeled by iterating JAMMIT on "residual" data matrices that result from a given sparse approximation. We show that JAMMIT outperforms other joint analysis algorithms in the detection of multiple signatures embedded in simulated MMDS.
On real multimodal data for ovarian and liver cancer we show that JAMMIT identified multi-modal signatures that were clinically informative and enriched for cancer-related biology. Sparse matrix approximations of rank-1 provide a simple yet effective means of jointly reducing multiple, big data types to a small subset of variables that characterize important clinical and/or biological attributes of the bio-samples from which the data were acquired.
Adaptive OFDM Waveform Design for Spatio-Temporal-Sparsity Exploited STAP Radar
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sen, Satyabrata
In this chapter, we describe a sparsity-based space-time adaptive processing (STAP) algorithm to detect a slowly moving target using an orthogonal frequency division multiplexing (OFDM) radar. The motivation for employing an OFDM signal is that it improves target detectability against interfering signals by increasing the frequency diversity of the system. However, due to the addition of one extra dimension in terms of frequency, the adaptive degrees of freedom in an OFDM-STAP also increase. Therefore, to avoid the construction of a fully adaptive OFDM-STAP, we develop a sparsity-based STAP algorithm. We observe that the interference spectrum is inherently sparse in the spatio-temporal domain, as the clutter responses occupy only a diagonal ridge on the spatio-temporal plane and the jammer signals interfere only from a few spatial directions. Hence, we exploit that sparsity to develop an efficient STAP technique that uses considerably fewer secondary data than other existing STAP techniques and produces nearly optimum STAP performance. In addition to designing the STAP filter, we optimally design the transmit OFDM signals by maximizing the output signal-to-interference-plus-noise ratio (SINR) in order to improve the STAP performance. The computation of the output SINR depends on the estimated value of the interference covariance matrix, which we obtain by applying the sparse recovery algorithm. Therefore, we analytically assess the effects of the synthesized OFDM coefficients on the sparse recovery of the interference covariance matrix by computing the coherence measure of the sparse measurement matrix. Our numerical examples demonstrate the STAP performance achieved by the sparsity-based technique and adaptive waveform design.
NoGOA: predicting noisy GO annotations using evidences and sparse representation.
Yu, Guoxian; Lu, Chang; Wang, Jun
2017-07-21
Gene Ontology (GO) is a community effort to represent functional features of gene products. GO annotations (GOA) provide functional associations between GO terms and gene products. Due to resource limitations, only a small portion of annotations are manually checked by curators, and the others are electronically inferred. Although quality control techniques have been applied to ensure the quality of annotations, the community consistently reports that there are still considerable noisy (or incorrect) annotations. Given the wide application of annotations, how to identify noisy annotations is an important but seldom studied open problem. We introduce a novel approach called NoGOA to predict noisy annotations. First, NoGOA applies sparse representation on the gene-term association matrix to reduce the impact of noisy annotations, and takes advantage of sparse representation coefficients to measure the semantic similarity between genes. Secondly, it preliminarily predicts noisy annotations of a gene based on aggregated votes from semantic neighborhood genes of that gene. Next, NoGOA estimates the ratio of noisy annotations for each evidence code based on direct annotations in GOA files archived in different periods, and then weights entries of the association matrix via the estimated ratios and propagates weights to ancestors of direct annotations using the GO hierarchy. Finally, it integrates the evidence-weighted association matrix and aggregated votes to predict noisy annotations. Experiments on archived GOA files of six model species (H. sapiens, A. thaliana, S. cerevisiae, G. gallus, B. taurus and M. musculus) demonstrate that NoGOA achieves significantly better results than other related methods and that removing noisy annotations improves the performance of gene function prediction. The comparative study justifies the effectiveness of integrating evidence codes with sparse representation for predicting noisy GO annotations.
Codes and datasets are available at http://mlda.swu.edu.cn/codes.php?name=NoGOA .
Preconditioner-free Wiener filtering with a dense noise matrix
NASA Astrophysics Data System (ADS)
Huffenberger, Kevin M.
2018-05-01
This work extends the Elsner & Wandelt (2013) iterative method for efficient, preconditioner-free Wiener filtering to cases in which the noise covariance matrix is dense, but can be decomposed into a sum whose parts are sparse in convenient bases. The new method, which uses multiple messenger fields, reproduces Wiener-filter solutions for test problems, and we apply it to a case beyond the reach of the Elsner & Wandelt (2013) method. We compute the Wiener-filter solution for a simulated Cosmic Microwave Background (CMB) map that contains spatially varying, uncorrelated noise, isotropic 1/f noise, and large-scale horizontal stripes (like those caused by atmospheric noise). We discuss simple extensions that can filter contaminated modes or inverse-noise-filter the data. These techniques help to address complications in the noise properties of maps from current and future generations of ground-based Microwave Background experiments, like Advanced ACTPol, Simons Observatory, and CMB-S4.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pieper, Andreas; Kreutzer, Moritz; Alvermann, Andreas, E-mail: alvermann@physik.uni-greifswald.de
2016-11-15
We study Chebyshev filter diagonalization as a tool for the computation of many interior eigenvalues of very large sparse symmetric matrices. In this technique the subspace projection onto the target space of wanted eigenvectors is approximated with filter polynomials obtained from Chebyshev expansions of window functions. After the discussion of the conceptual foundations of Chebyshev filter diagonalization we analyze the impact of the choice of the damping kernel, search space size, and filter polynomial degree on the computational accuracy and effort, before we describe the necessary steps towards a parallel high-performance implementation. Because Chebyshev filter diagonalization avoids the need for matrix inversion it can deal with matrices and problem sizes that are presently not accessible with rational function methods based on direct or iterative linear solvers. To demonstrate the potential of Chebyshev filter diagonalization for large-scale problems of this kind we include as an example the computation of the 10^2 innermost eigenpairs of a topological insulator matrix with dimension 10^9 derived from quantum physics applications.
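The filtering step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the matrix has already been scaled so that its spectrum lies in [-1, 1], uses the plain Chebyshev expansion of an indicator window, and omits the damping kernel the abstract mentions. The names `window_coeffs` and `chebyshev_filter` and the toy diagonal matrix are hypothetical.

```python
import math


def window_coeffs(lo, hi, degree):
    """Chebyshev coefficients of the indicator function of [lo, hi] in [-1, 1]."""
    t_lo, t_hi = math.acos(lo), math.acos(hi)
    coeffs = [(t_lo - t_hi) / math.pi]
    for k in range(1, degree + 1):
        coeffs.append(2.0 * (math.sin(k * t_lo) - math.sin(k * t_hi)) / (math.pi * k))
    return coeffs


def chebyshev_filter(matvec, v, coeffs):
    """Apply p(A) v = sum_k c_k T_k(A) v using only matrix-vector products."""
    t_prev = list(v)        # T_0(A) v
    t_curr = matvec(v)      # T_1(A) v
    out = [coeffs[0] * a + coeffs[1] * b for a, b in zip(t_prev, t_curr)]
    for c_k in coeffs[2:]:
        av = matvec(t_curr)
        # Three-term recurrence: T_{k+1} = 2 A T_k - T_{k-1}
        t_next = [2.0 * a - b for a, b in zip(av, t_prev)]
        out = [o + c_k * t for o, t in zip(out, t_next)]
        t_prev, t_curr = t_curr, t_next
    return out


# Toy diagonal matrix whose eigenvalues are already inside [-1, 1]
eigs = [-0.8, 0.0, 0.8]
matvec = lambda x: [e * xi for e, xi in zip(eigs, x)]
filtered = chebyshev_filter(matvec, [1.0, 1.0, 1.0], window_coeffs(-0.3, 0.3, 100))
```

Components belonging to eigenvalues inside the window are kept at roughly unit weight while the others are strongly suppressed, and no linear solve is needed, only repeated products with the sparse matrix.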
Multi-color incomplete Cholesky conjugate gradient methods for vector computers
DOE Office of Scientific and Technical Information (OSTI.GOV)
Poole, E.L.
1986-01-01
This research is concerned with the solution on vector computers of linear systems of equations Ax = b, where A is a large, sparse symmetric positive definite matrix with non-zero elements lying only along a few diagonals of the matrix. The system is solved using the incomplete Cholesky conjugate gradient method (ICCG). Multi-color orderings of the unknowns in the linear system are used to obtain p-color matrices for which a no-fill block ICCG method is implemented on the CYBER 205 with O(N/p) length vector operations in both the decomposition of A and, more importantly, in the forward and back solves necessary at each iteration of the method (N is the number of unknowns and p is a small constant). A p-colored matrix is a matrix that can be partitioned into a p x p block matrix where the diagonal blocks are diagonal matrices. The matrix is stored by diagonals and matrix multiplication by diagonals is used to carry out the decomposition of A and the forward and back solves. Additionally, if the vectors across adjacent blocks line up, then some of the overhead associated with vector startups can be eliminated in the matrix vector multiplication necessary at each conjugate gradient iteration. Necessary and sufficient conditions are given to determine which multi-color orderings of the unknowns correspond to p-color matrices, and a process is indicated for choosing multi-color orderings.
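Storage by diagonals, as mentioned in the abstract above, can be illustrated with a small sketch. The dictionary layout (`offset -> entries along that diagonal`) and the function name are assumptions made for illustration; the original CYBER 205 code is not reproduced here.

```python
def matvec_by_diagonals(diagonals, x):
    """y = A x where A is stored as {offset: entries along that diagonal}.

    Offset 0 is the main diagonal, +d the d-th superdiagonal,
    -d the d-th subdiagonal.
    """
    n = len(x)
    y = [0.0] * n
    for offset, entries in diagonals.items():
        start = max(0, -offset)          # first row touched by this diagonal
        for k, a in enumerate(entries):  # one contiguous, stride-1 sweep
            i = start + k
            y[i] += a * x[i + offset]
    return y


# Hypothetical 4x4 matrix: 4 on the main diagonal, -1 on the first off-diagonals
diagonals = {0: [4.0] * 4, 1: [-1.0] * 3, -1: [-1.0] * 3}
y = matvec_by_diagonals(diagonals, [1.0, 2.0, 3.0, 4.0])
```

Each diagonal contributes one long elementwise multiply-add, which is the access pattern that suits vector hardware.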
NASA Astrophysics Data System (ADS)
Yihaa Roodhiyah, Lisa’; Tjong, Tiffany; Nurhasan; Sutarno, D.
2018-04-01
In earlier work, the linear systems arising from the vector finite element method in two-dimensional (2-D) magnetotelluric (MT) response modeling were solved with a non-sparse direct solver in TE mode. That approach has weaknesses that remain to be addressed, in particular the accuracy at low frequencies (10^-3 Hz to 10^-5 Hz), which has not yet been achieved, and the high computational cost on dense meshes. In this work, a sparse direct solver is used instead of the non-sparse direct solver to overcome these weaknesses. A sparse direct solver is advantageous for the linear systems of the vector finite element method because the matrix is symmetric and sparse. The sparse direct solver was validated against analytical solutions for a homogeneous half-space model and a vertical contact model. The validation results show that the sparse direct solver is more stable than the non-sparse direct solver for the linear problems of the vector finite element method, especially at low frequencies. As a result, accurate 2-D MT response modeling at low frequencies (10^-3 Hz to 10^-5 Hz) has been achieved with efficient memory allocation and less computation time.
Condition Number Estimation of Preconditioned Matrices
Kushida, Noriyuki
2015-01-01
The present paper introduces a condition number estimation method for preconditioned matrices. The newly developed method provides reasonable results, while the conventional method, which is based on the Lanczos connection, gives meaningless results. The Lanczos connection based method provides the condition numbers of coefficient matrices of systems of linear equations with information obtained through the preconditioned conjugate gradient method. Estimating the condition number of preconditioned matrices is sometimes important when describing the effectiveness of new preconditioners or selecting adequate preconditioners. Operating a preconditioner on a coefficient matrix is the simplest method of estimation. However, this is not possible for large-scale computing, especially if computation is performed on distributed memory parallel computers. This is because the preconditioned matrices become dense, even if the original matrices are sparse. Although the Lanczos connection method can be used to calculate the condition number of preconditioned matrices, it is not considered to be applicable to large-scale problems because of its weakness with respect to numerical errors. Therefore, we have developed a robust and parallelizable method based on Hager’s method. The feasibility studies are carried out for the diagonal scaling preconditioner and the SSOR preconditioner with a diagonal matrix, a tridiagonal matrix and Pei’s matrix. As a result, the Lanczos connection method contains around 10% error in the results even with a simple problem. On the other hand, the new method contains negligible errors. In addition, the newly developed method returns reasonable solutions when the Lanczos connection method fails with Pei’s matrix, and matrices generated with the finite element method. PMID:25816331
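The abstract does not spell out how Hager's method was adapted to preconditioned matrices. As a hedged sketch of the classical Hager 1-norm estimator that it builds on, the function below estimates the 1-norm of an operator B using only products with B and its transpose, so B never has to be formed explicitly. The function name, the diagonal test matrix and the use of an exact inverse are illustrative assumptions; in practice `apply_b` would wrap a solve with the (preconditioned) operator.

```python
def hager_norm1(apply_b, apply_bt, n, max_iter=5):
    """Estimate ||B||_1 from products with B and B^T (classical Hager iteration)."""
    x = [1.0 / n] * n
    estimate = 0.0
    for _ in range(max_iter):
        y = apply_b(x)
        estimate = sum(abs(v) for v in y)            # current lower bound on ||B||_1
        xi = [1.0 if v >= 0.0 else -1.0 for v in y]  # subgradient of ||B x||_1
        z = apply_bt(xi)
        j = max(range(n), key=lambda i: abs(z[i]))
        if abs(z[j]) <= sum(zi * xv for zi, xv in zip(z, x)):
            break                                    # local maximum reached
        x = [0.0] * n
        x[j] = 1.0                                   # move to the most promising unit vector
    return estimate


# Illustrative diagonal matrix A = diag(1, 2, 4); B = A^{-1} is applied by division
d = [1.0, 2.0, 4.0]
apply_inv = lambda v: [vi / di for vi, di in zip(v, d)]
inv_norm = hager_norm1(apply_inv, apply_inv, len(d))
cond1 = max(abs(di) for di in d) * inv_norm          # ||A||_1 * estimate of ||A^{-1}||_1
```

For this toy case the estimate is exact and `cond1` equals 4; in general the estimator returns a lower bound that is usually close to the true norm.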
STRONG ORACLE OPTIMALITY OF FOLDED CONCAVE PENALIZED ESTIMATION.
Fan, Jianqing; Xue, Lingzhou; Zou, Hui
2014-06-01
Folded concave penalization methods have been shown to enjoy the strong oracle property for high-dimensional sparse estimation. However, a folded concave penalization problem usually has multiple local solutions and the oracle property is established only for one of the unknown local solutions. A challenging fundamental issue still remains that it is not clear whether the local optimum computed by a given optimization algorithm possesses those nice theoretical properties. To close this important theoretical gap in over a decade, we provide a unified theory to show explicitly how to obtain the oracle solution via the local linear approximation algorithm. For a folded concave penalized estimation problem, we show that as long as the problem is localizable and the oracle estimator is well behaved, we can obtain the oracle estimator by using the one-step local linear approximation. In addition, once the oracle estimator is obtained, the local linear approximation algorithm converges, namely it produces the same estimator in the next iteration. The general theory is demonstrated by using four classical sparse estimation problems, i.e., sparse linear regression, sparse logistic regression, sparse precision matrix estimation and sparse quantile regression.
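The one-step local linear approximation (LLA) described above replaces the folded concave penalty by a weighted L1 penalty whose weights are the penalty's derivative at an initial estimate. The sketch below assumes the SCAD penalty (one common folded concave penalty, not named in the abstract) with the conventional a = 3.7, and an orthonormal design so that the weighted lasso reduces to coordinate-wise soft-thresholding. Both are simplifying assumptions for illustration, and the function names are hypothetical.

```python
def scad_derivative(t, lam, a=3.7):
    """Derivative of the SCAD penalty at t >= 0."""
    if t <= lam:
        return lam
    return max(a * lam - t, 0.0) / (a - 1.0)


def soft_threshold(z, w):
    """Minimizer over b of 0.5 * (b - z)**2 + w * |b|."""
    if z > w:
        return z - w
    if z < -w:
        return z + w
    return 0.0


def one_step_lla(z, beta_init, lam):
    """One LLA step under an orthonormal design (z holds the least-squares coefficients)."""
    weights = [scad_derivative(abs(b), lam) for b in beta_init]
    return [soft_threshold(zj, wj) for zj, wj in zip(z, weights)]


z = [3.0, 0.2, -1.0]
beta = one_step_lla(z, beta_init=z, lam=0.5)
```

Large initial coefficients receive zero weight and are left unshrunk, small ones receive the full lasso weight and are set to zero, which is the mechanism behind the oracle behavior discussed in the abstract.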
NASA Astrophysics Data System (ADS)
He, Xingyu; Tong, Ningning; Hu, Xiaowei
2018-01-01
Compressive sensing has been successfully applied to inverse synthetic aperture radar (ISAR) imaging of moving targets. By exploiting the block sparse structure of the target image, sparse solutions for multiple measurement vectors (MMV) can be applied in ISAR imaging and a substantial performance improvement can be achieved. As an effective sparse recovery method, sparse Bayesian learning (SBL) for MMV involves a matrix inverse at each iteration. Its associated computational complexity grows significantly with the problem size. To address this problem, we develop a fast inverse-free (IF) SBL method for MMV. A relaxed evidence lower bound (ELBO), which is computationally more amenable than the traditional ELBO used by SBL, is obtained by invoking a fundamental property of smooth functions. A variational expectation-maximization scheme is then employed to maximize the relaxed ELBO, and a computationally efficient IF-MSBL algorithm is proposed. Numerical results based on simulated and real data show that the proposed method can reconstruct row-sparse signals accurately and obtain clear superresolution ISAR images. Moreover, the running time and computational complexity are reduced to a great extent compared with traditional SBL methods.
STRONG ORACLE OPTIMALITY OF FOLDED CONCAVE PENALIZED ESTIMATION
Fan, Jianqing; Xue, Lingzhou; Zou, Hui
2014-01-01
Folded concave penalization methods have been shown to enjoy the strong oracle property for high-dimensional sparse estimation. However, a folded concave penalization problem usually has multiple local solutions and the oracle property is established only for one of the unknown local solutions. A challenging fundamental issue still remains that it is not clear whether the local optimum computed by a given optimization algorithm possesses those nice theoretical properties. To close this important theoretical gap in over a decade, we provide a unified theory to show explicitly how to obtain the oracle solution via the local linear approximation algorithm. For a folded concave penalized estimation problem, we show that as long as the problem is localizable and the oracle estimator is well behaved, we can obtain the oracle estimator by using the one-step local linear approximation. In addition, once the oracle estimator is obtained, the local linear approximation algorithm converges, namely it produces the same estimator in the next iteration. The general theory is demonstrated by using four classical sparse estimation problems, i.e., sparse linear regression, sparse logistic regression, sparse precision matrix estimation and sparse quantile regression. PMID:25598560
Matrix computations in MACSYMA
NASA Technical Reports Server (NTRS)
Wang, P. S.
1977-01-01
Facilities built into MACSYMA for manipulating matrices with numeric or symbolic entries are described. Computations will be done exactly, keeping symbols as symbols. Topics discussed include how to form a matrix and create other matrices by transforming existing matrices within MACSYMA; arithmetic and other computation with matrices; and user control of computational processes through the use of optional variables. Two algorithms designed for sparse matrices are given. The computing times of several different ways to compute the determinant of a matrix are compared.
Spectral Calculation of ICRF Wave Propagation and Heating in 2-D Using Massively Parallel Computers
NASA Astrophysics Data System (ADS)
Jaeger, E. F.; D'Azevedo, E.; Berry, L. A.; Carter, M. D.; Batchelor, D. B.
2000-10-01
Spectral calculations of ICRF wave propagation in plasmas have the natural advantage that they require no assumption regarding the smallness of the ion Larmor radius ρ relative to wavelength λ. Results are therefore applicable to all orders in k_⊥ρ, where k_⊥ = 2π/λ. But because all modes in the spectral representation are coupled, the solution requires inversion of a large dense matrix. In contrast, finite difference algorithms involve only matrices that are sparse and banded. Thus, spectral calculations of wave propagation and heating in tokamak plasmas have so far been limited to 1-D. In this paper, we extend the spectral method to 2-D by taking advantage of new matrix inversion techniques that utilize massively parallel computers. By spreading the dense matrix over 576 processors on the ORNL IBM RS/6000 SP supercomputer, we are able to solve up to 120,000 coupled complex equations requiring 230 GBytes of memory and achieving over 500 Gflops/sec. Initial results for ASDEX and NSTX will be presented using up to 200 modes in both the radial and vertical dimensions.
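The memory figure quoted above can be checked with a short calculation. It assumes each matrix entry is a double-precision complex number (16 bytes) and that the dense matrix is spread evenly over the processors; neither detail is stated in the abstract.

```python
n_equations = 120_000
bytes_per_entry = 16  # assumed: double-precision complex (2 x 8 bytes)
n_processors = 576

dense_bytes = n_equations ** 2 * bytes_per_entry
dense_gbytes = dense_bytes / 1e9
per_processor_mbytes = dense_bytes / n_processors / 1e6
```

Under these assumptions the dense matrix needs about 230 GBytes in total, consistent with the abstract, or about 400 MBytes per processor. A banded finite difference matrix of the same order would need only a small fraction of that.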
A fast object-oriented Matlab implementation of the Reproducing Kernel Particle Method
NASA Astrophysics Data System (ADS)
Barbieri, Ettore; Meo, Michele
2012-05-01
Novel numerical methods, known as Meshless Methods or Meshfree Methods and, in a wider perspective, Partition of Unity Methods, promise to overcome most of the disadvantages of the traditional finite element techniques. The absence of a mesh makes meshfree methods very attractive for problems involving large deformations, moving boundaries and crack propagation. However, meshfree methods still have significant limitations that prevent their acceptance among researchers and engineers, namely their computational cost. This paper presents an in-depth analysis of computational techniques to speed up the computation of the shape functions in the Reproducing Kernel Particle Method and Moving Least Squares, with particular focus on their bottlenecks, such as the neighbour search, the inversion of the moment matrix and the assembly of the stiffness matrix. The paper presents numerous computational solutions aimed at a considerable reduction of the computational times: the use of kd-trees for the neighbour search, sparse indexing of the nodes-points connectivity and, most importantly, the explicit and vectorized inversion of the moment matrix without using loops and numerical routines.
Negre, Christian F A; Mniszewski, Susan M; Cawkwell, Marc J; Bock, Nicolas; Wall, Michael E; Niklasson, Anders M N
2016-07-12
We present a reduced complexity algorithm to compute the inverse overlap factors required to solve the generalized eigenvalue problem in a quantum-based molecular dynamics (MD) simulation. Our method is based on the recursive, iterative refinement of an initial guess of Z (inverse square root of the overlap matrix S). The initial guess of Z is obtained beforehand by using either an approximate divide-and-conquer technique or dynamical methods, propagated within an extended Lagrangian dynamics from previous MD time steps. With this formulation, we achieve long-term stability and energy conservation even under the incomplete, approximate, iterative refinement of Z. Linear-scaling performance is obtained using numerically thresholded sparse matrix algebra based on the ELLPACK-R sparse matrix data format, which also enables efficient shared-memory parallelization. As we show in this article using self-consistent density-functional-based tight-binding MD, our approach is faster than conventional methods based on the diagonalization of overlap matrix S for systems as small as a few hundred atoms, substantially accelerating quantum-based simulations even for molecular structures of intermediate size. For a 4158-atom water-solvated polyalanine system, we find an average speedup factor of 122 for the computation of Z in each MD step.
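The iterative refinement of Z described above can be sketched with the simplest member of this family of updates, Z <- Z (3I - Z^T S Z) / 2, which converges to the inverse square root of S when the initial guess is close enough. This is a hedged illustration only: it uses small dense lists rather than the thresholded ELLPACK-R sparse algebra of the paper, the exact refinement polynomial used by the authors may differ, and the 2x2 overlap matrix and identity initial guess are hypothetical.

```python
def matmul(a, b):
    """Dense matrix product for small lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def refine_inverse_sqrt(s, z, iterations=6):
    """Refine Z toward S^(-1/2) via Z <- Z (3 I - Z^T S Z) / 2."""
    n = len(s)
    for _ in range(iterations):
        x = matmul(transpose(z), matmul(s, z))  # X = Z^T S Z, tends to I
        update = [[(3.0 * (i == j) - x[i][j]) / 2.0 for j in range(n)]
                  for i in range(n)]
        z = matmul(z, update)
    return z


s = [[1.0, 0.2], [0.2, 1.0]]   # hypothetical overlap matrix
z0 = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical initial guess
z = refine_inverse_sqrt(s, z0)
residual = matmul(transpose(z), matmul(s, z))  # should be close to the identity
```

Because each step uses only matrix products, the scheme maps directly onto sparse matrix algebra, and a good initial guess from the previous MD step means few iterations are needed.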
Negre, Christian F. A; Mniszewski, Susan M.; Cawkwell, Marc Jon; ...
2016-06-06
We present a reduced complexity algorithm to compute the inverse overlap factors required to solve the generalized eigenvalue problem in a quantum-based molecular dynamics (MD) simulation. Our method is based on the recursive iterative refinement of an initial guess Z of the inverse overlap matrix S. The initial guess of Z is obtained beforehand either by using an approximate divide and conquer technique or dynamically, propagated within an extended Lagrangian dynamics from previous MD time steps. With this formulation, we achieve long-term stability and energy conservation even under incomplete approximate iterative refinement of Z. Linear scaling performance is obtained using numerically thresholded sparse matrix algebra based on the ELLPACK-R sparse matrix data format, which also enables efficient shared memory parallelization. As we show in this article using self-consistent density functional based tight-binding MD, our approach is faster than conventional methods based on the direct diagonalization of the overlap matrix S for systems as small as a few hundred atoms, substantially accelerating quantum-based simulations even for molecular structures of intermediate size. For a 4,158 atom water-solvated polyalanine system we find an average speedup factor of 122 for the computation of Z in each MD step.
Label consistent K-SVD: learning a discriminative dictionary for recognition.
Jiang, Zhuolin; Lin, Zhe; Davis, Larry S
2013-11-01
A label consistent K-SVD (LC-KSVD) algorithm to learn a discriminative dictionary for sparse coding is presented. In addition to using class labels of training data, we also associate label information with each dictionary item (columns of the dictionary matrix) to enforce discriminability in sparse codes during the dictionary learning process. More specifically, we introduce a new label consistency constraint called "discriminative sparse-code error" and combine it with the reconstruction error and the classification error to form a unified objective function. The optimal solution is efficiently obtained using the K-SVD algorithm. Our algorithm learns a single overcomplete dictionary and an optimal linear classifier jointly. The incremental dictionary learning algorithm is presented for the situation of limited memory resources. It yields dictionaries so that feature points with the same class labels have similar sparse codes. Experimental results demonstrate that our algorithm outperforms many recently proposed sparse-coding techniques for face, action, scene, and object category recognition under the same learning conditions.
Finite-size analysis of the detectability limit of the stochastic block model
NASA Astrophysics Data System (ADS)
Young, Jean-Gabriel; Desrosiers, Patrick; Hébert-Dufresne, Laurent; Laurence, Edward; Dubé, Louis J.
2017-06-01
It has been shown in recent years that the stochastic block model is sometimes undetectable in the sparse limit, i.e., that no algorithm can identify a partition correlated with the partition used to generate an instance, if the instance is sparse enough and infinitely large. In this contribution, we treat the finite case explicitly, using arguments drawn from information theory and statistics. We give a necessary condition for finite-size detectability in the general SBM. We then distinguish the concept of average detectability from the concept of instance-by-instance detectability and give explicit formulas for both definitions. Using these formulas, we prove that there exist large equivalence classes of parameters, where widely different network ensembles are equally detectable with respect to our definitions of detectability. In an extensive case study, we investigate the finite-size detectability of a simplified variant of the SBM, which encompasses a number of important models as special cases. These models include the symmetric SBM, the planted coloring model, and more exotic SBMs not previously studied. We conclude with three appendices, where we study the interplay of noise and detectability, establish a connection between our information-theoretic approach and random matrix theory, and provide proofs of some of the more technical results.
Subspace aware recovery of low rank and jointly sparse signals
Biswas, Sampurna; Dasgupta, Soura; Mudumbai, Raghuraman; Jacob, Mathews
2017-01-01
We consider the recovery of a matrix X, which is simultaneously low rank and jointly sparse, from few measurements of its columns using a two-step algorithm. Each column of X is measured using a combination of two measurement matrices: one which is the same for every column, while the second measurement matrix varies from column to column. The recovery proceeds by first estimating the row subspace vectors from the measurements corresponding to the common matrix. The estimated row subspace vectors are then used to recover X from all the measurements using a convex program of joint sparsity minimization. Our main contribution is to provide sufficient conditions on the measurement matrices that guarantee the recovery of such a matrix using the above two-step algorithm. The results demonstrate quite significant savings in the number of measurements when compared to the standard multiple measurement vector (MMV) scheme, which assumes the same time-invariant measurement pattern for all the time frames. We illustrate the impact of the sampling pattern on reconstruction quality using breath held cardiac cine MRI and cardiac perfusion MRI data, while the utility of the algorithm to accelerate the acquisition is demonstrated on MR parameter mapping. PMID:28630889
Beyond Low Rank + Sparse: Multi-scale Low Rank Matrix Decomposition
Ong, Frank; Lustig, Michael
2016-01-01
We present a natural generalization of the recent low rank + sparse matrix decomposition and consider the decomposition of matrices into components of multiple scales. Such decomposition is well motivated in practice as data matrices often exhibit local correlations in multiple scales. Concretely, we propose a multi-scale low rank modeling that represents a data matrix as a sum of block-wise low rank matrices with increasing scales of block sizes. We then consider the inverse problem of decomposing the data matrix into its multi-scale low rank components and approach the problem via a convex formulation. Theoretically, we show that under various incoherence conditions, the convex program recovers the multi-scale low rank components either exactly or approximately. Practically, we provide guidance on selecting the regularization parameters and incorporate cycle spinning to reduce blocking artifacts. Experimentally, we show that the multi-scale low rank decomposition provides a more intuitive decomposition than conventional low rank methods and demonstrate its effectiveness in four applications, including illumination normalization for face images, motion separation for surveillance videos, multi-scale modeling of the dynamic contrast enhanced magnetic resonance imaging and collaborative filtering exploiting age information. PMID:28450978
Macquarrie, K T B; Mayer, K U; Jin, B; Spiessl, S M
2010-03-01
Redox evolution in sparsely fractured crystalline rocks is a key, and largely unresolved, issue when assessing the geochemical suitability of deep geological repositories for nuclear waste. Redox zonation created by the influx of oxygenated waters has previously been simulated using reactive transport models that have incorporated a variety of processes, resulting in predictions for the depth of oxygen penetration that may vary greatly. An assessment and direct comparison of the various underlying conceptual models are therefore needed. In this work a reactive transport model that considers multiple processes in an integrated manner is used to investigate the ingress of oxygen for both single fracture and fracture zone scenarios. It is shown that the depth of dissolved oxygen migration is greatly influenced by the a priori assumptions that are made in the conceptual models. For example, the ability of oxygen to access and react with minerals in the rock matrix may be of paramount importance for single fracture conceptual models. For fracture zone systems, the abundance and reactivity of minerals within the fractures and thin matrix slabs between the fractures appear to provide key controls on O(2) attenuation. The findings point to the need for improved understanding of the coupling between the key transport-reaction feedbacks to determine which conceptual models are most suitable and to provide guidance for which parameters should be targeted in field and laboratory investigations. Copyright 2009 Elsevier B.V. All rights reserved.
Multi-GPU implementation of a VMAT treatment plan optimization algorithm
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tian, Zhen, E-mail: Zhen.Tian@UTSouthwestern.edu, E-mail: Xun.Jia@UTSouthwestern.edu, E-mail: Steve.Jiang@UTSouthwestern.edu; Folkerts, Michael; Tan, Jun
Purpose: Volumetric modulated arc therapy (VMAT) optimization is a computationally challenging problem due to its large data size, high degrees of freedom, and many hardware constraints. High-performance graphics processing units (GPUs) have been used to speed up the computations. However, a GPU’s relatively small memory cannot hold a large dose-deposition coefficient (DDC) matrix, which arises in cases with, e.g., a large target size, multiple targets, multiple arcs, and/or small beamlet size. The main purpose of this paper is to report an implementation of a column-generation-based VMAT algorithm, previously developed in the authors’ group, on a multi-GPU platform to solve the memory limitation problem. While the column-generation-based VMAT algorithm has been previously developed, the GPU implementation details have not been reported. Hence, another purpose is to present detailed techniques employed for GPU implementation. The authors also would like to utilize this particular problem as an example problem to study the feasibility of using a multi-GPU platform to solve large-scale problems in medical physics. Methods: The column-generation approach generates VMAT apertures sequentially by solving a pricing problem (PP) and a master problem (MP) iteratively. In the authors’ method, the sparse DDC matrix is first stored on a CPU in coordinate list format (COO). On the GPU side, this matrix is split into four submatrices according to beam angles, which are stored on four GPUs in compressed sparse row format. Computation of beamlet price, the first step in PP, is accomplished using multi-GPUs. A fast inter-GPU data transfer scheme is accomplished using peer-to-peer access. The remaining steps of PP and MP problems are implemented on CPU or a single GPU due to their modest problem scale and computational loads. The Barzilai and Borwein algorithm with a subspace step scheme is adopted here to solve the MP problem.
A head and neck (H&N) cancer case is then used to validate the authors’ method. The authors also compare their multi-GPU implementation with three different single GPU implementation strategies, i.e., truncating DDC matrix (S1), repeatedly transferring DDC matrix between CPU and GPU (S2), and porting computations involving DDC matrix to CPU (S3), in terms of both plan quality and computational efficiency. Two more H&N patient cases and three prostate cases are used to demonstrate the advantages of the authors’ method. Results: The authors’ multi-GPU implementation can finish the optimization process within ∼1 min for the H&N patient case. S1 leads to an inferior plan quality although its total time was 10 s shorter than the multi-GPU implementation due to the reduced matrix size. S2 and S3 yield the same plan quality as the multi-GPU implementation but take ∼4 and ∼6 min, respectively. High computational efficiency was consistently achieved for the other five patient cases tested, with VMAT plans of clinically acceptable quality obtained within 23–46 s. Conversely, to obtain clinically comparable or acceptable plans for all six of the VMAT cases that the authors have tested in this paper, the optimization time needed in a commercial TPS on CPU was found to be on the order of several minutes. Conclusions: The results demonstrate that the multi-GPU implementation of the authors’ column-generation-based VMAT optimization can handle the large-scale VMAT optimization problem efficiently without sacrificing plan quality. The authors’ study may serve as an example to shed some light on other large-scale medical physics problems that require multi-GPU techniques.
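The storage scheme described in the Methods (COO on the host, then column blocks converted to compressed sparse row format) can be illustrated with a minimal pure-Python sketch. This is not the authors' GPU code: the function names, the tiny 3 × 4 matrix, and the choice of two column blocks instead of four are assumptions made only for illustration, and each block is processed serially here rather than on its own GPU.

```python
def coo_to_csr(n_rows, rows, cols, vals):
    """Convert COO triplets into CSR arrays (indptr, indices, data)."""
    indptr = [0] * (n_rows + 1)
    for r in rows:                      # count nonzeros per row
        indptr[r + 1] += 1
    for i in range(n_rows):             # prefix sum gives row start offsets
        indptr[i + 1] += indptr[i]
    indices = [0] * len(vals)
    data = [0.0] * len(vals)
    next_slot = list(indptr[:-1])       # next free position in each row
    for r, c, v in zip(rows, cols, vals):
        k = next_slot[r]
        indices[k], data[k] = c, v
        next_slot[r] += 1
    return indptr, indices, data


def split_by_columns(rows, cols, vals, boundaries):
    """Split COO entries into column blocks [boundaries[b], boundaries[b+1])."""
    blocks = [([], [], []) for _ in range(len(boundaries) - 1)]
    for r, c, v in zip(rows, cols, vals):
        for b in range(len(boundaries) - 1):
            if boundaries[b] <= c < boundaries[b + 1]:
                blocks[b][0].append(r)
                blocks[b][1].append(c - boundaries[b])   # local column index
                blocks[b][2].append(v)
                break
    return blocks


def csr_matvec(indptr, indices, data, x):
    """Compute y = A x for a CSR matrix."""
    return [sum(data[k] * x[indices[k]] for k in range(indptr[i], indptr[i + 1]))
            for i in range(len(indptr) - 1)]


# Hypothetical 3 x 4 matrix in COO form (rows = voxels, columns = beamlets).
rows = [0, 0, 1, 2, 2]
cols = [0, 3, 1, 2, 3]
vals = [1.0, 2.0, 3.0, 4.0, 5.0]
x = [1.0, 2.0, 3.0, 4.0]
boundaries = [0, 2, 4]                  # two column blocks, one per "device"

y_full = csr_matvec(*coo_to_csr(3, rows, cols, vals), x)

y_blocks = [0.0, 0.0, 0.0]
for b, (br, bc, bv) in enumerate(split_by_columns(rows, cols, vals, boundaries)):
    x_local = x[boundaries[b]:boundaries[b + 1]]
    partial = csr_matvec(*coo_to_csr(3, br, bc, bv), x_local)
    y_blocks = [a + p for a, p in zip(y_blocks, partial)]   # reduce partial results
```

Splitting by columns means each block needs only its own slice of the input vector, while the partial outputs must be summed; that reduction is the step where inter-device transfer would occur.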
Tang, Shiming; Zhang, Yimeng; Li, Zhihao; Li, Ming; Liu, Fang; Jiang, Hongfei; Lee, Tai Sing
2018-04-26
One general principle of sensory information processing is that the brain must optimize efficiency by reducing the number of neurons that process the same information. The sparseness of the sensory representations in a population of neurons reflects the efficiency of the neural code. Here, we employ large-scale two-photon calcium imaging to examine the responses of a large population of neurons within the superficial layers of area V1 with single-cell resolution, while simultaneously presenting a large set of natural visual stimuli, to provide the first direct measure of the population sparseness in awake primates. The results show that only 0.5% of neurons respond strongly to any given natural image - indicating a ten-fold increase in the inferred sparseness over previous measurements. These population activities are nevertheless necessary and sufficient to discriminate visual stimuli with high accuracy, suggesting that the neural code in the primary visual cortex is both super-sparse and highly efficient. © 2018, Tang et al.
Compressed sensing for high-resolution nonlipid suppressed 1H FID MRSI of the human brain at 9.4T.
Nassirpour, Sahar; Chang, Paul; Avdievitch, Nikolai; Henning, Anke
2018-04-29
The aim of this study was to apply compressed sensing to accelerate the acquisition of high resolution metabolite maps of the human brain using a nonlipid suppressed ultra-short TR and TE 1H FID MRSI sequence at 9.4T. X-t sparse compressed sensing reconstruction was optimized for nonlipid suppressed 1H FID MRSI data. Coil-by-coil x-t sparse reconstruction was compared with SENSE x-t sparse and low rank reconstruction. The effect of matrix size and spatial resolution on the achievable acceleration factor was studied. Finally, in vivo metabolite maps with different acceleration factors of 2, 4, 5, and 10 were acquired and compared. Coil-by-coil x-t sparse compressed sensing reconstruction was not able to reliably recover the nonlipid suppressed data; rather, a combination of parallel and sparse reconstruction was necessary (SENSE x-t sparse). For acceleration factors of up to 5, both the low-rank and the compressed sensing methods were able to reconstruct the data comparably well (root mean squared errors [RMSEs] ≤ 10.5% for Cre). However, the reconstruction time of the low rank algorithm was drastically longer than that of compressed sensing. Using the optimized compressed sensing reconstruction, acceleration factors of 4 or 5 could be reached for the MRSI data with a matrix size of 64 × 64. For lower spatial resolutions, an acceleration factor of up to R∼4 was successfully achieved. By tailoring the reconstruction scheme to the nonlipid suppressed data through parameter optimization and performance evaluation, we present high resolution (97 µL voxel size) accelerated in vivo metabolite maps of the human brain acquired at 9.4T within scan times of 3 to 3.75 min. © 2018 International Society for Magnetic Resonance in Medicine.
Algorithms and software for solving finite element equations on serial and parallel architectures
NASA Technical Reports Server (NTRS)
Chu, Eleanor; George, Alan
1988-01-01
The primary objective was to compare the performance of state-of-the-art techniques for solving sparse systems with those that are currently available in the Computational Structural Mechanics (CSM) testbed. One of the first tasks was to become familiar with the structure of the testbed, and to install some or all of the SPARSPAK package in the testbed. A brief overview of the CSM Testbed software and its usage is presented. An overview of the sparse matrix research currently employed in the CSM Testbed is given. An interface which was designed and implemented as a research tool for installing and appraising new matrix processors in the CSM Testbed is described. The results of numerical experiments performed in solving a set of testbed demonstration problems using the processor SPK and other experimental processors are reported.
Supercomputing on massively parallel bit-serial architectures
NASA Technical Reports Server (NTRS)
Iobst, Ken
1985-01-01
Research on the Goodyear Massively Parallel Processor (MPP) suggests that high-level parallel languages are practical and can be designed with powerful new semantics that allow algorithms to be efficiently mapped to the real machines. For the MPP these semantics include parallel/associative array selection for both dense and sparse matrices, variable precision arithmetic to trade accuracy for speed, micro-pipelined train broadcast, and conditional branching at the processing element (PE) control unit level. The preliminary design of a FORTRAN-like parallel language for the MPP has been completed and is being used to write programs to perform sparse matrix array selection, min/max search, matrix multiplication, Gaussian elimination on single bit arrays and other generic algorithms. A description is given of the MPP design. Features of the system and its operation are illustrated in the form of charts and diagrams.
Improved parallel data partitioning by nested dissection with applications to information retrieval.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wolf, Michael M.; Chevalier, Cedric; Boman, Erik Gunnar
The computational work in many information retrieval and analysis algorithms is based on sparse linear algebra. Sparse matrix-vector multiplication is a common kernel in many of these computations. Thus, an important related combinatorial problem in parallel computing is how to distribute the matrix and the vectors among processors so as to minimize the communication cost. We focus on minimizing the total communication volume while keeping the computation balanced across processes. In [1], the first two authors presented a new 2D partitioning method, the nested dissection partitioning algorithm. In this paper, we improve on that algorithm and show that it is a good option for data partitioning in information retrieval. We also show partitioning time can be substantially reduced by using the SCOTCH software, and quality improves in some cases, too.
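To make the objective concrete, the sketch below counts total communication volume for a parallel sparse matrix-vector product y = A x under a commonly used model: an input entry x_j is sent once to every other process that holds a nonzero in column j, and a partial sum for y_i is sent once from every other process that holds a nonzero in row i. The function name and the small example are assumptions for illustration only; this is not the nested dissection partitioner of the abstract, only the cost it tries to reduce.

```python
def communication_volume(nonzeros, nz_owner, x_owner, y_owner):
    """Total words communicated in y = A x for a given data distribution.

    nonzeros : list of (i, j) positions of the nonzeros of A
    nz_owner : process that stores each nonzero (same order as nonzeros)
    x_owner  : x_owner[j] is the process that owns x_j
    y_owner  : y_owner[i] is the process that owns y_i
    """
    col_procs, row_procs = {}, {}
    for (i, j), p in zip(nonzeros, nz_owner):
        col_procs.setdefault(j, set()).add(p)
        row_procs.setdefault(i, set()).add(p)
    # Expand phase: x_j goes to every process that needs it but does not own it.
    expand = sum(len(procs - {x_owner[j]}) for j, procs in col_procs.items())
    # Fold phase: partial sums for y_i come from every non-owner process.
    fold = sum(len(procs - {y_owner[i]}) for i, procs in row_procs.items())
    return expand + fold


nonzeros = [(0, 0), (0, 1), (1, 1), (2, 2), (2, 3), (3, 0), (3, 3)]

# Rows {0, 1} on process 0 and rows {2, 3} on process 1 (contiguous blocks).
contiguous = communication_volume(
    nonzeros, [0, 0, 0, 1, 1, 1, 1], x_owner=[0, 0, 1, 1], y_owner=[0, 0, 1, 1])

# Rows {0, 2} on process 0 and rows {1, 3} on process 1 (interleaved).
interleaved = communication_volume(
    nonzeros, [0, 0, 1, 0, 0, 1, 1], x_owner=[0, 1, 0, 1], y_owner=[0, 1, 0, 1])
```

Both distributions give each process two rows, yet the contiguous one communicates one word and the interleaved one three, which is the kind of difference a partitioner aims to exploit.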
High-performance sparse matrix-matrix products on Intel KNL and multicore architectures
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nagasaka, Y; Matsuoka, S; Azad, A
Sparse matrix-matrix multiplication (SpGEMM) is a computational primitive that is widely used in areas ranging from traditional numerical applications to recent big data analysis and machine learning. Although many SpGEMM algorithms have been proposed, hardware-specific optimizations for multi- and many-core processors are lacking and a detailed analysis of their performance under various use cases and matrices is not available. We first identify and mitigate multiple bottlenecks with memory management and thread scheduling on Intel Xeon Phi (Knights Landing or KNL). Specifically targeting multi- and many-core processors, we develop a hash-table-based algorithm and optimize a heap-based shared-memory SpGEMM algorithm. We examine their performance together with other publicly available codes. Unlike the literature, our evaluation also includes use cases that are representative of real graph algorithms, such as multi-source breadth-first search or triangle counting. Our hash-table and heap-based algorithms show significant speedups over existing libraries in the majority of the cases, while different algorithms dominate the other scenarios with different matrix size, sparsity, compression factor and operation type. We summarize the in-depth evaluation results in a recipe for choosing the best SpGEMM algorithm for a target scenario. A critical finding is that hash-table-based SpGEMM gets a significant performance boost if the nonzeros are not required to be sorted within each row of the output matrix.
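The idea behind a hash-table-based SpGEMM can be sketched serially as a row-by-row product in which a Python dict plays the role of the hash table that accumulates each output row. This is only an illustration under stated assumptions: the function name, the CSR tuple layout, and the `sort_output` flag are hypothetical, and none of the KNL-specific memory management or thread scheduling from the paper is represented.

```python
def spgemm_hash(a, b, sort_output=True):
    """Multiply two CSR matrices, C = A B, accumulating each row in a dict.

    a and b are (indptr, indices, data) triples. Returns C in the same form.
    """
    a_ptr, a_idx, a_val = a
    b_ptr, b_idx, b_val = b
    c_ptr, c_idx, c_val = [0], [], []
    for i in range(len(a_ptr) - 1):
        acc = {}                                    # column -> accumulated value
        for ka in range(a_ptr[i], a_ptr[i + 1]):
            k, av = a_idx[ka], a_val[ka]
            for kb in range(b_ptr[k], b_ptr[k + 1]):
                j = b_idx[kb]
                acc[j] = acc.get(j, 0.0) + av * b_val[kb]
        # Sorting the column indices is optional extra work per output row.
        cols = sorted(acc) if sort_output else list(acc)
        c_idx.extend(cols)
        c_val.extend(acc[j] for j in cols)
        c_ptr.append(len(c_idx))
    return c_ptr, c_idx, c_val


# A = [[1, 0, 2], [0, 3, 0]] and B = [[0, 1], [4, 0], [5, 6]] in CSR form.
A = ([0, 2, 3], [0, 2, 1], [1.0, 2.0, 3.0])
B = ([0, 1, 2, 4], [1, 0, 0, 1], [1.0, 4.0, 5.0, 6.0])

C_sorted = spgemm_hash(A, B)
C_unsorted = spgemm_hash(A, B, sort_output=False)
```

With `sort_output=False` the same values are produced, but the columns of each row appear in insertion order, so the per-row sort is skipped; this mirrors the abstract's observation that unsorted output is cheaper to produce.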
Das, Kiranmoy; Daniels, Michael J.
2014-01-01
Estimation of the covariance structure for irregular sparse longitudinal data has been studied by many authors in recent years but typically using fully parametric specifications. In addition, when data are collected from several groups over time, it is known that assuming the same or completely different covariance matrices over groups can lead to loss of efficiency and/or bias. Nonparametric approaches have been proposed for estimating the covariance matrix for regular univariate longitudinal data by sharing information across the groups under study. For the irregular case, with longitudinal measurements that are bivariate or multivariate, modeling becomes more difficult. In this article, to model bivariate sparse longitudinal data from several groups, we propose a flexible covariance structure via a novel matrix stick-breaking process for the residual covariance structure and a Dirichlet process mixture of normals for the random effects. Simulation studies are performed to investigate the effectiveness of the proposed approach over more traditional approaches. We also analyze a subset of Framingham Heart Study data to examine how the blood pressure trajectories and covariance structures differ for the patients from different BMI groups (high, medium and low) at baseline. PMID:24400941
Zhang, Zhilin; Jung, Tzyy-Ping; Makeig, Scott; Rao, Bhaskar D
2013-02-01
Fetal ECG (FECG) telemonitoring is an important branch of telemedicine. The design of a telemonitoring system via a wireless body area network with low energy consumption for ambulatory use is highly desirable. As an emerging technique, compressed sensing (CS) shows great promise in compressing/reconstructing data with low energy consumption. However, due to some specific characteristics of raw FECG recordings such as nonsparsity and strong noise contamination, current CS algorithms generally fail in this application. This paper proposes to use the block sparse Bayesian learning framework to compress/reconstruct nonsparse raw FECG recordings. Experimental results show that the framework can reconstruct the raw recordings with high quality. In particular, the reconstruction does not destroy the interdependence among the multichannel recordings. This ensures that the independent component analysis decomposition of the reconstructed recordings has high fidelity. Furthermore, the framework allows the use of a sparse binary sensing matrix with far fewer nonzero entries to compress recordings. In particular, each column of the matrix can contain only two nonzero entries. This shows that the framework, compared to other algorithms such as current CS algorithms and wavelet algorithms, can greatly reduce code execution on the CPU in the data compression stage.
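The compression stage with a sparse binary sensing matrix can be sketched as follows. The sketch assumes the two nonzeros in each column are ones placed at randomly chosen rows; the function names, the seed, and the signal values are hypothetical, and the block sparse Bayesian learning reconstruction is not shown.

```python
import random


def make_sparse_binary_matrix(m, n, seed=0):
    """Return, for each of n columns, the two row indices that hold a 1."""
    rng = random.Random(seed)
    return [tuple(rng.sample(range(m), 2)) for _ in range(n)]


def compress(col_rows, x, m):
    """Compute y = Phi x using additions only (no multiplications)."""
    y = [0.0] * m
    for j, rows in enumerate(col_rows):
        for r in rows:
            y[r] += x[j]        # each sample is added to exactly two outputs
    return y


m, n = 4, 8                      # compress 8 samples into 4 measurements
phi = make_sparse_binary_matrix(m, n)
x = [1.0, -2.0, 0.5, 3.0, 0.0, 1.5, -1.0, 2.0]
y = compress(phi, x, m)
```

Because every column holds exactly two ones, compressing n samples costs 2n additions regardless of m, which is consistent with the abstract's point about reduced CPU work on the sensor side.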
Technical note: an R package for fitting sparse neural networks with application in animal breeding.
Wang, Yangfan; Mi, Xue; Rosa, Guilherme J M; Chen, Zhihui; Lin, Ping; Wang, Shi; Bao, Zhenmin
2018-05-04
Neural networks (NNs) have emerged as a new tool for genomic selection (GS) in animal breeding. However, the properties of NN used in GS for the prediction of phenotypic outcomes are not well characterized due to the problem of over-parameterization of NN and difficulties in using whole-genome marker sets as high-dimensional NN input. In this note, we have developed an R package called snnR that finds an optimal sparse structure of a NN by minimizing the square error subject to a penalty on the L1-norm of the parameters (weights and biases), therefore solving the problem of over-parameterization in NN. We have also tested some models fitted in the snnR package to demonstrate their feasibility and effectiveness to be used in several cases as examples. In comparison of snnR to the R package brnn (the Bayesian regularized single layer NNs), with both using the entries of a genotype matrix or a genomic relationship matrix as inputs, snnR has greatly improved the computational efficiency and the prediction ability for the GS in animal breeding because snnR implements a sparse NN with many hidden layers.
Comparison of two matrix data structures for advanced CSM testbed applications
NASA Technical Reports Server (NTRS)
Regelbrugge, M. E.; Brogan, F. A.; Nour-Omid, B.; Rankin, C. C.; Wright, M. A.
1989-01-01
The first section describes data storage schemes presently used by the Computational Structural Mechanics (CSM) testbed sparse matrix facilities and similar skyline (profile) matrix facilities. The second section contains a discussion of certain features required for the implementation of particular advanced CSM algorithms, and how these features might be incorporated into the data storage schemes described previously. The third section presents recommendations, based on the discussions of the prior sections, for directing future CSM testbed development to provide necessary matrix facilities for advanced algorithm implementation and use. The objective is to lend insight into the matrix structures discussed and to help explain the process of evaluating alternative matrix data structures and utilities for subsequent use in the CSM testbed.
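For readers unfamiliar with skyline (profile) storage, the sketch below shows one common variant for a symmetric matrix: each column of the upper triangle is stored from its first nonzero row down to the diagonal. The layout, function names, and example matrix are assumptions for illustration and are not taken from the CSM testbed itself.

```python
def to_skyline(dense):
    """Store the upper triangle of a symmetric matrix column by column,
    keeping each column from its first nonzero row down to the diagonal."""
    n = len(dense)
    values, col_start, first_row = [], [], []
    for j in range(n):
        top = j                                  # default: diagonal only
        for i in range(j + 1):
            if dense[i][j] != 0:
                top = i
                break
        first_row.append(top)
        col_start.append(len(values))
        values.extend(dense[i][j] for i in range(top, j + 1))
    col_start.append(len(values))
    return values, col_start, first_row


def skyline_get(sky, i, j):
    """Return entry (i, j); anything above the skyline is zero."""
    values, col_start, first_row = sky
    if i > j:
        i, j = j, i                              # use symmetry
    if i < first_row[j]:
        return 0.0
    return values[col_start[j] + (i - first_row[j])]


dense = [
    [4.0, 1.0, 0.0, 0.0],
    [1.0, 5.0, 0.0, 2.0],
    [0.0, 0.0, 6.0, 3.0],
    [0.0, 2.0, 3.0, 7.0],
]
sky = to_skyline(dense)
```

Here 7 values are stored instead of 16. Zeros that lie below the top of a column are kept, which is what leaves room for fill-in during factorization and distinguishes the profile scheme from a general sparse format.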
Mesoscopic Rigid Body Modelling of the Extracellular Matrix Self-Assembly.
Wong, Hua; Prévoteau-Jonquet, Jessica; Baud, Stéphanie; Dauchez, Manuel; Belloy, Nicolas
2018-06-11
The extracellular matrix (ECM) plays an important role in supporting tissues and organs. It even has a functional role in morphogenesis and differentiation by acting as a source of active molecules (matrikines). Many diseases are linked to dysfunction of ECM components and fragments or changes in their structures. As such it is a prime target for drugs. Because of technological limitations for observations at mesoscopic scales, the precise structural organisation of the ECM is not well-known, with sparse or fuzzy experimental observables. Based on the Unity3D game and physics engines, along with rigid body dynamics, we propose a virtual sandbox to model large biological molecules as dynamic chains of rigid bodies interacting together to gain insight into the behaviour of ECM components in the mesoscopic range. We have preliminary results showing how parameters such as fibre flexibility or the nature and number of interactions between molecules can induce different structures in the basement membrane. Using the Unity3D game engine and a virtual reality headset coupled with haptic controllers, we immerse the user inside the corresponding simulation. Untrained users are able to navigate a complex virtual sandbox crowded with large biomolecule models in a matter of seconds.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Smed, T.
Traditional eigenvalue sensitivity for power systems requires the formulation of the system matrix, which lacks sparsity. In this paper, a new sensitivity analysis, derived for a sparse formulation, is presented. Variables that are computed as intermediate results in established eigenvalue programs for power systems, but not used further, are given a new interpretation. The effect of virtually any control action can be assessed based on a single eigenvalue-eigenvector calculation. In particular, the effect of active and reactive power modulation can be found as a multiplication of two or three complex numbers. The method is illustrated in an example for a large power system when applied to the control design for an HVDC-link.
Lanczos eigensolution method for high-performance computers
NASA Technical Reports Server (NTRS)
Bostic, Susan W.
1991-01-01
The theory, computational analysis, and applications of a Lanczos algorithm on high performance computers are presented. The computationally intensive steps of the algorithm are identified as the matrix factorization, the forward/backward equation solution, and the matrix-vector multiplies. These computational steps are optimized to exploit the vector and parallel capabilities of high performance computers. The savings in computational time from applying optimization techniques such as variable band and sparse data storage and access, loop unrolling, use of local memory, and compiler directives are presented. Two large scale structural analysis applications are described: the buckling of a composite blade stiffened panel with a cutout, and the vibration analysis of a high speed civil transport. The sequential computational time of 181.6 seconds for the panel problem executed on a CONVEX computer was decreased to 14.1 seconds with the optimized vector algorithm. The best computational time of 23 seconds for the transport problem with 17,000 degrees of freedom was on the Cray-YMP using an average of 3.63 processors.
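As a reminder of where the matrix-vector multiplies enter, here is a minimal sketch of the basic Lanczos three-term recurrence for a symmetric matrix stored as sparse rows. It is a plain textbook form under stated assumptions: no factorization or shift-invert step, no reorthogonalization, and none of the vector or parallel optimizations reported in the abstract; the names and the 3 × 3 example are hypothetical.

```python
import math


def sparse_matvec(rows, v):
    """rows[i] is a list of (column, value) pairs for row i."""
    return [sum(a * v[j] for j, a in row) for row in rows]


def lanczos(matvec, v0, k):
    """Run up to k Lanczos steps; return the diagonal (alphas) and
    off-diagonal (betas) of the resulting tridiagonal matrix."""
    norm = math.sqrt(sum(x * x for x in v0))
    v = [x / norm for x in v0]
    v_prev = [0.0] * len(v)
    beta = 0.0
    alphas, betas = [], []
    for step in range(k):
        w = matvec(v)                                   # dominant cost per step
        alpha = sum(wi * vi for wi, vi in zip(w, v))
        w = [wi - alpha * vi - beta * pi for wi, vi, pi in zip(w, v, v_prev)]
        alphas.append(alpha)
        beta = math.sqrt(sum(x * x for x in w))
        if step == k - 1 or beta < 1e-12:               # done, or invariant subspace found
            break
        betas.append(beta)
        v_prev, v = v, [x / beta for x in w]
    return alphas, betas


# Symmetric 3 x 3 example: [[2, 1, 0], [1, 3, 1], [0, 1, 4]].
rows = [[(0, 2.0), (1, 1.0)],
        [(0, 1.0), (1, 3.0), (2, 1.0)],
        [(1, 1.0), (2, 4.0)]]
alphas, betas = lanczos(lambda v: sparse_matvec(rows, v), [1.0, 0.0, 0.0], 3)
```

The eigenvalues of the small tridiagonal matrix built from `alphas` and `betas` approximate those of the original matrix; in production codes the loss of orthogonality among the Lanczos vectors must also be handled.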
Dense and Sparse Matrix Operations on the Cell Processor
DOE Office of Scientific and Technical Information (OSTI.GOV)
Williams, Samuel W.; Shalf, John; Oliker, Leonid
2005-05-01
The slowing pace of commodity microprocessor performance improvements combined with ever-increasing chip power demands has become of utmost concern to computational scientists. Therefore, the high performance computing community is examining alternative architectures that address the limitations of modern superscalar designs. In this work, we examine STI's forthcoming Cell processor: a novel, low-power architecture that combines a PowerPC core with eight independent SIMD processing units coupled with a software-controlled memory to offer high FLOP/s/Watt. Since neither Cell hardware nor cycle-accurate simulators are currently publicly available, we develop an analytic framework to predict Cell performance on dense and sparse matrix operations, using a variety of algorithmic approaches. Results demonstrate Cell's potential to deliver more than an order of magnitude better GFLOP/s per watt performance, when compared with the Intel Itanium2 and Cray X1 processors.
Multivariable frequency domain identification via 2-norm minimization
NASA Technical Reports Server (NTRS)
Bayard, David S.
1992-01-01
The author develops a computational approach to multivariable frequency domain identification, based on 2-norm minimization. In particular, a Gauss-Newton (GN) iteration is developed to minimize the 2-norm of the error between frequency domain data and a matrix fraction transfer function estimate. To improve the global performance of the optimization algorithm, the GN iteration is initialized using the solution to a particular sequentially reweighted least squares problem, denoted as the SK iteration. The least squares problems which arise from both the SK and GN iterations are shown to involve sparse matrices with identical block structure. A sparse matrix QR factorization method is developed to exploit the special block structure, and to efficiently compute the least squares solution. A numerical example involving the identification of a multiple-input multiple-output (MIMO) plant having 286 unknown parameters is given to illustrate the effectiveness of the algorithm.
Research on sparse feature matching of improved RANSAC algorithm
NASA Astrophysics Data System (ADS)
Kong, Xiangsi; Zhao, Xian
2018-04-01
In this paper, a sparse feature matching method based on a modified RANSAC algorithm is proposed to improve precision and speed. First, the feature points of the images are extracted using the SIFT algorithm. Then, the image pair is roughly matched using the generated SIFT feature descriptors. Finally, the precision of image matching is improved by the modified RANSAC algorithm. The RANSAC algorithm is improved in three respects: instead of the homography matrix, the fundamental matrix generated by the 8-point algorithm is used as the model; the sample is selected by a random block selection method, which ensures uniform distribution and accuracy; and a sequential probability ratio test (SPRT) is added on top of standard RANSAC, which cuts down the overall running time of the algorithm. The experimental results show that this method not only achieves higher matching accuracy but also greatly reduces the computation and improves the matching speed.
3D Reconstruction of human bones based on dictionary learning.
Zhang, Binkai; Wang, Xiang; Liang, Xiao; Zheng, Jinjin
2017-11-01
An effective method for reconstructing a 3D model of human bones from computed tomography (CT) image data based on dictionary learning is proposed. In this study, the dictionary comprises the vertices of triangular meshes, and the sparse coefficient matrix indicates the connectivity information. For better reconstruction performance, we proposed a balance coefficient between the approximation and regularisation terms and a method for optimisation. Moreover, we applied a local updating strategy and a mesh-optimisation method to update the dictionary and the sparse matrix, respectively. The two updating steps are iterated alternately until the objective function converges. Thus, a reconstructed mesh could be obtained with high accuracy and regularisation. The experimental results show that the proposed method has the potential to obtain high precision and high-quality triangular meshes for rapid prototyping, medical diagnosis, and tissue engineering. Copyright © 2017 IPEM. Published by Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Semenov, Alexander; Babikov, Dmitri
2013-11-01
We formulated the mixed quantum/classical theory for the rotationally and vibrationally inelastic scattering process in the diatomic molecule + atom system. Two versions of the theory are presented, the first in the space-fixed and the second in the body-fixed reference frame. The first version is easy to derive and the resultant equations of motion are transparent, but the state-to-state transition matrix is complex-valued and dense. Such calculations may be computationally demanding for heavier molecules and/or higher temperatures, when the number of accessible channels becomes large. In contrast, the second version of the theory requires some tedious derivations and the final equations of motion are rather complicated (not particularly intuitive). However, the state-to-state transitions are driven by real-valued sparse matrices of much smaller size. Thus, this formulation is the method of choice from the computational point of view, while the space-fixed formulation can serve as a test of the body-fixed equations of motion, and the code. Rigorous numerical tests were carried out for a model system to ensure that all equations, matrices, and computer codes in both formulations are correct.
NASA Technical Reports Server (NTRS)
Bless, Robert R.
1991-01-01
A time-domain finite element method is developed for optimal control problems. The theory derived is general enough to handle a large class of problems including optimal control problems that are continuous in the states and controls, problems with discontinuities in the states and/or system equations, problems with control inequality constraints, problems with state inequality constraints, or problems involving any combination of the above. The theory is developed in such a way that no numerical quadrature is necessary regardless of the degree of nonlinearity in the equations. Also, the same shape functions may be employed for every problem because all strong boundary conditions are transformed into natural or weak boundary conditions. In addition, the resulting nonlinear algebraic equations are very sparse. Use of sparse matrix solvers allows for the rapid and accurate solution of very difficult optimization problems. The formulation is applied to launch-vehicle trajectory optimization problems, and results show that real-time optimal guidance is realizable with this method. Finally, a general problem solving environment is created for solving a large class of optimal control problems. The algorithm uses both FORTRAN and a symbolic computation program to solve problems with a minimum of user interaction. The use of symbolic computation eliminates the need for user-written subroutines which greatly reduces the setup time for solving problems.
Fast Component Pursuit for Large-Scale Inverse Covariance Estimation.
Han, Lei; Zhang, Yu; Zhang, Tong
2016-08-01
The maximum likelihood estimation (MLE) for the Gaussian graphical model, which is also known as the inverse covariance estimation problem, has gained increasing interest recently. Most existing works assume that inverse covariance estimators contain sparse structure and then construct models with ℓ1 regularization. In this paper, different from existing works, we study the inverse covariance estimation problem from another perspective by efficiently modeling the low-rank structure in the inverse covariance, which is assumed to be a combination of a low-rank part and a diagonal matrix. One motivation for this assumption is that the low-rank structure is common in many applications including climate and financial analysis, and another is that such an assumption can reduce the computational complexity when computing its inverse. Specifically, we propose an efficient COmponent Pursuit (COP) method to obtain the low-rank part, where each component can be sparse. For optimization, the COP method greedily learns a rank-one component in each iteration by maximizing the log-likelihood. Moreover, the COP algorithm enjoys several appealing properties including the existence of an efficient solution in each iteration and the theoretical guarantee on the convergence of this greedy approach. Experiments on large-scale synthetic and real-world datasets including thousands of millions of variables show that the COP method is faster than the state-of-the-art techniques for the inverse covariance estimation problem when achieving comparable log-likelihood on test data.
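The remark that a diagonal-plus-low-rank structure makes inversion cheap can be illustrated for the simplest case of a single rank-one component, using the Sherman-Morrison identity (D + u uᵀ)⁻¹ b = D⁻¹b − D⁻¹u (uᵀD⁻¹b) / (1 + uᵀD⁻¹u). The sketch below is not the COP method itself; the function name and numbers are assumptions, and it presumes the diagonal entries are nonzero and the denominator does not vanish (both hold for a positive diagonal).

```python
def solve_diag_plus_rank_one(d, u, b):
    """Solve (diag(d) + u u^T) x = b in O(n) time via Sherman-Morrison."""
    dinv_b = [bi / di for bi, di in zip(b, d)]          # D^{-1} b
    dinv_u = [ui / di for ui, di in zip(u, d)]          # D^{-1} u
    denom = 1.0 + sum(ui * wi for ui, wi in zip(u, dinv_u))
    scale = sum(ui * zi for ui, zi in zip(u, dinv_b)) / denom
    return [zi - scale * wi for zi, wi in zip(dinv_b, dinv_u)]


d = [2.0, 4.0, 5.0]
u = [1.0, 2.0, 0.0]
b = [1.0, 0.0, 1.0]
x = solve_diag_plus_rank_one(d, u, b)

# Residual check: apply diag(d) + u u^T to x without forming the matrix.
ux = sum(ui * xi for ui, xi in zip(u, x))
residual = [di * xi + ui * ux - bi for di, xi, ui, bi in zip(d, x, u, b)]
```

For a rank-r component the same idea generalizes through the Woodbury identity, replacing the scalar denominator with an r × r system, so the cost stays linear in the number of variables for fixed r.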
Decentralized modal identification using sparse blind source separation
NASA Astrophysics Data System (ADS)
Sadhu, A.; Hazra, B.; Narasimhan, S.; Pandey, M. D.
2011-12-01
Popular ambient vibration-based system identification methods process information collected from a dense array of sensors centrally to yield the modal properties. In such methods, the need for a centralized processing unit capable of satisfying large memory and processing demands is unavoidable. With the advent of wireless smart sensor networks, it is now possible to process information locally at the sensor level, instead. The information at the individual sensor level can then be concatenated to obtain the global structure characteristics. A novel decentralized algorithm based on wavelet transforms to infer global structure mode information using measurements obtained using a small group of sensors at a time is proposed in this paper. The focus of the paper is on algorithmic development, while the actual hardware and software implementation is not pursued here. The problem of identification is cast within the framework of under-determined blind source separation invoking transformations of measurements to the time-frequency domain resulting in a sparse representation. The partial mode shape coefficients so identified are then combined to yield complete modal information. The transformations are undertaken using stationary wavelet packet transform (SWPT), yielding a sparse representation in the wavelet domain. Principal component analysis (PCA) is then performed on the resulting wavelet coefficients, yielding the partial mixing matrix coefficients from a few measurement channels at a time. This process is repeated using measurements obtained from multiple sensor groups, and the results so obtained from each group are concatenated to obtain the global modal characteristics of the structure.
NASA Astrophysics Data System (ADS)
Susmikanti, Mike; Dewayatna, Winter; Sulistyo, Yos
2014-09-01
One of the research activities in support of the commercial radioisotope production program is safety research on FPM (Fission Product Molybdenum) target irradiation. FPM targets form a tube made of stainless steel which contains nuclear-grade high-enrichment uranium. The FPM irradiation tube is intended to obtain fission products. Fission materials such as Mo99 are widely used in the form of kits in the medical world. The neutronics problem is solved using first-order perturbation theory derived from the diffusion equation for four groups. In contrast, Mo isotopes have longer half-lives, about 3 days (66 hours), so the delivery of radioisotopes to consumer centers and storage is possible though still limited. The production of this isotope potentially gives significant economic value. The criticality and flux in the multigroup diffusion model were calculated for various irradiation positions and uranium contents. This model involves complex computation, with a large and sparse matrix system. Several parallel algorithms have been developed for the solution of large sparse matrices. In this paper, a successive over-relaxation (SOR) algorithm, which can be run in parallel, was implemented for the calculation of reactivity coefficients. Previous works performed reactivity calculations serially with Gauss-Seidel iterations. The parallel method can be used to solve the multigroup diffusion equation system and calculate the criticality and reactivity coefficients. In this research a computer code was developed to exploit parallel processing to perform reactivity calculations for use in safety analysis. The parallel processing in the multicore computer system allows the calculation to be performed more quickly. This code was applied to the safety limits calculation of irradiated FPM targets containing highly enriched uranium. The results of the neutron calculations show that for uranium contents of 1.7676 g and 6.1866 g (× 106 cm-1) in a tube, the delta reactivities are still within safety limits; however, for 7.9542 g and 8.838 g (× 106 cm-1) the limits were exceeded.
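The relationship between Gauss-Seidel and SOR mentioned in this abstract can be sketched in a few lines. This is a serial, dense-matrix illustration only, not the authors' parallel code; the function name `sor_solve`, the relaxation factor 1.2, and the 3 × 3 test system are all assumptions made for the example.

```python
def sor_solve(A, b, omega=1.2, tol=1e-10, max_iter=10_000):
    """Solve A x = b by successive over-relaxation; A is a list of rows."""
    n = len(b)
    x = [0.0] * n
    for iteration in range(1, max_iter + 1):
        max_change = 0.0
        for i in range(n):
            # Gauss-Seidel value, using the newest available entries of x.
            sigma = sum(A[i][j] * x[j] for j in range(n) if j != i)
            gs_value = (b[i] - sigma) / A[i][i]
            # Relaxation step: omega = 1 recovers plain Gauss-Seidel.
            new_value = (1.0 - omega) * x[i] + omega * gs_value
            max_change = max(max_change, abs(new_value - x[i]))
            x[i] = new_value
        if max_change < tol:
            return x, iteration
    raise RuntimeError("SOR did not converge")


# Small symmetric positive definite tridiagonal system (illustrative only).
A = [[4.0, -1.0, 0.0],
     [-1.0, 4.0, -1.0],
     [0.0, -1.0, 4.0]]
b = [3.0, 2.0, 3.0]  # chosen so that the exact solution is [1, 1, 1]
solution, iterations = sor_solve(A, b)
```

A parallel variant would need a reordering of the unknowns (for example a red-black colouring) so that groups of entries can be updated independently; that step is not shown here.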
NASA Astrophysics Data System (ADS)
Guda, A. A.; Guda, S. A.; Soldatov, M. A.; Lomachenko, K. A.; Bugaev, A. L.; Lamberti, C.; Gawelda, W.; Bressler, C.; Smolentsev, G.; Soldatov, A. V.; Joly, Y.
2016-05-01
The finite difference method (FDM) implemented in the FDMNES software [Phys. Rev. B, 2001, 63, 125120] was revised. Thorough analysis shows that the calculated diagonal in the FDM matrix consists of about 96% zero elements. Thus a sparse solver is more suitable for the problem than traditional Gaussian elimination for the diagonal neighbourhood. We have tried several iterative sparse solvers, and the direct MUMPS solver with METIS ordering turned out to be the best. Compared to the Gaussian solver, the present method is up to 40 times faster and allows XANES simulations for complex systems already on personal computers. We show the applicability of the software to the metal-organic [Fe(bpy)3]2+ complex, both for low-spin and high-spin states populated after laser excitation.
A Sparse Bayesian Approach for Forward-Looking Superresolution Radar Imaging
Zhang, Yin; Zhang, Yongchao; Huang, Yulin; Yang, Jianyu
2017-01-01
This paper presents a sparse superresolution approach for high cross-range resolution imaging of forward-looking scanning radar based on the Bayesian criterion. First, a novel forward-looking signal model is established as the product of the measurement matrix and the cross-range target distribution, which is more accurate than the conventional convolution model. Then, based on the Bayesian criterion, the widely-used sparse regularization is considered as the penalty term to recover the target distribution. The derivation of the cost function is described, and finally, an iterative expression for minimizing this function is presented. Alternatively, this paper discusses how to estimate the single parameter of Gaussian noise. With the advantage of a more accurate model, the proposed sparse Bayesian approach enjoys a lower model error. Meanwhile, when compared with the conventional superresolution methods, the proposed approach shows high cross-range resolution and small location error. The superresolution results for the simulated point target, scene data, and real measured data are presented to demonstrate the superior performance of the proposed approach. PMID:28604583
Sparse dictionary learning for resting-state fMRI analysis
NASA Astrophysics Data System (ADS)
Lee, Kangjoo; Han, Paul Kyu; Ye, Jong Chul
2011-09-01
Recently, there has been increased interest in the usage of neuroimaging techniques to investigate what happens in the brain at rest. Functional imaging studies have revealed that the default-mode network activity is disrupted in Alzheimer's disease (AD). However, there is no consensus, as yet, on the choice of analysis method for the application of resting-state analysis for disease classification. This paper proposes a novel compressed sensing based resting-state fMRI analysis tool called Sparse-SPM. As the brain's functional systems have been shown to have features of complex networks according to graph theoretical analysis, we apply a graph model to represent a sparse combination of information flows in complex network perspectives. In particular, a new concept of spatially adaptive design matrix has been proposed by implementing sparse dictionary learning based on sparsity. The proposed approach shows better performance compared to other conventional methods, such as independent component analysis (ICA) and the seed-based approach, in classifying AD patients versus normal subjects using resting-state analysis.
Simulation of sparse matrix array designs
NASA Astrophysics Data System (ADS)
Boehm, Rainer; Heckel, Thomas
2018-04-01
Matrix phased array probes are becoming more prominently used in industrial applications. The main drawbacks of probes incorporating a very large number of transducer elements are the need for appropriate cabling and for an ultrasonic device offering many parallel channels. Matrix arrays designed for extended functionality feature 64 or more elements. Typical arrangements are square matrices, e.g., 8 by 8 or 11 by 11, or rectangular matrices, e.g., 8 by 16 or 10 by 12, to fit a 128-channel phased array system. In some phased array systems, the number of simultaneously active elements is limited to a certain number, e.g., 32 or 64. Those setups do not allow running the probe with all elements active, which may cause a significant change in the directivity pattern of the resulting sound beam. When only a subset of elements can be used during a single acquisition, different strategies may be applied to collect enough data for rebuilding the missing information from the echo signal. Omission of certain elements may be one approach; overlay of subsequent shots with different active areas may be another. This paper presents the influence of a decreased number of active elements on the sound field and their distribution on the array. Solutions using subsets with different element activity patterns on matrix arrays and their advantages and disadvantages concerning the sound field are evaluated using semi-analytical simulation tools. Sound field criteria are discussed, which are significant for non-destructive testing results and for the system setup.
Cucheb: A GPU implementation of the filtered Lanczos procedure
NASA Astrophysics Data System (ADS)
Aurentz, Jared L.; Kalantzis, Vassilis; Saad, Yousef
2017-11-01
This paper describes the software package Cucheb, a GPU implementation of the filtered Lanczos procedure for the solution of large sparse symmetric eigenvalue problems. The filtered Lanczos procedure uses a carefully chosen polynomial spectral transformation to accelerate convergence of the Lanczos method when computing eigenvalues within a desired interval. This method has proven particularly effective for eigenvalue problems that arise in electronic structure calculations and density functional theory. We compare our implementation against an equivalent CPU implementation and show that using the GPU can reduce the computation time by more than a factor of 10. Program Summary Program title: Cucheb Program Files doi:http://dx.doi.org/10.17632/rjr9tzchmh.1 Licensing provisions: MIT Programming language: CUDA C/C++ Nature of problem: Electronic structure calculations require the computation of all eigenvalue-eigenvector pairs of a symmetric matrix that lie inside a user-defined real interval. Solution method: To compute all the eigenvalues within a given interval a polynomial spectral transformation is constructed that maps the desired eigenvalues of the original matrix to the exterior of the spectrum of the transformed matrix. The Lanczos method is then used to compute the desired eigenvectors of the transformed matrix, which are then used to recover the desired eigenvalues of the original matrix. The bulk of the operations are executed in parallel using a graphics processing unit (GPU). Runtime: Variable, depending on the number of eigenvalues sought and the size and sparsity of the matrix. Additional comments: Cucheb is compatible with CUDA Toolkit v7.0 or greater.
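The idea of a polynomial spectral filter can be sketched with a Chebyshev expansion of the indicator function of the target interval, applied to a vector through the three-term recurrence. This is only one common way to build such a filter and is not claimed to be the construction used inside Cucheb; the function names, the degree 80, and the diagonal test matrix (whose spectrum already lies in [-1, 1]) are assumptions for the example.

```python
import math


def indicator_cheb_coeffs(a, b, degree):
    """Chebyshev coefficients of the indicator of [a, b], with -1 < a < b < 1."""
    ta, tb = math.acos(a), math.acos(b)
    coeffs = [(ta - tb) / math.pi]
    for k in range(1, degree + 1):
        coeffs.append(2.0 * (math.sin(k * ta) - math.sin(k * tb)) / (k * math.pi))
    return coeffs


def apply_filter(matvec, v, coeffs):
    """Return p(A) v for p = sum_k coeffs[k] T_k, via T_{k+1} = 2 A T_k - T_{k-1}."""
    t_prev = list(v)                          # T_0(A) v
    result = [coeffs[0] * x for x in t_prev]
    if len(coeffs) == 1:
        return result
    t_curr = matvec(v)                        # T_1(A) v
    result = [r + coeffs[1] * x for r, x in zip(result, t_curr)]
    for c in coeffs[2:]:
        av = matvec(t_curr)
        t_next = [2.0 * y - z for y, z in zip(av, t_prev)]
        result = [r + c * x for r, x in zip(result, t_next)]
        t_prev, t_curr = t_curr, t_next
    return result


# Diagonal stand-in for a sparse symmetric matrix: its eigenvalues are the
# diagonal entries and its eigenvectors are the unit vectors.
eigenvalues = [-0.9, -0.5, 0.0, 0.3, 0.8]


def diag_matvec(x):
    return [lam * xi for lam, xi in zip(eigenvalues, x)]


# Filter for the interval [0.2, 0.4]; only the component belonging to the
# eigenvalue 0.3 should survive with a weight near one.
filtered = apply_filter(diag_matvec, [1.0] * 5, indicator_cheb_coeffs(0.2, 0.4, 80))
```

In a filtered Lanczos setting the Lanczos iteration would be run with `apply_filter` in place of the plain matrix-vector product, so that the wanted eigenvalues become the dominant ones of the filtered operator.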
The CSM testbed matrix processors internal logic and dataflow descriptions
NASA Technical Reports Server (NTRS)
Regelbrugge, Marc E.; Wright, Mary A.
1988-01-01
This report constitutes the final report for subtask 1 of Task 5 of NASA Contract NAS1-18444, Computational Structural Mechanics (CSM) Research. This report contains a detailed description of the coded workings of selected CSM Testbed matrix processors (i.e., TOPO, K, INV, SSOL) and of the arithmetic utility processor AUS. These processors and the current sparse matrix data structures are studied and documented. Items examined include: details of the data structures, interdependence of data structures, data-blocking logic in the data structures, processor data flow and architecture, and processor algorithmic logic flow.
Energy conserving, linear scaling Born-Oppenheimer molecular dynamics.
Cawkwell, M J; Niklasson, Anders M N
2012-10-07
Born-Oppenheimer molecular dynamics simulations with long-term conservation of the total energy and a computational cost that scales linearly with system size have been obtained simultaneously. Linear scaling with a low pre-factor is achieved using density matrix purification with sparse matrix algebra and a numerical threshold on matrix elements. The extended Lagrangian Born-Oppenheimer molecular dynamics formalism [A. M. N. Niklasson, Phys. Rev. Lett. 100, 123004 (2008)] yields microcanonical trajectories with the approximate forces obtained from the linear scaling method that exhibit no systematic drift over hundreds of picoseconds and which are indistinguishable from trajectories computed using exact forces.
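Purification with a numerical threshold can be sketched as follows. The abstract does not say which purification scheme is used, so the classic McWeeny iteration P ← 3P² − 2P³ is substituted here purely for illustration; the dense list-of-lists storage, the threshold value, and the 3 × 3 starting matrix are likewise assumptions.

```python
def matmul(X, Y):
    """Dense product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcweeny_purify(P, threshold=1e-12, tol=1e-12, max_iter=100):
    """Drive the eigenvalues of P to 0 or 1 while dropping tiny elements."""
    n = len(P)
    for _ in range(max_iter):
        P2 = matmul(P, P)
        P3 = matmul(P2, P)
        P_new = [[3.0 * P2[i][j] - 2.0 * P3[i][j] for j in range(n)]
                 for i in range(n)]
        # Numerical threshold: elements below it are set to exactly zero,
        # which is what keeps the matrices sparse in a real implementation.
        P_new = [[v if abs(v) >= threshold else 0.0 for v in row] for row in P_new]
        change = max(abs(P_new[i][j] - P[i][j]) for i in range(n) for j in range(n))
        P = P_new
        if change < tol:
            break
    return P


# Symmetric starting guess with eigenvalues in [0, 1]; two of them lie above 0.5.
P0 = [[0.8, 0.1, 0.0],
      [0.1, 0.2, 0.0],
      [0.0, 0.0, 0.9]]
P = mcweeny_purify(P0)
trace = sum(P[i][i] for i in range(3))
P_squared = matmul(P, P)
idempotency_error = max(abs(P_squared[i][j] - P[i][j])
                        for i in range(3) for j in range(3))
```

The converged matrix is idempotent and its trace counts the occupied states; a linear-scaling code would replace `matmul` with a sparse product so that the cost grows with the number of retained elements.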
Reconstruction of Complex Network based on the Noise via QR Decomposition and Compressed Sensing.
Li, Lixiang; Xu, Dafei; Peng, Haipeng; Kurths, Jürgen; Yang, Yixian
2017-11-08
It is generally known that the states of network nodes are stable and have strong correlations in a linear network system. We find that without the control input, the method of compressed sensing cannot succeed in reconstructing complex networks in which the states of nodes are generated through the linear network system. However, noise can drive the dynamics between nodes to break the stability of the system state. Therefore, a new method integrating QR decomposition and compressed sensing is proposed to solve the reconstruction problem of complex networks under the assistance of the input noise. The state matrix of the system is decomposed by QR decomposition. We construct the measurement matrix with the aid of Gaussian noise so that the sparse input matrix can be reconstructed by compressed sensing. We also discover that noise can build a bridge between the dynamics and the topological structure. Experiments are presented to show that, compared with compressed sensing alone, the proposed method reconstructs four model networks and six real networks more accurately and more efficiently. In addition, the proposed method can reconstruct not only sparse complex networks, but also dense complex networks.
Joint Smoothed l₀-Norm DOA Estimation Algorithm for Multiple Measurement Vectors in MIMO Radar.
Liu, Jing; Zhou, Weidong; Juwono, Filbert H
2017-05-08
Direction-of-arrival (DOA) estimation is usually confronted with a multiple measurement vector (MMV) case. In this paper, a novel fast sparse DOA estimation algorithm, named the joint smoothed l₀-norm algorithm, is proposed for multiple measurement vectors in multiple-input multiple-output (MIMO) radar. To eliminate the white or colored Gaussian noises, the new method first obtains a low-complexity high-order cumulants based data matrix. Then, the proposed algorithm designs a joint smoothed function tailored for the MMV case, based on which a joint smoothed l₀-norm sparse representation framework is constructed. Finally, for the MMV-based joint smoothed function, the corresponding gradient-based sparse signal reconstruction is designed, thus the DOA estimation can be achieved. The proposed method is a fast sparse representation algorithm, which can solve the MMV problem and perform well for both white and colored Gaussian noises. The proposed joint algorithm is about two orders of magnitude faster than the l₁-norm minimization based methods, such as l₁-SVD (singular value decomposition), RV (real-valued) l₁-SVD and RV l₁-SRACV (sparse representation array covariance vectors), and achieves better DOA estimation performance.
Framework to trade optimality for local processing in large-scale wavefront reconstruction problems.
Haber, Aleksandar; Verhaegen, Michel
2016-11-15
We show that the minimum variance wavefront estimation problems permit localized approximate solutions, in the sense that the wavefront value at a point (excluding unobservable modes, such as the piston mode) can be approximated by a linear combination of the wavefront slope measurements in the point's neighborhood. This enables us to efficiently compute a wavefront estimate by performing a single sparse matrix-vector multiplication. Moreover, our results open the possibility for the development of wavefront estimators that can be easily implemented in a decentralized/distributed manner, and in which the estimate optimality can be easily traded for computational efficiency. We numerically validate our approach on Hudgin wavefront sensor geometries, and the results can be easily generalized to Fried geometries.
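Computing an estimate with a single sparse matrix-vector multiplication can be sketched with a compressed sparse row (CSR) product. The 3 × 4 matrix and the input vector are made-up placeholders, not reconstruction weights or slope measurements from the paper; in the setting described above each row would hold the few nonzero weights of the measurements in one point's neighbourhood.

```python
def csr_matvec(data, indices, indptr, x):
    """Multiply a CSR matrix by a dense vector x."""
    y = []
    for row in range(len(indptr) - 1):
        total = 0.0
        # Only the stored nonzeros of this row are visited.
        for k in range(indptr[row], indptr[row + 1]):
            total += data[k] * x[indices[k]]
        y.append(total)
    return y


# CSR storage of the matrix
#   [1 2 0 0]
#   [0 3 0 4]
#   [0 0 5 0]
data = [1.0, 2.0, 3.0, 4.0, 5.0]   # nonzero values, row by row
indices = [0, 1, 1, 3, 2]          # column index of each stored value
indptr = [0, 2, 4, 5]              # start of each row inside data/indices

measurements = [1.0, 1.0, 1.0, 1.0]
estimate = csr_matvec(data, indices, indptr, measurements)
```

The work is proportional to the number of stored nonzeros rather than to the full matrix size, and each row can be evaluated independently, which is what makes a decentralized implementation natural.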
Ionospheric-thermospheric UV tomography: 1. Image space reconstruction algorithms
NASA Astrophysics Data System (ADS)
Dymond, K. F.; Budzien, S. A.; Hei, M. A.
2017-03-01
We present and discuss two algorithms of the class known as Image Space Reconstruction Algorithms (ISRAs) that we are applying to the solution of large-scale ionospheric tomography problems. ISRAs have several desirable features that make them useful for ionospheric tomography. In addition to producing nonnegative solutions, ISRAs are amenable to sparse-matrix formulations and are fast, stable, and robust. We present the results of our studies of two types of ISRA: the Least Squares Positive Definite and the Richardson-Lucy algorithms. We compare their performance to the Multiplicative Algebraic Reconstruction and Conjugate Gradient Least Squares algorithms. We then discuss the use of regularization in these algorithms and present our new approach based on regularization to a partial differential equation.
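The multiplicative structure that keeps these iterations nonnegative can be sketched with the classic least-squares ISRA update, x ← x · (Aᵀb) / (AᵀAx), and the Richardson-Lucy update, x ← x · Aᵀ(b / Ax) / Aᵀ1. These are the textbook forms and are not claimed to match the authors' implementations or their regularized variants; the 2 × 2 system and the iteration count are assumptions for the example.

```python
def matvec(A, x):
    """Compute A x for a matrix stored as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def rmatvec(A, y):
    """Compute A^T y."""
    return [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(len(A[0]))]


def isra(A, b, iterations=500):
    """Least-squares ISRA: a positive start stays positive under the update."""
    x = [1.0] * len(A[0])
    atb = rmatvec(A, b)
    for _ in range(iterations):
        atax = rmatvec(A, matvec(A, x))
        x = [xj * num / den for xj, num, den in zip(x, atb, atax)]
    return x


def richardson_lucy(A, b, iterations=500):
    """Richardson-Lucy: back-project the ratio of data to prediction."""
    x = [1.0] * len(A[0])
    col_sums = rmatvec(A, [1.0] * len(A))
    for _ in range(iterations):
        ratio = [bi / pi for bi, pi in zip(b, matvec(A, x))]
        back = rmatvec(A, ratio)
        x = [xj * bj / sj for xj, bj, sj in zip(x, back, col_sums)]
    return x


# Tiny consistent test problem with a nonnegative matrix and solution [1, 2].
A = [[2.0, 1.0],
     [1.0, 3.0]]
b = [4.0, 7.0]
x_isra = isra(A, b)
x_rl = richardson_lucy(A, b)
```

Both updates only need products with A and Aᵀ, so they carry over directly to sparse-matrix storage.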
Xie, Jianwen; Douglas, Pamela K; Wu, Ying Nian; Brody, Arthur L; Anderson, Ariana E
2017-04-15
Brain networks in fMRI are typically identified using spatial independent component analysis (ICA), yet other mathematical constraints provide alternate biologically-plausible frameworks for generating brain networks. Non-negative matrix factorization (NMF) would suppress negative BOLD signal by enforcing positivity. Spatial sparse coding algorithms (L1 Regularized Learning and K-SVD) would impose local specialization and a discouragement of multitasking, where the total observed activity in a single voxel originates from a restricted number of possible brain networks. The assumptions of independence, positivity, and sparsity to encode task-related brain networks are compared; the resulting brain networks within scan for different constraints are used as basis functions to encode observed functional activity. These encodings are then decoded using machine learning, by using the time series weights to predict within scan whether a subject is viewing a video, listening to an audio cue, or at rest, in 304 fMRI scans from 51 subjects. The sparse coding algorithm of L1 Regularized Learning outperformed 4 variations of ICA (p<0.001) for predicting the task being performed within each scan using artifact-cleaned components. The NMF algorithms, which suppressed negative BOLD signal, had the poorest accuracy compared to the ICA and sparse coding algorithms. Holding constant the effect of the extraction algorithm, encodings using sparser spatial networks (containing more zero-valued voxels) had higher classification accuracy (p<0.001). Lower classification accuracy occurred when the extracted spatial maps contained more CSF regions (p<0.001). The success of sparse coding algorithms suggests that algorithms which enforce sparsity, discourage multitasking, and promote local specialization may capture better the underlying source processes than those which allow inexhaustible local processes such as ICA. Negative BOLD signal may capture task-related activations. 
Laplace-domain waveform modeling and inversion for the 3D acoustic-elastic coupled media
NASA Astrophysics Data System (ADS)
Shin, Jungkyun; Shin, Changsoo; Calandra, Henri
2016-06-01
Laplace-domain waveform inversion reconstructs long-wavelength subsurface models by using the zero-frequency component of damped seismic signals. Despite the computational advantages of Laplace-domain waveform inversion over conventional frequency-domain waveform inversion, an acoustic assumption and an iterative matrix solver have been used to invert 3D marine datasets to mitigate the intensive computing cost. In this study, we develop a Laplace-domain waveform modeling and inversion algorithm for 3D acoustic-elastic coupled media by using a parallel sparse direct solver library (MUltifrontal Massively Parallel Solver, MUMPS). We precisely simulate a real marine environment by coupling the 3D acoustic and elastic wave equations with the proper boundary condition at the fluid-solid interface. In addition, we can extract the elastic properties of the Earth below the sea bottom from the recorded acoustic pressure datasets. As a matrix solver, the parallel sparse direct solver is used to factorize the non-symmetric impedance matrix in a distributed memory architecture and rapidly solve the wave field for a number of shots by using the lower and upper matrix factors. Using both synthetic datasets and real datasets obtained by a 3D wide azimuth survey, the long-wavelength component of the P-wave and S-wave velocity models is reconstructed and the proposed modeling and inversion algorithm are verified. A cluster of 80 CPU cores is used for this study.
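The benefit of a direct solver for many shots is that the matrix is factorized once and each additional right-hand side costs only a forward and a backward substitution. The sketch below uses a small dense LU factorization with partial pivoting as a stand-in; it does not call MUMPS, and the 3 × 3 non-symmetric matrix and the two "shot" vectors are invented for the example.

```python
def lu_factor(A):
    """Return (LU, perm): packed L and U factors of the row-permuted matrix."""
    n = len(A)
    LU = [row[:] for row in A]
    perm = list(range(n))
    for k in range(n):
        pivot = max(range(k, n), key=lambda r: abs(LU[r][k]))
        if LU[pivot][k] == 0.0:
            raise ValueError("matrix is singular")
        LU[k], LU[pivot] = LU[pivot], LU[k]
        perm[k], perm[pivot] = perm[pivot], perm[k]
        for i in range(k + 1, n):
            LU[i][k] /= LU[k][k]                  # multiplier, stored in L
            for j in range(k + 1, n):
                LU[i][j] -= LU[i][k] * LU[k][j]   # update the remaining block
    return LU, perm


def lu_solve(LU, perm, b):
    """Solve A x = b with the stored factors (no refactorization)."""
    n = len(b)
    y = [b[p] for p in perm]                      # apply the row permutation
    for i in range(n):                            # forward substitution with L
        for j in range(i):
            y[i] -= LU[i][j] * y[j]
    for i in reversed(range(n)):                  # backward substitution with U
        for j in range(i + 1, n):
            y[i] -= LU[i][j] * y[j]
        y[i] /= LU[i][i]
    return y


# Non-symmetric stand-in for an impedance matrix, factorized a single time.
A = [[2.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 2.0, 4.0]]
factors, perm = lu_factor(A)

# Two hypothetical source vectors ("shots") reuse the same factors.
shots = [[4.0, 10.0, 16.0], [1.0, 2.0, -2.0]]
wavefields = [lu_solve(factors, perm, shot) for shot in shots]
```

With an iterative solver every shot would start a new iteration, which is why the factor-once pattern pays off when the number of shots is large.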
Wavelet-like bases for thin-wire integral equations in electromagnetics
NASA Astrophysics Data System (ADS)
Francomano, E.; Tortorici, A.; Toscano, E.; Ala, G.; Viola, F.
2005-03-01
In this paper, wavelets are used in solving, by the method of moments, a modified version of the thin-wire electric field integral equation in the frequency domain. The time-domain electromagnetic quantities are obtained by using the inverse discrete fast Fourier transform. The retarded scalar electric and vector magnetic potentials are employed in order to obtain the integral formulation. The discretized model, generated by applying the direct method of moments via a point-matching procedure, results in a linear system with a dense matrix which has to be solved for each frequency of the Fourier spectrum of the time-domain impressed source. Therefore, an orthogonal wavelet-like basis transform is used to sparsify the moment matrix. In particular, dyadic and M-band wavelet transforms have been adopted, generating different sparse matrix structures. This leads to an efficient solution of the resulting sparse matrix equation. Moreover, a wavelet preconditioner is used to accelerate the convergence rate of the iterative solver employed. These numerical features are used in analyzing the transient behavior of a lightning protection system. In particular, the focus is on the transient performance, during operation, of the earth termination system of a lightning protection system or of the earth electrode of an electric power substation. The numerical results, obtained by running a complex structure, are discussed and the features of the method used are underlined.
(EDMUNDS, WA) WILDLAND FIRE EMISSIONS MODELING: INTEGRATING BLUESKY AND SMOKE
This presentation is a status update of the BlueSky emissions modeling system. BlueSky-EM has been coupled with the Sparse Matrix Operational Kernel Emissions (SMOKE) system, and is now available as a tool for estimating emissions from wildland fires.
Automatic Management of Parallel and Distributed System Resources
NASA Technical Reports Server (NTRS)
Yan, Jerry; Ngai, Tin Fook; Lundstrom, Stephen F.
1990-01-01
Viewgraphs on automatic management of parallel and distributed system resources are presented. Topics covered include: parallel applications; intelligent management of multiprocessing systems; performance evaluation of parallel architecture; dynamic concurrent programs; compiler-directed system approach; lattice gaseous cellular automata; and sparse matrix Cholesky factorization.
Numerical Technology for Large-Scale Computational Electromagnetics
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sharpe, R; Champagne, N; White, D
The key bottleneck of implicit computational electromagnetics tools for large complex geometries is the solution of the resulting linear system of equations. The goal of this effort was to research and develop critical numerical technology that alleviates this bottleneck for large-scale computational electromagnetics (CEM). The mathematical operators and numerical formulations used in this arena of CEM yield linear equations that are complex valued, unstructured, and indefinite. Also, simultaneously applying multiple mathematical modeling formulations to different portions of a complex problem (hybrid formulations) results in a mixed structure linear system, further increasing the computational difficulty. Typically, these hybrid linear systems are solved using a direct solution method, which was acceptable for Cray-class machines but does not scale adequately for ASCI-class machines. Additionally, LLNL's previously existing linear solvers were not well suited for the linear systems that are created by hybrid implicit CEM codes. Hence, a new approach was required to make effective use of ASCI-class computing platforms and to enable the next generation design capabilities. Multiple approaches were investigated, including the latest sparse-direct methods developed by our ASCI collaborators. In addition, approaches that combine domain decomposition (or matrix partitioning) with general-purpose iterative methods and special purpose pre-conditioners were investigated. Special-purpose pre-conditioners that take advantage of the structure of the matrix were adapted and developed based on intimate knowledge of the matrix properties. Finally, new operator formulations were developed that radically improve the conditioning of the resulting linear systems thus greatly reducing solution time. The goal was to enable the solution of CEM problems that are 10 to 100 times larger than our previous capability.
Hwang, Geelsu; Koltisko, Bernard; Jin, Xiaoming; Koo, Hyun
2017-11-08
Surface-grown bacteria and production of an extracellular polymeric matrix modulate the assembly of highly cohesive and firmly attached biofilms, making them difficult to remove from solid surfaces. Inhibition of cell growth and inactivation of matrix-producing bacteria can impair biofilm formation and facilitate removal. Here, we developed a novel nonleachable antibacterial composite with potent antibiofilm activity by directly incorporating polymerizable imidazolium-containing resin (antibacterial resin with carbonate linkage; ABR-C) into a methacrylate-based scaffold (ABR-modified composite; ABR-MC) using an efficient yet simplified chemistry. Low-dose inclusion of imidazolium moiety (∼2 wt %) resulted in bioactivity with minimal cytotoxicity without compromising mechanical integrity of the restorative material. The antibiofilm properties of ABR-MC were assessed using an exopolysaccharide-matrix-producing (EPS-matrix-producing) oral pathogen (Streptococcus mutans) in an experimental biofilm model. Using high-resolution confocal fluorescence imaging and biophysical methods, we observed remarkable disruption of bacterial accumulation and defective 3D matrix structure on the surface of ABR-MC. Specifically, the antibacterial composite impaired the ability of S. mutans to form organized bacterial clusters on the surface, resulting in altered biofilm architecture with sparse cell accumulation and reduced amounts of EPS matrix (versus control composite). Biofilm topology analyses on the control composite revealed a highly organized and weblike EPS structure that tethers the bacterial clusters to each other and to the surface, forming a highly cohesive unit. In contrast, such a structured matrix was absent on the surface of ABR-MC with mostly sparse and amorphous EPS, indicating disruption in the biofilm physical stability. 
Consistent with lack of structural organization, the defective biofilm on the surface of ABR-MC was readily detached when subjected to low shear stress, while most of the biofilm biomass remained on the control surface. Altogether, we demonstrate a new nonleachable antibacterial composite with excellent antibiofilm activity without affecting its mechanical properties, which may serve as a platform for development of alternative antifouling biomaterials.
Hou, Gary Y; Provost, Jean; Grondin, Julien; Wang, Shutao; Marquet, Fabrice; Bunting, Ethan; Konofagou, Elisa E
2014-11-01
Harmonic motion imaging for focused ultrasound (HMIFU) utilizes an amplitude-modulated HIFU beam to induce a localized focal oscillatory motion, which is simultaneously estimated. The objective of this study is to develop and show the feasibility of a novel fast beamforming algorithm for image reconstruction using GPU-based sparse-matrix operation with real-time feedback. In this study, the algorithm was implemented onto a fully integrated, clinically relevant HMIFU system. A single divergent transmit beam was used while fast beamforming was implemented using a GPU-based delay-and-sum method and a sparse-matrix operation. Axial HMI displacements were then estimated from the RF signals using a 1-D normalized cross-correlation method and streamed to a graphical user interface with frame rates up to 15 Hz, a 100-fold increase compared to conventional CPU-based processing. The real-time feedback rate does not require interrupting the HIFU treatment. Results in phantom experiments showed reproducible HMI images, and monitoring of 22 in vitro HIFU treatments using the new 2-D system showed a consistent average focal displacement decrease of 46.7 ± 14.6% during lesion formation. Complementary focal temperature monitoring also indicated average rates of displacement increase and decrease with focal temperature of 0.84 ± 1.15 %/°C and 2.03 ± 0.93 %/°C, respectively. These results reinforce the HMIFU capability of estimating and monitoring stiffness-related changes in real time. Current ongoing studies include clinical translation of the presented system for monitoring of HIFU treatment for breast and pancreatic tumor applications.
Das, Anup; Sampson, Aaron L.; Lainscsek, Claudia; Muller, Lyle; Lin, Wutu; Doyle, John C.; Cash, Sydney S.; Halgren, Eric; Sejnowski, Terrence J.
2017-01-01
The correlation method from brain imaging has been used to estimate functional connectivity in the human brain. However, brain regions might show very high correlation even when the two regions are not directly connected due to the strong interaction of the two regions with common input from a third region. One previously proposed solution to this problem is to use a sparse regularized inverse covariance matrix or precision matrix (SRPM) assuming that the connectivity structure is sparse. This method yields partial correlations to measure strong direct interactions between pairs of regions while simultaneously removing the influence of the rest of the regions, thus identifying regions that are conditionally independent. To test our methods, we first demonstrated conditions under which the SRPM method could indeed find the true physical connection between a pair of nodes for a spring-mass example and an RC circuit example. The recovery of the connectivity structure using the SRPM method can be explained by energy models using the Boltzmann distribution. We then demonstrated the application of the SRPM method for estimating brain connectivity during stage 2 sleep spindles from human electrocorticography (ECoG) recordings using an 8 × 8 electrode array. The ECoG recordings that we analyzed were from a 32-year-old male patient with long-standing pharmaco-resistant left temporal lobe complex partial epilepsy. Sleep spindles were automatically detected using delay differential analysis and then analyzed with SRPM and the Louvain method for community detection. We found spatially localized brain networks within and between neighboring cortical areas during spindles, in contrast to the case when sleep spindles were not present. PMID:28095202
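The difference between plain correlation and partial correlation from a precision matrix can be sketched on a three-node chain in which nodes 0 and 2 interact only through node 1. The sketch inverts the covariance directly and applies no sparsity-inducing regularization, so it illustrates only the partial-correlation step, not the SRPM estimator itself; the covariance values and function names are assumptions for the example.

```python
def invert(M):
    """Gauss-Jordan inverse with partial pivoting for a small dense matrix."""
    n = len(M)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * c for a, c in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def partial_correlations(covariance):
    """rho_ij = -P_ij / sqrt(P_ii * P_jj), with P the precision matrix."""
    P = invert(covariance)
    n = len(P)
    return [[1.0 if i == j else -P[i][j] / (P[i][i] * P[j][j]) ** 0.5
             for j in range(n)] for i in range(n)]


# Covariance of a chain 0 - 1 - 2; its inverse is tridiagonal.
covariance = [[0.75, 0.50, 0.25],
              [0.50, 1.00, 0.50],
              [0.25, 0.50, 0.75]]
marginal_02 = covariance[0][2] / (covariance[0][0] * covariance[2][2]) ** 0.5
partial = partial_correlations(covariance)
```

Here the marginal correlation between nodes 0 and 2 is clearly nonzero, while their partial correlation vanishes, which is the sense in which the precision matrix isolates direct interactions.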
A Hybrid Probabilistic Model for Unified Collaborative and Content-Based Image Tagging.
Zhou, Ning; Cheung, William K; Qiu, Guoping; Xue, Xiangyang
2011-07-01
The increasing availability of large quantities of user contributed images with labels has provided opportunities to develop automatic tools to tag images to facilitate image search and retrieval. In this paper, we present a novel hybrid probabilistic model (HPM) which integrates low-level image features and high-level user provided tags to automatically tag images. For images without any tags, HPM predicts new tags based solely on the low-level image features. For images with user provided tags, HPM jointly exploits both the image features and the tags in a unified probabilistic framework to recommend additional tags to label the images. The HPM framework makes use of the tag-image association matrix (TIAM). However, since the number of images is usually very large and user-provided tags are diverse, TIAM is very sparse, thus making it difficult to reliably estimate tag-to-tag co-occurrence probabilities. We developed a collaborative filtering method based on nonnegative matrix factorization (NMF) for tackling this data sparsity issue. Also, an L1 norm kernel method is used to estimate the correlations between image features and semantic concepts. The effectiveness of the proposed approach has been evaluated using three databases containing 5,000 images with 371 tags, 31,695 images with 5,587 tags, and 269,648 images with 5,018 tags, respectively.
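As a rough illustration of how nonnegative matrix factorization can fill in a sparse tag-image association matrix, the sketch below uses the standard multiplicative update rules for the squared-error objective. It is not the authors' collaborative filtering method; the toy matrix, the rank, and all function names are assumptions made for the example.

```python
import random

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def frobenius_error(v, w, h):
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))

def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factor v (tags x images) into nonnegative w (tags x rank) and h (rank x images)."""
    rng = random.Random(seed)
    w = [[rng.random() for _ in range(rank)] for _ in v]
    h = [[rng.random() for _ in v[0]] for _ in range(rank)]
    initial_error = frobenius_error(v, w, h)
    for _ in range(iterations):
        # Multiplicative update for h: h <- h * (w^T v) / (w^T w h)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps)
              for j in range(len(h[0]))] for a in range(rank)]
        # Multiplicative update for w: w <- w * (v h^T) / (w h h^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps)
              for a in range(rank)] for i in range(len(w))]
    return w, h, initial_error

# Hypothetical 4-tag x 5-image association matrix (1 = tag applied to image).
tag_image = [[1, 1, 0, 0, 1],
             [1, 1, 0, 0, 0],
             [0, 0, 1, 1, 0],
             [0, 0, 1, 1, 1]]
w, h, initial_error = nmf(tag_image, rank=2)
final_error = frobenius_error(tag_image, w, h)
```

The product of `w` and `h` gives a dense score for every tag-image pair, including pairs that are zero in the original matrix, which is what makes such a factorization usable for recommending additional tags.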
Improvements in sparse matrix operations of NASTRAN
NASA Technical Reports Server (NTRS)
Harano, S.
1980-01-01
A "nontransmit" packing routine was added to NASTRAN to allow matrix data to be referred to directly from the input/output buffer. Use of the packing routine permits various matrix handling routines to reference the input/output buffer directly once data addresses have been received. The packing routine offers a buffer by buffer backspace feature for efficient backspacing in sequential access. Unlike conventional backspacing, which needs two back-record operations for a single read of one record (one column), this feature omits the overlapping of READ and back-record operations. In the decomposition of a symmetric matrix, it eliminates the need to write a portion of the matrix to its upper triangular matrix from the last to the first columns of the symmetric matrix, thus saving the time needed to generate the upper triangular matrix. Only a lower triangular matrix must be written onto the secondary storage device, bringing a 10 to 30% reduction in the use of disk space on the storage device.
Functional brain networks reconstruction using group sparsity-regularized learning.
Zhao, Qinghua; Li, Will X Y; Jiang, Xi; Lv, Jinglei; Lu, Jianfeng; Liu, Tianming
2018-06-01
Investigating functional brain networks and patterns using sparse representation of fMRI data has received significant interest in the neuroimaging community. It has been reported that sparse representation is effective in reconstructing concurrent and interactive functional brain networks. To date, most data-driven network reconstruction approaches rarely take into consideration the anatomical structures, which are the substrate of brain function. Furthermore, it has rarely been explored whether structured sparse representation with anatomical guidance could facilitate functional network reconstruction. To address this problem, in this paper, we propose to reconstruct brain networks utilizing the structure guided group sparse regression (S2GSR) in which 116 anatomical regions from the AAL template, as prior knowledge, are employed to guide the network reconstruction when performing sparse representation of whole-brain fMRI data. Specifically, we extract fMRI signals from standard space aligned with the AAL template. Then by learning a global over-complete dictionary, with the learned dictionary as a set of features (regressors), the group structured regression employs anatomical structures as group information to regress whole brain signals. Finally, the decomposition coefficients matrix is mapped back to the brain volume to represent functional brain networks and patterns. We use the publicly available Human Connectome Project (HCP) Q1 dataset as the test bed, and the experimental results indicate that the proposed anatomically guided structure sparse representation is effective in reconstructing concurrent functional brain networks.
GPU-accelerated Modeling and Element-free Reverse-time Migration with Gauss Points Partition
NASA Astrophysics Data System (ADS)
Zhen, Z.; Jia, X.
2014-12-01
The element-free method (EFM) has been applied to seismic modeling and migration. Compared with the finite element method (FEM) and the finite difference method (FDM), it is much cheaper and more flexible because only the information of the nodes and the boundary of the study area is required in computation. In the EFM, the number of Gauss points should be consistent with the number of model nodes; otherwise the accuracy of the intermediate coefficient matrices would be harmed. Thus when we increase the nodes of the velocity model in order to obtain higher resolution, we find that the size of the computer's memory becomes a bottleneck. The original EFM can deal with at most 81×81 nodes in the case of 2G memory, as tested by Jia and Hu (2006). In order to solve the problem of storage and computation efficiency, we propose a concept of Gauss points partition (GPP), and utilize GPUs to improve the computation efficiency. Considering the characteristics of the Gaussian points, the GPP method doesn't influence the propagation of the seismic wave in the velocity model. To overcome the time-consuming computation of the stiffness matrix (K) and the mass matrix (M), we also use GPUs in our computation program. We employ the compressed sparse row (CSR) format to compress the intermediate sparse matrices and try to simplify the operations by solving the linear equations with the CULA Sparse Conjugate Gradient (CG) solver instead of the linear sparse solver 'PARDISO'. It is observed that our strategy can significantly reduce the computational time of K and M compared with the CPU-based algorithm. The model tested is the Marmousi model. The length of the model is 7425 m and the depth is 2990 m. We discretize the model with 595×298 nodes, 300×300 Gauss cells and 3×3 Gauss points in each cell. In contrast to the computational time of the conventional EFM, the GPUs-GPP approach can substantially improve the efficiency. The speedup ratio for computing K and M is 120, and the speedup ratio for RTM is 11.5. At the same time, the accuracy of imaging is not harmed. Another advantage of the GPUs-GPP method is its easy application to other numerical methods such as the FEM. Finally, in the GPUs-GPP method, the arrays require quite limited memory storage, which makes the method promising in dealing with large-scale 3D problems.
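The combination of CSR storage with a conjugate gradient solve can be sketched on the CPU in a few lines. This is a plain-Python illustration only, not the CULA Sparse or GPU implementation the abstract refers to; the 3×3 symmetric positive definite matrix and all function names are made up for the example.

```python
def dense_to_csr(dense):
    """Compress a dense matrix into CSR arrays: values, column indices, row pointers."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(float(v))
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr

def csr_matvec(csr, x):
    """Multiply a CSR matrix by a vector, touching only stored nonzeros."""
    values, col_idx, row_ptr = csr
    return [sum(values[k] * x[col_idx[k]] for k in range(row_ptr[i], row_ptr[i + 1]))
            for i in range(len(row_ptr) - 1)]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def conjugate_gradient(csr, b, tol=1e-12, max_iter=100):
    """Solve A x = b for symmetric positive definite A stored in CSR."""
    x = [0.0] * len(b)
    r = list(b)            # residual b - A x for the zero initial guess
    p = list(r)            # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        ap = csr_matvec(csr, p)
        alpha = rs_old / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x

# Hypothetical small SPD system standing in for the EFM linear equations.
A = dense_to_csr([[4, 1, 0],
                  [1, 3, 0],
                  [0, 0, 2]])
solution = conjugate_gradient(A, [1.0, 2.0, 2.0])
```

Because CG needs the matrix only through matrix-vector products, the CSR arrays are never expanded back to dense form, which is the memory argument behind using an iterative solver on large sparse systems.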
Sparse deconvolution for the large-scale ill-posed inverse problem of impact force reconstruction
NASA Astrophysics Data System (ADS)
Qiao, Baijie; Zhang, Xingwu; Gao, Jiawei; Liu, Ruonan; Chen, Xuefeng
2017-01-01
Most previous regularization methods for solving the inverse problem of force reconstruction are to minimize the l2-norm of the desired force. However, these traditional regularization methods such as Tikhonov regularization and truncated singular value decomposition, commonly fail to solve the large-scale ill-posed inverse problem in moderate computational cost. In this paper, taking into account the sparse characteristic of impact force, the idea of sparse deconvolution is first introduced to the field of impact force reconstruction and a general sparse deconvolution model of impact force is constructed. Second, a novel impact force reconstruction method based on the primal-dual interior point method (PDIPM) is proposed to solve such a large-scale sparse deconvolution model, where minimizing the l2-norm is replaced by minimizing the l1-norm. Meanwhile, the preconditioned conjugate gradient algorithm is used to compute the search direction of PDIPM with high computational efficiency. Finally, two experiments including the small-scale or medium-scale single impact force reconstruction and the relatively large-scale consecutive impact force reconstruction are conducted on a composite wind turbine blade and a shell structure to illustrate the advantage of PDIPM. Compared with Tikhonov regularization, PDIPM is more efficient, accurate and robust whether in the single impact force reconstruction or in the consecutive impact force reconstruction.
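The change of penalty described above can be written compactly. The symbols are assumptions introduced here for illustration and are not taken from the paper: $H$ denotes the transfer (convolution) matrix, $f$ the unknown force history, $y$ the measured response, and $\lambda > 0$ the regularization parameter.

```latex
\begin{aligned}
\text{Tikhonov } (\ell_2):\quad & \hat{f} = \arg\min_{f}\ \|Hf - y\|_2^2 + \lambda \|f\|_2^2 \\
\text{Sparse deconvolution } (\ell_1):\quad & \hat{f} = \arg\min_{f}\ \|Hf - y\|_2^2 + \lambda \|f\|_1
\end{aligned}
```

The $\ell_1$ penalty favors solutions with few nonzero entries, which suits an impact force that is zero for most of the record.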
COMPUTATION OF GLOBAL PHOTOCHEMISTRY WITH SMVGEAR II (R823186)
A computer model was developed to simulate global gas-phase photochemistry. The model solves chemical equations with SMVGEAR II, a sparse-matrix, vectorized Gear-type code. To obtain SMVGEAR II, the original SMVGEAR code was modified to allow computation of different sets of chem...
Optimization of sparse matrix-vector multiplication on emerging multicore platforms
DOE Office of Scientific and Technical Information (OSTI.GOV)
Williams, Samuel; Oliker, Leonid; Vuduc, Richard
2007-01-01
We are witnessing a dramatic change in computer architecture due to the multicore paradigm shift, as every electronic device from cell phones to supercomputers confronts parallelism of unprecedented scale. To fully unleash the potential of these systems, the HPC community must develop multicore specific optimization methodologies for important scientific computations. In this work, we examine sparse matrix-vector multiply (SpMV) - one of the most heavily used kernels in scientific computing - across a broad spectrum of multicore designs. Our experimental platform includes the homogeneous AMD dual-core and Intel quad-core designs, the heterogeneous STI Cell, as well as the first scientific study of the highly multithreaded Sun Niagara2. We present several optimization strategies especially effective for the multicore environment, and demonstrate significant performance improvements compared to existing state-of-the-art serial and parallel SpMV implementations. Additionally, we present key insights into the architectural tradeoffs of leading multicore design strategies, in the context of demanding memory-bound numerical algorithms.
Holographic implementation of a binary associative memory for improved recognition
NASA Astrophysics Data System (ADS)
Bandyopadhyay, Somnath; Ghosh, Ajay; Datta, Asit K.
1998-03-01
Neural network associative memory has found wide applications in pattern recognition techniques. We propose an associative memory model for binary character recognition. The interconnection strengths of the memory are binary valued. The concept of sparse coding is used to enhance the storage efficiency of the model. The question of imposed preconditioning of pattern vectors, which is inherent in a sparsely coded conventional memory, is eliminated by using a multistep correlation technique, and the ability of correct association is enhanced in a real-time application. A potential optoelectronic implementation of the proposed associative memory is also described. Learning and recall are possible by using digital optical matrix-vector multiplication, where full use of parallelism and connectivity of optics is made. A hologram is used in the experiment as a long-term memory (LTM) for storing all input information. The short-term memory or the interconnection weight matrix required during the recall process is configured by retrieving the necessary information from the holographic LTM.
A method of vehicle license plate recognition based on PCANet and compressive sensing
NASA Astrophysics Data System (ADS)
Ye, Xianyi; Min, Feng
2018-03-01
The manual feature extraction used by traditional vehicle license plate recognition methods is not robust to diverse variations, and the high dimension of the features extracted with the Principal Component Analysis Network (PCANet) leads to low classification efficiency. To solve these problems, a method of vehicle license plate recognition based on PCANet and compressive sensing is proposed. First, PCANet is used to extract features from the character images. Then, a sparse measurement matrix, which is very sparse and satisfies the Restricted Isometry Property (RIP) condition of compressed sensing, is used to reduce the dimension of the extracted features. Finally, a Support Vector Machine (SVM) is used to train on and recognize the features whose dimension has been reduced. Experimental results demonstrate that the proposed method performs better than a Convolutional Neural Network (CNN) in recognition and time. Compared with no compressive sensing, the proposed method has a lower feature dimension, which increases efficiency.
Horak, Jakub
2014-06-01
The conservation of traditional fruit orchards might be considered to be a fashion, and many people might find it difficult to accept that these artificial habitats can be significant for overall biodiversity. The main aim of this study was to identify possible roles of traditional fruit orchards for dead wood-dependent (saproxylic) beetles. The study was performed in the Central European landscape in the Czech Republic, which was historically covered by lowland sparse deciduous woodlands. Window traps were used to catch saproxylic beetles in 25 traditional fruit orchards. The species richness, as one of the best indicators of biodiversity, was positively driven by very high canopy openness and the rising proportion of deciduous woodlands in the matrix of the surrounding landscape. Due to the disappearance of natural and semi-natural habitats (i.e., sparse deciduous woodlands) of saproxylic beetles, orchards might complement the functions of suitable habitat fragments as the last biotic islands in the matrix of the cultural Central European landscape.
Action Recognition Using Nonnegative Action Component Representation and Sparse Basis Selection.
Wang, Haoran; Yuan, Chunfeng; Hu, Weiming; Ling, Haibin; Yang, Wankou; Sun, Changyin
2014-02-01
In this paper, we propose using high-level action units to represent human actions in videos and, based on such units, a novel sparse model is developed for human action recognition. There are three interconnected components in our approach. First, we propose a new context-aware spatial-temporal descriptor, named locally weighted word context, to improve the discriminability of the traditionally used local spatial-temporal descriptors. Second, from the statistics of the context-aware descriptors, we learn action units using the graph regularized nonnegative matrix factorization, which leads to a part-based representation and encodes the geometrical information. These units effectively bridge the semantic gap in action recognition. Third, we propose a sparse model based on a joint l2,1-norm to preserve the representative items and suppress noise in the action units. Intuitively, when learning the dictionary for action representation, the sparse model captures the fact that actions from the same class share similar units. The proposed approach is evaluated on several publicly available data sets. The experimental results and analysis clearly demonstrate the effectiveness of the proposed approach.
Ren, Yudan; Fang, Jun; Lv, Jinglei; Hu, Xintao; Guo, Cong Christine; Guo, Lei; Xu, Jiansong; Potenza, Marc N; Liu, Tianming
2017-08-01
Assessing functional brain activation patterns in neuropsychiatric disorders such as cocaine dependence (CD) or pathological gambling (PG) under naturalistic stimuli has received rising interest in recent years. In this paper, we propose and apply a novel group-wise sparse representation framework to assess differences in neural responses to naturalistic stimuli across multiple groups of participants (healthy control, cocaine dependence, pathological gambling). Specifically, natural stimulus fMRI (N-fMRI) signals from all three groups of subjects are aggregated into a big data matrix, which is then decomposed into a common signal basis dictionary and associated weight coefficient matrices via an effective online dictionary learning and sparse coding method. The coefficient matrices associated with each common dictionary atom are statistically assessed for each group separately. With the inter-group comparisons based on the group-wise correspondence established by the common dictionary, our experimental results demonstrated that the group-wise sparse coding and representation strategy can effectively and specifically detect brain networks/regions affected by different pathological conditions of the brain under naturalistic stimuli.
Jelsch, C
2001-09-01
The normal matrix in the least-squares refinement of macromolecules is very sparse when the resolution reaches atomic and subatomic levels. The elements of the normal matrix, related to coordinates, thermal motion and charge-density parameters, have a global tendency to decrease rapidly with the interatomic distance between the atoms concerned. For instance, in the case of the protein crambin at 0.54 Å resolution, the elements are reduced by two orders of magnitude for distances above 1.5 Å. The neglect a priori of most of the normal-matrix elements according to a distance criterion represents an approximation in the refinement of macromolecules, which is particularly valid at very high resolution. The analytical expressions of the normal-matrix elements, which have been derived for the coordinates and the thermal parameters, show that the degree of matrix sparsity increases with the diffraction resolution and the size of the asymmetric unit.
Structure-preserving and rank-revealing QR-factorizations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bischof, C.H.; Hansen, P.C.
1991-11-01
The rank-revealing QR-factorization (RRQR-factorization) is a special QR-factorization that is guaranteed to reveal the numerical rank of the matrix under consideration. This makes the RRQR-factorization a useful tool in the numerical treatment of many rank-deficient problems in numerical linear algebra. In this paper, a framework is presented for the efficient implementation of RRQR algorithms, in particular, for sparse matrices. A sparse RRQR-algorithm should seek to preserve the structure and sparsity of the matrix as much as possible while retaining the ability to capture safely the numerical rank. To this end, the paper proposes to compute an initial QR-factorization using a restricted pivoting strategy guarded by incremental condition estimation (ICE), and then applies the algorithm suggested by Chan and Foster to this QR-factorization. The column exchange strategy used in the initial QR factorization will exploit the fact that certain column exchanges do not change the sparsity structure, and compute a sparse QR-factorization that is a good approximation of the sought-after RRQR-factorization. Due to quantities produced by ICE, the Chan/Foster RRQR algorithm can be implemented very cheaply, thus verifying that the sought-after RRQR-factorization has indeed been computed. Experimental results on a model problem show that the initial QR-factorization is indeed very likely to produce RRQR-factorization.
Wavelets in electronic structure calculations
NASA Astrophysics Data System (ADS)
Modisette, Jason Perry
1997-09-01
Ab initio calculations of the electronic structure of bulk materials and large clusters are not possible on today's computers using current techniques. The storage and diagonalization of the Hamiltonian matrix are the limiting factors in both memory and execution time. The scaling of both quantities with problem size can be reduced by using approximate diagonalization or direct minimization of the total energy with respect to the density matrix in conjunction with a localized basis. Wavelet basis members are much more localized than conventional bases such as Gaussians or numerical atomic orbitals. This localization leads to sparse matrices of the operators that arise in SCF multi-electron calculations. We have investigated the construction of the one-electron Hamiltonian, and also the effective one-electron Hamiltonians that appear in density-functional and Hartree-Fock theories. We develop efficient methods for the generation of the kinetic energy and potential matrices, the Hartree and exchange potentials, and the local exchange-correlation potential of the LDA. Test calculations are performed on one-electron problems with a variety of potentials in one and three dimensions.
Solution of the three-dimensional Helmholtz equation with nonlocal boundary conditions
NASA Technical Reports Server (NTRS)
Hodge, Steve L.; Zorumski, William E.; Watson, Willie R.
1995-01-01
The Helmholtz equation is solved within a three-dimensional rectangular duct with a nonlocal radiation boundary condition at the duct exit plane. This condition accurately models the acoustic admittance at an arbitrarily-located computational boundary plane. A linear system of equations is constructed with second-order central differences for the Helmholtz operator and second-order backward differences for both local admittance conditions and the gradient term in the nonlocal radiation boundary condition. The resulting matrix equation is large, sparse, and non-Hermitian. The size and structure of the matrix makes direct solution techniques impractical; as a result, a nonstationary iterative technique is used for its solution. The theory behind the nonstationary technique is reviewed, and numerical results are presented for radiation from both a point source and a planar acoustic source. The solutions with the nonlocal boundary conditions are invariant to the location of the computational boundary, and the same nonlocal conditions are valid for all solutions. The nonlocal conditions thus provide a means of minimizing the size of three-dimensional computational domains.
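To show why a central-difference discretization yields a sparse matrix, the sketch below assembles the interior rows of a one-dimensional Helmholtz operator, d²/dx² + k², on a uniform grid. It is a deliberately reduced illustration: the study is three-dimensional and includes admittance and nonlocal boundary rows that are not modeled here, and the grid size, wavenumber, and function names are assumptions.

```python
def assemble_helmholtz_1d(n, h, k):
    """Interior rows of d2/dx2 + k^2 with second-order central differences.

    Returns {row: {column: value}}, storing only the three nonzeros per row.
    """
    rows = {}
    for i in range(1, n - 1):
        rows[i] = {
            i - 1: 1.0 / h ** 2,
            i: -2.0 / h ** 2 + k ** 2,
            i + 1: 1.0 / h ** 2,
        }
    return rows

def apply_rows(rows, u):
    """Apply the sparse operator to a grid function u."""
    return {i: sum(v * u[j] for j, v in cols.items()) for i, cols in rows.items()}

# Hypothetical grid: 11 points with spacing 0.1 and wavenumber k = 2.
n, h, k = 11, 0.1, 2.0
rows = assemble_helmholtz_1d(n, h, k)

# For u(x) = x^2 the central difference is exact, so (u'' + k^2 u) = 2 + k^2 x^2.
u = [(i * h) ** 2 for i in range(n)]
result = apply_rows(rows, u)
nonzeros = sum(len(cols) for cols in rows.values())
```

Each row couples a grid point only to its immediate neighbors, so the number of nonzeros grows linearly with the number of unknowns; the nonlocal boundary condition in the study adds denser rows at the exit plane on top of this pattern.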
Color Sparse Representations for Image Processing: Review, Models, and Prospects.
Barthélemy, Quentin; Larue, Anthony; Mars, Jérôme I
2015-11-01
Sparse representations have been extended to deal with color images composed of three channels. A review of dictionary-learning-based sparse representations for color images is made here, detailing the differences between the models, and comparing their results on real and simulated data. These models are considered in a unifying framework that is based on the degrees of freedom of the linear filtering/transformation of the color channels. Moreover, this allows it to be shown that the scalar quaternionic linear model is equivalent to constrained matrix-based color filtering, which highlights the filtering implicitly applied through this model. Based on this reformulation, a new color filtering model is introduced, using unconstrained filters. In this model, spatial morphologies of color images are encoded by atoms, and colors are encoded by color filters. Color variability is no longer captured by increasing the dictionary size but by the color filters, which gives an efficient color representation.
A sparse equivalent source method for near-field acoustic holography.
Fernandez-Grande, Efren; Xenaki, Angeliki; Gerstoft, Peter
2017-01-01
This study examines a near-field acoustic holography method consisting of a sparse formulation of the equivalent source method, based on the compressive sensing (CS) framework. The method, denoted Compressive-Equivalent Source Method (C-ESM), encourages spatially sparse solutions (based on the superposition of few waves) that are accurate when the acoustic sources are spatially localized. The importance of obtaining a non-redundant representation, i.e., a sensing matrix with low column coherence, and the inherent ill-conditioning of near-field reconstruction problems is addressed. Numerical and experimental results on a classical guitar and on a highly reactive dipole-like source are presented. C-ESM is valid beyond the conventional sampling limits, making wide-band reconstruction possible. Spatially extended sources can also be addressed with C-ESM, although in this case the obtained solution does not recover the spatial extent of the source.
Hou, Gary Y.; Provost, Jean; Grondin, Julien; Wang, Shutao; Marquet, Fabrice; Bunting, Ethan; Konofagou, Elisa E.
2015-01-01
Harmonic Motion Imaging for Focused Ultrasound (HMIFU) is a recently developed High-Intensity Focused Ultrasound (HIFU) treatment monitoring method. HMIFU utilizes an Amplitude-Modulated (fAM = 25 Hz) HIFU beam to induce a localized focal oscillatory motion, which is simultaneously estimated and imaged by a confocally-aligned imaging transducer. HMIFU feasibilities have been previously shown in silico, in vitro, and in vivo in 1-D or 2-D monitoring of HIFU treatment. The objective of this study is to develop and show the feasibility of a novel fast beamforming algorithm for image reconstruction using GPU-based sparse-matrix operation with real-time feedback. In this study, the algorithm was implemented onto a fully integrated, clinically relevant HMIFU system composed of a 93-element HIFU transducer (fcenter = 4.5 MHz) and coaxially-aligned 64-element phased array (fcenter = 2.5 MHz) for displacement excitation and motion estimation, respectively. A single divergent transmit beam was used while fast beamforming was implemented using a GPU-based delay-and-sum method and a sparse-matrix operation. Axial HMI displacements were then estimated from the RF signals using a 1-D normalized cross-correlation method and streamed to a graphical user interface. The present work developed and implemented sparse-matrix beamforming on a fully integrated, clinically relevant system, which can stream displacement images at up to 15 Hz using GPU-based processing, a 100-fold increase in the rate of streaming displacement images compared to conventional CPU-based beamforming and reconstruction processing. The achieved feedback rate is also currently the fastest, and the approach is the only one among the acoustic radiation force based HIFU imaging techniques that does not require interrupting the HIFU treatment. Results in phantom experiments showed reproducible displacement imaging, and monitoring of twenty-two in vitro HIFU treatments using the new 2-D system showed a consistent average focal displacement decrease of 46.7 ± 14.6% during lesion formation. Complementary focal temperature monitoring also indicated an average rate of displacement increase and decrease with focal temperature of 0.84 ± 1.15%/°C and 2.03 ± 0.93%/°C, respectively. These results reinforce the HMIFU capability of estimating and monitoring stiffness related changes in real time. Current ongoing studies include clinical translation of the presented system for monitoring of HIFU treatment for breast and pancreatic tumor applications. PMID:24960528
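One way to read "delay-and-sum as a sparse-matrix operation" is that the delays are turned once into a sparse matrix, after which every frame is beamformed by a single sparse matrix-vector product. The sketch below illustrates that reading under simplifying assumptions that are not from the paper: nearest-sample delays, unit weights, no apodization or interpolation, and a hypothetical precomputed delay table.

```python
def build_das_matrix(delays, n_samples):
    """Build a CSR pattern (unit weights) for delay-and-sum.

    delays[p][c] is the sample index read from channel c for pixel p.
    Columns index the flattened RF data: column = c * n_samples + sample.
    """
    col_idx, row_ptr = [], [0]
    for pixel_delays in delays:
        for channel, sample in enumerate(pixel_delays):
            if 0 <= sample < n_samples:   # drop delays that fall outside the record
                col_idx.append(channel * n_samples + sample)
        row_ptr.append(len(col_idx))
    return col_idx, row_ptr

def beamform(col_idx, row_ptr, rf_flat):
    """Sparse matrix-vector product: each pixel sums its delayed samples."""
    return [sum(rf_flat[col_idx[k]] for k in range(row_ptr[p], row_ptr[p + 1]))
            for p in range(len(row_ptr) - 1)]

# Hypothetical data: 2 channels x 4 samples, 3 pixels.
rf = [[1, 2, 3, 4],
      [10, 20, 30, 40]]
rf_flat = [sample for channel in rf for sample in channel]
delays = [[0, 1],   # pixel 0 reads rf[0][0] and rf[1][1]
          [2, 3],   # pixel 1 reads rf[0][2] and rf[1][3]
          [3, 5]]   # pixel 2 reads rf[0][3]; index 5 is out of range

col_idx, row_ptr = build_das_matrix(delays, n_samples=4)
image = beamform(col_idx, row_ptr, rf_flat)
```

The matrix depends only on geometry, so it can be built once and reused for every frame, which is what makes the per-frame cost a single sparse product that maps well to a GPU.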
On the sparseness of 1-norm support vector machines.
Zhang, Li; Zhou, Weida
2010-04-01
There is some empirical evidence available showing that 1-norm Support Vector Machines (1-norm SVMs) have good sparseness; however, it is not clear how sparse 1-norm SVMs can be or whether they have a sparser representation than standard SVMs. In this paper we examine the sparseness of 1-norm SVMs. Two upper bounds on the number of nonzero coefficients in the decision function of 1-norm SVMs are presented. First, the number of nonzero coefficients in 1-norm SVMs is at most equal to the number of only the exact support vectors lying on the +1 and -1 discriminating surfaces, while that in standard SVMs is equal to the number of support vectors, which implies that 1-norm SVMs have better sparseness than that of standard SVMs. Second, the number of nonzero coefficients is at most equal to the rank of the sample matrix. A brief review of the geometry of linear programming and the primal steepest edge pricing simplex method is given, which allows us to provide the proof of the two upper bounds and evaluate their tightness by experiments. Experimental results on toy data sets and the UCI data sets illustrate our analysis. Copyright 2009 Elsevier Ltd. All rights reserved.
Improving M-SBL for Joint Sparse Recovery Using a Subspace Penalty
NASA Astrophysics Data System (ADS)
Ye, Jong Chul; Kim, Jong Min; Bresler, Yoram
2015-12-01
The multiple measurement vector problem (MMV) is a generalization of the compressed sensing problem that addresses the recovery of a set of jointly sparse signal vectors. One of the important contributions of this paper is to reveal that the seemingly least related state-of-the-art MMV joint sparse recovery algorithms - M-SBL (multiple sparse Bayesian learning) and subspace-based hybrid greedy algorithms - have a very important link. More specifically, we show that replacing the $\log\det(\cdot)$ term in M-SBL by a rank proxy that exploits the spark reduction property discovered in subspace-based joint sparse recovery algorithms provides significant improvements. In particular, if we use the Schatten-$p$ quasi-norm as the corresponding rank proxy, the global minimiser of the proposed algorithm becomes identical to the true solution as $p \rightarrow 0$. Furthermore, under the same regularity conditions, we show that the convergence to a local minimiser is guaranteed using an alternating minimization algorithm that has closed form expressions for each of the minimization steps, which are convex. Numerical simulations under a variety of scenarios in terms of SNR and condition number of the signal amplitude matrix demonstrate that the proposed algorithm consistently outperforms M-SBL and other state-of-the-art algorithms.
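For readers unfamiliar with the rank proxy mentioned above, the standard definition of the Schatten-$p$ quasi-norm and its limiting behavior can be stated as follows. Here $X$ is a generic matrix with singular values $\sigma_i(X)$; this notation is an assumption made for illustration, and the exact way the proxy enters the paper's cost function is not reproduced.

```latex
\|X\|_{S_p} = \Big(\sum_{i} \sigma_i(X)^{p}\Big)^{1/p}, \quad 0 < p < 1,
\qquad
\lim_{p \to 0^{+}} \sum_{i:\,\sigma_i(X) > 0} \sigma_i(X)^{p} = \operatorname{rank}(X)
```

Each positive singular value raised to the power $p$ tends to 1 as $p \to 0$, so the sum counts the nonzero singular values, which is why small $p$ makes the quasi-norm behave like the rank.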
Topological and kinetic determinants of the modal matrices of dynamic models of metabolism
2017-01-01
Large-scale kinetic models of metabolism are becoming increasingly comprehensive and accurate. A key challenge is to understand the biochemical basis of the dynamic properties of these models. Linear analysis methods are well-established as useful tools for characterizing the dynamic response of metabolic networks. Central to linear analysis methods are two key matrices: the Jacobian matrix (J) and the modal matrix (M⁻¹) arising from its eigendecomposition. The modal matrix M⁻¹ contains dynamically independent motions of the kinetic model near a reference state, and it is sparse in practice for metabolic networks. However, connecting the structure of M⁻¹ to the kinetic properties of the underlying reactions is non-trivial. In this study, we analyze the relationship between J, M⁻¹, and the kinetic properties of the underlying network for kinetic models of metabolism. Specifically, we describe the origin of mode sparsity structure based on features of the network stoichiometric matrix S and the reaction kinetic gradient matrix G. First, we show that due to the scaling of kinetic parameters in real networks, diagonal dominance occurs in a substantial fraction of the rows of J, resulting in simple modal structures with clear biological interpretations. Then, we show that more complicated modes originate from topologically-connected reactions that have similar reaction elasticities in G. These elasticities represent dynamic equilibrium balances within reactions and are key determinants of modal structure. The work presented should prove useful towards obtaining an understanding of the dynamics of kinetic models of metabolism, which are rooted in the network structure and the kinetic properties of reactions. PMID:29267329
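The relationships among the matrices named in this abstract can be summarized in one line. The sketch assumes the usual linearization around a reference state, with $\mathbf{x}'$ the concentration deviation, $\Lambda$ the diagonal matrix of eigenvalues, and the factorization $J = SG$; the abstract itself only states that the structure of J is explained by S and G, so the product form is an assumption here.

```latex
\frac{d\mathbf{x}'}{dt} = J\,\mathbf{x}', \qquad
J = S\,G, \qquad
J = M \Lambda M^{-1}, \qquad
\mathbf{m} = M^{-1}\mathbf{x}' \;\Rightarrow\; \frac{d\mathbf{m}}{dt} = \Lambda\,\mathbf{m}
```

Each row of the modal matrix therefore defines a combination of concentrations that evolves independently of the others, and a sparse row means that the mode involves only a few metabolites.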
Dynamic graph system for a semantic database
Mizell, David
2016-04-12
A method and system in a computer system for dynamically providing a graphical representation of a data store of entries via a matrix interface is disclosed. A dynamic graph system provides a matrix interface that exposes to an application program a graphical representation of data stored in a data store such as a semantic database storing triples. To the application program, the matrix interface represents the graph as a sparse adjacency matrix that is stored in compressed form. Each entry of the data store is considered to represent a link between nodes of the graph. Each entry has a first field and a second field identifying the nodes connected by the link and a third field with a value for the link that connects the identified nodes. The first, second, and third fields represent the rows, column, and elements of the adjacency matrix.
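The mapping from three-field entries to a compressed sparse adjacency matrix can be sketched as follows. This is an illustration of the general idea only, not the patented system: the entries, the choice of compressed sparse row (CSR) layout, and the function names are hypothetical.

```python
def entries_to_csr(entries):
    """Turn (first_node, second_node, link_value) entries into a CSR adjacency matrix.

    The first field selects the row, the second the column, and the third
    becomes the stored element.
    """
    nodes = sorted({name for a, b, _ in entries for name in (a, b)})
    index = {name: i for i, name in enumerate(nodes)}
    by_row = [[] for _ in nodes]
    for a, b, value in entries:
        by_row[index[a]].append((index[b], value))
    values, col_idx, row_ptr = [], [], [0]
    for row in by_row:
        for col, value in sorted(row, key=lambda item: item[0]):
            col_idx.append(col)
            values.append(value)
        row_ptr.append(len(col_idx))
    return nodes, values, col_idx, row_ptr

def neighbors(graph, name):
    """List (target node, link value) pairs for the links leaving a node."""
    nodes, values, col_idx, row_ptr = graph
    i = nodes.index(name)
    return [(nodes[col_idx[k]], values[k]) for k in range(row_ptr[i], row_ptr[i + 1])]

# Hypothetical data store entries.
entries = [("alice", "bob", "knows"),
           ("alice", "carol", "manages"),
           ("bob", "carol", "knows")]
graph = entries_to_csr(entries)
alice_links = neighbors(graph, "alice")
```

Only existing links are stored, so memory grows with the number of entries rather than with the square of the number of nodes, and the links of a node are found by reading one contiguous slice.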
Dynamic graph system for a semantic database
Mizell, David
2015-01-27
A method and system in a computer system for dynamically providing a graphical representation of a data store of entries via a matrix interface is disclosed. A dynamic graph system provides a matrix interface that exposes to an application program a graphical representation of data stored in a data store such as a semantic database storing triples. To the application program, the matrix interface represents the graph as a sparse adjacency matrix that is stored in compressed form. Each entry of the data store is considered to represent a link between nodes of the graph. Each entry has a first field and a second field identifying the nodes connected by the link and a third field with a value for the link that connects the identified nodes. The first, second, and third fields represent the row, column, and element of the adjacency matrix, respectively.
Jung, Yousung; Shao, Yihan; Head-Gordon, Martin
2007-09-01
The scaled opposite spin Møller-Plesset method (SOS-MP2) is an economical way of obtaining correlation energies that are computationally cheaper, and yet, in a statistical sense, of higher quality than standard MP2 theory, by introducing one empirical parameter. But SOS-MP2 still has a fourth-order scaling step that makes the method inapplicable to very large molecular systems. We reduce the scaling of SOS-MP2 by exploiting the sparsity of expansion coefficients and local integral matrices, by performing local auxiliary basis expansions for the occupied-virtual product distributions. To exploit sparsity of 3-index local quantities, we use a blocking scheme in which entire zero-rows and columns, for a given third global index, are deleted by comparison against a numerical threshold. This approach minimizes sparse matrix book-keeping overhead, and also provides sufficiently large submatrices after blocking, to allow efficient matrix-matrix multiplies. The resulting algorithm is formally cubic scaling, and requires only moderate computational resources (quadratic memory and disk space) and, in favorable cases, is shown to yield effective quadratic scaling behavior in the size regime we can apply it to. Errors associated with local fitting using the attenuated Coulomb metric and numerical thresholds in the blocking procedure are found to be insignificant in terms of the predicted relative energies. A diverse set of test calculations shows that the size of system where significant computational savings can be achieved depends strongly on the dimensionality of the system, and the extent of localizability of the molecular orbitals. Copyright 2007 Wiley Periodicals, Inc.
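The blocking idea, deleting whole rows and columns that are numerically zero so that the remainder can be handled as a dense submatrix, can be sketched as follows. This is only an illustration under stated assumptions: the matrix stands for one two-index slice of a three-index quantity, and the function name `compress_block`, the sample values, and the threshold are hypothetical rather than the authors' implementation.

```python
def compress_block(matrix, threshold):
    """Drop rows and columns whose largest absolute entry is below `threshold`.

    Returns the dense sub-block plus the kept row and column indices, so that
    results can be scattered back to their original positions.
    """
    n_cols = len(matrix[0]) if matrix else 0
    keep_rows = [i for i, row in enumerate(matrix)
                 if max((abs(v) for v in row), default=0.0) >= threshold]
    keep_cols = [j for j in range(n_cols)
                 if max((abs(row[j]) for row in matrix), default=0.0) >= threshold]
    block = [[matrix[i][j] for j in keep_cols] for i in keep_rows]
    return block, keep_rows, keep_cols

# Illustrative slice: row 1 and column 1 are negligible and are removed.
M = [[0.9, 1e-12, 0.3],
     [1e-11, 0.0, 1e-13],
     [0.2, 1e-12, 0.7]]
block, rows, cols = compress_block(M, 1e-8)
```

Because the bookkeeping is limited to two index lists per slice, the surviving block can be passed to an ordinary dense matrix-matrix multiply.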
LiDAR point classification based on sparse representation
NASA Astrophysics Data System (ADS)
Li, Nan; Pfeifer, Norbert; Liu, Chun
2017-04-01
To combine the initial spatial structure and features of LiDAR data for accurate classification, the LiDAR data is represented as a fourth-order tensor, and the sparse representation for classification (SRC) method proposed in this paper is used to classify the LiDAR tensor. SRC needs only a few training samples from each class yet achieves good classification results. Multiple features are extracted from raw LiDAR points to generate a high-dimensional vector at each point. The LiDAR tensor is then built from the spatial distribution and feature vectors of the point neighborhood. The entries of the LiDAR tensor are accessed via four indices. Each index is called a mode: three spatial modes in the X, Y, and Z directions and one feature mode. The sparsity algorithm finds the best representation of the test sample as a sparse linear combination of training samples from a dictionary. To explore the sparsity of the LiDAR tensor, the Tucker decomposition is used. It decomposes a tensor into a core tensor multiplied by a matrix along each mode. Those matrices can be considered the principal components in each mode. The entries of the core tensor show the level of interaction between the different components. Therefore, the LiDAR tensor can be approximately represented by a sparse tensor multiplied by a matrix selected from a dictionary along each mode. The matrices decomposed from training samples are arranged as initial elements in the dictionary. By dictionary learning, a reconstructive and discriminative structure dictionary along each mode is built. The overall structure dictionary is composed of class-specific sub-dictionaries. The sparse core tensor is then calculated by the tensor OMP (Orthogonal Matching Pursuit) method based on the dictionaries along each mode. It is expected that the original tensor should be well recovered by the sub-dictionary associated with the relevant class, while entries in the sparse tensor associated with other classes should be nearly zero. Therefore, SRC uses the reconstruction error associated with each class to classify the data. A section of airborne LiDAR points of the city of Vienna is used and classified into 6 classes: ground, roofs, vegetation, covered ground, walls and other points. Only 6 training samples from each class are taken. For the final classification result, ground and covered ground are merged into a single class (ground). The classification accuracy for ground is 94.60%, roof is 95.47%, vegetation is 85.55%, wall is 76.17%, other object is 20.39%.
Shokrollahi, Mehrnaz; Krishnan, Sridhar; Dopsa, Dustin D; Muir, Ryan T; Black, Sandra E; Swartz, Richard H; Murray, Brian J; Boulos, Mark I
2016-11-01
Stroke is a leading cause of death and disability in adults, and incurs a significant economic burden to society. Periodic limb movements (PLMs) in sleep are repetitive movements involving the great toe, ankle, and hip. Evolving evidence suggests that PLMs may be associated with high blood pressure and stroke, but this relationship remains underexplored. Several issues limit the study of PLMs including the need to manually score them, which is time-consuming and costly. For this reason, we developed a novel automated method for nocturnal PLM detection, which was shown to be correlated with (a) the manually scored PLM index on polysomnography, and (b) white matter hyperintensities on brain imaging, which have been demonstrated to be associated with PLMs. Our proposed algorithm consists of three main stages: (1) representing the signal in the time-frequency plane using time-frequency matrices (TFM), (2) applying K-nonnegative matrix factorization technique to decompose the TFM matrix into its significant components, and (3) applying kernel sparse representation for classification (KSRC) to the decomposed signal. Our approach was applied to a dataset that consisted of 65 subjects who underwent polysomnography. An overall classification accuracy of 97% was achieved for discrimination of the aforementioned signals, demonstrating the potential of the presented method.
Time integration algorithms for the two-dimensional Euler equations on unstructured meshes
NASA Technical Reports Server (NTRS)
Slack, David C.; Whitaker, D. L.; Walters, Robert W.
1994-01-01
Explicit and implicit time integration algorithms for the two-dimensional Euler equations on unstructured grids are presented. Both cell-centered and cell-vertex finite volume upwind schemes utilizing Roe's approximate Riemann solver are developed. For the cell-vertex scheme, a four-stage Runge-Kutta time integration, a four-stage Runge-Kutta time integration with implicit residual averaging, a point Jacobi method, a symmetric point Gauss-Seidel method and two methods utilizing preconditioned sparse matrix solvers are presented. For the cell-centered scheme, a Runge-Kutta scheme, an implicit tridiagonal relaxation scheme modeled after line Gauss-Seidel, a fully implicit lower-upper (LU) decomposition, and a hybrid scheme utilizing both Runge-Kutta and LU methods are presented. A reverse Cuthill-McKee renumbering scheme is employed for the direct solver to decrease CPU time by reducing the fill of the Jacobian matrix. A comparison of the various time integration schemes is made for both first-order and higher order accurate solutions using several mesh sizes; higher order accuracy is achieved by using multidimensional monotone linear reconstruction procedures. The results obtained for a transonic flow over a circular arc suggest that the preconditioned sparse matrix solvers perform better than the other methods as the number of elements in the mesh increases.
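The effect of reverse Cuthill-McKee renumbering can be seen on a small example. The sketch below is a textbook version of the ordering, written from scratch for illustration; it is not the authors' code, and the function names and the tiny scrambled mesh graph are assumptions. A smaller bandwidth confines the nonzeros to a narrow band around the diagonal, which in turn limits the fill produced by a direct factorization.

```python
from collections import deque

def reverse_cuthill_mckee(adj):
    """Return a reverse Cuthill-McKee ordering for an undirected graph.

    `adj` maps each node index to the set of its neighbours, i.e. the
    off-diagonal sparsity pattern of a structurally symmetric matrix.
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    visited, order = set(), []
    # Handle every connected component, starting each from a low-degree node.
    for start in sorted(adj, key=lambda v: (degree[v], v)):
        if start in visited:
            continue
        visited.add(start)
        queue = deque([start])
        while queue:
            v = queue.popleft()
            order.append(v)
            # Visit unvisited neighbours in order of increasing degree.
            for w in sorted(adj[v] - visited, key=lambda u: (degree[u], u)):
                visited.add(w)
                queue.append(w)
    return order[::-1]

def bandwidth(adj, order):
    """Largest |i - j| over the nonzeros after renumbering nodes by `order`."""
    pos = {v: i for i, v in enumerate(order)}
    return max((abs(pos[v] - pos[w]) for v in adj for w in adj[v]), default=0)

# A chain of six cells, 0-3-5-1-4-2, whose numbering was scrambled.
edges = [(0, 3), (3, 5), (5, 1), (1, 4), (4, 2)]
adj = {v: set() for v in range(6)}
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)

natural = list(range(6))
rcm = reverse_cuthill_mckee(adj)
```

For this chain the renumbering recovers a tridiagonal pattern, the narrowest band possible for a connected graph.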
A Fast MoM Solver (GIFFT) for Large Arrays of Microstrip and Cavity-Backed Antennas
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fasenfest, B J; Capolino, F; Wilton, D
2005-02-02
A straightforward numerical analysis of large arrays of arbitrary contour (and possibly missing elements) requires large memory storage and long computation times. Several techniques are currently under development to reduce this cost. One such technique is the GIFFT (Green's function interpolation and FFT) method discussed here that belongs to the class of fast solvers for large structures. This method uses a modification of the standard AIM approach [1] that takes into account the reusability properties of matrices that arise from identical array elements. If the array consists of planar conducting bodies, the array elements are meshed using standard subdomain basis functions, such as the RWG basis. The Green's function is then projected onto a sparse regular grid of separable interpolating polynomials. This grid can then be used in a 2D or 3D FFT to accelerate the matrix-vector product used in an iterative solver [2]. The method has been proven to greatly reduce solve time by speeding up the matrix-vector product computation. The GIFFT approach also reduces fill time and memory requirements, since only the near element interactions need to be calculated exactly. The present work extends GIFFT to layered material Green's functions and multiregion interactions via slots in ground planes. In addition, a preconditioner is implemented to greatly reduce the number of iterations required for a solution. The general scheme of the GIFFT method is reported in [2]; this contribution is limited to presenting new results for array antennas made of slot-excited patches and cavity-backed patch antennas.
Pitchers, W. R.; Brooks, R.; Jennions, M. D.; Tregenza, T.; Dworkin, I.; Hunt, J.
2013-01-01
Phenotypic integration and plasticity are central to our understanding of how complex phenotypic traits evolve. Evolutionary change in complex quantitative traits can be predicted using the multivariate breeders’ equation, but such predictions are only accurate if the matrices involved are stable over evolutionary time. Recent work, however, suggests that these matrices are temporally plastic, spatially variable and themselves evolvable. The data available on phenotypic variance-covariance matrix (P) stability is sparse, and largely focused on morphological traits. Here we compared P for the structure of the complex sexual advertisement call of six divergent allopatric populations of the Australian black field cricket, Teleogryllus commodus. We measured a subset of calls from wild-caught crickets from each of the populations and then a second subset after rearing crickets under common-garden conditions for three generations. In a second experiment, crickets from each population were reared in the laboratory on high- and low-nutrient diets and their calls recorded. In both experiments, we estimated P for call traits and used multiple methods to compare them statistically (Flury hierarchy, geometric subspace comparisons and random skewers). Despite considerable variation in means and variances of individual call traits, the structure of P was largely conserved among populations, across generations and between our rearing diets. Our finding that P remains largely stable, among populations and between environmental conditions, suggests that selection has preserved the structure of call traits in order that they can function as an integrated unit. PMID:23530814
SMOKE TOOL FOR MODELS-3 VERSION 4.1 STRUCTURE AND OPERATION DOCUMENTATION
The SMOKE Tool is a part of the Models-3 system, a flexible software system designed to simplify the development and use of air quality models and other environmental decision support tools. The SMOKE Tool is an input processor for SMOKE, (Sparse Matrix Operator Kernel Emissio...
A Sparsity-Promoted Method Based on Majorization-Minimization for Weak Fault Feature Enhancement
Hao, Yansong; Song, Liuyang; Tang, Gang; Yuan, Hongfang
2018-01-01
Fault transient impulses induced by faulty components in rotating machinery usually contain substantial interference. Fault features are comparatively weak in the initial fault stage, which renders fault diagnosis more difficult. In this case, a sparse representation method based on the Majorization-Minimization (MM) algorithm is proposed to enhance weak fault features and extract the features from strong background noise. However, the traditional MM algorithm suffers from two issues, which are the choice of sparse basis and complicated calculations. To address these challenges, a modified MM algorithm is proposed. First, a sparse optimization objective function is designed. Inspired by the Basis Pursuit (BP) model, the optimization function integrates an impulsive feature-preserving factor and a penalty function factor. Second, a modified Majorization iterative method is applied to address the convex optimization problem of the designed function. A series of sparse coefficients, which contain only transient components, can be obtained through iteration. It is noteworthy that there is no need to select the sparse basis in the proposed iterative method because it is fixed as a unit matrix. The reconstruction step is therefore omitted, which can significantly increase detection efficiency. Finally, envelope analysis of the sparse coefficients is performed to extract weak fault features. Simulated and experimental signals including bearings and gearboxes are employed to validate the effectiveness of the proposed method. In addition, comparisons are made to prove that the proposed method outperforms the traditional MM algorithm in terms of detection results and efficiency. PMID:29597280
A Sparsity-Promoted Method Based on Majorization-Minimization for Weak Fault Feature Enhancement.
Ren, Bangyue; Hao, Yansong; Wang, Huaqing; Song, Liuyang; Tang, Gang; Yuan, Hongfang
2018-03-28
Fault transient impulses induced by faulty components in rotating machinery usually contain substantial interference. Fault features are comparatively weak in the initial fault stage, which renders fault diagnosis more difficult. In this case, a sparse representation method based on the Majorization-Minimization (MM) algorithm is proposed to enhance weak fault features and extract the features from strong background noise. However, the traditional MM algorithm suffers from two issues, which are the choice of sparse basis and complicated calculations. To address these challenges, a modified MM algorithm is proposed. First, a sparse optimization objective function is designed. Inspired by the Basis Pursuit (BP) model, the optimization function integrates an impulsive feature-preserving factor and a penalty function factor. Second, a modified Majorization iterative method is applied to address the convex optimization problem of the designed function. A series of sparse coefficients, which contain only transient components, can be obtained through iteration. It is noteworthy that there is no need to select the sparse basis in the proposed iterative method because it is fixed as a unit matrix. The reconstruction step is therefore omitted, which can significantly increase detection efficiency. Finally, envelope analysis of the sparse coefficients is performed to extract weak fault features. Simulated and experimental signals including bearings and gearboxes are employed to validate the effectiveness of the proposed method. In addition, comparisons are made to prove that the proposed method outperforms the traditional MM algorithm in terms of detection results and efficiency.
A Partitioning Algorithm for Block-Diagonal Matrices With Overlap
DOE Office of Scientific and Technical Information (OSTI.GOV)
Guy Antoine Atenekeng Kahou; Laura Grigori; Masha Sosonkina
2008-02-02
We present a graph partitioning algorithm that aims at partitioning a sparse matrix into a block-diagonal form, such that any two consecutive blocks overlap. We denote this form of the matrix as the overlapped block-diagonal matrix. The partitioned matrix is suitable for applying the explicit formulation of Multiplicative Schwarz preconditioner (EFMS) described in [3]. The graph partitioning algorithm partitions the graph of the input matrix into K partitions, such that every partition Ω_i has at most two neighbors Ω_{i-1} and Ω_{i+1}. First, an ordering algorithm, such as the reverse Cuthill-McKee algorithm, that reduces the matrix profile is performed. An initial overlapped block-diagonal partition is obtained from the profile of the matrix. An iterative strategy is then used to further refine the partitioning by allowing nodes to be transferred between neighboring partitions. Experiments are performed on matrices arising from real-world applications to show the feasibility and usefulness of this approach.
Fast sparsely synchronized brain rhythms in a scale-free neural network
NASA Astrophysics Data System (ADS)
Kim, Sang-Yoon; Lim, Woochang
2015-08-01
We consider a directed version of the Barabási-Albert scale-free network model with symmetric preferential attachment with the same in- and out-degrees and study the emergence of sparsely synchronized rhythms for a fixed attachment degree in an inhibitory population of fast-spiking Izhikevich interneurons. Fast sparsely synchronized rhythms with stochastic and intermittent neuronal discharges are found to appear for large values of J (synaptic inhibition strength) and D (noise intensity). For an intensive study we fix J at a sufficiently large value and investigate the population states by increasing D . For small D , full synchronization with the same population-rhythm frequency fp and mean firing rate (MFR) fi of individual neurons occurs, while for large D partial synchronization with fp>
ML 3.0 smoothed aggregation user's guide.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sala, Marzio; Hu, Jonathan Joseph; Tuminaro, Raymond Stephen
2004-05-01
ML is a multigrid preconditioning package intended to solve linear systems of equations Ax = b where A is a user supplied n x n sparse matrix, b is a user supplied vector of length n and x is a vector of length n to be computed. ML should be used on large sparse linear systems arising from partial differential equation (PDE) discretizations. While technically any linear system can be considered, ML should be used on linear systems that correspond to things that work well with multigrid methods (e.g. elliptic PDEs). ML can be used as a stand-alone package or to generate preconditioners for a traditional iterative solver package (e.g. Krylov methods). We have supplied support for working with the AZTEC 2.1 and AZTECOO iterative package [15]. However, other solvers can be used by supplying a few functions. This document describes one specific algebraic multigrid approach: smoothed aggregation. This approach is used within several specialized multigrid methods: one for the eddy current formulation for Maxwell's equations, and a multilevel and domain decomposition method for symmetric and non-symmetric systems of equations (like elliptic equations, or compressible and incompressible fluid dynamics problems). Other methods exist within ML but are not described in this document. Examples are given illustrating the problem definition and exercising multigrid options.
ML 3.1 smoothed aggregation user's guide.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sala, Marzio; Hu, Jonathan Joseph; Tuminaro, Raymond Stephen
2004-10-01
ML is a multigrid preconditioning package intended to solve linear systems of equations Ax = b where A is a user supplied n x n sparse matrix, b is a user supplied vector of length n and x is a vector of length n to be computed. ML should be used on large sparse linear systems arising from partial differential equation (PDE) discretizations. While technically any linear system can be considered, ML should be used on linear systems that correspond to things that work well with multigrid methods (e.g. elliptic PDEs). ML can be used as a stand-alone package or to generate preconditioners for a traditional iterative solver package (e.g. Krylov methods). We have supplied support for working with the Aztec 2.1 and AztecOO iterative package [16]. However, other solvers can be used by supplying a few functions. This document describes one specific algebraic multigrid approach: smoothed aggregation. This approach is used within several specialized multigrid methods: one for the eddy current formulation for Maxwell's equations, and a multilevel and domain decomposition method for symmetric and nonsymmetric systems of equations (like elliptic equations, or compressible and incompressible fluid dynamics problems). Other methods exist within ML but are not described in this document. Examples are given illustrating the problem definition and exercising multigrid options.
Self-Taught Learning Based on Sparse Autoencoder for E-Nose in Wound Infection Detection
He, Peilin; Jia, Pengfei; Qiao, Siqi; Duan, Shukai
2017-01-01
For an electronic nose (E-nose) used to distinguish wound infections, traditional learning methods have always needed large quantities of labeled wound infection samples, which are both limited and expensive; thus, we introduce self-taught learning combined with a sparse autoencoder and a radial basis function (RBF) into the field. Self-taught learning is a kind of transfer learning that can transfer knowledge from other fields to target fields, and it can handle problems in which labeled data (target fields) and unlabeled data (other fields) do not share the same class labels, even if they come from entirely different distributions. In our paper, we obtain numerous cheap unlabeled pollutant gas samples (benzene, formaldehyde, acetone and ethyl alcohol); however, labeled wound infection samples are hard to obtain. Thus, we propose self-taught learning to utilize these gas samples, obtaining a basis vector θ. Then, using the basis vector θ, we reconstruct the new representation of wound infection samples under a sparsity constraint, which is the input of the classifiers. We compare RBF with partial least squares discriminant analysis (PLSDA), and conclude that the performance of RBF is superior. We also vary the dimension of our data set and the quantity of unlabeled data to find the input matrix that produces the highest accuracy. PMID:28991154
A path-oriented matrix-based knowledge representation system
NASA Technical Reports Server (NTRS)
Feyock, Stefan; Karamouzis, Stamos T.
1993-01-01
Experience has shown that designing a good representation is often the key to turning hard problems into simple ones. Most AI (Artificial Intelligence) search/representation techniques are oriented toward an infinite domain of objects and arbitrary relations among them. In reality much of what needs to be represented in AI can be expressed using a finite domain and unary or binary predicates. Well-known vector- and matrix-based representations can efficiently represent finite domains and unary/binary predicates, and allow effective extraction of path information by generalized transitive closure/path matrix computations. In order to avoid space limitations a set of abstract sparse matrix data types was developed along with a set of operations on them. This representation forms the basis of an intelligent information system for representing and manipulating relational data.
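The path-matrix computation mentioned above can be sketched for a binary predicate stored as a sparse Boolean matrix. The example uses a plain Warshall-style transitive closure over rows kept as Python sets; it is an illustrative assumption about how such a computation might look, not the abstract data types described in the report, and the `part_of` relation is invented sample data.

```python
def transitive_closure(relation):
    """Warshall-style closure of a binary relation stored as sparse rows.

    `relation` maps each object to the set of objects it is directly related
    to, i.e. the nonzero columns of that row in a Boolean matrix.
    """
    nodes = set(relation) | {w for targets in relation.values() for w in targets}
    reach = {v: set(relation.get(v, ())) for v in nodes}
    for k in nodes:
        for i in nodes:
            # If i reaches k, then i also reaches everything k reaches.
            if k in reach[i]:
                reach[i] |= reach[k]
    return reach

# Hypothetical binary predicate part_of(x, y) over a finite domain.
part_of = {"piston": {"engine"}, "engine": {"car"}, "wheel": {"car"}}
closure = transitive_closure(part_of)
```

Storing only the nonzero columns of each row keeps the space cost proportional to the number of related pairs, which is the motivation the abstract gives for sparse matrix data types.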
NASA Astrophysics Data System (ADS)
Gao, Pengzhi; Wang, Meng; Chow, Joe H.; Ghiocel, Scott G.; Fardanesh, Bruce; Stefopoulos, George; Razanousky, Michael P.
2016-11-01
This paper presents a new framework of identifying a series of cyber data attacks on power system synchrophasor measurements. We focus on detecting "unobservable" cyber data attacks that cannot be detected by any existing method that purely relies on measurements received at one time instant. Leveraging the approximate low-rank property of phasor measurement unit (PMU) data, we formulate the identification problem of successive unobservable cyber attacks as a matrix decomposition problem of a low-rank matrix plus a transformed column-sparse matrix. We propose a convex-optimization-based method and provide its theoretical guarantee in the data identification. Numerical experiments on actual PMU data from the Central New York power system and synthetic data are conducted to verify the effectiveness of the proposed method.
Revealing the Hidden Relationship by Sparse Modules in Complex Networks with a Large-Scale Analysis
Jiao, Qing-Ju; Huang, Yan; Liu, Wei; Wang, Xiao-Fan; Chen, Xiao-Shuang; Shen, Hong-Bin
2013-01-01
One of the remarkable features of networks is the module, which can provide useful insights into not only network organization but also the functional behavior of network components. Comprehensive efforts have been devoted to investigating cohesive modules in the past decade. However, it is still not clear whether there are important structural characteristics of the nodes that do not belong to any cohesive module. In order to answer this question, we performed a large-scale analysis on 25 complex networks with different types and scales using our recently developed BTS (bintree seeking) algorithm, which is able to detect both cohesive and sparse modules in the network. Our results reveal that the sparse modules composed of the cohesively isolated nodes widely co-exist with the cohesive modules. Detailed analysis shows that both types of modules together provide a better characterization of the division of a network into functional units than cohesive modules alone, because the sparse modules possibly re-organize the nodes in the so-called cohesive modules, which lack obvious modular significance, into meaningful groups. Compared with cohesive modules, sparse modules are generally smaller. Sparse modules are also found to be more prevalent in social and biological networks than in others. PMID:23762457
Iris recognition based on robust principal component analysis
NASA Astrophysics Data System (ADS)
Karn, Pradeep; He, Xiao Hai; Yang, Shuai; Wu, Xiao Hong
2014-11-01
Iris images acquired under different conditions often suffer from blur, occlusion due to eyelids and eyelashes, specular reflection, and other artifacts. Existing iris recognition systems do not perform well on these types of images. To overcome these problems, we propose an iris recognition method based on robust principal component analysis. The proposed method decomposes all training images into a low-rank matrix and a sparse error matrix, where the low-rank matrix is used for feature extraction. The sparsity concentration index approach is then applied to validate the recognition result. Experimental results using the CASIA V4 and IIT Delhi V1 iris image databases showed that the proposed method achieved competitive performance in both recognition accuracy and computational efficiency.
Sparse Covariance Matrix Estimation With Eigenvalue Constraints
LIU, Han; WANG, Lie; ZHAO, Tuo
2014-01-01
We propose a new approach for estimating high-dimensional, positive-definite covariance matrices. Our method extends the generalized thresholding operator by adding an explicit eigenvalue constraint. The estimated covariance matrix simultaneously achieves sparsity and positive definiteness. The estimator is rate optimal in the minimax sense and we develop an efficient iterative soft-thresholding and projection algorithm based on the alternating direction method of multipliers. Empirically, we conduct thorough numerical experiments on simulated datasets as well as real data examples to illustrate the usefulness of our method. Supplementary materials for the article are available online. PMID:25620866
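The thresholding ingredient of such an estimator can be sketched in a few lines. The example below applies elementwise soft-thresholding to the off-diagonal entries of a sample covariance matrix; it deliberately omits the eigenvalue constraint and the alternating direction iterations described in the abstract, so it is only a partial illustration. The function names, the matrix `S`, and the threshold value are assumptions.

```python
def soft_threshold(x, lam):
    """Shrink x toward zero by lam; values with |x| <= lam become exactly 0."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0

def threshold_covariance(S, lam):
    """Soft-threshold the off-diagonal entries of a sample covariance matrix."""
    n = len(S)
    return [[S[i][j] if i == j else soft_threshold(S[i][j], lam)
             for j in range(n)]
            for i in range(n)]

# Illustrative sample covariance; small off-diagonal entries are set to zero.
S = [[1.0, 0.5, 0.125],
     [0.5, 1.0, -0.75],
     [0.125, -0.75, 1.0]]
sparse_S = threshold_covariance(S, 0.25)
```

Thresholding alone does not guarantee a positive-definite result, which is the gap the explicit eigenvalue constraint in the abstract is meant to close.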
Disconnected Diagrams in Lattice QCD
NASA Astrophysics Data System (ADS)
Gambhir, Arjun Singh
In this work, we present state-of-the-art numerical methods and their applications for computing a particular class of observables using lattice quantum chromodynamics (Lattice QCD), a discretized version of the fundamental theory of quarks and gluons. These observables require calculating so-called "disconnected diagrams" and are important for understanding many aspects of hadron structure, such as the strange content of the proton. We begin by introducing the reader to the key concepts of Lattice QCD and rigorously define the meaning of disconnected diagrams through an example of the Wick contractions of the nucleon. Subsequently, the calculation of observables requiring disconnected diagrams is posed as the computationally challenging problem of finding the trace of the inverse of an incredibly large, sparse matrix. This is followed by a brief primer of numerical sparse matrix techniques that overviews broadly used methods in Lattice QCD and builds the background for the novel algorithm presented in this work. We then introduce singular value deflation as a method to improve convergence of trace estimation and analyze its effects on matrices from a variety of fields, including chemical transport modeling, magnetohydrodynamics, and QCD. Finally, we apply this method to compute observables such as the strange axial charge of the proton and strange sigma terms in light nuclei. The work in this thesis is innovative for four reasons. First, we analyze the effects of deflation with a model that makes qualitative predictions about its effectiveness, taking only the singular value spectrum as input, and compare deflated variance with different types of trace estimator noise. Second, the synergy between probing methods and deflation is investigated both experimentally and theoretically. Third, we use the synergistic combination of deflation and a graph coloring algorithm known as hierarchical probing to conduct a lattice calculation of light disconnected matrix elements of the nucleon at two different values of the lattice spacing. Finally, we employ these algorithms to do a high-precision study of strange sigma terms in light nuclei; to our knowledge this is the first calculation of its kind from Lattice QCD.
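The basic stochastic estimate of the trace of a matrix inverse can be sketched as follows. This is the plain Hutchinson estimator with random ±1 probe vectors and a conjugate gradient solve per probe; it does not include the singular value deflation or hierarchical probing developed in the thesis. The small tridiagonal matrix, the function names, and the probe count are assumptions chosen so that the example runs quickly.

```python
import random

def matvec(rows, x):
    """Multiply a sparse matrix (list of {column: value} rows) by a vector."""
    return [sum(val * x[j] for j, val in row.items()) for row in rows]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def conjugate_gradient(rows, b, tol=1e-12, max_iter=1000):
    """Solve A x = b for a symmetric positive-definite sparse A."""
    x = [0.0] * len(b)
    r = list(b)
    p = list(r)
    rs = dot(r, r)
    for _ in range(max_iter):
        if rs <= tol * tol:
            break
        Ap = matvec(rows, p)
        alpha = rs / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x

def hutchinson_trace_inverse(rows, num_probes, seed=0):
    """Estimate tr(A^-1) as the average of z^T A^-1 z over random +/-1 vectors."""
    rng = random.Random(seed)
    n = len(rows)
    total = 0.0
    for _ in range(num_probes):
        z = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        total += dot(z, conjugate_gradient(rows, z))
    return total / num_probes

# Tridiagonal SPD test matrix with 4 on the diagonal and -1 next to it.
n = 50
A = [{i: 4.0} for i in range(n)]
for i in range(n - 1):
    A[i][i + 1] = -1.0
    A[i + 1][i] = -1.0

estimate = hutchinson_trace_inverse(A, num_probes=200)
```

Each probe costs one linear solve, so the variance of the estimator directly controls the total cost; reducing that variance is the role of techniques such as deflation and probing.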
Disconnected Diagrams in Lattice QCD
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gambhir, Arjun
In this work, we present state-of-the-art numerical methods and their applications for computing a particular class of observables using lattice quantum chromodynamics (Lattice QCD), a discretized version of the fundamental theory of quarks and gluons. These observables require calculating so-called "disconnected diagrams" and are important for understanding many aspects of hadron structure, such as the strange content of the proton. We begin by introducing the reader to the key concepts of Lattice QCD and rigorously define the meaning of disconnected diagrams through an example of the Wick contractions of the nucleon. Subsequently, the calculation of observables requiring disconnected diagrams is posed as the computationally challenging problem of finding the trace of the inverse of an incredibly large, sparse matrix. This is followed by a brief primer of numerical sparse matrix techniques that overviews broadly used methods in Lattice QCD and builds the background for the novel algorithm presented in this work. We then introduce singular value deflation as a method to improve convergence of trace estimation and analyze its effects on matrices from a variety of fields, including chemical transport modeling, magnetohydrodynamics, and QCD. Finally, we apply this method to compute observables such as the strange axial charge of the proton and strange sigma terms in light nuclei. The work in this thesis is innovative for four reasons. First, we analyze the effects of deflation with a model that makes qualitative predictions about its effectiveness, taking only the singular value spectrum as input, and compare deflated variance with different types of trace estimator noise. Second, the synergy between probing methods and deflation is investigated both experimentally and theoretically. Third, we use the synergistic combination of deflation and a graph coloring algorithm known as hierarchical probing to conduct a lattice calculation of light disconnected matrix elements of the nucleon at two different values of the lattice spacing. Finally, we employ these algorithms to do a high-precision study of strange sigma terms in light nuclei; to our knowledge this is the first calculation of its kind from Lattice QCD.
Application of a sparseness constraint in multivariate curve resolution - Alternating least squares.
Hugelier, Siewert; Piqueras, Sara; Bedia, Carmen; de Juan, Anna; Ruckebusch, Cyril
2018-02-13
The use of sparseness in chemometrics is a concept that has increased in popularity. The advantage is, above all, a better interpretability of the results obtained. In this work, sparseness is implemented as a constraint in multivariate curve resolution - alternating least squares (MCR-ALS), which aims at reproducing raw (mixed) data by a bilinear model of chemically meaningful profiles. In many cases, the mixed raw data analyzed are not sparse by nature, but their decomposition profiles can be, as is the case for some instrumental responses, such as mass spectra, or for concentration profiles linked to scattered distribution maps of powdered samples in hyperspectral images. To induce sparseness in the constrained profiles, one-dimensional and/or two-dimensional numerical arrays can be fitted using a basis of Gaussian functions with a penalty on the coefficients. In this work, a least squares regression framework with an L0-norm penalty is applied. This L0-norm penalty constrains the number of non-null coefficients in the fit of the constrained array without requiring a priori knowledge of their number or positions. It has been shown that the sparseness constraint induces the suppression of values linked to uninformative channels and noise in MS spectra and improves the location of scattered compounds in distribution maps, resulting in a better interpretability of the constrained profiles. An additional benefit of the sparseness constraint is a lower ambiguity in the bilinear model, since the predominance of null coefficients in the constrained profiles also helps to limit the solutions for the profiles in the counterpart matrix of the MCR bilinear model. Copyright © 2017 Elsevier B.V. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pinski, Peter; Riplinger, Christoph; Valeev, Edward F; Neese, Frank
2015-07-21
In this work, a systematic infrastructure is described that formalizes concepts implicit in previous work and greatly simplifies computer implementation of reduced-scaling electronic structure methods. The key concept is sparse representation of tensors using chains of sparse maps between two index sets. Sparse map representation can be viewed as a generalization of compressed sparse row, a common representation of a sparse matrix, to tensor data. By combining a few elementary operations on sparse maps (inversion, chaining, intersection, etc.), complex algorithms can be developed, illustrated here by a linear-scaling transformation of three-center Coulomb integrals based on our compact code library that implements sparse maps and operations on them. The sparsity of the three-center integrals arises from spatial locality of the basis functions and domain density fitting approximation. A novel feature of our approach is the use of differential overlap integrals computed in linear-scaling fashion for screening products of basis functions. Finally, a robust linear scaling domain based local pair natural orbital second-order Møller-Plesset (DLPNO-MP2) method is described based on the sparse map infrastructure that only depends on a minimal number of cutoff parameters that can be systematically tightened to approach 100% of the canonical MP2 correlation energy. With default truncation thresholds, DLPNO-MP2 recovers more than 99.9% of the canonical resolution of the identity MP2 (RI-MP2) energy while still showing a very early crossover with respect to the computational effort. Based on extensive benchmark calculations, relative energies are reproduced with an error of typically <0.2 kcal/mol. The efficiency of the local MP2 (LMP2) method can be drastically improved by carrying out the LMP2 iterations in a basis of pair natural orbitals. While the present work focuses on local electron correlation, it is of much broader applicability to computation with sparse tensors in quantum chemistry and beyond.
Pinski, Peter; Riplinger, Christoph; Valeev, Edward F; Neese, Frank
2015-07-21
In this work, a systematic infrastructure is described that formalizes concepts implicit in previous work and greatly simplifies computer implementation of reduced-scaling electronic structure methods. The key concept is sparse representation of tensors using chains of sparse maps between two index sets. Sparse map representation can be viewed as a generalization of compressed sparse row, a common representation of a sparse matrix, to tensor data. By combining a few elementary operations on sparse maps (inversion, chaining, intersection, etc.), complex algorithms can be developed, illustrated here by a linear-scaling transformation of three-center Coulomb integrals based on our compact code library that implements sparse maps and operations on them. The sparsity of the three-center integrals arises from spatial locality of the basis functions and domain density fitting approximation. A novel feature of our approach is the use of differential overlap integrals computed in linear-scaling fashion for screening products of basis functions. Finally, a robust linear scaling domain based local pair natural orbital second-order Møller-Plesset (DLPNO-MP2) method is described based on the sparse map infrastructure that only depends on a minimal number of cutoff parameters that can be systematically tightened to approach 100% of the canonical MP2 correlation energy. With default truncation thresholds, DLPNO-MP2 recovers more than 99.9% of the canonical resolution of the identity MP2 (RI-MP2) energy while still showing a very early crossover with respect to the computational effort. Based on extensive benchmark calculations, relative energies are reproduced with an error of typically <0.2 kcal/mol. The efficiency of the local MP2 (LMP2) method can be drastically improved by carrying out the LMP2 iterations in a basis of pair natural orbitals. While the present work focuses on local electron correlation, it is of much broader applicability to computation with sparse tensors in quantum chemistry and beyond.
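The view of a sparse map as a generalization of compressed sparse row (CSR) can be illustrated with a minimal sketch. It is not the authors' library: the type name `SparseMap` and the functions `csr_to_sparse_map`, `invert`, `chain` and `intersect` are hypothetical names chosen here, and a sparse map is assumed to be nothing more than a dictionary from an index to the set of indices it is related to.

```python
from typing import Dict, List, Set

# Hypothetical representation: index i -> set of indices j that i maps to.
SparseMap = Dict[int, Set[int]]


def csr_to_sparse_map(indptr: List[int], indices: List[int]) -> SparseMap:
    """Read the sparsity pattern of a CSR matrix as a map row -> {columns}."""
    return {
        row: set(indices[indptr[row]:indptr[row + 1]])
        for row in range(len(indptr) - 1)
        if indptr[row + 1] > indptr[row]  # skip empty rows
    }


def invert(m: SparseMap) -> SparseMap:
    """Inversion: j -> {i} for every pair i -> j in m."""
    out: SparseMap = {}
    for i, js in m.items():
        for j in js:
            out.setdefault(j, set()).add(i)
    return out


def chain(first: SparseMap, second: SparseMap) -> SparseMap:
    """Chaining: i -> k whenever i -> j in `first` and j -> k in `second`."""
    out: SparseMap = {}
    for i, js in first.items():
        ks: Set[int] = set()
        for j in js:
            ks |= second.get(j, set())
        if ks:
            out[i] = ks
    return out


def intersect(a: SparseMap, b: SparseMap) -> SparseMap:
    """Intersection: keep only the pairs i -> j present in both maps."""
    out: SparseMap = {}
    for i in a.keys() & b.keys():
        common = a[i] & b[i]
        if common:
            out[i] = common
    return out
```

For a 3 x 3 matrix with nonzeros at (0,0), (0,2) and (1,1), `csr_to_sparse_map([0, 2, 3, 3], [0, 2, 1])` gives the row-to-column map, and `invert` gives the corresponding column-to-row map, which is the pattern of the compressed sparse column form.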
JiTTree: A Just-in-Time Compiled Sparse GPU Volume Data Structure.
Labschütz, Matthias; Bruckner, Stefan; Gröller, M Eduard; Hadwiger, Markus; Rautek, Peter
2016-01-01
Sparse volume data structures enable the efficient representation of large but sparse volumes in GPU memory for computation and visualization. However, the choice of a specific data structure for a given data set depends on several factors, such as the memory budget, the sparsity of the data, and data access patterns. In general, there is no single optimal sparse data structure, but a set of several candidates with individual strengths and drawbacks. One solution to this problem are hybrid data structures which locally adapt themselves to the sparsity. However, they typically suffer from increased traversal overhead which limits their utility in many applications. This paper presents JiTTree, a novel sparse hybrid volume data structure that uses just-in-time compilation to overcome these problems. By combining multiple sparse data structures and reducing traversal overhead we leverage their individual advantages. We demonstrate that hybrid data structures adapt well to a large range of data sets. They are especially superior to other sparse data structures for data sets that locally vary in sparsity. Possible optimization criteria are memory, performance and a combination thereof. Through just-in-time (JIT) compilation, JiTTree reduces the traversal overhead of the resulting optimal data structure. As a result, our hybrid volume data structure enables efficient computations on the GPU, while being superior in terms of memory usage when compared to non-hybrid data structures.
Sparse Tensor Decomposition for Haplotype Assembly of Diploids and Polyploids.
Hashemi, Abolfazl; Zhu, Banghua; Vikalo, Haris
2018-03-21
Haplotype assembly is the task of reconstructing haplotypes of an individual from a mixture of sequenced chromosome fragments. Haplotype information enables studies of the effects of genetic variations on an organism's phenotype. Most of the mathematical formulations of haplotype assembly are known to be NP-hard and haplotype assembly becomes even more challenging as the sequencing technology advances and the length of the paired-end reads and inserts increases. Assembly of haplotypes of polyploid organisms is considerably more difficult than in the case of diploids. Hence, scalable and accurate schemes with provable performance are desired for haplotype assembly of both diploid and polyploid organisms. We propose a framework that formulates haplotype assembly from sequencing data as a sparse tensor decomposition. We cast the problem as that of decomposing a tensor having special structural constraints and missing a large fraction of its entries into a product of two factors, U and [Formula: see text]; tensor [Formula: see text] reveals haplotype information while U is a sparse matrix encoding the origin of erroneous sequencing reads. An algorithm, AltHap, which reconstructs haplotypes of either diploid or polyploid organisms by iteratively solving this decomposition problem is proposed. The performance and convergence properties of AltHap are theoretically analyzed and, in doing so, guarantees on the achievable minimum error correction scores and correct phasing rate are established. The developed framework is applicable to diploid, biallelic and polyallelic polyploid species. The code for AltHap is freely available from https://github.com/realabolfazl/AltHap. AltHap was tested in a number of different scenarios and was shown to compare favorably to state-of-the-art methods in applications to haplotype assembly of diploids, and to significantly outperform existing techniques when applied to haplotype assembly of polyploids.
Clutter Mitigation in Echocardiography Using Sparse Signal Separation
Yavneh, Irad
2015-01-01
In ultrasound imaging, clutter artifacts degrade images and may cause inaccurate diagnosis. In this paper, we apply a method called Morphological Component Analysis (MCA) for sparse signal separation with the objective of reducing such clutter artifacts. The MCA approach assumes that the two signals in the additive mix each have a sparse representation under some dictionary of atoms (a matrix), and separation is achieved by finding these sparse representations. In our work, an adaptive approach is used for learning the dictionary from the echo data. MCA is compared to Singular Value Filtering (SVF), a Principal Component Analysis- (PCA-) based filtering technique, and to a high-pass Finite Impulse Response (FIR) filter. Each filter is applied to a simulated hypoechoic lesion sequence, as well as experimental cardiac ultrasound data. MCA is demonstrated in both cases to outperform the FIR filter and obtain results comparable to the SVF method in terms of contrast-to-noise ratio (CNR). Furthermore, MCA shows a lower impact on tissue sections while removing the clutter artifacts. In experimental heart data, MCA achieves clutter mitigation with an average CNR improvement of 1.33 dB. PMID:26199622
Fast and Accurate Simulation Technique for Large Irregular Arrays
NASA Astrophysics Data System (ADS)
Bui-Van, Ha; Abraham, Jens; Arts, Michel; Gueuning, Quentin; Raucy, Christopher; Gonzalez-Ovejero, David; de Lera Acedo, Eloy; Craeye, Christophe
2018-04-01
A fast full-wave simulation technique is presented for the analysis of large irregular planar arrays of identical 3-D metallic antennas. The solution method relies on the Macro Basis Functions (MBF) approach and an interpolatory technique to compute the interactions between MBFs. The Harmonic-polynomial (HARP) model is established for the near-field interactions in a modified system of coordinates. For extremely large arrays made of complex antennas, two approaches assuming a limited radius of influence for mutual coupling are considered: one is based on a sparse-matrix LU decomposition and the other one on a tessellation of the array in the form of overlapping sub-arrays. The computation of all embedded element patterns is sped up with the help of the non-uniform FFT algorithm. Extensive validations are shown for arrays of log-periodic antennas envisaged for the low-frequency SKA (Square Kilometer Array) radio-telescope. The analysis of SKA stations with such a large number of elements has not been treated yet in the literature. Validations include comparison with results obtained with commercial software and with experiments. The proposed method is particularly well suited to array synthesis, in which several orders of magnitude can be saved in terms of computation time.
PLATSIM: An efficient linear simulation and analysis package for large-order flexible systems
NASA Technical Reports Server (NTRS)
Maghami, Periman; Kenny, Sean P.; Giesy, Daniel P.
1995-01-01
PLATSIM is a software package designed to provide efficient time and frequency domain analysis of large-order generic space platforms implemented with any linear time-invariant control system. Time domain analysis provides simulations of the overall spacecraft response levels due to either onboard or external disturbances. The time domain results can then be processed by the jitter analysis module to assess the spacecraft's pointing performance in a computationally efficient manner. The resulting jitter analysis algorithms have produced an increase in speed of several orders of magnitude over the brute force approach of sweeping minima and maxima. Frequency domain analysis produces frequency response functions for uncontrolled and controlled platform configurations. The latter represents an enabling technology for large-order flexible systems. PLATSIM uses a sparse matrix formulation for the spacecraft dynamics model which makes both the time and frequency domain operations quite efficient, particularly when a large number of modes are required to capture the true dynamics of the spacecraft. The package is written in MATLAB script language. A graphical user interface (GUI) is included in the PLATSIM software package. This GUI uses MATLAB's Handle graphics to provide a convenient way for setting simulation and analysis parameters.
Improved Personalized Recommendation Based on Causal Association Rule and Collaborative Filtering
ERIC Educational Resources Information Center
Lei, Wu; Qing, Fang; Zhou, Jin
2016-01-01
User evaluations of resources on a recommender system are usually limited, which causes an extremely sparse user rating matrix, and this greatly reduces the accuracy of personalized recommendation, especially for new users or new items. This paper presents a recommendation method based on rating prediction using causal association rules.…
IMPLEMENTATION OF THE SMOKE EMISSION DATA PROCESSOR AND SMOKE TOOL INPUT DATA PROCESSOR IN MODELS-3
The U.S. Environmental Protection Agency has implemented Version 1.3 of SMOKE (Sparse Matrix Object Kernel Emission) processor for preparation of area, mobile, point, and biogenic sources emission data within Version 4.1 of the Models-3 air quality modeling framework. The SMOK...
Optimal Chebyshev polynomials on ellipses in the complex plane
NASA Technical Reports Server (NTRS)
Fischer, Bernd; Freund, Roland
1989-01-01
The design of iterative schemes for sparse matrix computations often leads to constrained polynomial approximation problems on sets in the complex plane. For the case of ellipses, we introduce a new class of complex polynomials which are in general very good approximations to the best polynomials and even optimal in most cases.
The Models-3 Community Multi-scale Air Quality (CMAQ) model, first released by the USEPA in 1999 (Byun and Ching. 1999), continues to be developed and evaluated. The principal components of the CMAQ system include a comprehensive emission processor known as the Sparse Matrix O...
Three-Dimensional Inverse Transport Solver Based on Compressive Sensing Technique
NASA Astrophysics Data System (ADS)
Cheng, Yuxiong; Wu, Hongchun; Cao, Liangzhi; Zheng, Youqi
2013-09-01
A compressive sensing-based method for the three-dimensional inverse transport problem is presented, based on direct exposure measurements from flash radiographic images. The linear absorption coefficients and interface locations of objects are reconstructed directly at the same time. It is always very expensive to obtain enough measurements. With limited measurements, the compressive sensing sparse reconstruction technique of orthogonal matching pursuit is applied to obtain the sparse coefficients by solving an optimization problem. A three-dimensional inverse transport solver is developed based on a compressive sensing-based technique. There are three features in this solver: (1) AutoCAD is employed as a geometry preprocessor due to its powerful graphics capabilities. (2) The forward projection matrix rather than a Gauss matrix is constructed by the visualization tool generator. (3) Fourier transform and Daubechies wavelet transform are adopted to convert an underdetermined system to a well-posed system in the algorithm. Simulations are performed, and numerical results for the pseudo-sine absorption problem, the two-cube problem and the two-cylinder problem obtained with the compressive sensing-based solver agree well with the reference values.
Optimization of Sparse Matrix-Vector Multiplication on Emerging Multicore Platforms
DOE Office of Scientific and Technical Information (OSTI.GOV)
Williams, Samuel; Oliker, Leonid; Vuduc, Richard
2008-10-16
We are witnessing a dramatic change in computer architecture due to the multicore paradigm shift, as every electronic device from cell phones to supercomputers confronts parallelism of unprecedented scale. To fully unleash the potential of these systems, the HPC community must develop multicore-specific optimization methodologies for important scientific computations. In this work, we examine sparse matrix-vector multiply (SpMV) - one of the most heavily used kernels in scientific computing - across a broad spectrum of multicore designs. Our experimental platform includes the homogeneous AMD quad-core, AMD dual-core, and Intel quad-core designs, the heterogeneous STI Cell, as well as one of the first scientific studies of the highly multithreaded Sun Victoria Falls (a Niagara2 SMP). We present several optimization strategies especially effective for the multicore environment, and demonstrate significant performance improvements compared to existing state-of-the-art serial and parallel SpMV implementations. Additionally, we present key insights into the architectural trade-offs of leading multicore design strategies, in the context of demanding memory-bound numerical algorithms.
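For reference, the SpMV kernel discussed above can be sketched in its plain, unoptimized form. This is a baseline over the common compressed sparse row (CSR) layout, which is an assumption made here; it is not one of the tuned implementations from the study. The function name `spmv_csr` and its argument names are hypothetical.

```python
from typing import List


def spmv_csr(indptr: List[int], indices: List[int],
             data: List[float], x: List[float]) -> List[float]:
    """Compute y = A @ x for a matrix A stored in CSR form.

    indptr[r]..indptr[r+1] delimits the nonzeros of row r;
    indices[k] is the column and data[k] the value of nonzero k.
    """
    n_rows = len(indptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        for k in range(indptr[row], indptr[row + 1]):
            # One multiply-add per nonzero, with an indirect read of x.
            acc += data[k] * x[indices[k]]
        y[row] = acc
    return y


# A = [[4, 0, 1],
#      [0, 3, 0],
#      [2, 0, 5]]
y = spmv_csr([0, 2, 3, 5], [0, 2, 1, 0, 2],
             [4.0, 1.0, 3.0, 2.0, 5.0], [1.0, 2.0, 3.0])
```

Each nonzero costs two floating-point operations but needs a value, a column index and an indirectly addressed entry of `x` to be read, which is one way to see why the kernel is described as memory-bound.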
Cross-correlation matrix analysis of Chinese and American bank stocks in subprime crisis
NASA Astrophysics Data System (ADS)
Zhu, Shi-Zhao; Li, Xin-Li; Nie, Sen; Zhang, Wen-Qing; Yu, Gao-Feng; Han, Xiao-Pu; Wang, Bing-Hong
2015-05-01
In order to study the universality of the interactions among different markets, we analyze the cross-correlation matrix of the price of the Chinese and American bank stocks. We then find that the stock prices of the emerging market are more correlated than that of the developed market. Considering that the values of the components for the eigenvector may be positive or negative, we analyze the differences between two markets in combination with the endogenous and exogenous events which influence the financial markets. We find that the sparse pattern of components of eigenvectors out of the threshold value has no change in American bank stocks before and after the subprime crisis. However, it changes from sparse to dense for Chinese bank stocks. By using the threshold value to exclude the external factors, we simulate the interactions in financial markets. Project supported by the National Natural Science Foundation of China (Grant Nos. 11275186, 91024026, and FOM2014OF001) and the University of Shanghai for Science and Technology (USST) of Humanities and Social Sciences, China (Grant Nos. USST13XSZ05 and 11YJA790231).
Deterministic matrices matching the compressed sensing phase transitions of Gaussian random matrices
Monajemi, Hatef; Jafarpour, Sina; Gavish, Matan; Donoho, David L.; Ambikasaran, Sivaram; Bacallado, Sergio; Bharadia, Dinesh; Chen, Yuxin; Choi, Young; Chowdhury, Mainak; Chowdhury, Soham; Damle, Anil; Fithian, Will; Goetz, Georges; Grosenick, Logan; Gross, Sam; Hills, Gage; Hornstein, Michael; Lakkam, Milinda; Lee, Jason; Li, Jian; Liu, Linxi; Sing-Long, Carlos; Marx, Mike; Mittal, Akshay; Monajemi, Hatef; No, Albert; Omrani, Reza; Pekelis, Leonid; Qin, Junjie; Raines, Kevin; Ryu, Ernest; Saxe, Andrew; Shi, Dai; Siilats, Keith; Strauss, David; Tang, Gary; Wang, Chaojun; Zhou, Zoey; Zhu, Zhen
2013-01-01
In compressed sensing, one takes $n < N$ samples of an N-dimensional vector $x_0$ using an $n \times N$ matrix A, obtaining undersampled measurements $y = A x_0$. For random matrices with independent standard Gaussian entries, it is known that, when $x_0$ is k-sparse, there is a precisely determined phase transition: for a certain region in the $(k/n, n/N)$-phase diagram, convex optimization typically finds the sparsest solution, whereas outside that region, it typically fails. It has been shown empirically that the same property—with the same phase transition location—holds for a wide range of non-Gaussian random matrix ensembles. We report extensive experiments showing that the Gaussian phase transition also describes numerous deterministic matrices, including Spikes and Sines, Spikes and Noiselets, Paley Frames, Delsarte-Goethals Frames, Chirp Sensing Matrices, and Grassmannian Frames. Namely, for each of these deterministic matrices in turn, for a typical k-sparse object, we observe that convex optimization is successful over a region of the phase diagram that coincides with the region known for Gaussian random matrices. Our experiments considered coefficients constrained to $X^N$ for four different sets $X \in \{[0,1], \mathbb{R}_+, \mathbb{R}, \mathbb{C}\}$, and the results establish our finding for each of the four associated phase transitions. PMID:23277588
Meng, Yuguang; Lei, Hao
2010-06-01
An efficient iterative gridding reconstruction method with correction of off-resonance artifacts was developed, which is especially tailored for multiple-shot non-Cartesian imaging. The novelty of the method lies in that the transformation matrix for gridding (T) was constructed as the convolution of two sparse matrices, among which the former is determined by the sampling interval and the spatial distribution of the off-resonance frequencies and the latter by the sampling trajectory and the target grid in the Cartesian space. The resulting T matrix is also sparse and can be solved efficiently with the iterative conjugate gradient algorithm. It was shown that, with the proposed method, the reconstruction speed in multiple-shot non-Cartesian imaging can be improved significantly while retaining high reconstruction fidelity. More important, the method proposed allows tradeoff between the accuracy and the computation time of reconstruction, making customization of the use of such a method in different applications possible. The performance of the proposed method was demonstrated by numerical simulation and multiple-shot spiral imaging on rat brain at 4.7 T. (c) 2010 Wiley-Liss, Inc.
Sequential time interleaved random equivalent sampling for repetitive signal.
Zhao, Yijiu; Liu, Jingjing
2016-12-01
Compressed sensing (CS) based sampling techniques exhibit many advantages over other existing approaches for sparse signal spectrum sensing; they are also incorporated into non-uniform sampling signal reconstruction, such as random equivalent sampling (RES), to improve efficiency. However, in CS based RES, only one sample of each acquisition is considered in the signal reconstruction stage, which results in more acquisition runs and a longer sampling time. In this paper, a sampling sequence is taken in each RES acquisition run, and the corresponding block measurement matrix is constructed using a Whittaker-Shannon interpolation formula. All the block matrices are combined into an equivalent measurement matrix with respect to all sampling sequences. We implemented the proposed approach with a multi-core analog-to-digital converter (ADC), whose ADC cores are time interleaved. A prototype realization of this proposed CS based sequential random equivalent sampling method has been developed. It is able to capture an analog waveform at an equivalent sampling rate of 40 GHz while sampled at 1 GHz physically. Experiments indicate that, for a sparse signal, the proposed CS based sequential random equivalent sampling exhibits high efficiency.
Petrov, Andrii Y; Herbst, Michael; Andrew Stenger, V
2017-08-15
Rapid whole-brain dynamic Magnetic Resonance Imaging (MRI) is of particular interest in Blood Oxygen Level Dependent (BOLD) functional MRI (fMRI). Faster acquisitions with higher temporal sampling of the BOLD time-course provide several advantages including increased sensitivity in detecting functional activation, the possibility of filtering out physiological noise for improving temporal SNR, and freezing out head motion. Generally, faster acquisitions require undersampling of the data which results in aliasing artifacts in the object domain. A recently developed low-rank (L) plus sparse (S) matrix decomposition model (L+S) is one of the methods that has been introduced to reconstruct images from undersampled dynamic MRI data. The L+S approach assumes that the dynamic MRI data, represented as a space-time matrix M, is a linear superposition of L and S components, where L represents highly spatially and temporally correlated elements, such as the image background, while S captures dynamic information that is sparse in an appropriate transform domain. This suggests that L+S might be suited for undersampled task or slow event-related fMRI acquisitions because the periodic nature of the BOLD signal is sparse in the temporal Fourier transform domain and slowly varying low-rank brain background signals, such as physiological noise and drift, will be predominantly low-rank. In this work, as a proof of concept, we exploit the L+S method for accelerating block-design fMRI using a 3D stack of spirals (SoS) acquisition where undersampling is performed in the k_z-t domain. We examined the feasibility of the L+S method to accurately separate temporally correlated brain background information in the L component while capturing periodic BOLD signals in the S component. We present results acquired in control human volunteers at 3 T for both retrospectively and prospectively acquired fMRI data for a visual activation block-design task. We show that a SoS fMRI acquisition with an acceleration of four and L+S reconstruction can achieve a brain coverage of 40 slices at 2 mm isotropic resolution and 64 x 64 matrix size every 500 ms. Copyright © 2017 Elsevier Inc. All rights reserved.
An efficient optical architecture for sparsely connected neural networks
NASA Technical Reports Server (NTRS)
Hine, Butler P., III; Downie, John D.; Reid, Max B.
1990-01-01
An architecture for a general-purpose optical neural network processor is presented in which the interconnections and weights are formed by directing coherent beams holographically, thereby making use of the space-bandwidth products of the recording medium for sparsely interconnected networks more efficiently than the commonly used vector-matrix multiplier, since all of the hologram area is in use. An investigation is made of the use of computer-generated holograms recorded on such updatable media as thermoplastic materials, in order to define the interconnections and weights of a neural network processor; attention is given to limits on interconnection densities, diffraction efficiencies, and weighting accuracies possible with such an updatable thin film holographic device.
Face recognition based on two-dimensional discriminant sparse preserving projection
NASA Astrophysics Data System (ADS)
Zhang, Dawei; Zhu, Shanan
2018-04-01
In this paper, a supervised dimensionality reduction algorithm named two-dimensional discriminant sparse preserving projection (2DDSPP) is proposed for face recognition. In order to accurately model manifold structure of data, 2DDSPP constructs within-class affinity graph and between-class affinity graph by the constrained least squares (LS) and l1 norm minimization problem, respectively. Based on directly operating on image matrix, 2DDSPP integrates graph embedding (GE) with Fisher criterion. The obtained projection subspace preserves within-class neighborhood geometry structure of samples, while keeping away samples from different classes. The experimental results on the PIE and AR face databases show that 2DDSPP can achieve better recognition performance.
Preconditioned conjugate gradient wave-front reconstructors for multiconjugate adaptive optics
NASA Astrophysics Data System (ADS)
Gilles, Luc; Ellerbroek, Brent L.; Vogel, Curtis R.
2003-09-01
Multiconjugate adaptive optics (MCAO) systems with $10^4$-$10^5$ degrees of freedom have been proposed for future giant telescopes. Using standard matrix methods to compute, optimize, and implement wave-front control algorithms for these systems is impractical, since the number of calculations required to compute and apply the reconstruction matrix scales respectively with the cube and the square of the number of adaptive optics degrees of freedom. We develop scalable open-loop iterative sparse matrix implementations of minimum variance wave-front reconstruction for telescope diameters up to 32 m with more than $10^4$ actuators. The basic approach is the preconditioned conjugate gradient method with an efficient preconditioner, whose block structure is defined by the atmospheric turbulent layers very much like the layer-oriented MCAO algorithms of current interest. Two cost-effective preconditioners are investigated: a multigrid solver and a simpler block symmetric Gauss-Seidel (BSGS) sweep. Both options require off-line sparse Cholesky factorizations of the diagonal blocks of the matrix system. The cost to precompute these factors scales approximately as the three-halves power of the number of estimated phase grid points per atmospheric layer, and their average update rate is typically of the order of $10^{-2}$ Hz, i.e., 4-5 orders of magnitude lower than the typical $10^3$ Hz temporal sampling rate. All other computations scale almost linearly with the total number of estimated phase grid points. We present numerical simulation results to illustrate algorithm convergence. Convergence rates of both preconditioners are similar, regardless of measurement noise level, indicating that the layer-oriented BSGS sweep is as effective as the more elaborate multiresolution preconditioner.
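The preconditioned conjugate gradient iteration named above can be sketched on a toy problem. Several things here are assumptions rather than the paper's method: the preconditioner is a pointwise symmetric Gauss-Seidel sweep (a stand-in for the block version, with no Cholesky-factored diagonal blocks), the matrix is a small 1-D Laplacian rather than a wave-front reconstruction system, and the names `laplacian_1d`, `sgs_apply` and `pcg` are hypothetical.

```python
import math
from typing import Dict, List, Tuple

SparseRows = List[Dict[int, float]]  # row i -> {column j: value a_ij}


def laplacian_1d(n: int) -> SparseRows:
    """Toy symmetric positive definite test matrix (tridiagonal 2, -1)."""
    rows: SparseRows = []
    for i in range(n):
        row = {i: 2.0}
        if i > 0:
            row[i - 1] = -1.0
        if i < n - 1:
            row[i + 1] = -1.0
        rows.append(row)
    return rows


def dot(u: List[float], v: List[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def matvec(rows: SparseRows, x: List[float]) -> List[float]:
    return [sum(a * x[j] for j, a in row.items()) for row in rows]


def sgs_apply(rows: SparseRows, r: List[float]) -> List[float]:
    """Apply z = M^{-1} r with M = (D + L) D^{-1} (D + U)."""
    n = len(r)
    u = [0.0] * n
    for i in range(n):  # forward sweep: (D + L) u = r
        s = r[i] - sum(a * u[j] for j, a in rows[i].items() if j < i)
        u[i] = s / rows[i][i]
    z = [0.0] * n
    for i in reversed(range(n)):  # backward sweep: (D + U) z = D u
        s = rows[i][i] * u[i] - sum(a * z[j] for j, a in rows[i].items() if j > i)
        z[i] = s / rows[i][i]
    return z


def pcg(rows: SparseRows, b: List[float], tol: float = 1e-10,
        max_iter: int = 200) -> Tuple[List[float], int]:
    """Solve A x = b for symmetric positive definite A; returns (x, iterations)."""
    x = [0.0] * len(b)
    r = list(b)
    b_norm = math.sqrt(dot(b, b)) or 1.0
    if math.sqrt(dot(r, r)) <= tol * b_norm:
        return x, 0
    z = sgs_apply(rows, r)
    p = list(z)
    rz = dot(r, z)
    for it in range(1, max_iter + 1):
        ap = matvec(rows, p)
        alpha = rz / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        if math.sqrt(dot(r, r)) <= tol * b_norm:
            return x, it
        z = sgs_apply(rows, r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter
```

Only matrix-vector products and triangular sweeps over the stored nonzeros appear in the loop, so the work per iteration grows with the number of nonzeros rather than with the square of the problem size.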
Wang, Guoli; Ebrahimi, Nader
2014-01-01
Non-negative matrix factorization (NMF) is a powerful machine learning method for decomposing a high-dimensional nonnegative matrix V into the product of two nonnegative matrices, W and H, such that V ∼ W H. It has been shown to have a parts-based, sparse representation of the data. NMF has been successfully applied in a variety of areas such as natural language processing, neuroscience, information retrieval, image processing, speech recognition and computational biology for the analysis and interpretation of large-scale data. There has also been simultaneous development of a related statistical latent class modeling approach, namely, probabilistic latent semantic indexing (PLSI), for analyzing and interpreting co-occurrence count data arising in natural language processing. In this paper, we present a generalized statistical approach to NMF and PLSI based on Renyi's divergence between two non-negative matrices, stemming from the Poisson likelihood. Our approach unifies various competing models and provides a unique theoretical framework for these methods. We propose a unified algorithm for NMF and provide a rigorous proof of monotonicity of multiplicative updates for W and H. In addition, we generalize the relationship between NMF and PLSI within this framework. We demonstrate the applicability and utility of our approach as well as its superior performance relative to existing methods using real-life and simulated document clustering data. PMID:25821345
Devarajan, Karthik; Wang, Guoli; Ebrahimi, Nader
2015-04-01
Non-negative matrix factorization (NMF) is a powerful machine learning method for decomposing a high-dimensional nonnegative matrix V into the product of two nonnegative matrices, W and H, such that V ∼ WH. It has been shown to have a parts-based, sparse representation of the data. NMF has been successfully applied in a variety of areas such as natural language processing, neuroscience, information retrieval, image processing, speech recognition and computational biology for the analysis and interpretation of large-scale data. There has also been simultaneous development of a related statistical latent class modeling approach, namely, probabilistic latent semantic indexing (PLSI), for analyzing and interpreting co-occurrence count data arising in natural language processing. In this paper, we present a generalized statistical approach to NMF and PLSI based on Renyi's divergence between two non-negative matrices, stemming from the Poisson likelihood. Our approach unifies various competing models and provides a unique theoretical framework for these methods. We propose a unified algorithm for NMF and provide a rigorous proof of monotonicity of multiplicative updates for W and H. In addition, we generalize the relationship between NMF and PLSI within this framework. We demonstrate the applicability and utility of our approach as well as its superior performance relative to existing methods using real-life and simulated document clustering data.
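As an illustration of multiplicative updates for W and H, the sketch below uses the standard Kullback-Leibler (Poisson-likelihood) updates of Lee and Seung. This is an assumption made for illustration: it is not the Renyi-divergence algorithm proposed in the paper, and the function names, the small constant `eps`, and the synthetic matrix are all hypothetical.

```python
import numpy as np

def kl_divergence(V, WH, eps=1e-12):
    """Generalized KL divergence D(V || WH), the Poisson-likelihood objective."""
    return np.sum(V * np.log((V + eps) / (WH + eps)) - V + WH)

def nmf_kl(V, rank, n_iter=200, seed=0, eps=1e-12):
    """NMF by KL multiplicative updates (a sketch, not the paper's algorithm)."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + 0.1         # strictly positive start
    H = rng.random((rank, m)) + 0.1
    history = []
    for _ in range(n_iter):
        WH = W @ H + eps
        # H update: scale each entry by a nonnegative ratio, so H stays nonnegative
        H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + eps)
        WH = W @ H + eps
        # W update, using the refreshed H
        W *= ((V / WH) @ H.T) / (H.sum(axis=1)[None, :] + eps)
        history.append(kl_divergence(V, W @ H))
    return W, H, history

# Synthetic nonnegative matrix of rank 3, used only as a demonstration input.
rng = np.random.default_rng(1)
V = rng.random((20, 3)) @ rng.random((3, 15))
W, H, history = nmf_kl(V, rank=3)
print(history[0], history[-1])
```

The recorded objective values make it easy to check empirically that the divergence does not increase from one sweep to the next, which is the property the abstract says is proved for its generalized updates.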
Covariance Matrix Estimation for the Cryo-EM Heterogeneity Problem
Katsevich, E.; Katsevich, A.; Singer, A.
2015-01-01
In cryo-electron microscopy (cryo-EM), a microscope generates a top view of a sample of randomly oriented copies of a molecule. The problem of single particle reconstruction (SPR) from cryo-EM is to use the resulting set of noisy two-dimensional projection images taken at unknown directions to reconstruct the three-dimensional (3D) structure of the molecule. In some situations, the molecule under examination exhibits structural variability, which poses a fundamental challenge in SPR. The heterogeneity problem is the task of mapping the space of conformational states of a molecule. It has been previously suggested that the leading eigenvectors of the covariance matrix of the 3D molecules can be used to solve the heterogeneity problem. Estimating the covariance matrix is challenging, since only projections of the molecules are observed, but not the molecules themselves. In this paper, we formulate a general problem of covariance estimation from noisy projections of samples. This problem has intimate connections with matrix completion problems and high-dimensional principal component analysis. We propose an estimator and prove its consistency. When there are finitely many heterogeneity classes, the spectrum of the estimated covariance matrix reveals the number of classes. The estimator can be found as the solution to a certain linear system. In the cryo-EM case, the linear operator to be inverted, which we term the projection covariance transform, is an important object in covariance estimation for tomographic problems involving structural variation. Inverting it involves applying a filter akin to the ramp filter in tomography. We design a basis in which this linear operator is sparse and thus can be tractably inverted despite its large size. We demonstrate via numerical experiments on synthetic datasets the robustness of our algorithm to high levels of noise. PMID:25699132
Discriminant WSRC for Large-Scale Plant Species Recognition.
Zhang, Shanwen; Zhang, Chuanlei; Zhu, Yihai; You, Zhuhong
2017-01-01
In sparse representation based classification (SRC) and weighted SRC (WSRC), it is time-consuming to solve the global sparse representation problem. A discriminant WSRC (DWSRC) is proposed for large-scale plant species recognition, consisting of two stages. Firstly, several subdictionaries are constructed by dividing the dataset into several similar classes, and a subdictionary is chosen by the maximum similarity between the test sample and the typical sample of each similar class. Secondly, the weighted sparse representation of the test image is calculated with respect to the chosen subdictionary, and then the leaf category is assigned through the minimum reconstruction error. Different from the traditional SRC and its improved approaches, we sparsely represent the test sample on a subdictionary whose base elements are the training samples of the selected similar class, instead of using the generic overcomplete dictionary on the entire training samples. Thus, the complexity of solving the sparse representation problem is reduced. Moreover, DWSRC is adapted to newly added leaf species without rebuilding the dictionary. Experimental results on the ICL plant leaf database show that the method has low computational complexity and high recognition rate and can be clearly interpreted.
SAMBA: Sparse Approximation of Moment-Based Arbitrary Polynomial Chaos
NASA Astrophysics Data System (ADS)
Ahlfeld, R.; Belkouchi, B.; Montomoli, F.
2016-09-01
A new arbitrary Polynomial Chaos (aPC) method is presented for moderately high-dimensional problems characterised by limited input data availability. The proposed methodology improves the algorithm of aPC and extends the method, which was previously only introduced as a tensor product expansion, to moderately high-dimensional stochastic problems. The fundamental idea of aPC is to use the statistical moments of the input random variables to develop the polynomial chaos expansion. This approach provides the possibility to propagate continuous or discrete probability density functions and also histograms (data sets) as long as their moments exist, are finite and the determinant of the moment matrix is strictly positive. For cases with limited data availability, this approach avoids bias and fitting errors caused by wrong assumptions. In this work, an alternative way to calculate the aPC is suggested, which provides the optimal polynomials, Gaussian quadrature collocation points and weights from the moments using only a handful of matrix operations on the Hankel matrix of moments. It can therefore be implemented without requiring prior knowledge about statistical data analysis or a detailed understanding of the mathematics of polynomial chaos expansions. The extension to more input variables suggested in this work is an anisotropic and adaptive version of Smolyak's algorithm that is solely based on the moments of the input probability distributions. It is referred to as SAMBA (PC), which is short for Sparse Approximation of Moment-Based Arbitrary Polynomial Chaos. It is illustrated that for moderately high-dimensional problems (up to 20 different input variables or histograms) SAMBA can significantly simplify the calculation of sparse Gaussian quadrature rules. SAMBA's efficiency for multivariate functions with regard to data availability is further demonstrated by analysing higher order convergence and accuracy for a set of nonlinear test functions with 2, 5 and 10 different input distributions or histograms.
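One classical way to obtain Gaussian quadrature nodes and weights from moments with a few matrix operations on the Hankel matrix is the Golub-Welsch construction sketched below. Whether SAMBA follows exactly these steps is an assumption, and `quadrature_from_moments` is a hypothetical name. The Cholesky step succeeds only when the Hankel moment matrix is positive definite, which mirrors the positivity condition stated in the abstract.

```python
import numpy as np

def quadrature_from_moments(moments, n):
    """n-point Gaussian rule from raw moments m_0..m_{2n} (Golub-Welsch sketch)."""
    m = np.asarray(moments, dtype=float)
    if m.size < 2 * n + 1:
        raise ValueError("need moments m_0 .. m_{2n}")
    # Hankel matrix of moments: entry (i, j) is m_{i+j}
    hankel = np.array([[m[i + j] for j in range(n + 1)] for i in range(n + 1)])
    # Upper-triangular Cholesky factor R with hankel = R^T R
    R = np.linalg.cholesky(hankel).T
    # Three-term recurrence coefficients of the orthonormal polynomials
    alpha = np.empty(n)
    for j in range(n):
        prev = R[j - 1, j] / R[j - 1, j - 1] if j > 0 else 0.0
        alpha[j] = R[j, j + 1] / R[j, j] - prev
    beta = np.array([R[j + 1, j + 1] / R[j, j] for j in range(n - 1)])
    # Jacobi matrix: its eigenvalues are the nodes, and the squared first
    # eigenvector components (scaled by m_0) are the weights
    jacobi = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
    nodes, vecs = np.linalg.eigh(jacobi)
    weights = m[0] * vecs[0, :] ** 2
    return nodes, weights

# Moments of the uniform density 1/2 on [-1, 1]: m_k = 1/(k+1) for even k, 0 for odd k
uniform_moments = [1.0 / (k + 1) if k % 2 == 0 else 0.0 for k in range(5)]
nodes, weights = quadrature_from_moments(uniform_moments, 2)
print(nodes, weights)
```

For the uniform moments used here the two-point rule is the Gauss-Legendre rule rescaled to unit total weight, so the nodes are ±1/√3 with weights 1/2 each. The same function can be fed sample moments of a histogram, which is the data-driven setting the abstract describes.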
Locating multiple diffusion sources in time varying networks from sparse observations.
Hu, Zhao-Long; Shen, Zhesi; Cao, Shinan; Podobnik, Boris; Yang, Huijie; Wang, Wen-Xu; Lai, Ying-Cheng
2018-02-08
Data based source localization in complex networks has a broad range of applications. Despite recent progress, locating multiple diffusion sources in time varying networks remains an outstanding problem. Bridging structural observability and sparse signal reconstruction theories, we develop a general framework to locate diffusion sources in time varying networks based solely on sparse data from a small set of messenger nodes. A general finding is that large degree nodes produce more valuable information than small degree nodes, a result that contrasts with that for static networks. Choosing large degree nodes as the messengers, we find that sparse observations from a few such nodes are often sufficient for any number of diffusion sources to be located for a variety of model and empirical networks. Counterintuitively, sources in more rapidly varying networks can be identified more readily with fewer required messenger nodes.
Mathematical foundations of hybrid data assimilation from a synchronization perspective
NASA Astrophysics Data System (ADS)
Penny, Stephen G.
2017-12-01
The state-of-the-art data assimilation methods used today in operational weather prediction centers around the world can be classified as generalized one-way coupled impulsive synchronization. This classification permits the investigation of hybrid data assimilation methods, which combine dynamic error estimates of the system state with long time-averaged (climatological) error estimates, from a synchronization perspective. Illustrative results show how dynamically informed formulations of the coupling matrix (via an Ensemble Kalman Filter, EnKF) can lead to synchronization when observing networks are sparse and how hybrid methods can lead to synchronization when those dynamic formulations are inadequate (due to small ensemble sizes). A large-scale application with a global ocean general circulation model is also presented. Results indicate that the hybrid methods also have useful applications in generalized synchronization, in particular, for correcting systematic model errors.
New numerical method for radiation heat transfer in nonhomogeneous participating media
DOE Office of Scientific and Technical Information (OSTI.GOV)
Howell, J.R.; Tan, Zhiqiang
A new numerical method, which solves the exact integral equations of distance-angular integration form for radiation transfer, is introduced in this paper. By constructing and prestoring the numerical integral formulas for the distance integral for appropriate kernel functions, this method eliminates the time consuming evaluations of the kernels of the space integrals in the formal computations. In addition, when the number of elements in the system is large, the resulting coefficient matrix is quite sparse. Thus, either considerable time or much storage can be saved. A weakness of the method is discussed, and some remedies are suggested. As illustrations, some one-dimensional and two-dimensional problems in both homogeneous and inhomogeneous emitting, absorbing, and linear anisotropic scattering media are studied. Some results are compared with available data. 13 refs.
Preferential attachment in multiple trade networks
NASA Astrophysics Data System (ADS)
Foschi, Rachele; Riccaboni, Massimo; Schiavo, Stefano
2014-08-01
In this paper we develop a model for the evolution of multiple networks which is able to replicate the concentrated and sparse nature of world trade data. Our model is an extension of the preferential attachment growth model to the case of multiple networks. Countries trade a variety of goods of different complexity. Every country progressively evolves from trading less sophisticated to high-tech goods. The probabilities of capturing more trade opportunities at a given level of complexity and of starting to trade more complex goods are both proportional to the number of existing trade links. We provide a set of theoretical predictions and simulative results. A calibration exercise shows that our model replicates the same concentration level of world trade as well as the sparsity pattern of the trade matrix. We also discuss a set of numerical solutions to deal with large multiple networks.
NASA Astrophysics Data System (ADS)
Prodhan, Suryoday; Ramasesha, S.
2018-05-01
The symmetry adapted density matrix renormalization group (SDMRG) technique has been an efficient method for studying low-lying eigenstates in one- and quasi-one-dimensional electronic systems. However, the SDMRG method had bottlenecks involving the construction of linearly independent symmetry adapted basis states as the symmetry matrices in the DMRG basis were not sparse. We have developed a modified algorithm to overcome this bottleneck. The new method incorporates end-to-end interchange symmetry (C2), electron-hole symmetry (J), and parity or spin-flip symmetry (P) in these calculations. The one-to-one correspondence between direct-product basis states in the DMRG Hilbert space for these symmetry operations renders the symmetry matrices in the new basis with maximum sparseness, just one nonzero matrix element per row. Using methods similar to those employed in the exact diagonalization technique for Pariser-Parr-Pople (PPP) models, developed in the 1980s, it is possible to construct orthogonal SDMRG basis states while bypassing the slow step of the Gram-Schmidt orthonormalization procedure. The method together with the PPP model which incorporates long-range electronic correlations is employed to study the correlated excited-state spectra of 1,12-benzoperylene and a narrow mixed graphene nanoribbon with a chrysene molecule as the building unit, comprising both zigzag and cove-edge structures.
Semi-automatic sparse preconditioners for high-order finite element methods on non-uniform meshes
NASA Astrophysics Data System (ADS)
Austin, Travis M.; Brezina, Marian; Jamroz, Ben; Jhurani, Chetan; Manteuffel, Thomas A.; Ruge, John
2012-05-01
High-order finite elements often have a higher accuracy per degree of freedom than the classical low-order finite elements. However, in the context of implicit time-stepping methods, high-order finite elements present challenges to the construction of efficient simulations due to the high cost of inverting the denser finite element matrix. There are many cases where simulations are limited by the memory required to store the matrix and/or the algorithmic components of the linear solver. We are particularly interested in preconditioned Krylov methods for linear systems generated by discretization of elliptic partial differential equations with high-order finite elements. Using a preconditioner like Algebraic Multigrid can be costly in terms of memory due to the need to store matrix information at the various levels. We present a novel method for defining a preconditioner for systems generated by high-order finite elements that is based on a much sparser system than the original high-order finite element system. We investigate the performance for non-uniform meshes on a cube and a cubed sphere mesh, showing that the sparser preconditioner is more efficient and uses significantly less memory. Finally, we explore new methods to construct the sparse preconditioner and examine their effectiveness for non-uniform meshes. We compare results to a direct use of Algebraic Multigrid as a preconditioner and to a two-level additive Schwarz method.
Data traffic reduction schemes for sparse Cholesky factorizations
NASA Technical Reports Server (NTRS)
Naik, Vijay K.; Patrick, Merrell L.
1988-01-01
Load distribution schemes are presented which minimize the total data traffic in the Cholesky factorization of dense and sparse, symmetric, positive definite matrices on multiprocessor systems with local and shared memory. The total data traffic in factoring an n x n sparse, symmetric, positive definite matrix representing an n-vertex regular 2-D grid graph using n^α processors, α ≤ 1, is shown to be O(n^(1 + α/2)). It is O(n^(3/2)) when n^α processors, α ≥ 1, are used. Under the conditions of uniform load distribution, these results are shown to be asymptotically optimal. The schemes allow efficient use of up to O(n) processors before the total data traffic reaches the maximum value of O(n^(3/2)). The partitioning employed within the scheme allows a better utilization of the data accessed from shared memory than those of previously published methods.
Inference for High-dimensional Differential Correlation Matrices.
Cai, T Tony; Zhang, Anru
2016-01-01
Motivated by differential co-expression analysis in genomics, we consider in this paper estimation and testing of high-dimensional differential correlation matrices. An adaptive thresholding procedure is introduced and theoretical guarantees are given. Minimax rate of convergence is established and the proposed estimator is shown to be adaptively rate-optimal over collections of paired correlation matrices with approximately sparse differences. Simulation results show that the procedure significantly outperforms two other natural methods that are based on separate estimation of the individual correlation matrices. The procedure is also illustrated through an analysis of a breast cancer dataset, which provides evidence at the gene co-expression level that several genes, of which a subset has been previously verified, are associated with the breast cancer. Hypothesis testing on the differential correlation matrices is also considered. A test, which is particularly well suited for testing against sparse alternatives, is introduced. In addition, other related problems, including estimation of a single sparse correlation matrix, estimation of the differential covariance matrices, and estimation of the differential cross-correlation matrices, are also discussed.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jakeman, John D.; Narayan, Akil; Zhou, Tao
2017-06-22
We propose an algorithm for recovering sparse orthogonal polynomial expansions via collocation. A standard sampling approach for recovering sparse polynomials uses Monte Carlo sampling, from the density of orthogonality, which results in poor function recovery when the polynomial degree is high. Our proposed approach aims to mitigate this limitation by sampling with respect to the weighted equilibrium measure of the parametric domain and subsequently solves a preconditioned $\ell^1$-minimization problem, where the weights of the diagonal preconditioning matrix are given by evaluations of the Christoffel function. Our algorithm can be applied to a wide class of orthogonal polynomial families on bounded and unbounded domains, including all classical families. We present theoretical analysis to motivate the algorithm and numerical results that show our method is superior to standard Monte Carlo methods in many situations of interest. In conclusion, numerical examples are also provided to demonstrate that our proposed algorithm leads to comparable or improved accuracy even when compared with Legendre- and Hermite-specific algorithms.
Deflation as a method of variance reduction for estimating the trace of a matrix inverse
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gambhir, Arjun Singh; Stathopoulos, Andreas; Orginos, Kostas
2017-04-06
Many fields require computing the trace of the inverse of a large, sparse matrix. The typical method used for such computations is the Hutchinson method which is a Monte Carlo (MC) averaging over matrix quadratures. To improve its convergence, several variance reduction techniques have been proposed. In this paper, we study the effects of deflating the near null singular value space. We make two main contributions. First, we analyze the variance of the Hutchinson method as a function of the deflated singular values and vectors. Although this provides good intuition in general, by assuming additionally that the singular vectors are random unitary matrices, we arrive at concise formulas for the deflated variance that include only the variance and mean of the singular values. We make the remarkable observation that deflation may increase variance for Hermitian matrices but not for non-Hermitian ones. This is a rare, if not unique, property where non-Hermitian matrices outperform Hermitian ones. The theory can be used as a model for predicting the benefits of deflation. Second, we use deflation in the context of a large scale application of "disconnected diagrams" in Lattice QCD. On lattices, Hierarchical Probing (HP) has previously provided an order of magnitude of variance reduction over MC by removing "error" from neighboring nodes of increasing distance in the lattice. Although deflation used directly on MC yields a limited improvement of 30% in our problem, when combined with HP they reduce variance by a factor of over 150 compared to MC. For this, we precomputed the 1000 smallest singular values of an ill-conditioned matrix of size 25 million. Furthermore, using PRIMME and a domain-specific Algebraic Multigrid preconditioner, we perform one of the largest eigenvalue computations in Lattice QCD at a fraction of the cost of our trace computation.
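The plain Hutchinson estimator that the abstract takes as its baseline can be sketched as follows. This shows only the Monte Carlo averaging over quadratures z^T A^{-1} z with Rademacher probe vectors; deflation and Hierarchical Probing are not included. The function name, the dense solve, and the small tridiagonal test matrix are assumptions made for illustration.

```python
import numpy as np

def hutchinson_trace_inverse(A, n_samples=200, seed=0):
    """Monte Carlo estimate of trace(A^{-1}) with Rademacher probe vectors.

    Returns the sample mean and its standard error. A dense solve stands in
    for the iterative solver a large sparse problem would need.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    estimates = np.empty(n_samples)
    for s in range(n_samples):
        z = rng.choice([-1.0, 1.0], size=n)          # entries are +1 or -1
        estimates[s] = z @ np.linalg.solve(A, z)     # quadrature z^T A^{-1} z
    return estimates.mean(), estimates.std(ddof=1) / np.sqrt(n_samples)

# Small well-conditioned tridiagonal test matrix (4 on the diagonal, -1 off it)
n = 30
A_demo = 4.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
estimate, std_error = hutchinson_trace_inverse(A_demo)
exact = np.trace(np.linalg.inv(A_demo))
print(estimate, std_error, exact)

# For a diagonal matrix every quadrature equals the trace, so the variance is zero
diag_estimate, diag_error = hutchinson_trace_inverse(np.diag([1.0, 2.0, 4.0]), n_samples=10)
```

The variance of this estimator comes entirely from the off-diagonal entries of the inverse, which is why the diagonal example has no spread and why variance reduction techniques target that off-diagonal contribution.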
Fast sparsely synchronized brain rhythms in a scale-free neural network.
Kim, Sang-Yoon; Lim, Woochang
2015-08-01
We consider a directed version of the Barabási-Albert scale-free network model with symmetric preferential attachment with the same in- and out-degrees and study the emergence of sparsely synchronized rhythms for a fixed attachment degree in an inhibitory population of fast-spiking Izhikevich interneurons. Fast sparsely synchronized rhythms with stochastic and intermittent neuronal discharges are found to appear for large values of J (synaptic inhibition strength) and D (noise intensity). For an intensive study we fix J at a sufficiently large value and investigate the population states by increasing D. For small D, full synchronization with the same population-rhythm frequency fp and mean firing rate (MFR) fi of individual neurons occurs, while for large D partial synchronization with fp>〈fi〉 (〈fi〉: ensemble-averaged MFR) appears due to intermittent discharge of individual neurons; in particular, the case of fp>4〈fi〉 is referred to as sparse synchronization. For the case of partial and sparse synchronization, MFRs of individual neurons vary depending on their degrees. As D passes a critical value D* (which is determined by employing an order parameter), a transition to unsynchronization occurs due to the destructive role of noise to spoil the pacing between sparse spikes. For D
Preconditioned conjugate residual methods for the solution of spectral equations
NASA Technical Reports Server (NTRS)
Wong, Y. S.; Zang, T. A.; Hussaini, M. Y.
1986-01-01
Conjugate residual methods for the solution of spectral equations are described. An inexact finite-difference operator is introduced as a preconditioner in the iterative procedures. Application of these techniques is limited to problems for which the symmetric part of the coefficient matrix is positive definite. Although the spectral equation is a very ill-conditioned and full matrix problem, the computational effort of the present iterative methods for solving such a system is comparable to that for the sparse matrix equations obtained from the application of either finite-difference or finite-element methods to the same problems. Numerical experiments are shown for a self-adjoint elliptic partial differential equation with Dirichlet boundary conditions, and comparison with other solution procedures for spectral equations is presented.
NASA Astrophysics Data System (ADS)
Li, Zhengguang; Lai, Siu-Kai; Wu, Baisheng
2018-07-01
Determining eigenvector derivatives is a challenging task due to the singularity of the coefficient matrices of the governing equations, especially for those structural dynamic systems with repeated eigenvalues. An effective strategy is proposed to construct a non-singular coefficient matrix, which can be directly used to obtain the eigenvector derivatives with distinct and repeated eigenvalues. This approach also has the advantage of requiring only the eigenvalues and eigenvectors of interest, without solving for the particular solutions of eigenvector derivatives. The Symmetric Quasi-Minimal Residual (SQMR) method is then adopted to solve the governing equations; only the existing factored (shifted) stiffness matrix from an iterative eigensolution such as the subspace iteration method or the Lanczos algorithm is utilized. The present method can deal with both cases of simple and repeated eigenvalues in a unified manner. Three numerical examples are given to illustrate the accuracy and validity of the proposed algorithm. Highly accurate approximations to the eigenvector derivatives are obtained within a few iteration steps, significantly reducing the computational effort. This method can be incorporated into a coupled eigensolver/derivative software module. In particular, it is applicable for finite element models with large sparse matrices.
Margin based ontology sparse vector learning algorithm and applied in biology science.
Gao, Wei; Qudair Baig, Abdul; Ali, Haidar; Sajjad, Wasim; Reza Farahani, Mohammad
2017-01-01
In the field of biology, ontology applications involve a large amount of genetic information and chemical information on molecular structure, so that each ontology concept conveys a great deal of information. Therefore, in mathematical notation, the dimension of the vector corresponding to an ontology concept is often very large, which places higher demands on ontology algorithms. Against this background, we consider the design of an ontology sparse vector algorithm and its application in biology. In this paper, using knowledge of marginal likelihood and marginal distribution, an optimized strategy for a margin based ontology sparse vector learning algorithm is presented. Finally, the new algorithm is applied to gene ontology and plant ontology to verify its efficiency.
Background recovery via motion-based robust principal component analysis with matrix factorization
NASA Astrophysics Data System (ADS)
Pan, Peng; Wang, Yongli; Zhou, Mingyuan; Sun, Zhipeng; He, Guoping
2018-03-01
Background recovery is a key technique in video analysis, but it still suffers from many challenges, such as camouflage, lighting changes, and diverse types of image noise. Robust principal component analysis (RPCA), which aims to recover a low-rank matrix and a sparse matrix, is a general framework for background recovery. The nuclear norm is widely used as a convex surrogate for the rank function in RPCA, which requires computing the singular value decomposition (SVD), a task that is increasingly costly as matrix sizes and ranks increase. However, matrix factorization greatly reduces the dimension of the matrix for which the SVD must be computed. Motion information has been shown to improve low-rank matrix recovery in RPCA, but this method still finds it difficult to handle original video data sets because of its batch-mode formulation and implementation. Hence, in this paper, we propose a motion-assisted RPCA model with matrix factorization (FM-RPCA) for background recovery. Moreover, an efficient linear alternating direction method of multipliers with a matrix factorization (FL-ADM) algorithm is designed for solving the proposed FM-RPCA model. Experimental results illustrate that the method provides stable results and is more efficient than the current state-of-the-art algorithms.
Reliability of the Colorado Family Support Assessment: A Self-Sufficiency Matrix for Families
ERIC Educational Resources Information Center
Richmond, Melissa K.; Pampel, Fred C.; Zarcula, Flavia; Howey, Virginia; McChesney, Brenda
2017-01-01
Purpose: Family support programs commonly use self-sufficiency matrices (SSMs) to measure family outcomes, however, validation research on SSMs is sparse. This study examined the reliability of the Colorado Family Support Assessment 2.0 (CFSA 2.0) to measure family self-reliance across 14 domains (e.g., employment). Methods: Ten written case…
A manual for PARTI runtime primitives
NASA Technical Reports Server (NTRS)
Berryman, Harry; Saltz, Joel
1990-01-01
Primitives are presented that are designed to help users efficiently program irregular problems (e.g., unstructured mesh sweeps, sparse matrix codes, adaptive mesh partial differential equations solvers) on distributed memory machines. These primitives are also designed for use in compilers for distributed memory multiprocessors. Communications patterns are captured at runtime, and the appropriate send and receive messages are automatically generated.
NASA Technical Reports Server (NTRS)
Kincaid, D. R.; Young, D. M.
1984-01-01
Adapting and designing mathematical software to achieve optimum performance on the CYBER 205 is discussed. Comments and observations are made in light of recent work done on modifying the ITPACK software package and on writing new software for vector supercomputers. The goal was to develop very efficient vector algorithms and software for solving large sparse linear systems using iterative methods.
A radial basis function Galerkin method for inhomogeneous nonlocal diffusion
Lehoucq, Richard B.; Rowe, Stephen T.
2016-02-01
We introduce a discretization for a nonlocal diffusion problem using a localized basis of radial basis functions. The stiffness matrix entries are assembled by a special quadrature routine unique to the localized basis. Combining the quadrature method with the localized basis produces a well-conditioned, sparse, symmetric positive definite stiffness matrix. We demonstrate that both the continuum and discrete problems are well-posed and present numerical results for the convergence behavior of the radial basis function method. As a result, we explore approximating the solution to anisotropic differential equations by solving anisotropic nonlocal integral equations using the radial basis function method.
Robust and sparse correlation matrix estimation for the analysis of high-dimensional genomics data.
Serra, Angela; Coretto, Pietro; Fratello, Michele; Tagliaferri, Roberto; Stegle, Oliver
2018-02-15
Microarray technology can be used to study the expression of thousands of genes across a number of different experimental conditions, usually hundreds. The underlying principle is that genes sharing similar expression patterns, across different samples, can be part of the same co-expression system, or they may share the same biological functions. Groups of genes are usually identified based on cluster analysis. Clustering methods rely on the similarity matrix between genes. A common choice to measure similarity is to compute the sample correlation matrix. Dimensionality reduction is another popular data analysis task which is also based on covariance/correlation matrix estimates. Unfortunately, covariance/correlation matrix estimation suffers from the intrinsic noise present in high-dimensional data. Sources of noise are: sampling variations, the presence of outlying sample units, and the fact that in most cases the number of units is much smaller than the number of genes. In this paper, we propose a robust correlation matrix estimator that is regularized based on adaptive thresholding. The resulting method jointly tames the effects of high dimensionality and data contamination. Computations are easy to implement and do not require hand tuning. Both simulated and real data are analyzed. A Monte Carlo experiment shows that the proposed method is capable of remarkable performance. Our correlation metric is more robust to outliers compared with the existing alternatives in two gene expression datasets. It is also shown how the regularization makes it possible to automatically detect and filter spurious correlations. The same regularization is also extended to other less robust correlation measures. Finally, we apply the ARACNE algorithm on the SyNTreN gene expression data. Sensitivity and specificity of the reconstructed network are compared with the gold standard. We show that ARACNE performs better when it takes the proposed correlation matrix estimator as input.
The R software is available at https://github.com/angy89/RobustSparseCorrelation. Contact: aserra@unisa.it or robtag@unisa.it. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
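To make the idea of a thresholded correlation matrix concrete, here is a minimal sketch that computes ordinary Pearson correlations between gene profiles and sets weak ones to zero. It assumes a single fixed cut-off and a non-robust correlation measure, so it is a simplified stand-in and not the robust, adaptively thresholded estimator described in the abstract. The function names and the toy expression values are hypothetical.

```python
import math


def pearson(x, y):
    """Sample Pearson correlation of two equally long profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def thresholded_correlation(genes, threshold):
    """Correlation matrix with off-diagonal entries below the
    threshold (in absolute value) replaced by exact zeros."""
    p = len(genes)
    corr = [[0.0] * p for _ in range(p)]
    for i in range(p):
        corr[i][i] = 1.0
        for j in range(i + 1, p):
            r = pearson(genes[i], genes[j])
            if abs(r) >= threshold:
                corr[i][j] = corr[j][i] = r
    return corr


# Rows are genes, columns are samples (toy values).
expression = [
    [1.0, 2.0, 3.0, 4.0],     # gene A
    [2.0, 4.0, 6.0, 8.5],     # gene B, closely follows gene A
    [1.0, -1.0, -1.0, 1.0],   # gene C, unrelated pattern
]
sparse_corr = thresholded_correlation(expression, threshold=0.5)
```

With this input only the A-B correlation is kept; the weak correlations involving gene C become zeros, which is the filtering of spurious correlations in its simplest form.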
Sparse, decorrelated odor coding in the mushroom body enhances learned odor discrimination.
Lin, Andrew C; Bygrave, Alexei M; de Calignon, Alix; Lee, Tzumin; Miesenböck, Gero
2014-04-01
Sparse coding may be a general strategy of neural systems for augmenting memory capacity. In Drosophila melanogaster, sparse odor coding by the Kenyon cells of the mushroom body is thought to generate a large number of precisely addressable locations for the storage of odor-specific memories. However, it remains untested how sparse coding relates to behavioral performance. Here we demonstrate that sparseness is controlled by a negative feedback circuit between Kenyon cells and the GABAergic anterior paired lateral (APL) neuron. Systematic activation and blockade of each leg of this feedback circuit showed that Kenyon cells activated APL and APL inhibited Kenyon cells. Disrupting the Kenyon cell-APL feedback loop decreased the sparseness of Kenyon cell odor responses, increased inter-odor correlations and prevented flies from learning to discriminate similar, but not dissimilar, odors. These results suggest that feedback inhibition suppresses Kenyon cell activity to maintain sparse, decorrelated odor coding and thus the odor specificity of memories.
Three-dimensional unstructured grid Euler computations using a fully-implicit, upwind method
NASA Technical Reports Server (NTRS)
Whitaker, David L.
1993-01-01
A method has been developed to solve the Euler equations on a three-dimensional unstructured grid composed of tetrahedra. The method uses an upwind flow solver with a linearized, backward-Euler time integration scheme. Each time step results in a sparse linear system of equations which is solved by an iterative, sparse matrix solver. Local-time stepping, switched evolution relaxation (SER), preconditioning and reuse of the Jacobian are employed to accelerate the convergence rate. Implicit boundary conditions were found to be extremely important for fast convergence. Numerical experiments have shown that convergence rates comparable to that of a multigrid, central-difference scheme are achievable on the same mesh. Results are presented for several grids about an ONERA M6 wing.
2015-09-30
1 DISTRIBUTION STATEMENT A. Approved for public release; distribution is unlimited. Large Scale Density Estimation of Blue and Fin Whales ...Utilizing Sparse Array Data to Develop and Implement a New Method for Estimating Blue and Fin Whale Density Len Thomas & Danielle Harris Centre...to develop and implement a new method for estimating blue and fin whale density that is effective over large spatial scales and is designed to cope
Magnus integrators on multicore CPUs and GPUs
NASA Astrophysics Data System (ADS)
Auer, N.; Einkemmer, L.; Kandolf, P.; Ostermann, A.
2018-07-01
In the present paper we consider numerical methods to solve the discrete Schrödinger equation with a time dependent Hamiltonian (motivated by problems encountered in the study of spin systems). We will consider both short-range interactions, which lead to evolution equations involving sparse matrices, and long-range interactions, which lead to dense matrices. Both of these settings show very different computational characteristics. We use Magnus integrators for time integration and employ a framework based on Leja interpolation to compute the resulting action of the matrix exponential. We consider both traditional Magnus integrators (which are extensively used for these types of problems in the literature) as well as the recently developed commutator-free Magnus integrators and implement them on modern CPU and GPU (graphics processing unit) based systems. We find that GPUs can yield a significant speed-up (up to a factor of 10 in the dense case) for these types of problems. In the sparse case GPUs are only advantageous for large problem sizes and the achieved speed-ups are more modest. In most cases the commutator-free variant is superior but especially on the GPU this advantage is rather small. In fact, none of the advantage of commutator-free methods on GPUs (and on multi-core CPUs) is due to the elimination of commutators. This has important consequences for the design of more efficient numerical methods.
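The central kernel in such integrators is the action of a matrix exponential on a state vector, computed with matrix-vector products only. The sketch below approximates exp(-i t H) v for a sparse H with a truncated Taylor series. That is an assumption made for brevity: the paper uses Leja interpolation, and a Magnus integrator would apply this to a combination of the Hamiltonian at several times. The storage format, function names and the 4-site nearest-neighbour Hamiltonian are hypothetical.

```python
def sparse_matvec(rows, vector):
    """rows[i] maps column index -> value for the nonzeros of row i."""
    return [sum(value * vector[col] for col, value in row.items())
            for row in rows]


def expm_action(rows, vector, t, terms=30):
    """Approximate exp(-i t H) v by a truncated Taylor series,
    using only sparse matrix-vector products."""
    result = list(vector)
    term = list(vector)
    for k in range(1, terms + 1):
        product = sparse_matvec(rows, term)
        # term_k = (-i t H / k) term_{k-1}
        term = [(-1j * t / k) * x for x in product]
        result = [r + x for r, x in zip(result, term)]
    return result


# Short-range (nearest-neighbour) coupling gives a tridiagonal, sparse H.
n = 4
hamiltonian = [dict() for _ in range(n)]
for i in range(n - 1):
    hamiltonian[i][i + 1] = 1.0
    hamiltonian[i + 1][i] = 1.0

state = [1.0 + 0j, 0j, 0j, 0j]
evolved = expm_action(hamiltonian, state, t=0.5)
norm = sum(abs(x) ** 2 for x in evolved) ** 0.5
```

Because H is Hermitian the exact propagator is unitary, so the norm of the evolved state staying at 1 is a quick sanity check on the truncation.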
Deformable segmentation via sparse representation and dictionary learning.
Zhang, Shaoting; Zhan, Yiqiang; Metaxas, Dimitris N
2012-10-01
"Shape" and "appearance", the two pillars of a deformable model, complement each other in object segmentation. In many medical imaging applications, while the low-level appearance information is weak or misleading, shape priors play a more important role in guiding a correct segmentation, thanks to the strong shape characteristics of biological structures. Recently a novel shape prior modeling method has been proposed based on sparse learning theory. Instead of learning a generative shape model, shape priors are incorporated on-the-fly through the sparse shape composition (SSC). SSC is robust to non-Gaussian errors and still preserves individual shape characteristics even when such characteristics are not statistically significant. Although it seems straightforward to incorporate SSC into a deformable segmentation framework as shape priors, the large-scale sparse optimization of SSC has low runtime efficiency, which cannot satisfy clinical requirements. In this paper, we design two strategies to decrease the computational complexity of SSC, making a robust, accurate and efficient deformable segmentation system. (1) When the shape repository contains a large number of instances, which is often the case in 2D problems, K-SVD is used to learn a more compact but still informative shape dictionary. (2) If the derived shape instance has a large number of vertices, which often appears in 3D problems, an affinity propagation method is used to partition the surface into small sub-regions, on which the sparse shape composition is performed locally. Both strategies dramatically decrease the scale of the sparse optimization problem and hence speed up the algorithm. Our method is applied on a diverse set of biomedical image analysis problems. Compared to the original SSC, these two newly-proposed modules not only significantly reduce the computational complexity, but also improve the overall accuracy. Copyright © 2012 Elsevier B.V. All rights reserved.
Medical image classification based on multi-scale non-negative sparse coding.
Zhang, Ruijie; Shen, Jian; Wei, Fushan; Li, Xiong; Sangaiah, Arun Kumar
2017-11-01
With the rapid development of modern medical imaging technology, medical image classification has become more and more important in medical diagnosis and clinical practice. Conventional medical image classification algorithms usually neglect the semantic gap problem between low-level features and high-level image semantics, which will largely degrade the classification performance. To solve this problem, we propose a multi-scale non-negative sparse coding based medical image classification algorithm. Firstly, medical images are decomposed into multiple scale layers, thus diverse visual details can be extracted from different scale layers. Secondly, for each scale layer, the non-negative sparse coding model with Fisher discriminative analysis is constructed to obtain the discriminative sparse representation of medical images. Then, the obtained multi-scale non-negative sparse coding features are combined to form a multi-scale feature histogram as the final representation for a medical image. Finally, an SVM classifier is used to conduct medical image classification. The experimental results demonstrate that our proposed algorithm can effectively utilize multi-scale and contextual spatial information of medical images, reduce the semantic gap to a large degree and improve medical image classification performance. Copyright © 2017 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Anderson, D. V.; Koniges, A. E.; Shumaker, D. E.
1988-11-01
Many physical problems require the solution of coupled partial differential equations on three-dimensional domains. When the time scales of interest dictate an implicit discretization of the equations, a rather complicated global matrix system needs solution. The exact form of the matrix depends on the choice of spatial grids and on the finite element or finite difference approximations employed. CPDES3 allows each spatial operator to have 7, 15, 19, or 27 point stencils, allows for general couplings between all of the component PDE's, and automatically generates the matrix structures needed to perform the algorithm. The resulting sparse matrix equation is solved by either the preconditioned conjugate gradient (CG) method or by the preconditioned biconjugate gradient (BCG) algorithm. An arbitrary number of component equations is permitted, limited only by available memory. In the sub-band representation used, we generate an algorithm that is written compactly in terms of indirect indices, which is vectorizable on some of the newer scientific computers.
NASA Astrophysics Data System (ADS)
Anderson, D. V.; Koniges, A. E.; Shumaker, D. E.
1988-11-01
Many physical problems require the solution of coupled partial differential equations on two-dimensional domains. When the time scales of interest dictate an implicit discretization of the equations, a rather complicated global matrix system needs solution. The exact form of the matrix depends on the choice of spatial grids and on the finite element or finite difference approximations employed. CPDES2 allows each spatial operator to have 5 or 9 point stencils, allows for general couplings between all of the component PDE's, and automatically generates the matrix structures needed to perform the algorithm. The resulting sparse matrix equation is solved by either the preconditioned conjugate gradient (CG) method or by the preconditioned biconjugate gradient (BCG) algorithm. An arbitrary number of component equations is permitted, limited only by available memory. In the sub-band representation used, we generate an algorithm that is written compactly in terms of indirect indices, which is vectorizable on some of the newer scientific computers.
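The pipeline these two abstracts describe (stencil, sparse matrix, conjugate gradient solve) can be sketched for the simplest case: a single equation with a 5-point stencil on a small 2-D grid. This is an illustrative assumption and not CPDES2 itself. The row storage is a plain dictionary per row instead of the sub-band representation, the CG iteration is unpreconditioned, and all function names are hypothetical.

```python
def five_point_laplacian(nx, ny):
    """5-point finite-difference Laplacian on an nx-by-ny grid with
    Dirichlet boundaries and unit spacing; rows[k] maps column -> value."""
    rows = []
    for j in range(ny):
        for i in range(nx):
            row = {j * nx + i: 4.0}
            if i > 0:
                row[j * nx + i - 1] = -1.0
            if i < nx - 1:
                row[j * nx + i + 1] = -1.0
            if j > 0:
                row[(j - 1) * nx + i] = -1.0
            if j < ny - 1:
                row[(j + 1) * nx + i] = -1.0
            rows.append(row)
    return rows


def matvec(rows, x):
    return [sum(v * x[c] for c, v in row.items()) for row in rows]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(rows, b, tol=1e-10, max_iter=1000):
    """Plain (unpreconditioned) CG for a symmetric positive definite system."""
    x = [0.0] * len(b)
    r = list(b)               # residual b - A x for the zero initial guess
    p = list(r)               # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        ap = matvec(rows, p)
        alpha = rs_old / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


grid_matrix = five_point_laplacian(5, 5)
rhs = [1.0] * 25
solution = conjugate_gradient(grid_matrix, rhs)
```

Each interior row stores exactly five nonzeros regardless of grid size, which is why memory and work per iteration grow only linearly with the number of unknowns.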
Lee, Young-Beom; Lee, Jeonghyeon; Tak, Sungho; Lee, Kangjoo; Na, Duk L; Seo, Sang Won; Jeong, Yong; Ye, Jong Chul
2016-01-15
Recent studies of functional connectivity MR imaging have revealed that the default-mode network activity is disrupted in diseases such as Alzheimer's disease (AD). However, there is not yet a consensus on the preferred method for resting-state analysis. Because the brain is reported to have complex interconnected networks according to graph theoretical analysis, the independency assumption, as in the popular independent component analysis (ICA) approach, often does not hold. Here, rather than using the independency assumption, we present a new statistical parameter mapping (SPM)-type analysis method based on a sparse graph model where temporal dynamics at each voxel position are described as a sparse combination of global brain dynamics. In particular, a new concept of a spatially adaptive design matrix has been proposed to represent local connectivity that shares the same temporal dynamics. If we further assume that local network structures within a group are similar, the estimation problem of global and local dynamics can be solved using sparse dictionary learning for the concatenated temporal data across subjects. Moreover, under the homoscedasticity variance assumption across subjects and groups that is often used in SPM analysis, the aforementioned individual and group analyses using sparse dictionary learning can be accurately modeled by a mixed-effect model, which also facilitates a standard SPM-type group-level inference using summary statistics. Using an extensive resting fMRI data set obtained from normal, mild cognitive impairment (MCI), and Alzheimer's disease patient groups, we demonstrated that the changes in the default mode network extracted by the proposed method are more closely correlated with the progression of Alzheimer's disease. Copyright © 2015 Elsevier Inc. All rights reserved.
Two-stage sparse coding of region covariance via Log-Euclidean kernels to detect saliency.
Zhang, Ying-Ying; Yang, Cai; Zhang, Ping
2017-05-01
In this paper, we present a novel bottom-up saliency detection algorithm from the perspective of covariance matrices on a Riemannian manifold. Each superpixel is described by a region covariance matrix on a Riemannian manifold. We carry out a two-stage sparse coding scheme via Log-Euclidean kernels to extract salient objects efficiently. In the first stage, given a background dictionary on image borders, sparse coding of each region covariance via Log-Euclidean kernels is performed. The reconstruction error on the background dictionary is regarded as the initial saliency of each superpixel. In the second stage, an improvement of the initial result is achieved by calculating reconstruction errors of the superpixels on a foreground dictionary, which is extracted from the first-stage saliency map. The sparse coding in the second stage is similar to the first stage, but is able to effectively highlight the salient objects uniformly from the background. Finally, three post-processing methods (a highlight-inhibition function, context-based saliency weighting, and graph cut) are adopted to further refine the saliency map. Experiments on four public benchmark datasets show that the proposed algorithm outperforms the state-of-the-art methods in terms of precision, recall and mean absolute error, and demonstrate the robustness and efficiency of the proposed method. Copyright © 2017 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Zhou, Lifan; Chai, Dengfeng; Xia, Yu; Ma, Peifeng; Lin, Hui
2018-01-01
Phase unwrapping (PU) is one of the key processes in reconstructing the digital elevation model of a scene from its interferometric synthetic aperture radar (InSAR) data. It is known that two-dimensional (2-D) PU problems can be formulated as maximum a posteriori estimation of Markov random fields (MRFs). However, considering that the traditional MRF algorithm is usually defined on a rectangular grid, it fails easily if large parts of the wrapped data are dominated by noise caused by large low-coherence area or rapid-topography variation. A PU solution based on sparse MRF is presented to extend the traditional MRF algorithm to deal with sparse data, which allows the unwrapping of InSAR data dominated by high phase noise. To speed up the graph cuts algorithm for sparse MRF, we designed dual elementary graphs and merged them to obtain the Delaunay triangle graph, which is used to minimize the energy function efficiently. The experiments on simulated and real data, compared with other existing algorithms, both confirm the effectiveness of the proposed MRF approach, which suffers less from decorrelation effects caused by large low-coherence area or rapid-topography variation.
Preconditioned conjugate gradient wave-front reconstructors for multiconjugate adaptive optics.
Gilles, Luc; Ellerbroek, Brent L; Vogel, Curtis R
2003-09-10
Multiconjugate adaptive optics (MCAO) systems with 10^4-10^5 degrees of freedom have been proposed for future giant telescopes. Using standard matrix methods to compute, optimize, and implement wavefront control algorithms for these systems is impractical, since the number of calculations required to compute and apply the reconstruction matrix scales respectively with the cube and the square of the number of adaptive optics degrees of freedom. We develop scalable open-loop iterative sparse matrix implementations of minimum variance wave-front reconstruction for telescope diameters up to 32 m with more than 10^4 actuators. The basic approach is the preconditioned conjugate gradient method with an efficient preconditioner, whose block structure is defined by the atmospheric turbulent layers very much like the layer-oriented MCAO algorithms of current interest. Two cost-effective preconditioners are investigated: a multigrid solver and a simpler block symmetric Gauss-Seidel (BSGS) sweep. Both options require off-line sparse Cholesky factorizations of the diagonal blocks of the matrix system. The cost to precompute these factors scales approximately as the three-halves power of the number of estimated phase grid points per atmospheric layer, and their average update rate is typically of the order of 10^-2 Hz, i.e., 4-5 orders of magnitude lower than the typical 10^3 Hz temporal sampling rate. All other computations scale almost linearly with the total number of estimated phase grid points. We present numerical simulation results to illustrate algorithm convergence. Convergence rates of both preconditioners are similar, regardless of measurement noise level, indicating that the layer-oriented BSGS sweep is as effective as the more elaborated multiresolution preconditioner.
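A symmetric Gauss-Seidel sweep, the simpler of the two preconditioners named above, is one forward pass over the unknowns followed by one backward pass. The sketch below is a pointwise version on a small dense matrix, which is an assumption made to keep it short: the paper's sweep works on blocks defined by the turbulent layers and solves each diagonal block with a precomputed sparse Cholesky factor. The function name and test matrix are hypothetical.

```python
def sgs_sweep(a, b, x):
    """One symmetric Gauss-Seidel sweep for a x = b: update every
    unknown in forward order, then again in backward order."""
    n = len(b)
    x = list(x)
    order = list(range(n)) + list(range(n - 1, -1, -1))
    for i in order:
        off_diagonal = sum(a[i][j] * x[j] for j in range(n) if j != i)
        x[i] = (b[i] - off_diagonal) / a[i][i]
    return x


# Small symmetric positive definite test matrix (tridiagonal 2, -1).
n = 5
a = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0
      for j in range(n)] for i in range(n)]
residual = [1.0] * n

# Used as a preconditioner: a single sweep from a zero guess gives a
# cheap approximation z to the solution of a z = residual.
z = sgs_sweep(a, residual, [0.0] * n)
```

Inside a preconditioned CG iteration this single sweep would replace the exact solve of the preconditioning system at every step; repeating the sweep on its own also converges for symmetric positive definite matrices, only more slowly.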
Thin-film sparse boundary array design for passive acoustic mapping during ultrasound therapy.
Coviello, Christian M; Kozick, Richard J; Hurrell, Andrew; Smith, Penny Probert; Coussios, Constantin-C
2012-10-01
A new 2-D hydrophone array for ultrasound therapy monitoring is presented, along with a novel algorithm for passive acoustic mapping using a sparse weighted aperture. The array is constructed using existing polyvinylidene fluoride (PVDF) ultrasound sensor technology, and is utilized for its broadband characteristics and its high receive sensitivity. For most 2-D arrays, high-resolution imagery is desired, which requires a large aperture at the cost of a large number of elements. The proposed array's geometry is sparse, with elements only on the boundary of the rectangular aperture. The missing information from the interior is filled in using linear imaging techniques. After receiving acoustic emissions during ultrasound therapy, this algorithm applies an apodization to the sparse aperture to limit side lobes and then reconstructs acoustic activity with high spatiotemporal resolution. Experiments show verification of the theoretical point spread function, and cavitation maps in agar phantoms correspond closely to predicted areas, showing the validity of the array and methodology.
Completing sparse and disconnected protein-protein network by deep learning.
Huang, Lei; Liao, Li; Wu, Cathy H
2018-03-22
Protein-protein interaction (PPI) prediction remains a central task in systems biology to achieve a better and holistic understanding of cellular and intracellular processes. Recently, an increasing number of computational methods have shifted from pair-wise prediction to network level prediction. Many of the existing network level methods predict PPIs under the assumption that the training network should be connected. However, this assumption greatly affects the prediction power and limits the application area because the current gold standard PPI networks are usually very sparse and disconnected. Therefore, how to effectively predict PPIs based on a training network that is sparse and disconnected remains a challenge. In this work, we developed a novel PPI prediction method based on a deep learning neural network and a regularized Laplacian kernel. We use a neural network with an autoencoder-like architecture to implicitly simulate the evolutionary processes of a PPI network. Neurons of the output layer correspond to proteins and are labeled with values (1 for interaction and 0 for otherwise) from the adjacency matrix of a sparse disconnected training PPI network. Unlike in an autoencoder, neurons at the input layer are given all zero input, reflecting an assumption of no a priori knowledge about PPIs, and hidden layers of smaller sizes mimic the ancient interactome at different times during evolution. After the training step, an evolved PPI network whose rows are outputs of the neural network can be obtained. We then predict PPIs by applying the regularized Laplacian kernel to the transition matrix that is built upon the evolved PPI network. The results from cross-validation experiments show that the PPI prediction accuracies for yeast data and human data measured as AUC are increased by up to 8.4 and 14.9% respectively, as compared to the baseline.
Moreover, the evolved PPI network can also help us leverage complementary information from the disconnected training network and multiple heterogeneous data sources. Tested by the yeast data with six heterogeneous feature kernels, the results show our method can further improve the prediction performance by up to 2%, which is very close to an upper bound that is obtained by an Approximate Bayesian Computation based sampling method. The proposed evolution deep neural network, coupled with regularized Laplacian kernel, is an effective tool in completing sparse and disconnected PPI networks and in facilitating integration of heterogeneous data sources.
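For readers unfamiliar with the regularized Laplacian kernel, the sketch below computes it in its usual form, K = (I + alpha L)^-1 with L = D - A the graph Laplacian, for a tiny network. This is a simplified illustration: it applies the kernel directly to a toy adjacency matrix instead of the transition matrix of an evolved network, and the value of `alpha`, the function names and the 4-protein graph are assumptions.

```python
def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices)."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def regularized_laplacian_kernel(adjacency, alpha):
    """K = (I + alpha * (D - A))^-1 for a symmetric adjacency matrix A."""
    n = len(adjacency)
    degrees = [sum(row) for row in adjacency]
    system = [[(1.0 if i == j else 0.0)
               + alpha * ((degrees[i] if i == j else 0.0) - adjacency[i][j])
               for j in range(n)] for i in range(n)]
    return invert(system)


# Toy network: path 0-1-2, plus protein 3 with no known interactions.
adjacency = [[0, 1, 0, 0],
             [1, 0, 1, 0],
             [0, 1, 0, 0],
             [0, 0, 0, 0]]
kernel = regularized_laplacian_kernel(adjacency, alpha=0.5)
```

Proteins 0 and 2 receive a positive score through their shared neighbour even though no edge joins them, while the isolated protein 3 scores zero against everything else. That second effect illustrates why a disconnected training network limits kernel-based prediction.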
A manual for PARTI runtime primitives, revision 1
NASA Technical Reports Server (NTRS)
Das, Raja; Saltz, Joel; Berryman, Harry
1991-01-01
Primitives are presented that are designed to help users efficiently program irregular problems (e.g., unstructured mesh sweeps, sparse matrix codes, adaptive mesh partial differential equations solvers) on distributed memory machines. These primitives are also designed for use in compilers for distributed memory multiprocessors. Communications patterns are captured at runtime, and the appropriate send and receive messages are automatically generated.
Big geo data surface approximation using radial basis functions: A comparative study
NASA Astrophysics Data System (ADS)
Majdisova, Zuzana; Skala, Vaclav
2017-12-01
Approximation of scattered data is often a task in many engineering problems. The Radial Basis Function (RBF) approximation is appropriate for big scattered datasets in n-dimensional space. It is a non-separable approximation, as it is based on the distance between two points. This method leads to the solution of an overdetermined linear system of equations. In this paper the RBF approximation methods are briefly described, a new approach to the RBF approximation of big datasets is presented, and a comparison for different Compactly Supported RBFs (CS-RBFs) is made with respect to the accuracy of the computation. The proposed approach uses symmetry of a matrix, partitioning the matrix into blocks and data structures for storage of the sparse matrix. The experiments are performed for synthetic and real datasets.
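The reason compactly supported RBFs lead to sparse matrices can be seen in a few lines: the basis function is exactly zero beyond its support radius, so most point-centre pairs contribute nothing. The sketch below uses Wendland's C2 function, phi(r) = (1 - r)^4 (4r + 1) for r < 1 and 0 otherwise, as one assumed choice of CS-RBF. The 1-D point set, the centres and the support radius are also assumptions, and the all-pairs loop is for clarity only (a real implementation for big datasets would use a spatial search structure).

```python
def wendland_c2(r):
    """Wendland's compactly supported C2 function on normalized distance r."""
    return (1.0 - r) ** 4 * (4.0 * r + 1.0) if r < 1.0 else 0.0


def sparse_rbf_matrix(points, centers, support_radius):
    """Row-wise sparse storage: rows[i] maps centre index -> phi value.
    Entries outside the support are never stored."""
    rows = []
    for p in points:
        row = {}
        for j, c in enumerate(centers):
            value = wendland_c2(abs(p - c) / support_radius)
            if value != 0.0:
                row[j] = value
        rows.append(row)
    return rows


# More points than centres gives an overdetermined system.
points = [i / 20 for i in range(21)]
centers = [0.0, 0.25, 0.5, 0.75, 1.0]
rbf_rows = sparse_rbf_matrix(points, centers, support_radius=0.3)
stored = sum(len(row) for row in rbf_rows)
dense_size = len(points) * len(centers)
```

Shrinking the support radius makes the matrix sparser at the price of a less smooth approximation, which is the trade-off any comparison of CS-RBFs has to weigh.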
Layer-oriented multigrid wavefront reconstruction algorithms for multi-conjugate adaptive optics
NASA Astrophysics Data System (ADS)
Gilles, Luc; Ellerbroek, Brent L.; Vogel, Curtis R.
2003-02-01
Multi-conjugate adaptive optics (MCAO) systems with 10^4-10^5 degrees of freedom have been proposed for future giant telescopes. Using standard matrix methods to compute, optimize, and implement wavefront control algorithms for these systems is impractical, since the number of calculations required to compute and apply the reconstruction matrix scales respectively with the cube and the square of the number of AO degrees of freedom. In this paper, we develop an iterative sparse matrix implementation of minimum variance wavefront reconstruction for telescope diameters up to 32 m with more than 10^4 actuators. The basic approach is the preconditioned conjugate gradient method, using a multigrid preconditioner incorporating a layer-oriented (block) symmetric Gauss-Seidel iterative smoothing operator. We present open-loop numerical simulation results to illustrate algorithm convergence.
Layout optimization with algebraic multigrid methods
NASA Technical Reports Server (NTRS)
Regler, Hans; Ruede, Ulrich
1993-01-01
Finding the optimal position for the individual cells (also called functional modules) on the chip surface is an important and difficult step in the design of integrated circuits. This paper deals with the problem of relative placement, that is the minimization of a quadratic functional with a large, sparse, positive definite system matrix. The basic optimization problem must be augmented by constraints to inhibit solutions where cells overlap. Besides classical iterative methods, based on conjugate gradients (CG), we show that algebraic multigrid methods (AMG) provide an interesting alternative. For moderately sized examples with about 10000 cells, AMG is already competitive with CG and is expected to be superior for larger problems. Besides the classical 'multiplicative' AMG algorithm where the levels are visited sequentially, we propose an 'additive' variant of AMG where levels may be treated in parallel and that is suitable as a preconditioner in the CG algorithm.
Unsymmetric ordering using a constrained Markowitz scheme
DOE Office of Scientific and Technical Information (OSTI.GOV)
Amestoy, Patrick R.; Li, Xiaoye S.; Pralet, Stephane
2005-01-18
We present a family of ordering algorithms that can be used as a preprocessing step prior to performing sparse LU factorization. The ordering algorithms simultaneously achieve the objectives of selecting numerically good pivots and preserving the sparsity. We describe the algorithmic properties and challenges in their implementation. By mixing the two objectives we show that we can reduce the amount of fill-in in the factors and reduce the number of numerical problems during factorization. On a set of large unsymmetric real problems, we obtained the median reductions of 12% in the factorization time, of 13% in the size of the LU factors, of 20% in the number of operations performed during the factorization phase, and of 11% in the memory needed by the multifrontal solver MA41-UNS. A byproduct of this ordering strategy is an incomplete LU-factored matrix that can be used as a preconditioner in an iterative solver.
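The two objectives being mixed here can be illustrated with the classical threshold Markowitz rule: among entries that are numerically acceptable, pick the pivot with the smallest Markowitz cost (r_i - 1)(c_j - 1), where r_i and c_j count the nonzeros in the pivot's row and column. The sketch below implements that textbook rule for a single pivot choice. It is not the constrained scheme of the paper, and the function name, the relative threshold of 0.1 and the toy matrix are assumptions.

```python
def markowitz_pivot(entries, threshold=0.1):
    """Choose one pivot from a sparse matrix given as {(row, col): value}.

    An entry is numerically acceptable if its magnitude is at least
    `threshold` times the largest magnitude in its column; among those,
    the one with the smallest (r_i - 1) * (c_j - 1) wins, since that
    product bounds the fill-in the elimination step can create."""
    row_counts, col_counts, col_max = {}, {}, {}
    for (i, j), value in entries.items():
        row_counts[i] = row_counts.get(i, 0) + 1
        col_counts[j] = col_counts.get(j, 0) + 1
        col_max[j] = max(col_max.get(j, 0.0), abs(value))

    best, best_cost = None, None
    for (i, j), value in sorted(entries.items()):
        if abs(value) < threshold * col_max[j]:
            continue  # too small relative to its column: unstable pivot
        cost = (row_counts[i] - 1) * (col_counts[j] - 1)
        if best_cost is None or cost < best_cost:
            best, best_cost = (i, j), cost
    return best, best_cost


# Entry (1, 1) is the sparsest choice but is tiny compared with the
# largest entry in its column.
entries = {
    (0, 0): 4.0, (0, 1): 1.0, (0, 2): 2.0,
    (1, 0): 1.0, (1, 1): 0.01,
    (2, 0): 2.0, (2, 2): 5.0,
}
pivot, cost = markowitz_pivot(entries, threshold=0.1)
sparsity_only_pivot, _ = markowitz_pivot(entries, threshold=0.0)
```

With the numerical test switched off, the rule picks the tiny entry (1, 1); with it on, the equally sparse but well-scaled entry (2, 2) is chosen instead.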
An in vivo Investigation into Temperature-Controlled Stratification of Sub-Seafloor Populations
NASA Astrophysics Data System (ADS)
McClelland, H. L. O.; Morono, Y.; Fike, D. A.; Bradley, A. S.
2017-12-01
The deep subsurface is characterized by a paucity of carbon substrates and biologically exploitable chemical potential energy. These metabolic challenges can be exacerbated by high temperatures, due to increased costs of cellular maintenance. Though sparse, microbial life persists in such environments; however, the degree to which temperature gradients result in the stratification of extremophilic sub-seafloor populations is poorly understood. During Expedition 370, we established a matrix of incubation experiments with sediment samples taken from 8 depths corresponding to in situ temperatures of approximately 37, 50, 60, 70, 80, 90, 100 and 110°C, which were incubated in oxygen-free, acetate- and sulfate-supplemented, artificial seawater at temperatures of 37, 50, 60, 70 and 80°C. Substrates include large isotopic labels. Following separation from the sediment, cells were analyzed using SIMS, allowing estimates of biomass synthesis rates. We are interested in discussing potential future experiments and collaborations using this resource.
Parallelization of the preconditioned IDR solver for modern multicore computer systems
NASA Astrophysics Data System (ADS)
Bessonov, O. A.; Fedoseyev, A. I.
2012-10-01
This paper presents the analysis, parallelization and optimization approach for the large sparse matrix solver CNSPACK on modern multicore microprocessors. CNSPACK is an advanced solver successfully used for coupled solution of stiff problems arising in multiphysics applications such as CFD, semiconductor transport, kinetic and quantum problems. It employs the iterative IDR algorithm with ILU preconditioning (with a user-chosen ILU preconditioning order). CNSPACK has been successfully used during the last decade for solving problems in several application areas, including fluid dynamics and semiconductor device simulation. However, processor architectures and computer system organization have changed dramatically in recent years. Due to this, performance criteria and methods have been revisited, and the solver and preconditioner have been parallelized using the OpenMP environment. Results of the successful implementation of efficient parallelization are presented for the most advanced computer systems (Intel Core i7-9xx or two-processor Xeon 55xx/56xx).
Siren, J; Ovaskainen, O; Merilä, J
2017-10-01
The genetic variance-covariance matrix (G) is a quantity of central importance in evolutionary biology due to its influence on the rate and direction of multivariate evolution. However, the predictive power of empirically estimated G-matrices is limited for two reasons. First, phenotypes are high-dimensional, whereas traditional statistical methods are tuned to estimate and analyse low-dimensional matrices. Second, the stability of G to environmental effects and over time remains poorly understood. Using Bayesian sparse factor analysis (BSFG) designed to estimate high-dimensional G-matrices, we analysed levels of variation and covariation in 10,527 expressed genes in a large (n = 563) half-sib breeding design of three-spined sticklebacks subject to two temperature treatments. We found significant differences in the structure of G between the treatments: heritabilities and evolvabilities were higher in the warm than in the low-temperature treatment, suggesting more and faster opportunity to evolve in warm (stressful) conditions. Furthermore, comparison of G and its phenotypic equivalent P revealed the latter is a poor substitute for the former. Most strikingly, the results suggest that the expected impact of G on evolvability, as well as the similarity among G-matrices, may depend strongly on the number of traits included in the analyses. In our results, the inclusion of only a few traits in the analyses leads to underestimation of the differences between the G-matrices and their predicted impacts on evolution. While the results highlight the challenges involved in estimating G, they also illustrate that by enabling the estimation of large G-matrices, the BSFG method can improve predicted evolutionary responses to selection. © 2017 John Wiley & Sons Ltd.
Niksirat, Hamid; Kouba, Antonín
2016-04-01
The freshly ejaculated spermatophore of crayfish undergoes a hardening process during post-mating storage on the body surface of the female. The ultrastructural distribution of calcium deposits was studied and compared in freshly ejaculated and post-mating noble crayfish spermatophores, using the oxalate-pyroantimonate technique, to determine possible roles of calcium in post-mating spermatophore hardening and spermatozoon maturation. Small particles of sparsely distributed calcium deposits were visible in the wall of the freshly ejaculated spermatophore. Also, large amounts of calcium deposits were visible in the membranes of the freshly ejaculated spermatozoon. Five minutes post-ejaculation, granules in the spermatophore wall appeared as porous formations with numerous electron lucent spaces. Calcium deposits were visible within the spaces and scattered in the spermatophore wall matrix, where smaller calcium deposits combined to form globular calcium deposits. Large numbers of the globular calcium deposits were visible in the wall of the post-mating spermatophore. Smaller calcium deposits were detected in the central area of the post-mating spermatophore, which contains the sperm mass, and in the extracellular matrix and capsule. While the density of calcium deposits decreased in the post-mating spermatozoon membranes, numerous small calcium deposits appeared in the subacrosomal zone and nucleus. Substantial changes in calcium deposit distribution in the crayfish spermatophore during post-mating storage on the body of the female may be involved in the processes of spermatophore hardening and spermatozoon maturation. © 2016 Wiley Periodicals, Inc.
Strahl, Stefan; Mertins, Alfred
2008-07-18
Evidence that neurosensory systems use sparse signal representations, as well as improved performance of signal processing algorithms using sparse signal models, has raised interest in sparse signal coding in recent years. For natural audio signals like speech and environmental sounds, gammatone atoms have been derived as expansion functions that generate a nearly optimal sparse signal model (Smith, E., Lewicki, M., 2006. Efficient auditory coding. Nature 439, 978-982). Furthermore, gammatone functions are established models for the human auditory filters. Thus far, a practical application of a sparse gammatone signal model has been prevented by the fact that deriving the sparsest representation is, in general, computationally intractable. In this paper, we applied an accelerated version of the matching pursuit algorithm for gammatone dictionaries, allowing real-time and large data set applications. We show that a sparse signal model in general has advantages in audio coding and that a sparse gammatone signal model encodes speech more efficiently in terms of sparseness than a sparse modified discrete cosine transform (MDCT) signal model. We also show that the optimal gammatone parameters derived for English speech do not match the human auditory filters, suggesting that signal processing applications should derive the parameters individually for each applied signal class instead of using psychometrically derived parameters. For brain research, it means that care should be taken when directly transferring findings of optimality from technical to biological systems.
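As a hedged illustration of the matching pursuit idea named in the abstract above, the sketch below runs plain (not accelerated) matching pursuit over a tiny hand-made dictionary of unit-norm atoms. The dictionary, the function name `matching_pursuit`, and the stopping rule are illustrative assumptions; this is not the gammatone dictionary or the authors' implementation.

```python
import math

def matching_pursuit(signal, atoms, n_iter=10, tol=1e-10):
    """Greedy matching pursuit over unit-norm atoms (illustrative sketch)."""
    residual = list(signal)
    coeffs = [0.0] * len(atoms)
    for _ in range(n_iter):
        # Inner product of the current residual with every atom.
        corrs = [sum(r * a for r, a in zip(residual, atom)) for atom in atoms]
        best = max(range(len(atoms)), key=lambda k: abs(corrs[k]))
        if abs(corrs[best]) < tol:
            break  # nothing left that the dictionary can explain
        # Keep the best-matching atom and subtract its contribution.
        coeffs[best] += corrs[best]
        residual = [r - corrs[best] * a for r, a in zip(residual, atoms[best])]
    return coeffs, residual

s = 1.0 / math.sqrt(2.0)
# Hypothetical overcomplete dictionary: five unit-norm atoms in 4 dimensions.
atoms = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, s, s],
    [0.0, 0.0, s, -s],
    [s, s, 0.0, 0.0],
]
# Signal built from atoms 0 and 2 only, so a 2-sparse code exists.
signal = [3.0, 0.0, 2.0 * s, 2.0 * s]
coeffs, residual = matching_pursuit(signal, atoms)
```

For this toy signal the greedy loop selects atom 0 and then atom 2 and stops once the residual is numerically zero. Real dictionaries are far larger and highly coherent, which is why accelerated variants matter in practice.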
High Angular Resolution Microwave Sensing with Large, Sparse, Random Arrays
1983-11-01
Final Scientific Report, AFOSR 82-0012. University of Pennsylvania, Valley Forge Research Center, The Moore School of Electrical Engineering, Philadelphia... Controlling office: Air Force Office of Scientific Research/NE, Bolling AFB, DC. Report date: Nov 1983.
"Slow-scanning" in Ground-based Mid-infrared Observations
NASA Astrophysics Data System (ADS)
Ohsawa, Ryou; Sako, Shigeyuki; Miyata, Takashi; Kamizuka, Takafumi; Okada, Kazushi; Mori, Kiyoshi; Uchiyama, Masahito S.; Yamaguchi, Junpei; Fujiyoshi, Takuya; Morii, Mikio; Ikeda, Shiro
2018-04-01
Chopping observations with a tip-tilt secondary mirror have conventionally been used in ground-based mid-infrared observations. However, it is not practical for next generation large telescopes to have a large tip-tilt mirror that moves at a frequency larger than a few hertz. We propose an alternative observing method, a "slow-scanning" observation. Images are continuously captured as movie data, while the field of view is slowly moved. The signal from an astronomical object is extracted from the movie data by a low-rank and sparse matrix decomposition. The performance of the "slow-scanning" observation was tested in an experimental observation with Subaru/COMICS. The quality of a resultant image in the "slow-scanning" observation was as good as in a conventional chopping observation with COMICS, at least for a bright point-source object. The observational efficiency in the "slow-scanning" observation was better than that in the chopping observation. The results suggest that the "slow-scanning" observation can be a competitive method for the Subaru telescope and be of potential interest to other ground-based facilities to avoid chopping.
Simulation of Quantum Many-Body Dynamics for Generic Strongly-Interacting Systems
NASA Astrophysics Data System (ADS)
Meyer, Gregory; Machado, Francisco; Yao, Norman
2017-04-01
Recent experimental advances have enabled the bottom-up assembly of complex, strongly interacting quantum many-body systems from individual atoms, ions, molecules and photons. These advances open the door to studying dynamics in isolated quantum systems as well as the possibility of realizing novel out-of-equilibrium phases of matter. Numerical studies provide insight into these systems; however, computational time and memory usage limit common numerical methods such as exact diagonalization to relatively small Hilbert spaces of dimension 2^15. Here we present progress toward a new software package for dynamical time evolution of large generic quantum systems on massively parallel computing architectures. By projecting large sparse Hamiltonians into a much smaller Krylov subspace, we are able to compute the evolution of strongly interacting systems with Hilbert space dimension nearing 2^30. We discuss and benchmark different design implementations, such as matrix-free methods and GPU based calculations, using both pre-thermal time crystals and the Sachdev-Ye-Kitaev model as examples. We also include a simple symbolic language to describe generic Hamiltonians, allowing simulation of diverse quantum systems without any modification of the underlying C and Fortran code.
Matrix Algebra for GPU and Multicore Architectures (MAGMA) for Large Petascale Systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dongarra, Jack J.; Tomov, Stanimire
2014-03-24
The goal of the MAGMA project is to create a new generation of linear algebra libraries that achieve the fastest possible time to an accurate solution on hybrid Multicore+GPU-based systems, using all the processing power that future high-end systems can make available within given energy constraints. Our efforts at the University of Tennessee achieved the goals set in all of the five areas identified in the proposal: 1. Communication optimal algorithms; 2. Autotuning for GPU and hybrid processors; 3. Scheduling and memory management techniques for heterogeneity and scale; 4. Fault tolerance and robustness for large scale systems; 5. Building energy efficiency into software foundations. The University of Tennessee’s main contributions, as proposed, were the research and software development of new algorithms for hybrid multi/many-core CPUs and GPUs, as related to two-sided factorizations and complete eigenproblem solvers, hybrid BLAS, and energy efficiency for dense, as well as sparse, operations. Furthermore, as proposed, we investigated and experimented with various techniques targeting the five main areas outlined.
Scemama, Anthony; Renon, Nicolas; Rapacioli, Mathias
2014-06-10
We present an algorithm and its parallel implementation for solving a self-consistent problem as encountered in Hartree-Fock or density functional theory. The algorithm takes advantage of the sparsity of matrices through the use of local molecular orbitals. The implementation allows one to exploit efficiently modern symmetric multiprocessing (SMP) computer architectures. As a first application, the algorithm is used within the density-functional-based tight binding method, for which most of the computational time is spent in the linear algebra routines (diagonalization of the Fock/Kohn-Sham matrix). We show that with this algorithm (i) single point calculations on very large systems (millions of atoms) can be performed on large SMP machines, (ii) calculations involving intermediate size systems (1000-100 000 atoms) are also strongly accelerated and can run efficiently on standard servers, and (iii) the error on the total energy due to the use of a cutoff in the molecular orbital coefficients can be controlled such that it remains smaller than the SCF convergence criterion.
Exhaustive Search for Sparse Variable Selection in Linear Regression
NASA Astrophysics Data System (ADS)
Igarashi, Yasuhiko; Takenaka, Hikaru; Nakanishi-Ohno, Yoshinori; Uemura, Makoto; Ikeda, Shiro; Okada, Masato
2018-04-01
We propose a K-sparse exhaustive search (ES-K) method and a K-sparse approximate exhaustive search method (AES-K) for selecting variables in linear regression. With these methods, K-sparse combinations of variables are tested exhaustively assuming that the optimal combination of explanatory variables is K-sparse. By collecting the results of exhaustively computing ES-K, various approximate methods for selecting sparse variables can be summarized as density of states. With this density of states, we can compare different methods for selecting sparse variables such as relaxation and sampling. For large problems where the combinatorial explosion of explanatory variables is crucial, the AES-K method enables density of states to be effectively reconstructed by using the replica-exchange Monte Carlo method and the multiple histogram method. Applying the ES-K and AES-K methods to type Ia supernova data, we confirmed the conventional understanding in astronomy when an appropriate K is given beforehand. However, we found it difficult to determine K from the data. Using virtual measurement and analysis, we argue that this is caused by data shortage.
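The exhaustive K-sparse search described above can be sketched on a toy problem as follows. This is a minimal sketch under stated assumptions: the data, the helper names (`solve`, `fit_rss`, `exhaustive_search`), and the use of the residual sum of squares as the score are all illustrative choices, not the authors' ES-K code.

```python
from itertools import combinations

def solve(A, b):
    """Solve a small dense system by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def fit_rss(X, y, cols):
    """Least-squares fit of y on the chosen columns via the normal equations."""
    G = [[sum(row[a] * row[b] for row in X) for b in cols] for a in cols]
    g = [sum(row[a] * yi for row, yi in zip(X, y)) for a in cols]
    w = solve(G, g)
    rss = sum((yi - sum(wk * row[c] for wk, c in zip(w, cols))) ** 2
              for row, yi in zip(X, y))
    return rss, w

def exhaustive_search(X, y, K):
    """Score every K-subset of columns; return (rss, subset) pairs sorted by rss."""
    p = len(X[0])
    return sorted((fit_rss(X, y, cols)[0], cols)
                  for cols in combinations(range(p), K))

# Hypothetical data: 6 samples, 4 candidate variables, y = 2*x0 - x2 exactly.
X = [
    [1.0, 0.0, 2.0, 1.0],
    [0.0, 1.0, 1.0, 3.0],
    [2.0, 1.0, 0.0, 1.0],
    [1.0, 3.0, 1.0, 0.0],
    [3.0, 0.0, 1.0, 2.0],
    [0.0, 2.0, 3.0, 1.0],
]
y = [2.0 * row[0] - row[2] for row in X]
results = exhaustive_search(X, y, K=2)
best_rss, best_cols = results[0]
```

The sorted list of scores over all subsets is a crude, small-scale analogue of the density of states mentioned in the abstract. The number of subsets grows combinatorially with the number of variables, which is the regime where the abstract turns to the approximate AES-K method.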
Inference for High-dimensional Differential Correlation Matrices
Cai, T. Tony; Zhang, Anru
2015-01-01
Motivated by differential co-expression analysis in genomics, we consider in this paper estimation and testing of high-dimensional differential correlation matrices. An adaptive thresholding procedure is introduced and theoretical guarantees are given. The minimax rate of convergence is established and the proposed estimator is shown to be adaptively rate-optimal over collections of paired correlation matrices with approximately sparse differences. Simulation results show that the procedure significantly outperforms two other natural methods that are based on separate estimation of the individual correlation matrices. The procedure is also illustrated through an analysis of a breast cancer dataset, which provides evidence at the gene co-expression level that several genes, of which a subset has been previously verified, are associated with breast cancer. Hypothesis testing on the differential correlation matrices is also considered. A test, which is particularly well suited for testing against sparse alternatives, is introduced. In addition, other related problems, including estimation of a single sparse correlation matrix, estimation of the differential covariance matrices, and estimation of the differential cross-correlation matrices, are also discussed. PMID:26500380
Fast super-resolution estimation of DOA and DOD in bistatic MIMO Radar with off-grid targets
NASA Astrophysics Data System (ADS)
Zhang, Dong; Zhang, Yongshun; Zheng, Guimei; Feng, Cunqian; Tang, Jun
2018-05-01
In this paper, we focus on the problem of joint DOA and DOD estimation in bistatic MIMO radar using a sparse reconstruction method. Traditionally, the 2D parameter estimation problem is converted into a 1D parameter estimation problem by the Kronecker product, which enlarges the scale of the parameter estimation problem and increases the computational burden. Furthermore, it requires that the targets fall on the predefined grids. In this paper, a 2D off-grid model is built which can solve the grid mismatch problem of 2D parameter estimation. Then, in order to solve the joint 2D sparse reconstruction problem directly and efficiently, three fast joint sparse matrix reconstruction methods are proposed: the Joint-2D-OMP algorithm, the Joint-2D-SL0 algorithm and the Joint-2D-SOONE algorithm. Simulation results demonstrate that our methods not only improve the 2D parameter estimation accuracy but also reduce the computational complexity compared with the traditional Kronecker Compressed Sensing method.
BI-sparsity pursuit for robust subspace recovery
Bian, Xiao; Krim, Hamid
2015-09-01
The success of sparse models in computer vision and machine learning in many real-world applications may be attributed in large part to the fact that many high dimensional data are distributed in a union of low dimensional subspaces. The underlying structure may, however, be adversely affected by sparse errors, thus inducing additional complexity in recovering it. In this paper, we propose a bi-sparse model as a framework to investigate and analyze this problem, and provide, as a result, a novel algorithm to recover the union of subspaces in the presence of sparse corruptions. We additionally demonstrate the effectiveness of our method by experiments on real-world vision data.
Blind source separation by sparse decomposition
NASA Astrophysics Data System (ADS)
Zibulevsky, Michael; Pearlmutter, Barak A.
2000-04-01
The blind source separation problem is to extract the underlying source signals from a set of their linear mixtures, where the mixing matrix is unknown. This situation is common, e.g., in acoustics, radio, and medical signal processing. We exploit the property of the sources to have a sparse representation in a corresponding signal dictionary. Such a dictionary may consist of wavelets, wavelet packets, etc., or be obtained by learning from a given family of signals. Starting from the maximum a posteriori framework, which is applicable to the case of more sources than mixtures, we derive a few other categories of objective functions, which provide faster and more robust computations when there are an equal number of sources and mixtures. Our experiments with artificial signals and with musical sounds demonstrate significantly better separation than other known techniques.
Kosik, Ivan; Raess, Avery
2015-01-01
Accurate reconstruction of 3D photoacoustic (PA) images requires detection of photoacoustic signals from many angles. Several groups have adopted staring ultrasound arrays, but assessment of array performance has been limited. We previously reported on a method to calibrate a 3D PA tomography (PAT) staring array system and analyze system performance using singular value decomposition (SVD). The developed SVD metric, however, was impractical for large system matrices, which are typical of 3D PAT problems. The present study consisted of two main objectives. The first objective aimed to introduce the crosstalk matrix concept to the field of PAT for system design. Figures-of-merit utilized in this study were root mean square error, peak signal-to-noise ratio, mean absolute error, and a three dimensional structural similarity index, which were derived between the normalized spatial crosstalk matrix and the identity matrix. The applicability of this approach for 3D PAT was validated by observing the response of the figures-of-merit in relation to well-understood PAT sampling characteristics (i.e. spatial and temporal sampling rate). The second objective aimed to utilize the figures-of-merit to characterize and improve the performance of a near-spherical staring array design. Transducer arrangement, array radius, and array angular coverage were the design parameters examined. We observed that the performance of a 129-element staring transducer array for 3D PAT could be improved by selection of optimal values of the design parameters. The results suggested that this formulation could be used to objectively characterize 3D PAT system performance and would enable the development of efficient strategies for system design optimization. PMID:25875177
3D airborne EM modeling based on the spectral-element time-domain (SETD) method
NASA Astrophysics Data System (ADS)
Cao, X.; Yin, C.; Huang, X.; Liu, Y.; Zhang, B., Sr.; Cai, J.; Liu, L.
2017-12-01
In the field of 3D airborne electromagnetic (AEM) modeling, both the finite-difference time-domain (FDTD) method and the finite-element time-domain (FETD) method have limitations: FDTD depends too much on the grids and time steps, while FETD requires a large number of grids for complex structures. We propose a time-domain spectral-element (SETD) method based on GLL interpolation basis functions for spatial discretization and the Backward Euler (BE) technique for time discretization. The spectral-element method is based on a weighted residual technique with polynomials as vector basis functions. It can improve accuracy by increasing the order of the polynomials and suppressing spurious solutions. The BE method is a stable time discretization technique that has no limitation on time steps and can guarantee a higher accuracy during the iteration process. To minimize the number of non-zeros in the sparse matrix and obtain a diagonal mass matrix, we apply the reduced order integral technique. A direct solver with its speed independent of the condition number is adopted for quickly solving the large-scale sparse linear equations system. To check the accuracy of our SETD algorithm, we compare our results with semi-analytical solutions for a three-layered earth model within the time lapse 10^-6 to 10^-2 s for different physical meshes and SE orders. The results show that the relative errors for magnetic field B and magnetic induction are both around 3-5%. Further we calculate AEM responses for an AEM system over a 3D earth model in Figure 1.
From numerical experiments for both 1D and 3D models, we draw the conclusions that: 1) SETD can deliver accurate results for both dB/dt and B; 2) increasing SE order improves the modeling accuracy for early to middle time channels when the EM field diffuses fast so the high-order SE can model the detailed variation; 3) at very late time channels, increasing SE order brings little improvement in modeling accuracy, but the time interval plays an important role. This research is supported by Key Program of National Natural Science Foundation of China (41530320), China Natural Science Foundation for Young Scientists (41404093), and Key National Research Project of China (2016YFC0303100, 2017YFC0601900). Figure 1: (a) AEM system over a 3D earth model; (b) magnetic field Bz; (c) magnetic induction dBz/dt.
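The stability property attributed to Backward Euler in the abstract above can be illustrated on a scalar decay equation du/dt = -lam*u, used here as an assumed stand-in for one stiff mode of a diffusion problem. The function names, the value of `lam`, and the step size are hypothetical choices for illustration; this is not the SETD code.

```python
def backward_euler(u0, lam, dt, steps):
    """Implicit update: solve (1 + lam*dt) * u_new = u_old at every step."""
    u = u0
    for _ in range(steps):
        u = u / (1.0 + lam * dt)
    return u

def forward_euler(u0, lam, dt, steps):
    """Explicit update, shown only for contrast: u_new = (1 - lam*dt) * u_old."""
    u = u0
    for _ in range(steps):
        u = u * (1.0 - lam * dt)
    return u

# Step size five times larger than the explicit stability limit 2/lam.
lam, dt, steps = 1000.0, 0.01, 20
u_implicit = backward_euler(1.0, lam, dt, steps)
u_explicit = forward_euler(1.0, lam, dt, steps)
```

With these values the implicit solution decays monotonically toward zero, while the explicit one grows in magnitude at every step. In a spatially discretized problem the division becomes a sparse linear solve per time step, which is where the direct solver mentioned in the abstract comes in.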
NASA Astrophysics Data System (ADS)
Li, Jun; Song, Minghui; Peng, Yuanxi
2018-03-01
Current infrared and visible image fusion methods do not achieve adequate information extraction, i.e., they cannot extract the target information from infrared images while retaining the background information from visible images. Moreover, most of them have high complexity and are time-consuming. This paper proposes an efficient image fusion framework for infrared and visible images on the basis of robust principal component analysis (RPCA) and compressed sensing (CS). The novel framework consists of three phases. First, RPCA decomposition is applied to the infrared and visible images to obtain their sparse and low-rank components, which represent the salient features and background information of the images, respectively. Second, the sparse and low-rank coefficients are fused by different strategies. On the one hand, the measurements of the sparse coefficients are obtained by the random Gaussian matrix, and they are then fused by the standard deviation (SD) based fusion rule. Next, the fused sparse component is obtained by reconstructing the result of the fused measurement using the fast continuous linearized augmented Lagrangian algorithm (FCLALM). On the other hand, the low-rank coefficients are fused using the max-absolute rule. Subsequently, the fused image is superposed by the fused sparse and low-rank components. For comparison, several popular fusion algorithms are tested experimentally. By comparing the fused results subjectively and objectively, we find that the proposed framework can extract the infrared targets while retaining the background information in the visible images. Thus, it exhibits state-of-the-art performance in terms of both fusion effects and timeliness.
Voltage stability analysis in the new deregulated environment
NASA Astrophysics Data System (ADS)
Zhu, Tong
Nowadays, a significant portion of the power industry is under deregulation. Under this new circumstance, network security analysis is more critical and more difficult. One of the most important issues in network security analysis is voltage stability analysis. Due to the expected higher utilization of equipment induced by competition in a power market that covers bigger power systems, this issue is increasingly acute after deregulation. In this dissertation, some selected topics of voltage stability analysis are covered. In the first part, after a brief review of general concepts of continuation power flow (CPF), investigations on various matrix analysis techniques to improve the speed of CPF calculation for large systems are reported. Based on these improvements, a new CPF algorithm is proposed. This new method is then tested by an inter-area transaction in a large inter-connected power system. In the second part, the Arnoldi algorithm, the best method to find a few minimum singular values for a large sparse matrix, is introduced into the modal analysis for the first time. This new modal analysis is applied to the estimation of the point of voltage collapse and contingency evaluation in voltage security assessment. Simulations show that the new method is very efficient. In the third part, after transient voltage stability component models are investigated systematically, a novel system model for transient voltage stability analysis, which is a logical-algebraic-differential-difference equation (LADDE), is offered. As an example, TCSC (Thyristor controlled series capacitors) is addressed as a transient voltage stabilizing controller. After a TCSC transient voltage stability model is outlined, a new TCSC controller is proposed to enhance both fault related and load increasing related transient voltage stability. Its ability is proven by the simulation.
Sparse reconstruction localization of multiple acoustic emissions in large diameter pipelines
NASA Astrophysics Data System (ADS)
Dubuc, Brennan; Ebrahimkhanlou, Arvin; Salamone, Salvatore
2017-04-01
A sparse reconstruction localization method is proposed, which is capable of localizing multiple acoustic emission events occurring closely in time. The events may be due to a number of sources, such as the growth of corrosion patches or cracks. Such acoustic emissions may yield localization failure if a triangulation method is used. The proposed method is implemented both theoretically and experimentally on large diameter thin-walled pipes. Experimental examples are presented, which demonstrate the failure of a triangulation method when multiple sources are present in this structure, while highlighting the capabilities of the proposed method. The examples are generated from experimental data of simulated acoustic emission events. The data correspond to helical guided ultrasonic waves generated in a 3 m long large diameter pipe by pencil lead breaks on its outer surface. Acoustic emission waveforms are recorded by six sparsely distributed low-profile piezoelectric transducers instrumented on the outer surface of the pipe. The same array of transducers is used for both the proposed and the triangulation method. It is demonstrated that the proposed method is able to localize multiple events occurring closely in time. Furthermore, the matching pursuit algorithm and the basis pursuit denoising approach are each evaluated as potential numerical tools in the proposed sparse reconstruction method.
Methods for design and evaluation of integrated hardware-software systems for concurrent computation
NASA Technical Reports Server (NTRS)
Pratt, T. W.
1985-01-01
Research activities and publications are briefly summarized. The major tasks reviewed are: (1) VAX implementation of the PISCES parallel programming environment; (2) Apollo workstation network implementation of the PISCES environment; (3) FLEX implementation of the PISCES environment; (4) sparse matrix iterative solver in PISCES Fortran; (5) image processing application of PISCES; and (6) a formal model of concurrent computation being developed.
Research on Synthesis of Concurrent Computing Systems.
1982-09-01
1.5.1 An Informal Description of the Techniques; 1.5.2 Formal Definitions of Aggregation and Virtualisation ... sparsely interconnected networks. We have also developed techniques to create Kung’s systolic array parallel structure from a specification of matrix ... results of the computation of that element. For example, if A,j is computed using a single enumeration, then virtualisation would produce a three
Feature Selection and Pedestrian Detection Based on Sparse Representation.
Yao, Shihong; Wang, Tao; Shen, Weiming; Pan, Shaoming; Chong, Yanwen; Ding, Fei
2015-01-01
Pedestrian detection research has recently been devoted to the extraction of effective pedestrian features, which has become one of the obstacles in pedestrian detection applications owing to the variety of pedestrian features and their large dimension. Based on the theoretical analysis of six frequently-used features, SIFT, SURF, Haar, HOG, LBP and LSS, and their comparison with experimental results, this paper screens out the sparse feature subsets via sparse representation to investigate whether the sparse subsets have the same description abilities and the most stable features. When any two of the six features are fused, the fusion feature is sparsely represented to obtain its important components. Sparse subsets of the fusion features can be rapidly generated by avoiding calculation of the corresponding index of dimension numbers of these feature descriptors; thus, the calculation speed of the feature dimension reduction is improved and the pedestrian detection time is reduced. Experimental results show that sparse feature subsets are capable of keeping the important components of these six feature descriptors. The sparse features of HOG and LSS possess the same description ability and consume less time compared with their full features. The ratios of the sparse feature subsets of HOG and LSS to their full sets are the highest among the six; thus these two features best describe the characteristics of the pedestrian, and the sparse feature subsets of the HOG-LSS combination show better distinguishing ability and parsimony.
Accelerating scientific computations with mixed precision algorithms
NASA Astrophysics Data System (ADS)
Baboulin, Marc; Buttari, Alfredo; Dongarra, Jack; Kurzak, Jakub; Langou, Julie; Langou, Julien; Luszczek, Piotr; Tomov, Stanimire
2009-12-01
On modern architectures, the performance of 32-bit operations is often at least twice as fast as the performance of 64-bit operations. By using a combination of 32-bit and 64-bit floating point arithmetic, the performance of many dense and sparse linear algebra algorithms can be significantly enhanced while maintaining the 64-bit accuracy of the resulting solution. The approach presented here can apply not only to conventional processors but also to other technologies such as Field Programmable Gate Arrays (FPGA), Graphical Processing Units (GPU), and the STI Cell BE processor. Results on modern processor architectures and the STI Cell BE are presented.
Program summary
Program title: ITER-REF
Catalogue identifier: AECO_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AECO_v1_0.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html
No. of lines in distributed program, including test data, etc.: 7211
No. of bytes in distributed program, including test data, etc.: 41 862
Distribution format: tar.gz
Programming language: FORTRAN 77
Computer: desktop, server
Operating system: Unix/Linux
RAM: 512 Mbytes
Classification: 4.8
External routines: BLAS (optional)
Nature of problem: On modern architectures, the performance of 32-bit operations is often at least twice as fast as the performance of 64-bit operations. By using a combination of 32-bit and 64-bit floating point arithmetic, the performance of many dense and sparse linear algebra algorithms can be significantly enhanced while maintaining the 64-bit accuracy of the resulting solution.
Solution method: Mixed precision algorithms stem from the observation that, in many cases, a single precision solution of a problem can be refined to the point where double precision accuracy is achieved.
A common approach to the solution of linear systems, either dense or sparse, is to perform the LU factorization of the coefficient matrix using Gaussian elimination. First, the coefficient matrix A is factored into the product of a lower triangular matrix L and an upper triangular matrix U. Partial row pivoting is in general used to improve numerical stability resulting in a factorization PA=LU, where P is a permutation matrix. The solution for the system is achieved by first solving Ly=Pb (forward substitution) and then solving Ux=y (backward substitution). Due to round-off errors, the computed solution, x, carries a numerical error magnified by the condition number of the coefficient matrix A. In order to improve the computed solution, an iterative process can be applied, which produces a correction to the computed solution at each iteration, which then yields the method that is commonly known as the iterative refinement algorithm. Provided that the system is not too ill-conditioned, the algorithm produces a solution correct to the working precision. Running time: seconds/minutes
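The steps described above (factor PA=LU, solve Ly=Pb and Ux=y, then refine) can be sketched in a few lines. This is only an illustration under stated assumptions: 32-bit arithmetic is simulated by rounding through `struct`, the small dense test matrix is made up, and all function names are hypothetical. It is not the ITER-REF FORTRAN 77 program.

```python
import struct

def f32(x):
    """Round a 64-bit Python float to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]

def lu_factor32(A):
    """PA = LU with partial row pivoting; every result is rounded to 32 bits."""
    n = len(A)
    LU = [[f32(v) for v in row] for row in A]
    perm = list(range(n))  # perm[k] = original index of row k of PA
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(LU[r][k]))
        LU[k], LU[p] = LU[p], LU[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            LU[i][k] = f32(LU[i][k] / LU[k][k])
            for j in range(k + 1, n):
                LU[i][j] = f32(LU[i][j] - f32(LU[i][k] * LU[k][j]))
    return LU, perm

def lu_solve32(LU, perm, b):
    """Forward substitution Ly = Pb, then backward substitution Ux = y."""
    n = len(b)
    y = [f32(b[p]) for p in perm]
    for i in range(n):
        for j in range(i):
            y[i] = f32(y[i] - f32(LU[i][j] * y[j]))
    for i in reversed(range(n)):
        for j in range(i + 1, n):
            y[i] = f32(y[i] - f32(LU[i][j] * y[j]))
        y[i] = f32(y[i] / LU[i][i])
    return y

def refine(A, b, LU, perm, iters=5):
    """Iterative refinement: 64-bit residual, 32-bit correction solve."""
    x = lu_solve32(LU, perm, b)
    for _ in range(iters):
        r = [bi - sum(a * xj for a, xj in zip(row, x)) for row, bi in zip(A, b)]
        d = lu_solve32(LU, perm, r)
        x = [xi + di for xi, di in zip(x, d)]
    return x

# Hypothetical well-conditioned test system with known solution x_true.
A = [[4.1, 1.3, 2.2], [1.7, 5.9, 1.1], [2.3, 0.7, 6.3]]
x_true = [1.0, -2.0, 3.0]
b = [sum(a * xj for a, xj in zip(row, x_true)) for row in A]
LU, perm = lu_factor32(A)
x_single = lu_solve32(LU, perm, b)
x_refined = refine(A, b, LU, perm)
err_refined = max(abs(u - v) for u, v in zip(x_refined, x_true))
```

The expensive factorization is done once in low precision and reused for every correction solve; only the residual is formed in 64-bit arithmetic. That split is the source of the speedup claimed in the abstract, provided the matrix is not too ill-conditioned.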
Evidence for sparse synergies in grasping actions.
Prevete, Roberto; Donnarumma, Francesco; d'Avella, Andrea; Pezzulo, Giovanni
2018-01-12
Converging evidence shows that hand-actions are controlled at the level of synergies and not single muscles. One intriguing aspect of synergy-based action-representation is that it may be intrinsically sparse and the same synergies can be shared across several distinct types of hand-actions. Here, adopting a normative angle, we consider three hypotheses for hand-action optimal-control: sparse-combination hypothesis (SC) - sparsity in the mapping between synergies and actions - i.e., actions implemented using a sparse combination of synergies; sparse-elements hypothesis (SE) - sparsity in synergy representation - i.e., the mapping between degrees-of-freedom (DoF) and synergies is sparse; double-sparsity hypothesis (DS) - a novel view combining both SC and SE - i.e., both the mapping between DoF and synergies and between synergies and actions are sparse, each action implementing a sparse combination of synergies (as in SC), each using a limited set of DoFs (as in SE). We evaluate these hypotheses using hand kinematic data from six human subjects performing nine different types of reach-to-grasp actions. Our results support DS, suggesting that the best action representation is based on a relatively large set of synergies, each involving a reduced number of degrees-of-freedom, and that distinct sets of synergies may be involved in distinct tasks.
NASA Technical Reports Server (NTRS)
Maliassov, Serguei
1996-01-01
In this paper an algebraic substructuring preconditioner is considered for nonconforming finite element approximations of second order elliptic problems in 3D domains with a piecewise constant diffusion coefficient. Using a substructuring idea and a block Gauss elimination, part of the unknowns is eliminated and the Schur complement obtained is preconditioned by a spectrally equivalent very sparse matrix. In the case of quasiuniform tetrahedral mesh an appropriate algebraic multigrid solver can be used to solve the problem with this matrix. Explicit estimates of condition numbers and implementation algorithms are established for the constructed preconditioner. It is shown that the condition number of the preconditioned matrix does not depend on either the mesh step size or the jump of the coefficient. Finally, numerical experiments are presented to illustrate the theory being developed.
Sparse PDF Volumes for Consistent Multi-Resolution Volume Rendering.
Sicat, Ronell; Krüger, Jens; Möller, Torsten; Hadwiger, Markus
2014-12-01
This paper presents a new multi-resolution volume representation called sparse pdf volumes, which enables consistent multi-resolution volume rendering based on probability density functions (pdfs) of voxel neighborhoods. These pdfs are defined in the 4D domain jointly comprising the 3D volume and its 1D intensity range. Crucially, the computation of sparse pdf volumes exploits data coherence in 4D, resulting in a sparse representation with surprisingly low storage requirements. At run time, we dynamically apply transfer functions to the pdfs using simple and fast convolutions. Whereas standard low-pass filtering and down-sampling incur visible differences between resolution levels, the use of pdfs facilitates consistent results independent of the resolution level used. We describe the efficient out-of-core computation of large-scale sparse pdf volumes, using a novel iterative simplification procedure of a mixture of 4D Gaussians. Finally, our data structure is optimized to facilitate interactive multi-resolution volume rendering on GPUs.
Zhang, Ying-Ying; Yang, Cai; Zhang, Ping
2017-08-01
In this paper, we present a novel bottom-up saliency detection algorithm from the perspective of covariance matrices on a Riemannian manifold. Each superpixel is described by a region covariance matrix on Riemannian Manifolds. We carry out a two-stage sparse coding scheme via Log-Euclidean kernels to extract salient objects efficiently. In the first stage, given background dictionary on image borders, sparse coding of each region covariance via Log-Euclidean kernels is performed. The reconstruction error on the background dictionary is regarded as the initial saliency of each superpixel. In the second stage, an improvement of the initial result is achieved by calculating reconstruction errors of the superpixels on foreground dictionary, which is extracted from the first stage saliency map. The sparse coding in the second stage is similar to the first stage, but is able to effectively highlight the salient objects uniformly from the background. Finally, three post-processing methods (highlight-inhibition function, context-based saliency weighting, and the graph cut) are adopted to further refine the saliency map. Experiments on four public benchmark datasets show that the proposed algorithm outperforms the state-of-the-art methods in terms of precision, recall and mean absolute error, and demonstrate the robustness and efficiency of the proposed method. Copyright © 2017 Elsevier Ltd. All rights reserved.
Spectrum recovery method based on sparse representation for segmented multi-Gaussian model
NASA Astrophysics Data System (ADS)
Teng, Yidan; Zhang, Ye; Ti, Chunli; Su, Nan
2016-09-01
Hyperspectral images (HSIs) offer excellent feature discriminability because they supply diagnostic characteristics at high spectral resolution. However, various degradations, including water absorption and band-continuous noise, may negatively affect the spectral information. On the other hand, the huge data volume and strong redundancy among spectra create a strong demand for compressing HSIs in the spectral dimension, which also leads to a loss of spectral information. The reconstruction of spectral diagnostic characteristics is therefore essential for the subsequent application of HSIs. This paper introduces a spectrum restoration method for HSIs that makes use of a segmented multi-Gaussian model (SMGM) and sparse representation. An SMGM is established to represent the asymmetric spectral absorption and reflection characteristics, and its rationality and sparsity are discussed. Applying compressed sensing (CS) theory, we implement a sparse representation of the SMGM. The degraded and compressed HSIs can then be reconstructed from the undamaged or key bands. Finally, we apply a low rank matrix recovery (LRMR) algorithm as post-processing to restore the spatial details. The proposed method was tested on spectral data captured on the ground under artificial water absorption conditions and on an AVIRIS HSI data set. The experimental results, in terms of qualitative and quantitative assessments, demonstrate the effectiveness of the method in recovering the spectral information from both degradations and lossy compression. The spectral diagnostic characteristics and the spatial geometry features are well preserved.
Sample-Starved Large Scale Network Analysis
2016-05-05
As reported in our journal publication (G. Marjanovic and A. O. Hero, ”l0 Sparse Inverse Covariance Estimation,” IEEE Trans on Signal Processing, vol... Marjanovic and A. O. Hero, ”l0 Sparse Inverse Covariance Estimation,” in IEEE Trans on Signal Processing, vol. 63, no. 12, pp. 3218-3231, May 2015. 6. G
Summer Proceedings 2016: The Center for Computing Research at Sandia National Laboratories
DOE Office of Scientific and Technical Information (OSTI.GOV)
Carleton, James Brian; Parks, Michael L.
Solving sparse linear systems from the discretization of elliptic partial differential equations (PDEs) is an important building block in many engineering applications. Sparse direct solvers can solve general linear systems, but are usually slower and use much more memory than effective iterative solvers. To overcome these two disadvantages, a hierarchical solver (LoRaSp) based on H2-matrices was introduced in [22]. Here, we have developed a parallel version of the algorithm in LoRaSp to solve large sparse matrices on distributed memory machines. On a single processor, the factorization time of our parallel solver scales almost linearly with the problem size for three-dimensional problems, as opposed to the quadratic scalability of many existing sparse direct solvers. Moreover, our solver leads to almost constant numbers of iterations, when used as a preconditioner for Poisson problems. On more than one processor, our algorithm has significant speedups compared to sequential runs. With this parallel algorithm, we are able to solve large problems much faster than many existing packages as demonstrated by the numerical experiments.
Incomplete Sparse Approximate Inverses for Parallel Preconditioning
Anzt, Hartwig; Huckle, Thomas K.; Bräckle, Jürgen; ...
2017-10-28
In this study, we propose a new preconditioning method that can be seen as a generalization of block-Jacobi methods, or as a simplification of the sparse approximate inverse (SAI) preconditioners. The “Incomplete Sparse Approximate Inverses” (ISAI) is in particular efficient in the solution of sparse triangular linear systems of equations. Those arise, for example, in the context of incomplete factorization preconditioning. ISAI preconditioners can be generated via an algorithm providing fine-grained parallelism, which makes them attractive for hardware with a high concurrency level. Finally, in a study covering a large number of matrices, we identify the ISAI preconditioner as an attractive alternative to exact triangular solves in the context of incomplete factorization preconditioning.
An analysis of spectral envelope-reduction via quadratic assignment problems
NASA Technical Reports Server (NTRS)
George, Alan; Pothen, Alex
1994-01-01
A new spectral algorithm for reordering a sparse symmetric matrix to reduce its envelope size has previously been described. The ordering is computed by associating a Laplacian matrix with the given matrix and then sorting the components of a specified eigenvector of the Laplacian. In this paper, we provide an analysis of the spectral envelope reduction algorithm. We describe related 1- and 2-sum problems; the former is related to the envelope size, while the latter is related to an upper bound on the work involved in an envelope Cholesky factorization scheme. We formulate these two problems as quadratic assignment problems, and then study the 2-sum problem in more detail. We obtain lower bounds on the 2-sum by considering a projected quadratic assignment problem, and then show that finding a permutation matrix closest to an orthogonal matrix attaining one of the lower bounds justifies the spectral envelope reduction algorithm. The lower bound on the 2-sum is seen to be tight for reasonably 'uniform' finite element meshes. We also obtain asymptotically tight lower bounds for the envelope size for certain classes of meshes.
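The ordering step (build a Laplacian from the sparsity pattern, then sort the components of an eigenvector) can be sketched as follows. The abstract does not say which eigenvector is meant; this sketch assumes the eigenvector of the second-smallest eigenvalue (the Fiedler vector) and uses a dense eigensolver for brevity. The names `envelope_size` and `spectral_ordering` are hypothetical.

```python
import numpy as np


def envelope_size(a):
    """Sum over rows i of (i - column index of the first nonzero in row i)."""
    total = 0
    for i in range(a.shape[0]):
        cols = np.nonzero(a[i, :i + 1])[0]
        total += (i - cols[0]) if cols.size else 0
    return total


def spectral_ordering(a):
    """Return a permutation obtained by sorting the assumed Fiedler vector."""
    adj = (a != 0).astype(float)
    np.fill_diagonal(adj, 0.0)
    lap = np.diag(adj.sum(axis=1)) - adj   # graph Laplacian of the pattern
    _, vecs = np.linalg.eigh(lap)          # eigenvalues in ascending order
    return np.argsort(vecs[:, 1])          # sort second eigenvector's entries
```

Applying the permutation symmetrically, `a[np.ix_(order, order)]`, gives the reordered matrix. For a scrambled tridiagonal matrix the ordering recovers the band structure, since the Fiedler vector of a path graph is monotone along the path.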
Unifying model for random matrix theory in arbitrary space dimensions
NASA Astrophysics Data System (ADS)
Cicuta, Giovanni M.; Krausser, Johannes; Milkus, Rico; Zaccone, Alessio
2018-03-01
A sparse random block matrix model suggested by the Hessian matrix used in the study of elastic vibrational modes of amorphous solids is presented and analyzed. By evaluating some moments, benchmarked against numerics, differences in the eigenvalue spectrum of this model in different limits of space dimension d, and for arbitrary values of the lattice coordination number Z, are shown and discussed. As a function of these two parameters (and their ratio Z/d), the most studied models in random matrix theory (Erdős-Rényi graphs, effective medium, and replicas) can be reproduced in the various limits of block dimensionality d. Remarkably, the Marchenko-Pastur spectral density (which is recovered by replica calculations for the Laplacian matrix) is reproduced exactly in the limit of infinite size of the blocks, or d→∞, which clarifies the physical meaning of space dimension in these models. We feel that the approximate results for d=3 provided by our method may have many potential applications in the future, from the vibrational spectrum of glasses and elastic networks to wave localization, disordered conductors, random resistor networks, and random walks.
NASA Technical Reports Server (NTRS)
Gezari, D.; Lyon, R.; Woodruff, R.; Labeyrie, A.; Oegerle, William (Technical Monitor)
2002-01-01
A concept is presented for a large (10 - 30 meter) sparse aperture hyper telescope to image extrasolar earth-like planets from the ground in the presence of atmospheric seeing. The telescope achieves high dynamic range very close to bright stellar sources with good image quality using pupil densification techniques. Active correction of the perturbed wavefront is simplified by using 36 small flat mirrors arranged in a parabolic steerable array structure, eliminating the need for large delay lines and operating at near-infrared (1 - 3 micron) wavelengths with flats comparable in size to the seeing cells.
DNA melting profiles from a matrix method.
Poland, Douglas
2004-02-05
In this article we give a new method for the calculation of DNA melting profiles. Based on the matrix formulation of the DNA partition function, the method relies for its efficiency on the fact that the required matrices are very sparse, essentially reducing matrix multiplication to vector multiplication and thus making the computer time required to treat a DNA molecule containing N base pairs proportional to N^2. A key ingredient in the method is the result that multiplication by the inverse matrix can also be reduced to vector multiplication. The task of calculating the melting profile for the entire genome is further reduced by treating regions of the molecule between helix-plateaus, thus breaking the molecule up into independent parts that can each be treated individually. The method is easily modified to incorporate changes in the assignment of statistical weights to the different structural features of DNA. We illustrate the method using the genome of Haemophilus influenzae. Copyright 2003 Wiley Periodicals, Inc.
Tawfic, Israa Shaker; Kayhan, Sema Koc
2017-02-01
Compressed sensing (CS) is a new field used for signal acquisition and sensor design that has greatly reduced the cost of acquiring sparse signals. In this paper, new algorithms are developed to improve the performance of greedy algorithms. A new greedy pursuit algorithm, SS-MSMP (Split Signal for Multiple Support of Matching Pursuit), is introduced and theoretical analyses are given. SS-MSMP is suggested for sparse data acquisition, in order to reconstruct analog and efficient signals via a small set of general measurements. This paper proposes a new fast method which depends on a study of the behavior of the support indices through picking the best estimation of the correlation between the residual and the measurement matrix. The term multiple supports originates from the algorithm: in each iteration, the best support indices are picked based on maximum quality created by discovering correlation for a particular length of support. The new algorithm builds on the halting condition that we previously derived for Least Support Orthogonal Matching Pursuit (LS-OMP) for noise-free and noisy signals. For better reconstruction results, the SS-MSMP algorithm provides recovery of the support set for long signals such as signals used in WBAN. Numerical experiments demonstrate that the new suggested algorithm performs well compared to existing algorithms in terms of many factors used for reconstruction performance. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Effective matrix diffusion in kilometer‐scale transport in fractured crystalline rock
Shapiro, Allen M.
2001-01-01
Concentrations of tritium (3H) and dichlorodifluoromethane (CFC‐12) in water samples taken from glacial drift and fractured crystalline rock over 4 km2 in central New Hampshire are interpreted to identify a conceptual model of matrix diffusion and the magnitude of the diffusion coefficient. Dispersion and mass transfer to and from fractures have affected the 3H concentration to the extent that the peak 3H concentration of the 1960s is no longer distinguishable. Because of heterogeneity in the bedrock the sparsely distributed chemical data do not warrant a three‐dimensional transport model. Instead, a one‐dimensional model of CFC‐12 and 3H migration along flow lines in the glacial drift and bedrock is used to place bounds on the processes affecting kilometer‐scale transport, and model parameters are varied to reproduce the measured relation between 3H and CFC‐12, rather than their spatial distributions. A model of mass exchange to and from fractures that is dependent on the time‐varying concentration gradient at fracture surfaces qualitatively reproduces the measured relation between 3H and CFC‐12 with an upper bound for the fracture dispersivity approximately equal to 250 m and a lower bound for the effective matrix diffusion coefficient equal to 1 m2 yr−1. The diffusion coefficient at the kilometer scale is at least 3 orders of magnitude greater than laboratory estimates of diffusion in crystalline rock. The large diffusion coefficient indicates that diffusion into an immobile fluid phase (rock matrix) is masked at the kilometer scale by advective mass exchange between fractures with large contrasts in transmissivity. The measured transmissivity of fractures in the study area varies over more than 6 orders of magnitude. 
Advective mass exchange from high‐permeability fractures to low‐permeability fractures results in short migration distances of a chemical constituent in low‐permeability fractures over an extended period of time before reentering high‐permeability fractures; viewed at the kilometer scale, this process is analogous to the chemical constituent diffusing into and out of an immobile fluid phase.
A modified sparse reconstruction method for three-dimensional synthetic aperture radar image
NASA Astrophysics Data System (ADS)
Zhang, Ziqiang; Ji, Kefeng; Song, Haibo; Zou, Huanxin
2018-03-01
There is an increasing interest in three-dimensional Synthetic Aperture Radar (3-D SAR) imaging from observed sparse scattering data. However, existing 3-D sparse imaging methods require long computing times and large storage capacity. In this paper, we propose a modified method for sparse 3-D SAR imaging. The method processes the collection of noisy SAR measurements, usually collected over nonlinear flight paths, and outputs 3-D SAR imagery. Firstly, the 3-D sparse reconstruction problem is transformed into a series of 2-D slice reconstruction problems by range compression. Then the slices are reconstructed by the modified SL0 (smoothed l0 norm) reconstruction algorithm. The improved algorithm uses the hyperbolic tangent function instead of the Gaussian function to approximate the l0 norm and uses the Newton direction instead of the steepest descent direction, which can speed up the convergence rate of the SL0 algorithm. Finally, numerical simulation results are given to demonstrate the effectiveness of the proposed algorithm. It is shown that our method, compared with existing 3-D sparse imaging methods, performs better in both reconstruction quality and reconstruction time.
Numerical methods in Markov chain modeling
NASA Technical Reports Server (NTRS)
Philippe, Bernard; Saad, Youcef; Stewart, William J.
1989-01-01
Several methods for computing stationary probability distributions of Markov chains are described and compared. The main linear algebra problem consists of computing an eigenvector of a sparse, usually nonsymmetric, matrix associated with a known eigenvalue. It can also be cast as a problem of solving a homogeneous singular linear system. Several methods based on combinations of Krylov subspace techniques are presented. The performance of these methods on some realistic problems is compared.
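The two formulations mentioned above (an eigenvector for the known eigenvalue 1, or a homogeneous singular system) can be made concrete on a small dense chain. The sketch below does not reproduce the Krylov subspace methods of the report; it swaps in plain power iteration and a direct solve as the simplest stand-ins. The function names, and the choice of replacing the last equation by the normalization condition, are assumptions for illustration.

```python
import numpy as np


def stationary_by_power_iteration(p, tol=1e-13, max_iter=10000):
    """Iterate pi <- pi P until the 1-norm change falls below tol."""
    n = p.shape[0]
    pi = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        nxt = pi @ p
        if np.abs(nxt - pi).sum() < tol:
            return nxt
        pi = nxt
    return pi


def stationary_by_linear_system(p):
    """Solve (P^T - I) pi = 0 with one equation replaced by sum(pi) = 1."""
    n = p.shape[0]
    m = p.T - np.eye(n)       # singular: rows of P sum to one
    m[-1, :] = 1.0            # normalization replaces the last equation
    rhs = np.zeros(n)
    rhs[-1] = 1.0
    return np.linalg.solve(m, rhs)
```

For the two-state chain with transition matrix [[0.9, 0.1], [0.5, 0.5]], balance between the states requires 0.1·π₀ = 0.5·π₁, so both routines should return approximately (5/6, 1/6).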
Recursive inverse factorization.
Rubensson, Emanuel H; Bock, Nicolas; Holmström, Erik; Niklasson, Anders M N
2008-03-14
A recursive algorithm for the inverse factorization S^{-1} = ZZ^* of Hermitian positive definite matrices S is proposed. The inverse factorization is based on iterative refinement [A.M.N. Niklasson, Phys. Rev. B 70, 193102 (2004)] combined with a recursive decomposition of S. As the computational kernel is matrix-matrix multiplication, the algorithm can be parallelized and the computational effort increases linearly with system size for systems with sufficiently sparse matrices. Recent advances in network theory are used to find appropriate recursive decompositions. We show that optimization of the so-called network modularity results in an improved partitioning compared to other approaches, in particular when the recursive inverse factorization is applied to overlap matrices of irregularly structured three-dimensional molecules.
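A refinement loop whose only kernel is matrix-matrix multiplication can be sketched as below. This is an assumption-laden illustration, not the authors' algorithm: it uses a first-order update Z ← Z(I + δ/2) with δ = I − Z^* S Z, a scaled-identity starting guess, and dense matrices, and it omits the recursive decomposition entirely. The name `inverse_factor` is hypothetical.

```python
import numpy as np


def inverse_factor(s, tol=1e-12, max_iter=50):
    """Return Z with Z Z^* approximately equal to S^{-1}, for Hermitian positive definite S."""
    n = s.shape[0]
    identity = np.eye(n)
    # Scaled identity start: eigenvalues of delta = I - S/||S||_2 lie in [0, 1).
    z = identity / np.sqrt(np.linalg.norm(s, 2))
    for _ in range(max_iter):
        delta = identity - z.conj().T @ s @ z   # factorization error
        if np.linalg.norm(delta, 2) < tol:
            break
        z = z @ (identity + 0.5 * delta)        # first-order refinement step
    return z
```

With this update the new error is (3/4)δ² + (1/4)δ³, so the iteration converges quadratically once the error norm is below one. In the sparse setting of the abstract, small entries would presumably be dropped after each product to keep the cost linear; that truncation is not shown here.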
On the efficiency of a randomized mirror descent algorithm in online optimization problems
NASA Astrophysics Data System (ADS)
Gasnikov, A. V.; Nesterov, Yu. E.; Spokoiny, V. G.
2015-04-01
A randomized online version of the mirror descent method is proposed. It differs from the existing versions by the randomization method. Randomization is performed at the stage of the projection of a subgradient of the function being optimized onto the unit simplex rather than at the stage of the computation of a subgradient, which is common practice. As a result, a componentwise subgradient descent with a randomly chosen component is obtained, which admits an online interpretation. This observation, for example, has made it possible to uniformly interpret results on weighting expert decisions and propose the most efficient method for searching for an equilibrium in a zero-sum two-person matrix game with a sparse matrix.
Partitioning sparse matrices with eigenvectors of graphs
NASA Technical Reports Server (NTRS)
Pothen, Alex; Simon, Horst D.; Liou, Kang-Pu
1990-01-01
The problem of computing a small vertex separator in a graph arises in the context of computing a good ordering for the parallel factorization of sparse, symmetric matrices. An algebraic approach for computing vertex separators is considered in this paper. It is shown that lower bounds on separator sizes can be obtained in terms of the eigenvalues of the Laplacian matrix associated with a graph. The Laplacian eigenvectors of grid graphs can be computed from Kronecker products involving the eigenvectors of path graphs, and these eigenvectors can be used to compute good separators in grid graphs. A heuristic algorithm is designed to compute a vertex separator in a general graph by first computing an edge separator in the graph from an eigenvector of the Laplacian matrix, and then using a maximum matching in a subgraph to compute the vertex separator. Results on the quality of the separators computed by the spectral algorithm are presented, and these are compared with separators obtained from other algorithms for computing separators. Finally, the time required to compute the Laplacian eigenvector is reported, and the accuracy with which the eigenvector must be computed to obtain good separators is considered. The spectral algorithm has the advantage that it can be implemented on a medium-size multiprocessor in a straightforward manner.
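The first stage of the heuristic (an edge separator from a Laplacian eigenvector) can be sketched as follows. The sketch assumes the eigenvector of the second-smallest eigenvalue and a split at the median component, neither of which is spelled out in the abstract, and it uses a dense eigensolver. The second stage, which derives a vertex separator through a maximum matching, is not shown. The name `spectral_edge_separator` is hypothetical.

```python
import numpy as np


def spectral_edge_separator(adj):
    """Split vertices into two halves by sorted eigenvector components.

    adj is a symmetric 0/1 adjacency matrix with zero diagonal.
    Returns (part_a, part_b, cut_edges).
    """
    lap = np.diag(adj.sum(axis=1)) - adj
    _, vecs = np.linalg.eigh(lap)          # ascending eigenvalues
    order = np.argsort(vecs[:, 1])         # second-smallest eigenvalue's vector
    half = adj.shape[0] // 2
    part_a = set(order[:half].tolist())
    part_b = set(order[half:].tolist())
    cut_edges = [(i, j) for i in sorted(part_a) for j in sorted(part_b)
                 if adj[i, j] != 0]
    return part_a, part_b, cut_edges
```

On two 4-vertex cliques joined by a single edge, the split separates the cliques and the cut consists of that one bridging edge.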
NASA Technical Reports Server (NTRS)
Kanerva, P.
1986-01-01
To determine the relation of the sparse, distributed memory to other architectures, a broad review of the literature was made. Such a memory is called a pattern memory because it works with large patterns of features (high-dimensional vectors). A pattern is stored in a pattern memory by distributing it over a large number of storage elements and by superimposing it over other stored patterns. A pattern is retrieved by mathematical or statistical reconstruction from the distributed elements. Three pattern memories are discussed.
Comparison between sparsely distributed memory and Hopfield-type neural network models
NASA Technical Reports Server (NTRS)
Keeler, James D.
1986-01-01
The Sparsely Distributed Memory (SDM) model (Kanerva, 1984) is compared to Hopfield-type neural-network models. A mathematical framework for comparing the two is developed, and the capacity of each model is investigated. The capacity of the SDM can be increased independently of the dimension of the stored vectors, whereas the Hopfield capacity is limited to a fraction of this dimension. However, the total number of stored bits per matrix element is the same in the two models, as well as for extended models with higher order interactions. The models are also compared in their ability to store sequences of patterns. The SDM is extended to include time delays so that contextual information can be used to cover sequences. Finally, it is shown how a generalization of the SDM allows storage of correlated input pattern vectors.
Ma, Xu; Cheng, Yongmei; Hao, Shuai
2016-12-10
Automatic classification of terrain surfaces from an aerial image is essential for an autonomous unmanned aerial vehicle (UAV) landing at an unprepared site by using vision. Diverse terrain surfaces may show similar spectral properties due to the illumination and noise that easily cause poor classification performance. To address this issue, a multi-stage classification algorithm based on low-rank recovery and multi-feature fusion sparse representation is proposed. First, color moments and Gabor texture feature are extracted from training data and stacked as column vectors of a dictionary. Then we perform low-rank matrix recovery for the dictionary by using augmented Lagrange multipliers and construct a multi-stage terrain classifier. Experimental results on an aerial map database that we prepared verify the classification accuracy and robustness of the proposed method.
NASA Technical Reports Server (NTRS)
Oliker, Leonid; Heber, Gerd; Biswas, Rupak
2000-01-01
The Conjugate Gradient (CG) algorithm is perhaps the best-known iterative technique to solve sparse linear systems that are symmetric and positive definite. A sparse matrix-vector multiply (SPMV) usually accounts for most of the floating-point operations within a CG iteration. In this paper, we investigate the effects of various ordering and partitioning strategies on the performance of parallel CG and SPMV using different programming paradigms and architectures. Results show that for this class of applications, ordering significantly improves overall performance, that cache reuse may be more important than reducing communication, and that it is possible to achieve message passing performance using shared memory constructs through careful data ordering and distribution. However, a multi-threaded implementation of CG on the Tera MTA does not require special ordering or partitioning to obtain high efficiency and scalability.
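To show why the sparse matrix-vector multiply dominates a CG iteration, here is a serial sketch with the matrix held in compressed sparse row (CSR) form. The storage format, function names, and tolerances are assumptions for illustration; none of the ordering, partitioning, or parallel programming strategies studied in the paper are represented.

```python
import numpy as np


def csr_matvec(data, indices, indptr, x):
    """y = A x for A stored in CSR arrays (values, column indices, row pointers)."""
    y = np.zeros(len(indptr) - 1)
    for row in range(len(y)):
        for k in range(indptr[row], indptr[row + 1]):
            y[row] += data[k] * x[indices[k]]
    return y


def conjugate_gradient(data, indices, indptr, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A given in CSR form."""
    x = np.zeros_like(b, dtype=float)
    r = b.astype(float).copy()            # residual for the zero initial guess
    p = r.copy()
    rs_old = r @ r
    b_norm = np.sqrt(b @ b)
    for _ in range(max_iter):
        if np.sqrt(rs_old) <= tol * b_norm:
            break
        ap = csr_matvec(data, indices, indptr, p)   # the only matrix operation
        alpha = rs_old / (p @ ap)
        x += alpha * p
        r -= alpha * ap
        rs_new = r @ r
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x
```

Each iteration performs one `csr_matvec` plus a few vector updates and dot products. The indirect access `x[indices[k]]` is where the row and column ordering affects cache reuse.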
An exact formulation of the time-ordered exponential using path-sums
DOE Office of Scientific and Technical Information (OSTI.GOV)
Giscard, P.-L., E-mail: p.giscard1@physics.ox.ac.uk; Lui, K.; Thwaite, S. J.
2015-05-15
We present the path-sum formulation for the time-ordered exponential of a time-dependent matrix. The path-sum formulation gives the time-ordered exponential as a branched continued fraction of finite depth and breadth. The terms of the path-sum have an elementary interpretation as self-avoiding walks and self-avoiding polygons on a graph. Our result is based on a representation of the time-ordered exponential as the inverse of an operator, the mapping of this inverse to sums of walks on graphs, and the algebraic structure of sets of walks. We give examples demonstrating our approach. We establish a super-exponential decay bound for the magnitude of the entries of the time-ordered exponential of sparse matrices. We give explicit results for matrices with commonly encountered sparse structures.
USDA-ARS's Scientific Manuscript database
It is challenging to achieve rapid and accurate processing of large amounts of hyperspectral image data. This research was aimed to develop a novel classification method by employing deep feature representation with the stacked sparse auto-encoder (SSAE) and the SSAE combined with convolutional neur...
Zhang, Guoqing; Sun, Huaijiang; Xia, Guiyu; Sun, Quansen
2016-07-07
Sparse representation based classification (SRC) has been developed and shown great potential for real-world application. Based on SRC, Yang et al. [10] devised a SRC steered discriminative projection (SRC-DP) method. However, as a linear algorithm, SRC-DP cannot handle the data with highly nonlinear distribution. Kernel sparse representation-based classifier (KSRC) is a non-linear extension of SRC and can remedy the drawback of SRC. KSRC requires the use of a predetermined kernel function and selection of the kernel function and its parameters is difficult. Recently, multiple kernel learning for SRC (MKL-SRC) [22] has been proposed to learn a kernel from a set of base kernels. However, MKL-SRC only considers the within-class reconstruction residual while ignoring the between-class relationship, when learning the kernel weights. In this paper, we propose a novel multiple kernel sparse representation-based classifier (MKSRC), and then we use it as a criterion to design a multiple kernel sparse representation based orthogonal discriminative projection method (MK-SR-ODP). The proposed algorithm aims at learning a projection matrix and a corresponding kernel from the given base kernels such that in the low dimension subspace the between-class reconstruction residual is maximized and the within-class reconstruction residual is minimized. Furthermore, to achieve a minimum overall loss by performing recognition in the learned low-dimensional subspace, we introduce cost information into the dimensionality reduction method. The solutions for the proposed method can be efficiently found based on trace ratio optimization method [33]. Extensive experimental results demonstrate the superiority of the proposed algorithm when compared with the state-of-the-art methods.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Maurer, Simon A.; Clin, Lucien; Ochsenfeld, Christian, E-mail: christian.ochsenfeld@uni-muenchen.de
2014-06-14
Our recently developed QQR-type integral screening is introduced in our Cholesky-decomposed pseudo-densities Møller-Plesset perturbation theory of second order (CDD-MP2) method. We use the resolution-of-the-identity (RI) approximation in combination with efficient integral transformations employing sparse matrix multiplications. The RI-CDD-MP2 method shows an asymptotic cubic scaling behavior with system size and a small prefactor that results in an early crossover to conventional methods for both small and large basis sets. We also explore the use of local fitting approximations which allow a further reduction of the scaling behavior for very large systems. The reliability of our method is demonstrated on test sets for interaction and reaction energies of medium sized systems and on a diverse selection from our own benchmark set for total energies of larger systems. Timings on DNA systems show that fast calculations for systems with more than 500 atoms are feasible using a single processor core. Parallelization extends the range of accessible system sizes on one computing node with multiple cores to more than 1000 atoms in a double-zeta basis and more than 500 atoms in a triple-zeta basis.
An Efficient Image Compressor for Charge Coupled Devices Camera
Li, Jin; Xing, Fei; You, Zheng
2014-01-01
Recently, discrete wavelet transform (DWT)-based compressors, such as JPEG2000 and CCSDS-IDC, have been widely seen as the state-of-the-art compression schemes for charge coupled device (CCD) cameras. However, CCD images projected onto the DWT basis produce a large number of large-amplitude high-frequency coefficients because these images contain a large amount of complex texture and contour information, which is a disadvantage for the later coding. In this paper, we propose a low-complexity posttransform coupled with compressed sensing (PT-CS) compression approach for remote sensing images. First, the DWT is applied to the remote sensing image. Then, a posttransform with a pair of bases is applied to the DWT coefficients. The pair consists of a DCT basis and a Hadamard basis, which can be used at high and low bit rates, respectively. The best posttransform is selected by the l_p-norm-based approach. The posttransform is considered as the sparse representation stage of CS. The posttransform coefficients are resampled by a sensing measurement matrix. Experimental results on on-board CCD camera images show that the proposed approach significantly outperforms the CCSDS-IDC-based coder; its performance is comparable to that of JPEG2000 at low bit rates, without the excessive implementation complexity of JPEG2000. PMID:25114977
BCYCLIC: A parallel block tridiagonal matrix cyclic solver
NASA Astrophysics Data System (ADS)
Hirshman, S. P.; Perumalla, K. S.; Lynch, V. E.; Sanchez, R.
2010-09-01
A block tridiagonal matrix is factored with minimal fill-in using a cyclic reduction algorithm that is easily parallelized. Storage of the factored blocks allows the application of the inverse to multiple right-hand sides which may not be known at factorization time. Scalability with the number of block rows is achieved with cyclic reduction, while scalability with the block size is achieved using multithreaded routines (OpenMP, GotoBLAS) for block matrix manipulation. This dual scalability is a noteworthy feature of this new solver, as is its ability to efficiently handle arbitrary (non-powers-of-2) block row and processor numbers. Comparison with a state-of-the-art parallel sparse solver is presented. It is expected that this new solver will allow many physical applications to optimally use the parallel resources on current supercomputers. Example usage of the solver in magneto-hydrodynamic (MHD), three-dimensional equilibrium solvers for high-temperature fusion plasmas is cited.
Approximate equiangular tight frames for compressed sensing and CDMA applications
NASA Astrophysics Data System (ADS)
Tsiligianni, Evaggelia; Kondi, Lisimachos P.; Katsaggelos, Aggelos K.
2017-12-01
Performance guarantees for recovery algorithms employed in sparse representations and compressed sensing highlight the importance of incoherence. Optimal bounds of incoherence are attained by equiangular unit norm tight frames (ETFs). Although ETFs are important in many applications, they do not exist for all dimensions, and their construction has proven extremely difficult. In this paper, we construct frames that are close to ETFs. According to results from frame and graph theory, the existence of an ETF depends on the existence of its signature matrix, that is, a symmetric matrix with certain structure and a spectrum consisting of two distinct eigenvalues. We view the construction of a signature matrix as an inverse eigenvalue problem and propose a method that produces frames of any dimensions that are close to ETFs. Due to the achieved equiangularity property, the frames so obtained can be employed as spreading sequences in synchronous code-division multiple access (s-CDMA) systems, besides compressed sensing.
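The incoherence measure mentioned in this abstract can be illustrated with a minimal sketch. It computes the mutual coherence of a unit-norm frame and compares it with the Welch bound, which is the lower bound that ETFs attain. The function names and the example frame (three unit vectors at 120 degrees in the plane) are illustrative assumptions and are not taken from the paper.

```python
import math
from itertools import combinations


def mutual_coherence(frame):
    """Largest absolute inner product between distinct normalized vectors."""
    unit = []
    for v in frame:
        norm = math.sqrt(sum(x * x for x in v))
        unit.append([x / norm for x in v])
    return max(
        abs(sum(a * b for a, b in zip(u, w)))
        for u, w in combinations(unit, 2)
    )


def welch_bound(dim, count):
    """Lower bound on the coherence of `count` unit vectors in `dim` dimensions."""
    return math.sqrt((count - dim) / (dim * (count - 1)))


# Hypothetical example: three unit vectors at 120 degrees in the plane,
# which form an ETF for dim=2, count=3.
example_frame = [
    [math.cos(2 * math.pi * k / 3), math.sin(2 * math.pi * k / 3)]
    for k in range(3)
]
```

For the example frame the coherence equals the Welch bound, while a frame such as (1, 0), (0, 1), (1, 1) has a larger coherence and is therefore not equiangular tight.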
Bayesian sparse channel estimation
NASA Astrophysics Data System (ADS)
Chen, Chulong; Zoltowski, Michael D.
2012-05-01
In Orthogonal Frequency Division Multiplexing (OFDM) systems, the technique used to estimate and track the time-varying multipath channel is critical to ensure reliable, high data rate communications. It is recognized that wireless channels often exhibit a sparse structure, especially for wideband and ultra-wideband systems. In order to exploit this sparse structure to reduce the number of pilot tones and increase the channel estimation quality, the application of compressed sensing to channel estimation is proposed. In this article, to make the compressed channel estimation more feasible for practical applications, it is investigated from a perspective of Bayesian learning. Under the Bayesian learning framework, the large-scale compressed sensing problem, as well as large time delay for the estimation of the doubly selective channel over multiple consecutive OFDM symbols, can be avoided. Simulation studies show a significant improvement in channel estimation MSE and less computing time compared to the conventional compressed channel estimation techniques.
Multimode waveguide speckle patterns for compressive sensing.
Valley, George C; Sefler, George A; Justin Shaw, T
2016-06-01
Compressive sensing (CS) of sparse gigahertz-band RF signals using microwave photonics may achieve better performances with smaller size, weight, and power than electronic CS or conventional Nyquist rate sampling. The critical element in a CS system is the device that produces the CS measurement matrix (MM). We show that passive speckle patterns in multimode waveguides potentially provide excellent MMs for CS. We measure and calculate the MM for a multimode fiber and perform simulations using this MM in a CS system. We show that the speckle MM exhibits the sharp phase transition and coherence properties needed for CS and that these properties are similar to those of a sub-Gaussian MM with the same mean and standard deviation. We calculate the MM for a multimode planar waveguide and find dimensions of the planar guide that give a speckle MM with a performance similar to that of the multimode fiber. The CS simulations show that all measured and calculated speckle MMs exhibit a robust performance with equal amplitude signals that are sparse in time, in frequency, and in wavelets (Haar wavelet transform). The planar waveguide results indicate a path to a microwave photonic integrated circuit for measuring sparse gigahertz-band RF signals using CS.
Segmentation of High Angular Resolution Diffusion MRI using Sparse Riemannian Manifold Clustering
Wright, Margaret J.; Thompson, Paul M.; Vidal, René
2015-01-01
We address the problem of segmenting high angular resolution diffusion imaging (HARDI) data into multiple regions (or fiber tracts) with distinct diffusion properties. We use the orientation distribution function (ODF) to represent HARDI data and cast the problem as a clustering problem in the space of ODFs. Our approach integrates tools from sparse representation theory and Riemannian geometry into a graph theoretic segmentation framework. By exploiting the Riemannian properties of the space of ODFs, we learn a sparse representation for each ODF and infer the segmentation by applying spectral clustering to a similarity matrix built from these representations. In cases where regions with similar (resp. distinct) diffusion properties belong to different (resp. same) fiber tracts, we obtain the segmentation by incorporating spatial and user-specified pairwise relationships into the formulation. Experiments on synthetic data evaluate the sensitivity of our method to image noise and the presence of complex fiber configurations, and show its superior performance compared to alternative segmentation methods. Experiments on phantom and real data demonstrate the accuracy of the proposed method in segmenting simulated fibers, as well as white matter fiber tracts of clinical importance in the human brain. PMID:24108748
An embedded system for face classification in infrared video using sparse representation
NASA Astrophysics Data System (ADS)
Saavedra M., Antonio; Pezoa, Jorge E.; Zarkesh-Ha, Payman; Figueroa, Miguel
2017-09-01
We propose a platform for robust face recognition in Infrared (IR) images using Compressive Sensing (CS). In line with CS theory, the classification problem is solved using a sparse representation framework, where test images are modeled by means of a linear combination of the training set. Because the training set constitutes an over-complete dictionary, we identify new images by finding their sparsest representation based on the training set, using standard l1-minimization algorithms. Unlike conventional face-recognition algorithms, feature extraction is performed using random projections with a precomputed binary matrix, as proposed in the CS literature. This random sampling reduces the effects of noise and occlusions such as facial hair, eyeglasses, and disguises, which are notoriously challenging in IR images. Thus, the performance of our framework is robust to these noise and occlusion factors, achieving an average accuracy of approximately 90% when the UCHThermalFace database is used for training and testing purposes. We implemented our framework on a high-performance embedded digital system, where the computation of the sparse representation of IR images was performed by dedicated hardware using a deeply pipelined architecture on a Field-Programmable Gate Array (FPGA).
NASA Astrophysics Data System (ADS)
Zhao, Jin; Han-Ming, Zhang; Bin, Yan; Lei, Li; Lin-Yuan, Wang; Ai-Long, Cai
2016-03-01
Sparse-view x-ray computed tomography (CT) imaging is an interesting topic in the CT field and can efficiently decrease radiation dose. Compared with spatial reconstruction, a Fourier-based algorithm has advantages in reconstruction speed and memory usage. A novel Fourier-based iterative reconstruction technique that utilizes the non-uniform fast Fourier transform (NUFFT) is presented in this work along with advanced total variation (TV) regularization for fan sparse-view CT. The proposed selective matrix helps to improve reconstruction quality. The new method employs the NUFFT and its adjoint to iterate back and forth between the Fourier and image space. The performance of the proposed algorithm is demonstrated through a series of digital simulations and experimental phantom studies. Results of the proposed algorithm are compared with those of existing TV-regularized techniques based on the compressed sensing method, as well as the basic algebraic reconstruction technique. Compared with the existing TV-regularized techniques, the proposed Fourier-based technique significantly improves the convergence rate and reduces memory allocation. Project supported by the National High Technology Research and Development Program of China (Grant No. 2012AA011603) and the National Natural Science Foundation of China (Grant No. 61372172).
Sparse partial least squares regression for simultaneous dimension reduction and variable selection
Chun, Hyonho; Keleş, Sündüz
2010-01-01
Partial least squares regression has been an alternative to ordinary least squares for handling multicollinearity in several areas of scientific research since the 1960s. It has recently gained much attention in the analysis of high dimensional genomic data. We show that known asymptotic consistency of the partial least squares estimator for a univariate response does not hold with the very large p and small n paradigm. We derive a similar result for a multivariate response regression with partial least squares. We then propose a sparse partial least squares formulation which aims simultaneously to achieve good predictive performance and variable selection by producing sparse linear combinations of the original predictors. We provide an efficient implementation of sparse partial least squares regression and compare it with well-known variable selection and dimension reduction approaches via simulation experiments. We illustrate the practical utility of sparse partial least squares regression in a joint analysis of gene expression and genomewide binding data. PMID:20107611
Haider, Bilal; Krause, Matthew R.; Duque, Alvaro; Yu, Yuguo; Touryan, Jonathan; Mazer, James A.; McCormick, David A.
2011-01-01
SUMMARY During natural vision, the entire visual field is stimulated by images rich in spatiotemporal structure. Although many visual system studies restrict stimuli to the classical receptive field (CRF), it is known that costimulation of the CRF and the surrounding nonclassical receptive field (nCRF) increases neuronal response sparseness. The cellular and network mechanisms underlying increased response sparseness remain largely unexplored. Here we show that combined CRF + nCRF stimulation increases the sparseness, reliability, and precision of spiking and membrane potential responses in classical regular spiking (RSC) pyramidal neurons of cat primary visual cortex. Conversely, fast-spiking interneurons exhibit increased activity and decreased selectivity during CRF + nCRF stimulation. The increased sparseness and reliability of RSC neuron spiking is associated with increased inhibitory barrages and narrower visually evoked synaptic potentials. Our experimental observations were replicated with a simple computational model, suggesting that network interactions among neuronal subtypes ultimately sharpen recurrent excitation, producing specific and reliable visual responses. PMID:20152117
Sparse PDF Volumes for Consistent Multi-Resolution Volume Rendering
Sicat, Ronell; Krüger, Jens; Möller, Torsten; Hadwiger, Markus
2015-01-01
This paper presents a new multi-resolution volume representation called sparse pdf volumes, which enables consistent multi-resolution volume rendering based on probability density functions (pdfs) of voxel neighborhoods. These pdfs are defined in the 4D domain jointly comprising the 3D volume and its 1D intensity range. Crucially, the computation of sparse pdf volumes exploits data coherence in 4D, resulting in a sparse representation with surprisingly low storage requirements. At run time, we dynamically apply transfer functions to the pdfs using simple and fast convolutions. Whereas standard low-pass filtering and down-sampling incur visible differences between resolution levels, the use of pdfs facilitates consistent results independent of the resolution level used. We describe the efficient out-of-core computation of large-scale sparse pdf volumes, using a novel iterative simplification procedure of a mixture of 4D Gaussians. Finally, our data structure is optimized to facilitate interactive multi-resolution volume rendering on GPUs. PMID:26146475
Derek B. Van Berkel; Bronwyn Rayfield; Sebastián Martinuzzi; Martin J. Lechowicz; Eric White; Kathleen P. Bell; Chris R. Colocousis; Kent F. Kovacs; Anita T. Morzillo; Darla K. Munroe; Benoit Parmentier; Volker C. Radeloff; Brian J. McGill
2018-01-01
Sparsely settled forests (SSF) are poorly studied, coupled natural and human systems involving rural communities in forest ecosystems that are neither largely uninhabited wildland nor forests on the edges of urban areas. We developed and applied a multidisciplinary approach to define, map, and examine changes in the spatial extent and structure of both the landscapes...
Luo, Hanjiang; Guo, Zhongwen; Wu, Kaishun; Hong, Feng; Feng, Yuan
2009-01-01
Underwater acoustic sensor networks (UWA-SNs) are envisioned to perform monitoring tasks over the large portion of the world covered by oceans. Due to economics and the large area of the ocean, UWA-SNs are currently mainly sparsely deployed networks. Limited battery resources are a big challenge for the deployment of such long-term sensor networks. Unbalanced battery energy consumption leads to early energy depletion of nodes, which partitions the whole network and impairs the integrity of the monitoring datasets, or even results in the collapse of the entire network. By contrast, balanced energy dissipation of nodes can prolong the lifetime of such networks. In this paper, we focus on the balanced energy dissipation problem of two types of sparsely deployed UWA-SNs: underwater moored monitoring systems and sparsely deployed two-dimensional UWA-SNs. We first analyze the reasons for unbalanced energy consumption in such networks, then we propose two energy-balanced strategies to maximize the lifetime of networks in both shallow and deep water. Finally, we evaluate our methods by simulations, and the results show that the two strategies can achieve balanced energy consumption per node while at the same time prolonging the network lifetime. PMID:22399970
Recent advances on terrain database correlation testing
NASA Astrophysics Data System (ADS)
Sakude, Milton T.; Schiavone, Guy A.; Morelos-Borja, Hector; Martin, Glenn; Cortes, Art
1998-08-01
Terrain database correlation is a major requirement for interoperability in distributed simulation. There are numerous situations in which terrain database correlation problems can occur that, in turn, lead to lack of interoperability in distributed training simulations. Examples are the use of different run-time terrain databases derived from inconsistent source data, the use of different resolutions, and the use of different data models between databases for both terrain and culture data. IST has been developing a suite of software tools, named ZCAP, to address terrain database interoperability issues. In this paper we discuss recent enhancements made to this suite, including improved algorithms for sampling and calculating line-of-sight, an improved method for measuring terrain roughness, and the application of a sparse matrix method to the terrain remediation solution developed at the Visual Systems Lab of the Institute for Simulation and Training. We review the application of some of these new algorithms to the terrain correlation measurement processes. The application of these new algorithms improves our support for very large terrain databases, and provides the capability for performing test replications to estimate the sampling error of the tests. With this set of tools, a user can quantitatively assess the degree of correlation between large terrain databases.
Exarchakis, Georgios; Lücke, Jörg
2017-11-01
Sparse coding algorithms with continuous latent variables have been the subject of a large number of studies. However, discrete latent spaces for sparse coding have been largely ignored. In this work, we study sparse coding with latents described by discrete instead of continuous prior distributions. We consider the general case in which the latents (while being sparse) can take on any value of a finite set of possible values and in which we learn the prior probability of any value from data. This approach can be applied to any data generated by discrete causes, and it can be applied as an approximation of continuous causes. As the prior probabilities are learned, the approach then allows for estimating the prior shape without assuming specific functional forms. To efficiently train the parameters of our probabilistic generative model, we apply a truncated expectation-maximization approach (expectation truncation) that we modify to work with a general discrete prior. We evaluate the performance of the algorithm by applying it to a variety of tasks: (1) we use artificial data to verify that the algorithm can recover the generating parameters from a random initialization, (2) use image patches of natural images and discuss the role of the prior for the extraction of image components, (3) use extracellular recordings of neurons to present a novel method of analysis for spiking neurons that includes an intuitive discretization strategy, and (4) apply the algorithm on the task of encoding audio waveforms of human speech. The diverse set of numerical experiments presented in this letter suggests that discrete sparse coding algorithms can scale efficiently to work with realistic data sets and provide novel statistical quantities to describe the structure of the data.
NASA Technical Reports Server (NTRS)
Jones, H. W.; Hein, D. N.; Knauer, S. C.
1978-01-01
A general class of even/odd transforms is presented that includes the Karhunen-Loeve transform, the discrete cosine transform, the Walsh-Hadamard transform, and other familiar transforms. The more complex even/odd transforms can be computed by combining a simpler even/odd transform with a sparse matrix multiplication. A theoretical performance measure is computed for some even/odd transforms, and two image compression experiments are reported.
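One member of the transform family named in this abstract, the Walsh-Hadamard transform, can be sketched in a few lines. The code below is an unnormalized fast Walsh-Hadamard transform in natural (Hadamard) order; it is only an illustration of that transform and does not reproduce the report's even/odd construction or its sparse matrix factor. The function name is an assumption.

```python
def walsh_hadamard(values):
    """Unnormalized fast Walsh-Hadamard transform (natural Hadamard order)."""
    data = list(values)
    n = len(data)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    half = 1
    while half < n:
        # Butterfly stage: combine entries that are `half` positions apart.
        for start in range(0, n, 2 * half):
            for i in range(start, start + half):
                a, b = data[i], data[i + half]
                data[i], data[i + half] = a + b, a - b
        half *= 2
    return data
```

Because the Hadamard matrix is symmetric and its square is n times the identity, applying the function twice returns the input scaled by its length, which gives a simple self-check.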
2013-06-16
Science Dept., University of California, Irvine, USA 92697. Email : a.anandkumar@uci.edu,mjanzami@uci.edu. Daniel Hsu and Sham Kakade are with...Microsoft Research New England, 1 Memorial Drive, Cambridge, MA 02142. Email : dahsu@microsoft.com, skakade@microsoft.com 1 a latent space dimensionality...Sparse coding for multitask and transfer learning. ArxXiv preprint, abs/1209.0738, 2012. [34] G.H. Golub and C.F. Van Loan. Matrix Computations. The
NASA Astrophysics Data System (ADS)
Wang, H. T.; Chen, T. T.; Yan, C.; Pan, H.
2018-05-01
For App recommendation on mobile phones, an item-based collaborative filtering algorithm is combined with the weighted Slope One algorithm to further improve on the traditional collaborative filtering algorithm with respect to cold start, data matrix sparseness, and other issues. The recommendation algorithm is parallelized on the Spark platform, and a real-time streaming computing framework is introduced to improve the real-time performance of App recommendation.
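The weighted Slope One algorithm named in this abstract can be sketched as follows. This is a minimal single-machine illustration of the standard weighted Slope One predictor; it does not include the paper's item-based combination or its Spark parallelization, and the function names and the small rating table are hypothetical.

```python
from collections import defaultdict


def train_slope_one(ratings):
    """Return average rating differences and co-rating counts per item pair."""
    diffs = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for user_ratings in ratings.values():
        for item_a, rating_a in user_ratings.items():
            for item_b, rating_b in user_ratings.items():
                if item_a != item_b:
                    diffs[item_a][item_b] += rating_a - rating_b
                    counts[item_a][item_b] += 1
    for item_a in diffs:
        for item_b in diffs[item_a]:
            diffs[item_a][item_b] /= counts[item_a][item_b]
    return diffs, counts


def predict(user_ratings, target, diffs, counts):
    """Weighted Slope One prediction of `target` for one user, or None."""
    numerator = 0.0
    denominator = 0
    for item, rating in user_ratings.items():
        weight = counts.get(target, {}).get(item, 0)
        if item == target or weight == 0:
            continue
        # Each co-rated item votes with weight equal to its co-rating count.
        numerator += (diffs[target][item] + rating) * weight
        denominator += weight
    return numerator / denominator if denominator else None


# Hypothetical ratings: user -> {app: rating}.
ratings = {
    "u1": {"app_a": 5, "app_b": 3, "app_c": 2},
    "u2": {"app_a": 3, "app_b": 4},
    "u3": {"app_b": 2, "app_c": 5},
}
diffs, counts = train_slope_one(ratings)
```

Calling `predict(ratings["u3"], "app_a", diffs, counts)` estimates the rating of an app the user has not rated from the average differences to the apps the user did rate, so it needs only a few co-ratings per item pair, which is why the method is attractive for sparse rating matrices.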
Parallel Computing Strategies for Irregular Algorithms
NASA Technical Reports Server (NTRS)
Biswas, Rupak; Oliker, Leonid; Shan, Hongzhang; Biegel, Bryan (Technical Monitor)
2002-01-01
Parallel computing promises several orders of magnitude increase in our ability to solve realistic computationally-intensive problems, but relies on their efficient mapping and execution on large-scale multiprocessor architectures. Unfortunately, many important applications are irregular and dynamic in nature, making their effective parallel implementation a daunting task. Moreover, with the proliferation of parallel architectures and programming paradigms, the typical scientist is faced with a plethora of questions that must be answered in order to obtain an acceptable parallel implementation of the solution algorithm. In this paper, we consider three representative irregular applications: unstructured remeshing, sparse matrix computations, and N-body problems, and parallelize them using various popular programming paradigms on a wide spectrum of computer platforms ranging from state-of-the-art supercomputers to PC clusters. We present the underlying problems, the solution algorithms, and the parallel implementation strategies. Smart load-balancing, partitioning, and ordering techniques are used to enhance parallel performance. Overall results demonstrate the complexity of efficiently parallelizing irregular algorithms.
Portable implementation model for CFD simulations. Application to hybrid CPU/GPU supercomputers
NASA Astrophysics Data System (ADS)
Oyarzun, Guillermo; Borrell, Ricard; Gorobets, Andrey; Oliva, Assensi
2017-10-01
Nowadays, high performance computing (HPC) systems experience a disruptive moment with a variety of novel architectures and frameworks, without any clarity of which one is going to prevail. In this context, the portability of codes across different architectures is of major importance. This paper presents a portable implementation model based on an algebraic operational approach for direct numerical simulation (DNS) and large eddy simulation (LES) of incompressible turbulent flows using unstructured hybrid meshes. The strategy proposed consists in representing the whole time-integration algorithm using only three basic algebraic operations: sparse matrix-vector product, a linear combination of vectors and dot product. The main idea is based on decomposing the nonlinear operators into a concatenation of two SpMV operations. This provides high modularity and portability. An exhaustive analysis of the proposed implementation for hybrid CPU/GPU supercomputers has been conducted with tests using up to 128 GPUs. The main objective consists in understanding the challenges of implementing CFD codes on new architectures.
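The three algebraic kernels listed in this abstract (sparse matrix-vector product, linear combination of vectors, and dot product) can be sketched in plain Python. The compressed sparse row (CSR) layout used here is an assumption made for illustration; the abstract does not state which storage format or which function names the authors use.

```python
def spmv_csr(values, col_idx, row_ptr, x):
    """y = A @ x for a matrix A stored in compressed sparse row (CSR) form."""
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):
        # Nonzeros of this row occupy values[row_ptr[row]:row_ptr[row + 1]].
        for k in range(row_ptr[row], row_ptr[row + 1]):
            y[row] += values[k] * x[col_idx[k]]
    return y


def axpy(alpha, x, y):
    """Linear combination alpha * x + y."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]


def dot(x, y):
    """Dot product of two vectors of equal length."""
    return sum(xi * yi for xi, yi in zip(x, y))


# Hypothetical 3x3 example: [[4, 0, 1], [0, 3, 0], [2, 0, 5]].
values = [4.0, 1.0, 3.0, 2.0, 5.0]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]
```

Restricting a solver to these three kernels means that only they need to be reimplemented for a new architecture, which is the portability argument the abstract makes.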
Sparse distributed memory overview
NASA Technical Reports Server (NTRS)
Raugh, Mike
1990-01-01
The Sparse Distributed Memory (SDM) project is investigating the theory and applications of a massively parallel computing architecture, called sparse distributed memory, that will support the storage and retrieval of sensory and motor patterns characteristic of autonomous systems. The immediate objectives of the project are centered in studies of the memory itself and in the use of the memory to solve problems in speech, vision, and robotics. Investigation of methods for encoding sensory data is an important part of the research. Examples of NASA missions that may benefit from this work are Space Station, planetary rovers, and solar exploration. Sparse distributed memory offers promising technology for systems that must learn through experience and be capable of adapting to new circumstances, and for operating any large complex system requiring automatic monitoring and control. Sparse distributed memory is a massively parallel architecture motivated by efforts to understand how the human brain works. Sparse distributed memory is an associative memory, able to retrieve information from cues that only partially match patterns stored in the memory. It is able to store long temporal sequences derived from the behavior of a complex system, such as progressive records of the system's sensory data and correlated records of the system's motor controls.
Reconstructing cortical current density by exploring sparseness in the transform domain
NASA Astrophysics Data System (ADS)
Ding, Lei
2009-05-01
In the present study, we have developed a novel electromagnetic source imaging approach to reconstruct extended cortical sources by means of cortical current density (CCD) modeling and a novel EEG imaging algorithm which explores sparseness in cortical source representations through the use of L1-norm in objective functions. The new sparse cortical current density (SCCD) imaging algorithm is unique since it reconstructs cortical sources by attaining sparseness in a transform domain (the variation map of cortical source distributions). While large variations are expected to occur along boundaries (sparseness) between active and inactive cortical regions, cortical sources can be reconstructed and their spatial extents can be estimated by locating these boundaries. We studied the SCCD algorithm using numerous simulations to investigate its capability in reconstructing cortical sources with different extents and in reconstructing multiple cortical sources with different extent contrasts. The SCCD algorithm was compared with two L2-norm solutions, i.e. weighted minimum norm estimate (wMNE) and cortical LORETA. Our simulation data from the comparison study show that the proposed sparse source imaging algorithm is able to accurately and efficiently recover extended cortical sources and is promising to provide high-accuracy estimation of cortical source extents.
Sparse Bayesian learning for DOA estimation with mutual coupling.
Dai, Jisheng; Hu, Nan; Xu, Weichao; Chang, Chunqi
2015-10-16
Sparse Bayesian learning (SBL) has given renewed interest to the problem of direction-of-arrival (DOA) estimation. It is generally assumed that the measurement matrix in SBL is precisely known. Unfortunately, this assumption may be invalid in practice due to the imperfect manifold caused by unknown or misspecified mutual coupling. This paper describes a modified SBL method for joint estimation of DOAs and mutual coupling coefficients with uniform linear arrays (ULAs). Unlike the existing method that only uses stationary priors, our new approach utilizes a hierarchical form of the Student t prior to enforce the sparsity of the unknown signal more heavily. We also provide a distinct Bayesian inference for the expectation-maximization (EM) algorithm, which can update the mutual coupling coefficients more efficiently. Another difference is that our method uses an additional singular value decomposition (SVD) to reduce the computational complexity of the signal reconstruction process and the sensitivity to the measurement noise.
High-performance equation solvers and their impact on finite element analysis
NASA Technical Reports Server (NTRS)
Poole, Eugene L.; Knight, Norman F., Jr.; Davis, D. Dale, Jr.
1990-01-01
The role of equation solvers in modern structural analysis software is described. Direct and iterative equation solvers which exploit vectorization on modern high-performance computer systems are described and compared. The direct solvers are two Cholesky factorization methods. The first method utilizes a novel variable-band data storage format to achieve very high computation rates and the second method uses a sparse data storage format designed to reduce the number of operations. The iterative solvers are preconditioned conjugate gradient methods. Two different preconditioners are included; the first uses a diagonal matrix storage scheme to achieve high computation rates and the second requires a sparse data storage scheme and converges to the solution in fewer iterations than the first. The impact of using all of the equation solvers in a common structural analysis software system is demonstrated by solving several representative structural analysis problems.
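The preconditioned conjugate gradient iteration mentioned in this abstract can be sketched as follows. The abstract describes its two preconditioners only by their storage schemes, so the Jacobi (diagonal) preconditioner and the dense list-of-rows matrix used here are stand-ins chosen for brevity, not the authors' methods; the function name and tolerances are likewise assumptions.

```python
def pcg_jacobi(a, b, tol=1e-10, max_iter=1000):
    """Solve a @ x = b for symmetric positive definite `a` (list of rows)."""
    n = len(b)

    def matvec(v):
        return [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    inv_diag = [1.0 / a[i][i] for i in range(n)]  # Jacobi preconditioner
    x = [0.0] * n
    r = list(b)                                   # residual b - a @ x for x = 0
    z = [d * ri for d, ri in zip(inv_diag, r)]    # preconditioned residual
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 <= tol:
            break
        ap = matvec(p)
        alpha = rz / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_next = dot(r, z)
        p = [zi + (rz_next / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_next
    return x
```

For example, `pcg_jacobi([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])` approximates the solution (1/11, 7/11). A stronger preconditioner costs more per iteration but typically needs fewer iterations, which is the trade-off the abstract reports between its two preconditioners.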
High-performance equation solvers and their impact on finite element analysis
NASA Technical Reports Server (NTRS)
Poole, Eugene L.; Knight, Norman F., Jr.; Davis, D. D., Jr.
1992-01-01
The role of equation solvers in modern structural analysis software is described. Direct and iterative equation solvers which exploit vectorization on modern high-performance computer systems are described and compared. The direct solvers are two Cholesky factorization methods. The first method utilizes a novel variable-band data storage format to achieve very high computation rates and the second method uses a sparse data storage format designed to reduce the number of operations. The iterative solvers are preconditioned conjugate gradient methods. Two different preconditioners are included; the first uses a diagonal matrix storage scheme to achieve high computation rates and the second requires a sparse data storage scheme and converges to the solution in fewer iterations than the first. The impact of using all of the equation solvers in a common structural analysis software system is demonstrated by solving several representative structural analysis problems.
Efficient sparse matrix multiplication scheme for the CYBER 203
NASA Technical Reports Server (NTRS)
Lambiotte, J. J., Jr.
1984-01-01
This work has been directed toward the development of an efficient algorithm for performing this computation on the CYBER-203. The desire to provide software which gives the user the choice between the often conflicting goals of minimizing central processing (CPU) time or storage requirements has led to a diagonal-based algorithm in which one of three types of storage is selected for each diagonal. For each storage type, an initialization sub-routine estimates the CPU and storage requirements based upon results from previously performed numerical experimentation. These requirements are adjusted by weights provided by the user which reflect the relative importance the user places on the resources. The three storage types employed were chosen to be efficient on the CYBER-203 for diagonals which are sparse, moderately sparse, or dense; however, for many densities, no diagonal type is most efficient with respect to both resource requirements. The user-supplied weights dictate the choice.
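The idea of a diagonal-based sparse matrix-vector product can be illustrated with a minimal sketch. It stores every nonzero diagonal as a plain list, which corresponds to only one possible storage type; the report's selection among three storage types per diagonal and its CYBER-203 vectorization are not reproduced. The function name and the dictionary layout are assumptions.

```python
def dia_matvec(diagonals, n, x):
    """y = A @ x where `diagonals` maps offset k to the entries A[i][i + k].

    The list for offset k holds the n - |k| entries of that diagonal in
    row order.
    """
    y = [0.0] * n
    for offset, entries in diagonals.items():
        first_row = max(0, -offset)  # sub-diagonals start below row 0
        for pos, value in enumerate(entries):
            row = first_row + pos
            y[row] += value * x[row + offset]
    return y


# Hypothetical 4x4 tridiagonal example with 2 on the main diagonal
# and -1 on the first super- and sub-diagonals.
tridiagonal = {
    0: [2.0, 2.0, 2.0, 2.0],
    1: [-1.0, -1.0, -1.0],
    -1: [-1.0, -1.0, -1.0],
}
```

Each diagonal contributes one long elementwise multiply-add, which suits vector hardware when the diagonal is dense; for nearly empty diagonals a compressed representation wastes less storage, which is the trade-off the user-supplied weights arbitrate.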
Improved collaborative filtering recommendation algorithm of similarity measure
NASA Astrophysics Data System (ADS)
Zhang, Baofu; Yuan, Baoping
2017-05-01
The collaborative filtering recommendation algorithm is one of the most widely used recommendation algorithms in personalized recommender systems. The key is to find the nearest neighbor set of the active user by using a similarity measure. However, traditional similarity measures mainly focus on the similarity of the items that users have rated in common, but ignore the relationship between those commonly rated items and all items the user rates. Moreover, because the rating matrix is very sparse, the traditional collaborative filtering recommendation algorithm is not highly efficient. In order to obtain better accuracy, this paper presents an improved similarity measure based on the common preference between users, the difference in rating scale, and the scores of common items, and proposes a collaborative filtering recommendation algorithm based on this measure. Experimental results show that the algorithm can effectively improve the quality of recommendation, thus alleviating the impact of data sparseness.
An Online Dictionary Learning-Based Compressive Data Gathering Algorithm in Wireless Sensor Networks
Wang, Donghao; Wan, Jiangwen; Chen, Junying; Zhang, Qiang
2016-01-01
To adapt to sensed signals of great diversity and dynamics, and to decrease the reconstruction errors caused by ambient noise, a novel online dictionary learning-based compressive data gathering (ODL-CDG) algorithm is proposed. The proposed dictionary is learned by a two-stage iterative procedure that alternates between a sparse coding step and a dictionary update step. The self-coherence of the learned dictionary is introduced as a penalty term during the dictionary update procedure. The dictionary is also constrained to have a sparse structure. It is theoretically demonstrated that the sensing matrix satisfies the restricted isometry property (RIP) with high probability. In addition, a lower bound on the number of measurements necessary for compressive sensing (CS) reconstruction is given. Simulation results show that the proposed ODL-CDG algorithm can enhance recovery accuracy in the presence of noise and reduce energy consumption in comparison with other dictionary-based data gathering methods. PMID:27669250
NASA Astrophysics Data System (ADS)
Lee, O. A.; Eicken, H.; Weyapuk, W., Jr.; Adams, B.; Mohoney, A. R.
2015-12-01
The significance of highly dispersed, remnant Arctic sea ice as a platform for marine mammals and indigenous hunters in spring and summer may have increased disproportionately with changes in the ice cover. As dispersed remnant ice becomes more common in the future it will be increasingly important to understand its ecological role for upper trophic levels such as marine mammals and its role for supporting primary productivity of ice-associated algae. Potential sparse ice habitat at sea ice concentrations below 15% is difficult to detect using remote sensing data alone. A combination of high resolution satellite imagery (including Synthetic Aperture Radar), data from the Barrow sea ice radar, and local observations from indigenous sea ice experts was used to detect sparse sea ice in the Alaska Arctic. Traditional knowledge on sea ice use by marine mammals was used to delimit the scales where sparse ice could still be used as habitat for seals and walrus. Potential sparse ice habitat was quantified with respect to overall spatial extent, size of ice floes, and density of floes. Sparse ice persistence offshore did not prevent the occurrence of large coastal walrus haul outs, but the lack of sparse ice and early sea ice retreat coincided with local observations of ringed seal pup mortality. Observations from indigenous hunters will continue to be an important source of information for validating remote sensing detections of sparse ice, and improving understanding of marine mammal adaptations to sea ice change.
Mafusire, Cosmas; Krüger, Tjaart P J
2018-06-01
The concept of orthonormal vector circle polynomials is revisited by deriving a set from the Cartesian gradient of Zernike polynomials in a unit circle using a matrix-based approach. The heart of this model is a closed-form matrix equation of the gradient of Zernike circle polynomials expressed as a linear combination of lower-order Zernike circle polynomials related through a gradient matrix. This is a sparse matrix whose elements are two-dimensional standard basis transverse Euclidean vectors. Using the outer product form of the Cholesky decomposition, the gradient matrix is used to calculate a new matrix, which we used to express the Cartesian gradient of the Zernike circle polynomials as a linear combination of orthonormal vector circle polynomials. Since this new matrix is singular, the orthonormal vector polynomials are recovered by reducing the matrix to its row echelon form using the Gauss-Jordan elimination method. We extend the model to derive orthonormal vector general polynomials, which are orthonormal in a general pupil by performing a similarity transformation on the gradient matrix to give its equivalent in the general pupil. The outer form of the Gram-Schmidt procedure and the Gauss-Jordan elimination method are then applied to the general pupil to generate the orthonormal vector general polynomials from the gradient of the orthonormal Zernike-based polynomials. The performance of the model is demonstrated with a simulated wavefront in a square pupil inscribed in a unit circle.
Compressed multi-block local binary pattern for object tracking
NASA Astrophysics Data System (ADS)
Li, Tianwen; Gao, Yun; Zhao, Lei; Zhou, Hao
2018-04-01
Both robustness and real-time performance are very important for object tracking in real environments. Trackers based on deep learning have difficulty meeting the real-time requirement of tracking. Compressive sensing provides technical support for real-time tracking. In this paper, an object is tracked via a multi-block local binary pattern feature. The feature vector is extracted from the multi-block local binary pattern feature and compressed using a sparse random Gaussian matrix as the measurement matrix. The experiments showed that the proposed tracker ran in real time and outperformed existing compressive trackers based on Haar-like features on many challenging video sequences in terms of accuracy and robustness.
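The compression step can be sketched as a single matrix-vector product. The abstract gives no dimensions or sparsity level, so the sizes, the `density` parameter, the scaling, and the helper name `sparse_gaussian_matrix` are all assumptions made for illustration, and a random vector stands in for the real feature vector.

```python
import numpy as np


def sparse_gaussian_matrix(n_rows, n_cols, density, rng):
    """Random matrix with Gaussian entries kept with probability `density`.

    Entries are scaled so that the expected squared norm of the
    projection equals the squared norm of the input vector.
    """
    mask = rng.random((n_rows, n_cols)) < density
    values = rng.standard_normal((n_rows, n_cols))
    return mask * values / np.sqrt(density * n_rows)


rng = np.random.default_rng(0)

# Stand-in for a high-dimensional feature vector (hypothetical size).
feature = rng.random(1024)

# Measurement matrix: 64 measurements, roughly 10% nonzero entries.
phi = sparse_gaussian_matrix(64, 1024, density=0.1, rng=rng)

# Compressed feature used in place of the full vector.
compressed = phi @ feature
```

Because most entries of the measurement matrix are zero, a sparse storage format would make the product cheaper still; the dense array is used here only to keep the sketch short.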
NASA Astrophysics Data System (ADS)
Sidborn, M.; Neretnieks, I.
2008-08-01
Processes that control the redox conditions in deep groundwaters have been studied. The understanding of such processes in a long-term perspective is important for the safety assessment of a deep geological repository for high-level nuclear waste. An oxidising environment at the depth of the repository would increase the solubility and mobility of many radionuclides, and increase the potential risk for radioactive contamination at the ground surface. Proposed repository concepts also include engineered barriers such as copper canisters, the corrosion of which increases considerably in an oxidising environment compared to prevailing reducing conditions. Swedish granitic rocks are typically relatively sparsely fractured and are best treated as a dual-porosity medium with fast flowing channels through fractures in the rock with a surrounding porous matrix, the pores of which are accessible from the fracture by diffusive transport. Highly simplified problems have been explored with the aim to gain understanding of the underlying transport processes, thermodynamics and chemical reaction kinetics. The degree of complexity is increased successively, and mechanisms and processes identified as of key importance are included in a model framework. For highly complex models, analytical expressions are not fully capable of describing the processes involved, and in such cases the solutions are obtained by numerical calculations. Deep in the rock the main source for reducing capacity is identified as reducing minerals. Such minerals are found inside the porous rock matrix and as infill particles or coatings in fractures in the rock. The model formulation also allows for different flow modes such as flow along discrete fractures in sparsely fractured rocks and along flowpaths in a fracture network. The scavenging of oxygen is exemplified for these cases as well as for more comprehensive applications, including glaciation considerations. 
Results show that chemical reaction kinetics control the scavenging of oxygen during a relatively short time with respect to the lifetime of the repository. For longer times the scavenging of oxygen is controlled by transport processes in the porous rock matrix. The penetration depth of oxygen along the flowpath depends largely on the hydraulic properties, which may vary significantly between different locations and situations. The results indicate that oxygen, in the absence of easily degradable organic matter, may reach long distances along a flow path during the life-time of the repository (hundreds to thousands of metres in a million years depending on e.g. hydraulic properties of the flow path and the availability of reducing capacity). However, large uncertainties regarding key input parameters exist leading to the conclusion that the results from the model must be treated with caution pending more accurate and validated data. Ongoing and planned experiments are expected to reduce these uncertainties, which are required in order to make more reliable predictions for a safety assessment of a nuclear waste repository.
Krieger, Medora Louise Hooper
1977-01-01
The landslides in the Kearny and El Capitan Mountain quadrangles, Pinal and Gila Counties, Ariz., are tabular or lenslike masses of megabreccia enclosed in Miocene basin deposits. The megabreccias within individual slide blocks are composed of pervasively brecciated Precambrian and younger formations that remain in normal stratigraphic sequence, indicating that each landslide moved as a fairly coherent mass. The megabreccias consist of fresh, mostly angular rock fragments in a comminuted matrix of the same composition as the fragments. The matrix ranges in amount from sparse to abundant. Where the matrix is sparse, the fragments fit tightly with little or no rotation. Locally fragments are rotated but not moved far; most units within a slide block are lithologically homogeneous. The Kearny landslides are conformably interbedded in steeply east-dipping playa and alluvial deposits. They form map units from a few tens of meters to nearly 4 km long and from less than 1 to 270 m wide. Narrow ridges expose sections through the landslides at about right angles to the direction of movement. The upper (proximal) ends have been eroded; the lower (distal) ends are buried. The El Capitan landslide dips very gently southward. Although partly dissected during erosion of the enclosing alluvial and lakebed deposits, its approximate original outline is still preserved. It forms a thin sheet, 5-15 m thick and at least 3.8 km long; the maximum outcrop width, near its distal end, is about 1.5 km. The Kearny landslides show little evidence of having exerted differential pressure on the underlying soft playa and alluvial deposits, and the contacts with the underlying sediments have little relief. The distal end of the El Capitan landslide, on the other hand, has considerable relief. As the landslide came to an abrupt stop, the end plowed into the underlying sediments, compressing them into folds and forming sandstone dikes.
The source of the El Capitan landslide is a well-defined amphitheater on the south side of El Capitan Mountain 1,500 to more than 3,000 m above and 1.5-3 km north of the proximal end of the landslide. The long distance traveled on a very gentle slope indicates that the El Capitan landslide had a very low coefficient of friction, similar to some modern and prehistoric avalanches. According to Shreve, they may have traveled on a thin lubricating layer of compressed air. The coefficient of friction of the Kearny landslides cannot be determined. However, the nonturbulent character of both the Kearny and El Capitan landslides indicates that they slid rather than flowed.